The bug was that if two push operations were queued together in the tx queue
and the first push woke a pending BLPOP, then the PollExecution function would continue
with the second push before switching to the BLPOP, which contradicts the spec.
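A minimal sketch of the fixed poll loop; the names here are illustrative stand-ins, not the actual Dragonfly code:

    struct Transaction;
    struct TxQueue {
      Transaction* Front();  // next queued tx, or nullptr
      void PopFront();
    };
    bool Run(Transaction* tx);  // returns true if it woke a blocked BLPOP

    // Once a push wakes a pending BLPOP waiter, stop advancing through the
    // queued pushes: the woken waiter must run before the second push does.
    void PollExecution(TxQueue* queue) {
      while (Transaction* head = queue->Front()) {
        bool woke_waiter = Run(head);
        queue->PopFront();
        if (woke_waiter)
          break;  // yield to the woken BLPOP before the next push
      }
    }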
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Partially implements #6.
Before, each shard lazily updated the clock used for expiry evaluation.
Now the clock value is set during the transaction scheduling phase and is assigned
to each transaction. From now on, DbSlice methods use this value, supplied via the
DbContext argument, when checking whether an entry is expired.
Also implemented a transactionally consistent TIME command and verified
that the time stays the same for the duration of a transaction. See
https://ably.com/blog/redis-keys-do-not-expire-atomically for motivation.
Lamport-style clock updates for background processes are still not implemented
(not sure that is the right way to proceed).
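A rough sketch of the idea; the field and function names below are assumptions, not the exact API:

    #include <cstdint>

    // The clock is sampled once, during transaction scheduling, and then
    // travels with the transaction instead of being read lazily per shard.
    struct DbContext {
      uint64_t time_now_ms;  // fixed for the whole transaction
    };

    struct PrimeValue {
      uint64_t expire_at_ms;  // 0 means "no expiry"
    };

    // DbSlice-style check: every hop of the transaction sees the same "now".
    inline bool IsExpired(const DbContext& cntx, const PrimeValue& pv) {
      return pv.expire_at_ms != 0 && pv.expire_at_ms <= cntx.time_now_ms;
    }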
Before this change, Dragonfly evicted items only when it was low on memory and had to grow its main dictionary.
That is not enough, because in many cases Dragonfly's memory usage can grow even when the main dictionary does not grow:
for example, when the dictionary is less than 90% utilized but the newly added objects require lots of memory.
In addition, the dashtable adds segments even when the existing segments have enough available slots to fill the rest of the free memory.
This change adds another layer of defense that allows evicting items even when dictionary segments are not full.
The eviction is still local with respect to a segment. In my tests it seems much harder to cross the maxmemory limit than before.
In addition, we tightened the heuristic that allows the dashtable to grow: it now takes into account the average bytes per item
in order to project how much memory the full table would take before adding new segments to it.
This really improves dashtable utilization.
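In rough pseudo-C++ the tightened grow check looks like this; the names and the exact formula are assumptions:

    #include <cstddef>

    // Allow adding segments only if the *projected* full-table footprint,
    // estimated from the observed average bytes per item, still fits.
    bool CanGrow(size_t num_items, size_t used_bytes,
                 size_t capacity_after_grow, size_t max_memory) {
      if (num_items == 0)
        return true;
      size_t avg_bytes_per_item = used_bytes / num_items;
      return avg_bytes_per_item * capacity_after_grow <= max_memory;
    }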
There are still things to improve:
1. The eviction behavior is rough. Once an insert triggers eviction, it tries to free enough objects to bring memory back under the limit.
This may result in very spiky insertion/eviction patterns.
2. The eviction procedure, even though it's limited to a single segment, is quite heavy because it goes over all buckets
in the segment. This may result in weird artifacts where we evict just enough to be under the limit, then add and evict
again and so on.
3. Therefore, we may need a periodic eviction that will complement this emergency eviction step.
Fixes #224 and partially addresses #256.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
1. No entry data is written into the journal yet.
2. Introduced a state machine to start and stop the journal using a new auxiliary command "dfly".
3. string_family currently calls a universal command Journal::RecordEntry that should
save the current key/value of the entry (see the sketch after this list). Note that we won't use it for all operations,
because for some it's more efficient to record the action itself than the final value.
4. No locking information is recorded yet, so the atomicity of multi-key operations is not preserved for now.
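For illustration, the call site in string_family might look roughly like this; the signature and helper names are assumptions:

    #include <cstdint>
    #include <string_view>

    using TxId = uint64_t;

    struct Journal {
      bool IsActive() const;
      void RecordEntry(TxId txid, std::string_view key, std::string_view value);
    };

    // After a SET-like mutation, record the final key/value so a consumer
    // can replay the resulting state (value-based, not operation-based).
    void RecordSet(Journal* journal, TxId txid, std::string_view key,
                   std::string_view value) {
      if (journal && journal->IsActive())
        journal->RecordEntry(txid, key, value);
    }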
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
In rare cases a scheduled transaction is not scheduled correctly, and we need
to remove it from the tx-queue in order to re-schedule it. When we pull it from the tx-queue
and it was located at the head, we must poll-execute the next txs in the queue.
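A hedged sketch of that flow; TxQueue and its methods are illustrative stand-ins:

    struct Transaction;
    struct TxQueue {
      Transaction* Front();
      void Remove(Transaction* tx);
      void PollExecution();  // runs whatever became runnable at the head
    };

    // When the incorrectly scheduled tx sat at the head of the queue, the
    // txs behind it may already be runnable, so poll-execute them now
    // instead of waiting for another wakeup that may never come.
    void RemoveForReschedule(TxQueue* queue, Transaction* tx) {
      bool was_head = (queue->Front() == tx);
      queue->Remove(tx);
      if (was_head)
        queue->PollExecution();
    }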
1. Fix the bug.
2. Improve verbose logging to make it easier to follow the tx flow in release mode.
3. Introduce /txz handler that shows currently pending transactions in the queue.
4. Fix a typo in xdel() function.
5. Add a py-script that reproduces the bug.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
1. Found dangling transaction pointers that were left in the watch queue; fixed the state machine there.
2. Improved the transaction code a bit: merged duplicated code into the RunInShard function and got rid of RunNoop.
3. Improved BPopper::Run flow.
4. Added 'DEBUG WATCH' command. Also 'DEBUG OBJECT' now returns shard id and the lock status of the object.
1. Stabilized the naming scheme for snapshot files.
The behavior is determined by the dbfilename flag, which defaults to 'dump'.
Dragonfly checks whether the flag has an extension: if not, it automatically saves
timestamped files (dump*.rdb) and loads the latest available file (see the example after this list).
If the flag has an extension, like 'dump.rdb', it falls back to the Redis behavior of loading
from and saving to the same file.
2. Updated the logo url.
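To illustrate the two dbfilename modes from item 1 (invocations are illustrative):

    # no extension: save timestamped snapshots (dump*.rdb), load the newest
    dragonfly --dbfilename dump

    # with extension: always save to and load from exactly dump.rdb
    dragonfly --dbfilename dump.rdb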
This is done by shifting slots right in a stash bucket of the full segment.
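One possible reading of the shift-right scheme, as a hedged sketch (the slot layout and names are assumptions):

    #include <cstddef>
    #include <utility>

    // Evict the entry in the last slot of a full stash bucket, shift the
    // remaining slots right, and hand slot 0 to the incoming entry.
    template <typename Slot, size_t N>
    size_t EvictByShiftRight(Slot (&slots)[N], void (*evict)(Slot&)) {
      evict(slots[N - 1]);
      for (size_t i = N - 1; i > 0; --i)
        slots[i] = std::move(slots[i - 1]);
      return 0;  // index of the freed slot
    }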
In addition, I added eviction-related stats to the memory and stats sections.
I also updated the README with the changes. Finally, I added a flag that allows
disabling the HTTP admin console in dragonfly.
1. Make EngineShardSet a process singleton instead of a variable passed everywhere.
2. Implement a heuristic for updating free memory per shard.
3. Make sure that the SET command returns OOM when the database reaches its limits.
The heart of this feature is the ability to bump up entries that have been searched for.
When we bump up an entry within a dash-table, it is moved out of the stash buckets or to lower slot ids
within regular buckets. The entry it displaces is moved to the bumped-up entry's old place.
The next change will implement the actual eviction heuristic that evicts keys from the stash buckets.
This will give stash buckets an additional responsibility in caching mode: they will serve as a probation buffer
that holds low-priority or newly added items.
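As a hedged illustration of the bump-up step (the real slot-selection logic is more involved):

    #include <utility>

    // On a lookup hit, swap the entry into a "better" position: out of a
    // stash bucket or to a lower slot id in a regular bucket. The displaced
    // entry takes the hit entry's old place, so hot keys drift away from
    // the stash buckets that the eviction heuristic will target.
    template <typename Entry>
    void BumpUp(Entry& hit, Entry& better_position) {
      std::swap(hit, better_position);
    }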
1. Add ExternalAllocator that provides file ranges to write to.
2. Add IoMgr that wraps files and allows sending asynchronous requests.
3. Add POC that writes string values when a new entry is added.
The original values are still kept in memory.
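A hedged sketch of the POC flow; the interfaces shown here are assumptions about the new classes:

    #include <cstdint>
    #include <functional>
    #include <string_view>

    struct ExternalAllocator {
      int64_t Malloc(size_t bytes);  // reserves a file range, returns offset
    };
    struct IoMgr {
      void WriteAsync(int64_t offset, std::string_view blob,
                      std::function<void(int)> on_done);
    };

    // When a new string entry is added, copy its value to disk
    // asynchronously; the in-memory value is kept as before.
    void OffloadString(ExternalAllocator* alloc, IoMgr* io,
                       std::string_view value) {
      int64_t offset = alloc->Malloc(value.size());
      io->WriteAsync(offset, value, [](int /*result*/) { /* completion */ });
    }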
1. Fix crash when calling BLPOP on the same key several times.
2. Extend RENAME functionality to cover all data-types.
Before, it worked only for strings, and even that was incorrect.
Bring back application-level used-memory tracking.
Use an internal mimalloc API to extract committed-memory stats.
This is much better performance-wise, because calling mi_heap_visit_blocks
becomes very slow for larger database sizes.
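For contrast, the public-API walk being moved away from looks roughly like this (visiting areas only; the internal counters this change switches to are not shown):

    #include <cstddef>
    #include <mimalloc.h>

    // Sum committed bytes by visiting heap areas through the public API.
    static bool VisitArea(const mi_heap_t*, const mi_heap_area_t* area,
                          void* /*block*/, size_t /*block_size*/, void* arg) {
      *static_cast<size_t*>(arg) += area->committed;
      return true;  // keep visiting
    }

    size_t CommittedBytes(mi_heap_t* heap) {
      size_t total = 0;
      // visit_blocks=false: callback fires once per area with block == NULL.
      mi_heap_visit_blocks(heap, /*visit_blocks=*/false, &VisitArea, &total);
      return total;
    }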