* chore: change Namespaces to be a global pointer
Before, the namespaces object was defined globally.
However, it has a non-trivial destructor that is called after main exits.
It's quite dangerous to define non-POD objects globally.
For example, if we used LOG(INFO) inside the Clear function, that would crash dragonfly on exit.
This PR changes it to be a global pointer.
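A minimal sketch of the pattern, for illustration only. `Registry`, `g_registry`, `InitRegistry` and `ShutdownRegistry` are hypothetical names, not the actual Dragonfly code:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the real Namespaces class.
class Registry {
 public:
  // Non-trivial destructor: for a global object it would run after main()
  // returns, when facilities such as logging may already be torn down.
  ~Registry() { Clear(); }

  void Add(const std::string& name) { items_[name] = 1; }
  void Clear() { items_.clear(); }
  std::size_t Size() const { return items_.size(); }

 private:
  std::map<std::string, int> items_;
};

// A raw pointer is trivially destructible, so no code runs for it during
// static destruction. The owner decides when the object is destroyed.
Registry* g_registry = nullptr;

void InitRegistry() {
  assert(g_registry == nullptr);
  g_registry = new Registry;
}

void ShutdownRegistry() {
  delete g_registry;
  g_registry = nullptr;
}
```

With this layout the object is created and destroyed at explicit points inside main, so its destructor never runs during static destruction.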
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
There are some problematic flows. First, we did not handle deletions, so all sorts of consistency issues could arise while calling DbSlice::Traverse() and DbSlice::Del(). Second, we did not handle FlushAll (same as before: Traverse() preempts and FlushAll() kicks in). Third, we did not handle expirations.
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
* chore: add timeout for replication sockets
Master will stop the replication flow if writes could not progress for more than K millis.
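One way to detect "writes could not progress" is to remember when a pending write last moved forward. The sketch below is an assumption about the general idea, not the actual implementation; `WriteProgress` and its methods are hypothetical names:

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: tracks whether a pending socket write is stuck.
struct WriteProgress {
  Clock::time_point last_progress{};
  bool pending = false;

  // A write was issued; start the stall timer if it is not running yet.
  void OnWriteStart(Clock::time_point now) {
    if (!pending) {
      pending = true;
      last_progress = now;
    }
  }

  // Some bytes reached the socket; reset the stall timer.
  void OnBytesWritten(Clock::time_point now) { last_progress = now; }

  // Everything was flushed; nothing can be stalled.
  void OnWriteDone() { pending = false; }

  // True if a write is pending and made no progress for longer than `timeout`.
  bool IsStalled(Clock::time_point now, std::chrono::milliseconds timeout) const {
    return pending && (now - last_progress) > timeout;
  }
};
```

A periodic check on the master could call `IsStalled` for each replication flow and cancel the flows that report true.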
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Signed-off-by: Roman Gershman <romange@gmail.com>
Co-authored-by: Shahar Mike <chakaz@users.noreply.github.com>
* chore: reorganize EngineShard::Heartbeat
1. Simplify CacheStats by using accessors directly provided by DbSlice
2. Separate eviction for tiering as tiering can be done on replica.
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Now unit tests will run the same Heartbeat fiber as in prod.
The whole feature was redundant: with just a few explicit settings of maxmemory_limit
I succeeded in making all unit tests pass.
In addition, this change allows passing a global handler that is called by heartbeat from a single thread.
This is not used yet - preparation for the next PR to break hung up replication connections on a master.
Finally, this change has some non-functional clean-ups and warning fixes to improve code quality.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
1. Add background offloading stats
2. Remove direct_fd override - helio is already updated with default=false, so it's not needed anymore.
3. Remove redundant tiered_storage_memory_margin flag
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
DashTable::Traverse is error prone when the callback passed to it preempts, because the segment might change. This is problematic and we need atomicity while traversing segments with preemption. The fix is to add Traverse in DbSlice and protect the traversal via ThreadLocalMutex.
* add ConditionFlag to DbSlice
* add Traverse in DbSlice and protect it with the ConditionFlag
* remove condition flag from snapshot
* remove condition flag from streamer
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
1. Fully support tiered_experimental_cooling for all operations
2. Offset cool storage usage when computing memory pressure situations in Heartbeat.
3. Introduce realtime entry counting per db_slice and provide DCHECK to verify it vs the old approach.
Later we will switch to realtime entry and free memory computations when computing bytes per object,
and remove the old approach in CacheStats().
4. Show hit rate during the run of dfly_bench loadtest.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
1. Use intrusive::list for CoolQueue.
2. Make sure that we ignore cool memory usage when computing average object size to
prevent evictions during dashtable growth attempts.
3. Remove items from the cool storage before evicting them from the dash table.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: reenable evictions upon insertion to avoid OOM rejections
Before: when running dragonfly with --cache_mode we could get OOM rejections
even though the eviction policy allowed evicting items to free memory.
Ideally, dragonfly in cache mode should not respond with the OOM error.
This PR reuses the same Eviction step we have in the Heartbeat and conditionally applies it
during the insertion. In my test the OOM errors went from 500K to 0 and the server
still respected memory limit.
Also, remove the old heuristics that have never been used.
Test:
./dfly_bench --key_prefix=bar: -d 1024 --ratio=1:0 --qps=200 -n 3000
./dragonfly --dbfilename= --proactor_threads=2 --maxmemory=600M --cache_mode
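The idea of applying an eviction step on the insertion path can be sketched as follows. This is a toy model under stated assumptions: `TinyCache` is a hypothetical class, memory is counted as key plus value bytes, and the eviction policy is simply "oldest first", which is not necessarily what Dragonfly does:

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class TinyCache {
 public:
  TinyCache(std::size_t max_bytes, bool cache_mode)
      : max_bytes_(max_bytes), cache_mode_(cache_mode) {}

  // Returns false when the write has to be rejected (the OOM case).
  bool Set(const std::string& key, std::string value) {
    Erase(key);  // overwrite semantics: drop the previous version first
    const std::size_t need = key.size() + value.size();
    if (cache_mode_) {
      // Eviction step on the insertion path: drop the oldest entries
      // until the new one fits or nothing is left to evict.
      while (used_ + need > max_bytes_ && !order_.empty()) {
        const std::string victim = order_.front();  // copy: Erase() destroys the list node
        Erase(victim);
      }
    }
    if (used_ + need > max_bytes_) return false;
    auto pos = order_.insert(order_.end(), key);
    map_.emplace(key, Entry{std::move(value), pos});
    used_ += need;
    return true;
  }

  bool Contains(const std::string& key) const { return map_.count(key) > 0; }
  std::size_t used_bytes() const { return used_; }

 private:
  struct Entry {
    std::string value;
    std::list<std::string>::iterator pos;  // position in the insertion order
  };

  void Erase(const std::string& key) {
    auto it = map_.find(key);
    if (it == map_.end()) return;
    used_ -= key.size() + it->second.value.size();
    order_.erase(it->second.pos);
    map_.erase(it);
  }

  std::size_t max_bytes_;
  bool cache_mode_;
  std::size_t used_ = 0;
  std::list<std::string> order_;  // oldest key at the front
  std::unordered_map<std::string, Entry> map_;
};
```

In cache mode the write succeeds as long as evicting older entries can make room; without cache mode the same write is rejected once the limit is reached.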
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: introduce a cool queue that gradually retires cool items
This PR introduces a new state in which the offloaded value is not freed from memory but instead stays
in the cool queue.
Upon a read we move the cool value back to the hot table and delete it from storage.
When we are low on memory, we retire the oldest cool values until we are above the threshold.
The PR does not fully finish the feature but it is workable enough to start (load)testing.
Missing:
a) Handle Modify operations
b) Retire cool items in more cases where we are low on memory. Specifically, refrain from evictions as long as cool items exist.
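The cool state described above can be modelled roughly like this. It is only a sketch: the class layout, the method names `PushCool`, `WarmUp` and `RetireUntil`, and the byte-budget form of the threshold are assumptions made for illustration:

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class CoolQueue {
 public:
  // The value was offloaded to storage, but its in-memory copy is kept as "cool".
  void PushCool(const std::string& key, std::string value) {
    std::string stale;
    WarmUp(key, &stale);  // drop a previous cool copy of the same key, if any
    used_ += value.size();
    items_.emplace_back(key, std::move(value));
    index_[key] = std::prev(items_.end());
  }

  // Read path: take the value out of the queue so it can become hot again.
  // Returns false if the key is not cool (the caller then reads from storage).
  bool WarmUp(const std::string& key, std::string* out) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    *out = std::move(it->second->second);
    used_ -= out->size();
    items_.erase(it->second);
    index_.erase(it);
    return true;
  }

  // Memory pressure: free the oldest cool values until the queue uses at most
  // `target_bytes`. Returns how many entries were retired.
  std::size_t RetireUntil(std::size_t target_bytes) {
    std::size_t retired = 0;
    while (used_ > target_bytes && !items_.empty()) {
      used_ -= items_.front().second.size();
      index_.erase(items_.front().first);
      items_.pop_front();
      ++retired;
    }
    return retired;
  }

  std::size_t used_bytes() const { return used_; }

 private:
  using Item = std::pair<std::string, std::string>;  // key, value
  std::list<Item> items_;                            // oldest at the front
  std::unordered_map<std::string, std::list<Item>::iterator> index_;
  std::size_t used_ = 0;
};
```

Retired entries remain available from storage, so retiring a cool value frees memory without losing data.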
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: simplify computation of used_mem_current
Before - each thread updated its own variable and then,
the global "used_mem_current" was updated by summing used memory from each thread.
Now, each thread updates used_mem_current directly. The code is simpler and also provides more precise
results more frequently.
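A rough sketch of the "each thread updates the global directly" approach, assuming an atomic counter and per-thread deltas. Only the name `used_mem_current` comes from the description above; `ShardMemTracker` is hypothetical:

```cpp
#include <atomic>
#include <cstdint>

// Global counter shared by all threads.
std::atomic<std::int64_t> used_mem_current{0};

// Hypothetical per-thread helper: pushes only the change since the last
// report, so no separate summation pass over all threads is needed.
class ShardMemTracker {
 public:
  void Update(std::int64_t current_bytes) {
    const std::int64_t delta = current_bytes - last_reported_;
    last_reported_ = current_bytes;
    used_mem_current.fetch_add(delta, std::memory_order_relaxed);
  }

 private:
  std::int64_t last_reported_ = 0;
};
```

The global value is then correct after every single update, instead of only after a pass that sums the per-thread variables.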
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: clean up TaskQueue since we do not need multiple fibers for it
Implement TaskQueue as a wrapper around FiberQueue.
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* feat(namespaces): Initial support for multi-tenant #3050
This PR introduces a way to create multiple, separate and isolated
namespaces in Dragonfly. Each user can be associated with a single
namespace, and will not be able to interact with other namespaces.
This is still experimental, and lacks some important features, such as:
* Replication and RDB saving completely ignores non-default namespaces
* Defrag and statistics either use the default namespace or all
namespaces without separation
To associate a user with a namespace, use the `ACL` command with the
`TENANT:<namespace>` flag:
```
ACL SETUSER user TENANT:namespace1 ON >user_pass +@all ~*
```
For more examples and up to date info check
`tests/dragonfly/acl_family_test.py` - specifically the
`test_namespaces` function.
There are no functional changes in this PR.
ReportXXX functions are renamed to NotifyXXX
Some functions were moved to private, and some pulled out from the class as being stateless.
This is a preparatory change before making changes in the tiered storage code.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: get rid of lock keys
1. Introduce LockTag, a type representing the part of the key that is used for locking.
2. Hash keys once in each transaction.
3. Expose swap_memory_bytes metric.
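To illustrate what a strong type for "the part of the key used for locking" might look like, here is a sketch. The extraction rule (the text between the first `{` and the following `}`, as in the Redis Cluster hash tag convention) is an assumption for the example, and the class below is not the actual Dragonfly `LockTag`:

```cpp
#include <cstddef>
#include <functional>
#include <string_view>

class LockTag {
 public:
  // Explicit on purpose: a plain key cannot be passed where a tag is expected.
  explicit LockTag(std::string_view key) : tag_(Extract(key)) {}

  std::string_view view() const { return tag_; }

  // Hash once and reuse the result instead of re-hashing the key string.
  std::size_t Hash() const { return std::hash<std::string_view>{}(tag_); }

  friend bool operator==(const LockTag& a, const LockTag& b) { return a.tag_ == b.tag_; }

 private:
  // Assumed rule: use the non-empty text inside the first {...} pair,
  // otherwise fall back to the whole key.
  static std::string_view Extract(std::string_view key) {
    const std::size_t open = key.find('{');
    if (open == std::string_view::npos) return key;
    const std::size_t close = key.find('}', open + 1);
    if (close == std::string_view::npos || close == open + 1) return key;
    return key.substr(open + 1, close - open - 1);
  }

  std::string_view tag_;  // does not own the bytes; the key must outlive the tag
};
```

Because the constructor is explicit, code that locks by tag cannot accidentally be handed a raw `string_view` key.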
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
The main change here is the introduction of the strong type LockTag,
which is distinct from a string_view key.
Also, some testing improvements to reduce the footprint of the next PR.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: LockTable tracks fingerprints of keys
It's a first step that will probably simplify dependencies in many places
where we need to keep key strings for that. A second step will be to reduce the CPU load
of multi-key operations like MSET and precompute Fingerprints once.
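A sketch of a lock table keyed by fingerprints rather than key strings. The names are hypothetical, and FNV-1a is used here only as a simple, well-known 64-bit hash; it is not necessarily the hash Dragonfly uses:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// 64-bit FNV-1a, chosen for the example only.
std::uint64_t Fingerprint(const std::string& key) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (unsigned char c : key) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

class FingerprintLockTable {
 public:
  // Returns true if the fingerprint was not locked before this call.
  bool Acquire(std::uint64_t fp) { return ++counts_[fp] == 1; }

  void Release(std::uint64_t fp) {
    auto it = counts_.find(fp);
    assert(it != counts_.end());
    if (--it->second == 0) counts_.erase(it);
  }

  bool IsLocked(std::uint64_t fp) const { return counts_.count(fp) > 0; }

 private:
  // No key strings are stored, only fixed-size fingerprints and lock counts.
  std::unordered_map<std::uint64_t, unsigned> counts_;
};
```

Two different keys may map to the same fingerprint. In this sketch that only causes extra contention, never a missed lock, and a multi-key command can compute each fingerprint once and reuse it.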
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
A self-laundering iterator will enable us to, eventually, yield from fibers while holding an iterator. For example:
```cpp
auto it1 = db_slice.Find(...);
Yield(); // Until now - this could have invalidated `it1`
auto it2 = db_slice.Find(...);
```
Why is this a good idea? Because it will enable yielding inside PreUpdate(), which will allow writing huge entries to disk/network in small chunks, eliminating the need to allocate huge chunks of memory just for serialization.
Also, it'll probably unlock future developments as well, as yielding can be useful in other contexts.
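The general technique can be sketched as an iterator that remembers its key and re-resolves itself whenever the table may have changed. Everything below is a simplified assumption: `Table`, `LaunderedIterator` and the version counter are hypothetical and stand in for whatever change detection the real code uses:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

class Table {
 public:
  using Map = std::unordered_map<std::string, int>;

  // Every mutation bumps the version so stale iterators can be detected.
  void Set(const std::string& key, int value) {
    map_[key] = value;
    ++version_;
  }
  void Erase(const std::string& key) {
    map_.erase(key);
    ++version_;
  }

  std::uint64_t version() const { return version_; }
  Map& map() { return map_; }

 private:
  Map map_;
  std::uint64_t version_ = 0;
};

class LaunderedIterator {
 public:
  LaunderedIterator(Table* table, std::string key)
      : table_(table),
        key_(std::move(key)),
        it_(table->map().find(key_)),
        seen_version_(table->version()) {}

  // Returns a pointer to the value, or nullptr if the key no longer exists.
  int* Get() {
    if (seen_version_ != table_->version()) {
      // The table changed since the last access: look the key up again
      // instead of dereferencing a possibly invalidated iterator.
      it_ = table_->map().find(key_);
      seen_version_ = table_->version();
    }
    return it_ == table_->map().end() ? nullptr : &it_->second;
  }

 private:
  Table* table_;
  std::string key_;
  Table::Map::iterator it_;
  std::uint64_t seen_version_;
};
```

The cost is one extra lookup after a change, in exchange for iterators that stay usable across points where other code may have modified the table.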
This commit generalizes the mechanism of running transaction callbacks during scheduling, removing the need for specialized ScheduleUniqueShard/RunQuickie. Instead, transactions can now run during ScheduleInShard - called "immediate" runs - if the transaction is concluding and either only a single shard is active or the operation can be safely repeated if scheduling failed (idempotent commands, like MGET).
Updates transaction stats to mirror the new changes more closely.
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
1. Replaces run_barrier as a synchronization point with is_armed + an embedded blocking counter for awaiting running jobs
2. Replaces IsArmedInShard + GetLocalMask + is_armed.exchange chain with a single DisarmInShard() / DisarmInShardWhen
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
This should reduce allocations in a common case (not multi).
In addition, rename Transaction::args_ to kv_args_.
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Co-authored-by: Vladislav <vlad@dragonflydb.io>