Commit graph

122 commits

Author SHA1 Message Date
Shahar Mike
3fdd9877f4
fix: Do not bump elements during RDB load (#4507)

The Issue

Before this PR, loading an RDB modified fetched_items_ as part of the loading process. This has little effect unless the next issued command calls FLUSHALL / FLUSHDB (which can happen via DFLY LOAD, REPLICAOF, or a direct FLUSHALL). In that case, a CHECK() fails.

The Fix

While load does not run as a command (inside a transaction), it still uses APIs that assume they are called in the context of a command. As a result, it indirectly used DbSlice::FindInternal(), which bumps elements when called.

This PR adds another sub-mode to DbSlice, named load_in_progress_. When true, we treat DbSlice as if it is not in cache mode, ignoring cache_mode_.
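
A minimal sketch of the gating idea, with a hypothetical class standing in for DbSlice (only the cache_mode_ / load_in_progress_ names follow the PR text):

```cpp
#include <cstdint>

// Hypothetical, simplified slice: element bump-ups are gated on an effective
// cache mode that is suppressed while an RDB load is in progress.
class SliceSketch {
 public:
  void SetCacheMode(bool on) { cache_mode_ = on; }
  void SetLoadInProgress(bool on) { load_in_progress_ = on; }

  // Effective cache mode: ignore cache_mode_ while loading an RDB.
  bool IsCacheMode() const { return cache_mode_ && !load_in_progress_; }

  // Called on every lookup; bumps only in cache mode, so fetched_items_-style
  // bookkeeping stays empty during the load.
  void OnFind(uint64_t key_fingerprint) {
    if (!IsCacheMode())
      return;
    BumpUp(key_fingerprint);  // LRU-style promotion, details omitted.
  }

 private:
  void BumpUp(uint64_t) {}  // placeholder for the real bump logic

  bool cache_mode_ = false;
  bool load_in_progress_ = false;
};
```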

BTW, this PR also renames caching_mode_ to cache_mode_, as we generally use the term cache mode rather than caching mode, including in the --cache_mode flag.

Fixes #4497
2025-01-26 23:18:28 +02:00
Borys
933c9f0b1c
refactor: remove transaction lib on cluster code dependency (#4417) 2025-01-08 09:38:13 +00:00
adiholden
a3ef239ac7
feat(server): refactor allow preempt on journal record (#4393)
* feat server: refactor allow preempt on journal record

Signed-off-by: adi_holden <adi@dragonflydb.io>
2025-01-02 12:16:21 +02:00
Roman Gershman
7a68528022
chore: minor refactorings around dense_set deletions (#4390)
chore: refactorings around deletions

Done as a preparation to introduce asynchronous deletions for sets/zsets/hmaps.
1. Restrict the interface around DbSlice::Del. It now requires the iterator to be valid, and the checks must be explicit before the call. Most callers already provide a valid iterator.

2. Some minor refactoring in compact_object_test.
3. Expose DenseSet::ClearStep to allow iterative deletions of elements.
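
A rough sketch of the iterative-clear idea from point 3; the container and the ClearStep signature below are illustrative stand-ins, not Dragonfly's actual DenseSet interface:

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Toy stand-in for a set that supports incremental clearing.
class StepClearableSet {
 public:
  void Add(std::string s) { items_.insert(std::move(s)); }

  // Erase up to `limit` elements; return true when the set is empty.
  bool ClearStep(size_t limit) {
    auto it = items_.begin();
    for (size_t i = 0; i < limit && it != items_.end(); ++i)
      it = items_.erase(it);
    return items_.empty();
  }

 private:
  std::unordered_set<std::string> items_;
};

// Asynchronous-style deletion driver: clear in bounded steps so a fiber can
// yield between steps instead of blocking on one huge deletion.
inline void ClearIteratively(StepClearableSet& set, size_t step = 64) {
  while (!set.ClearStep(step)) {
    // In a fiber-based server, yield here.
  }
}
```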

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2025-01-02 11:35:55 +02:00
Roman Gershman
c22c9448b5
fix: do not check-fail in OpRestore (#4332)
fix: do not check-fail OpRestore

In some rare cases we reach an inconsistent state inside OpRestore where a key already exists even though it should not.
In that case we now log the error instead of crashing the server. In addition, we update the existing entry to the latest restored value.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-12-18 09:53:03 +02:00
Stepan Bagritsevich
5483d1d05e
fix(eviction): Tune eviction threshold in cache mode (#4142)
* fix(eviction): Tune eviction threshold in cache mode

fixes #4139

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor: small fix in tiered_storage_test

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor: address comments

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* chore(dragonfly_test): Remove ResetService

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor: fix test_cache_eviction_with_rss_deny_oom test

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor: address comments

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* fix(dragonfly_test): Fix DflyEngineTest.Bug207

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* fix(dragonfly_test): Increase string size in the test Bug207

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor: address comments 3

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor: address comments 4

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* fix: Fix failing tests

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor: address comments 5

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor: resolve conflicts

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

---------

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
2024-12-05 12:26:59 +00:00
Kostas Kyrimis
267d5ab370
chore: remove DbSlice mutex and add ConditionFlag in SliceSnapshot (#4073)
* remove DbSlice mutex
* add ConditionFlag in SliceSnapshot
* disable compression when big value serialization is on
* add metrics

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-12-05 13:24:23 +02:00
Kostas Kyrimis
d8fda40d4d
chore: split RecordExpiry preemptive and non-preemptive flows (#4252)
* add FiberGuard to RecordExpiry for non-preemptive flows

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-12-04 15:46:00 +02:00
Roman Gershman
ab6088f5d6
chore: simplify BumpUps deduplication (#4098)
* chore: simplify BumpUps deduplication

PR #2474 introduced iterator protection by tracking which keys were bumped up during the transaction operation.
This was done by managing a set of key views. However, this can be simplified by using fingerprints.
Also, fingerprints do not require that the original keys exist.

In addition, PR #3241 introduced FetchedItemsRestorer, which tracks the bumped set and saves it to protect against fiber context switches.
My claim is that it's redundant: since we only keep auto-laundering iterators, when a fiber preempts, these iterators recognize it
(see IteratorT::LaunderIfNeeded) and refresh themselves anyway.

To summarize: fetched_items_ protects us from iterator invalidation during atomic scenarios,
and auto-laundering protects us from everything else, so fetched_items_ can be cleared in that case.
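
A hedged sketch of the fingerprint-based deduplication described above; the hash used here is a placeholder, not Dragonfly's real key fingerprint:

```cpp
#include <cstdint>
#include <functional>
#include <string_view>
#include <unordered_set>

// Placeholder fingerprint; a real implementation would use a dedicated
// 64-bit key fingerprint rather than std::hash.
inline uint64_t Fingerprint(std::string_view key) {
  return std::hash<std::string_view>{}(key);
}

class BumpDeduplicator {
 public:
  // Returns true if the key should be bumped (first sighting in this scope).
  // Unlike a set of key views, the original key string need not outlive this call.
  bool ShouldBump(std::string_view key) {
    return fetched_.insert(Fingerprint(key)).second;
  }

  // Called when the atomic scenario ends; auto-laundering covers the rest.
  void OnOperationEnd() { fetched_.clear(); }

 private:
  std::unordered_set<uint64_t> fetched_;
};
```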
---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-11-13 17:27:53 +02:00
Kostas Kyrimis
ed11c8d3a4
chore: allow config set notify_keyspace_events (#3790)
Previously, we did not allow notify_keyspace_events to be set at runtime via the CONFIG SET command.

* allow notify_keyspace_events in config set command
* add tests

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-09-30 09:54:02 +03:00
Kostas Kyrimis
41f7b611d0
chore: enable -Werror=thread-safety and add missing annotations (part 2/2) (#3595)
* add missing annotations
* small mutex fixes
* enable -Werror=thread-safety for clang builds
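
For illustration, this is roughly what such annotations buy, shown with absl::Mutex and ABSL_GUARDED_BY (the server's own mutex types and macro names may differ); with -Werror=thread-safety the commented-out accessor fails the build:

```cpp
#include "absl/base/thread_annotations.h"
#include "absl/synchronization/mutex.h"

class AnnotatedCounter {
 public:
  void Inc() {
    absl::MutexLock lock(&mu_);
    ++value_;  // OK: mu_ is held.
  }

  int Get() const {
    absl::MutexLock lock(&mu_);
    return value_;  // OK: mu_ is held.
  }

  // int Broken() const { return value_; }
  // ^ clang -Wthread-safety: reading value_ without holding mu_ is flagged.

 private:
  mutable absl::Mutex mu_;
  int value_ ABSL_GUARDED_BY(mu_) = 0;
};
```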

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-08-30 15:42:30 +03:00
Kostas Kyrimis
839b1be82d
chore: add -Wthread-analysis and annotate (part 1/2) (#3502)
* enable -Wthread-analysis
* add missing annotations
* small fixes

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-08-26 18:22:38 +03:00
Kostas Kyrimis
9c3d69e0ec
fix: delete empty dense sets in HGetGeneric (#3543)
* remove DelEmptyPrimeValue
* delete empty dense set in HGetGeneric
* const qualify FindReadOnly

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-08-26 11:33:57 +03:00
Roman Gershman
caf677ea76
fix: string compatibility issues (#3564)
1. STRLEN should return 0 for non-existing types.
2. Reject specifying both EX and PX options in SET.
3. Prevent overflow when the expiry time is too large.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-08-25 08:08:14 +03:00
Kostas Kyrimis
b979994025
fix: skip empty objects on load and replication (#3514)
* skip empty objects in rdb save
* skip empty objects in rdb load
* delete empty keys in FindReadOnly

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-08-20 09:44:41 +03:00
Kostas Kyrimis
1c9e9c5922
fix: big value serialization corner cases (#3430)
There are some problematic flows. First, we did not handle deletions, so all sorts of consistency issues could arise while calling DbSlice::Traverse() and DbSlice::Del(). Second, we did not handle FlushAll (same as before: Traverse() preempts and FlushAll() kicks in). Third, we did not handle expirations.

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-08-11 14:17:32 +03:00
Roman Gershman
7c84b8e524
chore: change how we track memory_budget during evictions (#3457)
* chore: change how we track memory_budget during evictions

We compared memory_budget vs 0 before inserting a new item in DbSlice,
and retired cool pages if we were low on memory.

The problem: when we decide whether to allow growing a table, we estimate the possible object-size increase due to the future table growth,
and the memory check described above was not consistent with the actual logic that rejected the insertion.

Moreover, the interaction of memory_budget tracking with EvictionPolicy was over-complicated: we passed the memory_budget counter to the evp object
and then read it back, even though evp did not track the memory impact of object deletions during evictions.

Now we remove the responsibility from evp to update memory_budget_, so it is solely updated by DbSlice.
We also update memory_budget_ during deletions, and when we pass it to evp we add the cool memory size as a potential memory resource, to avoid
rejections in case we have lots of cool memory.
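
In rough, illustrative C++ (the struct and names below are invented, not the actual DbSlice/EvictionPolicy interfaces), the accounting described above amounts to:

```cpp
#include <cstdint>

// Illustrative accounting only.
struct BudgetSketch {
  int64_t memory_budget = 0;  // bytes we may still allocate (may go negative)
  int64_t cool_memory = 0;    // bytes held in "cool" pages, reclaimable

  // The slice is the single owner of the budget: it shrinks on insert ...
  void OnInsert(int64_t entry_bytes) { memory_budget -= entry_bytes; }
  // ... and grows back on deletion, which the old flow did not account for.
  void OnDelete(int64_t entry_bytes) { memory_budget += entry_bytes; }

  // What the eviction/growth policy sees: cool memory counts as a potential
  // resource, so plentiful cool pages do not cause insertion rejections.
  bool AllowTableGrowth(int64_t estimated_growth_bytes) const {
    return memory_budget + cool_memory >= estimated_growth_bytes;
  }
};
```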

Fixes #3456
---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-08-07 15:43:45 +03:00
Kostas Kyrimis
aa02070e3d
chore: add db_slice lock to protect segments from preemptions (#3406)
DashTable::Traverse is error-prone when the passed callback preempts, because the segment might change underneath it. This is problematic since we need atomicity while traversing segments with preemption. The fix is to add Traverse in DbSlice and protect the traversal via ThreadLocalMutex.

* add ConditionFlag to DbSlice
* add Traverse in DbSlice and protect it with the ConditionFlag
* remove condition flag from snapshot
* remove condition flag from streamer

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-30 15:02:54 +03:00
Roman Gershman
6b67f44e29
chore: tiering - make Modify work with cool storage (#3395)
1. Fully support tiered_experimental_cooling for all operations
2. Offset cool storage usage when computing memory pressure situations in Heartbeat.
3. Introduce realtime entry counting per db_slice and provide DCHECK to verify it vs the old approach.
   Later we will switch to realtime entry and free memory computations when computing bytes per object,
   and remove the old approach in CacheStats().
4. Show hit rate during the run of dfly_bench loadtest.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-07-27 14:31:29 +03:00
Roman Gershman
0a26a06065
chore: tiered fixes (#3393)
1. Use intrusive::list for CoolQueue.
2. Make sure that we ignore cool memory usage when computing average object size to
   prevent evictions during dashtable growth attempts.
3. Remove items from the cool storage before evicting them from the dash table.
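
For point 1, an intrusive list keeps the queue hook inside the entry itself, so queue membership costs no extra allocation; a minimal Boost.Intrusive sketch with an invented CoolItem layout:

```cpp
#include <cstddef>
#include <cstdint>

#include <boost/intrusive/list.hpp>

namespace bi = boost::intrusive;

// Hypothetical cooled entry: the list hook lives inside the object, so
// pushing/popping never allocates and unlinking is O(1) given the object.
struct CoolItem {
  bi::list_member_hook<> hook;
  uint64_t key_hash = 0;
  size_t mem_used = 0;
};

using CoolQueue =
    bi::list<CoolItem,
             bi::member_hook<CoolItem, bi::list_member_hook<>, &CoolItem::hook>>;

// The queue does not own the items; the caller manages their lifetime.
// Unlink the victim from the cool queue before evicting it from the table.
inline CoolItem* PopColdest(CoolQueue& q) {
  if (q.empty()) return nullptr;
  CoolItem* victim = &q.back();
  q.pop_back();
  return victim;
}
```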

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-07-25 23:38:44 +03:00
Kostas Kyrimis
79d7f57b67
fix: disable inline transactions when db_slice has registered callbacks (#3391)
Inline transactions do not acquire any locks and therefore they should not preempt. This is no longer true when db_slice has registered callbacks.

* disable inline transactions when db_slice has registered callbacks

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-25 16:06:35 +00:00
Kostas Kyrimis
a95cf2e857
chore: do not preempt on db_slice::RegisterOnChange (#3388)
For big value serialization we need to support preemption around db_slice::RegisterOnChange: calling RegisterOnChange while another code path is iterating over change_cb_ and has preempted (because it serializes a big value) is UB. As this is problematic and can lead to data inconsistencies, I replace the std::vector with std::list and bound the iteration of change_cb_ on paths that preempt (see the sketch below).

* replace std::vector with std::list for change_cb_
* bound iteration of change_cb_ on paths that preempt
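
Why std::list helps, sketched with invented callback names: its iterators stay valid under push_back, and bounding the pass to the callbacks that existed at its start keeps a preempting path from picking up late registrations:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>

// Illustrative registry; the real change_cb_ stores richer callback state.
class ChangeCallbacks {
 public:
  using Callback = std::function<void(uint64_t /*key_fingerprint*/)>;

  // Safe even while another fiber is mid-iteration: std::list::push_back
  // never invalidates existing iterators, unlike std::vector.
  void Register(Callback cb) { cbs_.push_back(std::move(cb)); }

  // Bound the pass to the callbacks registered before it started, so a
  // callback added while we are preempted (serializing a big value) is not
  // picked up mid-pass. Unregistration mid-pass is assumed not to happen.
  void RunOnChange(uint64_t fp) {
    size_t n = cbs_.size();
    auto it = cbs_.begin();
    for (size_t i = 0; i < n; ++i, ++it) {
      (*it)(fp);  // may preempt; `it` and the list stay valid
    }
  }

 private:
  std::list<Callback> cbs_;
};
```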

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-25 16:08:02 +03:00
Kostas Kyrimis
4b851be57a
fix: remove fiber guard from non atomic section (#3381)
We might preempt when we serialize a big value, but the code in the journal was protected by an atomic guard, triggering a failed check.

* remove fiber guard from non atomic section
* move LocalBlockingCounter to common

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-25 16:06:35 +03:00
Roman Gershman
e2d65a0900
chore: reenable evictions upon insertion to avoid OOM rejections (#3387)
* chore: reenable evictions upon insertion to avoid OOM rejections

Before: when running dragonfly with --cache_mode, we could get OOM rejections
even though the eviction policy allowed evicting items to free memory.
Ideally, dragonfly in cache mode should not respond with the OOM error.

This PR reuses the same eviction step we have in Heartbeat and conditionally applies it
during the insertion. In my test the OOM errors went from 500K to 0, and the server
still respected the memory limit.

Also, remove the old heuristic that has never been used.

Test:

./dfly_bench --key_prefix=bar: -d 1024 --ratio=1:0 --qps=200 -n 3000
./dragonfly --dbfilename=  --proactor_threads=2 --maxmemory=600M --cache_mode

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-07-25 15:28:57 +03:00
Roman Gershman
03b3f86aed
chore: Track db_slice table memory instantly (#3375)
We update table_memory upon each deletion and insertion of an element.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-07-24 14:13:08 +03:00
Vladislav
f81a893368
chore(tiering): Range functions + small refactoring (#3207) 2024-07-22 18:36:11 +03:00
Kostas Kyrimis
93ae2ef0e6
chore: small rename and add dcheck on LocalBlockingCounter (#3356)
* add suffix _ to members
* add dcheck in destructor

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-22 06:40:57 +00:00
Kostas Kyrimis
8a2d6ad1f4
fix: ub in RegisterOnChange and regression tests for big values (#3336)
* fix replication test flag name for big values
* fix a bug that triggers ub when RegisterOnChange is called on flows that iterate over the callbacks and preempt
* add a stress test for big value serialization

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-19 07:03:17 +00:00
Vladislav
22756eeb81
fix(migration): Use transactions! (#3266)
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-07-16 14:06:34 +03:00
Shahar Mike
d7351b315e
refactor: Use DbContext, OpArgs and Transaction to access DbSlice (#3311)
This is a refactor that will put us closer to adding namespaces; see the
included `docs/namespaces.md`.
2024-07-12 08:13:16 +03:00
Roman Gershman
fba902d0ac
fix: properly clean tiered state upon flash (#3281)
* fix: properly clean tiered state upon flash

The bug was around pending IO entries that were not properly cleaned during flush.
This PR simplifies the logic around tiered storage handling during flush: the cleaning is always
performed in the synchronous part of the command.

In addition, this PR improves error logging in tests if the dragonfly process exits with an error.
Finally, a test is added that makes sure pending tiered items are flushed during the flush call.

Fixes #3252
---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-07-08 10:43:11 +03:00
Vladislav
4357933775
feat(server): expiry notifications (#3154)
Adds basic support for keyspace notifications; only the Ex (expired) events for now.

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-06-24 16:23:40 +03:00
Borys
d75c79ce5c
fix: fix RegisterOnChange methods for journal and db_slice (#3171)
* fix: fix RegisterOnChange methods for journal and db_slice. Call db_slice and journal callbacks atomically. A hack was added to avoid a deadlock during SAVE.
2024-06-20 12:37:37 +03:00
Vladislav
137bd313ef
fix(server): Sync FLUSH with tiering (#3098)
* fix(server): Sync FLUSH with tiering

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-05-30 16:50:12 +03:00
Vladislav
08983c181f
chore: small tiering fixes (#2966)
* chore: tiering fixes

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-05-02 18:00:46 +00:00
Vladislav
82dd05fe30
chore: Remove TieringV1 (#2962)
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-04-26 11:01:05 +03:00
Roman Gershman
89b1d7d52a
chore: Introduce ShardArgs as a distinct type (#2952)
Done in preparation for making ShardArgs a smart iterable type,
but currently it's just a wrapper around ArgSlice.
Also refactored common.{h,cc} into tx_base.{h,cc}

In addition, fixed a bug in key tracking where we wrongly created a weak_ref
in a shard thread instead of doing so in the coordinator thread.
Finally, identified another bug (not fixed yet) where we track all the arguments
instead of tracking keys only.

Besides this, no functional changes around the moved code.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-04-24 13:36:34 +03:00
Borys
2230397a12
refactor: add cluster namespace (#2948)
* refactor: add cluster namespace, remove extra includes
2024-04-22 21:45:43 +03:00
Roman Gershman
2ff7ff9841
chore: get rid of lock keys (#2894)
* chore: get rid of lock keys

1. Introduce LockTag, a type representing the part of the key that is used for locking (see the sketch below).
2. Hash keys once in each transaction.
3. Expose the swap_memory_bytes metric.
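
A rough illustration of the LockTag idea referenced in point 1; the hashtag-extraction rule shown is an assumption based on the usual {...} convention, not quoted from the PR:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Strong type: "the part of the key used for locking". Construction is the
// only way to obtain one, so raw string_view keys cannot be passed by mistake.
class LockTag {
 public:
  explicit LockTag(std::string_view key, bool lock_on_hashtags = false)
      : tag_(lock_on_hashtags ? ExtractTag(key) : key) {}

  std::string_view view() const { return tag_; }

 private:
  // Assumed rule for illustration: use the {...} hashtag if present and
  // non-empty, otherwise the whole key.
  static std::string_view ExtractTag(std::string_view key) {
    size_t open = key.find('{');
    if (open == std::string_view::npos) return key;
    size_t close = key.find('}', open + 1);
    if (close == std::string_view::npos || close == open + 1) return key;
    return key.substr(open + 1, close - open - 1);
  }

  std::string tag_;
};
```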

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-04-21 11:34:42 +03:00
Roman Gershman
8030ee96b5
chore: preparation step for lock fingerprints (#2899)
The main change here is the introduction of the strong type LockTag,
which is distinct from a string_view key.

Also, some testing improvements to improve the footprint of the next PR.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-04-16 19:23:50 +03:00
Roman Gershman
da5c51d1dd
chore: LockTable tracks fingerprints of keys (#2839)
* chore: LockTable tracks fingerprints of keys

It's a first step that will probably simplify dependencies in many places
where we currently need to keep key strings for that purpose. A second step will be to reduce the CPU load
of multi-key operations like MSET by precomputing fingerprints once.

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-04-10 17:52:53 +03:00
Shahar Mike
54c9633cb8
feat(dbslice): Add self-laundering iterator in DbSlice (#2815)
A self-laundering iterator will enable us to, eventually, yield from fibers while holding an iterator. For example:

```cpp
auto it1 = db_slice.Find(...);
Yield();  // Until now - this could have invalidated `it1`
auto it2 = db_slice.Find(...);
```

Why is this a good idea? Because it will enable yielding inside PreUpdate(), which will allow writing huge entries to disk/network in small chunks, eliminating the need to allocate huge chunks of memory just for serialization.

Also, it'll probably unlock future developments as well, as yielding can be useful in other contexts.
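
A toy model of the laundering idea, using a plain map plus a version counter (the real iterators detect table changes differently):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Toy table whose version is bumped on every mutation that may invalidate
// iterators (insert, erase, rehash, ...).
struct ToyTable {
  std::unordered_map<std::string, std::string> map;
  uint64_t version = 0;
};

class LaunderingIterator {
 public:
  LaunderingIterator(ToyTable* t, std::string key)
      : table_(t), key_(std::move(key)) {
    Refresh();
  }

  // Call after any possible preemption point (Yield, big-value write, ...).
  void LaunderIfNeeded() {
    if (seen_version_ != table_->version) Refresh();
  }

  bool valid() const { return it_ != table_->map.end(); }
  const std::string& value() const { return it_->second; }

 private:
  void Refresh() {
    it_ = table_->map.find(key_);
    seen_version_ = table_->version;
  }

  ToyTable* table_;
  std::string key_;
  std::unordered_map<std::string, std::string>::iterator it_;
  uint64_t seen_version_ = 0;
};
```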
2024-04-09 12:00:52 +03:00
adiholden
2ad7439128
feat(server): support cluster replication (#2748)
* feat(server): support cluster replication

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-03-26 15:26:19 +02:00
adiholden
32e8d49123
feat(tiering): add background offload step (#2504)
* feat(tiering): add background offload step

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-02-14 14:28:41 +02:00
Roman Gershman
d608ec9c62
chore: Introduce LockKey for LockTable (#2463)
This should reduce allocations in a common case (not multi).
In addition, rename Transaction::args_ to kv_args_.

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Co-authored-by: Vladislav <vlad@dragonflydb.io>
2024-01-28 12:19:15 +02:00
adiholden
9f4c4353b5
fix(server): mget crash on same key get (#2474)
* fix(server): mget crash on same key get

fix: #2465
The bug: in cache mode, MGET bumps up items. When executing MGET with the same key several times (i.e. MGET key key), we invalidate the iterator when we bump up the item in the dash table.
The fix: bump items up/down only once by using a bumped_items set.
This PR also reverts c225113
and updates the bumped stats and the bumped_items set if the item was bumped.

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-01-28 11:45:35 +02:00
Kostas Kyrimis
517be2005e
refactor: return OpResult in DbSlice::AddOrFind instead of throwing std::bad_alloc (#2427)
* return OpResult in AddOrFind instead of throwing bad_alloc
* small refactor
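
Roughly, the change swaps "throw std::bad_alloc" for a status-carrying return value; the OpResult below is a generic stand-in sketch, not the project's actual type:

```cpp
#include <cstdint>
#include <utility>
#include <variant>

enum class OpStatus : uint8_t { OK, OUT_OF_MEMORY, KEY_NOTFOUND };

// Minimal result-or-status wrapper in the spirit of OpResult<T>.
template <typename T>
class OpResult {
 public:
  OpResult(T value) : repr_(std::move(value)) {}  // success
  OpResult(OpStatus status) : repr_(status) {}    // failure (non-OK status)

  bool ok() const { return std::holds_alternative<T>(repr_); }
  OpStatus status() const {
    return ok() ? OpStatus::OK : std::get<OpStatus>(repr_);
  }
  T& value() { return std::get<T>(repr_); }

 private:
  std::variant<OpStatus, T> repr_;
};

// Callers branch on the status instead of catching std::bad_alloc, e.g.:
//   auto res = AddOrFind(...);
//   if (!res.ok()) return res.status();  // e.g. OUT_OF_MEMORY
```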
2024-01-23 14:16:03 +02:00
Yue Li
d1db48d9d4
feat(server): Tracking memory usage for client tracking table (#2431)
Track memory usage for the client tracking table using a C++ memory resource and a polymorphic allocator.
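
The standard-library building blocks for this look roughly as follows: a counting std::pmr::memory_resource feeding a pmr container (the server's actual table type differs):

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>
#include <unordered_map>

// memory_resource that forwards to an upstream resource and counts live bytes.
class TrackingResource : public std::pmr::memory_resource {
 public:
  size_t used() const { return used_; }

 private:
  void* do_allocate(size_t bytes, size_t align) override {
    used_ += bytes;
    return upstream_->allocate(bytes, align);
  }
  void do_deallocate(void* p, size_t bytes, size_t align) override {
    used_ -= bytes;
    upstream_->deallocate(p, bytes, align);
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }

  std::pmr::memory_resource* upstream_ = std::pmr::get_default_resource();
  size_t used_ = 0;
};

// The tracked table allocates through the counting resource, so its memory
// usage can be reported at any time via tracker.used().
inline size_t DemoTrackedTable() {
  TrackingResource tracker;
  std::pmr::unordered_map<std::pmr::string, std::pmr::string> table(&tracker);
  table.emplace("client-1", "tracked-key");
  return tracker.used();
}
```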
2024-01-17 13:20:23 -08:00
Vladislav
bf89c7eac2
chore(transaction): Clean up scheduling code (#2422)
* chore(transaction): Clean up scheduling code
2024-01-17 17:33:48 +03:00
adiholden
9f3b118b87
server(tiering): load data on read (#2415)
* server(tiering): load data on read

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-01-17 16:13:56 +02:00