Commit graph

153 commits

Author SHA1 Message Date
Kostas Kyrimis
b1063f7823
refactor client tracking, fix atomicity, squashing and multi/exec (#2970)
* add partial support for CLIENT CACHING TRUE (only to be used with TRACKING OPTIN)
* add OPTIN to CLIENT TRACKING command
* refactor client tracking to respect transactional atomicity
* fixed multi/exec and disabled squashing with client tracking
* add tests
2024-06-03 22:14:30 +03:00
Vladislav
137bd313ef
fix(server): Sync FLUSH with tiering (#3098)
* fix(server): Sync FLUSH with tiering

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-05-30 16:50:12 +03:00
Vladislav
68d1a8680c
fix(tiering): Async delete for small bins (#3068)
* fix(tiering): Async delete for small bins

---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-05-28 12:08:59 +03:00
Vladislav
9e3748421b
fix(tiering): rename v2 + max_file_size (#3004)
* fix: rename + max_file_size

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-05-03 21:36:18 +03:00
Vladislav
08983c181f
chore: small tiering fixes (#2966)
* chore: tiering fixes

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-05-02 18:00:46 +00:00
Roman Gershman
9bda5b1d4b
chore: another preparation commit to get rid of kv_args in transaction (#2996)
This changes Entry::Payload to struct instead of variant.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-05-02 09:59:45 +03:00
Vladislav
82dd05fe30
chore: Remove TieringV1 (#2962)
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-04-26 11:01:05 +03:00
adiholden
d5cd0ed204
fixes for v1.18.0 (#2956)
* fix server: change table_growth_margin default value

---------

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-04-24 18:21:14 +03:00
Roman Gershman
89b1d7d52a
chore: Introduce ShardArgs as a distinct type (#2952)
Done in preparation to make ShardArgs a smart iterable type,
but currently it's just a wrapper around ArgSlice.
Also refactored common.{h,cc} into tx_base.{h,cc}

In addition, fixed a bug in key tracking, where we wrongly created weak_ref
in a shard thread instead of doing this in the coordinator thread.
Finally, identified another bug (not fixed yet) where we track all the arguments
instead of tracking keys only.

Besides this, no functional changes around the moved code.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-04-24 13:36:34 +03:00
Borys
2230397a12
refactor: add cluster namespace (#2948)
* refactor: add cluster namespace, remove extra includes
2024-04-22 21:45:43 +03:00
Roman Gershman
2ff7ff9841
chore: get rid of lock keys (#2894)
* chore: get rid of lock keys

1. Introduce LockTag, a type representing the part of the key that is used for locking.
2. Hash keys once in each transaction.
3. Expose swap_memory_bytes metric.

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-04-21 11:34:42 +03:00
Roman Gershman
8030ee96b5
chore: preparation step for lock fingerprints (#2899)
The main change here is introduction of the strong type LockTag
that differentiates from a string_view key.

Also, some testing improvements to improve the footprint of the next PR.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-04-16 19:23:50 +03:00
Vladislav
4fe00a071e
chore(tiering): Update Get, Set, Del (#2897)
* chore(tiering): Update Get, Set and Del


---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-04-16 19:20:24 +03:00
Roman Gershman
da5c51d1dd
chore: LockTable tracks fingerprints of keys (#2839)
* chore: LockTable tracks fingerprints of keys

It's a first step that will probably simplify dependencies in many places
where we need to keep key strings for that. A second step will be to reduce the CPU load
of multi-key operations like MSET and precompute Fingerprints once.
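
A minimal sketch of the idea, not the actual LockTable code: the table is keyed by a 64-bit fingerprint instead of the key string, so it never has to own key strings. `Fingerprint`, `MakeFingerprint`, `FingerprintAll` and `FingerprintLockTable` are hypothetical names, and `std::hash` is an assumed stand-in for whatever hash the real code uses. Two keys whose fingerprints collide would share a lock, which is conservative but safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string_view>
#include <unordered_map>
#include <vector>

using Fingerprint = uint64_t;  // hypothetical alias

// Assumption: std::hash stands in for the real hash function.
inline Fingerprint MakeFingerprint(std::string_view key) {
  return static_cast<Fingerprint>(std::hash<std::string_view>{}(key));
}

// Hashes every key of a multi-key command once, up front.
inline std::vector<Fingerprint> FingerprintAll(const std::vector<std::string_view>& keys) {
  std::vector<Fingerprint> result;
  result.reserve(keys.size());
  for (std::string_view key : keys) result.push_back(MakeFingerprint(key));
  return result;
}

// Hypothetical lock table that stores only fingerprints and lock counts.
class FingerprintLockTable {
 public:
  // Returns true if the fingerprint was not locked before this call.
  bool Acquire(Fingerprint fp) { return locks_[fp]++ == 0; }

  // Drops one lock; the entry is erased when the count reaches zero.
  void Release(Fingerprint fp) {
    auto it = locks_.find(fp);
    if (it == locks_.end()) return;
    if (--it->second == 0) locks_.erase(it);
  }

  bool IsLocked(Fingerprint fp) const { return locks_.count(fp) > 0; }

 private:
  std::unordered_map<Fingerprint, uint32_t> locks_;
};
```

With this shape, a multi-key command such as MSET could call `FingerprintAll` once and reuse the result for both acquiring and releasing locks.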

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-04-10 17:52:53 +03:00
Shahar Mike
54c9633cb8
feat(dbslice): Add self-laundering iterator in DbSlice (#2815)
A self-laundering iterator will enable us to, eventually, yield from fibers while holding an iterator. For example:

```cpp
auto it1 = db_slice.Find(...);
Yield();  // Until now - this could have invalidated `it1`
auto it2 = db_slice.Find(...);
```

Why is this a good idea? Because it will enable yielding inside PreUpdate(), which will allow huge entries to be written to disk/network in small pieces, eliminating the need to allocate huge chunks of memory just for serialization.

Also, it'll probably unlock future developments as well, as yielding can be useful in other contexts.
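
One way to picture a self-laundering iterator is sketched below. This is an illustration under assumptions, not the DbSlice implementation: `VersionedTable` and `LaunderingIterator` are hypothetical names, the table is a plain `std::unordered_map`, and a version counter is the assumed mechanism for detecting that a mutation may have happened while the fiber was suspended.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical table that bumps a version counter on every mutation.
class VersionedTable {
 public:
  using Map = std::unordered_map<std::string, int>;

  void Set(const std::string& key, int value) {
    map_[key] = value;
    ++version_;  // an insert may rehash and invalidate outstanding iterators
  }

  void Erase(const std::string& key) {
    map_.erase(key);
    ++version_;
  }

  Map& map() { return map_; }
  uint64_t version() const { return version_; }

 private:
  Map map_;
  uint64_t version_ = 0;
};

// Remembers the key and re-finds it whenever the table version has changed.
class LaunderingIterator {
 public:
  LaunderingIterator(VersionedTable* table, std::string key)
      : table_(table), key_(std::move(key)) {
    Refresh();
  }

  // Returns a pointer to the value, or nullptr if the key no longer exists.
  int* Get() {
    if (seen_version_ != table_->version()) Refresh();
    return it_ == table_->map().end() ? nullptr : &it_->second;
  }

 private:
  void Refresh() {
    it_ = table_->map().find(key_);
    seen_version_ = table_->version();
  }

  VersionedTable* table_;
  std::string key_;
  VersionedTable::Map::iterator it_;
  uint64_t seen_version_ = 0;
};
```

The cost of this design is one extra lookup after each mutation; accesses with no intervening mutation reuse the cached iterator.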
2024-04-09 12:00:52 +03:00
Borys
84d451fbed
fix: #2745 don't start migration process again after the same config is applied (#2822)
* fix: #2745 don't start a migration process again after the same config is applied
refactor: remove extra includes
2024-04-03 10:21:27 +03:00
Shahar Mike
1d04683c48
fix(cluster): Don't miss updates in FLUSHSLOTS (#2783)
* fix(flushslots): Don't miss updates in `FLUSHSLOTS`

This PR registers for PreUpdate() from inside the `FLUSHSLOTS` fiber so
that any attempt to update a to-be-deleted key will work as expected
(first delete, then apply the change).

This fixes several issues:

* Any attempt to touch bucket B (like insert a key), where another key
  in B should be removed, caused us to _not_ remove the latter key
* Commands which use an existing value but do not completely override it,
  like `APPEND` and `LPUSH`, did not treat the key as removed but instead
  used the original value

Fixes #2771

* fix flushslots syntax in test

* EXPECT_EQ(key:0, xxxx)

* dbsize
2024-03-31 15:47:38 +03:00
Roman Gershman
9e23f85e6b
chore: expose SBF via compact_object (#2797)
* chore: expose SBF via compact_object
---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-03-30 22:35:22 +03:00
Kostas Kyrimis
370f334baf
chore: remove duplicate code from dash and simplify (#2765)
* rename all Policy members for consistency
* remove duplicate code
2024-03-29 11:14:58 +02:00
Vladislav
c8724adddf
chore: Fix memcached flags not updated (#2787)
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-03-29 12:10:58 +03:00
Kostas Kyrimis
4025b4a6af
fix: fiber preempts on read path and OnCbFinish() clears fetched_items_ (#2763)
* cache fetched_items_ before preemption such that OnCbFinish does not affect it
2024-03-26 16:38:47 +02:00
adiholden
2ad7439128
feat(server): support cluster replication (#2748)
* feat(server): support cluster replication

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-03-26 15:26:19 +02:00
Roman Gershman
954780edd1
Remove check-fail in ExpireIfNeeded and introduce DFLY LOAD (#2699)
* chore: prevent crashing upon inconsistent expiry table

Also, introduce "DFLY LOAD <filename>" command in addition to "DEBUG LOAD"
as an official command to load snapshots into the running server.


---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-03-12 16:54:13 +02:00
Roman Gershman
b38024ba4f
chore: add malloc-based stats and decommit (#2692)
* chore: add malloc-based stats and decommit

Provides more stats and control with glibc-malloc based allocator.
For example,
with v1.15.0 (--proactor_threads=2), empty database, `info memory` returns

```
used_memory:614576
used_memory_human:600.2KiB
used_memory_peak:614576
used_memory_peak_human:600.2KiB
used_memory_rss:19922944
used_memory_rss_human:19.00MiB
```

then during `memtier_benchmark  -n 300000  --key-maximum 100000 --ratio 0:1 --threads=30 -c 100` (i.e. GET-only with 3k connections):

```
used_memory:614576
used_memory_human:600.2KiB
used_memory_peak:614576
used_memory_peak_human:600.2KiB
used_memory_rss:59985920
used_memory_rss_human:57.21MiB
used_memory_peak_rss:59985920
```

connections overhead grows by ~39MB.
when the traffic stops, `used_memory_rss_human` becomes `30.35MiB`
and we do not know where 11MB gets lost and `MEMORY DECOMMIT` does not reduce the RSS.

With this change, `memory malloc-stats` returns during the memtier traffic
```
malloc arena: 394862592
malloc fordblks: 94192
```
i.e. 395MB of virtual memory was allocated by malloc and only 94KB is in chunks available for reuse.
395MB is arena virtual memory, and not RSS obviously, but at least we have some visibility into malloc reservations.
The RSS usage is the same ~57MB and the difference between virtual and RSS is due to the fact we reserve fiber stacks of size 131KB but we touch less.
After the traffic stops, `arena` is reduced to 134520832 bytes, and fordblks is 133016592, i.e. the majority of reserved ranges are also free (available for reuse) in the malloc pools.
RSS goes down similarly to before to ~31MB.

So far, this PR only demonstrated the increased visibility to mmapped ranges reserved by glibc malloc.
The additional functional change is in `MEMORY DECOMMIT` that now trims malloc RSS usage from reserved but unused (fordblks) pages
by calling `malloc_trim`.
After the call, RSS is: `used_memory_rss_human:20.29MiB` which is almost the same as when we started the empty process.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

* chore: fix build for older glibc environments

Disable these extensions for alpine and use legacy version
for older glibc libraries.
---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-03-06 13:11:44 +00:00
Shahar Mike
35b0ab101e
fix(flushall): Decommit memory after releasing tables. (#2691)
In the fiber we used to call `mi_heap_collect()` when we're done
deleting items. But since that fiber captures a `vector` of intrusive
pointers to `DbTable`s, it can't free all memory used by the tables
themselves.

A local test shows that this fix helps almost entirely: when occupying a
5gb DB, `FLUSHALL` will reduce RSS by 4.7gb, leaving 300mb still used. A
follow up `MEMORY DECOMMIT` *will* indeed remove these 300mb, but I'm
still not sure why they are not released immediately. Still looking...

Addresses (1) of #2690
2024-03-05 15:45:13 +02:00
adiholden
7c443f3a15
feat(server): introduce table_growth_margin flag (#2678)
* feat(server): introduce table_growth_margin flag

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-03-03 13:02:18 +00:00
Borys
8771ab32a6
refactor: create one type for slots set #2459 (#2645)
* refactor: create one type for slot ranges #2459
2024-02-23 14:10:42 +02:00
Roman Gershman
fa75360227
chore: get rid of object.c and robj* in cc code (#2610)
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-02-18 16:52:23 +02:00
adiholden
32e8d49123
feat(tiering): add background offload step (#2504)
* feat(tiering): add background offload step

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-02-14 14:28:41 +02:00
Roman Gershman
4000adf57f
fix: do not migrate during connection close (#2570)
* fix: do not migrate during connection close

Fixes #2569
Before the change we had a corner case where Dragonfly would call
OnPreMigrateThread but would not call CancelOnErrorCb because OnBreakCb has already been called
(it resets break_cb_engaged_)

On the other hand, in OnPostMigrateThread we called RegisterOnErrorCb if breaker_cb_ was set, which resulted in double registration.
This change simplifies the logic by removing break_cb_engaged_ flag since CancelOnErrorCb is safe to call if nothing is registered.
Moreover, we now skip Migrate flow if a socket is being closed.

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-02-12 16:03:34 +02:00
adiholden
503891b1fa
fix(server): update post updater iterator in tiering (#2497)
* fix(server): update post updater iterator in tiering

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-01-30 11:46:00 +00:00
Roman Gershman
d608ec9c62
chore: Introduce LockKey for LockTable (#2463)
This should reduce allocations in a common case (not multi).
In addition, rename Transaction::args_ to kv_args_.

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Co-authored-by: Vladislav <vlad@dragonflydb.io>
2024-01-28 12:19:15 +02:00
adiholden
9f4c4353b5
fix(server): mget crash on same key get (#2474)
* fix(server): mget crash on same key get

fix: #2465
the bug: in cache mode mget bumps up items. When executing mget with the same key several times (i.e. mget key key), we invalidate the iterator when we bump up the item in the dash table.
the fix: bump up/down items only once by using bumped_items set
This PR also reverts c225113
and updates the bumped stats and bumped_items set if the item was bumped

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-01-28 11:45:35 +02:00
Kostas Kyrimis
517be2005e
refactor: return OpResult in DbSlice::AddOrFind instead of throwing std::bad_alloc (#2427)
* return OpResult in AddOrFind instead of throwing bad_alloc
* small refactor
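
A rough sketch of the pattern, assuming a simplified result type: `Status`, `Result<T>` and `BoundedStore` are hypothetical stand-ins, not the real `OpResult` or `DbSlice` definitions, and an entry limit stands in for a real memory limit. The caller checks a status instead of catching `std::bad_alloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

enum class Status { kOk, kOutOfMemory };  // hypothetical status codes

// Hypothetical result wrapper: either a value or an error status.
template <typename T>
class Result {
 public:
  Result(Status status) : status_(status) {}
  Result(T value) : status_(Status::kOk), value_(std::move(value)) {}

  bool ok() const { return status_ == Status::kOk; }
  Status status() const { return status_; }
  const T& value() const { return value_; }

 private:
  Status status_;
  T value_{};
};

// Toy store with an entry limit standing in for a memory limit.
class BoundedStore {
 public:
  explicit BoundedStore(size_t max_entries) : max_entries_(max_entries) {}

  // Returns the existing or newly added slot, or kOutOfMemory when full.
  Result<int*> AddOrFind(const std::string& key) {
    auto it = map_.find(key);
    if (it != map_.end()) return Result<int*>(&it->second);
    if (map_.size() >= max_entries_) return Result<int*>(Status::kOutOfMemory);
    return Result<int*>(&map_[key]);
  }

 private:
  size_t max_entries_;
  std::unordered_map<std::string, int> map_;
};
```

Returning a status keeps the failure on the normal control path, so every caller has to decide explicitly what to do when the store is full.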
2024-01-23 14:16:03 +02:00
Shahar Mike
b66db852f9
fix: Invalid memory access (#2435)
The (subtle) bug is that the previous code uses an `initializer_list` c'tor, which copies the
`string_view` locally. Then it keeps that reference to the `string_view`,
but it goes out of scope in the next line
2024-01-17 23:23:44 +02:00
Yue Li
d1db48d9d4
feat(server): Tracking memory usage for client tracking table (#2431)
Tracking memory usage for client tracking table using C++ memory resource and polymorphic allocator.
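
The general technique can be sketched with standard C++17 facilities. `CountingResource` and `TrackingTable` are hypothetical names and the table layout is an assumption for illustration; only `std::pmr::memory_resource` and the `std::pmr` containers are standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical memory resource that counts bytes currently allocated through it.
class CountingResource : public std::pmr::memory_resource {
 public:
  explicit CountingResource(
      std::pmr::memory_resource* upstream = std::pmr::get_default_resource())
      : upstream_(upstream) {}

  size_t used() const { return used_; }

 private:
  void* do_allocate(size_t bytes, size_t alignment) override {
    void* ptr = upstream_->allocate(bytes, alignment);
    used_ += bytes;  // count only after the upstream allocation succeeded
    return ptr;
  }

  void do_deallocate(void* ptr, size_t bytes, size_t alignment) override {
    upstream_->deallocate(ptr, bytes, alignment);
    used_ -= bytes;
  }

  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }

  std::pmr::memory_resource* upstream_;
  size_t used_ = 0;
};

// Assumed layout: key -> ids of clients tracking that key.
using TrackingTable =
    std::pmr::unordered_map<std::pmr::string, std::pmr::vector<uint32_t>>;
```

A table constructed as `TrackingTable table(&resource)` routes its node, bucket, string and vector allocations through `resource`, so `resource.used()` reports the table's current footprint.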
2024-01-17 13:20:23 -08:00
adiholden
9f3b118b87
server(tiering): load data on read (#2415)
* server(tiering): load data on read

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-01-17 16:13:56 +02:00
Kostas Kyrimis
3031e7a3ee
fix: non reset fields in command config resetstat (#2425)
2024-01-17 08:21:39 +02:00
Shahar Mike
f4c1e33d48
cleanup: Remove unused PerformDeletion() overloads (#2418)
2024-01-15 11:15:35 +00:00
Yue Li
8d09478474
bug(server): log evicted keys in journal in PrimeEvictionPolicy. (#2302)
fixes #2296

added a regression test that covers both policy-based eviction and heartbeat eviction.

---------

Signed-off-by: Yue Li <61070669+theyueli@users.noreply.github.com>
2024-01-11 01:45:29 -08:00
Shahar Mike
4874da8b5b
feat(cluster): Add RestoreStreamer. (#2390)
* feat(cluster): Add `RestoreStreamer`.

`RestoreStreamer`, like `JournalStreamer`, streams journal changes to a
sink. However, in addition, it traverses the DB like `RdbSerializer` and
sends existing entries as `RESTORE` commands.

Adding it required a bit of plumbing to get all journal changes to be
slot-aware.

In a follow-up PR I will remove the now unneeded `SerializerBase`.

* Fix build

* Fix bug

* Remove unimplemented function

* Iterate DB, drop support for db1+

* Send FULL-SYNC-CUT
2024-01-10 15:10:21 +02:00
Roman Gershman
1cab6695d7
chore: improvements in dash code (#2387)
chore: cosmetic improvements in dash code

1. Better naming
2. Improve the interface of ForEachSlot command
3. Wrap the repeating code of updating the bucket version into the UpdateVersion function

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-01-08 20:21:52 +02:00
adiholden
014a86fc88
feat(lru): add generic lru class (#2351)
Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-01-07 21:51:46 +02:00
Shahar Mike
40f2a7190e
feat(getslotsinfo): Add memory usage per slot (#2355)
It's a good thing we waited with this feature until after the recent
refactors. Now it's trivial and safer!

Fixes #1478
2024-01-01 09:15:05 +02:00
Roman Gershman
1fb0a486ac
chore: transaction simplification (#2347)
chore: simplify transaction multi-locking

Also, add the analysis routine that determines whether the scheduled transaction is contended with other transactions in a
shard thread.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-12-31 17:02:12 +02:00
Roman Gershman
ce7497071c
feat: introduce 'debug tx' command and periodic overload logs (#2333)
This command shows the current state of transaction queues,
specifically how many armed (ready to run) transactions there are,
how loaded these queues are and how many locks there are in each shard.

In addition, if a tx queue becomes too long, we will output warning logs about
the state of the queue, in order to be able to identify
the bottlenecks post-factum.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-12-25 11:48:55 +02:00
Shahar Mike
a360b308c9
refactor(server): Privatize PreUpdate() and PostUpdate() (#2322)
* refactor(server): Privatize `PreUpdate()` and `PostUpdate()`

While at it:
* Make `PreUpdate()` not decrease object size
* Remove redundant leftover call to `PreUpdate()` outside `DbSlice`

* Add pytest

* Test delete leads to 0 counters

* Improve test

* fixes

* comments
2023-12-25 07:49:57 +00:00
Roman Gershman
f90317a795
feat: add keyspace_mutations metric (#2329)
* feat: add keyspace_mutations metric

Currently we expose hits/misses for read only commands only (compatible with redis).
`keyspace_mutations` complements this by providing the number of key operations for write commands.
It's interesting because now we can learn the number of key_ops vs API ops, where
key_ops = misses + hits + mutations
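
As a small worked example of the formula above, assuming hypothetical names (`KeyspaceStats` and `KeyOps` are illustrative, not the server's metric structs):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters mirroring the three metrics named above.
struct KeyspaceStats {
  uint64_t hits = 0;       // read-only commands that found the key
  uint64_t misses = 0;     // read-only commands that did not find the key
  uint64_t mutations = 0;  // key operations performed by write commands
};

// key_ops = misses + hits + mutations
inline uint64_t KeyOps(const KeyspaceStats& stats) {
  return stats.misses + stats.hits + stats.mutations;
}
```

For instance, 70 hits, 30 misses and 25 mutations give 125 key operations, which can then be compared with the number of API calls.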

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

* chore: address fixes

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-12-24 10:21:36 +02:00
Yue Li
6905389d60
feat(server): Support CLIENT TRACKING subcommand (2/2) (#2280)
fixes https://github.com/dragonflydb/dragonfly/issues/2139

This is part two that implements the logic which notifies tracking clients by sending invalidation messages:

- The client tracking state is set by CLIENT TRACKING subcommand as well
as upon client disconnection.

- Track the keys of a readonly command by maintaining mapping that maps
keys to the sets of tracking clients.

- Send invalidation messages to clients when their tracked keys are
updated.

- Make PerformDeletion a member function of DbSlice, and send 
invalidation message within the function.

- Mock the function for sending invalidation message to avoid test
crash due to lack of real listener in the testing framework.

- Add some functional tests for client tracking based on the mocked interfaces.

---------

Signed-off-by: Yue Li <61070669+theyueli@users.noreply.github.com>
2023-12-21 04:40:21 -08:00
adiholden
6398a73942
fix(bug): access invalid prime table iterator (#2300)
The bug:
When running dragonfly in cache mode we bump up items in the dash table when we find them. If we access several items in the callback that reside next to each other, we invalidate the first found iterator.

The fix:
After we bump up an entry, we insert the prime table ref into the bump set. When checking whether we can bump down an item, we check that the item is not in this set. Once we finish running the transaction callback, we clear the set.
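
The fix can be pictured with a toy model. This is a sketch under assumptions rather than the dash table code: `BumpingCache` is a hypothetical class, slots are a flat vector, "bump up" means swapping with the previous slot, and the set stores keys instead of prime table refs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy cache: bumping up swaps an item with its predecessor.
class BumpingCache {
 public:
  explicit BumpingCache(std::vector<std::string> keys) : slots_(std::move(keys)) {}

  // Returns the key's position after the lookup, or -1 if it is absent.
  int Find(const std::string& key) {
    auto it = std::find(slots_.begin(), slots_.end(), key);
    if (it == slots_.end()) return -1;
    size_t pos = static_cast<size_t>(it - slots_.begin());

    bool already_bumped = bumped_.count(key) > 0;
    // An item bumped during this callback must not be bumped down again.
    bool neighbour_pinned = pos > 0 && bumped_.count(slots_[pos - 1]) > 0;

    if (pos > 0 && !already_bumped && !neighbour_pinned) {
      std::swap(slots_[pos], slots_[pos - 1]);
      --pos;
      bumped_.insert(key);
    }
    return static_cast<int>(pos);
  }

  // Called once the transaction callback has finished.
  void OnCallbackFinished() { bumped_.clear(); }

 private:
  std::vector<std::string> slots_;
  std::unordered_set<std::string> bumped_;
};
```

Within one callback, a position returned by an earlier `Find` stays valid because no later lookup is allowed to move that item back down.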

Signed-off-by: adi_holden <adi@dragonflydb.io>
2023-12-20 13:05:29 +02:00