Commit graph

47 commits

Author SHA1 Message Date
Kostas Kyrimis
267d5ab370
chore: remove DbSlice mutex and add ConditionFlag in SliceSnapshot (#4073)
* remove DbSlice mutex
* add ConditionFlag in SliceSnapshot
* disable compression when big value serialization is on
* add metrics

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-12-05 13:24:23 +02:00
Borys
071e299971
refactor: remove redundant allocations for streamer (#4225)
* refactor: remove redundant allocations for streamer
2024-12-05 08:15:31 +00:00
Shahar Mike
779bba71f9
fix: Fix test_network_disconnect_during_migration test (#4224)
There are actually a few failures fixed in this PR, only one of which is a test bug:

* `db_slice_->Traverse()` can yield, causing `fiber_cancelled_`'s value to change
* When a migration is cancelled, it may never finish `WaitForInflightToComplete()` because it has `in_flight_bytes_` that will never reach the destination due to the cancellation
* `IterateMap()` with numeric key/values overwrote the key's buffer with the value's buffer

Fixes #4207
2024-12-02 15:55:23 +02:00
Borys
dc04b196d5
test: fix and unskip test_migration_timeout_on_sync (#4216) 2024-11-28 14:54:17 +02:00
Borys
43c83d29fa
feat: cluster migration restarts immediately if a timeout happens (#4081)
* feat: cluster migration restarts immediately if a timeout happens

* feat: add DEBUG MIGRATION PAUSE command
2024-11-25 16:02:22 +02:00
Shahar Mike
3c65651c69
feat: Huge values breakdown in cluster migration (#4144)
* feat: Huge values breakdown in cluster migration

Before this PR we used `RESTORE` commands for transferring data between
source and target nodes in cluster slots migration.

While this _works_, it has a side effect of consuming 2x memory for huge
values (e.g. if a single key's value takes 10GB, serializing it can take
20GB or even 30GB).

With this PR we break down huge keys into multiple commands (`RPUSH`,
`HSET`, etc), respecting the existing `--serialization_max_chunk_size`
flag.

Part of #4100
2024-11-25 15:58:18 +02:00
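
A minimal sketch of the chunking idea described in the commit above, under stated assumptions: `SerializeListAsRPush` and `kMaxChunkBytes` are made-up names standing in for the real serializer and the `--serialization_max_chunk_size` flag, and only the list case is shown.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Illustrative stand-in for --serialization_max_chunk_size.
constexpr std::size_t kMaxChunkBytes = 1 << 20;  // ~1MB of element data per command

// Split a huge list into several RPUSH commands instead of one RESTORE payload,
// so no single command has to hold the entire value in memory at once.
std::vector<std::string> SerializeListAsRPush(const std::string& key,
                                              const std::vector<std::string>& elements) {
  std::vector<std::string> commands;
  std::string current = "RPUSH " + key;
  std::size_t chunk_bytes = 0;
  for (const auto& e : elements) {
    current += ' ';
    current += e;
    chunk_bytes += e.size();
    if (chunk_bytes >= kMaxChunkBytes) {  // flush this chunk, start a new command
      commands.push_back(std::move(current));
      current = "RPUSH " + key;
      chunk_bytes = 0;
    }
  }
  if (chunk_bytes > 0)
    commands.push_back(std::move(current));
  return commands;
}

int main() {
  std::vector<std::string> huge_list(5000, std::string(1024, 'x'));  // ~5MB value
  auto cmds = SerializeListAsRPush("mylist", huge_list);
  std::cout << "emitted " << cmds.size() << " RPUSH commands\n";
}
```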
Borys
e4b468d953
fix: reduce memory consumption during migration (#4017)
* refactor: reduce memory consumption for RestoreStreamer
* fix: add Throttling into RestoreStreamer::WriteBucket
2024-11-03 17:03:45 +02:00
Roman Gershman
f1f8ee17dc
fix: make snapshotting process more responsive (#3759)
* fix: improve BreakStalledFlowsInShard heuristic

Before this change, we wrote in a single call whatever record chunks we pulled from the channel.
This can be problematic for 1GB chunks for example, which might take 10sec to write.

Lately we added a replication breaker on the master side that breaks the full sync after
a predefined threshold has passed. By default it was 10sec. To improve the robustness of this
breaker, we now write chunks of up to 1MB and update last_write_time_ns_ more frequently.

Also, we added more logs to make replication delays on both sides more visible.
We also added logs for breaking the replication on the master side.

Unfortunately, this did not help make BreakStalledFlowsInShard more robust because the
problem moved to the replica side, which may take 20+ seconds to parse huge values.
Therefore, I increased the threshold for breaking the replication to 30s.

Finally, instrument the GetMetrics call, as it sometimes takes more than 1 sec.

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-22 17:05:28 +03:00
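
A rough sketch of the chunked-write idea from the commit above, assuming hypothetical names (`Sink`, `WriteChunked`, `kFlushChunkBytes`) rather than the actual streamer interface:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <string_view>

constexpr std::size_t kFlushChunkBytes = 1 << 20;  // write at most ~1MB per call

struct Sink {
  std::int64_t last_write_time_ns = 0;  // what a stall detector would inspect
  std::size_t bytes_written = 0;

  void Write(std::string_view chunk) {
    // ... hand the chunk to the socket ...
    bytes_written += chunk.size();
    last_write_time_ns =
        std::chrono::steady_clock::now().time_since_epoch().count();
  }
};

// Instead of one huge write (which could take seconds for a 1GB record and make
// the flow look stalled), emit the record in small pieces, refreshing the
// last-write timestamp after every piece.
void WriteChunked(Sink& sink, std::string_view record) {
  while (!record.empty()) {
    std::size_t n = std::min(kFlushChunkBytes, record.size());
    sink.Write(record.substr(0, n));
    record.remove_prefix(n);
  }
}

int main() {
  Sink sink;
  std::string big_record(10 << 20, 'x');  // a 10MB record becomes ten ~1MB writes
  WriteChunked(sink, big_record);
  std::cout << "wrote " << sink.bytes_written << " bytes\n";
}
```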
Roman Gershman
a2e63f144c
fix: clang warnings (#3509)
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-08-14 09:39:56 +00:00
Roman Gershman
1cbfcd4912
chore: add timeout to replication sockets (#3434)
* chore: add timeout for replication sockets

The master will stop the replication flow if writes cannot progress for more than K millis.

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Signed-off-by: Roman Gershman <romange@gmail.com>
Co-authored-by: Shahar Mike <chakaz@users.noreply.github.com>
2024-08-07 16:33:03 +03:00
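
The stall-timeout logic above amounts to something like this minimal sketch; `ReplicationFlow` and its members are hypothetical names, not the Dragonfly classes:

```cpp
#include <chrono>
#include <iostream>

using Clock = std::chrono::steady_clock;

struct ReplicationFlow {
  Clock::time_point last_progress = Clock::now();
  std::chrono::milliseconds write_timeout{10'000};  // illustrative "K millis"

  // Called whenever a write to the replica socket makes progress.
  void OnWriteProgress() { last_progress = Clock::now(); }

  // Polled by a watchdog on the master: if no write progressed within the
  // timeout, the flow is considered stuck and its socket is closed.
  bool ShouldBreak() const { return Clock::now() - last_progress > write_timeout; }
};

int main() {
  ReplicationFlow flow;
  flow.OnWriteProgress();
  std::cout << (flow.ShouldBreak() ? "break flow\n" : "flow healthy\n");
}
```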
Kostas Kyrimis
aa02070e3d
chore: add db_slice lock to protect segments from preemptions (#3406)
DashTable::Traverse is error prone when the passed callback preempts, because the segment might change under it. We need atomicity while traversing segments with preemption. The fix is to add Traverse in DbSlice and protect the traversal via ThreadLocalMutex.

* add ConditionFlag to DbSlice
* add Traverse in DbSlice and protect it with the ConditionFlag
* remove condition flag from snapshot
* remove condition flag from streamer

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-30 15:02:54 +03:00
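
Conceptually the protection described above works like the sketch below, with `std::mutex` standing in for the fiber-aware ThreadLocalMutex and `SliceLike` being a made-up type, not the real DbSlice API:

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <mutex>
#include <string>
#include <utility>

class SliceLike {
 public:
  using Entry = std::pair<const std::string, std::string>;

  // Traverse holds the lock for the whole pass, so even if the callback
  // preempts (yields to another fiber), no mutation can reshuffle segments
  // underneath the traversal.
  void Traverse(const std::function<void(const Entry&)>& cb) {
    std::lock_guard<std::mutex> lk(traverse_mu_);
    for (const auto& e : table_)
      cb(e);
  }

  // Mutations take the same lock, so they wait for an in-progress traversal.
  void Set(std::string key, std::string value) {
    std::lock_guard<std::mutex> lk(traverse_mu_);
    table_[std::move(key)] = std::move(value);
  }

 private:
  std::mutex traverse_mu_;  // stand-in for ThreadLocalMutex
  std::map<std::string, std::string> table_;
};

int main() {
  SliceLike slice;
  slice.Set("k", "v");
  slice.Traverse([](const SliceLike::Entry& e) { std::cout << e.first << '\n'; });
}
```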
Roman Gershman
7b2603aa46
fix: corruption in replication stream (#3344)
Before this change, it was possible to issue several concurrent AsyncWrite requests.
But these are not atomic, which leads to replication stream corruption.
Now we wait for the previous request to finish before sending the next one.

ThrottleIfNeeded now takes the pending buffer size into account when throttling.

Fixes #3329

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-07-20 13:50:21 -04:00
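
In outline, the fix above is equivalent to the following sketch; `SerializedWriter` and the `kMaxPending` cap are illustrative, and std::mutex/std::condition_variable stand in for Dragonfly's fiber primitives:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>

class SerializedWriter {
 public:
  // Only one async write may be in flight: the next caller waits for the
  // previous request to finish, keeping the replication stream ordered.
  void AsyncWrite(std::string buf) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [&] { return !write_in_flight_; });
    write_in_flight_ = true;
    pending_bytes_ += buf.size();
    lk.unlock();
    StartSocketWrite(std::move(buf));  // completion eventually calls OnWriteDone()
  }

  // Producers call this before queueing more data; it accounts for the bytes
  // still pending in the write buffer.
  void ThrottleIfNeeded() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [&] { return pending_bytes_ < kMaxPending; });
  }

 private:
  static constexpr std::size_t kMaxPending = 8u << 20;  // illustrative 8MB cap

  void StartSocketWrite(std::string buf) {
    // In this sketch the "socket write" completes immediately.
    OnWriteDone(buf.size());
  }

  void OnWriteDone(std::size_t written) {
    std::lock_guard<std::mutex> lk(mu_);
    write_in_flight_ = false;
    pending_bytes_ -= written;
    cv_.notify_all();
  }

  std::mutex mu_;
  std::condition_variable cv_;
  bool write_in_flight_ = false;
  std::size_t pending_bytes_ = 0;
};

int main() {
  SerializedWriter w;
  w.ThrottleIfNeeded();
  w.AsyncWrite("journal entry");
}
```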
Vladislav
22756eeb81
fix(migration): Use transactions! (#3266)
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-07-16 14:06:34 +03:00
Shahar Mike
4e3bd94358
fix(cluster): Join on specified attempt id (#3305)
**The Bug**

Before this fix, source nodes would send `FIN` entries to target nodes
(in all thread flows), and would then send a `DFLYMIGRATE ACK` command
to verify that all flows received the `FIN` in time.

If they didn't, the source node would retry this logic in a loop, until
successful.

The problem is that, in some rare cases, one or more of the flows would
indeed be in a `FIN` state, _but of a previous `FIN` that is already
outdated_. If that's indeed the case, all data between that `FIN` and
the next `FIN`(s) will be lost.

**The Fix**

We already have an attempt id that we send in the `DFLYMIGRATE ACK`
command, and return it in the response. This fix utilizes the same
attempt id to be sent to all flows, and then when joined, we make sure
we join on the correct (latest) attempt id.

Unfortunately, we can't use the `FIN` opcode now, because the protocol does
not send any additional metadata for this opcode. I chose to use LSN
because it has exactly the fields that we need, and one could possibly
think of Log Sequence Number as an attempt id, but I could change that
if it's unclear or too hacky.

**Testing**

To reproduce this, one needs to lower
`--slot_migration_connection_timeout_ms` significantly, say to 500ms.
This would fail, on my laptop, every ~2 runs.

With this fix, it runs hundreds of times and never reproduces.
2024-07-14 11:20:01 +03:00
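
The join-on-latest-attempt logic described above can be sketched roughly as below; `AttemptJoiner` is a made-up name, and the real code ties this into the LSN opcode and fiber synchronization rather than std::mutex:

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

class AttemptJoiner {
 public:
  explicit AttemptJoiner(std::size_t num_flows) : flow_attempt_(num_flows, -1) {}

  // Called by a flow when it receives an end-of-stream marker carrying attempt_id.
  void OnFlowFin(std::size_t flow, std::int64_t attempt_id) {
    std::lock_guard<std::mutex> lk(mu_);
    flow_attempt_[flow] = std::max(flow_attempt_[flow], attempt_id);
    cv_.notify_all();
  }

  // Called when handling DFLYMIGRATE ACK <attempt_id>: block until every flow
  // has reached *this* attempt, ignoring stale FINs from earlier attempts.
  void Join(std::int64_t attempt_id) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [&] {
      for (std::int64_t a : flow_attempt_)
        if (a < attempt_id) return false;
      return true;
    });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<std::int64_t> flow_attempt_;
};

int main() {
  AttemptJoiner joiner(2);
  joiner.OnFlowFin(0, 3);  // stale FIN from an earlier attempt does not count
  joiner.OnFlowFin(0, 4);
  joiner.OnFlowFin(1, 4);
  joiner.Join(4);  // returns only once every flow has reached attempt 4
}
```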
Kostas Kyrimis
bf2e5fd3f5
feat: yield when serialization is in progress (#3220)
* allow preemption when we serialize buckets
* add condition variable to protect interleaved preemptions

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-11 16:55:53 +03:00
Shahar Mike
6024d79bd6
feat(cluster): Support STICK bit in slot migration (#3200) 2024-06-21 08:18:03 +03:00
Shahar Mike
f66ee5f47d
fix(cluster): Support FLUSHALL while slot migration is in progress (#3173)
* fix(cluster): Support `FLUSHALL` while slot migration is in progress

Fixes #3132

Also do a small refactor to move cancellation logic into
`RestoreStreamer`.
2024-06-20 11:40:23 +03:00
Roman Gershman
a80063189e
chore: Streamer is rewritten with async interface (#3108)
* chore: Streamer is rewritten with async interface

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-06-13 12:29:06 +03:00
Borys
7606af706f
fix: fix RestoreStreamer to prevent bucket skipping #2830 (#3119)
* fix: fix RestoreStreamer to prevent bucket skipping #2830
2024-06-04 11:50:03 +03:00
Roman Gershman
9bda5b1d4b
chore: another preparation commit to get rid of kv_args in transaction (#2996)
This changes Entry::Payload to a struct instead of a variant.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-05-02 09:59:45 +03:00
Borys
2230397a12
refactor: add cluster namespace (#2948)
* refactor: add cluster namespace, remove extra includes
2024-04-22 21:45:43 +03:00
Borys
7666aae6dc
Slot migration cancel crash fix (#2934)
fix(cluster): crash #2928
2024-04-19 14:31:42 +03:00
Shahar Mike
56965edbe1
feat(cluster): Migration cancellation support (#2869) 2024-04-17 13:19:31 +03:00
Borys
9bed3390d7
feat(cluster): add repeated ACK if an error has happened (#2892) 2024-04-12 16:20:19 +03:00
Borys
84d451fbed
fix: #2745 don't start migration process again after the same config is applied (#2822)
* fix: #2745 don't start a migration process again after the same config is applied
refactor: remove extra includes
2024-04-03 10:21:27 +03:00
adiholden
bb242a7894
bug(server): do not write lsn opcode to journal (#2814)
Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-04-02 09:51:42 +03:00
Kostas Kyrimis
b2e2ad6e04
feat(server): check master journal lsn in replica (#2778)
Send the journal LSN to the replica and compare the LSN value against the number of records received on the replica side.

Signed-off-by: kostas <kostas@dragonflydb.io>
Co-authored-by: adi_holden <adi@dragonflydb.io>
2024-04-01 17:51:31 +03:00
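
A toy version of that consistency check, with hypothetical names rather than the actual replica code:

```cpp
#include <cstdint>
#include <iostream>

struct ReplicaFlow {
  std::uint64_t records_applied = 0;

  // Called for every journal record the replica applies.
  void OnJournalRecord() { ++records_applied; }

  // Called when the master reports its journal LSN for this flow: the counts
  // must agree, otherwise records were lost or duplicated.
  void OnMasterLsn(std::uint64_t master_lsn) {
    if (master_lsn != records_applied)
      std::cerr << "lsn mismatch: master=" << master_lsn
                << " applied=" << records_applied << '\n';
  }
};

int main() {
  ReplicaFlow flow;
  flow.OnJournalRecord();
  flow.OnMasterLsn(1);  // counts agree, nothing reported
}
```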
Borys
3ec43afd30
DFLYMIGRATE ACK refactoring (#2790)
* refactor: #2743 send dflymigrate flow from source
* refactor: DFLYMIGRATE ACK is sent from source node #2744
2024-04-01 12:29:17 +03:00
Borys
dfedaf7e6e
refactor: remove FULL-SYNC-CUT cmd #2687 (#2688)
* refactor: remove FULL-SYNC-CUT cmd #2687
2024-03-06 14:26:35 +02:00
Borys
8771ab32a6
refactor: create one type for slots set #2459 (#2645)
* refactor: create one type for slot ranges #2459
2024-02-23 14:10:42 +02:00
Shahar Mike
9baf7c2645
fix(streamer): Do not yield from the Traverse callback. (#2638)
* fix(streamer): Do not yield from the Traverse callback.

Yielding inside the callback can move entries within the bucket, which
is unsupported.

* fix
2024-02-21 20:26:16 +02:00
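
The safe pattern is roughly the sketch below (the `StreamBucketSafely` helper and `FakeTable` are hypothetical): copy what the callback sees, and yield only afterwards.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct FakeTable {
  std::map<std::string, std::string> data;
  // Traverse must not be interrupted: the callback may not yield/preempt.
  void Traverse(const std::function<void(const std::string&, const std::string&)>& cb) {
    for (const auto& [k, v] : data)
      cb(k, v);
  }
};

// Copy bucket entries inside the non-preemptible callback, then do the
// blocking/yielding work (e.g. socket writes) outside of it.
void StreamBucketSafely(FakeTable& table) {
  std::vector<std::pair<std::string, std::string>> copied;
  table.Traverse([&](const std::string& k, const std::string& v) {
    copied.emplace_back(k, v);  // cheap copy only; no yielding here
  });
  for (const auto& [k, v] : copied)
    std::cout << "write: " << k << " -> " << v << '\n';  // yielding is safe now
}

int main() {
  FakeTable t;
  t.data = {{"k1", "v1"}, {"k2", "v2"}};
  StreamBucketSafely(t);
}
```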
Shahar Mike
6d11f86091
test(cluster-migration): Fix some bugs and add cluster migration fuzzy tests (#2572)
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Co-authored-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-02-12 13:47:34 +02:00
Borys
5189dae118
feat(cluster): add migration finalization (#2507)
* feat(cluster): add migration finalization
2024-02-01 17:24:54 +02:00
Borys
1dee082f86
fix(cluster): fix incorrect version checking and resource double free (#2499)
* fix(cluster): fix incorrect version checking and resource double free
2024-01-29 14:14:21 +02:00
Borys
43808da27f
fix(cluster): fix slot filtration to RestoreStreamer (#2477)
* fix(cluster): fix slot filtration to RestoreStreamer

* test: add cluster data migration test
2024-01-28 12:29:54 +02:00
Borys
a16b940a65
feat(cluster): add tx execution in cluster_shard_migration (#2385)
* feat(cluster): add tx execution in cluster_shard_migration
refactor(replication): move code that is common for cluster and
replica into a separate file, add full-sync-cut cmd
2024-01-22 21:19:39 +02:00
Shahar Mike
7debe3c685
fix(RestoreStreamer): Fix a few glitches (#2452) 2024-01-22 10:37:16 +02:00
Shahar Mike
409d22b1e6
feat(cluster): Add params to slot migration full sync cut (#2403) 2024-01-11 10:56:09 +00:00
Shahar Mike
4874da8b5b
feat(cluster): Add RestoreStreamer. (#2390)
* feat(cluster): Add `RestoreStreamer`.

`RestoreStreamer`, like `JournalStreamer`, streams journal changes to a
sink. However, in addition, it traverses the DB like `RdbSerializer` and
sends existing entries as `RESTORE` commands.

Adding it required a bit of plumbing to get all journal changes to be
slot-aware.

In a follow-up PR I will remove the now unneeded `SerializerBase`.

* Fix build

* Fix bug

* Remove unimplemented function

* Iterate DB, drop support for db1+

* Send FULL-SYNC-CUT
2024-01-10 15:10:21 +02:00
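
In spirit, the streamer described above combines two data sources, as in this toy sketch; `MiniRestoreStreamer` is an illustrative name, and the real class plugs into the journal and the dash table rather than a std::map:

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>

class MiniRestoreStreamer {
 public:
  using Sink = std::function<void(const std::string&)>;

  explicit MiniRestoreStreamer(Sink sink) : sink_(std::move(sink)) {}

  // Phase 1: traverse existing entries and emit them as RESTORE commands.
  void StreamSnapshot(const std::map<std::string, std::string>& table) {
    for (const auto& [key, serialized_value] : table)
      sink_("RESTORE " + key + " 0 " + serialized_value);
  }

  // Phase 2: forward live journal changes (already formatted commands) to the same sink.
  void OnJournalChange(const std::string& command) { sink_(command); }

 private:
  Sink sink_;
};

int main() {
  MiniRestoreStreamer streamer([](const std::string& cmd) { std::cout << cmd << '\n'; });
  streamer.StreamSnapshot({{"key1", "<serialized payload>"}});
  streamer.OnJournalChange("SET key2 value2");
}
```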
Roy Jacobson
db21b735f6
feat(replication): Use a ring buffer with messages to serve replication. (#1835)
* feat(replication): Use a ring buffer with messages to serve replication.

* Fix libraries dep graph

* Address PR feedback

* nits

* add a comment

* Lower the default log length
2023-09-18 13:59:41 +03:00
Roy Jacobson
4001a94b22
chore: Add names to fibers that were missing them (#1667) 2023-08-08 13:01:50 +02:00
Roy Jacobson
cbb2afc792
feat: Use journal LSNs for absolute replication offsets (#1242)
* feat: Use journal LSNs for absolute replication offsets

* 1 - Address small CR comments
2 - Simplify the offset accounting so that we send the correct offset
    in `SliceSnapshot::Stop` instead of counting in RdbLoader. This
    allows us to revert the changes to slice journaling of EXEC
    commands, for example.

* Store int with absl::little_endian

* Document the offset management
2023-05-22 15:34:32 +03:00
Roman Gershman
0cbd5f0348
chore: remove fiber related names from the codebase (#1018)
Remove mentions of Boost.Fibers and fibers_ext.
Done in preparation to switch to helio-native fb2 implementation.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-03-31 13:33:20 +03:00
Roman Gershman
c271e13176
chore: import fiber related primitives under dfly namespace (#1012)
This change removes most mentions of boost::fibers or util::fibers_ext.
Instead it introduces "core/fibers.h" file that incorporates most of
the primitives under dfly namespace. This is done in preparation to
switching from Boost.Fibers to helio native fibers.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-03-30 13:26:59 +03:00
adiholden
0312b66244
bug(replication): fix deadlock in cancel replication flow (#1007)
Signed-off-by: adi_holden <adi@dragonflydb.io>
2023-03-29 12:11:56 +03:00
adiholden
50f50c8380
feat(server): write journal record with optional await based on flag… (#791)
* feat(server): write journal record with optional await based on flag issue #788

Signed-off-by: adi_holden <adi@dragonflydb.io>
2023-02-15 09:34:24 +02:00
adiholden
9c9ae84493
feat(server): add dfly replica offset command (#780)
* feat(server): add dfly replica offset command

Signed-off-by: adi_holden <adi@dragonflydb.io>
2023-02-13 11:11:33 +02:00