Commit graph

84 commits

Author SHA1 Message Date
Roman Gershman
95cd9dfb4c
chore: update helio and improve our stack overflow resiliency (#4349)
1. Run CI/Regression tests with HELIO_STACK_CHECK=4096.
   This will crash if a fiber stack usage goes below this limit.
2. Increase shard queue stack size to 64KB
3. Increase fiber stack size to 40KB on Debug builds.
4. Updated helio has some changes around the TLS socket code.
   In addition, we add a helper script that generates self-signed certificates for local development work.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-12-23 08:13:45 +00:00
Stepan Bagritsevich
612d50df3b
refactor(rdb_saver): Add SnapshotDataConsumer to SliceSnapshot (#4287)
* refactor(rdb_saver): Add SnapshotDataConsumer to SliceSnapshot

fixes #4218

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* refactor: address comments

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

---------

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
2024-12-23 08:42:13 +04:00
adiholden
e462fc0401
fix(server): use compression for non big values (#4331)
* fix server: use compression for non big values
---------

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-12-18 22:03:45 +02:00
adiholden
330d007d56
feat server: support config set serialization_max_chunk_size (#4274)
Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-12-09 09:12:18 +02:00
Kostas Kyrimis
267d5ab370
chore: remove DbSlice mutex and add ConditionFlag in SliceSnapshot (#4073)
* remove DbSlice mutex
* add ConditionFlag in SliceSnapshot
* disable compression when big value serialization is on
* add metrics

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-12-05 13:24:23 +02:00
adiholden
90b4fea0d9
bug(replication): snapshot cleanup fix in transition to stable sync (#4211)
Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-11-28 08:38:36 +02:00
adiholden
3ab244616a
feat(server) : snapshot traverse physical buckets (#4084)
Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-11-11 14:49:20 +02:00
adiholden
ae3faf59fb
feat(server): dont use channel for replication / save df (#4041)
* feat server: dont use channel for replication / save df

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-11-05 16:50:01 +02:00
adiholden
e71f679386
fix(server): fix replication master deadlock on cancelation flow (#3686)
* fix server: fix replication deadlock on cancelation

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-09-10 14:13:38 +03:00
Roman Gershman
257749263b
chore: adjust RdbChannel sizes (#3676)
Fixes #3658

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-09 22:21:42 +03:00
Roman Gershman
b7b96424e4
deprecate RecordsPopper and serialize channel records during push (#3667)
chore: deprecate RecordsPopper and serialize channel records during push

Records channel is redundant for DFS/replication because we have a single producer/consumer
scenario with both running on the same thread. Unfortunately, we need it for RDB snapshotting.

For non-rdb cases we could just pass an io sink to the snapshot producer,
so that it would use it directly instead of StringFile inside FlushChannelRecord.

This would reduce memory usage, eliminate yet another memory copy and generally would make everything simpler.
For that to work, we must serialize the order of FlushChannelRecord, and that's implemented by
this PR. Also fixes #3658.
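
One way to picture "serializing the order" of flushed records is a small reorder buffer: each record carries a sequence id, and a record is emitted only after all of its predecessors. This is a sketch under assumptions, not the code from the PR, which may instead make producers wait for their turn. `OrderedRecordSink` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sink that emits records strictly in sequence-id order,
// regardless of the order in which producers hand them over.
class OrderedRecordSink {
 public:
  // Accepts record `id` (ids start at 0 and are unique). Records that arrive
  // early are parked until every predecessor has been emitted.
  void Push(uint64_t id, std::string record) {
    assert(id >= next_id_ && "record id was already emitted");
    pending_.emplace(id, std::move(record));

    // Drain the longest run of consecutive ids starting at next_id_.
    auto it = pending_.begin();
    while (it != pending_.end() && it->first == next_id_) {
      emitted_.push_back(std::move(it->second));
      it = pending_.erase(it);
      ++next_id_;
    }
  }

  const std::vector<std::string>& emitted() const { return emitted_; }
  std::size_t pending() const { return pending_.size(); }

 private:
  uint64_t next_id_ = 0;                      // next id allowed to be emitted
  std::map<uint64_t, std::string> pending_;   // early arrivals, sorted by id
  std::vector<std::string> emitted_;          // stands in for the real io sink
};
```

Pushing ids 1, 2, 0 in that order leaves the first two parked and emits all three once id 0 arrives.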

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-09 06:19:04 +00:00
Roman Gershman
264835e9c4
chore: cosmetic changes around Snapshot functions (#3652)
* chore: cosmetic changes around Snapshot functions

Some renames and added comments. Refactored StartIncremental into a separate function
without any functional changes.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

* chore: fix comments

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-09-08 09:25:41 +03:00
Borys
8266c8d026
fix: MC flags size and serialization #3134 (#3538)
2024-08-21 18:31:03 +03:00
Kostas Kyrimis
d3a893f2a6
fix: disable ThreadLocalMutex when big value ser is off (#3521)
* fix: disable ThreadLocalMutex when big value ser is off

* refactor: address comments

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-28-29.ec2.internal>
Co-authored-by: Borys <borys@dragonflydb.io>
2024-08-15 22:19:01 +03:00
Kostas Kyrimis
1c9e9c5922
fix: big value serialization corner cases (#3430)
There are some problematic flows. First, we did not handle deletions, so all sorts of consistency issues could arise while calling DbSlice::Traverse() and DbSlice::Del(). Second, we did not handle FlushAll (same as before: Traverse() preempts and FlushAll() kicks in). Third, we did not handle expirations.

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-08-11 14:17:32 +03:00
Kostas Kyrimis
3f08a60148
chore: reset serialization_max_chunk_size to 0 (#3432)
* reset serialization_max_chunk_size to 0
* reword flag information

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-08-05 09:36:23 +03:00
Kostas Kyrimis
aa02070e3d
chore: add db_slice lock to protect segments from preemptions (#3406)
DashTable::Traverse is error-prone when the callback passed preempts because the segment might change. This is problematic and we need atomicity while traversing segments with preemption. The fix is to add Traverse in DbSlice and protect the traversal via ThreadLocalMutex.

* add ConditionFlag to DbSlice
* add Traverse in DbSlice and protect it with the ConditionFlag
* remove condition flag from snapshot
* remove condition flag from streamer

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-30 15:02:54 +03:00
Vladislav
1a8c12225b
chore(tiering): Move cool entry warmup to DbSlice (#3397)
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-07-28 17:30:41 +03:00
Kostas Kyrimis
929222a7df
chore: add mem test for big values and default the flag (#3369)
* default serialization_max_chunk_size to 10 mb
* add test for big values
* small rename of enum to conform style guide

---------

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-24 16:07:27 +03:00
Kostas Kyrimis
bcdfccc039
fix: protect OnJournalEntry with ConditionGuard (#3367)
* add ConditionGuard on JournalEntry such that the stream state stays consistent

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-23 09:05:34 +00:00
Kostas Kyrimis
cd0e03a737
chore: disable compression on big values (#3358)
* disable compression when we chunk big values

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-23 08:57:21 +00:00
Kostas Kyrimis
bfa5df5d6c
feat: add an option to flush serialized entries on threshold limit (#3241)
* serialize big slots in chunks
* allow preemption on large slots
* disable big entries serialization for RDB files
* add test

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-18 10:15:41 +00:00
Kostas Kyrimis
bf2e5fd3f5
feat: yield when serialization is in progress (#3220)
* allow preemption when we serialize buckets
* add condition variable to protect interleaved preemptions

Signed-off-by: kostas <kostas@dragonflydb.io>
2024-07-11 16:55:53 +03:00
adiholden
ccada875e0
feat(server): master stop sending exec opcode to replica (#3289)
* feat server: master stop sending exec opcode to replica

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-07-09 14:48:31 +03:00
Vladislav
6a873b4f1c
feat(tiering): Simple snapshotting (#3073)
* feat(tiering): Simple snapshotting

---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-06-04 17:15:21 +03:00
adiholden
6e33261402
fix(server): fix compatibility with rdb snapshot (#3121)
* fix server: fix compatibility with rdb snapshot


Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-06-04 06:28:18 +00:00
adiholden
b2213b05d1
fix(replication): fullsync phase write to sync on noop (#3084)
* fix replication: fullsync phase write to sync on noop

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-05-27 17:52:07 +03:00
Shahar Mike
54c9633cb8
feat(dbslice): Add self-laundering iterator in DbSlice (#2815)
A self-laundering iterator will enable us to, eventually, yield from fibers while holding an iterator. For example:

```cpp
auto it1 = db_slice.Find(...);
Yield();  // Until now - this could have invalidated `it1`
auto it2 = db_slice.Find(...);
```

Why is this a good idea? Because it will enable yielding inside PreUpdate(), which will allow writing huge entries to disk/network in small chunks, eliminating the need to allocate huge chunks of memory just for serialization.

Also, it'll probably unlock future developments as well, as yielding can be useful in other contexts.
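
The idea can be sketched with a version counter: the iterator remembers the key and the table version it was created at, and looks itself up again if the table has changed since. This is an illustration only. `Table`, `LaunderingIt`, and the versioning scheme are assumptions and not the DbSlice implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical table that bumps a version counter on every mutation.
class Table {
 public:
  using Map = std::unordered_map<std::string, std::string>;

  // Iterator that re-resolves itself by key when the table has changed.
  class LaunderingIt {
   public:
    LaunderingIt(Table* owner, std::string key)
        : owner_(owner), key_(std::move(key)) {
      assert(owner_ != nullptr);
      Refresh();
    }

    // Returns the current value, or nullptr if the key no longer exists.
    std::string* Get() {
      if (version_ != owner_->version_) Refresh();  // "launder" a stale iterator
      return it_ == owner_->map_.end() ? nullptr : &it_->second;
    }

   private:
    void Refresh() {
      it_ = owner_->map_.find(key_);
      version_ = owner_->version_;
    }

    Table* owner_;
    std::string key_;
    Map::iterator it_;
    uint64_t version_ = 0;
  };

  void Set(const std::string& key, std::string value) {
    map_[key] = std::move(value);
    ++version_;  // any mutation may invalidate outstanding iterators
  }

  void Erase(const std::string& key) {
    map_.erase(key);
    ++version_;
  }

  LaunderingIt Find(const std::string& key) { return LaunderingIt(this, key); }

 private:
  Map map_;
  uint64_t version_ = 0;
};
```

In this sketch an iterator obtained before a burst of inserts (which may rehash the map) or an erase still answers correctly, at the cost of one extra lookup after each change.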
2024-04-09 12:00:52 +03:00
adiholden
bb242a7894
bug(server): do not write lsn opcode to journal (#2814)
Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-04-02 09:51:42 +03:00
Kostas Kyrimis
b2e2ad6e04
feat(server): check master journal lsn in replica (#2778)
Send the journal lsn to the replica and compare the lsn value against the number of records received on the replica side.
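
A minimal sketch of such a check, assuming the lsn counts journal records from a known starting value and never decreases. `ReplicaJournalTracker` and its methods are hypothetical names, not the ones used in the server.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical replica-side counter used to validate the master's journal lsn.
class ReplicaJournalTracker {
 public:
  // `start_lsn` is the lsn at which the replica began receiving records.
  explicit ReplicaJournalTracker(uint64_t start_lsn = 0)
      : expected_lsn_(start_lsn), last_master_lsn_(start_lsn) {}

  // Called once for every journal record applied on the replica.
  void OnRecord() { ++expected_lsn_; }

  // Called when the master reports its lsn. Returns true when the replica has
  // received exactly as many records as the master claims to have sent.
  bool CheckMasterLsn(uint64_t master_lsn) {
    assert(master_lsn >= last_master_lsn_ && "lsn is assumed to be monotonic");
    last_master_lsn_ = master_lsn;
    return master_lsn == expected_lsn_;
  }

 private:
  uint64_t expected_lsn_;     // start lsn plus records received so far
  uint64_t last_master_lsn_;  // last lsn reported by the master
};
```

A mismatch would indicate that a record was lost or duplicated somewhere between master and replica.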

Signed-off-by: kostas <kostas@dragonflydb.io>
Co-authored-by: adi_holden <adi@dragonflydb.io>
2024-04-01 17:51:31 +03:00
Shahar Mike
66b87e16c2
feat(server): Account for serializer's temporary buffer size (#2689)
* feat(server): Account for serializer's temporary buffer size

* gh comments
2024-03-06 13:39:32 +02:00
Kostas Kyrimis
d54f2201ae
feat: add current_fork_perc in info all command (#2640)
* add field current_snapshot_perc (instead of current_fork_perc)
* add field current_save_keys_processed
* add field current_save_keys_total
2024-02-26 11:17:31 +02:00
adiholden
15b3fb13b6
fix(server): saving is not a server state (#2613)
* fix(server): saving is not a server state

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-02-19 15:20:48 +00:00
Roman Gershman
fa75360227
chore: get rid of object.c and robj* in cc code (#2610)
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-02-18 16:52:23 +02:00
Shahar Mike
bc9b214ae4
fix(server): Do not yield in journal if not allowed (#2540)
* fix(server): Do not yield in journal if not allowed

* Add pytest

* Compare keys

* check_all_replicas_finished
2024-02-06 12:35:00 +02:00
Shahar Mike
d45ded3b76
fix(server): Fix crash when using MEMORY STATS commands (#2240)
* fix(server): Fix crash when using MEMORY STATS commands

* Fix
2023-12-01 21:39:32 +02:00
Roman Gershman
0c5bb7b894
fix: regression test failures (#2226)
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-11-27 21:41:52 +02:00
Shahar Mike
d6292ba6fd
feat(server): Better connection memory tracking (#2205)
2023-11-26 14:51:52 +02:00
Shahar Mike
5ca2be1185
feat: Memory stats (#2162)
2023-11-13 13:58:29 +02:00
Vladislav
e84d9a65d8
fix(server): Add additional metrics (#1975)
* fix(server): Clean up metrics collection
* feat(server): Replication memory metrics
* fix(server): Limit dispatch queue size

---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2023-10-06 14:16:22 +03:00
Roy Jacobson
d50b492e1f
feat(replication): First iteration on partial sync. (#1836)
First iteration on partial sync.
2023-09-26 10:35:50 +03:00
Roy Jacobson
db21b735f6
feat(replication): Use a ring buffer with messages to serve replication. (#1835)
* feat(replication): Use a ring buffer with messages to serve replication.

* Fix libraries dep graph

* Address PR feedback

* nits

* add a comment

* Lower the default log length
2023-09-18 13:59:41 +03:00
Roy Jacobson
4001a94b22
chore: Add names to fibers that were missing them (#1667)
2023-08-08 13:01:50 +02:00
Roy Jacobson
cbb2afc792
feat: Use journal LSNs for absolute replication offsets (#1242)
* feat: Use journal LSNs for absolute replication offsets

* 1 - Address small CR comments
2 - Simplify the offset accounting so that we send the correct offset
    in `SliceSnapshot::Stop` instead of counting in RdbLoader. This
    allows us to revert the changes to slice journaling of EXEC
    commands, for example.

* Store int with absl::little_endian

* Document the offset management
2023-05-22 15:34:32 +03:00
Roy Jacobson
6632261a2d
fix(server): Prevent preemption inside SerializeBucket (#1111)
* fix(server): Prevent preemption inside SerializeBucket

* Modifications after speaking to Adi
2023-04-20 10:27:47 +03:00
Roy Jacobson
d9f45f369a
fix(server): Allow interrupting REPLICAOF commands during full sync (#1058)
* fix(server): Allow interrupting REPLICAOF commands during full sync
* fix(server): Fix a race condition and UB in a debug assert
2023-04-18 11:44:44 +03:00
Roman Gershman
0cbd5f0348
chore: remove fiber related names from the codebase (#1018)
Remove Boost.Fibers mentions and remove fibers_ext mentions.
Done in preparation to switch to helio-native fb2 implementation.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-03-31 13:33:20 +03:00
Roman Gershman
c271e13176
chore: import fiber related primitives under dfly namespace (#1012)
This change removes most mentions of boost::fibers or util::fibers_ext.
Instead it introduces "core/fibers.h" file that incorporates most of
the primitives under dfly namespace. This is done in preparation to
switching from Boost.Fibers to helio native fibers.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-03-30 13:26:59 +03:00
Roman Gershman
12abe0bc12
chore: update helio dependency (#984)
Also remove direct references for boost fibers from the code.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-03-24 18:04:05 +03:00
Roman Gershman
b7abe269f1
fix: send a single RDB_OPCODE_FULLSYNC_END from a snapshot (#920)
Fixes #917 by appending a blob of 8 bytes during serialization and consuming
it during the parsing phase.
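
The append-and-consume pairing might look roughly like the sketch below. The opcode value, the zero-filled trailer contents, and both function names are assumptions made for illustration and are not taken from the sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Assumed values for illustration only.
constexpr uint8_t kOpcodeFullSyncEnd = 0xC8;
constexpr std::size_t kTrailerLen = 8;

// Serializer side: append the end-of-full-sync opcode followed by an
// 8-byte trailer (zero-filled in this sketch).
inline void AppendFullSyncEnd(std::string& out) {
  out.push_back(static_cast<char>(kOpcodeFullSyncEnd));
  out.append(kTrailerLen, '\0');
}

// Parser side: if `in` holds the opcode at `pos` followed by a complete
// trailer, return the position just past the trailer; otherwise nullopt.
inline std::optional<std::size_t> ConsumeFullSyncEnd(std::string_view in,
                                                     std::size_t pos) {
  assert(pos <= in.size());
  if (pos == in.size() ||
      static_cast<uint8_t>(in[pos]) != kOpcodeFullSyncEnd) {
    return std::nullopt;  // not at a full-sync-end marker
  }
  if (in.size() - pos - 1 < kTrailerLen) {
    return std::nullopt;  // trailer is truncated
  }
  return pos + 1 + kTrailerLen;
}
```

Because the parser always skips the same number of bytes the serializer appended, the two sides stay aligned on where the stream continues.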

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-03-08 13:25:12 +02:00