Commit graph

206 commits

Author SHA1 Message Date
adiholden
b1e688b33f
bug(server): set connection flags block/pause flag on all blocking commands (#2816)
* bug(server): set connection blocking and pause flags on all blocking commands

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-04-09 09:49:33 +03:00
Vladislav
fbc55bb82d
feat(transaction): Idempotent callbacks (immediate runs) (#2453)
This commit generalizes the mechanism of running transaction callbacks during scheduling, removing the need for specialized ScheduleUniqueShard/RunQuickie. Instead, transactions can now be run during ScheduleInShard - called "immediate" runs - if the transaction is concluding and either only a single shard is active or the operation can be safely repeated if scheduling failed (idempotent commands, like MGET).

Updates transaction stats to mirror the new changes more closely.
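
The condition for an immediate run described above can be sketched as a small predicate. This is only an illustration: `TxInfo` and `CanRunImmediately` are hypothetical names that do not come from the commit, and the real scheduling code tracks far more state.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified view of the facts the scheduler would consult.
struct TxInfo {
  bool concluding;            // this hop is the last one of the transaction
  std::size_t active_shards;  // number of shards the transaction touches
  bool idempotent;            // safe to repeat if scheduling fails, e.g. MGET
};

// A callback may run during scheduling only when the transaction is
// concluding and either one shard is active or the command is idempotent.
bool CanRunImmediately(const TxInfo& tx) {
  if (!tx.concluding) return false;
  return tx.active_shards == 1 || tx.idempotent;
}
```

Under this sketch, a concluding multi-shard MGET qualifies, while a concluding multi-shard write does not.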

---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-04-03 23:06:57 +03:00
Kostas Kyrimis
b2e2ad6e04
feat(server): check master journal lsn in replica (#2778)
Send the journal LSN to the replica and compare it against the number of records received on the replica side

Signed-off-by: kostas <kostas@dragonflydb.io>
Co-authored-by: adi_holden <adi@dragonflydb.io>
2024-04-01 17:51:31 +03:00
Vladislav
3aa4a29834
chore(transaction): Introduce RunCallback (#2760)
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-03-22 16:02:02 +03:00
Vladislav
9c6e6a96b7
fix(transaction): Replace with armed sync point (#2708)
1. Replaces run_barrier as a synchronization point with is_armed + an embedded blocking counter for awaiting running jobs
2. Replaces IsArmedInShard + GetLocalMask + is_armed.exchange chain with a single DisarmInShard() / DisarmInShardWhen
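
A minimal sketch of what a single disarm step could look like, assuming an atomic `is_armed` flag and a per-shard `local_mask`. The struct `ShardSlot`, the `Arm` helper and the exact signatures are assumptions made for illustration, and the embedded blocking counter from point 1 is left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-shard slot; not the actual layout used in the code base.
struct ShardSlot {
  std::atomic<bool> is_armed{false};
  std::uint16_t local_mask = 0;

  void Arm() { is_armed.store(true, std::memory_order_release); }

  // Atomically clears the flag. Returns true for exactly one caller per
  // arming, so check-then-clear can no longer race.
  bool DisarmInShard() {
    return is_armed.exchange(false, std::memory_order_acq_rel);
  }

  // Disarms only if at least one of the requested flags is set locally.
  bool DisarmInShardWhen(std::uint16_t relevant_flags) {
    if ((local_mask & relevant_flags) == 0) return false;
    return DisarmInShard();
  }
};
```

Folding the check and the exchange into one call is what replaces the three-step chain: a caller either wins the disarm or learns that there is nothing to run.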

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-03-14 14:40:32 +00:00
Vladislav
292c5bcd71
chore: little transaction cleanup (#2608)
Make renabled_autojournal a regular bool, simplify CancelShardCb logic

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-03-08 09:50:09 +03:00
adiholden
7e4527098b
fix(server): client pause work while blocking commands run (#2584)
fix #2576
fix #2661

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-02-28 11:07:03 +00:00
Vladislav
1b51e82e55
chore(transaction): Add debug stats for fail printing (#2600)
* chore(transaction): Add debug stats for per shard data

---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-02-18 15:36:14 +03:00
Vladislav
4d4fed6fec
chore(transaction): Untie scheduling from multi status (#2590)
* chore(transaction): Untie scheduling from multi status

Idea: We decide whether we have to schedule not based on our multi status (atomic multi), but solely on whether COORD_SCHED is set

Goal: Being able to use ScheduleSingleHop()/Schedule() for multi transactions, and thus later allow single hop multi transactions
---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-02-18 12:25:57 +03:00
Vladislav
bbbcddfdd6
chore(transaction): Copy poll flags (#2596)
* chore(transaction): Copy poll flags

Copying poll flags prevents concurrent data access to PerShardData::local_mask when dispatching poll tasks
---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-02-15 11:55:02 +03:00
Vladislav
963023f07c
chore(transaction): Simplify armed state (#2508)
* chore(transaction): Simplify armed state

Remove atomic is_armed variable and turn it into a regular local state flag. This is now possible because we have clearly defined phases with the phased barrier and baton barrier for blocking commands

---------

Signed-off-by: Vladislav <vlad@dragonflydb.io>
2024-02-11 12:06:36 +03:00
Shahar Mike
9912df09ae
fix(server): Init tx time for all multi/lua transactions (#2562)
* fix(server): Return correct `TIME` under unscheduled tx

Fixes #2555

* Init tx time in all multi / lua cases

* init ctor
2024-02-08 14:47:07 +02:00
Vladislav
e0f86697f9
fix: fix script replication (#2531)
* fix: fix script replication

Single key script replication was previously broken because the EXEC entry wasn't sent. Send it manually

---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-02-04 20:28:44 +03:00
Vladislav
40d08a3c67
fix(transaction): Add special barrier for blocking tx (#2512)
Refactor blocking transaction code. Introduce BatonBarrier for managing atomic and exclusive wakeup notifications that conflict with neither expiration nor cancellation
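
The exclusivity idea can be illustrated with a single atomic state that only one of the competing events may claim. This is a simplified stand-in and not the actual BatonBarrier: the `Baton` class, the `Outcome` enum and `TryFinish` are hypothetical, and the part that blocks the waiting fiber is omitted.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical outcomes that compete for a blocked transaction.
enum class Outcome { kPending, kNotified, kExpired, kCancelled };

class Baton {
 public:
  // Only the first caller moves the state away from kPending and gets
  // true; every later caller loses the race and gets false.
  bool TryFinish(Outcome outcome) {
    Outcome expected = Outcome::kPending;
    return state_.compare_exchange_strong(expected, outcome);
  }

  Outcome state() const { return state_.load(); }

 private:
  std::atomic<Outcome> state_{Outcome::kPending};
};
```

With this shape, a wakeup that arrives after a timeout has already fired is simply rejected instead of being acted on twice.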

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-02-01 23:37:36 +03:00
Roman Gershman
b5b5093165
fix: fix BLOCKING/REVERSE_MAPPING flags for some commands (#2516)
* fix: BLOCKING/REVERSE_MAPPING flags for some commands

Also, simplify interfaces around REVERSE_MAPPING in the internal tx code.
---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-02-01 15:39:14 +00:00
Borys
5189dae118
feat(cluster): add migration finalization (#2507)
* feat(cluster): add migration finalization
2024-02-01 17:24:54 +02:00
Roman Gershman
adeac6bd27
Pr1 (#2517)
* fix: Remove a stale reference to blocking watch queue

1. Remove the duplicated FinalizeWatched function
2. Identify the case where we delete the watched queue while we may still have awakened_keys pointing to it.
3. Add a test reproducing the issue of having in awakened_keys an untangled key.

Properly fixes #2514

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-02-01 14:19:08 +02:00
Vladislav
90a9f05e36
chore(transaction): Use PhasedBarrier for easier synchronization (#2455)
chore(transaction): Use PhasedBarrier for easier synchronization

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-01-30 19:43:06 +03:00
Roman Gershman
d608ec9c62
chore: Introduce LockKey for LockTable (#2463)
This should reduce allocations in a common case (not multi).
In addition, rename Transaction::args_ to kv_args_.

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Co-authored-by: Vladislav <vlad@dragonflydb.io>
2024-01-28 12:19:15 +02:00
Roman Gershman
3ebb32df3f
chore: lock keys when going through fast-path execution (#2491)
This is needed if we want to allow asynchronous transactional operations during the callback execution.
Also update actions versions.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-01-27 19:02:53 +02:00
Vladislav
675b3889a4
chore(transaction): Launder copied keys in multi transactions (#2478)
* chore(transaction): Launder copied keys in multi transactions

---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-01-27 12:24:42 +02:00
Vladislav
7b6181641c
chore(transaction): Simplify PollExecution() (#2457)
* chore(transaction): Simplify PollExecution()

Remove seqlock_ from transaction. This change is possible because:
- We don't re-use shard_data[0] for multi transactions anymore
- We disarm atomically and poll callbacks are stateless

This makes it safe to call PollExecution() unconditionally; it will determine on its own whether the caller needs to run or has already expired

---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-01-26 14:34:17 +03:00
Vladislav
08d2fa52e1
fix: fixes for v1.14.0 (#2473)
* fix: fixes for v1.14.0

Stop writing to the replication ring_buffer
Stop allocating in TopKeys
Tighter CHECKs around tx execution.

---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-01-25 11:23:14 +00:00
Vladislav
aeb2b00ac8
fix(transaction): Improve ACTIVE flags management (#2458)
* fix(transaction): Improve ACTIVE flags management

---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-01-23 15:40:11 +03:00
Vladislav
07a6dc0712
feat(transaction): Independent out of order execution (#2426)
Previously, transactions would run out of order only when all shards determined that the key locks were free. With this change, each shard might decide to run out of order independently if the locks are free. COORD_OOO is now deprecated and the OUT_OF_ORDER per-shard flag should be used to indicate it
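
The per-shard decision can be sketched as follows, assuming each shard keeps its own lock table. `LockTable` and `ShardCanRunOutOfOrder` are hypothetical names; the real lock table is keyed and structured differently.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-shard lock table: key -> number of current holders.
using LockTable = std::unordered_map<std::string, int>;

// A shard may run the transaction out of order when none of the keys it
// owns for this transaction are currently locked. Other shards decide
// for themselves with their own tables.
bool ShardCanRunOutOfOrder(const LockTable& locks,
                           const std::vector<std::string>& keys) {
  for (const auto& key : keys) {
    auto it = locks.find(key);
    if (it != locks.end() && it->second > 0) return false;
  }
  return true;
}
```

In this sketch a contended key on one shard no longer prevents another shard, whose keys are free, from running its part early.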

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-01-22 10:38:10 +03:00
Shahar Mike
2f0287429d
fix(replication): Correctly replicate commands even when OOM (#2428)
* fix(replication): Correctly replicate commands even when OOM

Before this change, OOM in shard callbacks could have led to data
inconsistency between the master and the replica. For example, commands
which mutated data on 1 shard but failed on another, like `LMOVE`.

After this change, callbacks that result in an OOM will correctly
replicate their work (none, partial or complete) to replicas.

Note that `MSET` and `MSETNX` required special handling, in that they are
the only commands that can _create_ multiple keys, and so some of them
can fail.

Fixes #2381

* fixes

* test fix

* RecordJournal

* UNDO idiotnessness

* 2 shards

* fix pytest
2024-01-18 12:29:59 +02:00
Vladislav
bf89c7eac2
chore(transaction): Clean up scheduling code (#2422)
* chore(transaction): Clean scheduling code
2024-01-17 17:33:48 +03:00
adiholden
9f3b118b87
server(tiering): load data on read (#2415)
* server(tiering): load data on read

Signed-off-by: adi_holden <adi@dragonflydb.io>
2024-01-17 16:13:56 +02:00
Vladislav
1fb3c74933
fix(rdb): Remove transaction from pre/post load search index rebuild (#2419)
2024-01-16 10:08:16 +03:00
Vladislav
de817098a7
feat(transaction): Single hop blocking, callback flags (#2393)
* feat(transaction): Single hop blocking, callback flags
2024-01-15 21:13:22 +03:00
Vladislav
078db5caae
fix(tx): guard parallel writes to local result (#2417)
2024-01-15 13:51:30 +03:00
Roman Gershman
7054fc56b1
chore: remove atomic<> from ReplicaInfo::state (#2409)
* chore: remove atomic<> from ReplicaInfo::state

This field is protected by ReplicaInfo::mu so non-protected access to it shows a design problem.
Indeed, it was done for being able to access this field without a mutex inside ReplicationLags() function.

I moved the access to this field to GetReplicasRoleInfo where we need to lock ReplicaRoleInfo anyways.
Also, did some cleanups in the file.

Finally, raised a threshold for "tx queue too long" warnings.

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-01-13 18:03:29 +02:00
Shahar Mike
4874da8b5b
feat(cluster): Add RestoreStreamer. (#2390)
* feat(cluster): Add `RestoreStreamer`.

`RestoreStreamer`, like `JournalStreamer`, streams journal changes to a
sink. However, in addition, it traverses the DB like `RdbSerializer` and
sends existing entries as `RESTORE` commands.

Adding it required a bit of plumbing to get all journal changes to be
slot-aware.

In a follow-up PR I will remove the now unneeded `SerializerBase`.

* Fix build

* Fix bug

* Remove unimplemented function

* Iterate DB, drop support for db1+

* Send FULL-SYNC-CUT
2024-01-10 15:10:21 +02:00
Vladislav
b8af49cfe5
chore(transaction): Avoid COORD_SCHED_EXEC ambiguity with multi transactions (#2392)
* chore(transaction): Avoid COORD_SCHED_EXEC ambiguity

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2024-01-10 11:31:11 +03:00
Roman Gershman
cb3e366459
feat: introduce user timeout (#2361)
* feat: introduce user timeout

* feat: introduce tcp_user_timeout flag.

See TCP_USER_TIMEOUT flag in tcp(7) man page.
This Linux-only setting allows a send operation to fail faster
if for some reason the remote socket is unresponsive and does not send ACKs for
the transmitted segments.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

* Update src/facade/dragonfly_listener.cc

Co-authored-by: Shahar Mike <chakaz@users.noreply.github.com>
Signed-off-by: Roman Gershman <romange@gmail.com>

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Signed-off-by: Roman Gershman <romange@gmail.com>
Co-authored-by: Shahar Mike <chakaz@users.noreply.github.com>
2024-01-03 08:06:25 +02:00
Borys
5b905452b3
fix: unblock transactions only if requirements are correct (#2345)
fixes #2294

bug: we unblock XREADGROUP cmd even if we don't have new values

fix: added check with custom requirements for blocking commands
2024-01-02 14:55:06 +02:00
Roman Gershman
fc1a70598d
fix "debug exec" command (#2354)
fix: fix "debug exec" command

It used mutex lock inside Await callback which is prohibited.

In addition, we improved logging across the transaction code.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2024-01-01 18:29:20 +02:00
Roman Gershman
1fb0a486ac
chore: transaction simplification (#2347)
chore: simplify transaction multi-locking

Also, add the analysis routine that determines whether the scheduled transaction contends with other transactions in a
shard thread.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-12-31 17:02:12 +02:00
Roman Gershman
5035d4e1e3
chore: expose the multi length in slowlog (#2339)
1. Fix AnalyzeTxQueue to stop crashing for various transaction types.
2. Pass exec command length to slowlog

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-12-27 16:11:14 +02:00
Vladislav
8c873fd71c
fix: Invalid key lock strings with squashing (#2341)
* fix: clear shard data in squash preparation
2023-12-27 11:48:43 +02:00
Roman Gershman
ce7497071c
feat: introduce 'debug tx' command and periodic overload logs (#2333)
This command shows the current state of transaction queues,
specifically how many armed (ready to run) transactions there are,
how loaded these queues are and how many locks there are in each shard.

In addition, if a tx queue becomes too long, we will output warning logs about
the state of the queue, in order to be able to identify
the bottlenecks post-factum.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-12-25 11:48:55 +02:00
Roman Gershman
bbe3d9303b
feat: introduce transaction statistics in the info output (#2328)
1. How many transactions we processed by type
2. How many transactions we processed by width (number of unique shards).

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-12-23 13:18:49 +02:00
adiholden
6398a73942
fix(bug): access invalid prime table iterator (#2300)
The bug:
When running Dragonfly in cache mode, we bump up items in the dash table when we find them. If the callback accesses several items that reside next to each other, we invalidate the iterator that was found first.

The fix:
After we bump up an entry, we insert its prime table ref into a bump set. When checking whether we can bump down an item, we check that the item is not in this set. Once the transaction callback finishes running, we clear the set.
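
The lifecycle of the bump set can be sketched like this. The sketch uses string keys as a stand-in for prime table refs, and `BumpGuard` with its method names is hypothetical rather than taken from the fix.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical guard that remembers which entries were bumped up during
// the current transaction callback.
class BumpGuard {
 public:
  // Called right after an entry has been bumped up.
  void OnBumpUp(const std::string& key) { bumped_.insert(key); }

  // An entry bumped up in this callback must not be bumped down again,
  // since that would move it under an iterator the callback still holds.
  bool CanBumpDown(const std::string& key) const {
    return bumped_.count(key) == 0;
  }

  // Called once the transaction callback has finished running.
  void OnCallbackFinished() { bumped_.clear(); }

 private:
  std::unordered_set<std::string> bumped_;
};
```

Clearing the set at the end of the callback keeps the protection scoped to one callback, so entries become eligible for bumping down again afterwards.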

Signed-off-by: adi_holden <adi@dragonflydb.io>
2023-12-20 13:05:29 +02:00
Vladislav
aaf01d4244
feat(cluster): Cancel blocking commands on cluster update (#2255)
Handle blocking commands during cluster config update

---------

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2023-12-17 15:32:35 +03:00
Roman Gershman
d88b2422de
chore: eliminate most of clang++ warnings (#2288)
Not all of them but 90% is done.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
2023-12-11 12:47:53 +02:00
Yue Li
64bbfc7063
feat(server): Support CLIENT TRACKING subcommand (1/2) (#2277)
The client tracking state is set by CLIENT TRACKING subcommand as well
as upon client disconnection.

Track the keys of a readonly command by maintaining a mapping that maps
keys to the sets of tracking clients.
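
A minimal sketch of such a mapping, assuming clients are identified by an integer id. `TrackingTable`, `ClientId` and the method names are hypothetical and only illustrate the data structure, not the server's actual interface.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <string>
#include <unordered_map>

using ClientId = std::uint64_t;  // assumed client identifier type

class TrackingTable {
 public:
  // Records that `client` read `key` while tracking was enabled.
  void Track(const std::string& key, ClientId client) {
    by_key_[key].insert(client);
  }

  // Returns the clients currently tracking `key` (empty if none).
  std::set<ClientId> ClientsFor(const std::string& key) const {
    auto it = by_key_.find(key);
    return it == by_key_.end() ? std::set<ClientId>{} : it->second;
  }

  // Drops a client from every key, e.g. when it disconnects.
  void RemoveClient(ClientId client) {
    for (auto it = by_key_.begin(); it != by_key_.end();) {
      it->second.erase(client);
      it = it->second.empty() ? by_key_.erase(it) : std::next(it);
    }
  }

 private:
  std::unordered_map<std::string, std::set<ClientId>> by_key_;
};
```

Removing empty sets on disconnect keeps the table from accumulating keys that nobody tracks anymore.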
2023-12-08 23:13:55 -08:00
Kostas Kyrimis
8323c82dc5
feat(acl): add acl keys to acl save/load (#2273)
* add acl keys to acl save/load
* add tests
2023-12-08 16:08:33 +00:00
Kostas Kyrimis
2703d4635d
feat(acl): add validation for acl keys (#2272)
* add validation for acl keys
* add tests
2023-12-08 17:28:53 +02:00
zixuan zhao
3f7e42b099
Add store test case for GeoRadiusByMember (#2210)
* Add store test case for GeoRadiusByMember; Parsing code for STORE and STOREDIST

---------

Signed-off-by: azuredream <zhaozixuan67@gmail.com>
2023-11-27 13:01:22 +02:00
Vladislav
46292968ad
fix(search): Fix replication (#2159)
* fix(search): Support replication

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
2023-11-13 11:58:54 +03:00