This commit generalizes the mechanism of running transaction callbacks during scheduling, removing the need for the specialized ScheduleUniqueShard/RunQuickie. Instead, transactions can now be run during ScheduleInShard (so-called "immediate" runs) if the transaction is concluding and either only a single shard is active or the operation can be safely repeated if scheduling fails (idempotent commands, like MGET).
Updates transaction stats to mirror the new changes more closely.
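The rule for "immediate" runs can be read as a single predicate. The sketch below is a minimal illustration assuming a hypothetical `TxView` struct and `CanRunImmediately()` helper; neither name comes from the Dragonfly sources.

```cpp
#include <cstdint>

// Hypothetical, simplified view of a transaction as seen from one shard.
// None of these names are taken from the Dragonfly sources.
struct TxView {
  bool concluding;         // this hop is the last one (no further hops follow)
  uint32_t active_shards;  // number of shards the transaction touches
  bool idempotent;         // safe to repeat if scheduling fails (e.g. MGET)
};

// Decide whether the callback may run immediately inside ScheduleInShard.
bool CanRunImmediately(const TxView& tx) {
  if (!tx.concluding)
    return false;  // later hops still need the scheduled state
  return tx.active_shards == 1 || tx.idempotent;
}
```

A non-idempotent command spanning several shards still goes through regular scheduling, because a failed schedule could otherwise leave some shards with an effect that cannot be repeated safely.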
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Send the journal LSN to the replica and compare it against the number of records received on the replica side.
Signed-off-by: kostas <kostas@dragonflydb.io>
Co-authored-by: adi_holden <adi@dragonflydb.io>
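A minimal sketch of the replica-side check, assuming that the master assigns consecutive LSNs starting right after the LSN at which streaming began. The `LsnChecker` class is hypothetical and only illustrates comparing the received LSN with a local record count.

```cpp
#include <cstdint>

// Hypothetical replica-side check: the master attaches its journal LSN to
// each record, and the replica compares it with how many records it has
// actually received since streaming started.
class LsnChecker {
 public:
  // Returns false when the LSN does not match the local count, i.e. a
  // record was lost or delivered twice (assumes consecutive LSNs).
  bool OnRecord(uint64_t master_lsn) {
    ++received_;
    return master_lsn == base_lsn_ + received_;
  }

  void Reset(uint64_t base_lsn) {
    base_lsn_ = base_lsn;
    received_ = 0;
  }

 private:
  uint64_t base_lsn_ = 0;  // LSN at which streaming started
  uint64_t received_ = 0;  // records received since then
};
```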
1. Replaces run_barrier as a synchronization point with is_armed + an embedded blocking counter for awaiting running jobs
2. Replaces IsArmedInShard + GetLocalMask + is_armed.exchange chain with a single DisarmInShard() / DisarmInShardWhen
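The two pieces above can be sketched in isolation: an atomic `is_armed` flag whose `exchange(false)` lets exactly one caller disarm it, and a blocking counter that a coordinator waits on until running jobs finish. This is an assumption-laden sketch: `ShardData` and `BlockingCounter` here are hypothetical, and `std::mutex`/`std::condition_variable` stand in for the fiber primitives a real server would use.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Minimal blocking counter: Wait() returns once the count reaches zero.
class BlockingCounter {
 public:
  explicit BlockingCounter(int n) : count_(n) {}

  void Dec() {
    std::lock_guard<std::mutex> lk(mu_);
    if (--count_ == 0) cv_.notify_all();
  }

  void Wait() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return count_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int count_;
};

// Hypothetical per-shard state.
struct ShardData {
  std::atomic<bool> is_armed{false};

  // One atomic step instead of a check-then-exchange chain:
  // returns true for exactly one caller per arming.
  bool DisarmInShard() {
    return is_armed.exchange(false, std::memory_order_acq_rel);
  }
};
```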
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
* chore(transaction): Untie scheduling from multi status
Idea: We decide whether to schedule based not on our multi status (atomic multi) but solely on whether COORD_SCHED is set
Goal: Being able to use ScheduleSingleHop()/Schedule() for multi transactions, and thus later allow single hop multi transactions
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
* chore(transaction): Simplify armed state
Remove the atomic is_armed variable and turn it into a regular local state flag. This is now possible because we have clearly defined phases with the phased barrier and the baton barrier for blocking commands.
---------
Signed-off-by: Vladislav <vlad@dragonflydb.io>
Refactor blocking transaction code. Introduce BatonBarrier for managing atomic and exclusive wakeup notifications that conflict with neither expiration nor cancellation.
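The "exclusive" part of such a barrier can be illustrated with a compare-and-swap race: a wakeup, an expiration and a cancellation all try to claim the same one-shot state, and only the first succeeds. The `Baton` class below is a hypothetical sketch of that idea, not the BatonBarrier interface itself.

```cpp
#include <atomic>

// Hypothetical one-shot "baton": several parties (a wakeup notification,
// an expiration timer, a cancellation) race to claim it, and exactly one wins.
class Baton {
 public:
  enum class Reason { kNone, kWakeup, kExpired, kCancelled };

  // Returns true only for the first caller; later callers see it claimed.
  bool TryClaim(Reason r) {
    Reason expected = Reason::kNone;
    return state_.compare_exchange_strong(expected, r, std::memory_order_acq_rel);
  }

  // Which party won the race (kNone if nobody has claimed it yet).
  Reason winner() const { return state_.load(std::memory_order_acquire); }

 private:
  std::atomic<Reason> state_{Reason::kNone};
};
```

Because the losing parties observe a failed claim, an expiration that arrives after a wakeup cannot undo it, and vice versa.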
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
* fix: BLOCKING/REVERSE_MAPPING flags for some commands
Also, simplify interfaces around REVERSE_MAPPING in the internal tx code.
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* fix: Remove a stale reference to blocking watch queue
1. Remove the duplicated FinalizeWatched function
2. Identify the case where we delete the watched queue while we may still have awakened_keys pointing to it.
3. Add a test reproducing the issue of having in awakened_keys an untangled key.
Properly fixes #2514
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
This should reduce allocations in a common case (not multi).
In addition, rename Transaction::args_ to kv_args_.
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Co-authored-by: Vladislav <vlad@dragonflydb.io>
This is needed if we want to allow asynchronous transactional operations during the callback execution.
Also update actions versions.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore(transaction): Simplify PollExecution()
Remove seqlock_ from transaction. This change is possible because:
- We don't re-use shard_data[0] for multi transactions anymore
- We disarm atomically and poll callbacks are stateless
This makes it safe to call PollExecution() unconditionally: it determines on its own whether the transaction needs to run or has already expired.
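Why unconditional polling becomes safe can be shown with a small sketch: when disarming is a single atomic step and the callback keeps no per-poll state, every poll except the one that disarms simply returns. `ShardTx` and its members are hypothetical names used only for this illustration.

```cpp
#include <atomic>
#include <functional>

// Hypothetical shard-side view of a transaction. It may be polled from
// several places (queue traversal, an explicit wakeup, ...); only the poll
// that atomically disarms it runs the callback.
struct ShardTx {
  std::atomic<bool> armed{false};
  std::function<void()> cb;

  // Returns true if this call ran the callback, false if the transaction
  // is not armed or another poll already ran it.
  bool PollExecution() {
    if (!armed.exchange(false, std::memory_order_acq_rel))
      return false;
    cb();
    return true;
  }
};
```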
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
* fix: fixes for v1.14.0
Stop writing to the replication ring_buffer
Stop allocating in TopKeys
Tighter CHECKs around tx execution.
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Previously, transactions would run out of order only when all shards determined that the key locks were free. With this change, each shard may decide independently to run out of order if its locks are free. COORD_OOO is now deprecated, and the per-shard OUT_OF_ORDER flag is used to indicate it.
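A per-shard decision might look like the sketch below: each shard inspects only its own lock table and sets its own flag. The `Shard` struct, its `lock_count` map and `ScheduleOnShard()` are hypothetical simplifications; a real lock table distinguishes shared and exclusive locks.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-shard flag bit.
enum ShardFlags : unsigned { OUT_OF_ORDER = 1u << 0 };

struct Shard {
  std::unordered_map<std::string, int> lock_count;  // key -> lock holders

  bool KeysFree(const std::vector<std::string>& keys) const {
    for (const auto& k : keys) {
      auto it = lock_count.find(k);
      if (it != lock_count.end() && it->second > 0) return false;
    }
    return true;
  }
};

// Each shard decides on its own; no transaction-wide COORD_OOO flag is needed.
unsigned ScheduleOnShard(Shard& shard, const std::vector<std::string>& keys) {
  unsigned flags = 0;
  if (shard.KeysFree(keys)) flags |= OUT_OF_ORDER;
  for (const auto& k : keys) ++shard.lock_count[k];  // acquire the locks
  return flags;
}
```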
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
* fix(replication): Correctly replicate commands even when OOM
Before this change, OOM in shard callbacks could have led to data
inconsistency between the master and the replica. For example, commands
which mutated data on 1 shard but failed on another, like `LMOVE`.
After this change, callbacks that result in an OOM will correctly
replicate their work (none, partial or complete) to replicas.
Note that `MSET` and `MSETNX` required special handling, in that they are
the only commands that can _create_ multiple keys, and so some of them
can fail.
Fixes #2381
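The special handling for `MSET` can be pictured as journaling exactly what was applied. In the sketch below, the hypothetical `try_set` callback returns false when the shard is out of memory and a key could not be created; the function name `ApplyMSetAndBuildJournal` is also an assumption made for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;

// Hypothetical shard step for MSET. The returned list is what would be
// written to the journal: exactly the pairs applied on the master, so the
// replica ends up with the same (possibly partial) result.
template <typename TrySet>
std::vector<KV> ApplyMSetAndBuildJournal(const std::vector<KV>& pairs, TrySet try_set) {
  std::vector<KV> applied;
  for (const auto& kv : pairs) {
    if (try_set(kv.first, kv.second))  // false: OOM, key not created
      applied.push_back(kv);
  }
  return applied;  // empty: nothing to replicate
}
```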
* fixes
* test fix
* RecordJournal
* UNDO idiotnessness
* 2 shards
* fix pytest
* chore: remove atomic<> from ReplicaInfo::state
This field is protected by ReplicaInfo::mu so non-protected access to it shows a design problem.
Indeed, it was done to allow accessing this field without a mutex inside the ReplicationLags() function.
I moved the access to this field to GetReplicasRoleInfo, where we need to lock ReplicaRoleInfo anyway.
Also did some cleanups in the file.
Finally, raised the threshold for "tx queue too long" warnings.
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* feat(cluster): Add `RestoreStreamer`.
`RestoreStreamer`, like `JournalStreamer`, streams journal changes to a
sink. However, in addition, it traverses the DB like `RdbSerializer` and
sends existing entries as `RESTORE` commands.
Adding it required a bit of plumbing to get all journal changes to be
slot-aware.
In a follow-up PR I will remove the now unneeded `SerializerBase`.
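Making journal changes slot-aware implies mapping each key to a cluster slot and forwarding only keys in the migrated slots. The slot function below follows the Redis Cluster specification (CRC16/XMODEM of the key, or of its non-empty `{hash tag}`, modulo 16384); `ShouldStream()` is a hypothetical filter, not the actual `RestoreStreamer` logic.

```cpp
#include <cstdint>
#include <string_view>
#include <unordered_set>

// CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0.
uint16_t Crc16(std::string_view s) {
  uint16_t crc = 0;
  for (unsigned char c : s) {
    crc ^= static_cast<uint16_t>(c << 8);
    for (int i = 0; i < 8; ++i)
      crc = (crc & 0x8000) ? static_cast<uint16_t>((crc << 1) ^ 0x1021)
                           : static_cast<uint16_t>(crc << 1);
  }
  return crc;
}

// Hash slot per the Redis Cluster spec, honoring non-empty {hash tags}.
uint16_t KeySlot(std::string_view key) {
  auto open = key.find('{');
  if (open != std::string_view::npos) {
    auto close = key.find('}', open + 1);
    if (close != std::string_view::npos && close > open + 1)
      key = key.substr(open + 1, close - open - 1);
  }
  return Crc16(key) % 16384;
}

// Hypothetical filter: forward a journal entry only if its key belongs
// to one of the slots being migrated.
bool ShouldStream(std::string_view key, const std::unordered_set<uint16_t>& slots) {
  return slots.count(KeySlot(key)) > 0;
}
```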
* Fix build
* Fix bug
* Remove unimplemented function
* Iterate DB, drop support for db1+
* Send FULL-SYNC-CUT
* feat: introduce user timeout
* feat: introduce tcp_user_timeout flag.
See TCP_USER_TIMEOUT flag in tcp(7) man page.
This Linux-only setting allows failing faster during send operations
if for some reason the remote socket is unresponsive and does not send ACKs for
the transmitted segments.
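On Linux the option is set per socket with `setsockopt` at level `IPPROTO_TCP`; tcp(7) specifies an `unsigned int` value in milliseconds. The helpers below are a minimal sketch; how the server maps its `tcp_user_timeout` flag onto this call is not shown here.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Linux-only: cap how long transmitted data may remain unacknowledged
// before the kernel forcibly closes the connection (see tcp(7)).
// Returns 0 on success, -1 with errno set on failure.
int SetTcpUserTimeout(int fd, unsigned int timeout_ms) {
  return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout_ms, sizeof(timeout_ms));
}

// Reads the option back; returns 0 if it cannot be read.
unsigned int GetTcpUserTimeout(int fd) {
  unsigned int val = 0;
  socklen_t len = sizeof(val);
  if (getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &val, &len) != 0) return 0;
  return val;
}
```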
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* Update src/facade/dragonfly_listener.cc
Co-authored-by: Shahar Mike <chakaz@users.noreply.github.com>
Signed-off-by: Roman Gershman <romange@gmail.com>
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Signed-off-by: Roman Gershman <romange@gmail.com>
Co-authored-by: Shahar Mike <chakaz@users.noreply.github.com>
fix: fix "debug exec" command
It used a mutex lock inside the Await callback, which is prohibited.
In addition, we improved logging across the transaction code.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
chore: simplify transaction multi-locking
Also, add an analysis routine that determines whether the scheduled transaction is contended with other transactions in a
shard thread.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
1. Fix AnalyzeTxQueue to stop crashing for various transaction types.
2. Pass exec command length to slowlog
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
This command shows the current state of transaction queues,
specifically how many armed (ready to run) transactions there are,
how loaded these queues are and how many locks there are in each shard.
In addition, if a tx queue becomes too long, we will output warning logs about
the state of the queue, in order to be able to identify
the bottlenecks post-factum.
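The per-shard summary and the warning condition can be sketched as follows. `QueuedTx`, `TxQueueInfo`, `AnalyzeQueue()` and the `threshold` parameter are hypothetical; the actual command's output fields and warning threshold may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical snapshot of one entry in a shard's tx queue.
struct QueuedTx {
  bool armed;  // ready to run in this shard
};

struct TxQueueInfo {
  size_t tx_total = 0;  // queue length
  size_t tx_armed = 0;  // armed (ready to run) transactions
  size_t locks = 0;     // locks held in the shard
};

TxQueueInfo AnalyzeQueue(const std::vector<QueuedTx>& queue, size_t lock_count) {
  TxQueueInfo info;
  info.tx_total = queue.size();
  for (const auto& tx : queue)
    if (tx.armed) ++info.tx_armed;
  info.locks = lock_count;
  return info;
}

// Warn when the queue grows past a (hypothetical) threshold, so the
// bottleneck can be diagnosed after the fact.
bool ShouldWarn(const TxQueueInfo& info, size_t threshold) {
  return info.tx_total > threshold;
}
```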
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
1. How many transactions we processed by type
2. How many transactions we processed by width (number of unique shards).
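Both counters amount to histogram bookkeeping. In this sketch the `TxType` categories are placeholders, since the actual set of transaction types is not listed here, and width is the number of unique shards a transaction touched.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical transaction categories; the real set of types may differ.
enum class TxType { kNormal = 0, kMulti, kScript, kCount };

struct TxStats {
  std::array<uint64_t, static_cast<size_t>(TxType::kCount)> by_type{};
  std::map<uint32_t, uint64_t> by_width;  // unique shard count -> transactions

  void Record(TxType type, uint32_t unique_shards) {
    ++by_type[static_cast<size_t>(type)];
    ++by_width[unique_shards];
  }
};
```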
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
The bug:
When running Dragonfly in cache mode, we bump up items in the dash table when we find them. If the callback accesses a few items that reside next to each other, bumping them invalidates the iterator that was found first.
The fix:
After we bump up an entry, we insert its prime table reference into the bump set. When checking whether we can bump down an item, we check that the item is not in this set. Once the transaction callback finishes, we clear the set.
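The fix amounts to a per-callback set with three operations: record, check, clear. The `BumpGuard` class is a hypothetical sketch in which a raw pointer stands in for a prime table reference.

```cpp
#include <unordered_set>

// Hypothetical cache-mode guard. Entries bumped up during the current
// callback are recorded and must not be bumped down again until the
// callback finishes, so iterators obtained earlier in it stay valid.
class BumpGuard {
 public:
  using EntryId = const void*;  // stands in for a prime table reference

  void OnBumpUp(EntryId e) { bumped_.insert(e); }
  bool CanBumpDown(EntryId e) const { return bumped_.count(e) == 0; }
  void OnCallbackDone() { bumped_.clear(); }

 private:
  std::unordered_set<EntryId> bumped_;
};
```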
Signed-off-by: adi_holden <adi@dragonflydb.io>
The client tracking state is set by CLIENT TRACKING subcommand as well
as upon client disconnection.
Track the keys of a read-only command by maintaining a mapping from
keys to the sets of tracking clients.
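A key-to-clients table could look like the sketch below. `TrackingTable` and `ClientId` are hypothetical; the one-shot invalidation (dropping a key once its trackers are notified) mirrors the default mode of Redis client-side caching and is an assumption about this implementation.

```cpp
#include <cstdint>
#include <iterator>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using ClientId = uint64_t;

// Hypothetical key -> tracking clients map. Reads by tracking clients
// register interest; a write returns the clients to notify and drops the key.
class TrackingTable {
 public:
  void OnRead(const std::string& key, ClientId c) { table_[key].insert(c); }

  std::vector<ClientId> OnWrite(const std::string& key) {
    std::vector<ClientId> to_notify;
    auto it = table_.find(key);
    if (it == table_.end()) return to_notify;
    to_notify.assign(it->second.begin(), it->second.end());
    table_.erase(it);
    return to_notify;
  }

  // Called when a client turns tracking off or disconnects.
  void RemoveClient(ClientId c) {
    for (auto it = table_.begin(); it != table_.end();) {
      it->second.erase(c);
      it = it->second.empty() ? table_.erase(it) : std::next(it);
    }
  }

 private:
  std::unordered_map<std::string, std::unordered_set<ClientId>> table_;
};
```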