fix(cluster): Don't miss keys when migrating slots (#3218)

In rare cases, the fuzzy cluster migration test detected missing keys.
It turns out that the missing keys were skipped on the source side due
to contention:
* The OnDbChange callback started registering and obtained a `snapshot_id`
* It then blocked on a mutex, so it could not yet add itself to the list
  of callbacks
* When the mutex was released, it was added to the list, but it had
  missed all changes that happened between obtaining `snapshot_id` and
  the moment it was actually added

The fix obtains the version (`snapshot_id`) only after the mutex has been
acquired, so no change can slip in between the two steps.
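
The interleaving described above can be reproduced in a single-threaded toy model. This is only a sketch under stated assumptions: `ToySlice`, `Change` and `MissedChanges` are hypothetical names invented for illustration and are not Dragonfly code, and the two writes between the steps stand in for the time spent blocked on the mutex.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for DbSlice (not the real class).
struct ToySlice {
  uint64_t version = 0;
  std::vector<std::pair<uint64_t, std::function<void(uint64_t)>>> callbacks;

  uint64_t NextVersion() { return version++; }

  // A write takes a fresh version and notifies every registered callback.
  void Change() {
    uint64_t v = NextVersion();
    for (auto& entry : callbacks) entry.second(v);
  }
};

// Counts the changes with a version greater than snapshot_id that the
// callback never saw. take_version_after_wait == false models the old
// order, true models the order introduced by this commit.
uint64_t MissedChanges(bool take_version_after_wait) {
  ToySlice slice;
  std::vector<uint64_t> seen;
  uint64_t snapshot_id = 0;

  if (!take_version_after_wait) snapshot_id = slice.NextVersion();  // old order

  // Stand-in for being blocked on the mutex: two writes go through
  // while the callback is not yet in the list.
  slice.Change();
  slice.Change();

  if (take_version_after_wait) snapshot_id = slice.NextVersion();  // new order

  slice.callbacks.emplace_back(snapshot_id,
                               [&seen](uint64_t v) { seen.push_back(v); });

  slice.Change();  // the first write the callback can observe

  // Versions issued after snapshot_id are snapshot_id + 1 .. version - 1.
  uint64_t issued_after = slice.version - snapshot_id - 1;
  assert(issued_after >= seen.size());
  return issued_after - seen.size();
}
```

In this model `MissedChanges(false)` returns 2 (the two writes made while "blocked"), while `MissedChanges(true)` returns 0, because the version is taken at the same point at which the callback joins the list.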
Shahar Mike 2024-06-25 15:41:17 +03:00 committed by GitHub
parent 847e2edc09
commit f28bd93854

@@ -1095,15 +1095,18 @@ void DbSlice::ExpireAllIfNeeded() {
}
uint64_t DbSlice::RegisterOnChange(ChangeCallback cb) {
- uint64_t ver = NextVersion();
// TODO rewrite this logic to be more clear
// this mutex lock is needed to check that this method is not called simultaneously with
// change_cb_ calls and journal_slice::change_cb_arr_ calls.
// It can be unlocked anytime because DbSlice::RegisterOnChange
// and journal_slice::RegisterOnChange calls without preemption
std::lock_guard lk(cb_mu_);
+ uint64_t ver = NextVersion();
change_cb_.emplace_back(ver, std::move(cb));
DCHECK(std::is_sorted(change_cb_.begin(), change_cb_.end(),
[](auto& a, auto& b) { return a.first < b.first; }));
return ver;
}