fix(cluster): Join on specified attempt id (#3305)

**The Bug**

Before this fix, source nodes would send `FIN` entries to target nodes
(in all thread flows), and would then send a `DFLYMIGRATE ACK` command
to verify that all flows received the `FIN` in time.

If they didn't, the source node would retry this logic in a loop until successful.

The problem is that, in some rare cases, one or more of the flows would
indeed be in a `FIN` state, _but from a previous `FIN` that is already
outdated_. When that happens, all data sent between that stale `FIN` and
the next `FIN`(s) is lost.

**The Fix**

We already send an attempt id in the `DFLYMIGRATE ACK` command, and it
is returned in the response. This fix sends that same attempt id to all
flows, and when joining, we make sure we join on the correct (latest)
attempt id.

Unfortunately, we can't use the `FIN` opcode anymore, because the
protocol does not send any additional metadata for that opcode. I chose
to use `LSN` because it has exactly the fields we need, and one could
arguably think of a Log Sequence Number as an attempt id, but I could
change that if it's unclear or too hacky.

**Testing**

To reproduce this, one needs to lower
`--slot_migration_connection_timeout_ms` significantly, say to 500ms.
Without the fix, this failed on my laptop roughly every 2 runs.

With this fix, it runs hundreds of times and never reproduces.
Committed by Shahar Mike, 2024-07-14 11:20:01 +03:00 (via GitHub)
parent 8355569d46 · commit 4e3bd94358
6 changed files with 47 additions and 24 deletions


```diff
@@ -231,9 +231,9 @@ void RestoreStreamer::Start(util::FiberSocketBase* dest, bool send_lsn) {
   } while (cursor);
 }
 
-void RestoreStreamer::SendFinalize() {
-  VLOG(1) << "RestoreStreamer FIN opcode for : " << db_slice_->shard_id();
-  journal::Entry entry(journal::Op::FIN, 0 /*db_id*/, 0 /*slot_id*/);
+void RestoreStreamer::SendFinalize(long attempt) {
+  VLOG(1) << "RestoreStreamer LSN opcode for : " << db_slice_->shard_id() << " attempt " << attempt;
+  journal::Entry entry(journal::Op::LSN, attempt);
   io::StringSink sink;
   JournalWriter writer{&sink};
```