fix(regTests): Wait between ACTIVE until `stable_sync (#2111)

Regression test sometimes fails because for a short period of time after `wait_available_async()` returns, the result of `ROLE` could still be different from `stable_sync` [Failure example](1828275961 (step):6:1863) We change our state from `LOADING` to `ACTIVE` [here](d08d7f13b4/src/server/replica.cc (L426)), but then we change the sync state 2 times: 1. `!R_SYNCING` [here](d08d7f13b4/src/server/replica.cc (L427C28-L427C37)) 2. And only later to `R_SYNC_OK` (meaning `stable_sync`) [here](d08d7f13b4/src/server/replica.cc (L221)) This is easy to reproduce by adding a sleep right after the set of state to `ACTIVE`, either before or after the flipping of `R_SYNCING` (with different returned states). BTW without that added sleep I was not able to reproduce, having tried 1000s of times in various configurations. We could change the order of things such that we first change `state_mask_` and only then switch state from `LOADING` to `ACTIVE` (which is probably the right thing to do), but that would require a subtle refactor, as we change these in a couple of places. But we should keep in mind that this has no effect on users. So a simple sleep on the test side should fix this fairly well.
2025-05-10 18:05:44 +02:00 · 2023-11-02 13:09:42 +02:00 · 2023-11-02 13:09:42 +02:00 · 169c9d3975
commit 169c9d3975
parent d08d7f13b4
1 changed files with 2 additions and 0 deletions
--- a/tests/dragonfly/replication_test.py
+++ b/tests/dragonfly/replication_test.py
@ -918,6 +918,8 @@ async def test_role_command(df_local_factory, n_keys=20):
    await c_replica.execute_command(f"REPLICAOF localhost {master.port}")
    await wait_available_async(c_replica)

+    await asyncio.sleep(1)
+
    assert await c_master.execute_command("role") == [
        b"master",
        [[b"127.0.0.1", bytes(str(replica.port), "ascii"), b"stable_sync"]],