From 169c9d3975a259d38f31e15e38fee5f64b20a9ee Mon Sep 17 00:00:00 2001 From: Shahar Mike Date: Thu, 2 Nov 2023 13:09:42 +0200 Subject: [PATCH] fix(regTests): Wait between `ACTIVE` until `stable_sync (#2111) Regression test sometimes fails because for a short period of time after `wait_available_async()` returns, the result of `ROLE` could still be different from `stable_sync` [Failure example](https://github.com/dragonflydb/dragonfly/actions/runs/6726461923/job/18282759612#step:6:1863) We change our state from `LOADING` to `ACTIVE` [here](https://github.com/dragonflydb/dragonfly/blob/d08d7f13b450be2048d59d5e0af9f4554978b235/src/server/replica.cc#L426), but then we change the sync state 2 times: 1. `!R_SYNCING` [here](https://github.com/dragonflydb/dragonfly/blob/d08d7f13b450be2048d59d5e0af9f4554978b235/src/server/replica.cc#L427C28-L427C37) 2. And only later to `R_SYNC_OK` (meaning `stable_sync`) [here](https://github.com/dragonflydb/dragonfly/blob/d08d7f13b450be2048d59d5e0af9f4554978b235/src/server/replica.cc#L221) This is easy to reproduce by adding a sleep right after the set of state to `ACTIVE`, either before or after the flipping of `R_SYNCING` (with different returned states). BTW without that added sleep I was not able to reproduce, having tried 1000s of times in various configurations. We could change the order of things such that we first change `state_mask_` and only then switch state from `LOADING` to `ACTIVE` (which is probably the right thing to do), but that would require a subtle refactor, as we change these in a couple of places. But we should keep in mind that this has no effect on users. So a simple sleep on the test side should fix this fairly well. --- tests/dragonfly/replication_test.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tests/dragonfly/replication_test.py b/tests/dragonfly/replication_test.py index 4987848c0..91a7ccef3 100644 --- a/tests/dragonfly/replication_test.py +++ b/tests/dragonfly/replication_test.py @@ -918,6 +918,8 @@ async def test_role_command(df_local_factory, n_keys=20): await c_replica.execute_command(f"REPLICAOF localhost {master.port}") await wait_available_async(c_replica) + await asyncio.sleep(1) + assert await c_master.execute_command("role") == [ b"master", [[b"127.0.0.1", bytes(str(replica.port), "ascii"), b"stable_sync"]],