fix(regTests): Wait between ACTIVE until `stable_sync (#2111)

Regression test sometimes fails because for a short period of time after `wait_available_async()` returns, the result of `ROLE` could still be different from `stable_sync`

[Failure example](1828275961 (step):6:1863)

We change our state from `LOADING` to `ACTIVE` [here](d08d7f13b4/src/server/replica.cc (L426)), but then we change the sync state 2 times:
1. `!R_SYNCING` [here](d08d7f13b4/src/server/replica.cc (L427C28-L427C37))
2. And only later to `R_SYNC_OK` (meaning `stable_sync`) [here](d08d7f13b4/src/server/replica.cc (L221))

This is easy to reproduce by adding a sleep right after the set of state to `ACTIVE`, either before or after the flipping of `R_SYNCING` (with different returned states).

BTW without that added sleep I was not able to reproduce, having tried 1000s of times in various configurations.

We could change the order of things such that we first change `state_mask_` and only then switch state from `LOADING` to `ACTIVE` (which is probably the right thing to do), but that would require a subtle refactor, as we change these in a couple of places.

But we should keep in mind that this has no effect on users. So a simple sleep on the test side should fix this fairly well.
This commit is contained in:
Shahar Mike 2023-11-02 13:09:42 +02:00 committed by GitHub
parent d08d7f13b4
commit 169c9d3975
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23

View file

@ -918,6 +918,8 @@ async def test_role_command(df_local_factory, n_keys=20):
await c_replica.execute_command(f"REPLICAOF localhost {master.port}")
await wait_available_async(c_replica)
await asyncio.sleep(1)
assert await c_master.execute_command("role") == [
b"master",
[[b"127.0.0.1", bytes(str(replica.port), "ascii"), b"stable_sync"]],