fix: make snapshotting process more responsive (#3759)

mirror of https://github.com/dragonflydb/dragonfly.git synced 2025-05-11 10:25:47 +02:00

* fix: improve BreakStalledFlowsInShard heuristic

Before this change, we wrote whatever record chunks we pulled from the channel in a single call.
This can be problematic for very large chunks: a 1GB chunk, for example, might take 10 seconds to write.

We recently added a replication breaker on the master side that breaks the full sync after
a predefined threshold has passed. By default it was 10 seconds. To improve the robustness of this
breaker, we now write chunks of up to 1MB and update last_write_time_ns_ more frequently.
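
A minimal sketch of the chunked-write idea, not the actual streamer code. `ChunkedWriter`, `Write`, the `sink` callback, and the `writes` counter are hypothetical names used for illustration; only `last_write_time_ns_` and the 1MB limit come from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the streamer state.
struct ChunkedWriter {
  static constexpr size_t kMaxChunk = 1 << 20;  // 1MB, per the commit message

  uint64_t last_write_time_ns_ = 0;  // refreshed after every chunk
  size_t writes = 0;                 // number of sink calls, for illustration

  static uint64_t NowNs() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
  }

  // Writes `data` through `sink` in pieces of at most `max_chunk` bytes.
  // `sink` is an assumed callback that pushes bytes to the socket.
  void Write(std::string_view data, const std::function<void(std::string_view)>& sink,
             size_t max_chunk = kMaxChunk) {
    assert(max_chunk > 0);
    while (!data.empty()) {
      size_t n = std::min(max_chunk, data.size());
      sink(data.substr(0, n));
      data.remove_prefix(n);
      // Progress is recorded per chunk, so a stall detector sees
      // movement even while one huge record is still being written.
      last_write_time_ns_ = NowNs();
      ++writes;
    }
  }
};
```

With a single large write the timestamp would only move once the whole record is out; with bounded chunks it moves after every piece, which is what makes the breaker's view of progress more accurate.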

We also added more logs to make replication delays more visible on both sides,
including logs for when the master side breaks the replication.

Unfortunately, this did not make BreakStalledFlowsInShard more robust, because the
problem has now moved to the replica side, which may take 20+ seconds to parse huge values.
Therefore, I increased the threshold for breaking the replication to 30s.
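
To illustrate why the threshold matters, here is a hedged sketch of a stall check. `IsStalled` and its parameters are hypothetical and do not reproduce the real BreakStalledFlowsInShard logic; the sketch only assumes that the breaker compares the time since the last write against a timeout in milliseconds.

```cpp
#include <cstdint>

constexpr uint64_t kNsPerMs = 1'000'000;

// Returns true when no write progress has been recorded for longer than
// timeout_ms. All names here are illustrative assumptions.
inline bool IsStalled(uint64_t now_ns, uint64_t last_write_time_ns, uint32_t timeout_ms) {
  if (now_ns < last_write_time_ns) {
    return false;  // clock skew or a fresh write; treat as progress
  }
  return now_ns - last_write_time_ns > uint64_t{timeout_ms} * kNsPerMs;
}
```

Under this model, a replica that spends 25 seconds parsing one value (so the master sees no write progress for 25 seconds) trips a 10000 ms timeout but not a 30000 ms one.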

Finally, instrument the GetMetrics call, as it sometimes takes more than 1 second.
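
One common way to instrument a slow call is a scoped timer that reports only when a threshold is exceeded. The sketch below is an assumption about the general technique, not the instrumentation added in this commit; `ScopedSlowCallTimer` and its callback are hypothetical.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical RAII helper: invokes on_slow with the elapsed time if the
// enclosing scope took at least `threshold` to finish.
class ScopedSlowCallTimer {
 public:
  using Clock = std::chrono::steady_clock;
  using Ms = std::chrono::milliseconds;

  ScopedSlowCallTimer(Ms threshold, std::function<void(Ms)> on_slow)
      : threshold_(threshold), on_slow_(std::move(on_slow)), start_(Clock::now()) {
  }

  ~ScopedSlowCallTimer() {
    Ms elapsed = std::chrono::duration_cast<Ms>(Clock::now() - start_);
    if (elapsed >= threshold_ && on_slow_) {
      on_slow_(elapsed);  // e.g. emit a warning log with the duration
    }
  }

 private:
  Ms threshold_;
  std::function<void(Ms)> on_slow_;
  Clock::time_point start_;
};
```

Placing such a timer at the top of a metrics-collection function with a 1000 ms threshold would surface the slow calls without adding log noise for fast ones.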

---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

Roman Gershman, 2024-09-22 17:05:28 +03:00 • committed by GitHub

parent 2e9b133ea0

commit f1f8ee17dc

No known key found for this signature in database

GPG key ID: B5690EEEBB952194

6 changed files with 56 additions and 13 deletions

									
src/server/journal/streamer.cc (2 lines changed)
@@ -13,7 +13,7 @@
 using namespace facade;

-ABSL_FLAG(uint32_t, replication_timeout, 10000,
+ABSL_FLAG(uint32_t, replication_timeout, 30000,
           "Time in milliseconds to wait for the replication writes being stuck.");

 ABSL_FLAG(uint32_t, replication_stream_output_limit, 64_KB,
