dragonfly

mirror of https://github.com/dragonflydb/dragonfly.git synced 2025-05-11 10:25:47 +02:00

Author	SHA1	Message	Date
Roman Gershman	6d30baa20b	chore: Pipelining fixes (#4994 ) Fixes #4998. 1. Reduces aggressive yielding when reading multiple requests since it hampers pipeline efficiency. Now we yield consistently based on CPU time spent since the last resume point (via flag with sane defaults). 2. Increases socket read buffer size, effectively allowing more requests to be processed in bulk. `./dragonfly --cluster_mode=emulated` latencies (usec) for pipeline sizes 80-199: p50: 1887, p75: 2367, p90: 2897, p99: 6266 `./dragonfly --cluster_mode=emulated --experimental_cluster_shard_by_slot` latencies (usec) for pipeline sizes 80-199: p50: 813, p75: 976, p90: 1216, p99: 3528 Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2025-04-27 20:48:02 +03:00
Tarun Pothulapati	7d0530656f	feat(tools/replay): Add pipeline latency distribution data (#4990 ) feat(replay): add latency distributions * also add avg latency * also include pipeline range * display both at the end	2025-04-24 19:23:43 +03:00
Roman Gershman	7ffe812967	feat(dfly_bench): allow regulated throughput in 3 modes (#4962 ) * feat(dfly_bench): allow regulated throughput in 3 modes 1. Coordinated omission - with --qps=0, each request is sent and then we wait for the response and so on. For pipeline mode, k requests are sent and then we wait for them to return to send another k 2. qps > 0: we schedule sending requests at frequency "qps" per connection but if pending requests count crosses a limit we slow down by throttling request sending. This mode enables gentle uncoordinated omission, where the schedule converges to the real throughput capacity of the backend (if it's slower than the target throughput). 3. qps < 0, similar to (2) but does not adjust its scheduling and may overload the server if target QPS is too high. Signed-off-by: Roman Gershman <roman@dragonflydb.io> * chore: change pipelining and coordinated omission logic Before that, uncoordinated omission only worked without pipelining. Now, in pipelining mode we send a burst of P requests and then: a) For coordinated omission - wait for all of them to complete before proceeding further b) For non-coordinated omission - we sleep to pace our single connection throughput as defined by the qps setting. Signed-off-by: Roman Gershman <roman@dragonflydb.io> --------- Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2025-04-21 09:56:33 +03:00
Roman Gershman	220f20bac6	feat: expose table capacities instead of number of buckets (#4956 ) Also, add a local dashboard demonstrating prime table load per db. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2025-04-18 10:30:04 +03:00
Roman Gershman	5a2192dfdf	fix: local dashboard show rapid changes in QPS (#4886 ) Helps investigating #4787 Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2025-04-03 20:12:26 +03:00
Roman Gershman	460690855e	fix: add version id for dev container builds (#4878 ) now it looks like this: ``` > docker run --rm ghcr.io/dragonflydb/dragonfly-dev:ubuntu-f767d82 --version dragonfly f767d82-f767d82ce78ccbc90ddfb525f4ad4bd9aafcfbed ``` fixes #4830 Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2025-04-02 20:21:19 +03:00
Roman Gershman	2a3a1567b9	feat(cluster_mgr): add populate command (#4816 ) * feat(cluster_mgr): add populate command We further simplify the code around cluster config Also - add a command that populates all the cluster ranges in the cluster using the "populate" command. `--size` and `--valsize` arguments are also added. Signed-off-by: Roman Gershman <roman@dragonflydb.io> * chore: fixes --------- Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2025-03-25 10:47:10 +02:00
Roman Gershman	72fb25694b	chore(cluster_mgr): introduce SlotRange class (#4814 ) Before: slot merging/splitting logic was mixed with business logic. Also, slots were represented as dictionary, which made the code less readable. Now, SlotRange handles the low-level logic, which makes the high-level code simpler to understand. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2025-03-23 08:41:44 +02:00
Roman Gershman	e01aec2a21	fix(dfly_bench): track hit rate for mget command (#4723 ) Also, clean up the code a bit, reduce nesting. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2025-03-07 07:09:59 +00:00
mkaruza	debb2eb9e8	feat(cluster_mgr): Add argument to set path to dragonfly binary (#4695 ) Add optional argument to cluster_mgr script so that we can run cluster with different builds. Signed-off-by: mkaruza <mario@dragonflydb.io>	2025-03-04 12:52:24 +01:00
Roman Gershman	52d88c2372	chore: introduce docker release pipeline (#4618 ) * chore: introduce docker release pipeline The whole flow is reimplemented using native arm64/amd64 runners. Signed-off-by: Roman Gershman <roman@dragonflydb.io> * Update .github/workflows/docker-release2.yml Co-authored-by: Kostas Kyrimis <kostas@dragonflydb.io> Signed-off-by: Roman Gershman <romange@gmail.com> * chore: comments --------- Signed-off-by: Roman Gershman <roman@dragonflydb.io> Signed-off-by: Roman Gershman <romange@gmail.com> Co-authored-by: Kostas Kyrimis <kostas@dragonflydb.io>	2025-02-17 12:24:24 +02:00
Roman Gershman	e433ef87bf	fix: debian path in dragonfly.service (#4594 ) Split the rpm service file from debian. Fixes #4593 Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2025-02-12 09:18:51 +02:00
Roman Gershman	b0b9a72dbd	feat: introduce more options for traffic logger (#4571 ) 1. Provide clear usage instructions 2. Add "pace" option, which when false, sends traffic as quickly as possible (default true). 3. Add skip option that sometimes can be useful to remove unneeded noise Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2025-02-07 11:10:13 +02:00
Roman Gershman	bafb427a09	fix: rpm package setup (#4506 ) Also, fix the deadlock problem on shutdown on Oracle Linux 5.15 Fixes #4505 Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2025-01-26 12:40:33 +02:00
Roman Gershman	904775cfe6	chore: new docker build pipeline (#4503 ) Our previous weekly pipeline used qemu, was very slow and over-complicated. This one uses a matrix with proper parallelization and the latest arm64 GitHub runners. Now it takes less than 30 minutes to build everything. Let's make it daily.	2025-01-26 12:03:42 +02:00
Roman Gershman	6265f52bff	feat(dev): allow monitoring a valkey server on localhost (#4467 )	2025-01-18 10:46:14 +02:00
Roman Gershman	95cd9dfb4c	chore: update helio and improve our stack overflow resiliency (#4349 ) 1. Run CI/Regression tests with HELIO_STACK_CHECK=4096. This will crash if a fiber stack usage goes below this limit. 2. Increase shard queue stack size to 64KB 3. Increase fiber stack size to 40KB on Debug builds. 4. Updated helio has some changes around the TLS socket code. In addition we add a helper script to generate self-signed certificates helpful for local development work. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-12-23 08:13:45 +00:00
Roman Gershman	904d21d666	fix: add content-type for metrics response (#4340 ) chore: add content-type for metrics response. Also, update the local stack to use prometheus 3.0 Finally, hex-escape arguments when logging an error for a command. Fixes #4277 Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-12-18 19:12:00 +00:00
Borys	d6f2b76666	fix: cluster_mgr script (#4210 )	2024-11-27 14:09:19 +00:00
Roman Gershman	63742dd0cf	fix: stop using openssl for container healthchecks (#4181 ) Dragonfly responds to ascii based requests to tls port with: `-ERR Bad TLS header, double check if you enabled TLS for your client.` Therefore, it is possible to test now both tls and non-tls ports with a plain-text PING. Fixes #4171 Also, blacklist the bloom-filter test that Dragonfly does not support yet. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-11-25 17:41:17 +02:00
s13k	ff2359af30	fix(tools): Prevent dragonfly.logrotate to stop logrotate service (#4176 ) Update dragonfly.logrotate If multiple logs are being rotated and one of them fails (due to exit 1), the other logs that follow won't be rotated either, unless logrotate is run again. If you want to prevent the rotation of a specific log file and not affect the rest of the logs, you'll want to handle the condition properly to ensure that logrotate doesn't abort due to the failure of the prerotate script. To prevent the rotation of a specific log file without causing issues for other logs, you can use exit 0 to prevent rotation cleanly or design your prerotate script to handle conditions carefully. Signed-off-by: s13k <s13k@pm.me>	2024-11-24 17:27:05 +00:00
Sebastian Struß	cfca3e798d	adjusted grafana dashboard to be more user friendly (#4165 )	2024-11-24 09:16:00 +02:00
dependabot[bot]	86b64d910a	chore(deps): bump github.com/redis/go-redis/v9 from 9.5.1 to 9.7.0 in /tools/replay (#4062 ) chore(deps): bump github.com/redis/go-redis/v9 in /tools/replay Bumps [github.com/redis/go-redis/v9](https://github.com/redis/go-redis) from 9.5.1 to 9.7.0. - [Release notes](https://github.com/redis/go-redis/releases) - [Changelog](https://github.com/redis/go-redis/blob/master/CHANGELOG.md) - [Commits](https://github.com/redis/go-redis/compare/v9.5.1...v9.7.0) --- updated-dependencies: - dependency-name: github.com/redis/go-redis/v9 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-11-04 22:31:01 +02:00
dependabot[bot]	ceb474fbda	chore(deps): bump numpy from 1.24.1 to 2.1.3 in /tools (#4063 ) Bumps [numpy](https://github.com/numpy/numpy) from 1.24.1 to 2.1.3. - [Release notes](https://github.com/numpy/numpy/releases) - [Changelog](https://github.com/numpy/numpy/blob/main/doc/RELEASE_WALKTHROUGH.rst) - [Commits](https://github.com/numpy/numpy/compare/v1.24.1...v2.1.3) --- updated-dependencies: - dependency-name: numpy dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-11-04 22:30:34 +02:00
Roman Gershman	4012ad1855	fix: prevents Dragonfly from blocking in epoll during snapshotting (#3911 ) The problem - we used file write in non-direct mode when writing snapshots in epoll mode. As a result - lots of data was cached into OS memory. But then during the rename operation, when we rename "xxx.dfs.tmp" into "xxx.dfs", the OS flushes the file caches and the thread is stuck in OS system call rename for a long time. The fix - to use DIRECT mode and to avoid caching the data into OS caches at all. Fixes #3895 Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-10-12 18:26:12 +03:00
Roman Gershman	c9a2334f6d	fix: allow the healthcheck run in non-privileged containers as well (#3731 ) fix: allow the healthcheck running in non-privileged containers as well Fixes #3644 (again). Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-09-20 05:41:06 +00:00
Shahar Mike	1c6be62a0b	fix: Fix `cluster_mgr.py` (#3730 ) We updated the reply of `SLOT-MIGRATION-STATUS`, so `cluster_mgr.py` needs to be adjusted as well.	2024-09-18 11:44:15 +03:00
Roman Gershman	3cdc8fa128	chore: add a script that parses allocator tracking logs (#3687 )	2024-09-10 07:26:44 +00:00
Tarun Pothulapati	65f96e3bb5	fix(docker/healthcheck): run netstat port retrieval command as dfly (#3647 ) * fix(docker/healthcheck): run netstat port retrieval command as dfly	2024-09-04 14:34:35 +00:00
Sebastian Struß	06f6dcafcd	fix(grafana): Fix grafana dragonfly dashboard datasource (#3608 ) fix: grafana dragonfly dashboard datasource	2024-08-30 17:15:51 +00:00
dependabot[bot]	e8a8d534f9	chore(deps): bump gopkg.in/yaml.v3 from 3.0.0-20210107192922-496545a6307b to 3.0.0 in /tools/replay (#3603 ) chore(deps): bump gopkg.in/yaml.v3 in /tools/replay Bumps gopkg.in/yaml.v3 from 3.0.0-20210107192922-496545a6307b to 3.0.0. --- updated-dependencies: - dependency-name: gopkg.in/yaml.v3 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-08-29 16:40:37 +03:00
Roman Gershman	cec3659b51	fix: named volume permissions in docker (#3518 ) Fixes #2917 The problem is described in this "working as intended" issue https://github.com/moby/moby/issues/3124 So the advised approach of using "USER dfly" directive does not really work because it requires that the host will also define 'dfly' user with the same id. It's an unrealistic expectation. Therefore, we revert the fix done in #1775 and follow valkey approach: https://github.com/valkey-io/valkey-container/blob/mainline/docker-entrypoint.sh#L12 1. we run the entrypoint in the container as root which later spawns the dragonfly process 2. if we run as root: a. we chmod files under /data to dfly. b. use setpriv to exec ourselves as dfly. 3. if we do not run as root we execute the docker command. So even though the process starts as root, the server runs as dfly and only the bootstrap part has elevated permissions, which are used to fix the volume access. While we are at it, we also switched to setpriv following the change of https://github.com/valkey-io/valkey-container/pull/24/files Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-08-22 11:33:29 +03:00
Vladislav	84a697dd75	chore(traffic logger): use pipelining and print/analyze commands (#3527 ) Add run, print, analyze commands to traffic logger; add support for pipelines	2024-08-20 09:32:15 +03:00
Roman Gershman	93f6773297	chore: reduce pipelining latency by reusing existing shard fibers (#3494 ) * chore: reduce pipelining latency by reusing existing shard fibers To prove the benefits, run `./dfly_bench --pipeline=50 -n 20000 --ratio 0:1 --qps=0 --key_maximum=1` Before: the average pipelining latency was 10ms After: the average pipelining latency is 5ms. Avg latency: pipelined_latency_usec / total_pipelined_squashed_commands Also, improved counting of squashed commands - to count actual squashed ones. --------- Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-08-14 14:45:54 +03:00
Borys	48a28c3ea3	refactor: set info_replication_valkey_compatible=true (#3467 ) * refactor: set info_replication_valkey_compatible=true * test: mark test_cluster_replication_migration as skipped because it's broken	2024-08-08 21:42:58 +03:00
Shahar Mike	38fba1d398	fix: cluster_mgr.py to use `CLUSTER MYID` (#3444 )	2024-08-05 07:29:31 +00:00
adiholden	e3eb8518fd	feat(test): Improve benchmark workflow (#3330 ) Signed-off-by: adi_holden <adi@dragonflydb.io>	2024-07-17 14:34:48 +03:00
Roman Gershman	374a5f529e	chore: print effective QPS of the server. (#3274 ) Also refactor ReceiveFB into multiple functions. Finally, fix the memcached command in local monitoring stack. Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-07-07 06:26:14 +00:00
Roman Gershman	8240c7f19e	chore(monitoring): add more dashboards + memcached (#3268 )	2024-07-05 07:12:13 +00:00
Shahar Mike	5b731f163c	feat(cluster_mgr): Fix migration action (#3124 )	2024-06-04 13:27:42 +03:00
Shahar Mike	bcbcc5a2c6	feat(cluster_mgr): Take over command (#3120 )	2024-06-04 11:39:08 +03:00
Shahar Mike	6e6c91aeaf	feat(cluster_mgr): Improvements to `cluster_mgr.py` (#3118 ) Make sure attached node is in right mode Enable detaching nodes	2024-06-03 19:05:17 +00:00
Roman Gershman	0394387a5f	chore: export pipeline related metrics (#3104 ) * chore: export pipeline related metrics Export in /metrics 1. Total pipeline queue length 2. Total pipeline commands 3. Total pipelined duration Signed-off-by: Roman Gershman <roman@dragonflydb.io> --------- Signed-off-by: Roman Gershman <roman@dragonflydb.io>	2024-05-30 19:10:35 +03:00
Shahar Mike	d1e3c82eaa	feat(cluster_mgr): Allow attaching replicas (#3105 )	2024-05-30 15:29:58 +03:00
Vladislav	fd5ece09fb	chore: small replayer fixes (#3081 )	2024-05-25 22:48:29 +03:00
Roman Gershman	8a0007d761	chore: add replication memory stats to the dashboard (#3065 )	2024-05-22 08:11:54 +03:00
Jirapong Pansak	3babe99cf6	<chore>!: Update grafana panel (#3064 ) update panel	2024-05-19 15:56:44 +00:00
Roman Gershman	fd74fd5b4b	chore: Export replication memory stats (#3062 )	2024-05-18 22:40:14 +03:00
Borys	3dd6c4959c	feat: add defragment command (#3003 ) * feat: add defragment command and improve auto defragmentation algorithm	2024-05-08 14:26:42 +03:00
adiholden	186ff31e29	Fix benchmark (#3017 ) * fix(benchmark): fix lag check Signed-off-by: adi_holden <adi@dragonflydb.io> --------- Signed-off-by: adi_holden <adi@dragonflydb.io>	2024-05-06 18:38:13 +03:00

133 commits