* chore: add ability to track connections stuck at send
Add send_delay_seconds/send_delay_ms metrics.
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Signed-off-by: Roman Gershman <romange@gmail.com>
Co-authored-by: Shahar Mike <chakaz@users.noreply.github.com>
Before: if socket data arrived in small bits, then CheckForHttpProto would grow
io_buf_ capacity exponentially with each iteration. For example, test_match_http test
easily causes OOM.
This PR ensures that there is always a buffer available - but it grows linearly with the input size.
Currently, the total input in CheckForHttpProto is limited to 1024.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
The regression was caused by #3947 and it causes crashes in bullmq.
It has not been found till now because python client sends commands in uppercase.
Fixes#4113
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Co-authored-by: Kostas Kyrimis <kostas@dragonflydb.io>
* fix: Do not publish to connections without context
This is a rare case where a closed connection is kept alive while the
handling fiber yields, therefore leaving `cc_` (the connection context)
pointing to null for other fibers to see.
As far as I can see, this can only happen during server shutdown, but
there could be other cases that I have missed.
The test on its own does _not_ reproduce the crash, however with added
`ThisFiber::SleepFor()`s I could reproduce the crash:
* Right before `DispatchBrief()`
[here](e3214cb603/src/server/channel_store.cc (L154))
* Right after connection context `reset()`
[here](2ab480e160/src/facade/dragonfly_connection.cc (L750))
In any case, calling `SendPubMessageAsync()` to a connection where `cc_`
is null is a bug, and we fix that here.
* rewording
A common case is that we need to clean up a connection before we exit a test via .close() method. This is needed because otherwise the connection will raise a warning that it is left unclosed. However, remembering to call .close() at each connection at the end of the test is cumbersome! Luckily, fixtures in python can be marked as async which allow us to:
* cache all clients created by DflyInstance.client()
* clean them all at the end of the fixture in one go
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
We do not allow notify_keyspace_events to be set at runtime via config set command.
* allow notify_keyspace_events in config set command
* add tests
---------
Signed-off-by: kostas <kostas@dragonflydb.io>
We disable address space randomization when building the binary
and use addr2line to symbolize the stacktrace if it exists.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: fix test_parser_memory_stats flakiness
1. Added a robust assert_eventually decorator for pytests
2. Improved the assertion condition in TieredStorageTest.BackgroundOffloading
3. Added total_uploaded stats for tiering that tells how many times offloaded values
were promoted back to RAM.
* chore: skip test_cluster_fuzzymigration
Leave only connection memory usage in memory stats.
We should think how we can move it also to /metrics.
In addition, added a test verifying that redis parser memory
usage is tracked.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: Introduce pipeline back-pressure
Also, improve synchronization primitives and replace them with
thread-local variations.
Before the change, on my local machine with the dragonfly running with 8 threads,
`memtier_benchmark -c 10 --threads 8 --command="PING" --key-maximum 100000000 --hide-histogram --distinct-client-seed --pipeline=20 --test-time=10`
reached 10M qps with 0.327ms p99.9.
After the change, the same command showed 13.8M qps with 0.2ms p99.9
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
fix: authorize the http connection to call DF commands
The assumption is that basic-auth already covers the authentication part.
And thanks to @sunneydev for finding the bug and providing the tests.
The tests actually uncovered another bug where we may parse partial http requests.
This one is handled by https://github.com/romange/helio/pull/243
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* chore: preparation for basic http api
The goal is to provide very basic support for simple commands,
fancy stuff like pipelining, blocking commands won't work.
1. Added optional registration for /api handler.
2. Implemented parsing of post body.
3. Added basic formatting routine for the response. It does not cover all the commands but should suffice for
basic usage.
The API is a POST method and the body of the request should contain command arguments formatted as json array.
For example, `'["set", "foo", "bar", "ex", "100"]'`.
The response is a json object with either `result` field holding the response of the command or
`error` field containing the error message sent by the server.
See `test_http` test in tests/dragonfly/connection_test.py for more details.
* chore: cover iouring with enable_direct_fd
---------
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
* feat(connection): Support pipelining with Memcached
Adds support for pipelining to Memcached, enhances Memcached pytests
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
* chore: add more states to client connections
* fix: clear pipelined messages before close
* fix: skip same thread on backpressure
---------
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Co-authored-by: Roman Gershman <roman@dragonflydb.io>
1. If the first request sent to the connection is large (2kb or more)
Dragonfly was closing the connection.
2. Changed server side error reporting according to memcache protocol:
https://github.com/memcached/memcached/blob/master/doc/protocol.txt#L172
3. Fixed the wrong casting in DispatchCommand.
4. Remove practically unused code that translated opstatus to strings.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
1. add tls-ca-cert-file flag
2. add tls-ca-cert-dir flag
3. enables redis-cli to connect over tls without --insecure flag by properly validating certificate wtih CA
fix: remove bad check-fail in the transaction code.
Fixes#1421.
The failure reproduces for dragongly running with a single thread where all the
arguments grouped within the same ShardData
Also, we improve verbosity levels inside reply_builder.cc.
For that we extend SinkReplyBuilder to support protocol errors reporting
and we remove ad-hoc code for this from dragonfly_connection.
Required to track errors easily with `--vmodule=reply_builder=1`
Finally, a pytest is added to cover the issue.
Signed-off-by: Roman Gershman <roman@dragonflydb.io>
Now `SUBSCRIBE` will respond synchronously. The change is here so we:
1. Maintain the order in pipelined requests
2. Don't have a "race condition": subscribe needs to update channel store pointers on all threads. While it awaits for all threads to complete the callback, some of them might have done it earlier, so they can already start sending messages before the initial ack is sent
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>