chore: add malloc-based stats and decommit (#2692)

* chore: add malloc-based stats and decommit

Provides more stats and control over the glibc malloc allocator.
For example, with v1.15.0 (`--proactor_threads=2`) and an empty database, `info memory` returns:

```
used_memory:614576
used_memory_human:600.2KiB
used_memory_peak:614576
used_memory_peak_human:600.2KiB
used_memory_rss:19922944
used_memory_rss_human:19.00MiB
```

Then, during `memtier_benchmark  -n 300000  --key-maximum 100000 --ratio 0:1 --threads=30 -c 100` (i.e. GET-only traffic with 3k connections), it returns:

```
used_memory:614576
used_memory_human:600.2KiB
used_memory_peak:614576
used_memory_peak_human:600.2KiB
used_memory_rss:59985920
used_memory_rss_human:57.21MiB
used_memory_peak_rss:59985920
```

The connection overhead grows RSS by ~38MiB (57.21MiB - 19.00MiB).
When the traffic stops, `used_memory_rss_human` drops to `30.35MiB`,
but we do not know where the remaining ~11MiB goes, and `MEMORY DECOMMIT` does not reduce the RSS.

With this change, `memory malloc-stats` returns the following during the memtier traffic:
```
malloc arena: 394862592
malloc fordblks: 94192
```
i.e. malloc has allocated 395MB of virtual memory, and only 94KB of it sits in free chunks available for reuse.
The 395MB is arena virtual memory, not RSS, but at least we now have some visibility into malloc reservations.
The RSS usage is the same ~57MiB; the difference between virtual memory and RSS arises because we reserve fiber stacks of 131KB each but touch only part of them.
After the traffic stops, `arena` is reduced to 134520832 bytes and `fordblks` is 133016592, i.e. the majority of the reserved ranges are also free (available for reuse) in the malloc pools.
RSS goes down, as before, to ~31MB.
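
The gap between the arena size and RSS can be sanity-checked with rough arithmetic. The sketch below assumes that each of the 3000 connections (30 threads x 100 clients) owns exactly one fiber stack and that "131KB" means 131072 bytes; both are assumptions made for illustration, and the names `ReservedForStacks` and `Unexplained` are hypothetical.

```cpp
#include <cstdint>

// Assumption: one fiber stack per connection, 131072 bytes each.
constexpr uint64_t kConnections = 30 * 100;  // --threads=30 -c 100
constexpr uint64_t kStackBytes = 131072;     // assumed meaning of "131KB"

// Virtual memory that the fiber stacks alone would reserve.
constexpr uint64_t ReservedForStacks() {
  return kConnections * kStackBytes;
}

// Bytes of the reported arena that the stacks do not account for.
constexpr uint64_t Unexplained(uint64_t arena_bytes) {
  return arena_bytes - ReservedForStacks();
}
```

Under these assumptions the stacks account for 393216000 of the 394862592 arena bytes reported above, which is consistent with the explanation that most of the reservation is fiber stacks.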

So far, this only demonstrates the increased visibility into the mmapped ranges reserved by glibc malloc.
The additional functional change is in `MEMORY DECOMMIT`, which now calls `malloc_trim` to release
reserved but unused (fordblks) pages and thereby reduce the RSS held by malloc.
After the call, RSS is `used_memory_rss_human:20.29MiB`, which is almost the same as when we started the empty process.
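
The effect of `malloc_trim` on `fordblks` can be observed outside of Dragonfly with a small glibc-only program. This is a minimal sketch, not code from this PR: `FreeChunkBytes`, `TrimReport` and `AllocateFreeAndTrim` are hypothetical names, and the version guard mirrors the one used in the diff below.

```cpp
#include <malloc.h>  // glibc-specific: mallinfo/mallinfo2, malloc_trim

#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: bytes sitting in free chunks (the fordblks field).
static size_t FreeChunkBytes() {
#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 33)
  return mallinfo2().fordblks;
#else
  return static_cast<size_t>(mallinfo().fordblks);  // legacy int fields
#endif
}

struct TrimReport {
  size_t free_before;  // fordblks before malloc_trim
  size_t free_after;   // fordblks after malloc_trim
  int trimmed;         // malloc_trim result: 1 if memory was released, else 0
};

// Allocates and touches `blocks` chunks, frees them all, then trims.
TrimReport AllocateFreeAndTrim(size_t blocks, size_t block_size) {
  std::vector<void*> ptrs;
  ptrs.reserve(blocks);
  for (size_t i = 0; i < blocks; ++i) {
    void* p = std::malloc(block_size);
    if (p != nullptr) {
      std::memset(p, 1, block_size);  // touch the pages so they count as RSS
      ptrs.push_back(p);
    }
  }
  for (void* p : ptrs) std::free(p);

  TrimReport report;
  report.free_before = FreeChunkBytes();
  report.trimmed = malloc_trim(0);  // pad = 0: release as much as possible
  report.free_after = FreeChunkBytes();
  return report;
}
```

Calling, for example, `AllocateFreeAndTrim(10000, 4096)` and printing the report shows how much free memory malloc was still holding before the trim. The exact numbers depend on the glibc version and its tunables, since glibc may already release part of the top of the heap on `free`.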

Signed-off-by: Roman Gershman <roman@dragonflydb.io>

* chore: fix build for older glibc environments

Disable these extensions on Alpine (musl) and use the legacy `mallinfo` version
for older glibc libraries.
---------

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
This commit is contained in:
Roman Gershman 2024-03-06 15:11:44 +02:00 committed by GitHub
parent dfedaf7e6e
commit b38024ba4f
4 changed files with 57 additions and 9 deletions

View file

@ -726,7 +726,7 @@ void DbSlice::FlushSlotsFb(const SlotSet& slot_ids) {
}
} while (cursor && etl.gstate() != GlobalState::SHUTTING_DOWN);
-mi_heap_collect(etl.data_heap(), true);
+etl.DecommitMemory(ServerState::kDataHeap);
}
void DbSlice::FlushSlots(SlotSet slot_ids) {
@ -765,7 +765,8 @@ void DbSlice::FlushDbIndexes(const std::vector<DbIndex>& indexes) {
}
}
flush_db_arr.clear();
-mi_heap_collect(ServerState::tlocal()->data_heap(), true);
+ServerState::tlocal()->DecommitMemory(ServerState::kDataHeap | ServerState::kBackingHeap |
+ServerState::kGlibcmalloc);
};
fb2::Fiber("flush_dbs", std::move(cb)).Detach();

View file

@ -5,6 +5,7 @@
#include "server/memory_cmd.h"
#include <absl/strings/str_cat.h>
+#include <malloc.h>
#include <mimalloc.h>
#include "base/io_buf.h"
@ -48,8 +49,6 @@ std::string MallocStatsCb(bool backing, unsigned tid) {
string str;
uint64_t start = absl::GetCurrentTimeNanos();
-absl::StrAppend(&str, "___ Begin mimalloc statistics ___\n");
-mi_stats_print_out(MiStatsCallback, &str);
absl::StrAppend(&str, "\nArena statistics from thread:", tid, "\n");
absl::StrAppend(&str, "Count BlockSize Reserved Committed Used\n");
@ -69,10 +68,10 @@ std::string MallocStatsCb(bool backing, unsigned tid) {
}
uint64_t delta = (absl::GetCurrentTimeNanos() - start) / 1000;
-absl::StrAppend(&str, "--- End mimalloc statistics, took ", delta, "us ---\n");
absl::StrAppend(&str, "total reserved: ", reserved, ", comitted: ", committed, ", used: ", used,
" fragmentation waste: ",
(100.0 * (committed - used)) / std::max<size_t>(1UL, committed), "%\n");
+absl::StrAppend(&str, "--- End mimalloc statistics, took ", delta, "us ---\n");
return str;
}
@ -135,8 +134,8 @@ void MemoryCmd::Run(CmdArgList args) {
if (sub_cmd == "DECOMMIT") {
shard_set->pool()->Await([](auto* pb) {
-mi_heap_collect(ServerState::tlocal()->data_heap(), true);
-mi_heap_collect(mi_heap_get_backing(), true);
+ServerState::tlocal()->DecommitMemory(ServerState::kDataHeap | ServerState::kBackingHeap |
+ServerState::kGlibcmalloc);
});
return cntx_->SendSimpleString("OK");
}
@ -294,10 +293,34 @@ void MemoryCmd::MallocStats(CmdArgList args) {
return cntx_->SendError(absl::StrCat("Thread id must be less than ", shard_set->size()));
}
-string res = shard_set->pool()->at(tid)->AwaitBrief([=] { return MallocStatsCb(backing, tid); });
+string report;
+#if __GLIBC__ // MUSL/alpine do not have mallinfo routines.
+#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 33)
+struct mallinfo2 malloc_info = mallinfo2();
+#else
+struct mallinfo malloc_info = mallinfo(); // buggy because 32-bit stats may overflow.
+#endif
+absl::StrAppend(&report, "___ Begin malloc stats ___\n");
+absl::StrAppend(&report, "arena: ", malloc_info.arena, ", ordblks: ", malloc_info.ordblks,
+", smblks: ", malloc_info.smblks, "\n");
+absl::StrAppend(&report, "hblks: ", malloc_info.hblks, ", hblkhd: ", malloc_info.hblkhd,
+", usmblks: ", malloc_info.usmblks, "\n");
+absl::StrAppend(&report, "fsmblks: ", malloc_info.fsmblks, ", uordblks: ", malloc_info.uordblks,
+", fordblks: ", malloc_info.fordblks, ", keepcost: ", malloc_info.keepcost, "\n");
+absl::StrAppend(&report, "___ End malloc stats ___\n\n");
+#endif
+absl::StrAppend(&report, "___ Begin mimalloc stats ___\n");
+mi_stats_print_out(MiStatsCallback, &report);
+string mi_malloc_info =
+shard_set->pool()->at(tid)->AwaitBrief([=] { return MallocStatsCb(backing, tid); });
+report.append(std::move(mi_malloc_info));
auto* rb = static_cast<RedisReplyBuilder*>(cntx_->reply_builder());
-return rb->SendVerbatimString(res);
+return rb->SendVerbatimString(report);
}
void MemoryCmd::Usage(std::string_view key) {
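
The comment on the legacy `mallinfo()` branch refers to the fact that `struct mallinfo` stores its counters in plain `int` fields, whereas `struct mallinfo2` uses `size_t`. The sketch below only simulates what keeping the low 32 bits of a byte count does to the reported value; `AsLegacyField` and `OverflowsLegacyField` are hypothetical helpers, and glibc's real accounting may differ in detail.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Simulates storing a byte count in a 32-bit signed field.
int32_t AsLegacyField(uint64_t bytes) {
  uint32_t low = static_cast<uint32_t>(bytes);  // keep the low 32 bits
  int32_t out;
  std::memcpy(&out, &low, sizeof(out));  // reinterpret the bits as signed
  return out;
}

// True when the value cannot be represented in a 32-bit signed field.
bool OverflowsLegacyField(uint64_t bytes) {
  return bytes > static_cast<uint64_t>(std::numeric_limits<int32_t>::max());
}
```

The 394862592-byte arena from the commit message still fits, but a 3GiB arena (3221225472 bytes) would show up as -1073741824, which is why the newer `mallinfo2()` is preferred when available.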

View file

@ -152,6 +152,23 @@ bool ServerState::IsPaused() const {
return (client_pauses_[0] + client_pauses_[1]) > 0;
}
+void ServerState::DecommitMemory(uint8_t flags) {
+if (flags & kDataHeap) {
+mi_heap_collect(data_heap(), true);
+}
+if (flags & kBackingHeap) {
+mi_heap_collect(mi_heap_get_backing(), true);
+}
+if (flags & kGlibcmalloc) {
+// Trims memory held by the glibc malloc allocator (reduces RSS usage).
+// malloc_trim is not available in musl libc.
+#ifdef __GLIBC__
+malloc_trim(0);
+#endif
+}
+}
Interpreter* ServerState::BorrowInterpreter() {
stats.blocked_on_interpreter++;
auto* ptr = interpreter_mgr_.Get();

View file

@ -267,6 +267,13 @@ class ServerState { // public struct - to allow initialization.
return slow_log_shard_;
};
+// Tries to return as much RSS memory as possible to the OS.
+// Decommits up to 3 heaps according to the flags.
+// For kGlibcmalloc the heap is global for the process; for the others it is specific
+// to this thread only.
+enum { kDataHeap = 1, kBackingHeap = 2, kGlibcmalloc = 4 };
+void DecommitMemory(uint8_t flags);
// Exec descriptor frequency count for this thread.
absl::flat_hash_map<std::string, unsigned> exec_freq_count;
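
The flag values are distinct powers of two, so callers can combine them with `|` and `DecommitMemory` can test each one with `&`. The following self-contained sketch illustrates that dispatch pattern only: `DescribeDecommit` is a hypothetical stand-in that records which branches would run instead of trimming any heap.

```cpp
#include <cstdint>
#include <string>

// Same values as in the header above; each flag occupies its own bit.
enum : uint8_t { kDataHeap = 1, kBackingHeap = 2, kGlibcmalloc = 4 };

// Hypothetical stand-in for DecommitMemory: returns the names of the
// branches selected by the mask, in the order they would be executed.
std::string DescribeDecommit(uint8_t flags) {
  std::string out;
  if (flags & kDataHeap) out += "data;";
  if (flags & kBackingHeap) out += "backing;";
  if (flags & kGlibcmalloc) out += "glibc;";
  return out;
}
```

With this layout, `DescribeDecommit(kDataHeap)` selects only the per-thread data heap, as in `FlushSlotsFb`, while `kDataHeap | kBackingHeap | kGlibcmalloc` selects all three, as in `MEMORY DECOMMIT`.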