Summary
A vulnerability in Dragonfly’s RDB deserialization allows an authenticated user to crash the server process by sending a single RESTORE command with a crafted stream payload. The vulnerability is caused by unbounded memory allocation in the stream consumer group deserialization path (ReadStreams() in rdb_load.cc), where attacker-controlled length values from the serialized payload are used directly in std::vector::resize() without any upper bound validation.
Severity
P2, Server Availability (CVSS 6.5: AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H)
This is a denial-of-service vulnerability that crashes the entire Dragonfly server process, affecting all connected clients and all databases on the instance. On Aiven managed Dragonfly, every authenticated database user can trigger this.
Vulnerability Details
Root Cause
In src/server/rdb_load.cc, the ReadStreams() function deserializes stream data from RDB payloads (including RESTORE command input). Several length values are read from the attacker-controlled payload and used directly in memory allocation without upper bound checks:
Location 1, Consumer group count (most direct):
// rdb_load.cc, ReadStreams()
uint64_t cgroups_count;
SET_OR_UNEXPECT(LoadLen(nullptr), cgroups_count);
load_trace->stream_trace->cgroup.resize(cgroups_count); // NO UPPER BOUND
Location 2, PEL (Pending Entry List) size per group:
uint64_t pel_size;
SET_OR_UNEXPECT(LoadLen(nullptr), pel_size);
cgroup.pel_arr.resize(pel_size); // NO UPPER BOUND
Location 3, Consumer count per group:
uint64_t consumers_num;
SET_OR_UNEXPECT(LoadLen(nullptr), consumers_num);
cgroup.cons_arr.resize(consumers_num); // NO UPPER BOUND
Location 4, Per-consumer NACK array:
SET_OR_UNEXPECT(LoadLen(nullptr), pel_size);
consumer.nack_arr.resize(pel_size); // NO UPPER BOUND
Attack Path
- Attacker authenticates to the Dragonfly instance (standard database credentials)
- Attacker sends: RESTORE key 0 <crafted_payload> REPLACE
- The payload encodes RDB_TYPE_STREAM_LISTPACKS_3 (type 21) with:
  - 1 valid stream node (passes initial validation)
  - Valid stream metadata
  - Consumer groups count set to 1,073,741,824 (about 1 billion)
- ReadStreams() calls cgroup.resize(1073741824)
- Each StreamCGTrace is ~100 bytes → attempts to allocate ~100GB
- std::vector::resize() throws std::bad_alloc
- Exception is uncaught in the RESTORE command path → std::terminate() → process crash
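The size mismatch between payload and allocation can be checked with a short sketch. It assumes the payload uses the standard RDB length encoding as implemented in Redis (6-bit, 14-bit, 0x80 plus 32-bit big-endian, 0x81 plus 64-bit big-endian) and that Dragonfly's LoadLen() decodes the same format. EncodeRdbLen and EstimatedBytes are hypothetical helper names, not part of the PoC or of Dragonfly, and the 100-byte element size is this writeup's own estimate.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes `len` with the RDB length encoding used by Redis.
// Assumption: Dragonfly's LoadLen() accepts this same encoding.
std::vector<uint8_t> EncodeRdbLen(uint64_t len) {
  std::vector<uint8_t> out;
  if (len < (1ULL << 6)) {
    // 00xxxxxx: value fits in the low 6 bits of a single byte.
    out.push_back(static_cast<uint8_t>(len));
  } else if (len < (1ULL << 14)) {
    // 01xxxxxx xxxxxxxx: 14-bit value, high bits first.
    out.push_back(static_cast<uint8_t>(0x40 | (len >> 8)));
    out.push_back(static_cast<uint8_t>(len & 0xFF));
  } else if (len <= 0xFFFFFFFFULL) {
    // 0x80 marker followed by a 4-byte big-endian value.
    out.push_back(0x80);
    for (int shift = 24; shift >= 0; shift -= 8)
      out.push_back(static_cast<uint8_t>(len >> shift));
  } else {
    // 0x81 marker followed by an 8-byte big-endian value.
    out.push_back(0x81);
    for (int shift = 56; shift >= 0; shift -= 8)
      out.push_back(static_cast<uint8_t>(len >> shift));
  }
  return out;
}

// Hypothetical helper: bytes a vector of `count` elements would need.
// The caller must pick values that do not overflow 64 bits.
uint64_t EstimatedBytes(uint64_t count, uint64_t elem_size) {
  return count * elem_size;
}
```

Under these assumptions, a count of 1,073,741,824 costs the attacker five bytes on the wire (0x80 0x40 0x00 0x00 0x00) while requesting 107,374,182,400 bytes (100 GiB) on the server.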
Crash Mechanism
The RESTORE command path (GenericFamily::Restore() → OpRestore() → RdbRestoreValue::Add() → Parse() → ReadObj() → ReadStreams()) does not wrap the deserialization in a try-catch block. When std::bad_alloc is thrown by the vector allocation, it propagates through the entire call chain and terminates the process.
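The propagation described above can be modeled in isolation. The sketch below is not Dragonfly code: BudgetAllocator is a hypothetical allocator that rejects large requests so the failure is deterministic, GroupTrace is a 100-byte stand-in for StreamCGTrace, and the *Model functions only mimic the shape of the call chain. The outermost function catches the exception purely so the behavior can be observed; in the real RESTORE path, per this writeup, no frame catches it.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical allocator that refuses any single request above a fixed budget,
// standing in for a host that cannot satisfy a ~100 GB allocation.
template <typename T>
struct BudgetAllocator {
  using value_type = T;
  static constexpr std::size_t kBudgetBytes = 1 << 20;  // 1 MiB, illustrative

  BudgetAllocator() = default;
  template <typename U>
  BudgetAllocator(const BudgetAllocator<U>&) {}

  T* allocate(std::size_t n) {
    if (n > kBudgetBytes / sizeof(T)) throw std::bad_alloc();
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const BudgetAllocator<T>&, const BudgetAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const BudgetAllocator<T>&, const BudgetAllocator<U>&) { return false; }

// Stand-in for StreamCGTrace; 100 bytes matches the writeup's estimate only.
struct GroupTrace { char bytes[100]; };

// Models ReadStreams(): resizes with an unchecked, caller-supplied count.
void ReadStreamsModel(uint64_t cgroups_count) {
  std::vector<GroupTrace, BudgetAllocator<GroupTrace>> cgroup;
  cgroup.resize(cgroups_count);  // throws std::bad_alloc when over budget
}

// Models an intermediate frame (e.g. ReadObj()) with no exception handling.
void ReadObjModel(uint64_t cgroups_count) { ReadStreamsModel(cgroups_count); }

// Models the outermost frame. Returns true if std::bad_alloc reached it.
// Without this catch, the exception would leave the call chain entirely.
bool RestoreModelSawBadAlloc(uint64_t cgroups_count) {
  try {
    ReadObjModel(cgroups_count);
  } catch (const std::bad_alloc&) {
    return true;
  }
  return false;
}
```

Calling RestoreModelSawBadAlloc(1073741824) returns true because neither inner frame handles the exception; a small count such as 8 allocates normally and returns false.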
Proof of Concept
Environment
- Dragonfly v1.37.2 (docker.dragonflydb.io/dragonflydb/dragonfly:latest)
- Docker container on Linux host
Steps to Reproduce
- Start Dragonfly:
docker run -d --name dragonfly-test -p 6379:6379 \
docker.dragonflydb.io/dragonflydb/dragonfly:latest
- Run the PoC script:
python3 poc_stream_oom.py --host 127.0.0.1 --port 6379
- Observe: the Dragonfly process crashes and is no longer reachable.
PoC Script Output
[*] Building crafted stream RESTORE payload...
[*] Consumer group count: 1,073,741,824 (0x40000000)
[*] Payload size: 87 bytes
[+] Connected to server: df-v1.37.2
[!] Sending crafted RESTORE command...
[!] Key: __poc_crash_key__
[!] Expected result: server process crash (std::bad_alloc → terminate)
[?] Unexpected error: TimeoutError: Timeout reading from socket
[*] Checking if server is still alive...
[+] Server is NOT responding - crash confirmed!
Impact Demonstration
In our testing, the crafted payload with cgroups_count = 4,294,967,296 (4 billion) not only crashed the Dragonfly process but triggered the Linux OOM killer, taking down the entire host system including SSH access. The host required several minutes to recover.
On Aiven’s managed infrastructure, this means a single authenticated database user could:
1. Crash their Dragonfly instance (immediate service disruption)
2. Potentially trigger OOM on shared infrastructure (affecting other tenants)
3. Repeat the attack on service restart for sustained DoS
Affected Code
- File: src/server/rdb_load.cc
- Function: RdbLoaderBase::ReadStreams()
- Lines: Consumer group resize (~line 1690), PEL resize (~line 1710), consumer resize (~line 1720), NACK resize (~line 1740)
- Version: Confirmed on v1.37.2, likely affects all versions with stream support
Suggested Fix
Add upper bound validation for all attacker-controlled length values before allocation:
// Before resize operations in ReadStreams:
constexpr uint64_t kMaxCGroups = 1 << 20; // 1M consumer groups
constexpr uint64_t kMaxPelSize = 1 << 24; // 16M PEL entries
constexpr uint64_t kMaxConsumers = 1 << 20; // 1M consumers
if (cgroups_count > kMaxCGroups) {
LOG(ERROR) << "Stream consumer group count too large: " << cgroups_count;
return Unexpected(errc::rdb_file_corrupted);
}
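load_trace->stream_trace->cgroup.resize(cgroups_count);
// Apply the same check before the other resizes: pel_size against kMaxPelSize
// (both PEL and NACK arrays) and consumers_num against kMaxConsumers.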
Additionally, consider wrapping the RdbRestoreValue::Add() path in a try-catch for std::bad_alloc to prevent any remaining unbounded allocation from crashing the process.
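Both suggestions can be combined in one helper, sketched below under stated assumptions. BoundedResize and ResizeStatus are hypothetical names that do not exist in Dragonfly; the limits reuse the constants proposed above, and real code would map the status onto the loader's own error type (for example errc::rdb_file_corrupted) instead of this enum.

```cpp
#include <cstdint>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical result type for this sketch only.
enum class ResizeStatus { kOk, kTooLarge, kAllocFailed };

// Limits taken from the suggested fix above.
constexpr uint64_t kMaxCGroups = 1ULL << 20;    // 1M consumer groups
constexpr uint64_t kMaxPelSize = 1ULL << 24;    // 16M PEL entries
constexpr uint64_t kMaxConsumers = 1ULL << 20;  // 1M consumers

// Rejects counts above `max_count` before allocating, and converts allocation
// failures into a status code so no exception leaves the function.
template <typename Vec>
ResizeStatus BoundedResize(Vec& vec, uint64_t count, uint64_t max_count) {
  if (count > max_count) return ResizeStatus::kTooLarge;
  try {
    vec.resize(static_cast<typename Vec::size_type>(count));
  } catch (const std::bad_alloc&) {
    return ResizeStatus::kAllocFailed;  // allocator could not satisfy the request
  } catch (const std::length_error&) {
    return ResizeStatus::kAllocFailed;  // count exceeds the vector's max_size()
  }
  return ResizeStatus::kOk;
}
```

With this shape, each of the four resize sites becomes a single call such as BoundedResize(cgroup.pel_arr, pel_size, kMaxPelSize). The explicit bound is the primary defense; the catch clauses only cover counts that pass the bound but still cannot be allocated.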
References
- Dragonfly source: https://github.com/dragonflydb/dragonfly
- Similar class: CVE-2023-41056 (Redis RESTORE heap overflow), CVE-2023-41053 (Redis listpack integer overflow)
- RDB format specification: https://rdb.fnordig.de/file_format.html
Proof of concept
- poc/dragonfly-stream-oom.py, Stream RESTORE OOM crash.
- poc/dragonfly-cms-oom.py, Count-Min-Sketch RESTORE OOM (related family).
python3 poc/dragonfly-stream-oom.py <host> <port> <password>
Source · github.com/zionsworking/security-research-notebook · writeups/aiven/dragonfly-stream-restore-oom.md