Zion Boggan

In-depth vulnerability research, detection engineering & applied cryptography.

DoS / OOM

Authenticated DoS: Dragonfly Server Crash via Crafted Stream RESTORE Payload

Summary

A vulnerability in Dragonfly’s RDB deserialization allows an authenticated user to crash the server process by sending a single RESTORE command with a crafted stream payload. The vulnerability is caused by unbounded memory allocation in the stream consumer group deserialization path (ReadStreams() in rdb_load.cc), where attacker-controlled length values from the serialized payload are used directly in std::vector::resize() without any upper bound validation.

Severity

P2, Server Availability (CVSS 6.5: AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H)

This is a denial-of-service vulnerability that crashes the entire Dragonfly server process, affecting all connected clients and all databases on the instance. On Aiven managed Dragonfly, every authenticated database user can trigger this.
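The base score for the vector given above can be recomputed from the equations and metric weights published in the CVSS v3.1 specification. The sketch below is a minimal calculator that only handles Scope: Unchanged; the names `WEIGHTS`, `roundup` and `base_score` are illustrative, not part of any tool used in this report. With PR:L (authentication required) the vector evaluates to 6.5; the same vector with PR:N would evaluate to 7.5.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    i = round(value * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'AV:N/AC:L/.../A:H'."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))
```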

Vulnerability Details

Root Cause

In src/server/rdb_load.cc, the ReadStreams() function deserializes stream data from RDB payloads (including RESTORE command input). Several length values are read from the attacker-controlled payload and used directly in memory allocation without upper bound checks:

Location 1, Consumer group count (most direct):

// rdb_load.cc, ReadStreams()
uint64_t cgroups_count;
SET_OR_UNEXPECT(LoadLen(nullptr), cgroups_count);
load_trace->stream_trace->cgroup.resize(cgroups_count); // NO UPPER BOUND

Location 2, PEL (Pending Entry List) size per group:

uint64_t pel_size;
SET_OR_UNEXPECT(LoadLen(nullptr), pel_size);
cgroup.pel_arr.resize(pel_size); // NO UPPER BOUND

Location 3, Consumer count per group:

uint64_t consumers_num;
SET_OR_UNEXPECT(LoadLen(nullptr), consumers_num);
cgroup.cons_arr.resize(consumers_num); // NO UPPER BOUND

Location 4, Per-consumer NACK array:

SET_OR_UNEXPECT(LoadLen(nullptr), pel_size);
consumer.nack_arr.resize(pel_size); // NO UPPER BOUND

Attack Path

  1. Attacker authenticates to the Dragonfly instance (standard database credentials)
  2. Attacker sends: RESTORE key 0 <crafted_payload> REPLACE
  3. The payload encodes RDB_TYPE_STREAM_LISTPACKS_3 (type 21) with:
       • 1 valid stream node (passes initial validation)
       • Valid stream metadata
       • Consumer groups count set to 1,073,741,824 (2^30, about 1.07 billion)
  4. ReadStreams() calls cgroup.resize(1073741824)
  5. Each StreamCGTrace is ~100 bytes → attempts to allocate ~100GB
  6. std::vector::resize() throws std::bad_alloc
  7. Exception is uncaught in the RESTORE command path → std::terminate() → process crash
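A small payload can carry such a large count because RDB lengths are variable-width integers. The sketch below follows the length encoding used by Redis RDB files (6-bit, 14-bit, 32-bit and 64-bit forms); that Dragonfly's LoadLen() accepts the same encoding is an assumption based on its RESTORE compatibility, and `encode_rdb_len` is a hypothetical helper name, not a function from the PoC.

```python
import struct


def encode_rdb_len(n):
    """Encode a length the way Redis RDB files do (assumed to match LoadLen())."""
    if n < 0 or n > 0xFFFFFFFFFFFFFFFF:
        raise ValueError("length must fit in an unsigned 64-bit integer")
    if n < (1 << 6):
        # Top bits 00: the length is the low 6 bits of one byte.
        return bytes([n])
    if n < (1 << 14):
        # Top bits 01: 14-bit length across two bytes.
        return bytes([0x40 | (n >> 8), n & 0xFF])
    if n <= 0xFFFFFFFF:
        # Marker 0x80 followed by a 32-bit big-endian length.
        return b"\x80" + struct.pack(">I", n)
    # Marker 0x81 followed by a 64-bit big-endian length.
    return b"\x81" + struct.pack(">Q", n)


if __name__ == "__main__":
    # The consumer group count from the attack path occupies five bytes.
    print(encode_rdb_len(1_073_741_824).hex())
```

Under this encoding the count of 1,073,741,824 costs five bytes on the wire, so the request size says nothing about the size of the allocation it triggers.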

Crash Mechanism

The RESTORE command path (GenericFamily::Restore() → OpRestore() → RdbRestoreValue::Add() → Parse() → ReadObj() → ReadStreams()) does not wrap the deserialization in a try-catch block. When std::bad_alloc is thrown by the vector allocation, it propagates through the entire call chain and terminates the process.

Proof of Concept

Environment

  • Dragonfly v1.37.2 (docker.dragonflydb.io/dragonflydb/dragonfly:latest)
  • Docker container on Linux host

Steps to Reproduce

  1. Start Dragonfly:
     docker run -d --name dragonfly-test -p 6379:6379 \
       docker.dragonflydb.io/dragonflydb/dragonfly:latest
  2. Run the PoC script:
     python3 poc_stream_oom.py --host 127.0.0.1 --port 6379
  3. Observe: the Dragonfly process crashes and is no longer reachable.

PoC Script Output

[*] Building crafted stream RESTORE payload...
[*] Consumer group count: 1,073,741,824 (0x40000000)
[*] Payload size: 87 bytes
[+] Connected to server: df-v1.37.2

[!] Sending crafted RESTORE command...
[!] Key: __poc_crash_key__
[!] Expected result: server process crash (std::bad_alloc → terminate)
[?] Unexpected error: TimeoutError: Timeout reading from socket

[*] Checking if server is still alive...
[+] Server is NOT responding - crash confirmed!
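The final liveness check in the output above can be approximated with a plain socket probe. This is a minimal sketch, not the PoC's actual implementation: `is_alive` is a hypothetical helper that sends a RESP-encoded PING and treats any reply (including an authentication error) as proof that the process is still running.

```python
import socket

# RESP array containing the single bulk string "PING".
PING = b"*1\r\n$4\r\nPING\r\n"


def is_alive(host, port, timeout=2.0):
    """Return True if anything answers a PING on host:port within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(PING)
            reply = sock.recv(64)
    except OSError:
        # Covers refused connections, resets and timeouts.
        return False
    # An empty read means the peer closed the connection without answering.
    return bool(reply)


if __name__ == "__main__":
    print("alive" if is_alive("127.0.0.1", 6379) else "not responding")
```

A refused connection or a timeout after the RESTORE command is consistent with a crash, but confirming it still requires checking the container state or logs.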

Impact Demonstration

In our testing, the crafted payload with cgroups_count = 4,294,967,296 (2^32, about 4.3 billion) not only crashed the Dragonfly process but also triggered the Linux OOM killer, taking down the entire host system including SSH access. The host required several minutes to recover.
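The scale of the requested allocation follows directly from the element count. The arithmetic below uses the rough figure of about 100 bytes per StreamCGTrace quoted in the attack path; the real sizeof is not stated in this report, so `ASSUMED_ELEMENT_BYTES` is an estimate and the helper names are illustrative.

```python
# Rough per-element size taken from this writeup's estimate, not from sizeof().
ASSUMED_ELEMENT_BYTES = 100


def requested_bytes(count, element_bytes=ASSUMED_ELEMENT_BYTES):
    """Bytes a vector resize to `count` elements would need to reserve."""
    return count * element_bytes


def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    for count in (1_073_741_824, 4_294_967_296):
        gib = to_gib(requested_bytes(count))
        print(f"cgroups_count={count:,}: about {gib:.0f} GiB requested")
```

Under that estimate the two counts used in testing request roughly 100 GiB and 400 GiB respectively, far beyond the memory of a typical host.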

On Aiven’s managed infrastructure, this means a single authenticated database user could:

  1. Crash their Dragonfly instance (immediate service disruption)
  2. Potentially trigger OOM on shared infrastructure (affecting other tenants)
  3. Repeat the attack on service restart for sustained DoS

Affected Code

  • File: src/server/rdb_load.cc
  • Function: RdbLoaderBase::ReadStreams()
  • Lines: Consumer group resize (~line 1690), PEL resize (~line 1710), consumer resize (~line 1720), NACK resize (~line 1740)
  • Version: Confirmed on v1.37.2, likely affects all versions with stream support

Suggested Fix

Add upper bound validation for all attacker-controlled length values before allocation:

// Before resize operations in ReadStreams:
constexpr uint64_t kMaxCGroups = 1 << 20; // 1M consumer groups
constexpr uint64_t kMaxPelSize = 1 << 24; // 16M PEL entries
constexpr uint64_t kMaxConsumers = 1 << 20; // 1M consumers

if (cgroups_count > kMaxCGroups) {
 LOG(ERROR) << "Stream consumer group count too large: " << cgroups_count;
 return Unexpected(errc::rdb_file_corrupted);
}
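load_trace->stream_trace->cgroup.resize(cgroups_count);
// Apply the same check to pel_size (kMaxPelSize) and consumers_num (kMaxConsumers)
// before their respective resize() calls.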

Additionally, consider wrapping the RdbRestoreValue::Add() path in a try-catch for std::bad_alloc to prevent any remaining unbounded allocation from crashing the process.
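The validation logic can be modelled outside the C++ codebase to make the intended behavior explicit for all declared lengths, not only the consumer group count. This is a language-neutral sketch in Python rather than the actual patch: the limits mirror the constants proposed above, while `CorruptPayload` and `checked_len` are hypothetical names.

```python
# Limits mirroring the constants proposed in the suggested fix.
MAX_CGROUPS = 1 << 20    # 1M consumer groups
MAX_PEL_SIZE = 1 << 24   # 16M PEL entries
MAX_CONSUMERS = 1 << 20  # 1M consumers


class CorruptPayload(ValueError):
    """Raised instead of allocating when a declared length is implausible."""


def checked_len(name, declared, limit):
    """Return `declared` if it is within `limit`, otherwise reject the payload."""
    if declared < 0 or declared > limit:
        raise CorruptPayload(f"{name} too large: {declared} > {limit}")
    return declared


if __name__ == "__main__":
    # A sane payload passes through unchanged.
    print(checked_len("cgroups_count", 3, MAX_CGROUPS))
    # The count used in the attack is rejected before any allocation happens.
    try:
        checked_len("cgroups_count", 1_073_741_824, MAX_CGROUPS)
    except CorruptPayload as exc:
        print(f"rejected: {exc}")
```

The point of the design is that rejection happens before any memory is reserved, so the error surfaces as a corrupted-payload reply to the client rather than as an allocation failure inside the server.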

References

  • Dragonfly source: https://github.com/dragonflydb/dragonfly
  • Similar class: CVE-2023-41056 (Redis RESTORE heap overflow), CVE-2023-41053 (Redis listpack integer overflow)
  • RDB format specification: https://rdb.fnordig.de/file_format.html

Proof of concept

python3 poc/dragonfly-stream-oom.py <host> <port> <password>

Source · github.com/zionsworking/security-research-notebook · writeups/aiven/dragonfly-stream-restore-oom.md