Authenticated DoS: Dragonfly Server Crash via Crafted Stream RESTORE Payload

A single RESTORE command with a crafted stream payload causes unbounded vector allocation in Dragonfly's RDB deserialization path, throwing std::bad_alloc and terminating the entire server process.

Summary

A vulnerability in Dragonfly's RDB deserialization path allows any authenticated user to crash the entire server process with a single RESTORE command. The ReadStreams() function in rdb_load.cc reads attacker-controlled length values from the serialized payload and passes them directly to std::vector::resize() without any upper-bound validation. A payload encoding a consumer group count of about 1 billion triggers a resize of approximately 100 GB, which throws std::bad_alloc. Because nothing on the RESTORE command path catches the exception, it propagates to std::terminate() and kills the process. All connected clients are disconnected, and every database on the instance stays unavailable until the service restarts.

Impact

Any authenticated database user can crash the Dragonfly instance with a single command. On Aiven's managed infrastructure, the crash affects every client of the service, and because the same command can be resent after each restart, the attack can sustain a denial of service. With a sufficiently large count value (around 4 billion), and depending on the host's memory overcommit settings, the allocation attempt can instead trigger the Linux OOM killer, affecting the underlying host rather than just the Dragonfly process.

Four separate resize() calls in ReadStreams() are affected: consumer group count, PEL (Pending Entry List) size per group, consumer count per group, and per-consumer NACK array size. Any of these is sufficient to crash the server; the consumer group count is the most direct.

Root cause

In src/server/rdb_load.cc, RdbLoaderBase::ReadStreams() reads length values from the attacker-supplied payload and immediately resizes vectors without bounds checking:

// Consumer group count
uint64_t cgroups_count;
SET_OR_UNEXPECT(LoadLen(nullptr), cgroups_count);
load_trace->stream_trace->cgroup.resize(cgroups_count);  // no upper bound

// PEL size per group (read inside the per-group loop; cgroup is the current group)
uint64_t pel_size;
SET_OR_UNEXPECT(LoadLen(nullptr), pel_size);
cgroup.pel_arr.resize(pel_size);  // no upper bound
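
To see how a value like 0x40000000 reaches resize(), it helps to look at the length encoding itself. The sketch below is a simplified, hypothetical decoder for the Redis-style RDB length format (6-bit, 14-bit, 32-bit and 64-bit forms). It assumes LoadLen() follows that format; DecodeRdbLen is an illustrative name, not a Dragonfly function. Note that it accepts any 32- or 64-bit value: the decoder cannot know what a length means, so the sanity limit has to live at the resize() call sites.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for LoadLen(): decodes one Redis-style RDB
// length starting at buf[*pos] and advances *pos past it. Returns std::nullopt
// for truncated input and for any prefix that is not a plain length
// (11xxxxxx is the "special encoding" form; other prefixes are rejected).
std::optional<uint64_t> DecodeRdbLen(const std::vector<uint8_t>& buf, size_t* pos) {
  if (*pos >= buf.size()) return std::nullopt;
  const size_t avail = buf.size() - *pos;  // bytes left, including the prefix
  const uint8_t first = buf[*pos];

  switch (first >> 6) {
    case 0:  // 00xxxxxx: 6-bit length stored in the prefix byte itself
      *pos += 1;
      return uint64_t{first & 0x3Fu};
    case 1: {  // 01xxxxxx xxxxxxxx: 14-bit length, high bits in the prefix
      if (avail < 2) return std::nullopt;
      const uint64_t v = (uint64_t{first & 0x3Fu} << 8) | buf[*pos + 1];
      *pos += 2;
      return v;
    }
    default:
      break;
  }

  size_t n;
  if (first == 0x80) {
    n = 4;  // 32-bit big-endian length follows
  } else if (first == 0x81) {
    n = 8;  // 64-bit big-endian length follows
  } else {
    return std::nullopt;  // special encoding or unknown prefix
  }
  if (avail < 1 + n) return std::nullopt;

  uint64_t v = 0;
  for (size_t i = 1; i <= n; ++i) v = (v << 8) | buf[*pos + i];
  *pos += 1 + n;
  return v;  // no sanity limit here: any 32- or 64-bit value is returned
}
```

For example, the five bytes 0x80 0x40 0x00 0x00 0x00 decode to 0x40000000 (1,073,741,824), which the excerpt above would pass straight to resize().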

The RESTORE call chain, GenericFamily::Restore() → OpRestore() → RdbRestoreValue::Add() → Parse() → ReadObj() → ReadStreams(), has no try-catch block around deserialization, so std::bad_alloc propagates uncaught and the runtime calls std::terminate(). Confirmed on Dragonfly v1.37.2.

Proof of concept

The script below builds the minimal stream RESTORE payload and sends it. Run it against a local Docker container first, before any authorized testing. All host and credential details have been replaced with placeholders.

PoC script

The payload encodes RDB_TYPE_STREAM_LISTPACKS_3 (type 21) with one valid stream node, valid stream metadata, and a consumer group count of 0x40000000 (1,073,741,824). Each StreamCGTrace is roughly 100 bytes, so the resize attempts to allocate approximately 100 GiB (about 107 GB).
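
The size estimate is a single multiplication. The sketch below spells it out with an overflow check; RequestedBytes is a hypothetical helper, and the ~100-byte element size is this writeup's estimate rather than a measured sizeof(StreamCGTrace).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: bytes that resize(count) must allocate for elements of
// `elem_size` bytes, or std::nullopt if count * elem_size overflows uint64_t.
// The element size is an estimate passed in by the caller; it is not derived
// from the real sizeof(StreamCGTrace).
constexpr std::optional<uint64_t> RequestedBytes(uint64_t count, uint64_t elem_size) {
  if (elem_size != 0 && count > UINT64_MAX / elem_size) {
    return std::nullopt;
  }
  return count * elem_size;
}

// 0x40000000 consumer groups at ~100 bytes each: 107,374,182,400 bytes,
// which is exactly 100 GiB (about 107 GB in decimal units).
static_assert(*RequestedBytes(0x40000000ULL, 100) == 107374182400ULL);
static_assert(*RequestedBytes(0x40000000ULL, 100) == (100ULL << 30));

// A count of 4 billion at the same element size asks for about 400 GB.
static_assert(*RequestedBytes(4'000'000'000ULL, 100) == 400'000'000'000ULL);
```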

Disclosure and fix

The issue was reported to Aiven through their bug bounty program, and Aiven triaged it as P2 (High). It was also reported upstream to the Dragonfly project. Recommended fix in src/server/rdb_load.cc:

constexpr uint64_t kMaxCGroups   = 1 << 20;  // 1M consumer groups
constexpr uint64_t kMaxPelSize   = 1 << 24;  // 16M PEL entries
constexpr uint64_t kMaxConsumers = 1 << 20;  // 1M consumers

if (cgroups_count > kMaxCGroups)
    return Unexpected(errc::rdb_file_corrupted);
load_trace->stream_trace->cgroup.resize(cgroups_count);
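
The four guards share one shape, so they could go through a single helper. This is a minimal sketch under stated assumptions: BoundedResize() is hypothetical, and std::errc::value_too_large stands in for Dragonfly's own errc::rdb_file_corrupted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <system_error>
#include <vector>

// Hypothetical helper: resize `v` to `n` elements only when n <= limit.
// On rejection `v` is left unchanged and no allocation is attempted.
// std::errc is a stand-in for Dragonfly's errc::rdb_file_corrupted.
template <typename T>
std::error_code BoundedResize(std::vector<T>& v, uint64_t n, uint64_t limit) {
  if (n > limit) {
    return std::make_error_code(std::errc::value_too_large);
  }
  v.resize(static_cast<size_t>(n));  // lossless on 64-bit targets
  return {};                         // empty error_code means success
}
```

Each site would then look like BoundedResize(cgroup.pel_arr, pel_size, kMaxPelSize), with a non-empty result mapped to Unexpected(errc::rdb_file_corrupted). One possible refinement, not part of the recommended fix, is to also cap each count by the bytes remaining in the payload, since every serialized group, PEL entry, consumer or NACK occupies at least one byte; that ties the worst-case allocation to the size of the input instead of a fixed constant.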

Similar guards should be added for all four resize sites. Additionally, wrapping RdbRestoreValue::Add() in a try-catch for std::bad_alloc would prevent any remaining unbounded allocation from terminating the process.
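
A minimal sketch of that try-catch backstop follows. RunGuarded() is hypothetical, and the std::errc values are stand-ins for whatever status Dragonfly would return to the client. Catching std::length_error goes slightly beyond the recommendation: std::vector::resize() throws it, rather than std::bad_alloc, when the requested count exceeds max_size(), which an attacker can reach with a 64-bit length near 2^64.

```cpp
#include <cassert>
#include <functional>
#include <new>
#include <stdexcept>
#include <system_error>

// Hypothetical backstop around a deserialization step such as
// RdbRestoreValue::Add(): an allocation failure rejects one RESTORE with an
// error instead of escaping to std::terminate().
std::error_code RunGuarded(const std::function<std::error_code()>& deserialize) {
  try {
    return deserialize();
  } catch (const std::bad_alloc&) {
    // Allocation was attempted and failed (e.g. ~100 GiB request).
    return std::make_error_code(std::errc::not_enough_memory);
  } catch (const std::length_error&) {
    // resize() count exceeded vector::max_size(); nothing was allocated.
    return std::make_error_code(std::errc::value_too_large);
  }
}
```

The wrapper keeps the process alive but is not a substitute for the bounds checks: without them, every request can still force a multi-gigabyte allocation attempt. It also assumes the partially built load trace is owned by RAII types such as std::vector, so that it is released as the exception unwinds.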