CTI Detection Automation

What it does

Five threat-intel feeds (ThreatFox, Feodo Tracker, URLhaus, AlienVault OTX, and OpenPhish) cover meaningfully different indicator classes: C2 IPs, banking trojan infrastructure, malware-download URLs, phishing, and (optionally) leaked credentials. The pipeline collects from all five, normalizes every indicator into one model, and deduplicates across sources, so an IP seen in both ThreatFox and Feodo Tracker becomes a single indicator that carries both source labels, the higher confidence score, and the union of ATT&CK techniques. From there it generates Wazuh CDB lists and a matching XML ruleset, ATT&CK-tagged per indicator bucket.
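
To make the generation step concrete, here is a minimal sketch of rendering one indicator bucket as a CDB list plus a rule that looks values up in it. It is not the project's generator: the function names, rule ID, level, group name, matched field, and list path are all illustrative assumptions. Only the `key:value` list layout and the `<list>` and `<mitre>` rule elements are standard Wazuh syntax.

```python
import xml.etree.ElementTree as ET


def render_cdb_list(entries: dict[str, str]) -> str:
    """Render one 'key:value' line per indicator, in sorted key order."""
    return "".join(f"{key}:{value}\n" for key, value in sorted(entries.items()))


def render_rule(rule_id: int, list_path: str, field: str, techniques: list[str]) -> str:
    """Build a rule that fires when `field` matches a key in the CDB list."""
    group = ET.Element("group", name="cti,")  # hypothetical group name
    rule = ET.SubElement(group, "rule", id=str(rule_id), level="10")
    lookup = ET.SubElement(rule, "list", field=field, lookup="address_match_key")
    lookup.text = list_path
    ET.SubElement(rule, "description").text = f"CTI match on {list_path}"
    # One <id> per distinct ATT&CK technique attached to this bucket.
    mitre = ET.SubElement(rule, "mitre")
    for technique in sorted(set(techniques)):
        ET.SubElement(mitre, "id").text = technique
    ET.indent(group)  # pretty-print; requires Python 3.9+
    return ET.tostring(group, encoding="unicode")


# Documentation-range IPv4 addresses stand in for real indicators.
cdb_text = render_cdb_list(
    {"203.0.113.7": "threatfox,feodo", "198.51.100.9": "urlhaus"}
)
rule_xml = render_rule(
    100200, "etc/lists/cti-malicious-ip", "srcip", ["T1071", "T1071", "T1105"]
)
```

A production rule would normally also chain off a parent rule or group so the lookup only runs against relevant events; that is left out here to keep the sketch short.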

The piece I cared most about getting right: nothing is written to the SIEM automatically. Every run produces a candidate bundle and emails the analyst a signed, time-limited review link. The analyst sees the diff against the last approved bundle (additions, removals, and the count of unchanged indicators), along with the ATT&CK coverage and the top malware families represented. They approve or reject. Rules land in Wazuh only after a human has looked at them.
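
The diff the analyst reviews can be thought of as plain set arithmetic over indicator keys. The sketch below assumes each bundle reduces to a set of (type, value) pairs; `BundleDiff` and `diff_bundles` are hypothetical names for illustration, not the project's actual API.

```python
from dataclasses import dataclass

Key = tuple[str, str]  # (indicator type, indicator value)


@dataclass(frozen=True)
class BundleDiff:
    added: list[Key]
    removed: list[Key]
    unchanged: int


def diff_bundles(approved: set[Key], candidate: set[Key]) -> BundleDiff:
    """Compare a candidate bundle against the last approved one."""
    return BundleDiff(
        added=sorted(candidate - approved),    # new in this run
        removed=sorted(approved - candidate),  # dropped since last approval
        unchanged=len(approved & candidate),   # present in both
    )


approved = {("ip", "203.0.113.7"), ("domain", "old.example")}
candidate = {("ip", "203.0.113.7"), ("url", "http://bad.example/payload")}
diff = diff_bundles(approved, candidate)
```

Reporting only a count for the unchanged indicators keeps the review page focused on what actually needs a decision.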

CTI ingestion → analyst approval-gate walkthrough

src/cti/dedup.py: cross-feed merge logic
from cti.models import Indicator


def deduplicate(indicators: list[Indicator]) -> list[Indicator]:
    """Merge indicators that share a key, combining their metadata across feeds."""
    merged: dict[tuple[str, str], Indicator] = {}
    for indicator in indicators:
        key = indicator.key()
        existing = merged.get(key)
        if existing is None:
            merged[key] = Indicator(
                type=indicator.type,
                value=indicator.value,
                source=indicator.source,
                threat_type=indicator.threat_type,
                confidence=indicator.confidence,
                malware=indicator.malware,
                techniques=sorted(set(indicator.techniques)),
                tags=sorted(set(indicator.tags)),
                reference=indicator.reference,
                first_seen=indicator.first_seen,
            )
            continue
        # Already seen: keep the higher confidence, union the techniques and
        # tags, and keep the first malware family that was set.
        existing.confidence = max(existing.confidence, indicator.confidence)
        existing.techniques = sorted(set(existing.techniques) | set(indicator.techniques))
        existing.tags = sorted(set(existing.tags) | set(indicator.tags))
        existing.malware = existing.malware or indicator.malware
        # Record each contributing feed once, as a sorted comma-separated string.
        if indicator.source not in existing.source.split(","):
            existing.source = ",".join(
                sorted(set(existing.source.split(",") + [indicator.source]))
            )
    return list(merged.values())


def filter_by_confidence(indicators: list[Indicator], minimum: int) -> list[Indicator]:
    """Keep only indicators whose confidence is at least `minimum`."""
    return [i for i in indicators if i.confidence >= minimum]
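
The merge above depends on Indicator.key() returning a two-string tuple, but cti.models is not shown here. The following is a guess at a compatible shape: the field names come from the constructor call in deduplicate, while the types, the defaults, and the choice of (type, value) as the key are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Indicator:
    type: str
    value: str
    source: str
    threat_type: str = ""
    confidence: int = 0
    malware: Optional[str] = None
    techniques: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    reference: Optional[str] = None
    first_seen: Optional[str] = None

    def key(self) -> tuple[str, str]:
        # Assumption: identity is (type, value), and values were already
        # normalized when the feed was parsed.
        return (self.type, self.value)


# The same IP reported by two feeds with different metadata.
a = Indicator(type="ip", value="203.0.113.7", source="threatfox",
              confidence=75, techniques=["T1071"])
b = Indicator(type="ip", value="203.0.113.7", source="feodo",
              confidence=90, techniques=["T1573"])
```

With this shape, passing [a, b] to deduplicate would yield a single indicator with source "feodo,threatfox", confidence 90, and techniques ["T1071", "T1573"].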

The approval gate

The review link is signed with itsdangerous and carries a TTL, so it can't be forged, and it stops working once it expires. The approval console (a small Flask app) shows the candidate diff, the indicator breakdown by type, the top malware families in the bundle, and the ATT&CK technique coverage. Approving writes the CDB lists and rule XML into the pipeline's active directory and, if wazuh_etc_dir is configured, directly into the manager's /var/ossec/etc.
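
To show what "signed with a TTL" buys, here is a standard-library sketch of the same mechanism. It is deliberately not the project's code, which uses itsdangerous; the function names, the token layout, and the bundle ID are hypothetical, and HMAC-SHA256 is swapped in purely for illustration.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def sign_token(bundle_id: str, secret: bytes, now: Optional[float] = None) -> str:
    """Return 'payload.mac', where payload embeds the issue time."""
    issued = int(time.time() if now is None else now)
    payload = f"{bundle_id}:{issued}".encode()
    mac = hmac.new(secret, payload, hashlib.sha256).digest()
    return ".".join(
        base64.urlsafe_b64encode(part).decode() for part in (payload, mac)
    )


def verify_token(
    token: str, secret: bytes, max_age: int, now: Optional[float] = None
) -> Optional[str]:
    """Return the bundle id if the token is authentic and fresh, else None."""
    try:
        payload_b64, mac_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        mac = base64.urlsafe_b64decode(mac_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return None  # forged or tampered
    bundle_id, _, issued = payload.decode().rpartition(":")
    current = time.time() if now is None else now
    if current - int(issued) > max_age:
        return None  # expired
    return bundle_id


SECRET = b"example-secret"  # placeholder; a real key would come from config
token = sign_token("bundle-42", SECRET, now=1_000)
fresh = verify_token(token, SECRET, max_age=3600, now=1_500)
expired = verify_token(token, SECRET, max_age=3600, now=10_000)
forged = verify_token(sign_token("bundle-42", b"wrong", now=1_000), SECRET, 3600, now=1_500)
```

The signature is checked before the timestamp is trusted, so an attacker cannot extend a link's life by editing the embedded issue time.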

The reason for the gate: fully automated CTI rule promotion carries real operational risk. A feed anomaly, a confidence miscalibration, or a noisy source can push false-positive rules that start alerting on legitimate traffic. The pipeline handles the mechanical work; the analyst keeps the final call.

Architecture and testing

Each feed connector is split into fetch (network) and parse (pure function), which makes the test suite straightforward: the tests run the real parsing logic against bundled fixtures with no network dependency. Coverage includes cross-feed dedup, TTP extraction, rule generation well-formedness, token signing and expiry, and the full approve/reject path through the web app.
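
The fetch/parse split can be sketched as follows. The feed layout and field names ("ip_address", "malware") are made up for the example and do not reflect any real feed's schema; the point is only that parse takes a string and returns data, so a test can feed it a fixture without touching the network.

```python
import json
import urllib.request


def fetch(url: str, timeout: float = 30.0) -> str:
    """Network side: download the raw feed body. Not exercised by tests."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse(raw: str, source: str) -> list[dict]:
    """Pure side: turn a raw JSON feed body into normalized indicator dicts."""
    indicators = []
    for row in json.loads(raw):
        value = str(row.get("ip_address", "")).strip()
        if not value:
            continue  # skip rows without a usable indicator value
        indicators.append(
            {
                "type": "ip",
                "value": value,
                "source": source,
                "malware": row.get("malware"),
            }
        )
    return indicators


# A bundled fixture stands in for the live response.
FIXTURE = (
    '[{"ip_address": "203.0.113.7", "malware": "ExampleBot"},'
    ' {"malware": "NoAddress"}]'
)
parsed = parse(FIXTURE, source="example-feed")
```

A connector's run step would then be little more than parse(fetch(url), source), with all the logic worth testing living in parse.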

This pipeline feeds the watchlists used by the SOC automation lab. The cti-malicious-ip, cti-malicious-domain, and cti-malware-hash CDB lists consumed by the Wazuh rules there are generated here. Schedule it with the included systemd timer for hourly updates, or run it on demand with python -m cti.cli run.

The design principle: automation handles collection, normalization, deduplication, and rule generation. A human handles promotion. Because the approval token is signed and time-limited, the gate can't be bypassed by forging a link or replaying an expired one, and the diff view means the analyst isn't approving a black box. They see exactly what changed since the last bundle they approved.