The four gates
A cheap ruff lint job runs first to fail fast. The three security scans then fan out in parallel (each declares needs: lint), the test job waits on all three, and the SOC notification runs last with if: always() so failed runs are reported too, not just green ones. The workflow grants its token only two permissions, contents: read and security-events: write, the latter being what the SARIF upload requires.
```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install ruff==0.6.9
      - run: ruff check .
  sast:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      - uses: returntocorp/semgrep-action@v1
        with:
          config: >-
            p/default
            p/python
            p/flask
            .semgrep/rules.yml
          generateSarif: "1"
```

| Job | Tool | What it stops |
|---|---|---|
| lint | ruff | Style plus the S security rule set |
| sast | Semgrep | OWASP/Flask packs plus four custom rules |
| secrets | gitleaks | Committed credentials, full history on PRs |
| dependencies | pip-audit | Pinned packages with known advisories |
| test | pytest | Regressions, with coverage reported |
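The job ordering described above is a small dependency graph. As a sketch (job names follow the table; the assumption that notify-soc lists test under needs is inferred from its use of needs.test.result), Python's graphlib can group the jobs into the waves the runner schedules, and a simplified model of the Actions skip rule shows why the notifier needs if: always().

```python
from graphlib import TopologicalSorter

# Job -> jobs listed under its `needs:`. notify-soc -> test is inferred, not quoted.
NEEDS = {
    "lint": set(),
    "sast": {"lint"},
    "secrets": {"lint"},
    "dependencies": {"lint"},
    "test": {"sast", "secrets", "dependencies"},
    "notify-soc": {"test"},
}


def stages(needs):
    """Group jobs into waves whose members can run in parallel."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def simulate(needs, failed, always=frozenset()):
    """Simplified model: a job whose needs did not all succeed is skipped unless
    its condition uses always(); a job that does run fails if listed in `failed`."""
    result = {}
    for job in TopologicalSorter(needs).static_order():
        deps_ok = all(result[dep] == "success" for dep in needs[job])
        if not deps_ok and job not in always:
            result[job] = "skipped"
        else:
            result[job] = "failure" if job in failed else "success"
    return result
```

stages(NEEDS) returns [['lint'], ['dependencies', 'sast', 'secrets'], ['test'], ['notify-soc']]. With sast failing, this model leaves test skipped rather than failed, and notify-soc still runs only because it is listed in always.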
Custom Semgrep rules
The upstream packs (p/default, p/python, p/flask) catch the common cases; .semgrep/rules.yml adds four rules for patterns that kept slipping through. Three are ERROR severity and block the merge; the 0.0.0.0 bind is a WARNING nudge to confirm intent.
```yaml
rules:
  - id: flask-debug-true
    languages: [python]
    severity: ERROR
    message: Running Flask with debug=True exposes the Werkzeug debugger and allows remote code execution.
    patterns:
      - pattern: $APP.run(..., debug=True, ...)
  - id: subprocess-shell-true
    languages: [python]
    severity: ERROR
    message: subprocess call with shell=True and a non-literal argument is a command injection risk.
    patterns:
      - pattern: subprocess.$FN(..., shell=True, ...)
      - pattern-not: subprocess.$FN("...", shell=True, ...)
```

The subprocess rule uses a pattern-not to exempt calls whose command is a plain string literal, so it fires only when the command is built from something else, such as a variable or f-string an attacker could influence. The fourth rule, jwt-decode-without-verification, matches both verify=False and an options dict that disables verify_signature: the two ways signature checking gets switched off so that a forged token is accepted.
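To make the matching behaviour of the two rules shown above concrete, here is a rough approximation written with Python's ast module. It is not Semgrep and not how Semgrep works internally; it ignores import aliasing and other cases a real rule engine handles, and the function names are hypothetical.

```python
import ast


def _kw_is_true(call: ast.Call, name: str) -> bool:
    """True when the call passes name=True as a literal keyword argument."""
    return any(
        kw.arg == name and isinstance(kw.value, ast.Constant) and kw.value.value is True
        for kw in call.keywords
    )


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Approximate flask-debug-true and subprocess-shell-true; returns (line, rule id)."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        # flask-debug-true: $APP.run(..., debug=True, ...)
        if node.func.attr == "run" and _kw_is_true(node, "debug"):
            findings.append((node.lineno, "flask-debug-true"))
        # subprocess-shell-true: subprocess.$FN(..., shell=True, ...),
        # minus the pattern-not: a string-literal first argument is exempt.
        target = node.func.value
        on_subprocess = isinstance(target, ast.Name) and target.id == "subprocess"
        if on_subprocess and _kw_is_true(node, "shell"):
            first = node.args[0] if node.args else None
            is_literal = isinstance(first, ast.Constant) and isinstance(first.value, str)
            if not is_literal:
                findings.append((node.lineno, "subprocess-shell-true"))
    return sorted(findings)  # ast.walk is breadth-first, so sort by line
```

Fed a file containing subprocess.run("ls -l", shell=True), this sketch stays quiet, while the same call with an f-string or a variable as the command is reported, mirroring the pattern / pattern-not pair.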
Secret + dependency scanning
The secrets job checks out with fetch-depth: 0 so gitleaks scans the full history on a pull request, not just the tip commit, and runs against a project config that extends the defaults with a generic API-key rule plus an allowlist for test fixtures and documented placeholders:
```toml
[[rules]]
id = "generic-api-key"
description = "Generic API key assignment"
regex = '''(?i)(api[_-]?key|secret|token)["'\s:=]{1,4}[a-z0-9]{24,}'''
keywords = ["api_key", "apikey", "secret", "token"]
```

The dependency gate runs pip-audit -r requirements.txt --strict --desc against the pinned manifest. pip-audit already exits non-zero when any pinned package carries a known advisory; --strict additionally fails the run if a dependency cannot be collected at all instead of skipping it, and --desc prints each advisory's description into the log. The log therefore names the offending package, and turning a failing run green is typically a single pinned-version bump.
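The generic-api-key rule above can be exercised offline before trusting it in CI. This sketch reuses its regex and keywords with Python's re module (the pattern is simple enough to behave like Go's RE2 on ordinary ASCII input) and models gitleaks' keyword pre-filter as a case-insensitive substring check, which is an assumption about its internals. The allowlist is left out because its entries are not shown.

```python
import re

# Regex and keywords copied from the generic-api-key rule.
PATTERN = re.compile(r"""(?i)(api[_-]?key|secret|token)["'\s:=]{1,4}[a-z0-9]{24,}""")
KEYWORDS = ("api_key", "apikey", "secret", "token")


def would_flag(line: str) -> bool:
    """Approximate one rule: cheap keyword pre-filter first, then the regex."""
    lowered = line.lower()
    if not any(keyword in lowered for keyword in KEYWORDS):
        return False  # assumed: the regex never runs when no keyword is present
    return PATTERN.search(line) is not None
```

In this model, a line spelled api-key = "..." matches the regex but none of the keywords, so the pre-filter skips it; if gitleaks behaves the same way, adding "api-key" to keywords would close that gap.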
SARIF and the Security tab
Semgrep is invoked with generateSarif: "1", and the SARIF file is uploaded through github/codeql-action/upload-sarif@v3 with if: always(), so findings surface under the repository's Security tab and as inline pull-request annotations rather than living only in the job log. The always() condition matters because a step that exits non-zero normally causes the rest of the job's steps to be skipped; without it, a SAST run that actually finds something would never publish its findings.
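The uploaded SARIF file is a plain JSON document, so it can be inspected locally. A minimal reader, assuming the standard SARIF 2.1.0 runs[].results[] layout (the function name is hypothetical and it is not part of the pipeline), flattens each result into a path, line, level, and rule ID:

```python
from collections import Counter
from typing import List, Tuple


def summarize_sarif(sarif: dict) -> Tuple[List[str], Counter]:
    """Flatten SARIF 2.1.0 results into 'path:line [level] ruleId' strings and count them by level."""
    lines: List[str] = []
    levels: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: SARIF falls back to "warning" when a result has no level
            # (rule-level default configurations are ignored here).
            level = result.get("level", "warning")
            levels[level] += 1
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            path = physical.get("artifactLocation", {}).get("uri", "?")
            line = physical.get("region", {}).get("startLine", "?")
            lines.append(f"{path}:{line} [{level}] {result.get('ruleId', '?')}")
    return lines, levels
```

Pass it the result of json.load on the SARIF file Semgrep wrote; a non-zero levels["error"] count should line up with ERROR-severity findings, assuming Semgrep maps its severities onto SARIF levels that way.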
SOC notifier
The final notify-soc job posts the run outcome (repository, commit, actor, status, and a link back to the run) to a Shuffle webhook, passing PIPELINE_STATUS: ${{ needs.test.result }} so the payload reflects whether the gates actually passed. Because test only runs once every upstream gate has passed, any result other than success (failure, cancelled, or skipped when an earlier gate failed) means the pipeline did not go green. The notifier in scripts/notify_soc.py uses only the Python standard library, validates that the webhook is an http(s) URL, and degrades gracefully: if SHUFFLE_WEBHOOK_URL is unset, the job no-ops instead of failing the build.
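```python
import json
import os
import sys
from urllib import request

def main():
    hook = os.environ.get("SHUFFLE_WEBHOOK_URL")
    if not hook:  # no webhook configured: no-op instead of failing the build
        print("SHUFFLE_WEBHOOK_URL not set, skipping SOC notification")
        return 0
    if not hook.lower().startswith(("https://", "http://")):  # refuse file:// and friends
        print("SHUFFLE_WEBHOOK_URL must be an http(s) URL", file=sys.stderr)
        return 1
    event = build_event()  # payload builder, defined elsewhere in the script (not shown)
    body = json.dumps(event).encode("utf-8")
    req = request.Request(
        hook, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    # Send step reconstructed from the prose below; the 10 s timeout is an assumption.
    try:
        with request.urlopen(req, timeout=10) as resp:
            print(f"SOC notified (HTTP {resp.status})")
    except OSError as exc:  # URLError, HTTPError and socket timeouts all subclass OSError
        print(f"SOC notification failed: {exc}", file=sys.stderr)
    return 0
```

A network failure reaching the webhook also returns 0, on purpose: an unreachable SOC should not flip an otherwise-green build red. In the homelab, this webhook feeds a SOC automation lab, so a failed security gate opens a TheHive case the same way a Wazuh alert does.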
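The excerpt calls build_event() without showing it. A hypothetical version, assuming it reads GitHub Actions' default environment variables (GITHUB_REPOSITORY, GITHUB_SHA, GITHUB_ACTOR, GITHUB_SERVER_URL, GITHUB_RUN_ID) plus PIPELINE_STATUS, could look like this; the payload keys are made up for illustration, not the script's actual schema.

```python
import os
from collections.abc import Mapping


def build_event(env: Mapping = os.environ) -> dict:
    """Hypothetical payload builder; the keys below are illustrative only."""
    server = env.get("GITHUB_SERVER_URL", "https://github.com")
    repo = env.get("GITHUB_REPOSITORY", "")
    status = env.get("PIPELINE_STATUS", "unknown")  # set from needs.test.result in the job
    return {
        "source": "github-actions",
        "repository": repo,
        "commit": env.get("GITHUB_SHA", ""),
        "actor": env.get("GITHUB_ACTOR", ""),
        "status": status,
        "gates_passed": status == "success",  # anything else means the run did not go green
        "run_url": f"{server}/{repo}/actions/runs/{env.get('GITHUB_RUN_ID', '')}",
    }
```

Taking the environment as a parameter keeps the call in main() unchanged while letting a test pass in a plain dict instead of patching os.environ.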
The sample app + tests
The target is a minimal Flask task API: health, list, create, fetch, and delete endpoints backed by a lock-guarded in-memory store. Five pytest tests cover the happy path plus the edge cases that matter for an API: missing required fields return 400, unknown task IDs return 404, and deletes return 204. The store is reset between tests so each test runs against a clean fixture.
```python
# `client` is the Flask test-client fixture, defined elsewhere in the suite (not shown).
def test_delete(client):
    created = client.post("/tasks", json={"title": "temp"}).get_json()["task"]
    assert client.delete(f"/tasks/{created['id']}").status_code == 204
    assert client.get(f"/tasks/{created['id']}").status_code == 404


def test_missing_task(client):
    assert client.get("/tasks/999").status_code == 404
```

The app itself binds to 127.0.0.1 and never sets debug=True, so it passes its own custom Semgrep rules: they are written against exactly the mistakes the app deliberately avoids.
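The app's store is described only as lock-guarded and reset between tests. A minimal sketch of such a store, with hypothetical class and method names and no Flask dependency, shows how one lock serialises access and how reset() gives each test fresh state with IDs restarting at 1:

```python
import itertools
import threading
from typing import Dict, List, Optional


class TaskStore:
    """Hypothetical in-memory task store guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.reset()

    def reset(self) -> None:
        # Between tests: drop every task and restart IDs at 1.
        with self._lock:
            self._tasks: Dict[int, dict] = {}
            self._ids = itertools.count(1)

    def create(self, title: str) -> dict:
        if not title:
            raise ValueError("title is required")  # an API layer would map this to 400
        with self._lock:
            task = {"id": next(self._ids), "title": title}
            self._tasks[task["id"]] = task
            return dict(task)  # hand out a copy so callers cannot mutate the store

    def get(self, task_id: int) -> Optional[dict]:
        with self._lock:
            task = self._tasks.get(task_id)
            return dict(task) if task is not None else None  # None -> 404

    def list_tasks(self) -> List[dict]:
        with self._lock:
            return [dict(t) for t in self._tasks.values()]

    def delete(self, task_id: int) -> bool:
        with self._lock:
            return self._tasks.pop(task_id, None) is not None  # True -> 204, False -> 404
```

An autouse pytest fixture that calls reset() before each test would give the clean-fixture behaviour the tests rely on.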
What fails the build
Each gate fails the run for a concrete, reproducible reason, and because every gate is its own job the red check names the cause directly:
- lint: ruff finds a style violation or an S security-rule hit
- sast: any ERROR-severity Semgrep finding, custom or upstream (debug=True, shell=True on non-literal input, jwt.decode without verification)
- secrets: gitleaks matches a credential anywhere in the PR history that is not allowlisted
- dependencies: pip-audit --strict hits a pinned package with a known advisory
- test: any of the five pytest tests regresses
Running make all executes the same chain locally (lint, sast, secrets, deps, test), provided semgrep and gitleaks are on the PATH and the remaining tools were pip-installed by make install. A developer therefore sees, before pushing, the same failure the pipeline would surface afterward.
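The Makefile itself is not part of this excerpt. As a rough Python equivalent of the same fail-fast chain, the sketch below assumes these command lines: the ruff and pip-audit invocations mirror the ones quoted earlier, while the semgrep and gitleaks lines are hypothetical stand-ins for whatever the Makefile actually runs.

```python
import shutil
import subprocess
import sys
from typing import Optional, Sequence, Tuple

# Hypothetical local equivalent of `make all`; semgrep and gitleaks args are assumptions.
STEPS = [
    ("lint", ["ruff", "check", "."]),
    ("sast", ["semgrep", "scan", "--config", ".semgrep/rules.yml", "--error"]),
    ("secrets", ["gitleaks", "detect", "--source", "."]),
    ("deps", ["pip-audit", "-r", "requirements.txt", "--strict", "--desc"]),
    ("test", ["pytest"]),
]


def run_chain(steps: Sequence[Tuple[str, Sequence[str]]]) -> Optional[str]:
    """Run each step in order; return the name of the first one that fails, or None."""
    # Preflight: refuse to start if a tool is missing, rather than failing midway.
    missing = [cmd[0] for _, cmd in steps if shutil.which(cmd[0]) is None]
    if missing:
        raise RuntimeError(f"not on PATH: {', '.join(missing)}")
    for name, cmd in steps:
        if subprocess.run(list(cmd)).returncode != 0:
            return name  # stop at the first failure, like make without -k
    return None


def main() -> int:
    failed = run_chain(STEPS)
    if failed:
        print(f"gate failed: {failed}", file=sys.stderr)
        return 1
    print("all gates passed")
    return 0
```

Wiring main() to a script entry point gives the same single "which gate broke" answer the red check provides in CI, with the PATH preflight turning a missing semgrep or gitleaks into an upfront error instead of a confusing mid-chain failure.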

