Files
ai-workflow-course/modules/15-security-scanning
claude 7e8046a57f fix(voice/consistency): vary stock formulas, vendor-balance orient.py, unify the loop
- Vary 11 instances of two boilerplate openers ("A generic X course…" /
  "Strip away X…") across 10 modules so they read as distinct, concrete prose;
  kept the few deliberate, voice-distinct uses; locked exemplars untouched.
- M23 orient.py: detect a vendor-balanced set of AI-instruction filenames
  (AGENTS.md/CLAUDE.md/GEMINI.md/.cursorrules/.cursor/rules/copilot-instructions)
  instead of singling out one vendor. Still runs.
- Render the collaboration loop consistently as seven stations
  (issue->branch->implementation->PR->review->merge->closed) in M25/M26 to match
  M11 and the syllabus.

Closes #48
Closes #49
Closes #51

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
2026-06-22 17:45:20 -04:00
..

Module 15 — Security Scanning for AI-Generated Code

Your build is green, your tests pass, and the AI just imported a package that doesn't exist — or one an attacker registered last week using exactly the name LLMs like to invent. CI proves the code runs; it says nothing about whether it's safe. This module adds the gates that catch what a build check structurally can't.


Prerequisites

  • Module 14 — Continuous Integration. You have a pipeline that runs lint, build, and tests on every push. Security scanning is more gates on that same pipeline, so you need somewhere to bolt them on.
  • Module 2 — Version Control as a Safety Net. Scanners flag findings in a diff; you'll commit, re-scan, and confirm a gate goes red then green. Secret scanning in particular cares about history, not just the working tree — that only makes sense once you think in commits.
  • Module 1 — the tasks-app. The running example. We'll let the AI bolt a "cloud sync" feature onto it and watch it introduce all three failure modes at once.

Helpful but not required: Module 8 (remotes/hosting) — host-native scanning (Dependabot-style alerts, push protection) lives on the remote; Module 10 (reviewing code you didn't write) — scanners are the automated half of that review. Secrets get a full treatment of their own in Module 17; this module's job is to catch them, not to manage them.


Learning objectives

By the end of this module you can:

  1. Name the three classes of risk AI introduces that a build-and-test pipeline will happily pass: vulnerable dependencies, hardcoded secrets, and hallucinated/typosquatted packages.
  2. Explain slopsquatting and why AI-suggested dependencies are a live supply-chain attack vector, not a hypothetical one.
  3. Run the three automated gates locally — SCA (dependency scanning), secret scanning, and SAST (static analysis) — and read their output for real signal vs. noise.
  4. Wire those gates into the Module 14 pipeline so a planted secret or a fake dependency turns the build red before it merges.
  5. Reason about each gate's limits — false positives, the secret that's already leaked, and what "no findings" does and doesn't prove.

Key concepts

Why CI passing is not the same as safe

Module 14's pipeline answers one question: does this code build, lint clean, and pass its tests? That's a question about behavior the tests exercise. None of the following change the answer:

  • A dependency three levels down has a known remote-code-execution CVE. The code still imports it, still runs, tests still pass. Green.
  • An API key is hardcoded in a source file. It's a perfectly valid string literal. Lint is happy, tests are happy. Green.
  • The AI used a SQL query built by string concatenation. The happy-path test passes a normal title; the injection case is never exercised. Green.

CI is a functional gate. Security scanning is a non-functional gate that asks a different question — is this code safe to ship? — and it asks it the only way that scales: automatically, on every push, with no human remembering to look. You are adding three checkers that each know a class of problem your tests structurally cannot see.

The reframe for this audience: you already gate merges on "tests pass." You're now adding "no known vulns, no secrets, no obvious injection" to the same gate. It's the same instinct — don't let bad things through automatically — pointed at a different failure mode.

The three gates

Gate Catches Category of tool
SCA (Software Composition Analysis) Known-vulnerable, abandoned, or non-existent dependencies Dependency/vulnerability scanners
Secret scanning Credentials committed into source or git history Entropy + pattern matchers over files and commits
SAST (Static Application Security Testing) Insecure code you wrote — injection, weak crypto, unsafe deserialization Static analyzers / linters with a security ruleset

SCA and SAST split the world cleanly: SCA scans the code you didn't write (your dependencies); SAST scans the code you did. Secret scanning cuts across both — a leaked key is neither a dependency nor a logic bug, it's a string that should never have been committed.

Gate 1 — SCA: scanning the code you didn't write

Modern software is mostly other people's code. A ten-line script can pull in a hundred transitive dependencies, any of which can have a published vulnerability. SCA tools resolve your full dependency tree and check every package and version against a vulnerability database (CVE feeds, the OSV database, language-ecosystem advisory databases). Output is a list of "package X version Y has advisory Z, fixed in version W."

This is well-trodden DevOps. What's new with AI is the failure mode at the bottom of the table: the dependency that doesn't exist at all.

Slopsquatting: the AI supply-chain attack

LLMs generate plausible text, and a package name is plausible text. Ask for code that talks to a service and the model will confidently import or list a dependency that sounds exactly right — requests-oauth, python-jsonlogger2, task-store-client — but was never published. This isn't rare; studies of AI-generated code find a meaningful fraction of suggested packages are hallucinations, and crucially, the model hallucinates the same plausible names repeatedly.

Attackers noticed. The attack — nicknamed slopsquatting (typosquatting, but aimed at LLM "slop" rather than human typos) — is:

  1. Watch what package names LLMs commonly invent.
  2. Register those exact names on the public package index, with malware inside.
  3. Wait. The next developer who pastes AI output and runs pip install -r requirements.txt (or npm install) pulls your payload — which now runs with that developer's privileges, in their dev environment or, worse, in CI.

The defense has two layers, and SCA is where they live:

  • The package doesn't exist (yet). The install or the resolver fails outright — "no matching distribution." Annoying, but safe: a name that 404s can't hurt you. The danger is treating that as a mere typo and "fixing" it by finding the closest real name without checking it.
  • The package exists but you didn't vet it. This is the live wire. SCA flags newly-published, low-download, or known-malicious packages; combined with the discipline of never installing a dependency the AI suggested without confirming it's the real, intended project, it closes the gap.

The habit to build: a dependency the AI added is an untrusted claim until you verify the package is real, is the one you meant, and is widely used. Treat the requirements file the AI hands you the same way you'd treat a stranger handing you a USB stick.

Gate 2 — Secret scanning

AI loves to hardcode credentials. Ask for code that calls an authenticated API and a model will cheerfully write API_KEY = "sk-live-..." straight into the source, because that makes the example work — and "make it work" is what it optimizes for. It has no instinct that the key is sensitive.

Secret scanners catch this by scanning files (and crucially, git history) for two signals:

  • Known patterns — provider key formats (cloud access keys, tokens with recognizable prefixes, private-key PEM headers, connection strings).
  • High entropy — random-looking strings that statistically resemble a generated credential even when they match no known pattern.

The non-obvious part for this audience: a secret committed once is leaked forever. Deleting it in a later commit doesn't help — it's still sitting in history, and anyone with the repo can git log -p their way to it. So secret scanning runs over history, not just the current files, and a true hit means two jobs, not one: (1) get it out of the code, and (2) rotate the credential, because you must assume it's compromised. Scrubbing history is harder than it looks and is a recovery-grade operation (Module 12 territory). The cheap win is catching it before it's ever pushed — which is exactly why this gate belongs in the pipeline and, ideally, in a pre-commit hook.

This module catches the secret. Managing secrets properly — env vars, secret stores, per-environment config so the AI never has a key to hardcode in the first place — is Module 17. Gate 2 is the tripwire that proves you need it.

Gate 3 — SAST: scanning the code you did write

SAST analyzes your source for insecure patterns without running it: SQL built by string concatenation, shell commands assembled from user input, weak or misused crypto, unsafe deserialization, paths built from untrusted input. It's a linter (Module 14) with a security ruleset — same machinery, different question.

Why it earns a place specifically for AI code: a model reproduces the patterns it was trained on, and the internet is full of insecure examples. It will write the string-concatenated SQL query because a million tutorials did. It looks idiomatic, it passes the happy-path test, and it's a vulnerability. SAST flags the shape of the bug regardless of whether any test happens to trigger it.

SAST is also the noisiest of the three. Expect false positives, expect to tune the ruleset, and expect to mark some findings "won't fix" with a reason. That's normal and it's why SAST is introduced after the two higher-signal gates — it's the most valuable to tune and the easiest to turn into ignored red noise if you don't.

Where the gates run

You want these in more than one place, cheapest-and-earliest first:

  • Local / pre-commit — fastest feedback, and the only place that stops a secret before it enters history. A pre-commit hook running secret scanning is the single highest-value placement.
  • CI (the Module 14 pipeline) — the enforcement gate. Local hooks can be skipped; the pipeline can't be, if you require it to pass before merge. This is where "the build goes red" has teeth.
  • Host-native, on the remote — most git hosts (Module 8) offer some of this for free: dependency alerts that watch your manifest against advisory feeds and open issues/PRs when a new CVE drops, and push protection that rejects a commit containing a recognized secret at the server. Turn these on; they cover the long tail (a CVE published after you merged) that a one-shot CI run never will.

The same scanner can run in all three. The lab uses one script you can run by hand and call from CI, so there's one source of truth for "what counts as a finding."


The AI angle

These three gates exist in any DevSecOps practice. What makes them load-bearing here is that AI-assisted coding doesn't just fail to prevent these problems — it actively manufactures all three, and does it in the exact form that slips past a human skim and a green build:

  • It invents dependencies. Hallucinated package names are a failure mode unique to generated code, and slopsquatting turns that failure into an externally-exploitable supply-chain attack. No human typing dependencies by hand produces this risk at the same rate.
  • It hardcodes secrets because hardcoding makes the example run, and running is what the model is rewarded for. The instinct that "this string is dangerous" is exactly the instinct it lacks.
  • It reproduces insecure idioms with total confidence, because plausible-looking code is the whole game, and insecure code is extremely plausible — it's all over the training data.

And the volume multiplies all of it. You're merging more code, faster, with less of it read line-by-line, precisely because the AI made generation cheap. The one defense that scales with that volume is the one that doesn't depend on a human remembering to look. That's these gates. You don't add them despite using AI — using AI is what moves them from "nice to have" to "required."


Hands-on lab

Lab language: shell, driving Python tooling, on the tasks-app from Module 1. You'll install two scanners (both pip-installable, cross-platform), let the AI introduce all three problems, catch them, and wire the catch into your pipeline.

Windows note: the scanner commands are identical everywhere. The wrapper script lab/security-scan.sh is bash — run it from Git Bash or WSL, or just run the three commands it contains directly in PowerShell. Nothing in the lab needs a specific shell beyond that.

You'll need:

  • The tasks-app folder under version control from Module 2, and your CI pipeline from Module 14.

  • Python 3.10+ and pip.

  • Two scanners installed into your environment:

    pip install pip-audit detect-secrets
    

    If pip install is refused with "externally-managed-environment" (PEP 668 — common on recent Debian/Ubuntu and Homebrew Python), install into a per-project virtual environment instead: python3 -m venv .venv && source .venv/bin/activate (Windows: .venv\Scripts\activate), then re-run the install. (pipx or pip install --break-system-packages also work; a venv is the clean default.)

    These are concrete, currently-maintained examples of the SCA and secret-scanning categories — not the only choices (see Where it breaks and Verify-before-publish). The lab teaches the moves; the moves transfer to any tool in the category.

  • Your AI assistant (browser or editor-integrated — by now you have Module 4 tooling; either is fine).

Part A — Let the AI introduce the problems

Copy this module's starter files into your project — they're a realistic snapshot of what an AI hands you when you ask the tasks-app to "sync tasks to a cloud service":

  • lab/config.py → a new module the AI "wrote," complete with a hardcoded API key.
  • lab/requirements.txt → the dependencies the AI "suggested," containing a vulnerable real package, a typosquatted name, and a hallucinated name that doesn't exist.

Open both and read them. They look completely normal — that's the point. Nothing here would fail a lint or a test.

If you'd rather generate them yourself, ask your AI: "Add a module to tasks-app that syncs tasks to a cloud API, and give me a requirements.txt for it." You'll very likely get a hardcoded key and at least one questionable dependency for free. Use the provided files if you want the lab to be reproducible.

Part B — Gate 1: SCA, and meeting a hallucinated package

Try to resolve the AI's dependencies:

pip-audit -r requirements.txt

It fails before it can audit anything — the resolver can't find one or more packages. That's slopsquatting's first tripwire. Read the error: it names the package it couldn't resolve. Ask yourself the dangerous question and answer it correctly: is this a typo I should "fix," or a name that should not exist? Do not silently swap in the nearest real name — that's exactly the reflex the attack relies on. Confirm against the real project's home page which dependency was actually intended.

Now edit requirements.txt: comment out the typosquatted and hallucinated lines (the ones flagged as unresolvable), leaving the real-but-vulnerable package. Re-run:

pip-audit -r requirements.txt

This time it resolves and reports a known vulnerability with an advisory ID and a fixed version. Bump the pin to the fixed version and run it once more until it's clean. You've now exercised both halves of SCA: the package that shouldn't exist, and the package that exists but shouldn't be at that version.

Part C — Gate 2: secret scanning

Scan for the hardcoded key:

detect-secrets scan config.py

The JSON output lists a detected secret with its file, line, and detector type. That's your tripwire firing on the AI's hardcoded key.

Now do it right: remove the literal from config.py and read the key from the environment instead (os.environ), then re-scan and confirm the finding is gone. And say the quiet part out loud — if that key had been real and ever pushed, removing it now is not enough; you'd have to rotate it, because it's in history. (Proper secret management is Module 17; this is just the catch.)

Stretch — Gate 3 (SAST): install a static analyzer for your language (for Python, pip install bandit, then bandit -r .) and watch it flag insecure code you wrote — here, the MD5-based request signing in config.py (weak crypto, CWE-327). Now note what it does not flag: the hardcoded SYNC_API_KEY. Bandit's hardcoded-credential checks (B105107) key on password-named identifiers — password, secret, token — so a key named SYNC_API_KEY slips right past them. Catching that string is a secret scanner's job (Gate 2), not SAST's. Same file, two distinct flaws, caught by two different gates with two different blind spots — which is exactly why you run all three rather than trusting one. And note how much noisier SAST is than the first two gates: that noise is why it's the one you tune.

Part D — Wire the gates into CI

A scan you have to remember to run is a scan you'll skip. Move it into the Module 14 pipeline so it runs on every push and blocks the merge.

  1. Copy lab/security-scan.sh into your project. It runs the SCA and secret-scan gates and exits non-zero on any finding — which is what makes CI go red. Make it executable (chmod +x security-scan.sh).

    Before you run it, stage the starter files so the secret gate can see them:

    git add config.py requirements.txt
    

    This is not a footnote. detect-secrets scan with no path argument scans the files Git tracks — an untracked config.py is invisible to it, so the gate would report "no secrets" on a file that's full of them (a silent false pass, the worst kind). Staging puts the file in front of the scanner. It's the same reason the explicit detect-secrets scan config.py in Part C worked, and the same reason "secrets live in history": the moment Git knows about a file, so does the gate.

    To watch the gate catch both planted problems at once, restore the original booby-trapped files first (you fixed them in Parts B and C) — re-copy config.py and requirements.txt from this module's starter, re-stage, then run:

    ./security-scan.sh
    

    It should fail on both gates — the SCA gate on the unresolvable/vulnerable dependencies and the secret gate on the hardcoded key — and you should be able to point at which finding caused each non-zero exit. Re-apply your Part B/C fixes (and re-stage), run it once more, and it should pass.

  2. Merge the security steps into your pipeline. lab/ci-security.yml shows the gate as a self-contained, provider-neutral job — check out, set up Python, install the scanners, run the script. But the check job you built in Module 14 already checks out the code and sets up Python, so you don't want a second job duplicating that work. You want its two new steps — install the scanners and run the gate — added to the steps you already have. (Checkout and Python are in the snippet only so it reads as a complete example; skip them when you merge.)

    Here is exactly where they go. Before — the tail of your Module 14 check job (GitHub Actions flavor, matching ci-starter.yml; on GitLab the same two steps drop into the job's script:):

    jobs:
      check:
        runs-on: ubuntu-latest
        steps:
          - name: Check out the code
            uses: actions/checkout@v7
          - name: Set up Python
            uses: actions/setup-python@v6
            with:
              python-version: "3.12"
          - name: Install tools
            run: pip install ruff
          - name: Lint
            run: ruff check .
          - name: Test
            run: python -m unittest
    

    After — the same job with the two security steps appended; nothing else changes:

           - name: Lint
             run: ruff check .
           - name: Test
             run: python -m unittest
    +      - name: Install scanners
    +        run: pip install pip-audit detect-secrets
    +      - name: Run the security gate
    +        run: |
    +          chmod +x security-scan.sh
    +          ./security-scan.sh
    

    YAML is indentation-sensitive — match the existing steps' indentation exactly. Each new - name: lines up in the same column as the steps above it, and the keys under it (run:) sit one level deeper. A step pasted even one space off will silently attach to the wrong block or fail to parse, and the whole workflow breaks. If you'd rather keep the gate as its own job (some teams prefer the isolation), copy ci-security.yml in whole as a second job under jobs: in the same workflow file instead — that is exactly why it carries its own checkout and Python steps. The shape — install tools, run the gate, fail on findings — is identical everywhere.

  3. Prove the gate has teeth: re-introduce the hardcoded key in config.py, commit, and push. Watch the pipeline go red on the security step even though lint, build, and tests are still green. Remove it, push again, watch it go green. That red-then-green is the whole module in one push.


Where it breaks

The honest limits — these gates are necessary, not sufficient:

  • A clean scan is not a safe codebase. Scanners find known vulns and recognizable patterns. A novel logic flaw, a business-logic auth bypass, or a brand-new zero-day in a dependency all pass clean. "No findings" means "none of the things these tools know about," not "secure." Human review (Module 10) and SAST tuning still matter.
  • The secret that already leaked. Catching a secret in CI is great; if it was pushed last month, the gate is closing the barn door. The credential must be assumed compromised and rotated, and scrubbing it from history is a separate, harder, recovery-grade job. Prevention (Module 17) beats detection here.
  • False positives are real and they erode trust. SAST especially will flag things that aren't exploitable in your context. If every push has noise, people start ignoring red — the worst outcome. Budget time to tune rulesets and triage findings, or the gate becomes decoration.
  • SCA depends on a manifest it can read. If dependencies aren't declared in a file the scanner understands (a pinned requirements/lock file, a package manifest), it can't see them. Vendored code, dynamically downloaded packages, and "just pip install whatever" workflows are blind spots.
  • A 404 today can be malware tomorrow. A hallucinated name that doesn't resolve now is safe now; nothing stops an attacker registering it next week. The durable defense isn't "the scan was clean," it's the habit of never adding an AI-suggested dependency without verifying it's the real, intended, widely-used project.
  • Scanners scan; they don't decide. A finding is information, not a verdict. Whether a given advisory actually affects you (is the vulnerable code path even reachable?) is a judgment call the tool can't make. The gate's job is to put the question in front of a human, not to answer it.

Check for understanding

You're done when:

  • You can state, without looking back, the three classes of risk AI introduces that a green build won't catch — and which gate catches each.
  • You can explain slopsquatting to a colleague in two sentences, including why registering a hallucinated name works as an attack.
  • Running ./security-scan.sh on the unmodified starter files fails, and on your fixed files passes — and you understand which finding each exit reflects.
  • You've pushed a commit with a planted secret and watched your CI pipeline go red on the security step while lint/build/test stayed green, then watched it go green after the fix.
  • You can say what a clean scan does and doesn't prove.

When a failing security gate feels like the pipeline doing its job — not an obstacle — you're ready for Module 16, where containers make the environment your code (and these scanners) run in reproducible.


Verify-before-publish

Expansion-zone module — these facts move fast. Re-check at build/publish time; don't ship the claims above from memory.

  • Pinned CI action versions. The ci-security.yml snippet (and the Part D before/after diff) pin actions/checkout and actions/setup-python to major versions (@v7/@v6 at build time). Pinned majors age — confirm they're current and not deprecated against the host's docs, the same check the Module 14 and Module 18 CI/CD checklists carry.
  • Scanner names and install methods. Confirm pip-audit, detect-secrets, and bandit are still maintained and still install as shown. If any has stalled, swap in a current equivalent from the same category and keep the prose category-first, not tool-first.
  • Category roster. Verify the named alternatives still exist and are reasonable to recommend: SCA (Trivy, Grype, OWASP Dependency-Check, Snyk, Safety, language-native npm audit etc.); secret scanning (gitleaks, trufflehog, git-secrets, detect-secrets); SAST (Semgrep, CodeQL, SonarQube, Bandit, language-native security linters). Add/remove as the landscape shifts.
  • Host-native features. The major hosts' free offerings (dependency alerts, automated fix PRs, secret push-protection) change names and availability. Confirm what's actually free vs. paid at publish time rather than naming a specific product tier.
  • Slopsquatting framing. Re-check the current research on AI package-hallucination rates and any newly-reported real-world slopsquatting incidents. Keep the figure qualitative ("a meaningful fraction") unless you can cite a current, specific source.
  • The planted vulnerable dependency in lab/requirements.txt. Confirm the pinned version still trips an advisory in the scanner (advisory databases get reorganized and old entries occasionally change shape). Re-pin to a currently-flagged version if needed so Part B actually fires.
  • The hallucinated/typosquatted names in lab/requirements.txt. Confirm they still do not resolve on the public index (someone may have since registered one — which would, ironically, make the slopsquatting point for you, but breaks the lab's "resolution fails" step). Swap for a currently-nonexistent plausible name if so.