ai-workflow-course/modules/15-security-scanning/README.md

# Module 15 — Security Scanning for AI-Generated Code

> **Your build is green, your tests pass, and the AI just imported a package that doesn't exist —
> or one an attacker registered last week using exactly the name LLMs like to invent.** CI proves
> the code *runs*; it says nothing about whether it's *safe*. This module adds the gates that catch
> what a build check structurally can't.

---

## Prerequisites

- **Module 14 — Continuous Integration.** You have a pipeline that runs lint, build, and tests on
  every push. Security scanning is *more gates on that same pipeline*, so you need somewhere to bolt
  them on.
- **Module 2 — Version Control as a Safety Net.** Scanners flag findings in a diff; you'll commit,
  re-scan, and confirm a gate goes red then green. Secret scanning in particular cares about *history*,
  not just the working tree — that only makes sense once you think in commits.
- **Module 1 — the `tasks-app`.** The running example. We'll let the AI bolt a "cloud sync" feature
  onto it and watch it introduce all three failure modes at once.

Helpful but not required: **Module 8 (remotes/hosting)** — host-native scanning (Dependabot-style
alerts, push protection) lives on the remote; **Module 10 (reviewing code you didn't write)** —
scanners are the automated half of that review. Secrets get a full treatment of their own in
**Module 17**; this module's job is to *catch* them, not to manage them.

---

## Learning objectives

By the end of this module you can:

1. Name the three classes of risk AI introduces that a build-and-test pipeline will happily pass:
   vulnerable dependencies, hardcoded secrets, and hallucinated/typosquatted packages.
2. Explain **slopsquatting** and why AI-suggested dependencies are a live supply-chain attack vector,
   not a hypothetical one.
3. Run the three automated gates locally — **SCA (dependency scanning)**, **secret scanning**, and
   **SAST (static analysis)** — and read their output for real signal vs. noise.
4. Wire those gates into the Module 14 pipeline so a planted secret or a fake dependency turns the
   build red *before* it merges.
5. Reason about each gate's limits — false positives, the secret that's already leaked, and what
   "no findings" does and doesn't prove.

---

## Key concepts

### Why CI passing is not the same as safe

Module 14's pipeline answers one question: *does this code build, lint clean, and pass its tests?*
That's a question about **behavior the tests exercise.** None of the following change the answer:

- A dependency three levels down has a known remote-code-execution CVE. The code still imports it,
  still runs, tests still pass. Green.
- An API key is hardcoded in a source file. It's a perfectly valid string literal. Lint is happy,
  tests are happy. Green.
- The AI used a SQL query built by string concatenation. The happy-path test passes a normal title;
  the injection case is never exercised. Green.

CI is a *functional* gate. Security scanning is a *non-functional* gate that asks a different
question — *is this code safe to ship?* — and it asks it the only way that scales: automatically, on
every push, with no human remembering to look. You are adding three checkers that each know a class
of problem your tests structurally cannot see.

The reframe for this audience: you already gate merges on "tests pass." You're now adding "no known
vulns, no secrets, no obvious injection" to the same gate. It's the same instinct — *don't let bad
things through automatically* — pointed at a different failure mode.

### The three gates

| Gate | Catches | Category of tool |
|------|---------|------------------|
| **SCA** (Software Composition Analysis) | Known-vulnerable, abandoned, or **non-existent** dependencies | Dependency/vulnerability scanners |
| **Secret scanning** | Credentials committed into source or git history | Entropy + pattern matchers over files and commits |
| **SAST** (Static Application Security Testing) | Insecure code *you wrote* — injection, weak crypto, unsafe deserialization | Static analyzers / linters with a security ruleset |

SCA and SAST split the world cleanly: **SCA scans the code you didn't write (your dependencies);
SAST scans the code you did.** Secret scanning cuts across both — a leaked key is neither a
dependency nor a logic bug, it's a string that should never have been committed.

### Gate 1 — SCA: scanning the code you didn't write

Modern software is mostly other people's code. A ten-line script can pull in a hundred transitive
dependencies, any of which can have a published vulnerability. SCA tools resolve your full dependency
tree and check every package and version against a vulnerability database (CVE feeds, the OSV
database, language-ecosystem advisory databases). Output is a list of "package X version Y has
advisory Z, fixed in version W."

This is well-trodden DevOps. What's *new* with AI is the failure mode at the bottom of the table:
the dependency that **doesn't exist at all.**

#### Slopsquatting: the AI supply-chain attack

LLMs generate plausible text, and a package name is plausible text. Ask for code that talks to a
service and the model will confidently `import` or list a dependency that *sounds* exactly right —
`requests-oauth`, `python-jsonlogger2`, `task-store-client` — but was never published. This isn't
rare; studies of AI-generated code find a meaningful fraction of suggested packages are
hallucinations, and crucially, **the model hallucinates the same plausible names repeatedly.**

Attackers noticed. The attack — nicknamed **slopsquatting** (typosquatting, but aimed at LLM "slop"
rather than human typos) — is:

1. Watch what package names LLMs commonly invent.
2. Register those exact names on the public package index, with malware inside.
3. Wait. The next developer who pastes AI output and runs `pip install -r requirements.txt`
   (or `npm install`) pulls your payload — which now runs with that developer's privileges, in their
   dev environment or, worse, in CI.

The defense has two layers, and SCA is where they live:

- **The package doesn't exist (yet).** The install or the resolver fails outright — "no matching
  distribution." Annoying, but *safe*: a name that 404s can't hurt you. The danger is treating that
  as a mere typo and "fixing" it by finding the closest real name without checking it.
- **The package exists but you didn't vet it.** This is the live wire. SCA flags newly-published,
  low-download, or known-malicious packages; combined with the discipline of *never installing a
  dependency the AI suggested without confirming it's the real, intended project*, it closes the gap.

The habit to build: **a dependency the AI added is an untrusted claim until you verify the package is
real, is the one you meant, and is widely used.** Treat the requirements file the AI hands you the
same way you'd treat a stranger handing you a USB stick.

### Gate 2 — Secret scanning

AI loves to hardcode credentials. Ask for code that calls an authenticated API and a model will
cheerfully write `API_KEY = "sk-live-..."` straight into the source, because that makes the example
*work* — and "make it work" is what it optimizes for. It has no instinct that the key is sensitive.

Secret scanners catch this by scanning files (and crucially, **git history**) for two signals:

- **Known patterns** — provider key formats (cloud access keys, tokens with recognizable prefixes,
  private-key PEM headers, connection strings).
- **High entropy** — random-looking strings that statistically resemble a generated credential even
  when they match no known pattern.

The non-obvious part for this audience: **a secret committed once is leaked forever.** Deleting it in
a later commit doesn't help — it's still sitting in history, and anyone with the repo can
`git log -p` their way to it. So secret scanning runs over *history*, not just the current files, and
a true hit means two jobs, not one: (1) get it out of the code, and (2) **rotate the credential**,
because you must assume it's compromised. Scrubbing history is harder than it looks and is a
recovery-grade operation (Module 12 territory). The cheap win is catching it *before* it's ever
pushed — which is exactly why this gate belongs in the pipeline and, ideally, in a pre-commit hook.

This module catches the secret. *Managing* secrets properly — env vars, secret stores, per-environment
config so the AI never has a key to hardcode in the first place — is **Module 17**. Gate 2 is the
tripwire that proves you need it.

### Gate 3 — SAST: scanning the code you did write

SAST analyzes *your* source for insecure patterns without running it: SQL built by string
concatenation, shell commands assembled from user input, weak or misused crypto, unsafe
deserialization, paths built from untrusted input. It's a linter (Module 14) with a security
ruleset — same machinery, different question.

Why it earns a place specifically for AI code: a model reproduces the patterns it was trained on, and
the internet is full of insecure examples. It will write the string-concatenated SQL query because a
million tutorials did. It looks idiomatic, it passes the happy-path test, and it's a vulnerability.
SAST flags the *shape* of the bug regardless of whether any test happens to trigger it.

SAST is also the noisiest of the three. Expect false positives, expect to tune the ruleset, and
expect to mark some findings "won't fix" with a reason. That's normal and it's why SAST is introduced
*after* the two higher-signal gates — it's the most valuable to tune and the easiest to turn into
ignored red noise if you don't.

### Where the gates run

You want these in more than one place, cheapest-and-earliest first:

- **Local / pre-commit** — fastest feedback, and the only place that stops a secret *before* it
  enters history. A pre-commit hook running secret scanning is the single highest-value placement.
- **CI (the Module 14 pipeline)** — the enforcement gate. Local hooks can be skipped; the pipeline
  can't be, if you require it to pass before merge. This is where "the build goes red" has teeth.
- **Host-native, on the remote** — most git hosts (Module 8) offer some of this for free:
  dependency alerts that watch your manifest against advisory feeds and open issues/PRs when a new
  CVE drops, and push protection that rejects a commit containing a recognized secret at the server.
  Turn these on; they cover the long tail (a CVE published *after* you merged) that a one-shot CI run
  never will.

The same scanner can run in all three. The lab uses one script you can run by hand *and* call from
CI, so there's one source of truth for "what counts as a finding."

---

## The AI angle

A generic DevSecOps course teaches these three gates too. What makes them *load-bearing* here is that
AI-assisted coding doesn't just fail to prevent these problems — it actively manufactures all three,
and does it in the exact form that slips past a human skim and a green build:

- **It invents dependencies.** Hallucinated package names are a failure mode unique to generated
  code, and slopsquatting turns that failure into an externally-exploitable supply-chain attack. No
  human typing dependencies by hand produces this risk at the same rate.
- **It hardcodes secrets** because hardcoding makes the example run, and running is what the model is
  rewarded for. The instinct that "this string is dangerous" is exactly the instinct it lacks.
- **It reproduces insecure idioms** with total confidence, because plausible-looking code is the
  whole game, and insecure code is extremely plausible — it's all over the training data.

And the volume multiplies all of it. You're merging more code, faster, with less of it read
line-by-line, precisely because the AI made generation cheap. The one defense that scales with that
volume is the one that doesn't depend on a human remembering to look. That's these gates. You don't
add them *despite* using AI — using AI is what moves them from "nice to have" to "required."

---

## Hands-on lab

**Lab language:** shell, driving Python tooling, on the `tasks-app` from Module 1. You'll install two
scanners (both pip-installable, cross-platform), let the AI introduce all three problems, catch them,
and wire the catch into your pipeline.

> **Windows note:** the scanner *commands* are identical everywhere. The wrapper script
> `lab/security-scan.sh` is bash — run it from Git Bash or WSL, or just run the three commands it
> contains directly in PowerShell. Nothing in the lab needs a specific shell beyond that.

**You'll need:**

- The `tasks-app` folder under version control from Module 2, and your CI pipeline from Module 14.
- Python 3.10+ and `pip`.
- Two scanners installed into your environment:

  ```bash
  pip install pip-audit detect-secrets
  ```

  > **If `pip install` is refused** with "externally-managed-environment" (PEP 668 — common on
  > recent Debian/Ubuntu and Homebrew Python), install into a per-project virtual environment
  > instead: `python3 -m venv .venv && source .venv/bin/activate` (Windows: `.venv\Scripts\activate`),
  > then re-run the install. (`pipx` or `pip install --break-system-packages` also work; a venv is the
  > clean default.)

  These are concrete, currently-maintained examples of the **SCA** and **secret-scanning**
  categories — not the only choices (see *Where it breaks* and *Verify-before-publish*). The lab
  teaches the moves; the moves transfer to any tool in the category.

- Your AI assistant (browser or editor-integrated — by now you have Module 4 tooling; either is fine).

### Part A — Let the AI introduce the problems

Copy this module's starter files into your project — they're a realistic snapshot of what an AI hands
you when you ask the `tasks-app` to "sync tasks to a cloud service":

- `lab/config.py` → a new module the AI "wrote," complete with a **hardcoded API key**.
- `lab/requirements.txt` → the dependencies the AI "suggested," containing a **vulnerable real
  package**, a **typosquatted** name, and a **hallucinated** name that doesn't exist.

Open both and read them. They look completely normal — that's the point. Nothing here would fail a
lint or a test.

If you'd rather generate them yourself, ask your AI: *"Add a module to tasks-app that syncs tasks to
a cloud API, and give me a requirements.txt for it."* You'll very likely get a hardcoded key and at
least one questionable dependency for free. Use the provided files if you want the lab to be
reproducible.

### Part B — Gate 1: SCA, and meeting a hallucinated package

Try to resolve the AI's dependencies:

```bash
pip-audit -r requirements.txt
```

It fails before it can audit anything — the resolver can't find one or more packages. **That's
slopsquatting's first tripwire.** Read the error: it names the package it couldn't resolve. Ask
yourself the dangerous question and answer it correctly: *is this a typo I should "fix," or a name
that should not exist?* Do **not** silently swap in the nearest real name — that's exactly the
reflex the attack relies on. Confirm against the real project's home page which dependency was
actually intended.

Now edit `requirements.txt`: comment out the typosquatted and hallucinated lines (the ones flagged as
unresolvable), leaving the real-but-vulnerable package. Re-run:

```bash
pip-audit -r requirements.txt
```

This time it resolves and reports a known vulnerability with an advisory ID and a fixed version. Bump
the pin to the fixed version and run it once more until it's clean. You've now exercised both halves
of SCA: the package that *shouldn't exist*, and the package that exists but *shouldn't be at that
version*.

### Part C — Gate 2: secret scanning

Scan for the hardcoded key:

```bash
detect-secrets scan config.py
```

The JSON output lists a detected secret with its file, line, and detector type. That's your tripwire
firing on the AI's hardcoded key.

Now do it right: remove the literal from `config.py` and read the key from the environment instead
(`os.environ`), then re-scan and confirm the finding is gone. And say the quiet part out loud — **if
that key had been real and ever pushed, removing it now is not enough; you'd have to rotate it,**
because it's in history. (Proper secret management is Module 17; this is just the catch.)

> **Stretch — Gate 3 (SAST):** install a static analyzer for your language (for Python,
> `pip install bandit`, then `bandit -r .`) and watch it flag insecure *code you wrote* — here, the
> MD5-based request signing in `config.py` (weak crypto, CWE-327). Now note what it does **not**
> flag: the hardcoded `SYNC_API_KEY`. Bandit's hardcoded-credential checks (B105–107) key on
> *password-named* identifiers — `password`, `secret`, `token` — so a key named `SYNC_API_KEY` slips
> right past them. Catching that string is a secret scanner's job (Gate 2), not SAST's. Same file,
> two distinct flaws, caught by two different gates with two different blind spots — which is exactly
> why you run all three rather than trusting one. And note how much noisier SAST is than the first
> two gates: that noise is why it's the one you tune.

### Part D — Wire the gates into CI

A scan you have to remember to run is a scan you'll skip. Move it into the Module 14 pipeline so it
runs on every push and blocks the merge.

1. Copy `lab/security-scan.sh` into your project. It runs the SCA and secret-scan gates and **exits
   non-zero on any finding** — which is what makes CI go red. Make it executable
   (`chmod +x security-scan.sh`).

   Before you run it, **stage the starter files** so the secret gate can see them:

   ```bash
   git add config.py requirements.txt
   ```

   This is not a footnote. `detect-secrets scan` with no path argument scans the files Git
   *tracks* — an *untracked* `config.py` is invisible to it, so the gate would report "no secrets"
   on a file that's full of them (a silent false pass, the worst kind). Staging puts the file in
   front of the scanner. It's the same reason the explicit `detect-secrets scan config.py` in
   Part C worked, and the same reason "secrets live in history": the moment Git knows about a file,
   so does the gate.

   To watch the gate catch both planted problems at once, restore the original booby-trapped files
   first (you fixed them in Parts B and C) — re-copy `config.py` and `requirements.txt` from this
   module's starter, re-stage, then run:

   ```bash
   ./security-scan.sh
   ```

   It should **fail on both gates** — the SCA gate on the unresolvable/vulnerable dependencies and
   the secret gate on the hardcoded key — and you should be able to point at which finding caused
   each non-zero exit. Re-apply your Part B/C fixes (and re-stage), run it once more, and it should
   pass.

2. Merge the security steps into your pipeline. `lab/ci-security.yml` shows the gate as a
   self-contained, provider-neutral job — check out, set up Python, install the scanners, run the
   script. But the `check` job you built in Module 14 *already* checks out the code and sets up
   Python, so you don't want a second job duplicating that work. You want its two **new** steps —
   **install the scanners** and **run the gate** — added to the steps you already have. (Checkout and
   Python are in the snippet only so it reads as a complete example; skip them when you merge.)

   Here is exactly where they go. **Before** — the tail of your Module 14 `check` job (GitHub Actions
   flavor, matching `ci-starter.yml`; on GitLab the same two steps drop into the job's `script:`):

   ```yaml
   jobs:
     check:
       runs-on: ubuntu-latest
       steps:
         - name: Check out the code
           uses: actions/checkout@v7
         - name: Set up Python
           uses: actions/setup-python@v6
           with:
             python-version: "3.12"
         - name: Install tools
           run: pip install ruff
         - name: Lint
           run: ruff check .
         - name: Test
           run: python -m unittest
   ```

   **After** — the same job with the two security steps appended; nothing else changes:

   ```diff
          - name: Lint
            run: ruff check .
          - name: Test
            run: python -m unittest
   +      - name: Install scanners
   +        run: pip install pip-audit detect-secrets
   +      - name: Run the security gate
   +        run: |
   +          chmod +x security-scan.sh
   +          ./security-scan.sh
   ```

   > **YAML is indentation-sensitive — match the existing steps' indentation exactly.** Each new
   > `- name:` lines up in the *same column* as the steps above it, and the keys under it (`run:`) sit
   > one level deeper. A step pasted even one space off will silently attach to the wrong block or
   > fail to parse, and the whole workflow breaks. If you'd rather keep the gate as its own job (some
   > teams prefer the isolation), copy `ci-security.yml` in whole as a second job under `jobs:` in the
   > same workflow file instead — that is exactly why it carries its own checkout and Python steps.
   > The *shape* — install tools, run the gate, fail on findings — is identical everywhere.

3. Prove the gate has teeth: re-introduce the hardcoded key in `config.py`, commit, and push. Watch
   the pipeline go **red** on the security step even though lint, build, and tests are still green.
   Remove it, push again, watch it go green. That red-then-green is the whole module in one push.

---

## Where it breaks

The honest limits — these gates are necessary, not sufficient:

- **A clean scan is not a safe codebase.** Scanners find *known* vulns and *recognizable* patterns. A
  novel logic flaw, a business-logic auth bypass, or a brand-new zero-day in a dependency all pass
  clean. "No findings" means "none of the things these tools know about," not "secure." Human review
  (Module 10) and SAST tuning still matter.
- **The secret that already leaked.** Catching a secret in CI is great; if it was pushed last month,
  the gate is closing the barn door. The credential must be assumed compromised and **rotated**, and
  scrubbing it from history is a separate, harder, recovery-grade job. Prevention (Module 17) beats
  detection here.
- **False positives are real and they erode trust.** SAST especially will flag things that aren't
  exploitable in your context. If every push has noise, people start ignoring red — the worst
  outcome. Budget time to tune rulesets and triage findings, or the gate becomes decoration.
- **SCA depends on a manifest it can read.** If dependencies aren't declared in a file the scanner
  understands (a pinned requirements/lock file, a package manifest), it can't see them. Vendored code,
  dynamically downloaded packages, and "just `pip install` whatever" workflows are blind spots.
- **A 404 today can be malware tomorrow.** A hallucinated name that doesn't resolve now is safe *now*;
  nothing stops an attacker registering it next week. The durable defense isn't "the scan was clean,"
  it's the *habit* of never adding an AI-suggested dependency without verifying it's the real,
  intended, widely-used project.
- **Scanners scan; they don't decide.** A finding is information, not a verdict. Whether a given
  advisory actually affects you (is the vulnerable code path even reachable?) is a judgment call the
  tool can't make. The gate's job is to put the question in front of a human, not to answer it.

---

## Check for understanding

**You're done when:**

- You can state, without looking back, the three classes of risk AI introduces that a green build
  won't catch — and which gate catches each.
- You can explain slopsquatting to a colleague in two sentences, including *why* registering a
  hallucinated name works as an attack.
- Running `./security-scan.sh` on the unmodified starter files **fails**, and on your fixed files
  **passes** — and you understand which finding each exit reflects.
- You've pushed a commit with a planted secret and watched your CI pipeline go red on the security
  step while lint/build/test stayed green, then watched it go green after the fix.
- You can say what a *clean* scan does and doesn't prove.

When a failing security gate feels like the pipeline doing its job — not an obstacle — you're ready
for Module 16, where containers make the environment your code (and these scanners) run in
reproducible.

---

## Verify-before-publish

> **Expansion-zone module — these facts move fast.** Re-check at build/publish time; don't ship the
> claims above from memory.

- [ ] **Pinned CI action versions.** The `ci-security.yml` snippet (and the Part D before/after diff)
      pin `actions/checkout` and `actions/setup-python` to major versions (`@v7`/`@v6` at build time).
      Pinned majors age — confirm they're current and not deprecated against the host's docs, the same
      check the Module 14 and Module 18 CI/CD checklists carry.
- [ ] **Scanner names and install methods.** Confirm `pip-audit`, `detect-secrets`, and `bandit` are
      still maintained and still install as shown. If any has stalled, swap in a current equivalent
      from the *same category* and keep the prose category-first, not tool-first.
- [ ] **Category roster.** Verify the named alternatives still exist and are reasonable to recommend:
      SCA (Trivy, Grype, OWASP Dependency-Check, Snyk, Safety, language-native `npm audit` etc.);
      secret scanning (gitleaks, trufflehog, git-secrets, detect-secrets); SAST (Semgrep, CodeQL,
      SonarQube, Bandit, language-native security linters). Add/remove as the landscape shifts.
- [ ] **Host-native features.** The major hosts' free offerings (dependency alerts, automated
      fix PRs, secret push-protection) change names and availability. Confirm what's actually free vs.
      paid at publish time rather than naming a specific product tier.
- [ ] **Slopsquatting framing.** Re-check the current research on AI package-hallucination rates and
      any newly-reported real-world slopsquatting incidents. Keep the figure qualitative
      ("a meaningful fraction") unless you can cite a current, specific source.
- [ ] **The planted vulnerable dependency in `lab/requirements.txt`.** Confirm the pinned version
      *still* trips an advisory in the scanner (advisory databases get reorganized and old entries
      occasionally change shape). Re-pin to a currently-flagged version if needed so Part B actually
      fires.
- [ ] **The hallucinated/typosquatted names in `lab/requirements.txt`.** Confirm they still do **not**
      resolve on the public index (someone may have since registered one — which would, ironically,
      make the slopsquatting point for you, but breaks the lab's "resolution fails" step). Swap for a
      currently-nonexistent plausible name if so.