De-slop: remove every em-dash + banned words across all modules + capstone (#94)

Co-authored-by: claude <claude@jpaul.io>
Co-committed-by: claude <claude@jpaul.io>
2026-06-22 23:21:22 -04:00
parent 513d7e7ac8
commit c098933f25
99 changed files with 1324 additions and 1315 deletions
@@ -1,6 +1,6 @@
-# Module 15 — Security Scanning for AI-Generated Code
+# Module 15: Security Scanning for AI-Generated Code

-> **Your build is green, your tests pass, and the AI just imported a package that doesn't exist —
+> **Your build is green, your tests pass, and the AI just imported a package that doesn't exist,
 > or one an attacker registered last week using exactly the name LLMs like to invent.** CI proves
 > the code *runs*; it says nothing about whether it's *safe*. This module adds the gates that catch
 > what a build check structurally can't.
@@ -9,18 +9,18 @@

 ## Prerequisites

-- **Module 14 — Continuous Integration.** You have a pipeline that runs lint, build, and tests on
+- **Module 14: Continuous Integration.** You have a pipeline that runs lint, build, and tests on
  every push. Security scanning is *more gates on that same pipeline*, so you need somewhere to bolt
  them on.
-- **Module 2 — Version Control as a Safety Net.** Scanners flag findings in a diff; you'll commit,
+- **Module 2: Version Control as a Safety Net.** Scanners flag findings in a diff; you'll commit,
  re-scan, and confirm a gate goes red then green. Secret scanning in particular cares about *history*,
  not just the working tree; that only makes sense once you think in commits.
-- **Module 1 — the `tasks-app`.** The running example. We'll let the AI bolt a "cloud sync" feature
+- **Module 1: the `tasks-app`.** The running example. We'll let the AI bolt a "cloud sync" feature
  onto it and watch it introduce all three failure modes at once.

-Helpful but not required: **Module 8 (remotes/hosting)** — host-native scanning (Dependabot-style
-alerts, push protection) lives on the remote; **Module 10 (reviewing code you didn't write)** —
-scanners are the automated half of that review. Secrets get a full treatment of their own in
+Helpful but not required: **Module 8 (remotes/hosting)** gives you host-native scanning (Dependabot-style
+alerts, push protection) that lives on the remote; **Module 10 (reviewing code you didn't write)** frames
+scanners as the automated half of that review. Secrets get a full treatment of their own in
 **Module 17**; this module's job is to *catch* them, not to manage them.

 ---
@@ -33,11 +33,11 @@ By the end of this module you can:
   vulnerable dependencies, hardcoded secrets, and hallucinated/typosquatted packages.
 2. Explain **slopsquatting** and why AI-suggested dependencies are a live supply-chain attack vector,
   not a hypothetical one.
-3. Run the three automated gates locally — **SCA (dependency scanning)**, **secret scanning**, and
-   **SAST (static analysis)** — and read their output for real signal vs. noise.
+3. Run the three automated gates locally and read their output for real signal vs. noise:
+   **SCA (dependency scanning)**, **secret scanning**, and **SAST (static analysis)**.
 4. Wire those gates into the Module 14 pipeline so a planted secret or a fake dependency turns the
   build red *before* it merges.
-5. Reason about each gate's limits — false positives, the secret that's already leaked, and what
+5. Reason about each gate's limits: false positives, the secret that's already leaked, and what
   "no findings" does and doesn't prove.

 ---
@@ -57,13 +57,13 @@ That's a question about **behavior the tests exercise.** None of the following c
  the injection case is never exercised. Green.

 CI is a *functional* gate. Security scanning is a *non-functional* gate that asks a different
-question — *is this code safe to ship?* — and it asks it the only way that scales: automatically, on
+question (*is this code safe to ship?*), and it asks it the only way that scales: automatically, on
 every push, with no human remembering to look. You are adding three checkers that each know a class
 of problem your tests structurally cannot see.

 The reframe for this audience: you already gate merges on "tests pass." You're now adding "no known
-vulns, no secrets, no obvious injection" to the same gate. It's the same instinct — *don't let bad
-things through automatically* — pointed at a different failure mode.
+vulns, no secrets, no obvious injection" to the same gate. It's the same instinct, *don't let bad
+things through automatically*, pointed at a different failure mode.

 ### The three gates

@@ -71,13 +71,13 @@ things through automatically* — pointed at a different failure mode.
 |------|---------|------------------|
 | **SCA** (Software Composition Analysis) | Known-vulnerable, abandoned, or **non-existent** dependencies | Dependency/vulnerability scanners |
 | **Secret scanning** | Credentials committed into source or git history | Entropy + pattern matchers over files and commits |
-| **SAST** (Static Application Security Testing) | Insecure code *you wrote* — injection, weak crypto, unsafe deserialization | Static analyzers / linters with a security ruleset |
+| **SAST** (Static Application Security Testing) | Insecure code *you wrote*: injection, weak crypto, unsafe deserialization | Static analyzers / linters with a security ruleset |

 SCA and SAST split the world cleanly: **SCA scans the code you didn't write (your dependencies);
 SAST scans the code you did.** Secret scanning cuts across both: a leaked key is neither a
 dependency nor a logic bug, it's a string that should never have been committed.

-### Gate 1 — SCA: scanning the code you didn't write
+### Gate 1 (SCA): scanning the code you didn't write

 Modern software is mostly other people's code. A ten-line script can pull in a hundred transitive
 dependencies, any of which can have a published vulnerability. SCA tools resolve your full dependency
@@ -96,8 +96,8 @@ service and the model will `import` or list a dependency that *sounds* exactly r
 rare; studies of AI-generated code find a meaningful fraction of suggested packages are
 hallucinations, and crucially, **the model hallucinates the same plausible names repeatedly.**

-Attackers noticed. The attack — nicknamed **slopsquatting** (typosquatting, but aimed at LLM "slop"
-rather than human typos) — is:
+Attackers noticed. The attack, nicknamed **slopsquatting** (typosquatting, but aimed at LLM "slop"
+rather than human typos), is:

 1. Watch what package names LLMs commonly invent.
 2. Register those exact names on the public package index, with malware inside.
@@ -118,7 +118,7 @@ The habit to build: **a dependency the AI added is an untrusted claim until you
 real, is the one you meant, and is widely used.** Treat the requirements file the AI hands you the
 same way you'd treat a stranger handing you a USB stick.
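
The "untrusted claim" habit can be sketched as a small pre-check that runs before anything is installed. This is a minimal sketch, not part of the lab: `requirement_names`, `unverified`, and the hand-maintained `vetted` set are hypothetical names introduced here for illustration, and the check deliberately does no network lookup (a name that resolves on the index is not thereby trustworthy).

```python
import re
from typing import Callable, Iterable

# Matches the package name at the start of a requirement line, e.g. "requests==2.19.1".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(lines: Iterable[str]) -> list[str]:
    """Extract lowercase package names, skipping comments and blank lines."""
    names = []
    for line in lines:
        stripped = line.split("#", 1)[0].strip()
        if not stripped:
            continue
        match = _NAME.match(stripped)
        if match:
            names.append(match.group(1).lower())
    return names


def unverified(names: Iterable[str], is_known: Callable[[str], bool]) -> list[str]:
    """Return every name that a human has not yet vetted (hypothetical policy)."""
    return [name for name in names if not is_known(name)]


if __name__ == "__main__":
    vetted = {"requests"}  # assumption: a list you maintain by hand after checking each project
    sample = ["requests==2.19.1", "reqeusts==2.31.0", "task-cloud-sync-client==1.4.2"]
    print(unverified(requirement_names(sample), vetted.__contains__))
```

Anything the function returns is a prompt for a human to go look at the real project's home page, not a name to "correct" automatically.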

-### Gate 2 — Secret scanning
+### Gate 2 (secret scanning)

 AI loves to hardcode credentials. Ask for code that calls an authenticated API and a model will
 write `API_KEY = "sk-live-..."` straight into the source, because that makes the example
@@ -126,9 +126,9 @@ write `API_KEY = "sk-live-..."` straight into the source, because that makes the

 Secret scanners catch this by scanning files (and crucially, **git history**) for two signals:

-- **Known patterns** — provider key formats (cloud access keys, tokens with recognizable prefixes,
+- **Known patterns**: provider key formats (cloud access keys, tokens with recognizable prefixes,
  private-key PEM headers, connection strings).
-- **High entropy** — random-looking strings that statistically resemble a generated credential even
+- **High entropy**: random-looking strings that statistically resemble a generated credential even
  when they match no known pattern.
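
The two signals above can be illustrated in a few lines. This is a rough sketch of the idea only, not how `detect-secrets` or any particular scanner is implemented: the single PEM-header regex stands in for a much larger pattern catalog, and the `min_length` and `threshold` values are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Known-pattern signal: a PEM private-key header (one of many provider formats).
PEM_HEADER = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character; 0.0 for empty or single-symbol strings."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def looks_like_secret(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag a token on either signal: a known pattern, or a long high-entropy string."""
    if PEM_HEADER.search(token):
        return True
    return len(token) >= min_length and shannon_entropy(token) > threshold
```

The entropy check is what produces false positives on hashes, UUIDs, and test fixtures, which is why real scanners pair it with allowlists and baselines.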

 The non-obvious part for this audience: **a secret committed once is leaked forever.** Deleting it in
@@ -137,18 +137,18 @@ a later commit doesn't help; it's still sitting in history, and anyone with the
 a true hit means two jobs, not one: (1) get it out of the code, and (2) **rotate the credential**,
 because you must assume it's compromised. Scrubbing history is harder than it looks and is a
 recovery-grade operation (Module 12 territory). The cheap win is catching it *before* it's ever
-pushed — which is exactly why this gate belongs in the pipeline and, ideally, in a pre-commit hook.
+pushed, which is exactly why this gate belongs in the pipeline and, ideally, in a pre-commit hook.

-This module catches the secret. *Managing* secrets properly — env vars, secret stores, per-environment
-config so the AI never has a key to hardcode in the first place — is **Module 17**. Gate 2 is the
+This module catches the secret. *Managing* secrets properly (env vars, secret stores, per-environment
+config so the AI never has a key to hardcode in the first place) is **Module 17**. Gate 2 is the
 tripwire that proves you need it.

-### Gate 3 — SAST: scanning the code you did write
+### Gate 3 (SAST): scanning the code you did write

 SAST analyzes *your* source for insecure patterns without running it: SQL built by string
 concatenation, shell commands assembled from user input, weak or misused crypto, unsafe
 deserialization, paths built from untrusted input. It's a linter (Module 14) with a security
-ruleset — same machinery, different question.
+ruleset; same machinery, different question.
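
To make the first of those patterns concrete, here is a small sketch of string-concatenated SQL next to its parameterized equivalent, using the standard-library `sqlite3` module. The `tasks` table, its columns, and the function names are hypothetical; the module does not specify how `tasks-app` stores data.

```python
import sqlite3


def find_tasks_unsafe(conn: sqlite3.Connection, owner: str) -> list[str]:
    # Insecure: user input is spliced into the SQL text. This is the shape SAST rules look for.
    query = "SELECT title FROM tasks WHERE owner = '" + owner + "'"
    return [row[0] for row in conn.execute(query)]


def find_tasks_safe(conn: sqlite3.Connection, owner: str) -> list[str]:
    # Parameterized: the driver passes `owner` as data, never as SQL.
    rows = conn.execute("SELECT title FROM tasks WHERE owner = ?", (owner,))
    return [row[0] for row in rows]


def demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two owners (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (title TEXT, owner TEXT)")
    conn.executemany(
        "INSERT INTO tasks VALUES (?, ?)",
        [("buy milk", "ana"), ("file taxes", "ben")],
    )
    return conn
```

Both functions return the same rows for an ordinary owner name, so a functional test that only passes well-formed input stays green on either one. The difference only shows up when the input contains a quote.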

 Why it earns a place specifically for AI code: a model reproduces the patterns it was trained on, and
 the internet is full of insecure examples. It will write the string-concatenated SQL query because a
@@ -164,12 +164,12 @@ ignored red noise if you don't.

 You want these in more than one place, cheapest-and-earliest first:

-- **Local / pre-commit** — fastest feedback, and the only place that stops a secret *before* it
+- **Local / pre-commit**: fastest feedback, and the only place that stops a secret *before* it
  enters history. A pre-commit hook running secret scanning is the single highest-value placement.
-- **CI (the Module 14 pipeline)** — the enforcement gate. Local hooks can be skipped; the pipeline
+- **CI (the Module 14 pipeline)**: the enforcement gate. Local hooks can be skipped; the pipeline
  can't be, if you require it to pass before merge. This is where "the build goes red" actually
  blocks a merge.
-- **Host-native, on the remote** — most git hosts (Module 8) offer some of this for free:
+- **Host-native, on the remote**: most git hosts (Module 8) offer some of this for free:
  dependency alerts that watch your manifest against advisory feeds and open issues/PRs when a new
  CVE drops, and push protection that rejects a commit containing a recognized secret at the server.
  Turn these on; they cover the long tail (a CVE published *after* you merged) that a one-shot CI run
@@ -192,12 +192,12 @@ and does it in the exact form that slips past a human skim and a green build:
 - **It hardcodes secrets** because hardcoding makes the example run, and running is what the model is
  rewarded for. The instinct that "this string is dangerous" is exactly the instinct it lacks.
 - **It reproduces insecure idioms** by default, because plausible-looking code is the
-  whole game, and insecure code is extremely plausible: it's all over the training data.
+  whole game, and insecure code is plausible by default: it's all over the training data.

 And the volume multiplies all of it. You're merging more code, faster, with less of it read
 line-by-line, precisely because the AI made generation cheap. The one defense that scales with that
 volume is the one that doesn't depend on a human remembering to look. That's these gates. You don't
-add them *despite* using AI — using AI is what moves them from "nice to have" to "required."
+add them *despite* using AI; using AI is what moves them from "nice to have" to "required."

 ---

@@ -208,7 +208,7 @@ scanners (both pip-installable, cross-platform), let the AI introduce all three
 and wire the catch into your pipeline.

 > **Windows note:** the scanner *commands* are identical everywhere. The wrapper script
-> `lab/security-scan.sh` is bash — run it from Git Bash or WSL, or just run the three commands it
+> `lab/security-scan.sh` is bash; run it from Git Bash or WSL, or just run the three commands it
 > contains directly in PowerShell. Nothing in the lab needs a specific shell beyond that.

 **You'll need:**
@@ -234,7 +234,7 @@ and wire the catch into your pipeline.

 - Your coding agent (Claude Code is the worked example; sub your own).

-### Part A — Let the AI introduce the problems
+### Part A: Let the AI introduce the problems

 Direct your agent (Claude Code is the worked example; sub your own) to place this module's starter
 files: *"Copy `~/ai-workflow-course/modules/15-security-scanning/lab/config.py` and
@@ -255,7 +255,7 @@ to a cloud API, and give me a requirements.txt for it."* You'll very likely get
 at least one questionable dependency for free. Use the provided files if you want the lab to be
 reproducible.

-### Part B — Gate 1: SCA, and meeting a hallucinated package
+### Part B (Gate 1): SCA, and meeting a hallucinated package

 From the repo, try to resolve the AI's dependencies. Running the scanner is the lesson, so you run it
 by hand:
@@ -267,7 +267,7 @@ pip-audit -r requirements.txt

 It fails before it can audit anything: the resolver can't find one or more packages. **That's
 slopsquatting's first tripwire.** Read the error; it names the package it couldn't resolve. Now make
-the call this module is really about, and make it *yourself* — this is the human-in-the-loop judgment
+the call this module is really about, and make it *yourself*; this is the human-in-the-loop judgment
 no tool and no agent should make for you: *is this a typo I should "fix," or a name that should not
 exist?* Do **not** let the agent (or your own reflex) swap in the nearest real name; that reflex is
 exactly what the attack relies on. Confirm against the real project's home page which dependency was
@@ -287,7 +287,7 @@ to the fixed version the advisory names in requirements.txt."* Run `pip-audit` o
 clean. You've now exercised both halves of SCA: the package that *shouldn't exist*, and the package
 that exists but *shouldn't be at that version*.

-### Part C — Gate 2: secret scanning
+### Part C (Gate 2): secret scanning

 Scan for the hardcoded key yourself:

@@ -305,17 +305,17 @@ finding is gone. And say the quiet part out loud: **if that key had been real an
 removing it now is not enough; you'd have to rotate it,** because it's in history. (Proper secret
 management is Module 17; this is just the catch.)
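
For reference, the "read it from the environment" fix might look roughly like the sketch below. The variable name `SYNC_API_KEY` comes from the lab's `config.py`; the helper `load_sync_api_key` and its fail-loudly behavior are assumptions made for this example, not the lab's exact code.

```python
import os
from typing import Mapping


def load_sync_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the sync API key from the environment, or fail loudly if it is missing."""
    key = env.get("SYNC_API_KEY", "").strip()
    if not key:
        # No hardcoded fallback: a default value here would put a secret back in source.
        raise RuntimeError("SYNC_API_KEY is not set; refusing to fall back to a default")
    return key
```

Passing the environment in as a parameter keeps the function testable without touching the real process environment.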

-> **Stretch — Gate 3 (SAST):** install a static analyzer for your language (for Python,
-> `pip install bandit`, then `bandit -r .`) and watch it flag insecure *code you wrote* — here, the
+> **Stretch (Gate 3, SAST):** install a static analyzer for your language (for Python,
+> `pip install bandit`, then `bandit -r .`) and watch it flag insecure *code you wrote*: here, the
 > MD5-based request signing in `config.py` (weak crypto, CWE-327). Now note what it does **not**
 > flag: the hardcoded `SYNC_API_KEY`. Bandit's hardcoded-credential checks (B105–107) key on
-> *password-named* identifiers — `password`, `secret`, `token` — so a key named `SYNC_API_KEY` slips
+> *password-named* identifiers (`password`, `secret`, `token`), so a key named `SYNC_API_KEY` slips
 > right past them. Catching that string is a secret scanner's job (Gate 2), not SAST's. Same file,
-> two distinct flaws, caught by two different gates with two different blind spots — which is exactly
+> two distinct flaws, caught by two different gates with two different blind spots, which is exactly
 > why you run all three rather than trusting one. And note how much noisier SAST is than the first
 > two gates: that noise is why it's the one you tune.
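
The module flags the MD5 signing but does not say what should replace it. One common choice, shown here purely as an assumption, is a keyed HMAC over SHA-256 from the standard library, with the key supplied by the caller rather than hardcoded. The two-argument `sign_payload` signature and `verify_payload` are illustrative and differ from the one-argument function in `config.py`.

```python
import hashlib
import hmac


def sign_payload(body: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of `body` under `key`."""
    return hmac.new(key, body.encode(), hashlib.sha256).hexdigest()


def verify_payload(body: str, key: bytes, signature: str) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign_payload(body, key), signature)
```

Unlike a bare hash, an HMAC cannot be recomputed by someone who can read the body but does not hold the key, which is the property "signing" a request actually needs.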

-### Part D — Wire the gates into CI
+### Part D: Wire the gates into CI

 A scan you have to remember to run is a scan you'll skip. Move it into the Module 14 pipeline so it
 runs on every push and blocks the merge.
@@ -347,8 +347,8 @@ runs on every push and blocks the merge.
   ./security-scan.sh
   ```

-   It should **fail on both gates** — the SCA gate on the unresolvable/vulnerable dependencies and
-   the secret gate on the hardcoded key — and you should be able to point at which finding caused
+   It should **fail on both gates** (the SCA gate on the unresolvable/vulnerable dependencies and
+   the secret gate on the hardcoded key), and you should be able to point at which finding caused
   each non-zero exit. Direct your agent to re-apply your Part B/C fixes and re-stage, run the gate
   once more yourself, and it should pass.

@@ -366,7 +366,7 @@ runs on every push and blocks the merge.
   runs `./security-scan.sh` (chmod it first). Don't add a second job, and don't touch the checkout
   or Python steps."*

-   Here is exactly what the result should look like. **Before** — the tail of your Module 14 `check`
+   Here is exactly what the result should look like. **Before**: the tail of your Module 14 `check`
   job (GitHub Actions flavor, matching `ci-starter.yml`; on GitLab the same two steps drop into the
   job's `script:`):

@@ -389,7 +389,7 @@ runs on every push and blocks the merge.
           run: python -m unittest
   ```

-   **After** — the same job with the two security steps appended; nothing else changes:
+   **After**: the same job with the two security steps appended; nothing else changes:

   ```diff
          - name: Lint
@@ -425,7 +425,7 @@ runs on every push and blocks the merge.

 ## Where it breaks

-The honest limits — these gates are necessary, not sufficient:
+The honest limits (these gates are necessary, not sufficient):

 - **A clean scan is not a safe codebase.** Scanners find *known* vulns and *recognizable* patterns. A
  novel logic flaw, a business-logic auth bypass, or a brand-new zero-day in a dependency all pass
@@ -456,16 +456,16 @@ The honest limits — these gates are necessary, not sufficient:
 **You're done when:**

 - You can state, without looking back, the three classes of risk AI introduces that a green build
-  won't catch — and which gate catches each.
+  won't catch, and which gate catches each.
 - You can explain slopsquatting to a colleague in two sentences, including *why* registering a
  hallucinated name works as an attack.
 - Running `./security-scan.sh` on the unmodified starter files **fails**, and on your fixed files
-  **passes** — and you understand which finding each exit reflects.
+  **passes**, and you understand which finding each exit reflects.
 - You've pushed a commit with a planted secret and watched your CI pipeline go red on the security
  step while lint/build/test stayed green, then watched it go green after the fix.
 - You can say what a *clean* scan does and doesn't prove.

-When a failing security gate feels like the pipeline doing its job — not an obstacle — you're ready
+When a failing security gate feels like the pipeline doing its job, not an obstacle, you're ready
 for Module 16, where containers make the environment your code (and these scanners) run in
 reproducible.

@@ -473,12 +473,12 @@ reproducible.

 ## Verify-before-publish

-> **Expansion-zone module — these facts move fast.** Re-check at build/publish time; don't ship the
+> **Expansion-zone module: these facts move fast.** Re-check at build/publish time; don't ship the
 > claims above from memory.

 - [ ] **Pinned CI action versions.** The `ci-security.yml` snippet (and the Part D before/after diff)
      pin `actions/checkout` and `actions/setup-python` to major versions (`@v7`/`@v6` at build time).
-      Pinned majors age — confirm they're current and not deprecated against the host's docs, the same
+      Pinned majors age; confirm they're current and not deprecated against the host's docs, the same
      check the Module 14 and Module 18 CI/CD checklists carry.
 - [ ] **Scanner names and install methods.** Confirm `pip-audit`, `detect-secrets`, and `bandit` are
      still maintained and still install as shown. If any has stalled, swap in a current equivalent
@@ -498,6 +498,6 @@ reproducible.
      occasionally change shape). Re-pin to a currently-flagged version if needed so Part B actually
      fires.
 - [ ] **The hallucinated/typosquatted names in `lab/requirements.txt`.** Confirm they still do **not**
-      resolve on the public index (someone may have since registered one — which would, ironically,
+      resolve on the public index (someone may have since registered one, which would, ironically,
      make the slopsquatting point for you, but breaks the lab's "resolution fails" step). Swap for a
      currently-nonexistent plausible name if so.
@@ -1,4 +1,4 @@
-# ci-security.yml — the security gate as a CI step (Module 15).
+# ci-security.yml: the security gate as a CI step (Module 15).
 #
 # This is a PROVIDER-NEUTRAL snippet, not a drop-in file. The YAML below uses the widely-shared
 # "workflow / job / steps" shape that most hosted and self-hosted CI systems understand (the exact
@@ -24,7 +24,7 @@ jobs:
      - name: Check out the code
        uses: actions/checkout@v7
        # Secret scanning cares about history. If your tool scans commits (not just the working
-        # tree), fetch full history here — e.g. set `with: { fetch-depth: 0 }`.
+        # tree), fetch full history here; e.g. set `with: { fetch-depth: 0 }`.

      - name: Set up Python
        uses: actions/setup-python@v6
@@ -1,4 +1,4 @@
-"""Cloud-sync config for tasks-app — a realistic snapshot of what an AI hands you.
+"""Cloud-sync config for tasks-app: a realistic snapshot of what an AI hands you.

 Asked to "sync tasks to a cloud service," a model will produce something like this: it works, it
 reads naturally, it passes lint and tests... and it carries two planted flaws: a live credential
@@ -24,15 +24,15 @@ def sync_headers() -> dict:

 # --- The problem the SAST scanner should flag (Gate 3) -----------------------------------------
 # AI-classic: "sign" the request body with a quick hash. MD5 is broken for anything
-# security-relevant — a textbook weak-crypto idiom. A secret scanner won't catch this (it's not a
+# security-relevant; a textbook weak-crypto idiom. A secret scanner won't catch this (it's not a
 # secret); a SAST tool like bandit will (it's insecure code you wrote). DO NOT imitate.
 def sign_payload(body: str) -> str:
    return hashlib.md5(body.encode()).hexdigest()


 # --- The fix (Part C) --------------------------------------------------------------------------
-# Read the secret from the environment instead of committing it. Proper secret management — env
-# files, secret stores, per-environment config — is Module 17. This is just enough to make the
+# Read the secret from the environment instead of committing it. Proper secret management (env
+# files, secret stores, per-environment config) is Module 17. This is just enough to make the
 # scanner go quiet honestly.
 #
 # import os
@@ -1,7 +1,7 @@
 # Dependencies an AI "suggested" for the tasks-app cloud-sync feature.
 #
 # This file is deliberately booby-trapped with the three things AI gets wrong about dependencies.
-# Read it before you run anything — every line looks plausible, which is the whole problem.
+# Read it before you run anything; every line looks plausible, which is the whole problem.
 #
 # Work through it in Part B of the lab:
 #   1) `pip-audit -r requirements.txt` will FAIL TO RESOLVE because of the bad names below.
@@ -14,11 +14,11 @@
 requests==2.19.1

 # (2) TYPOSQUAT of a real package ("requests"). One transposed letter. Does not exist on the
-#     public index today — the resolver will reject it. The danger isn't the 404; it's "fixing"
+#     public index today; the resolver will reject it. The danger isn't the 404; it's "fixing"
 #     it by guessing instead of verifying what was actually meant.
 reqeusts==2.31.0

-# (3) HALLUCINATION — a plausible-but-invented name the model produced from thin air. This is the
+# (3) HALLUCINATION: a plausible-but-invented name the model produced from thin air. This is the
 #     slopsquatting target: register this name with malware and the next person to `pip install`
 #     gets owned. Confirm it does not resolve; never add it without verifying the real project.
 task-cloud-sync-client==1.4.2
@@ -1,12 +1,12 @@
 #!/usr/bin/env bash
 #
-# security-scan.sh — the security gate for tasks-app (Module 15).
+# security-scan.sh: the security gate for tasks-app (Module 15).
 #
 # Runs two scanners and exits non-zero if EITHER finds something. That non-zero exit is what turns
 # a CI run red (Module 14). One script, two homes: run it by hand for fast local feedback, and call
 # it from the pipeline so the same definition of "a finding" enforces the merge.
 #
-# These two tools (pip-audit, detect-secrets) are concrete examples of their categories — SCA and
+# These two tools (pip-audit, detect-secrets) are concrete examples of their categories, SCA and
 # secret scanning. Swap in any equivalent; keep the contract the same: scan, print, fail on findings.
 #
 # Usage:   ./security-scan.sh
@@ -30,7 +30,7 @@ if [ -f requirements.txt ]; then
    status=1
  fi
 else
-  echo "(no requirements.txt found — skipping SCA)"
+  echo "(no requirements.txt found; skipping SCA)"
 fi

 echo
@@ -38,7 +38,7 @@ echo "=== Gate 2: secret scan (detect-secrets) ==="
 # detect-secrets prints a JSON report of any secrets it finds. NOTE: with no path it scans the files
 # git TRACKS, so stage the starter files (`git add`) before running this, or an untracked file is
 # invisible to the gate. We parse the JSON with `python3` (no jq dependency) and fail CLOSED: the
-# parser returns 0=secrets found, 1=clean, anything else=couldn't tell — and "couldn't tell" must
+# parser returns 0=secrets found, 1=clean, anything else=couldn't tell; "couldn't tell" must
 # count as a failure, never a silent pass.
 report="$(detect-secrets scan)"
 printf '%s' "$report" | python3 -c 'import sys, json