fix(M7-27+capstone): apply AI-drives-git reframe, lesson=theory, de-slop course-wide

Phase 2 sweep: all modules are post-pivot, so the learner directs the AI agent (Claude Code as the worked example) to do the git/setup work and verifies, instead of typing commands by hand; no re-teaching basics. Lesson sections are theory with example output; all execution lives in the labs.

De-slopped ("prose" etc. gone course-wide, em-dash density thinned). /path/to placeholders -> ~/ai-workflow-course.

Every deliberate teaching device verified intact: M10 ai-change.patch trap, M12 bad-clear-snippet, M13/M27 planted pending_count bug, M15 secret+typosquat+MD5, M18 BREAK=1, M21 absent-.gitignore, M22 poisoned skill, M24 no-op patch, M25 --simulate. Labs compile/parse (py/sh/yaml/json); no junk.

Closes #83
Closes #86
Closes #89

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
2026-06-22 21:58:17 -04:00
parent a29823f4b3
commit f925fd9645
38 changed files with 1735 additions and 1424 deletions
@@ -1,8 +1,8 @@
 # Module 14 — Continuous Integration

-> **The AI writes code that looks right. CI is the tireless reviewer that checks whether it actually
-> is — automatically, on every single push, before anyone trusts it.** This module turns the tests
-> you wrote in Module 13 into a gate that runs itself.
+> **The AI writes code that looks right. CI checks whether it actually is: automatically, on every
+> push, before anyone trusts it.** This module turns the tests you wrote in Module 13 into a gate
+> that runs itself.

 ---

@@ -46,7 +46,7 @@ By the end of this module you can:

 Continuous Integration has a grand-sounding name and a mundane core: **a set of checks that run
 automatically whenever you push code, on a clean machine you don't control.** That's it. The checks
-are usually the same commands you'd run by hand — lint, build, test — and the magic is entirely in
+are usually the same commands you'd run by hand (lint, build, test), and the magic is entirely in
 the word *automatically*.

 You already run checks. Before you commit, you (sometimes) run the tests, (sometimes) run the
@@ -60,12 +60,12 @@ Three properties make CI more than a glorified shell script:
 - **It's triggered, not invoked.** You don't run CI; pushing runs it. The check is bound to the
  event, so it can't be skipped by forgetting.
 - **It runs on a clean machine.** The forge spins up a fresh, throwaway runner with nothing of yours
-  on it — no half-installed dependency, no environment variable you set six months ago and forgot.
+  on it: no half-installed dependency, no environment variable you set six months ago and forgot.
  If your code only works because of something special about your laptop, CI finds out immediately.
  ("Works on my machine" dies here. Module 16 takes the reproducibility idea further with
  containers.)
 - **Its result is visible and shared.** A green check or a red X shows up on the commit and on the
-  pull request (Module 10), where everyone — every human reviewer and, later, every agent — can see
+  pull request (Module 10), where everyone (every human reviewer and, later, every agent) can see
  whether this code passed the gate.

 ### The pipeline: checkout → setup → checks
@@ -81,7 +81,7 @@ That last point is the load-bearing one. CI's entire enforcement mechanism is th
 Every tool you'd run in a terminal returns 0 for success and non-zero for failure. `python -m
 unittest` exits non-zero if a test fails. `ruff check` exits non-zero if it finds a lint problem. CI runs your
 commands and watches those exit codes; one failure turns the run red. You're not learning a new
-testing system — you're wiring the tools you already have to a trigger.
+testing system; you're wiring the tools you already have to a trigger.
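
The exit-code mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular forge implements its runner: `run_checks` is a hypothetical helper, and the two stand-in commands simply exit 0 or 1 so the sketch runs anywhere. In a real pipeline the commands would be the project's own checks, such as `ruff check .` and `python -m unittest`.

```python
import subprocess
import sys


def run_checks(checks):
    """Run (name, argv) pairs in order and stop at the first non-zero exit code.

    Returns (failed_step_name, exit_code), or (None, 0) if every check passed.
    """
    for name, argv in checks:
        code = subprocess.run(argv).returncode
        if code != 0:
            return name, code  # one failure is enough to turn the run red
    return None, 0  # every command exited 0: the run is green


# Stand-in commands (assumptions for illustration only).
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

green = run_checks([("lint", passing), ("test", passing)])
red = run_checks([("lint", failing), ("test", passing)])
print(green, red)
```

The runner never inspects what the tools print; it only reads the exit code, which is why any command-line tool that follows the 0/non-zero convention can act as a CI check.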

 ### What goes in a CI run for this audience

@@ -136,13 +136,13 @@ Reading it top to bottom: `on:` is the trigger (push and pull request). `runs-on
 machine. The `steps:` are the four moves — checkout, set up Python, install the tools, then the two
 checks. `uses:` pulls in a pre-built action (someone else's reusable step); `run:` is just a shell
 command. The linter runs first because it's cheap; the tests run last because they're the
-expensive, decisive check. Only the linter needs a `pip install` here — the tests run on Python's
+expensive, decisive check. Only the linter needs a `pip install` here; the tests run on Python's
 standard-library `unittest` runner from Module 13, so there's nothing to install for them.

-This file lives *in the repo*, committed and versioned like everything else. That's deliberate and
-on-thesis: your pipeline is code, it's reviewed as a diff in a PR (Module 10), and a teammate or an
-agent inherits it automatically by cloning. The same logic as committing the AI's config in
-Module 5 — the automation around your work is itself a durable, shared artifact.
+This file lives *in the repo*, committed and versioned like everything else. That's deliberate:
+your pipeline is code, it's reviewed as a diff in a PR (Module 10), and a teammate or an agent
+inherits it automatically by cloning. It's the same logic as committing the AI's config in
+Module 5: the automation around your work is itself a durable, shared artifact.

 ### Reading a failed run

@@ -154,32 +154,32 @@ When CI goes red, the skill is triage, and it's fast once you know the shape:
 3. **Read that step's log.** It's the same output the tool prints in your terminal — a failing
   `unittest` assertion, a `ruff` finding with a file and line number. CI didn't invent a new error
   format; it's showing you the command's own output.
-4. **Reproduce it locally.** Run the exact command from the failed step (`python -m unittest` or
-   `ruff check .`) on your machine. It will fail the same way, because CI ran the same command. Fix
-   it locally, confirm it's green locally, push again.
+4. **Reproduce it locally.** The same command from the failed step (`python -m unittest` or
+   `ruff check .`) fails the same way on your own machine, because CI ran exactly that command. That
+   reproducibility is the point: fix locally, confirm green locally, push again.

-That loop — red on the forge, reproduce locally, fix, push — is the entire day-to-day of working
-with CI. The clean-machine runner occasionally surfaces a failure you *can't* reproduce locally;
-that's not CI being flaky, that's CI correctly catching that your machine has something the clean
+That loop (red on the forge, reproduce locally, fix, push) is the entire day-to-day of working
+with CI. The clean-machine runner occasionally surfaces a failure you *can't* reproduce locally.
+That's not CI being flaky; it's CI correctly catching that your machine has something the clean
 one doesn't. (See "Where it breaks.")
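
The triage loop starts from the first step that failed, since later steps typically did not run. A minimal sketch of that lookup, assuming a run can be represented as an ordered list of steps with a status label (the `Step` type, the status strings, and the step names other than `Lint` and `Test` are hypothetical, not any forge's API):

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    status: str  # "passed", "failed", or "skipped" (labels assumed for illustration)


def first_failed(steps):
    """Return the name of the first failed step, or None if the run is green."""
    for step in steps:
        if step.status == "failed":
            return step.name
    return None


run = [
    Step("Checkout", "passed"),
    Step("Set up Python", "passed"),
    Step("Install tools", "passed"),
    Step("Lint", "passed"),
    Step("Test", "failed"),
]
print(first_failed(run))
```

Whatever step this returns is the one whose log is worth reading, and the command in that step is the one to re-run locally.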

 ---

 ## The AI angle

-This is the module where CI stops being generic devops hygiene and becomes specifically, urgently
-about AI-assisted work.
+This is the module where CI stops being generic devops hygiene and becomes specifically about
+AI-assisted work.

-AI generates code that **looks right.** That's not a knock on the models — it's their defining
+AI generates code that **looks right.** That's not a knock on the models; it's their defining
 property. They produce fluent, plausible, well-formatted code that passes a human skim, because
 "looks like correct code" is close to what they're optimizing for. The failure mode isn't garbage
 that obviously won't run; it's the function that's 95% right with a flipped comparison, the refactor
 that quietly drops an edge case, the "cleanup" that breaks one path you didn't think to re-check.
 A human reviewer skimming a confident-looking diff is exactly the reviewer that misses these
-(Module 10 is the whole skill of *not* missing them — and it's hard).
+(Module 10 is the whole skill of *not* missing them, and it's hard).

 CI is the reviewer that doesn't skim. It runs the code. It doesn't care how clean the diff looks or
-how confidently the commit message is worded — it executes the tests and reports the exit code. The
+how confidently the commit message is worded; it executes the tests and reports the exit code. The
 flipped comparison fails an assertion. The dropped edge case fails the test that covered it. The
 plausibility that fools a human is invisible to a process that only checks behavior.
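
To make the "looks right, behaves wrong" case concrete, here is a small sketch using the standard-library `unittest` runner. The `TaskList` class is a hypothetical stand-in (the real `tasks.py` may be shaped differently); the "simplified" variant returns every task instead of filtering out the done ones, which reads fine in a diff and fails the behavioral check.

```python
import unittest


class TaskList:
    """Hypothetical stand-in for the tasks-app model."""

    def __init__(self):
        self.tasks = []  # each task is a dict: {"title": str, "done": bool}

    def add(self, title, done=False):
        self.tasks.append({"title": title, "done": done})

    def pending(self):
        return [t for t in self.tasks if not t["done"]]


class SimplifiedTaskList(TaskList):
    def pending(self):
        return self.tasks  # plausible "cleanup" that silently includes done tasks


def make_test(cls):
    """Build a test case bound to the given implementation."""

    class PendingTest(unittest.TestCase):
        def test_pending_excludes_completed_tasks(self):
            tasks = cls()
            tasks.add("write tests", done=True)
            tasks.add("add CI")
            self.assertEqual([t["title"] for t in tasks.pending()], ["add CI"])

    return PendingTest


def passes(cls):
    """Run the test against cls and report whether it succeeded."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_test(cls))
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


print(passes(TaskList), passes(SimplifiedTaskList))
```

The test does not read the diff at all; it only observes what `pending()` returns, which is why the tidy-looking variant cannot talk its way past it.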

@@ -187,13 +187,14 @@ This compounds with everything else AI changes about your workflow:

 - **AI raises your push rate.** You're making more changes, faster, more of them generated. Manual
  pre-push checking scales with discipline and doesn't survive volume. The automated gate scales
-  for free — it doesn't get tired on the fortieth push of the day.
+  for free; it doesn't get tired on the fortieth push of the day.
 - **AI can fix what CI catches.** A red CI run is a precise, machine-readable problem statement: the
-  exact command, the exact failing assertion, the exact line. That's ideal input for an agent —
-  paste the failed log and ask it to fix the failure. (Module 25 automates this into agents that
-  respond to a failing pipeline on their own. CI is the trigger that makes self-healing possible.)
+  exact command, the exact failing assertion, the exact line. That's ideal input for an agent. Paste
+  the failed log into Claude Code (or your agent) and direct it to fix the failure. (Module 25
+  automates this into agents that respond to a failing pipeline on their own. CI is the trigger that
+  makes self-healing possible.)
 - **CI is the gate that makes letting agents run safely possible at all.** Every later module that
-  hands the AI more autonomy — issue-to-PR agents, unattended runs — relies on the fact that nothing
+  hands the AI more autonomy (issue-to-PR agents, unattended runs) relies on the fact that nothing
  the agent produces reaches anyone without passing CI first. The supervision is structural: it's
  this gate, not a human watching the agent type.

@@ -204,8 +205,9 @@ the more you need a reviewer that checks behavior instead of believing the diff.

 ## Hands-on lab

-**Lab language:** YAML (the CI config) plus the Python `tasks-app` and shell commands. You won't
-write much by hand — you'll commit a starter workflow, watch it pass, then break it on purpose.
+**Lab language:** YAML (the CI config) plus the Python `tasks-app` and shell commands. You direct
+the agent to place files, commit, and recover; you commit a starter workflow, watch it pass, then
+break it on purpose and watch CI catch it.

 **You'll need:**

@@ -214,71 +216,83 @@ write much by hand — you'll commit a starter workflow, watch it pass, then bre
  - `ci-starter.yml` — the workflow (GitHub Actions flavor).
  - `gitlab-ci-starter.yml` — the same pipeline for GitLab, if that's your forge.
  - `test_tasks.py` — a small test suite (use your Module 13 tests instead if you have them).
-- Python 3.10+ locally, and your AI assistant.
+- Python 3.10+ locally, and your agent. Examples use **Claude Code**; sub your own agent anywhere.

 ### Part A — Run the checks locally first

-Never push a workflow you haven't run by hand. CI just runs the same commands — prove they work on
+Never push a workflow you haven't run by hand. CI just runs the same commands, so prove they work on
 your machine first.

-1. Copy `lab/test_tasks.py` into your `tasks-app` folder (next to `tasks.py`). Install the tools and
-   run both checks exactly as CI will:
+1. Direct your agent to set up the project, then run the checks yourself once. Tell Claude Code (sub
+   your own agent): *"Copy the lab's `test_tasks.py` next to `tasks.py` in `~/ai-workflow-course/tasks-app`,
+   then install `ruff` into this project."* The agent places the file and handles the install,
+   including the PEP 668 fallback (a per-project venv) if the system Python refuses a global install.
+   What it runs looks like:

   ```bash
   cd ~/ai-workflow-course/tasks-app
   pip install ruff
+   # if pip is refused with "externally-managed-environment" (PEP 668, common on recent
+   # Debian/Ubuntu and Homebrew Python), the agent falls back to a per-project venv:
+   #   python3 -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
+   #   pip install ruff
+   ```
+
+   Then run both checks **yourself**, once. This is the one part you do by hand on purpose: feeling
+   that CI is nothing more than these same two commands is what makes the rest of the module click.
+
+   ```bash
   python -m unittest   # should report all tests passing
   ruff check .         # should report no issues (or fix what it flags)
   ```

-   If both are clean locally, CI will be green. If not, fix it here — it's faster than waiting on a
-   runner.
-
-   > **If `pip install` is refused** with "externally-managed-environment" (PEP 668 — common on
-   > recent Debian/Ubuntu and Homebrew Python), install into a per-project virtual environment
-   > instead: `python3 -m venv .venv && source .venv/bin/activate` (Windows:
-   > `.venv\Scripts\activate`), then re-run `pip install ruff`. Only the linter needs installing — the
-   > stdlib `unittest` runner needs nothing. (`pipx` or `pip install --break-system-packages` also
-   > work; a venv is the clean default.)
+   If both are clean locally, CI will be green. If not, fix it here; it's faster than waiting on a
+   runner. (Only the linter needs installing. The stdlib `unittest` runner ships with Python.)

 ### Part B — Add the workflow and watch it pass

-2. Put the workflow where your forge looks for it:
-   - **GitHub / Forgejo / Gitea:** copy `lab/ci-starter.yml` to `.github/workflows/ci.yml` in your
-     repo (Forgejo/Gitea also read `.forgejo/workflows/` or `.gitea/workflows/` — check yours).
-   - **GitLab:** copy `lab/gitlab-ci-starter.yml` to `.gitlab-ci.yml` at the repo root.
+2. Direct the agent to put the workflow where your forge looks for it. Tell Claude Code which forge
+   you're on and let it pick the path:
+   - **GitHub / Forgejo / Gitea:** `lab/ci-starter.yml` goes to `.github/workflows/ci.yml` (Forgejo/Gitea
+     also read `.forgejo/workflows/` or `.gitea/workflows/`; the agent checks which yours uses).
+   - **GitLab:** `lab/gitlab-ci-starter.yml` goes to `.gitlab-ci.yml` at the repo root.

-3. Commit and push it:
+3. Direct the agent to commit and push it, then verify. Tell Claude Code: *"Stage the new workflow
+   and `test_tasks.py`, commit with a message about adding CI, and push."* Let it decide what to
+   stage and run the git for you. What it runs looks like:

   ```bash
-   git add .github/workflows/ci.yml test_tasks.py    # adjust path for your forge
+   git add .github/workflows/ci.yml test_tasks.py    # path varies by forge; the agent picks it
   git commit -m "Add CI: lint and test on every push"
   git push
   ```

+   Verify it committed the workflow and the test file (a `git show --stat HEAD` confirms what landed),
+   not stray files.
+
 4. Open your repo in the forge's web UI and find the run (usually an "Actions," "CI/CD," or
   "Pipelines" tab, and a status icon on the commit). Watch the steps execute and turn green.
   **That green check is the gate now standing guard on every future push.** (Self-host track: if
   the run sits queued with nothing picking it up, that's the no-hosted-runner situation from the
-   prerequisites — the workflow is correct, it just has no compute until you attach a runner in
-   Module 19. Run this part on a SaaS forge to see green here and now.)
+   prerequisites; the workflow is correct, it just has no compute until you attach a runner in
+   Module 19. Run this part on a SaaS forge to see green right now.)

 ### Part C — Break it on purpose and watch CI catch it

 This is the whole point. You're going to ship the kind of plausible-but-wrong change AI produces,
 and watch CI stop it.

-5. Introduce a breaking change. Ask your AI assistant — in the browser, or with your editor-
-   integrated tool from Module 4 — for something that *sounds* like a cleanup but changes behavior.
-   For example: *"Refactor `pending()` in tasks.py to be simpler"* and, if it stays correct, nudge
-   it until the logic actually changes — or just make the change yourself to feel it. A classic
-   plausible break: have `pending()` return `self.tasks` (all tasks) instead of filtering out the
-   done ones. It reads fine. It's wrong.
+5. Introduce a breaking change with the agent. Ask Claude Code (sub your own) for something that
+   *sounds* like a cleanup but changes behavior: *"Refactor `pending()` in tasks.py to be simpler."*
+   If it stays correct, nudge it until the logic actually changes. The classic plausible break: have
+   `pending()` return `self.tasks` (all tasks) instead of filtering out the done ones. It reads fine.
+   It's wrong.

 6. **Notice it still looks right.** Glance at the diff. The function is short, clean, plausible.
-   This is exactly the trap from "The AI angle" — nothing in the *appearance* warns you.
+   This is exactly the trap from "The AI angle": nothing in the *appearance* warns you.

-7. Commit and push it:
+7. Direct the agent to commit and push the change it just made. Tell Claude Code: *"Commit this and
+   push it."* What it runs looks like:

   ```bash
   git add tasks.py
@@ -286,31 +300,34 @@ and watch CI stop it.
   git push
   ```

+   Then verify CI goes red.
+
 8. Watch CI go red. Open the run, find the first failed step (`Test`), and read the log:
   `test_pending_excludes_completed_tasks` failed, with the assertion and the actual-vs-expected
   values. CI caught in seconds what a skim would have waved through.

-9. Reproduce and fix. The bad change is already committed *and pushed*, so `git restore` is no help
-   here — it only discards *uncommitted* edits, and there are none. The team-safe undo for something
-   already on shared history is `git revert` (Module 12): it writes a **new** commit that inverts the
-   bad one, instead of rewriting history other people may have pulled.
+9. Hand the failure to the agent and let it recover. Paste the red CI log (the failed `Test` step)
+   into Claude Code and direct it: *"Reproduce this locally, then undo the bad change safely; it's
+   already pushed."* Your job is to verify it makes the right call, not to type git. The check:
+   because the commit is already on shared history, the team-safe undo is `git revert`, not
+   `git restore` (Module 12). What the agent runs looks like:

   ```bash
-   python -m unittest # fails locally too — same command, same failure
-   git revert HEAD    # new commit that undoes "Simplify pending()" (Module 12)
-   git push           # CI re-runs on the fixed code and goes green again
+   python -m unittest          # fails locally too: same command, same failure
+   git revert --no-edit HEAD   # new commit that undoes "Simplify pending()" (Module 12)
+   git push                    # CI re-runs on the fixed code and goes green again
   ```

-   `git revert HEAD` opens an editor with a prefilled message (`Revert "Simplify pending()"`) — save
-   and close it. The revert restores the correct `pending()`, the push triggers CI on the fixed code,
-   and the run goes green.
+   Verify CI goes green again, and that the agent chose revert (a new inverting commit) over a
+   history-rewriting undo on a branch others may have pulled.

 10. *(Optional, to feel the linter tier.)* Add an obviously unused import to `cli.py`
-    (`import os` at the top, unused), commit, and push. Watch the **Lint** step fail *before* the
-    tests even run — the cheap check failing fast. Remove it and push again.
+    (`import os` at the top, unused), then direct the agent to commit and push. Watch the **Lint**
+    step fail *before* the tests even run: the cheap check failing fast. Have the agent remove it and
+    push again.

-You've now seen both halves: CI passing as a quiet guardrail, and CI failing as the reviewer that
-caught a change you might have trusted.
+You've now seen both halves: CI passing as a guardrail that stays out of your way, and CI failing as
+the reviewer that caught a change you might have trusted.

 ---

@@ -324,7 +341,7 @@ The honest caveats, because a skeptical audience trusts the limits more than the
  better. The flipped-comparison bug above got caught *because a test covered it.*
 - **Green CI is not "reviewed."** It checks behavior, not design, intent, security, or whether the
  feature is even the right one. It does not replace human review (Module 10) or the security gates
-  in Module 15 — it sits alongside them. Treating a green check as sign-off is how plausible-wrong
+  in Module 15; it sits alongside them. Treating a green check as sign-off is how plausible-wrong
  code with no failing test sails straight through.
 - **The clean machine is a feature that feels like a bug.** Sooner or later CI fails in a way you
  can't reproduce locally — a dependency you have installed but never declared, a file outside the