fix(M7-27+capstone): apply AI-drives-git reframe, lesson=theory, de-slop course-wide

Phase 2 sweep — all modules are post-pivot, so the learner directs the AI agent
(Claude Code as the worked example) to do the git/setup work and verifies, instead
of typing commands by hand; no re-teaching basics. Lesson sections are theory with
example output; all execution lives in the labs. De-slopped ("prose" etc. gone
course-wide, em-dash density thinned). /path/to placeholders -> ~/ai-workflow-course.

Every deliberate teaching device verified intact: M10 ai-change.patch trap,
M12 bad-clear-snippet, M13/M27 planted pending_count bug, M15 secret+typosquat+MD5,
M18 BREAK=1, M21 absent-.gitignore, M22 poisoned skill, M24 no-op patch, M25 --simulate.
Labs compile/parse (py/sh/yaml/json); no junk.

Closes #83
Closes #86
Closes #89

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
2026-06-22 21:58:17 -04:00
parent a29823f4b3
commit f925fd9645
38 changed files with 1735 additions and 1424 deletions
+91 -74
@@ -1,8 +1,8 @@
# Module 14 — Continuous Integration
> **The AI writes code that looks right. CI is the tireless reviewer that checks whether it actually
> is — automatically, on every single push, before anyone trusts it.** This module turns the tests
> you wrote in Module 13 into a gate that runs itself.
> **The AI writes code that looks right. CI checks whether it actually is: automatically, on every
> push, before anyone trusts it.** This module turns the tests you wrote in Module 13 into a gate
> that runs itself.
---
@@ -46,7 +46,7 @@ By the end of this module you can:
Continuous Integration has a grand-sounding name and a mundane core: **a set of checks that run
automatically whenever you push code, on a clean machine you don't control.** That's it. The checks
are usually the same commands you'd run by hand lint, build, test and the magic is entirely in
are usually the same commands you'd run by hand (lint, build, test), and the magic is entirely in
the word *automatically*.
You already run checks. Before you commit, you (sometimes) run the tests, (sometimes) run the
@@ -60,12 +60,12 @@ Three properties make CI more than a glorified shell script:
- **It's triggered, not invoked.** You don't run CI; pushing runs it. The check is bound to the
event, so it can't be skipped by forgetting.
- **It runs on a clean machine.** The forge spins up a fresh, throwaway runner with nothing of yours
on it no half-installed dependency, no environment variable you set six months ago and forgot.
on it: no half-installed dependency, no environment variable you set six months ago and forgot.
If your code only works because of something special about your laptop, CI finds out immediately.
("Works on my machine" dies here. Module 16 takes the reproducibility idea further with
containers.)
- **Its result is visible and shared.** A green check or a red X shows up on the commit and on the
pull request (Module 10), where everyone every human reviewer and, later, every agent can see
pull request (Module 10), where everyone (every human reviewer and, later, every agent) can see
whether this code passed the gate.
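A minimal sketch of the clean-machine property, assuming a hypothetical `TASKS_DB` environment variable that exists only on your laptop (the name is an illustration, not something the course defines). The same check passes with the variable present and fails without it, which is the kind of hidden dependency a fresh runner exposes:

```python
import os
import subprocess
import sys

# Child program standing in for "your code": it succeeds only if TASKS_DB is set.
# TASKS_DB is a hypothetical variable used purely for illustration.
CHILD = "import os, sys; sys.exit(0 if os.environ.get('TASKS_DB') else 1)"


def exit_code(env):
    """Run the child with the given environment and return its exit code."""
    return subprocess.run([sys.executable, "-c", CHILD], env=env).returncode


# "Your laptop": the forgotten variable is present.
laptop_env = dict(os.environ, TASKS_DB="/tmp/tasks.db")

# "Clean runner": identical environment, minus the variable nobody declared.
clean_env = {k: v for k, v in os.environ.items() if k != "TASKS_DB"}

if __name__ == "__main__":
    print("laptop exit code:", exit_code(laptop_env))
    print("runner exit code:", exit_code(clean_env))
```

Nothing about the code differs between the two runs; only the environment does.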
### The pipeline: checkout → setup → checks
@@ -81,7 +81,7 @@ That last point is the load-bearing one. CI's entire enforcement mechanism is th
Every tool you'd run in a terminal returns 0 for success and non-zero for failure. `python -m
unittest` exits non-zero if a test fails. `ruff check` exits non-zero if it finds a lint problem. CI runs your
commands and watches those exit codes; one failure turns the run red. You're not learning a new
testing system you're wiring the tools you already have to a trigger.
testing system; you're wiring the tools you already have to a trigger.
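To make the exit-code mechanism concrete, here is a rough sketch of what a runner does with your commands, not how any particular forge implements it. The step names and the stand-in commands are assumptions chosen so the example runs anywhere; in a real pipeline the commands would be `ruff check .` and `python -m unittest`.

```python
import subprocess
import sys


def run_checks(checks):
    """Run each (name, argv) step in order; stop at the first non-zero exit code.

    Returns (failed_step_name, exit_code), or (None, 0) if every step passed.
    """
    for name, argv in checks:
        code = subprocess.run(argv).returncode
        if code != 0:
            return name, code  # one failure turns the whole run red
    return None, 0


# Stand-in commands (assumptions for this sketch): one exits 0, one exits 1.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

if __name__ == "__main__":
    failed, code = run_checks([("Lint", passing), ("Test", failing)])
    print("green" if failed is None else f"red: {failed} exited {code}")
```

The loop knows nothing about linting or testing. It only watches the number each command returns.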
### What goes in a CI run for this audience
@@ -136,13 +136,13 @@ Reading it top to bottom: `on:` is the trigger (push and pull request). `runs-on
machine. The `steps:` are the four moves — checkout, set up Python, install the tools, then the two
checks. `uses:` pulls in a pre-built action (someone else's reusable step); `run:` is just a shell
command. The linter runs first because it's cheap; the tests run last because they're the
expensive, decisive check. Only the linter needs a `pip install` here the tests run on Python's
expensive, decisive check. Only the linter needs a `pip install` here; the tests run on Python's
standard-library `unittest` runner from Module 13, so there's nothing to install for them.
This file lives *in the repo*, committed and versioned like everything else. That's deliberate and
on-thesis: your pipeline is code, it's reviewed as a diff in a PR (Module 10), and a teammate or an
agent inherits it automatically by cloning. The same logic as committing the AI's config in
Module 5 — the automation around your work is itself a durable, shared artifact.
This file lives *in the repo*, committed and versioned like everything else. That's deliberate:
your pipeline is code, it's reviewed as a diff in a PR (Module 10), and a teammate or an agent
inherits it automatically by cloning. It's the same logic as committing the AI's config in Module 5:
the automation around your work is itself a durable, shared artifact.
### Reading a failed run
@@ -154,32 +154,32 @@ When CI goes red, the skill is triage, and it's fast once you know the shape:
3. **Read that step's log.** It's the same output the tool prints in your terminal — a failing
`unittest` assertion, a `ruff` finding with a file and line number. CI didn't invent a new error
format; it's showing you the command's own output.
4. **Reproduce it locally.** Run the exact command from the failed step (`python -m unittest` or
`ruff check .`) on your machine. It will fail the same way, because CI ran the same command. Fix
it locally, confirm it's green locally, push again.
4. **Reproduce it locally.** The same command from the failed step (`python -m unittest` or
`ruff check .`) fails the same way on your own machine, because CI ran exactly that command. That
reproducibility is the point: fix locally, confirm green locally, push again.
That loop red on the forge, reproduce locally, fix, push is the entire day-to-day of working
with CI. The clean-machine runner occasionally surfaces a failure you *can't* reproduce locally;
that's not CI being flaky, that's CI correctly catching that your machine has something the clean
That loop (red on the forge, reproduce locally, fix, push) is the entire day-to-day of working
with CI. The clean-machine runner occasionally surfaces a failure you *can't* reproduce locally.
That's not CI being flaky; it's CI correctly catching that your machine has something the clean
one doesn't. (See "Where it breaks.")
---
## The AI angle
This is the module where CI stops being generic devops hygiene and becomes specifically, urgently
about AI-assisted work.
This is the module where CI stops being generic devops hygiene and becomes specifically about
AI-assisted work.
AI generates code that **looks right.** That's not a knock on the models it's their defining
AI generates code that **looks right.** That's not a knock on the models; it's their defining
property. They produce fluent, plausible, well-formatted code that passes a human skim, because
"looks like correct code" is close to what they're optimizing for. The failure mode isn't garbage
that obviously won't run; it's the function that's 95% right with a flipped comparison, the refactor
that quietly drops an edge case, the "cleanup" that breaks one path you didn't think to re-check.
A human reviewer skimming a confident-looking diff is exactly the reviewer that misses these
(Module 10 is the whole skill of *not* missing them and it's hard).
(Module 10 is the whole skill of *not* missing them, and it's hard).
CI is the reviewer that doesn't skim. It runs the code. It doesn't care how clean the diff looks or
how confidently the commit message is worded it executes the tests and reports the exit code. The
how confidently the commit message is worded; it executes the tests and reports the exit code. The
flipped comparison fails an assertion. The dropped edge case fails the test that covered it. The
plausibility that fools a human is invisible to a process that only checks behavior.
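As an illustration, here is a small self-contained sketch of that idea. `TaskList` and `SimplifiedTaskList` are hypothetical stand-ins for the course's `tasks.py`, which may be structured differently; the point is only that a behavior check catches a change that reads fine.

```python
import unittest


class TaskList:
    """Hypothetical stand-in for the tasks-app model."""

    def __init__(self):
        self.tasks = []

    def add(self, title, done=False):
        self.tasks.append({"title": title, "done": done})

    def pending(self):
        # Correct behavior: only tasks that are not done.
        return [t for t in self.tasks if not t["done"]]


class SimplifiedTaskList(TaskList):
    """The plausible 'cleanup': shorter, tidy, and wrong."""

    def pending(self):
        return self.tasks  # returns completed tasks too


def check_pending(cls):
    """Run one behavior test against cls; return True if it passes."""

    class PendingTest(unittest.TestCase):
        def test_pending_excludes_completed_tasks(self):
            task_list = cls()
            task_list.add("write tests")
            task_list.add("ship", done=True)
            titles = [t["title"] for t in task_list.pending()]
            self.assertEqual(titles, ["write tests"])

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PendingTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("original passes:  ", check_pending(TaskList))
    print("simplified passes:", check_pending(SimplifiedTaskList))
```

Both versions of `pending()` would survive a skim of the diff. Only one survives being run.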
@@ -187,13 +187,14 @@ This compounds with everything else AI changes about your workflow:
- **AI raises your push rate.** You're making more changes, faster, more of them generated. Manual
pre-push checking scales with discipline and doesn't survive volume. The automated gate scales
for free it doesn't get tired on the fortieth push of the day.
for free; it doesn't get tired on the fortieth push of the day.
- **AI can fix what CI catches.** A red CI run is a precise, machine-readable problem statement: the
exact command, the exact failing assertion, the exact line. That's ideal input for an agent
paste the failed log and ask it to fix the failure. (Module 25 automates this into agents that
respond to a failing pipeline on their own. CI is the trigger that makes self-healing possible.)
exact command, the exact failing assertion, the exact line. That's ideal input for an agent. Paste
the failed log into Claude Code (or your agent) and direct it to fix the failure. (Module 25
automates this into agents that respond to a failing pipeline on their own. CI is the trigger that
makes self-healing possible.)
- **CI is the gate that makes letting agents run safely possible at all.** Every later module that
hands the AI more autonomy issue-to-PR agents, unattended runs relies on the fact that nothing
hands the AI more autonomy (issue-to-PR agents, unattended runs) relies on the fact that nothing
the agent produces reaches anyone without passing CI first. The supervision is structural: it's
this gate, not a human watching the agent type.
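One way to see why a failed run is good agent input: the failure header that `unittest` prints is regular enough to pull out mechanically. The sketch below generates a real failure in memory and extracts the failing test name. The `Demo` test case and the helper names are made up for illustration, and the regular expression is an assumption about the header shape, not an official format guarantee.

```python
import io
import re
import unittest

# Matches unittest's failure headers, e.g. "FAIL: test_name (module.Class...)".
FAILURE_LINE = re.compile(r"^(FAIL|ERROR): (\w+) \((.+)\)$", re.MULTILINE)


def failing_tests(log):
    """Return (kind, test_name) pairs found in unittest's text output."""
    return [(kind, name) for kind, name, _where in FAILURE_LINE.findall(log)]


def captured_log():
    """Run one deliberately failing test and return the runner's text output."""

    class Demo(unittest.TestCase):
        def test_pending_excludes_completed_tasks(self):
            # Deliberate failure so there is a log to parse.
            self.assertEqual(["write tests", "ship"], ["write tests"])

    buffer = io.StringIO()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)
    unittest.TextTestRunner(stream=buffer).run(suite)
    return buffer.getvalue()


if __name__ == "__main__":
    for kind, name in failing_tests(captured_log()):
        print(f"{kind}: {name}")
```

In practice you would paste the whole failed step log into the agent. The extraction only shows that the log already names the command's verdict and the exact test that produced it.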
@@ -204,8 +205,9 @@ the more you need a reviewer that checks behavior instead of believing the diff.
## Hands-on lab
**Lab language:** YAML (the CI config) plus the Python `tasks-app` and shell commands. You won't
write much by hand — you'll commit a starter workflow, watch it pass, then break it on purpose.
**Lab language:** YAML (the CI config) plus the Python `tasks-app` and shell commands. You direct
the agent to place files, commit, and recover: it commits a starter workflow, you watch it pass,
then you break it on purpose and watch CI catch it.
**You'll need:**
@@ -214,71 +216,83 @@ write much by hand — you'll commit a starter workflow, watch it pass, then bre
- `ci-starter.yml` — the workflow (GitHub Actions flavor).
- `gitlab-ci-starter.yml` — the same pipeline for GitLab, if that's your forge.
- `test_tasks.py` — a small test suite (use your Module 13 tests instead if you have them).
- Python 3.10+ locally, and your AI assistant.
- Python 3.10+ locally, and your agent. Examples use **Claude Code**; sub your own agent anywhere.
### Part A — Run the checks locally first
Never push a workflow you haven't run by hand. CI just runs the same commands prove they work on
Never push a workflow you haven't run by hand. CI just runs the same commands, so prove they work on
your machine first.
1. Copy `lab/test_tasks.py` into your `tasks-app` folder (next to `tasks.py`). Install the tools and
run both checks exactly as CI will:
1. Direct your agent to set up the project, then run the checks yourself once. Tell Claude Code (sub
your own agent): *"Copy the lab's `test_tasks.py` next to `tasks.py` in `~/ai-workflow-course/tasks-app`,
then install `ruff` into this project."* The agent places the file and handles the install,
including the PEP 668 fallback (a per-project venv) if the system Python refuses a global install.
What it runs looks like:
```bash
cd ~/ai-workflow-course/tasks-app
pip install ruff
# if pip is refused with "externally-managed-environment" (PEP 668, common on recent
# Debian/Ubuntu and Homebrew Python), the agent falls back to a per-project venv:
# python3 -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
# pip install ruff
```
Then run both checks **yourself**, once. This is the one part you do by hand on purpose: feeling
that CI is nothing more than these same two commands is what makes the rest of the module click.
```bash
python -m unittest # should report all tests passing
ruff check . # should report no issues (or fix what it flags)
```
If both are clean locally, CI will be green. If not, fix it here it's faster than waiting on a
runner.
> **If `pip install` is refused** with "externally-managed-environment" (PEP 668 — common on
> recent Debian/Ubuntu and Homebrew Python), install into a per-project virtual environment
> instead: `python3 -m venv .venv && source .venv/bin/activate` (Windows:
> `.venv\Scripts\activate`), then re-run `pip install ruff`. Only the linter needs installing — the
> stdlib `unittest` runner needs nothing. (`pipx` or `pip install --break-system-packages` also
> work; a venv is the clean default.)
If both are clean locally, CI should be green too. If not, fix it here; it's faster than waiting on a
runner. (Only the linter needs installing. The stdlib `unittest` runner ships with Python.)
### Part B — Add the workflow and watch it pass
2. Put the workflow where your forge looks for it:
- **GitHub / Forgejo / Gitea:** copy `lab/ci-starter.yml` to `.github/workflows/ci.yml` in your
repo (Forgejo/Gitea also read `.forgejo/workflows/` or `.gitea/workflows/` — check yours).
- **GitLab:** copy `lab/gitlab-ci-starter.yml` to `.gitlab-ci.yml` at the repo root.
2. Direct the agent to put the workflow where your forge looks for it. Tell Claude Code which forge
you're on and let it pick the path:
- **GitHub / Forgejo / Gitea:** `lab/ci-starter.yml` goes to `.github/workflows/ci.yml` (Forgejo/Gitea
also read `.forgejo/workflows/` or `.gitea/workflows/`; the agent checks which yours uses).
- **GitLab:** `lab/gitlab-ci-starter.yml` goes to `.gitlab-ci.yml` at the repo root.
3. Commit and push it:
3. Direct the agent to commit and push it, then verify. Tell Claude Code: *"Stage the new workflow
and `test_tasks.py`, commit with a message about adding CI, and push."* Let it decide what to
stage and run the git for you. What it runs looks like:
```bash
git add .github/workflows/ci.yml test_tasks.py # adjust path for your forge
git add .github/workflows/ci.yml test_tasks.py # path varies by forge; the agent picks it
git commit -m "Add CI: lint and test on every push"
git push
```
Verify it committed the workflow and the test file (a `git show --stat HEAD` confirms what landed),
not stray files.
4. Open your repo in the forge's web UI and find the run (usually an "Actions," "CI/CD," or
"Pipelines" tab, and a status icon on the commit). Watch the steps execute and turn green.
**That green check is the gate now standing guard on every future push.** (Self-host track: if
the run sits queued with nothing picking it up, that's the no-hosted-runner situation from the
prerequisites the workflow is correct, it just has no compute until you attach a runner in
Module 19. Run this part on a SaaS forge to see green here and now.)
prerequisites; the workflow is correct, it just has no compute until you attach a runner in
Module 19. Run this part on a SaaS forge to see green right now.)
### Part C — Break it on purpose and watch CI catch it
This is the whole point. You're going to ship the kind of plausible-but-wrong change AI produces,
and watch CI stop it.
5. Introduce a breaking change. Ask your AI assistant — in the browser, or with your editor-
integrated tool from Module 4 — for something that *sounds* like a cleanup but changes behavior.
For example: *"Refactor `pending()` in tasks.py to be simpler"* and, if it stays correct, nudge
it until the logic actually changes — or just make the change yourself to feel it. A classic
plausible break: have `pending()` return `self.tasks` (all tasks) instead of filtering out the
done ones. It reads fine. It's wrong.
5. Introduce a breaking change with the agent. Ask Claude Code (sub your own) for something that
*sounds* like a cleanup but changes behavior: *"Refactor `pending()` in tasks.py to be simpler."*
If it stays correct, nudge it until the logic actually changes. The classic plausible break: have
`pending()` return `self.tasks` (all tasks) instead of filtering out the done ones. It reads fine.
It's wrong.
6. **Notice it still looks right.** Glance at the diff. The function is short, clean, plausible.
This is exactly the trap from "The AI angle" nothing in the *appearance* warns you.
This is exactly the trap from "The AI angle": nothing in the *appearance* warns you.
7. Commit and push it:
7. Direct the agent to commit and push the change it just made. Tell Claude Code: *"Commit this and
push it."* What it runs looks like:
```bash
git add tasks.py
@@ -286,31 +300,34 @@ and watch CI stop it.
git push
```
Then verify CI goes red.
8. Watch CI go red. Open the run, find the first failed step (`Test`), and read the log:
`test_pending_excludes_completed_tasks` failed, with the assertion and the actual-vs-expected
values. CI caught in seconds what a skim would have waved through.
9. Reproduce and fix. The bad change is already committed *and pushed*, so `git restore` is no help
here — it only discards *uncommitted* edits, and there are none. The team-safe undo for something
already on shared history is `git revert` (Module 12): it writes a **new** commit that inverts the
bad one, instead of rewriting history other people may have pulled.
9. Hand the failure to the agent and let it recover. Paste the red CI log (the failed `Test` step)
into Claude Code and direct it: *"Reproduce this locally, then undo the bad change safely; it's
already pushed."* Your job is to verify it makes the right call, not to type git. The check:
because the commit is already on shared history, the team-safe undo is `git revert`, not
`git restore` (Module 12). What the agent runs looks like:
```bash
python -m unittest # fails locally too same command, same failure
git revert HEAD # new commit that undoes "Simplify pending()" (Module 12)
git push # CI re-runs on the fixed code and goes green again
python -m unittest # fails locally too: same command, same failure
git revert --no-edit HEAD # new commit that undoes "Simplify pending()" (Module 12)
git push # CI re-runs on the fixed code and goes green again
```
`git revert HEAD` opens an editor with a prefilled message (`Revert "Simplify pending()"`) — save
and close it. The revert restores the correct `pending()`, the push triggers CI on the fixed code,
and the run goes green.
Verify CI goes green again, and that the agent chose revert (a new inverting commit) over a
history-rewriting undo on a branch others may have pulled.
10. *(Optional, to feel the linter tier.)* Add an obviously unused import to `cli.py`
(`import os` at the top, unused), commit, and push. Watch the **Lint** step fail *before* the
tests even run the cheap check failing fast. Remove it and push again.
(`import os` at the top, unused), then direct the agent to commit and push. Watch the **Lint**
step fail *before* the tests even run: the cheap check failing fast. Have the agent remove it and
push again.
You've now seen both halves: CI passing as a quiet guardrail, and CI failing as the reviewer that
caught a change you might have trusted.
You've now seen both halves: CI passing as a guardrail that stays out of your way, and CI failing as
the reviewer that caught a change you might have trusted.
---
@@ -324,7 +341,7 @@ The honest caveats, because a skeptical audience trusts the limits more than the
better. The flipped-comparison bug above got caught *because a test covered it.*
- **Green CI is not "reviewed."** It checks behavior, not design, intent, security, or whether the
feature is even the right one. It does not replace human review (Module 10) or the security gates
in Module 15 it sits alongside them. Treating a green check as sign-off is how plausible-wrong
in Module 15; it sits alongside them. Treating a green check as sign-off is how plausible-wrong
code with no failing test sails straight through.
- **The clean machine is a feature that feels like a bug.** Sooner or later CI fails in a way you
can't reproduce locally — a dependency you have installed but never declared, a file outside the