fix(M7-27+capstone): apply AI-drives-git reframe, lesson=theory, de-slop course-wide
Phase 2 sweep — all modules are post-pivot, so the learner directs the AI agent
(Claude Code as the worked example) to do the git/setup work and verifies, instead
of typing commands by hand; no re-teaching basics. Lesson sections are theory with
example output; all execution lives in the labs. De-slopped ("prose" etc. gone
course-wide, em-dash density thinned). /path/to placeholders -> ~/ai-workflow-course.
Every deliberate teaching device verified intact: M10 ai-change.patch trap,
M12 bad-clear-snippet, M13/M27 planted pending_count bug, M15 secret+typosquat+MD5,
M18 BREAK=1, M21 absent-.gitignore, M22 poisoned skill, M24 no-op patch, M25 --simulate.
Labs compile/parse (py/sh/yaml/json); no junk.
Closes #83
Closes #86
Closes #89
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
@@ -1,8 +1,8 @@
 # Module 14 — Continuous Integration
 
-> **The AI writes code that looks right. CI is the tireless reviewer that checks whether it actually
-> is — automatically, on every single push, before anyone trusts it.** This module turns the tests
-> you wrote in Module 13 into a gate that runs itself.
+> **The AI writes code that looks right. CI checks whether it actually is: automatically, on every
+> push, before anyone trusts it.** This module turns the tests you wrote in Module 13 into a gate
+> that runs itself.
 
 ---
 
@@ -46,7 +46,7 @@ By the end of this module you can:
 
 Continuous Integration has a grand-sounding name and a mundane core: **a set of checks that run
 automatically whenever you push code, on a clean machine you don't control.** That's it. The checks
-are usually the same commands you'd run by hand — lint, build, test — and the magic is entirely in
+are usually the same commands you'd run by hand (lint, build, test), and the magic is entirely in
 the word *automatically*.
 
 You already run checks. Before you commit, you (sometimes) run the tests, (sometimes) run the
@@ -60,12 +60,12 @@ Three properties make CI more than a glorified shell script:
 - **It's triggered, not invoked.** You don't run CI; pushing runs it. The check is bound to the
 event, so it can't be skipped by forgetting.
 - **It runs on a clean machine.** The forge spins up a fresh, throwaway runner with nothing of yours
-on it — no half-installed dependency, no environment variable you set six months ago and forgot.
+on it: no half-installed dependency, no environment variable you set six months ago and forgot.
 If your code only works because of something special about your laptop, CI finds out immediately.
 ("Works on my machine" dies here. Module 16 takes the reproducibility idea further with
 containers.)
 - **Its result is visible and shared.** A green check or a red X shows up on the commit and on the
-pull request (Module 10), where everyone — every human reviewer and, later, every agent — can see
+pull request (Module 10), where everyone (every human reviewer and, later, every agent) can see
 whether this code passed the gate.
 
 ### The pipeline: checkout → setup → checks
@@ -81,7 +81,7 @@ That last point is the load-bearing one. CI's entire enforcement mechanism is th
 Every tool you'd run in a terminal returns 0 for success and non-zero for failure. `python -m
 unittest` exits non-zero if a test fails. `ruff check` exits non-zero if it finds a lint problem. CI runs your
 commands and watches those exit codes; one failure turns the run red. You're not learning a new
-testing system — you're wiring the tools you already have to a trigger.
+testing system; you're wiring the tools you already have to a trigger.
 
 ### What goes in a CI run for this audience
 
@@ -136,13 +136,13 @@ Reading it top to bottom: `on:` is the trigger (push and pull request). `runs-on
 machine. The `steps:` are the four moves — checkout, set up Python, install the tools, then the two
 checks. `uses:` pulls in a pre-built action (someone else's reusable step); `run:` is just a shell
 command. The linter runs first because it's cheap; the tests run last because they're the
-expensive, decisive check. Only the linter needs a `pip install` here — the tests run on Python's
+expensive, decisive check. Only the linter needs a `pip install` here; the tests run on Python's
 standard-library `unittest` runner from Module 13, so there's nothing to install for them.
 
-This file lives *in the repo*, committed and versioned like everything else. That's deliberate and
-on-thesis: your pipeline is code, it's reviewed as a diff in a PR (Module 10), and a teammate or an
-agent inherits it automatically by cloning. The same logic as committing the AI's config in
-Module 5 — the automation around your work is itself a durable, shared artifact.
+This file lives *in the repo*, committed and versioned like everything else. That's deliberate:
+your pipeline is code, it's reviewed as a diff in a PR (Module 10), and a teammate or an agent
+inherits it automatically by cloning. The same logic as committing the AI's config in Module 5.
+The automation around your work is itself a durable, shared artifact.
 
 ### Reading a failed run
 
@@ -154,32 +154,32 @@ When CI goes red, the skill is triage, and it's fast once you know the shape:
 3. **Read that step's log.** It's the same output the tool prints in your terminal — a failing
 `unittest` assertion, a `ruff` finding with a file and line number. CI didn't invent a new error
 format; it's showing you the command's own output.
-4. **Reproduce it locally.** Run the exact command from the failed step (`python -m unittest` or
-`ruff check .`) on your machine. It will fail the same way, because CI ran the same command. Fix
-it locally, confirm it's green locally, push again.
+4. **Reproduce it locally.** The same command from the failed step (`python -m unittest` or
+`ruff check .`) fails the same way on your own machine, because CI ran exactly that command. That
+reproducibility is the point: fix locally, confirm green locally, push again.
 
-That loop — red on the forge, reproduce locally, fix, push — is the entire day-to-day of working
-with CI. The clean-machine runner occasionally surfaces a failure you *can't* reproduce locally;
-that's not CI being flaky, that's CI correctly catching that your machine has something the clean
+That loop (red on the forge, reproduce locally, fix, push) is the entire day-to-day of working
+with CI. The clean-machine runner occasionally surfaces a failure you *can't* reproduce locally.
+That's not CI being flaky; it's CI correctly catching that your machine has something the clean
 one doesn't. (See "Where it breaks.")
 
 ---
 
 ## The AI angle
 
-This is the module where CI stops being generic devops hygiene and becomes specifically, urgently
-about AI-assisted work.
+This is the module where CI stops being generic devops hygiene and becomes specifically about
+AI-assisted work.
 
-AI generates code that **looks right.** That's not a knock on the models — it's their defining
+AI generates code that **looks right.** That's not a knock on the models; it's their defining
 property. They produce fluent, plausible, well-formatted code that passes a human skim, because
 "looks like correct code" is close to what they're optimizing for. The failure mode isn't garbage
 that obviously won't run; it's the function that's 95% right with a flipped comparison, the refactor
 that quietly drops an edge case, the "cleanup" that breaks one path you didn't think to re-check.
 A human reviewer skimming a confident-looking diff is exactly the reviewer that misses these
-(Module 10 is the whole skill of *not* missing them — and it's hard).
+(Module 10 is the whole skill of *not* missing them, and it's hard).
 
 CI is the reviewer that doesn't skim. It runs the code. It doesn't care how clean the diff looks or
-how confidently the commit message is worded — it executes the tests and reports the exit code. The
+how confidently the commit message is worded; it executes the tests and reports the exit code. The
 flipped comparison fails an assertion. The dropped edge case fails the test that covered it. The
 plausibility that fools a human is invisible to a process that only checks behavior.
 
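Reader's note, not part of this commit: the lesson text in the hunks above says CI "runs your commands and watches those exit codes" and that "one failure turns the run red." A minimal Python sketch of that gate follows. The function name `run_checks` and the `CHECKS` list are illustrative assumptions; only the two commands (`ruff check .` and `python -m unittest`) come from the lesson, and a real forge runner does far more than this.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each (name, argv) pair in order and stop at the first failure.

    Returns 0 if every command exited 0, otherwise the first non-zero
    exit code. This mirrors the "one failure turns the run red" rule.
    """
    for name, argv in commands:
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"FAILED: {name} (exit code {result.returncode})")
            return result.returncode
        print(f"ok: {name}")
    return 0


# Hypothetical check list in the lesson's order: cheap linter first, tests last.
# It is only defined here, not executed, so the sketch runs without ruff installed.
CHECKS = [
    ("Lint", ["ruff", "check", "."]),
    ("Test", [sys.executable, "-m", "unittest"]),
]
```

Calling `sys.exit(run_checks(CHECKS))` from a project directory would reproduce the red/green decision locally, under the assumption that `ruff` is on the PATH.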
@@ -187,13 +187,14 @@ This compounds with everything else AI changes about your workflow:
 
 - **AI raises your push rate.** You're making more changes, faster, more of them generated. Manual
 pre-push checking scales with discipline and doesn't survive volume. The automated gate scales
-for free — it doesn't get tired on the fortieth push of the day.
+for free; it doesn't get tired on the fortieth push of the day.
 - **AI can fix what CI catches.** A red CI run is a precise, machine-readable problem statement: the
-exact command, the exact failing assertion, the exact line. That's ideal input for an agent —
-paste the failed log and ask it to fix the failure. (Module 25 automates this into agents that
-respond to a failing pipeline on their own. CI is the trigger that makes self-healing possible.)
+exact command, the exact failing assertion, the exact line. That's ideal input for an agent. Paste
+the failed log into Claude Code (or your agent) and direct it to fix the failure. (Module 25
+automates this into agents that respond to a failing pipeline on their own. CI is the trigger that
+makes self-healing possible.)
 - **CI is the gate that makes letting agents run safely possible at all.** Every later module that
-hands the AI more autonomy — issue-to-PR agents, unattended runs — relies on the fact that nothing
+hands the AI more autonomy (issue-to-PR agents, unattended runs) relies on the fact that nothing
 the agent produces reaches anyone without passing CI first. The supervision is structural: it's
 this gate, not a human watching the agent type.
 
@@ -204,8 +205,9 @@ the more you need a reviewer that checks behavior instead of believing the diff.
 
 ## Hands-on lab
 
-**Lab language:** YAML (the CI config) plus the Python `tasks-app` and shell commands. You won't
-write much by hand — you'll commit a starter workflow, watch it pass, then break it on purpose.
+**Lab language:** YAML (the CI config) plus the Python `tasks-app` and shell commands. You direct
+the agent to place files, commit, and recover; you commit a starter workflow, watch it pass, then
+break it on purpose and watch CI catch it.
 
 **You'll need:**
 
@@ -214,71 +216,83 @@ write much by hand — you'll commit a starter workflow, watch it pass, then bre
 - `ci-starter.yml` — the workflow (GitHub Actions flavor).
 - `gitlab-ci-starter.yml` — the same pipeline for GitLab, if that's your forge.
 - `test_tasks.py` — a small test suite (use your Module 13 tests instead if you have them).
-- Python 3.10+ locally, and your AI assistant.
+- Python 3.10+ locally, and your agent. Examples use **Claude Code**; sub your own agent anywhere.
 
 ### Part A — Run the checks locally first
 
-Never push a workflow you haven't run by hand. CI just runs the same commands — prove they work on
+Never push a workflow you haven't run by hand. CI just runs the same commands, so prove they work on
 your machine first.
 
-1. Copy `lab/test_tasks.py` into your `tasks-app` folder (next to `tasks.py`). Install the tools and
-run both checks exactly as CI will:
+1. Direct your agent to set up the project, then run the checks yourself once. Tell Claude Code (sub
+your own agent): *"Copy the lab's `test_tasks.py` next to `tasks.py` in `~/ai-workflow-course/tasks-app`,
+then install `ruff` into this project."* The agent places the file and handles the install,
+including the PEP 668 fallback (a per-project venv) if the system Python refuses a global install.
+What it runs looks like:
 
 ```bash
 cd ~/ai-workflow-course/tasks-app
 pip install ruff
+# if pip is refused with "externally-managed-environment" (PEP 668, common on recent
+# Debian/Ubuntu and Homebrew Python), the agent falls back to a per-project venv:
+# python3 -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
+# pip install ruff
+```
+
+Then run both checks **yourself**, once. This is the one part you do by hand on purpose: feeling
+that CI is nothing more than these same two commands is what makes the rest of the module click.
+
+```bash
 python -m unittest # should report all tests passing
 ruff check . # should report no issues (or fix what it flags)
 ```
 
-If both are clean locally, CI will be green. If not, fix it here — it's faster than waiting on a
-runner.
-
-> **If `pip install` is refused** with "externally-managed-environment" (PEP 668 — common on
-> recent Debian/Ubuntu and Homebrew Python), install into a per-project virtual environment
-> instead: `python3 -m venv .venv && source .venv/bin/activate` (Windows:
-> `.venv\Scripts\activate`), then re-run `pip install ruff`. Only the linter needs installing — the
-> stdlib `unittest` runner needs nothing. (`pipx` or `pip install --break-system-packages` also
-> work; a venv is the clean default.)
+If both are clean locally, CI will be green. If not, fix it here; it's faster than waiting on a
+runner. (Only the linter needs installing. The stdlib `unittest` runner ships with Python.)
 
 ### Part B — Add the workflow and watch it pass
 
-2. Put the workflow where your forge looks for it:
-- **GitHub / Forgejo / Gitea:** copy `lab/ci-starter.yml` to `.github/workflows/ci.yml` in your
-repo (Forgejo/Gitea also read `.forgejo/workflows/` or `.gitea/workflows/` — check yours).
-- **GitLab:** copy `lab/gitlab-ci-starter.yml` to `.gitlab-ci.yml` at the repo root.
+2. Direct the agent to put the workflow where your forge looks for it. Tell Claude Code which forge
+you're on and let it pick the path:
+- **GitHub / Forgejo / Gitea:** `lab/ci-starter.yml` goes to `.github/workflows/ci.yml` (Forgejo/Gitea
+also read `.forgejo/workflows/` or `.gitea/workflows/`; the agent checks which yours uses).
+- **GitLab:** `lab/gitlab-ci-starter.yml` goes to `.gitlab-ci.yml` at the repo root.
 
-3. Commit and push it:
+3. Direct the agent to commit and push it, then verify. Tell Claude Code: *"Stage the new workflow
+and `test_tasks.py`, commit with a message about adding CI, and push."* Let it decide what to
+stage and run the git for you. What it runs looks like:
 
 ```bash
-git add .github/workflows/ci.yml test_tasks.py # adjust path for your forge
+git add .github/workflows/ci.yml test_tasks.py # path varies by forge; the agent picks it
 git commit -m "Add CI: lint and test on every push"
 git push
 ```
 
+Verify it committed the workflow and the test file (a `git show --stat HEAD` confirms what landed),
+not stray files.
+
 4. Open your repo in the forge's web UI and find the run (usually an "Actions," "CI/CD," or
 "Pipelines" tab, and a status icon on the commit). Watch the steps execute and turn green.
 **That green check is the gate now standing guard on every future push.** (Self-host track: if
 the run sits queued with nothing picking it up, that's the no-hosted-runner situation from the
-prerequisites — the workflow is correct, it just has no compute until you attach a runner in
-Module 19. Run this part on a SaaS forge to see green here and now.)
+prerequisites; the workflow is correct, it just has no compute until you attach a runner in
+Module 19. Run this part on a SaaS forge to see green right now.)
 
 ### Part C — Break it on purpose and watch CI catch it
 
 This is the whole point. You're going to ship the kind of plausible-but-wrong change AI produces,
 and watch CI stop it.
 
-5. Introduce a breaking change. Ask your AI assistant — in the browser, or with your editor-
-integrated tool from Module 4 — for something that *sounds* like a cleanup but changes behavior.
-For example: *"Refactor `pending()` in tasks.py to be simpler"* and, if it stays correct, nudge
-it until the logic actually changes — or just make the change yourself to feel it. A classic
-plausible break: have `pending()` return `self.tasks` (all tasks) instead of filtering out the
-done ones. It reads fine. It's wrong.
+5. Introduce a breaking change with the agent. Ask Claude Code (sub your own) for something that
+*sounds* like a cleanup but changes behavior: *"Refactor `pending()` in tasks.py to be simpler."*
+If it stays correct, nudge it until the logic actually changes. The classic plausible break: have
+`pending()` return `self.tasks` (all tasks) instead of filtering out the done ones. It reads fine.
+It's wrong.
 
 6. **Notice it still looks right.** Glance at the diff. The function is short, clean, plausible.
-This is exactly the trap from "The AI angle" — nothing in the *appearance* warns you.
+This is exactly the trap from "The AI angle": nothing in the *appearance* warns you.
 
-7. Commit and push it:
+7. Direct the agent to commit and push the change it just made. Tell Claude Code: *"Commit this and
+push it."* What it runs looks like:
 
 ```bash
 git add tasks.py
@@ -286,31 +300,34 @@ and watch CI stop it.
 git push
 ```
 
+Then verify CI goes red.
+
 8. Watch CI go red. Open the run, find the first failed step (`Test`), and read the log:
 `test_pending_excludes_completed_tasks` failed, with the assertion and the actual-vs-expected
 values. CI caught in seconds what a skim would have waved through.
 
-9. Reproduce and fix. The bad change is already committed *and pushed*, so `git restore` is no help
-here — it only discards *uncommitted* edits, and there are none. The team-safe undo for something
-already on shared history is `git revert` (Module 12): it writes a **new** commit that inverts the
-bad one, instead of rewriting history other people may have pulled.
+9. Hand the failure to the agent and let it recover. Paste the red CI log (the failed `Test` step)
+into Claude Code and direct it: *"Reproduce this locally, then undo the bad change safely; it's
+already pushed."* Your job is to verify it makes the right call, not to type git. The check:
+because the commit is already on shared history, the team-safe undo is `git revert`, not
+`git restore` (Module 12). What the agent runs looks like:
 
 ```bash
-python -m unittest # fails locally too — same command, same failure
-git revert HEAD # new commit that undoes "Simplify pending()" (Module 12)
-git push # CI re-runs on the fixed code and goes green again
+python -m unittest # fails locally too: same command, same failure
+git revert --no-edit HEAD # new commit that undoes "Simplify pending()" (Module 12)
+git push # CI re-runs on the fixed code and goes green again
 ```
 
-`git revert HEAD` opens an editor with a prefilled message (`Revert "Simplify pending()"`) — save
-and close it. The revert restores the correct `pending()`, the push triggers CI on the fixed code,
-and the run goes green.
+Verify CI goes green again, and that the agent chose revert (a new inverting commit) over a
+history-rewriting undo on a branch others may have pulled.
 
 10. *(Optional, to feel the linter tier.)* Add an obviously unused import to `cli.py`
-(`import os` at the top, unused), commit, and push. Watch the **Lint** step fail *before* the
-tests even run — the cheap check failing fast. Remove it and push again.
+(`import os` at the top, unused), then direct the agent to commit and push. Watch the **Lint**
+step fail *before* the tests even run: the cheap check failing fast. Have the agent remove it and
+push again.
 
-You've now seen both halves: CI passing as a quiet guardrail, and CI failing as the reviewer that
-caught a change you might have trusted.
+You've now seen both halves: CI passing as a guardrail that stays out of your way, and CI failing as
+the reviewer that caught a change you might have trusted.
 
 ---
 
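Reader's note, not part of this commit: the Part C break (`pending()` returning `self.tasks` instead of filtering out done tasks) and the test that catches it can be sketched as below. The diff does not show the course's real `tasks.py` or `test_tasks.py`, so `Task`, `TaskList`, `SimplifiedTaskList`, and `pending_titles` are hypothetical stand-ins. Only `pending()`, `self.tasks`, and the test name `test_pending_excludes_completed_tasks` appear in the lesson text.

```python
import unittest


class Task:
    """Hypothetical task record: a title and a done flag."""

    def __init__(self, title, done=False):
        self.title = title
        self.done = done


class TaskList:
    """Hypothetical stand-in for the tasks-app list class."""

    def __init__(self):
        self.tasks = []

    def add(self, title, done=False):
        self.tasks.append(Task(title, done))

    def pending(self):
        # Intended behavior: only tasks that are not done yet.
        return [t for t in self.tasks if not t.done]


class SimplifiedTaskList(TaskList):
    def pending(self):
        # The plausible "cleanup": shorter and tidy, but it returns done tasks too.
        return self.tasks


def pending_titles(list_cls):
    """Build a two-task list (one done) and return the pending titles."""
    tasks = list_cls()
    tasks.add("write tests")
    tasks.add("ship v1", done=True)
    return [t.title for t in tasks.pending()]


class PendingTests(unittest.TestCase):
    list_cls = TaskList  # point this at SimplifiedTaskList to see the red run

    def test_pending_excludes_completed_tasks(self):
        self.assertEqual(pending_titles(self.list_cls), ["write tests"])
```

Both `pending()` bodies pass a skim of the diff. Only executing the assertion separates them, and that execution is the behavior check the lab asks CI to perform.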
@@ -324,7 +341,7 @@ The honest caveats, because a skeptical audience trusts the limits more than the
 better. The flipped-comparison bug above got caught *because a test covered it.*
 - **Green CI is not "reviewed."** It checks behavior, not design, intent, security, or whether the
 feature is even the right one. It does not replace human review (Module 10) or the security gates
-in Module 15 — it sits alongside them. Treating a green check as sign-off is how plausible-wrong
+in Module 15; it sits alongside them. Treating a green check as sign-off is how plausible-wrong
 code with no failing test sails straight through.
 - **The clean machine is a feature that feels like a bug.** Sooner or later CI fails in a way you
 can't reproduce locally — a dependency you have installed but never declared, a file outside the