fix(M7-27+capstone): apply AI-drives-git reframe, lesson=theory, de-slop course-wide

Phase 2 sweep — all modules are post-pivot, so the learner directs the AI agent (Claude Code as the worked example) to do the git/setup work and verifies, instead of typing commands by hand; no re-teaching basics. Lesson sections are theory with example output; all execution lives in the labs. De-slopped ("prose" etc. gone course-wide, em-dash density thinned). /path/to placeholders -> ~/ai-workflow-course. Every deliberate teaching device verified intact: M10 ai-change.patch trap, M12 bad-clear-snippet, M13/M27 planted pending_count bug, M15 secret+typosquat+MD5, M18 BREAK=1, M21 absent-.gitignore, M22 poisoned skill, M24 no-op patch, M25 --simulate. Labs compile/parse (py/sh/yaml/json); no junk. Closes #83 Closes #86 Closes #89 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
2026-06-22 21:58:17 -04:00
parent a29823f4b3
commit f925fd9645
38 changed files with 1735 additions and 1424 deletions
@@ -1,8 +1,8 @@
 # Module 10 — Reviewing Code You Didn't Write

 > **The AI wrote a diff that reads beautifully and is wrong in one line you'll skim right past.**
-> Reviewing for *plausibility traps* — not just bugs — is the highest-leverage, least-taught skill
-> in this whole space. This module gives you a gate to run it at and a checklist to run.
+> Reviewing for *plausibility traps*, not just bugs, is a skill almost nobody teaches. This module
+> gives you a gate to run it at and a checklist to run.

 ---

@@ -11,13 +11,13 @@
 - **Module 2 — Version Control as a Safety Net.** You read changes with `git diff`. This module
  turns that one-off habit into a disciplined review pass over a whole change.
 - **Module 8 — Remotes and Hosting.** Your repo lives on a host now, and a change arrives as a
-  *pull request* (GitHub/Gitea/Forgejo) or *merge request* (GitLab) — same thing, different name.
+  *pull request* (GitHub/Gitea/Forgejo) or *merge request* (GitLab): same thing, different name.
  We'll write "PR" throughout; it's the unit of review.
 - **Module 9 — Issues and the Task Layer** (helpful, not required). A PR usually answers an issue;
  the issue is the "what I asked for" you review the diff against.

-If you only have Modules 1–2, you can still do the core skill of this module locally — reviewing a
-diff between two branches with `git diff` — and skip the part where you open it as a PR on a host.
+If you only have Modules 1–2, you can still do the core skill of this module locally (reviewing a
+diff between two branches with `git diff`) and skip the part where you open it as a PR on a host.

 ---

@@ -26,11 +26,11 @@ diff between two branches with `git diff` — and skip the part where you open i
 By the end of this module you can:

 1. Use a pull request as a **review gate**: nothing reaches the main branch without passing through
-   a diff someone (or something) signed off on — even on a solo repo.
+   a diff someone (or something) signed off on, even on a solo repo.
 2. Read an AI-generated diff the right way: against the request, deletions first, the diff over the
   AI's own description of it.
-3. Name and spot the four **plausibility traps** — invented APIs, silent scope creep, deleted
-   edge-case handling, and convincing-but-wrong logic — that pass a human skim and a quick run.
+3. Name and spot the four **plausibility traps** (invented APIs, silent scope creep, deleted
+   edge-case handling, convincing-but-wrong logic) that pass a human skim and a quick run.
 4. Run a repeatable **AI-diff review checklist** and end every review with an explicit
   *approve* / *request changes* decision you can defend.

@@ -42,7 +42,7 @@ By the end of this module you can:

 A pull request proposes merging a branch into another (usually `main`) and pauses there so the
 change can be looked at *before* it lands. On a team that pause is where review happens. The trap
-is treating it as a rubber stamp — "looks good, merge" — which is exactly how bad changes get the
+is treating it as a rubber stamp ("looks good, merge"), which is exactly how bad changes get the
 institutional blessing of "it was reviewed."

 Reframe it the way you already think about change control: **a PR is a change gate, and merge is a
@@ -51,7 +51,7 @@ The cheapest place to catch a problem is in the diff, before the door closes. Yo
 (that's Module 12), but recovery is always more expensive than the review you skipped.

 This holds **even when you're the only human on the repo.** That's not bureaucracy for its own
-sake — the syllabus's own course repo opens a PR for every module for exactly two reasons that
+sake. The syllabus's own course repo opens a PR for every module for exactly two reasons that
 apply to you solo:

 - **Traceability.** The PR is a durable record of *what changed and why*, linked to the issue it
@@ -65,23 +65,23 @@ When the author is an AI, both reasons get sharper. The AI produced the change w
 confidence and no memory of why; the PR is where a human supplies the judgment and the record the
 AI can't.

-### Why this is a genuinely new skill
+### Why this is a new skill

 You already know how to review human code. Reviewing AI code is *not the same activity*, and
 assuming it is gets people burned.

-When a human writes a function, the bugs cluster where the human was uncertain — the gnarly edge,
+When a human writes a function, the bugs cluster where the human was uncertain: the gnarly edge,
 the bit they rushed, the TODO they meant to come back to. You can often *feel* the soft spots, and
 the code's roughness is a signal: confusing code is suspicious code.

 AI output inverts that signal. It is **uniformly fluent.** The variable names are good, the
 structure is clean, the comment above the broken line confidently states the correct intention,
 and the one wrong line looks exactly as polished as the forty right ones. The fluency is constant;
-the correctness is not — and your eye has spent a career using fluency as a proxy for correctness.
+the correctness is not, and your eye has spent a career using fluency as a proxy for correctness.
 That proxy is now actively misleading.

 So the question shifts. With human code you mostly ask *"is this good code?"* With AI code you have
-to ask *"is this code true?"* — does it do what it claims, against the request I actually made,
+to ask *"is this code true?"*: does it do what it claims, against the request I actually made,
 using things that actually exist. That's reviewing for **plausibility traps**: code engineered (by
 a process optimizing for plausible-looking output) to pass exactly the skim you're tempted to give
 it.
@@ -92,15 +92,15 @@ These are the failure modes to hunt for specifically. They're not random bugs; t
 characteristic ways fluent-but-untrue code goes wrong.

 **1. Invented APIs.** The model reaches for a function, method, keyword argument, flag, config key,
-or endpoint that *should* exist by analogy — and doesn't, or exists with a different signature.
+or endpoint that *should* exist by analogy, and doesn't, or exists with a different signature.
 It's the same generative move behind hallucinated package names (the supply-chain version of this
 gets its own treatment in Module 15). The tell is that it reads *more* natural than the real API,
 because it was generated to be plausible rather than recalled from docs. Classic shape: assuming
 `list.pop(i, default)` works because `dict.pop(k, default)` does. Verify every unfamiliar
-symbol against real docs or source — confidence in the surrounding prose is not evidence.
+symbol against real docs or source. Confidence in the surrounding words is not evidence.

 **2. Silent scope creep.** You asked for one thing; the diff does that thing *and* quietly
-"improves" three others it was never asked to touch — reformatting a file, reshuffling imports,
+"improves" three others it was never asked to touch: reformatting a file, reshuffling imports,
 renaming a variable across the module, "simplifying" an unrelated function. Each extra edit is an
 unrequested change you now have to review with no stated intent behind it, and it's where
 regressions hide. The discipline: **every hunk must trace back to the request.** Anything that
@@ -109,7 +109,7 @@ own PR."

 **3. Deleted edge-case handling.** The most dangerous trap, because it lives in the `-` lines you
 skim. While implementing the feature, the model drops a bounds check, removes a `None` guard,
-collapses a `try/except` into the happy path, or — worst — *replaces a real error with a silent
+collapses a `try/except` into the happy path, or, worst, *replaces a real error with a silent
 swallow* (`except: pass`) under the banner of "making it robust." The code now looks cleaner and
 passes every test you'd casually run, because you'd test the path that works. The bad input that
 the deleted guard existed to catch now fails silently. **Read every deletion. Deletions are where
@@ -118,29 +118,35 @@ behavior disappears.**
 **4. Convincing-but-wrong logic.** An inverted condition (`if not x` where it meant `if x`), an
 off-by-one, `<` where it meant `<=`, `and` where it meant `or`, a filter quietly dropped from a
 comprehension. On the happy path it often produces a believable-enough result, and the comment
-above it cheerfully describes the *correct* behavior — so the comment actively vouches for the bug.
+above it cheerfully describes the *correct* behavior, so the comment actively vouches for the bug.
 The defense is to **trace one real call through the changed code yourself** instead of trusting the
 narration.

-A real AI diff usually has *most lines correct* and one trap buried in legitimate work — which is
-what makes it dangerous. The feature genuinely works when you try it; the trap is somewhere you
+A real AI diff usually has *most lines correct* and one trap buried in legitimate work, which is
+what makes it dangerous. The feature really does work when you try it; the trap is somewhere you
 didn't look.

 ### How to actually read the diff

-Mechanics first. You want the change as one reviewable unit, separate from the code you wrote it in:
+You want the change as one reviewable unit, separate from the editor you generated it in. On your
+host's PR page that's the default view: the whole change as a diff, with line comments,
+file-by-file navigation, and CI results attached. The same change reads as a block of `+`/`-`
+lines, for example a hunk that quietly drops a guard:

-```bash
-git fetch                       # get the branch the PR is built from
-git diff main..feature-branch   # the whole change, as one diff
+```diff
+ def charge(amount):
+-    if amount <= 0:
+-        raise ValueError("amount must be positive")
+     gateway.charge(amount)
 ```

-On your host's PR page you get the same diff with line comments, file-by-file navigation, and the
-CI results attached — use it. But the content of the review is the same whether you read it in the
-browser or the terminal.
+That block is the unit of review, whether you read it in the browser or have the agent pull it up
+in the terminal. You already know the git for this from Module 2, and from Module 4 on the agent
+fetches the branch and surfaces the diff for you. Your job is the reading, and reading the `-`
+lines first: the deleted guard above is exactly the kind of thing a skim sails past.

-Then run the pass in this order (the full version is in
-[`lab/ai-diff-review-checklist.md`](lab/ai-diff-review-checklist.md) — keep it open while you work):
+Run the pass in this order (the full version is in
+[`lab/ai-diff-review-checklist.md`](lab/ai-diff-review-checklist.md), keep it open while you work):

 1. **State the request in one sentence.** This is your scope yardstick. If it answers an issue
   (Module 9), that's your sentence.
@@ -148,14 +154,14 @@ Then run the pass in this order (the full version is in
   what it *did*. Only the diff is real.
 3. **Scope check.** Every hunk maps to the request. Flag everything that doesn't.
 4. **Deletions first.** Read every `-` line and ask what behavior just left the codebase.
-5. **Verify the unfamiliar.** Every API, flag, and key you don't personally know exists —
+5. **Verify the unfamiliar.** Every API, flag, and key you don't personally know exists:
   check it.
-6. **Trace one real call**, including a failure case. Not the happy path — the bad input.
+6. **Trace one real call**, including a failure case. Not the happy path, the bad input.
 7. **Decide.** Approve only if you can explain every hunk. Otherwise request changes. The burden of
   proof is on the diff, not on you.

 That last point is the whole posture: **a diff is guilty until proven correct.** "It runs" is the
-weakest evidence there is — the traps above are *designed* to run.
+weakest evidence there is; the traps above are *designed* to run.

 ---

@@ -164,20 +170,20 @@ weakest evidence there is — the traps above are *designed* to run.
 Every other module here makes a tool more valuable because of AI. This module is the one where the
 *human stays in the loop on purpose*, and it's worth being precise about why.

-The thing AI is best at — producing fluent, confident, well-structured output — is precisely the
+The thing AI is best at, producing fluent, confident, well-structured output, is precisely the
 thing that defeats the review reflex you built reviewing humans. You learned to trust clean code
 and distrust messy code; AI produces uniformly clean code regardless of whether it's correct, so
 that heuristic now points the wrong way. Reviewing AI diffs means consciously *overriding* an
 instinct that served you well for years.

-And the volume cuts against you. AI makes generating a 300-line PR almost free, which quietly
-shifts the bottleneck from *writing* to *reviewing* — and tempts everyone to review at the speed
-they generate. The economics of the team now hinge on review being the gate that writing no longer
-is. The fluent-but-wrong line costs nothing to produce and everything to miss.
+And the volume cuts against you. AI makes generating a 300-line PR almost free, which shifts the
+bottleneck from *writing* to *reviewing* and tempts everyone to review at the speed they generate.
+Review is now the gate that writing no longer is. The fluent-but-wrong line costs nothing to
+produce and everything to miss.

 This is the human half of a loop you'll keep building. Module 11 wires this review gate into the
 full issue → branch → PR → review → merge motion with humans *and* agents as contributors. Much
-later, Module 24 looks at AI *reviewers* that comment on PRs automatically — but an automated
+later, Module 24 looks at AI *reviewers* that comment on PRs automatically, but an automated
 reviewer is an assistant to this skill, not a replacement for it. You can't supervise a review bot
 you couldn't do yourself.

@@ -190,28 +196,41 @@ real change, then review a diff the "AI" produced and catch the trap planted in

 **You'll need:**

- Git, Python 3.10+, and your AI assistant.
+- Git, Python 3.10+, and your coding agent (Claude Code in the examples; sub your own).
 - The starter base app in [`lab/tasks-app/`](lab/tasks-app/) (`tasks.py`, `cli.py`). It's the
  Module 1/2 app with one addition: `complete()` validates the index and `done` turns a bad index
-  into a clean error. Note that behavior — the trap will mess with it.
+  into a clean error. Note that behavior; the trap will mess with it.
 - The planted AI change in [`lab/ai-change.patch`](lab/ai-change.patch).
 - The review checklist in [`lab/ai-diff-review-checklist.md`](lab/ai-diff-review-checklist.md).
 - **Optional (Part A as a real PR):** the repo you pushed to a host in Module 8. If you don't have
-  one, do Part A locally as a branch — the review skill in Parts B–C is identical either way.
+  one, do Part A locally as a branch; the review skill in Parts B–C is identical either way.

 ### Part A — Open a PR as a gate

-1. Set up the base app as a repo and confirm its baseline behavior. This `review-lab` is a
-   throwaway repo *separate* from the `tasks-app` you've built up across earlier modules — you can
-   delete it when you're done, and nothing here touches your main app. (Use your real course path in
-   place of `/path/to/`, the same copy-it-in move from Module 5.)
+1. Have your agent set up the base app as a throwaway `review-lab` repo, then confirm the baseline
+   behavior yourself. This `review-lab` is *separate* from the `tasks-app` you've built up across
+   earlier modules; you can delete it when you're done, and nothing here touches your main app. From
+   Module 4 on the agent drives the git and setup, so direct Claude Code (sub your own agent) to
+   scaffold it:
+
+   > *"Make a new directory `~/ai-workflow-course/review-lab` and copy the two Python files from
+   > `~/ai-workflow-course/the-workflow-course/modules/10-reviewing-code-you-didnt-write/lab/tasks-app/`
+   > into it. Add a `.gitignore` that ignores `tasks.json` and `__pycache__/` so runtime state stays
+   > out of the diffs. Initialize a git repo on a branch named `main`, stage everything, and make one
+   > commit: `base: tasks-app`."*
+
+   The branch name is load-bearing: the steps below diff against `main` and switch back to it, so
+   verify the agent actually used `main` (not whatever its default is). Confirm the result:

   ```bash
-   mkdir -p ~/ai-workflow-course/review-lab && cd ~/ai-workflow-course/review-lab
-   cp /path/to/modules/10-reviewing-code-you-didnt-write/lab/tasks-app/*.py .
-   printf 'tasks.json\n__pycache__/\n' > .gitignore   # keep generated runtime state out of your review diffs (Module 2)
-   git init -qb main && git add . && git commit -qm "base: tasks-app"   # -b main so the git switch main / git diff main.. steps below resolve
+   cd ~/ai-workflow-course/review-lab
+   git log --oneline        # one commit, "base: tasks-app", on branch main
+   git status               # clean tree; tasks.json ignored, not tracked
+   ```

+   Then see the baseline behavior with your own eyes, because the trap is going to change it:
+
+   ```bash
   python cli.py add "write the review module"
   python cli.py done 99        # baseline: prints "error: no task at index 99", exits non-zero
   echo "exit code: $?"
@@ -219,36 +238,35 @@ real change, then review a diff the "AI" produced and catch the trap planted in

   Remember that last result. A bad index is a clean, loud error today.

-2. Make a small honest change of your own on a branch — ask your AI for a one-line tweak, e.g.
-   *"make the empty-list message say '(nothing to do)' instead of '(no tasks yet)'"* — apply it,
-   commit it, and open it as a PR:
+2. Now practice the gate on a trivial, honest change. Tell the agent to make a one-line tweak on
+   its own branch and put it up for review:

-   ```bash
-   git switch -c tweak-empty-message
-   # apply the AI's one-line change to tasks.py, then:
-   git add . && git commit -m "Friendlier empty-list message"
-   ```
+   > *"On a new branch `tweak-empty-message`, change the empty-list message in `tasks.py` from
+   > '(no tasks yet)' to '(nothing to do)'. Commit it as 'Friendlier empty-list message'. If this
+   > repo has a remote, push the branch and open a pull request; otherwise leave it on the branch."*

-   If you have a Module 8 remote: `git push -u origin tweak-empty-message`, then open the PR on
-   your host and read your own diff in the PR view. If you're local-only:
-   `git diff main..tweak-empty-message`. Either way, **review your own one-line change as a diff
-   before merging it.** Get used to the gate on a trivial change so it's a reflex on a dangerous
-   one. Merge it when you're satisfied (`git switch main && git merge tweak-empty-message`).
+   Your job is the review, not the plumbing. Read the resulting diff before it lands: on the PR page
+   if the agent opened one, or with `git diff main..tweak-empty-message` if you're local-only. It's
+   one line, and that's the point. Make reading-before-merging a reflex on a trivial change so it's
+   automatic on a dangerous one. Once you've read it and it's exactly what you asked for, tell the
+   agent to merge it into `main`.

 ### Part B — Review the AI's diff (the real exercise)

 3. Now a teammate-who-is-an-AI has opened a PR. The prompt it was given was exactly:
-   **"Add a `delete <index>` command to the tasks app."** Bring its change in on its own branch.
-   `git apply` lays the AI's proposed change onto this branch as if it were its PR, so you can read
-   it before deciding whether to keep it — exactly what you'd be doing in a real PR review. (Again,
-   use your real course path in place of `/path/to/`.)
+   **"Add a `delete <index>` command to the tasks app."** The change is captured as a patch in the
+   lab so the review is reproducible. Have the agent stage it as that teammate's PR, on its own
+   branch:

-   ```bash
-   git switch main
-   git switch -c ai-delete-command
-   git apply /path/to/modules/10-reviewing-code-you-didnt-write/lab/ai-change.patch
-   git add . && git commit -m "Add delete command"
-   ```
+   > *"From `main`, create a branch `ai-delete-command`. Apply the patch at
+   > `~/ai-workflow-course/the-workflow-course/modules/10-reviewing-code-you-didnt-write/lab/ai-change.patch`
+   > to the working tree, then commit it as 'Add delete command'. Don't review or 'fix' it; just
+   > land it on the branch so I can review it."*
+
+   `git apply` is how the lab injects the incoming change so you can read it before deciding whether
+   to keep it, exactly what you'd do in a real PR review. Telling the agent not to clean it up
+   matters: left to its own judgment it might "helpfully" repair the planted problem before you
+   ever see it.

 4. **Review it before you run it.** Open the checklist and read the diff as one unit:

@@ -275,15 +293,15 @@ real change, then review a diff the "AI" produced and catch the trap planted in
   ```

   In the base app, `done 99` was a clean error with a non-zero exit. After this "add a delete
-   command" change, it prints `updated` and exits `0` — silently claiming success while marking
+   command" change, it prints `updated` and exits `0`, silently claiming success while marking
   nothing. The diff *only said* it was adding `delete`. While in the file it also rewrote
   `complete()` to swallow the `IndexError` "for robustness," deleting the edge-case handling and
   turning a loud failure into a silent lie. That's three traps in one small hunk: **scope creep**
   (it touched `complete`, which the request never mentioned), **deleted edge-case handling**, and
   **convincing-but-wrong logic** wearing a reassuring comment.

-6. Play it out. On your host's PR you'd leave a line comment on the `complete()` hunk —
-   *"out of scope, and this swallows the error `done` relied on; please drop it"* — and **request
+6. Play it out. On your host's PR you'd leave a line comment on the `complete()` hunk
+   (*"out of scope, and this swallows the error `done` relied on; please drop it"*) and **request
   changes** rather than approve. The feature you were asked for was fine; the PR still doesn't
   merge. That's the gate doing its job.

@@ -293,11 +311,11 @@ real change, then review a diff the "AI" produced and catch the trap planted in

 - **A checklist is a floor, not a ceiling.** It catches the characteristic traps reliably; it will
  not catch a deep logic error that requires understanding the whole system. For changes in code
-  you don't know, reviewing the diff in isolation isn't enough — that harder case (pointing AI at
+  you don't know, reviewing the diff in isolation isn't enough; that harder case (pointing AI at
  an unfamiliar codebase, and reviewing safely there) is Module 23.
 - **Tests catch what review misses, and vice versa.** This module is human review; it pairs with
  automated testing and CI (Modules 13–14), which catch the regressions a tired reviewer skims
-  past. Neither replaces the other — the trap in this lab passes a casual run *and* would pass a
+  past. Neither replaces the other: the trap in this lab passes a casual run *and* would pass a
  test suite that only tests the happy path. Review is what notices the test you *should* have.
 - **Review fatigue is real and AI makes it worse.** Twenty fluent PRs in a day will wear down the
  exact attention this skill needs, and a rubber-stamped review is worse than none because it
@@ -305,7 +323,7 @@ real change, then review a diff the "AI" produced and catch the trap planted in
  small and single-purpose so each one is reviewable in full. A PR too big to review honestly
  should be sent back to be split, not skimmed.
 - **You can't review what you don't understand.** If a diff uses an API or a corner of the language
-  you don't know, "looks fine" is not a review — that's the moment to verify it exists and does
+  you don't know, "looks fine" is not a review; that's the moment to verify it exists and does
  what it claims, or to pull in someone who knows. The honest output of a review is sometimes
  "I'm not qualified to approve this," and that's a valid result.

@@ -315,17 +333,17 @@ real change, then review a diff the "AI" produced and catch the trap planted in

 **You're done when:**

- You've opened (or branched) a change and reviewed it as a diff *before* merging — the gate is a
-  reflex, even on a one-liner.
+- You've opened (or branched) a change and reviewed it as a diff *before* merging, so the gate is a
+  reflex even on a one-liner.
 - You found the planted trap in `ai-change.patch` by reading the diff against the one-sentence
  request, and named *why* it's a trap (it changed `complete()`, which the request never mentioned,
  and swallowed the error `done` depended on).
 - You confirmed it by running the **failure** case (`done 99`) and seeing the silent `updated` +
  exit `0`, instead of trusting the happy path (`delete 0`) that worked fine.
- You can name the four plausibility traps from memory — invented APIs, silent scope creep, deleted
-  edge-case handling, convincing-but-wrong logic — and you treat a diff as guilty until proven
+- You can name the four plausibility traps from memory (invented APIs, silent scope creep, deleted
+  edge-case handling, convincing-but-wrong logic) and you treat a diff as guilty until proven
  correct.

 When "it runs" stops feeling like sufficient evidence and "I read every `-` line" starts feeling
 mandatory, you've got the skill. Module 11 takes this gate and wires it into the full collaboration
-loop — issues, branches, PRs, and merges — with both humans and agents as contributors.
+loop (issues, branches, PRs, and merges) with both humans and agents as contributors.
@@ -7,7 +7,7 @@ file; it's to interrogate **the change** against the prompt you gave. Work top t

 - [ ] **What did I actually ask for?** Write the request in one sentence. Every changed line
      should trace back to it.
- [ ] **Read the diff, not the prose.** Ignore the AI's summary of what it did; the diff is the
+- [ ] **Read the diff, not the summary.** Ignore the AI's account of what it did; the diff is the
      only ground truth. (`git diff main..<branch>`)

 ## 1. Scope — did it change only what was asked?