Reframe sweep M7-27 + capstone (AI drives git, lesson=theory, de-slop) (#93)

Co-authored-by: claude <claude@jpaul.io>
Co-committed-by: claude <claude@jpaul.io>
2026-06-22 21:58:36 -04:00
parent a29823f4b3
commit 513d7e7ac8
38 changed files with 1735 additions and 1424 deletions
@@ -56,7 +56,7 @@ something that matters.** You're not asked to build it. You're asked to change o
 without breaking the other thousand things you've never read.

 This is where AI is simultaneously most tempting and most dangerous. Tempting, because "just ask the
-AI to figure it out" feels like exactly the leverage you need against 200,000 lines you don't know.
+AI to figure it out" feels like exactly the help you need against 200,000 lines you don't know.
 Dangerous, because the AI's two default failure modes get *worse* the bigger and less familiar the
 codebase is:

@@ -64,7 +64,7 @@ codebase is:
  model whether or not the real auth lives there. It confidently describes structure it inferred
  from names, not from reading. In a small repo you'd catch it. In a huge one you won't.
 - **It rewrites instead of edits.** Ask for a small change and it hands you a "cleaned-up" version of
-  the whole file — reformatted, renamed, restructured — burying your one-line fix in a 300-line diff
+  the whole file (reformatted, renamed, restructured), burying your one-line fix in a 300-line diff
  nobody can review. In code you wrote, that's annoying. In code you didn't, it's how an invisible
  regression ships.

@@ -90,7 +90,7 @@ table — and crucially, a list of **open questions the code didn't answer.** A
 trustworthy. A map with no gaps is fiction. This phase is **read-only**; nothing changes on disk.

 **3. Change — the smallest scoped, tested, reviewable diff.** Only now do you edit. One change, one
-branch (Module 6). Find the blast radius first — every caller of what you're touching — and if you
+branch (Module 6). Find the blast radius first (every caller of what you're touching), and if you
 can't enumerate them, you're not ready. Make the minimal edit, add a test that fails without it,
 run the *full* existing suite, and self-review the diff like it's someone else's PR (Module 10). No
 drive-by reformatting. No "while I was in here." The diff a reviewer sees should be exactly the
@@ -99,7 +99,7 @@ change and nothing else.
 ### Context is the bottleneck, not intelligence

 A frontier model is plenty smart enough to understand any one file in your repo. What it *can't* do
-is hold all 200,000 lines in its head at once — the context window is finite, and stuffing it full of
+is hold all 200,000 lines in its head at once. The context window is finite, and stuffing it full of
 irrelevant code makes the model worse, not better. So the skill here isn't "give the AI more." It's
 **give the AI the right slice, and a way to fetch more on demand.**

@@ -116,7 +116,7 @@ of access that turn a guessing model into a grounded one:

 - **The filesystem and code search** — so it can grep for every caller of a function instead of
  assuming it found them all.
-- **Language-server intelligence** — go-to-definition, find-references, type info — so "where is this
+- **Language-server intelligence** (go-to-definition, find-references, type info) so "where is this
  used?" is answered by the toolchain, not by the model's guess.
 - **The surrounding systems** — the issue tracker (Module 9), CI results (Module 14), the running
  app's logs — so the AI maps the code *and* the context it lives in.
@@ -146,16 +146,16 @@ in unfamiliar code," they encode *exactly* what careful means, as steps the AI f

 Onboard a human to a legacy codebase and the advice is familiar: read the README, ask a senior dev.
 What's specific here is that **the AI is both the thing reading the codebase and the thing most
-likely to confidently misread it** — and the bigger the repo, the wider that gap between "sounds
+likely to confidently misread it.** The bigger the repo, the wider that gap between "sounds
 authoritative" and "is correct."

 So the AI-specific discipline is verification, not exploration. The model is genuinely excellent at
-the grunt work of orientation — reading a hundred files, summarizing structure, tracing a call path —
-which is exactly the work that's tedious and slow for a human. But it will narrate a wrong map with
+the grunt work of orientation: reading a hundred files, summarizing structure, tracing a call path.
+That's exactly the work that's tedious and slow for a human. But it will narrate a wrong map with
 the same fluent confidence as a right one. Your job shifts from "explore the code" (let the AI do
 that) to "make the AI prove its map against real files, and keep its changes small enough that a
-wrong map can't do much damage." The whole earlier toolchain — version control, branches, review,
-tests, recovery — is what turns "the AI might be wrong about this huge system" from a catastrophe
+wrong map can't do much damage." The whole earlier toolchain (version control, branches, review,
+tests, recovery) is what turns "the AI might be wrong about this huge system" from a catastrophe
 into a revertable diff.
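
One way to make "prove the map against real files" mechanical is to check every citation in the AI's map before you read any of its prose. The sketch below is a minimal illustration, not part of the lab: it assumes the map cites code as `path:START-END` (a hypothetical convention; use whatever format you told the AI to follow), and `check_citations` is an illustrative name.

```python
import re
from pathlib import Path

# Assumed citation format: path/to/file.ext:START-END (hypothetical convention).
CITATION = re.compile(r"(?P<path>[\w./-]+\.\w+):(?P<start>\d+)-(?P<end>\d+)")


def check_citations(map_text: str, repo_root: Path) -> list[str]:
    """Return one problem string per citation that does not match a real file."""
    problems = []
    for m in CITATION.finditer(map_text):
        path = repo_root / m["path"]
        start, end = int(m["start"]), int(m["end"])
        if not path.is_file():
            problems.append(f"{m['path']}: file does not exist")
            continue
        line_count = len(path.read_text(errors="replace").splitlines())
        # A range that runs past the end of the file was never actually read.
        if start < 1 or end < start or end > line_count:
            problems.append(f"{m['path']}:{start}-{end}: outside lines 1-{line_count}")
    return problems
```

A clean result only shows that the cited lines exist. Whether they say what the map claims is still a read-it-yourself check.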

 ---
@@ -167,7 +167,8 @@ This lab does **not** use `tasks-app` — the entire point is a codebase you *di

 **You'll need:**

-- Git, Python 3.10+, and your agentic AI tool from Module 4.
+- Git, Python 3.10+, and the agentic AI tool from Module 4. The lab uses Claude Code as the worked
+  example (check with `claude --version`, or sub your own agent); the steps survive a tool swap.
 - A real, small-to-medium open-source repo to clone. Pick something with **tests** and a clear
  build/test command, in a language you can at least read. Good traits: a few thousand lines, an
  obvious entry point, a documented install (`pip install -e .`, `npm install`, `go mod download`,
@@ -208,38 +209,44 @@ This lab does **not** use `tasks-app` — the entire point is a codebase you *di

 ### Part C — One small, scoped, tested change

-6. Pick a genuinely small change — a clearer error message, a fixed edge case, a tiny missing
-   validation, a documented-but-unhandled input. Something a single function owns. First **install
-   the project's dependencies** the way its README says — typically `pip install -e .` (Python),
-   `npm install` (JS/TS), `go mod download` (Go), or the equivalent — *then* run the existing tests
-   to establish a green baseline (`python -m unittest`, `pytest`, `npm test`, `go test ./...` —
-   whatever `ORIENT.md` and the README confirmed). A fresh clone usually won't run green until its
-   deps are installed; if it still won't go green on a clean clone *after* a documented install,
-   that's a setup problem, not your baseline — pick another repo rather than change code on top of an
-   environment you can't trust.
+6. Pick a genuinely small change: a clearer error message, a fixed edge case, a tiny missing
+   validation, a documented-but-unhandled input. Something a single function owns. Now load the
+   `safe-change` skill (`lab/skills/safe-change.md`) and let Claude Code (sub your own agent) do the
+   setup the skill assigns it. Tell it to install the project's dependencies the way the README says
+   (typically `pip install -e .` for Python, `npm install` for JS/TS, `go mod download` for Go) and
+   run the existing tests to establish a green baseline. **Your job is to verify the result**, not to
+   type the commands. Confirm the suite is actually green, and apply the judgment the skill leaves to
+   you: a fresh clone usually won't run green until its deps are installed, but if it still won't go
+   green on a clean clone *after* a documented install, that's a setup problem rather than your
+   baseline. Pick another repo before you change code on top of an environment you can't trust.

-7. Branch, then load the `safe-change` skill (`lab/skills/safe-change.md`) and work the change with
-   the AI:
+7. Direct the AI through the change with the `safe-change` skill loaded. Its first action is to
+   create the branch (Step 1 of the skill), so you don't type `git switch` yourself; **verify** it
+   did by running:

   ```bash
-   git switch -c scoped-change
+   git status        # confirm you're on e.g. scoped-change, not the default branch
   ```

-   Make it find the blast radius (every caller) before editing. Keep the edit minimal. Add a test
-   that fails without the change and passes with it. Run the **full** suite.
+   Then direct the rest: make it find the blast radius (every caller) before editing, keep the edit
+   minimal, and add a test that fails without the change and passes with it. Have it run the **full**
+   suite and confirm green.

-8. **Review the diff like it's a stranger's PR (Module 10):**
+8. **Review the diff like it's a stranger's PR (Module 10).** This part you do by hand; reviewing
+   what the AI wrote is the skill that doesn't transfer to the AI:

   ```bash
   git diff
   ```

   Every changed line should be necessary and explainable. If the AI snuck in a reformat or a
-   rename, revert it — that's the sprawl this whole module exists to prevent. Commit only when the
-   diff is exactly the change and nothing more.
+   rename, tell it to revert that and keep only the scoped change. Once the diff is exactly the
+   change and nothing more, instruct the AI to commit it, then verify the result with
+   `git show` so the commit holds only what you approved.

-9. Write the PR description the `safe-change` skill asks for: what changed, why, the blast radius,
-   how you tested it, and what you deliberately did *not* touch.
+9. Have the AI draft the PR description the `safe-change` skill asks for (what changed, why, the
+   blast radius, how it was tested, and what it deliberately did *not* touch), then edit it into your
+   own words before it goes up.
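
The step 8 review can get a quick first pass from `git diff --numstat`, which prints one `added<TAB>deleted<TAB>path` row per changed file. The sketch below is an assumption-laden helper, not something the `safe-change` skill defines: `sprawl_warnings`, the `expected_paths` set, and the `max_lines` threshold are all illustrative choices you would tune yourself.

```python
def summarize_numstat(numstat: str) -> dict[str, int]:
    """Parse `git diff --numstat` text into {path: lines added + deleted}."""
    touched = {}
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files report "-" for both counts; treat them as 0 lines.
        touched[path] = sum(int(n) for n in (added, deleted) if n != "-")
    return touched


def sprawl_warnings(numstat: str, expected_paths: set[str], max_lines: int = 40) -> list[str]:
    """Flag files you did not plan to touch and files with suspiciously large churn."""
    warnings = []
    for path, churn in summarize_numstat(numstat).items():
        if path not in expected_paths:
            warnings.append(f"unexpected file: {path}")
        if churn > max_lines:
            warnings.append(f"large change ({churn} lines): {path}")
    return warnings
```

An empty list is not approval. It only tells you the diff has the shape of a scoped change, so reading every line by hand is still the actual review.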

 ---

@@ -247,7 +254,7 @@ This lab does **not** use `tasks-app` — the entire point is a codebase you *di

 - **A confident map is still just a hypothesis.** The AI will produce a fluent, plausible
  architecture summary for a repo it half-read. Fluency is not correctness. The citation-checking in
-  Part B isn't optional ceremony — it's the only thing standing between you and changing code based on
+  Part B isn't optional ceremony; it's the only thing standing between you and changing code based on
  a fiction. Verify at least a few claims by hand, every time.
 - **The context window is a hard ceiling.** On a truly large monorepo, the AI cannot see everything,
  and it usually won't *tell* you what it didn't read. Its map is only as good as the slice it
@@ -256,7 +263,7 @@ This lab does **not** use `tasks-app` — the entire point is a codebase you *di
  a claim to distrust.
 - **"Small change" can hide a big blast radius.** A one-line edit to a heavily-called function can
  ripple through code you never opened. The blast-radius search in the `safe-change` skill is the
-  defense, but it's only as good as the AI's ability to find *every* caller — dynamic dispatch,
+  defense, but it's only as good as the AI's ability to find *every* caller: dynamic dispatch,
  reflection, config-driven wiring, and string-based lookups all defeat naive search. When in doubt,
  the tests are your backstop, which is why a repo *without* tests is genuinely dangerous to change
  this way.
@@ -287,7 +294,7 @@ This lab does **not** use `tasks-app` — the entire point is a codebase you *di
  one-off heroics session.

 If your change is a clean, tested, reviewable one-liner in a system you couldn't have described an
-hour ago — and you trust it — you've got the motion.
+hour ago, and you trust it, you've got the motion.
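
The blast-radius pitfall above is easy to see in a few lines of code. This is a minimal sketch of a naive caller search for Python sources using the standard `ast` module; `find_call_sites` is a hypothetical helper, not what the `safe-change` skill or any particular agent actually runs.

```python
import ast


def find_call_sites(source: str, func_name: str) -> list[int]:
    """Return line numbers of calls written as func_name(...) or obj.func_name(...)."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            target = node.func
            # Plain names carry .id, attribute access carries .attr; anything else is skipped.
            name = target.id if isinstance(target, ast.Name) else getattr(target, "attr", None)
            if name == func_name:
                lines.append(node.lineno)
    return sorted(lines)


sample = '''
def parse_date(s): ...
a = parse_date("2024-01-01")
b = utils.parse_date(x)
handler = getattr(utils, "parse_date")
c = handler(x)
'''

print(find_call_sites(sample, "parse_date"))  # prints [3, 4]
```

The call on line 6 goes through a string-based lookup, so the search never sees it. That is the kind of caller only the test suite will catch.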

 ---

@@ -1,7 +1,7 @@
 # Skill: Map this repo

 A navigation playbook (a Module 21 skill) for orienting in a codebase you didn't write.
-Point your agentic tool at this file as a skill, or paste it in as instructions. The goal is a
+Point Claude Code (or sub your own agent) at this file as a skill, or paste it in as instructions. The goal is a
 **read-only** mental model — no edits happen here.

 ## When to use
@@ -11,7 +11,7 @@ At the start of any session on an unfamiliar repo, before any change is discusse
 - **Read only.** Do not edit, create, or delete files while mapping. No exceptions.
 - **Cite real paths.** Every claim about the code must point to a file and, ideally, a line range.
  If you can't cite it, say "unverified" instead of guessing.
-- **Breadth before depth.** Establish the whole shape before diving into any one area.
+- **Breadth before depth.** Establish the whole shape before going deep on any one area.
 - **No conclusions from file names alone.** A file called `auth.py` may not be where auth lives.

 ## Steps