Reframe sweep M7-27 + capstone (AI drives git, lesson=theory, de-slop) (#93)

Co-authored-by: claude <claude@jpaul.io> Co-committed-by: claude <claude@jpaul.io>
2026-06-22 21:58:36 -04:00
parent a29823f4b3
commit 513d7e7ac8
38 changed files with 1735 additions and 1424 deletions
@@ -2,9 +2,9 @@

 > **One feature, taken end to end, with every module doing its job in sequence.** This is the finale:
 > not new material, but proof that the twenty-seven pieces you learned separately are actually one
-> motion. By the end you'll have shipped a real change to `tasks-app` — prompt to running container —
-> and felt the thing the whole course was for: the model did the typing, but the *workflow* is what
-> made it safe and repeatable.
+> motion. By the end you'll have shipped a real change to `tasks-app`, from prompt to running
+> container. The model did the typing. The *workflow* is what made that safe and repeatable, and the
+> workflow is the part you built.

 ---

@@ -13,13 +13,14 @@
 There's nothing to learn here that the modules didn't already teach. The capstone exists to **wire it
 together**. Every step below names the module it comes from, so you can see the dependency chain you
 climbed now collapse into a single fluent pass. If a step feels unfamiliar, that's a pointer back to
-the module to re-read — not new content to absorb.
+the module to re-read, not new content to absorb.

 You'll do it twice:

-1. **The main loop** — you driving, the AI assisting. The full pipeline, by hand, once.
-2. **The stretch variant (optional)** — the *same* feature run the Unit 5 way, with agents inside the
-   pipeline, so you watch the workflow start to run itself.
+1. **The main loop.** You direct, the AI executes. You file the issue and make the calls; the AI does
+   the git and the edits; you verify each result. The full pipeline, once.
+2. **The stretch variant (optional).** The *same* feature run the Unit 5 way, with autonomous agents
+   inside the pipeline, so you watch the workflow start to run itself.

 ---

@@ -52,7 +53,7 @@ add **due dates**:
  running container, not just the CLI.

 This deliberately spans the core (`tasks.py`), the CLI (`cli.py`), and the deployable service
-(`serve.py`) — one feature, three surfaces, exactly the kind of change that used to mean three
+(`serve.py`): one feature, three surfaces, exactly the kind of change that used to mean three
 copy-paste sessions and a prayer (Module 1). And it has a built-in trap for the review step: "is a
 task due *today* overdue?" is the kind of off-by-one an AI will answer confidently and wrongly.

@@ -66,37 +67,36 @@ Read this once as a map before you touch the keyboard. Each arrow is a module.
 *"Add optional due dates to tasks, an `overdue` command, and a `/overdue` endpoint."* Acceptance
 criteria in the body. Label it. The issue is the contract the rest of the loop closes against.

-**Issue → branch (M6/M11).** Never work on `main`. Branch named after the issue:
-`git switch -c 47-due-dates`. The branch is a sandbox you can throw away wholesale (M6) — which is the
-only reason letting the AI loose on three files at once is a calm decision instead of a gamble.
+**Issue → branch (M6/M11).** Never work on `main`. Have the AI branch off main, named for the issue
+(something like `47-due-dates`). The branch is a sandbox you can throw away wholesale (M6); that
+disposability is what lets you turn the AI loose on three files at once without risking `main`.

 **Branch → AI implementation (M4), config already in place (M5).** Now the AI edits the files
-directly in your editor or CLI — no browser, no paste. It already knows your conventions because the
+directly in your editor or CLI, with no browser and no paste. It already knows your conventions because the
 committed instructions file has been in the repo since the first commit (M5): core logic in
 `tasks.py`, CLI wiring in `cli.py`, standard library only, run the tests before claiming done. You
 didn't re-explain any of that. That's the file earning its keep.

 **Implementation → tests (M13).** The feature isn't done when it runs; it's done when it's *pinned*.
-Have the AI extend `test_tasks.py` with cases for the new logic — and write the boundary cases
-yourself or demand them by name, because the boundary is exactly where the AI guesses: due yesterday
-(overdue), due tomorrow (not), **due today (not — yet)**, no due date at all (never overdue, never
-crashes).
+Have the AI extend `test_tasks.py` with cases for the new logic, and name the boundary cases
+yourself, because the boundary is exactly where the AI guesses: due yesterday (overdue), due tomorrow
+(not), **due today (not yet)**, no due date at all (never overdue, never crashes).

-**Secrets stay clean (M17).** This feature needs no new secret — it reads the system clock. The
+**Secrets stay clean (M17).** This feature needs no new secret; it reads the system clock. The
 discipline is that nothing got hardcoded *anyway*: the service still reads its config from the
-environment via `.env`, and `.env.example` documents any new keys. The win here is a non-event, which
-is the point — the failure mode (M17: AI hardcodes a value) simply didn't happen, because the pattern
-was already there.
+environment via `.env`, and `.env.example` documents any new keys. The win here is a non-event, and
+that is the point. The failure mode (M17: AI hardcodes a value) simply didn't happen, because the
+pattern was already there.

-**Tests → PR (M10/M11).** Push the branch, open a PR, and put `Closes #47` in the description so the
-merge closes the issue automatically (M11). The PR is the review gate even though it's your own code —
-*especially* because an AI wrote most of it.
+**Tests → PR (M10/M11).** Have the AI push the branch and open the PR, with `Closes #47` in the
+description so the merge closes the issue automatically (M11). The PR is the review gate even though
+it's your own code, and *especially* because an AI wrote most of it.

 **PR → CI → security scan (M14/M15/M19).** Opening the PR triggers the pipeline on your runner (M19):
-lint, build, tests (M14), then the security gate (M15) — dependency audit, secret scan, SAST. The
-feature added no dependencies, so SCA should be quiet; the secret scan confirms you didn't smuggle a
-key into a fixture. CI is the tireless reviewer that catches the code that *looks* right (M14); the
-security scan catches the failure classes a build check never would (M15).
+lint, build, tests (M14), then the security gate (M15): dependency audit, secret scan, SAST. The
+feature added no dependencies, so SCA should be quiet, and the secret scan confirms you didn't smuggle
+a key into a fixture. CI catches code that *looks* right (M14); the security scan catches the failure
+classes a build check never would (M15).

 **Review (M10).** Green CI is necessary, not sufficient. Read the diff like you didn't write it
 (M10). Go straight for the plausibility trap: open `overdue()` and check the comparison. Did it use
@@ -109,31 +109,29 @@ is now ahead by one clean, tested, scanned commit.

 **Merge → containerized deploy (M16/M18).** The merge to `main` triggers delivery (M18): CI builds the
 image from your `Dockerfile` (M16), tags it with the new commit SHA (immutable, not `latest`), runs
-`deploy.sh` to start the container with env injected (M17), polls `/health`, and — if health fails —
-rolls back to the previous SHA. Hit `GET /overdue` on the running container. The feature is live, in a
+`deploy.sh` to start the container with env injected (M17), polls `/health`, and rolls back to the
+previous SHA if health fails. Hit `GET /overdue` on the running container. The feature is live, in a
 reproducible artifact, behind a health check that can undo itself.

-**If it goes wrong (M12).** Something slips past every gate eventually. Because you squash-merged (one
-commit on `main`, not a two-parent merge), a bad change reverts cleanly with plain
-`git revert <squash-sha>` — a new commit, safe on shared history, no rewriting what teammates pulled
-(M12). Skip the `-m 1` you saw in Module 12: that flag is only for true merge commits, the kind
-`git merge --no-ff` makes, and a squash merge isn't one. A bad deploy is already handled by
-`deploy.sh`'s rollback to the last good SHA. Recovery is a discipline you rehearsed, not a panic.
+**If it goes wrong (M12).** Something slips past every gate eventually. Because you squash-merged, the
+bad change is one ordinary commit on `main`, so you direct the AI to revert it and verify the revert
+lands as a clean new commit on shared history, without needing the `-m 1` flag (M12). A bad deploy is
+already handled by `deploy.sh`'s rollback to the last good SHA. Recovery is a move you rehearsed.

 That's the whole motion. Notice what carried it: not the model. **The model wrote the diff; the
 workflow is everything that made the diff safe to merge and trivial to undo.** Swap the model next
-quarter and every arrow above is unchanged. That's the Module 1 thesis — *the model is the cheap,
-swappable part; the workflow is the durable skill* — now demonstrated rather than asserted.
+quarter and every arrow above is unchanged. That's the Module 1 thesis (*the model is the cheap,
+swappable part; the workflow is the durable skill*), and you just lived it instead of reading it.

 ---

 ## Hands-on lab

-**Lab language:** shell + Python, on the `tasks-app` repo. You'll use your editor-integrated or CLI
-agent (M4) for the implementation; everything else is your normal toolchain.
+**Lab language:** shell + Python, on the `tasks-app` repo. You'll direct Claude Code (`claude` — sub
+your own agent) to do the git and the edits (M4); you make the calls and verify each result.

-**You'll need:** the `tasks-app` repo in the prerequisite state above, your agentic tool, your forge
-account, and a working Docker install.
+**You'll need:** the `tasks-app` repo in the prerequisite state above, Claude Code (or your own
+agent), your forge account, and a working Docker install.

 ### Part A — Issue and branch (M9, M6, M11)

@@ -146,28 +144,33 @@ account, and a working Docker install.
   - A task due **today** is **not** overdue. A task with **no** due date is **never** overdue.
   - `serve.py` exposes `GET /overdue` returning the same set as the CLI.

-2. Branch off `main`, named for the issue:
+2. Point Claude Code at the repo and tell it to sync `main` and cut the branch:
+
+   > *"Sync `main` with the remote, then create a branch named `47-due-dates` for issue #47."* (Use
+   > your real issue number.)
+
+   Then verify it did what you asked:

   ```bash
   cd ~/ai-workflow-course/tasks-app
-   git switch main && git pull
-   git switch -c 47-due-dates        # use your real issue number
+   git status        # on 47-due-dates, clean, up to date with main
+   git branch        # the new branch exists and is checked out
   ```

 ### Part B — Implement with the AI (M4, M5)

-3. In your editor/CLI agent, give it the issue, not a vague wish:
+3. Give Claude Code the issue, not a vague wish:

   > *"Implement issue #47. Add an optional due date to tasks (core in `tasks.py`), wire `--due` into
   > the `add` command and a new `overdue` command in `cli.py`, and add a `GET /overdue` endpoint to
   > `serve.py`. Follow the acceptance criteria exactly. Run the tests before you tell me it's done."*

-   You should *not* have to specify "stdlib only" or "don't touch `tasks.json`" — that's in the
+   You should *not* have to specify "stdlib only" or "don't touch `tasks.json`"; that's in the
   committed instructions file (M5). If the agent reaches for a date library or hand-edits the JSON,
-   your file needs a line; that's signal, not failure.
+   your file is missing a line, and that gap is the useful signal.

-4. Run it by hand to confirm it's real. Choose the two dates relative to *your* today — one comfortably
-   in the future, one safely in the past — so the assertion below holds whenever you run this:
+4. Run it yourself to confirm it's real. Choose the two dates relative to *your* today (one comfortably
+   in the future, one safely in the past) so the assertion below holds whenever you run this:

   ```bash
   python cli.py add "file taxes" --due <a date a few months out>   # future → NOT overdue
@@ -181,26 +184,28 @@ account, and a working Docker install.
 ### Part C — Tests (M13)

 5. Have the AI extend `test_tasks.py`, then **read the test names** and confirm the boundaries are
-   actually covered. If "due today" and "no due date" aren't each their own test, add them — by hand
-   or by demanding them. Run the suite:
+   actually covered. If "due today" and "no due date" aren't each their own test, tell the AI to add
+   them by name. Confirm the suite is green:

   ```bash
   pytest        # or: python -m unittest
   ```

-   Commit only when it's green:
+   Once it's green, tell the AI to commit the change. Then verify what it actually staged and wrote:

   ```bash
-   git add -A && git commit -m "Add task due dates, overdue command, and /overdue endpoint"
+   git show --stat HEAD     # the right files, with a sensible message
+   git status               # nothing stray left uncommitted
   ```

 ### Part D — PR, CI, security, review (M10, M11, M14, M15, M19)

-6. Push and open the PR with the closing keyword:
+6. Tell the AI to push the branch and open the PR, with `Closes #47` in the description. Then verify
+   on the forge that the PR exists, targets `main`, and carries the closing keyword:

   ```bash
-   git push -u origin 47-due-dates
-   # open the PR on your forge; put "Closes #47" in the description
+   git log --oneline origin/47-due-dates -1   # the branch is on the remote
+   # then open the PR in the forge UI and confirm "Closes #47" is in the description
   ```

 7. Watch the pipeline run on your runner (M19): lint + tests (M14), then the security scan (M15).
@@ -211,8 +216,8 @@ account, and a working Docker install.
   - Is the comparison strict (`<` today) or inclusive (`<=`)? A task due today must **not** appear.
   - What happens for a task with `due == None`? It must be skipped, not crash, not counted.

-   If either is wrong — and an AI gets at least one of these wrong more often than you'd like — request
-   the fix on the branch, let CI re-run, and review again. Catching this *here*, before merge, is the
+   If either is wrong (and an AI gets at least one of these wrong more often than you'd like), have the
+   AI fix it on the branch, let CI re-run, and review again. Catching this *here*, before merge, is the
   entire point of the gate.

 ### Part E — Merge and deploy (M11, M16, M18, M17)
@@ -226,92 +231,95 @@ account, and a working Docker install.
    curl localhost:8000/overdue
    ```

-    You should see your overdue task served from the running container — the feature live in a
+    You should see your overdue task served from the running container: the feature live in a
    reproducible artifact (M16), configured from the environment (M17), behind a self-rolling-back
    health check (M18).

 ### Part F — Rehearse recovery (M12)

-11. **Sync local `main` first.** The squash-merge in step 9 happened on the forge, so the new commit
-    lives only on the remote — your local `main` is one behind. Pull it down and capture the SHA of
-    the squash commit you're about to rehearse undoing:
+11. **Have the AI sync local `main` first.** The squash-merge in step 9 happened on the forge, so the
+    new commit lives only on the remote and your local `main` is one behind. Tell the AI to pull
+    `main` and report the SHA of the squash commit you're about to rehearse undoing. Verify:

    ```bash
-    git switch main && git pull      # bring the squash-merge commit into local main
-    git log --oneline -1             # the top line IS your squash commit — note its SHA
+    git log --oneline -1     # the top line is your squash commit; note its SHA
    ```

-12. Prove you can undo it. Cut a throwaway branch off the freshly-synced `main` and revert that squash
-    commit, just to watch it work, then delete the branch:
+12. Prove you can undo it, without typing the git yourself. Direct the AI:
+
+    > *"Cut a throwaway branch off `main`, revert the squash commit `<sha>`, run the tests, then delete
+    > the branch. The squash merge is a single-parent commit, so confirm a plain revert is correct and
+    > that you do not need `-m 1`."*
+
+    The `-m 1` check is the teaching point you carried from Module 12: that flag is only for the
+    two-parent merge commits `git merge --no-ff` makes, and a squash merge isn't one. Have the AI say
+    which it used and why. Then verify the rehearsal landed and left no mess:

    ```bash
-    git switch -c throwaway-revert-test
-    git revert <squash-sha>     # plain revert: a squash merge is one ordinary commit, so no -m 1
-    pytest && git switch main && git branch -D throwaway-revert-test
+    git branch       # throwaway-revert-test is gone; you're back on main
+    git status       # clean
    ```

-    No `-m 1` here, and nothing to "find": that flag is only for the two-parent merge commits Module 12
-    rehearsed with `git merge --no-ff`. A squash merge produces a single-parent commit, so plain
-    `git revert <squash-sha>` is the right undo. You just confirmed the escape hatch is real *before*
-    you ever need it in anger.
+    You just confirmed the escape hatch is real before you need it.

 ---

 ## Stretch variant — run the same feature the Unit 5 way (optional)

-Everything above had you in the driver's seat. Now run the **identical** feature with agents *inside*
-the pipeline and watch how much of the loop keeps running when you step back. Do this only after the
-main loop succeeded — you can't supervise a pipeline you haven't run by hand.
+The main loop kept you in the driver's seat, directing each step. Now run the **identical** feature
+with autonomous agents *inside* the pipeline and watch how much of the loop keeps running when you
+step back. Do this only after the main loop succeeded; you can't supervise a pipeline you haven't
+driven yourself once.

 The feature, the branch flow, the gates, and the deploy are unchanged. What changes is *who does each
 step*:

 1. **Issue-to-PR agent does the first pass (M25).** Assign the issue to an autonomous agent instead of
-   opening your editor. It reads issue #47, creates the branch, implements across `tasks.py`,
-   `cli.py`, and `serve.py`, writes tests, and opens the PR — all landing as a reviewable PR behind
-   CI, exactly like a human contributor's. It is allowed to *propose*, never to merge. The supervision
-   is structural: the same CI (M14) and security (M15) gates stand whether the author is a human or an
-   agent.
+   driving the work step by step yourself. It reads issue #47, creates the branch, implements across
+   `tasks.py`, `cli.py`, and `serve.py`, writes tests, and opens the PR, all landing as a reviewable
+   PR behind CI, exactly like a human contributor's. It is allowed to *propose*, never to merge. The
+   supervision is structural: the same CI (M14) and security (M15) gates stand whether the author is a
+   human or an agent.

 2. **An assistive reviewer comments first (M24).** Before you look, an AI reviewer reads the diff
-   against your committed rubric and posts comments on the PR — flagging, ideally, the very `overdue()`
-   boundary you hunted by hand. It comments; it does not approve and does not merge (M24). A human
+   against your committed rubric and posts comments on the PR, flagging, ideally, the very `overdue()`
+   boundary you hunted yourself. It comments; it does not approve and does not merge (M24). A human
   still decides. You read its comments, then read the diff yourself, and notice the reviewer caught
-   the off-by-one — or notice it *missed* it, which is its own lesson about not trusting the assistant
+   the off-by-one, or notice it *missed* it, which is its own lesson about not trusting the assistant
   blindly.

 3. **Evals tell you whether to trust any of it (M27).** Turn the boundary cases from Part C into an
-   eval set — due yesterday, due today, due tomorrow, no due date — and score the agent's
-   implementation against it. Now do the thing the whole course was building to: **swap the model**
-   behind the agent and re-run the *same* eval. If the new model's `overdue()` regresses on the
-   "due today" case, the eval catches it before the PR ever merges. That's the close of the thesis —
-   evals are how you judge a model swap, so the swap you *will* make stays safe (M27).
+   eval set (due yesterday, due today, due tomorrow, no due date) and score the agent's implementation
+   against it. Now do the thing the whole course was building to: **swap the model** behind the agent
+   and re-run the *same* eval. If the new model's `overdue()` regresses on the "due today" case, the
+   eval catches it before the PR ever merges. That closes the thesis: evals are how you judge a model
+   swap, so the swap you *will* make stays safe (M27).

 When this runs, look at what's left for you: filing a crisp issue, reading a diff the assistant
-already annotated, and reading an eval score. The agent drafted; the gates held; the eval judged. The
-workflow didn't just make AI safe to use — it started running itself, with you supervising instead of
-typing. That only works because every catch-net from Units 2–3 was already in place. Take those away
-and "let an agent open a PR" is reckless; with them, it's just another contributor (M11).
+already annotated, and reading an eval score. The agent drafted, the gates held, the eval judged. The
+workflow didn't just make AI safe to use; it started running itself, with you supervising. That only
+works because every catch-net from Units 2–3 was already in place. Take those away and "let an agent
+open a PR" is reckless; with them, it's just another contributor (M11).

 ---

 ## Where it breaks

 - **A finale is not a shortcut.** The loop is fluent *because* you climbed the modules. Running the
-  capstone without the foundation — no protected `main`, no CI, no tests — isn't "the full loop," it's
+  capstone without the foundation (no protected `main`, no CI, no tests) isn't "the full loop," it's
  the copy-paste problem with extra steps. The pipeline's value is entirely in the gates; skip them
  and you've kept the ceremony and thrown away the safety.
 - **Green CI is not correctness.** Every gate in this loop is a filter, not a guarantee. CI proves the
  tests pass; it can't prove the tests test the right thing. The `overdue()` boundary trap passes a
-  weak test suite happily. The human review step (M10) is load-bearing and stays load-bearing — the
+  weak test suite happily. The human review step (M10) is load-bearing and stays load-bearing; the
  automation raises the floor, it doesn't remove the ceiling.
 - **The stretch variant moves the work, it doesn't delete it.** An issue-to-PR agent doesn't reduce
-  the importance of a well-written issue — it *raises* it, because a vague issue now produces a vague
-  PR with no human in the authoring loop to course-correct. You trade typing for specifying and
-  judging. That's a better trade, not a free one.
+  the importance of a well-written issue; it *raises* it, because a vague issue now produces a vague
+  PR with no human in the authoring loop to course-correct. The work shifts from typing toward
+  specifying and judging. That shift is a good one, but it isn't free.
 - **Evals are only as honest as their cases.** An eval set that omits the "due today" boundary will
-  bless a broken model swap. The eval doesn't know what you forgot to test (M27). It scales your
-  judgment; it doesn't supply it.
+  bless a broken model swap. The eval doesn't know what you forgot to test (M27); it can only scale
+  the judgment you already bring to the cases you write.

 ---

@@ -323,15 +331,15 @@ and "let an agent open a PR" is reckless; with them, it's just another contribut
  .../overdue` returns the right tasks from the deployed artifact.
 - Issue #47 closed itself on merge, `main` is one clean commit ahead, and you caught (or consciously
  verified) the `overdue()` boundary in review rather than in production.
- You can point at each step and name the module it came from without looking — and explain why the
+- You can point at each step and name the module it came from without looking, and explain why the
  *order* is the dependency chain, not an arbitrary checklist.
 - You can state, from what you just did rather than from the syllabus, why the model is the swappable
  part: every step would survive replacing the model, and the stretch variant's eval is exactly how
  you'd prove a swap was safe.

 If you ran the stretch variant, add one more: you watched an agent author the PR and an assistant
-review it, and you can say precisely which catch-nets from earlier units made handing that work to an
-agent a calm decision instead of a leap.
+review it, and you can name precisely which catch-nets from earlier units made it reasonable to hand
+that work to an agent at all.

-That's the course. The model wrote the code. **You built the workflow that made the code matter** —
+That's the course. The model wrote the code. **You built the workflow that made the code matter**,
 and that's the part that's still yours when the next model ships.