fix(M7-27+capstone): apply AI-drives-git reframe, lesson=theory, de-slop course-wide

Phase 2 sweep: all modules are post-pivot, so the learner directs the AI agent (Claude Code as the worked example) to do the git/setup work and verifies, instead of typing commands by hand; no re-teaching basics. Lesson sections are theory with example output; all execution lives in the labs.

De-slopped ("prose" etc. gone course-wide, em-dash density thinned). /path/to placeholders -> ~/ai-workflow-course.

Every deliberate teaching device verified intact: M10 ai-change.patch trap, M12 bad-clear-snippet, M13/M27 planted pending_count bug, M15 secret+typosquat+MD5, M18 BREAK=1, M21 absent-.gitignore, M22 poisoned skill, M24 no-op patch, M25 --simulate. Labs compile/parse (py/sh/yaml/json); no junk.

Closes #83
Closes #86
Closes #89

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
2026-06-22 21:58:17 -04:00
parent a29823f4b3
commit f925fd9645
38 changed files with 1735 additions and 1424 deletions
@@ -9,29 +9,29 @@
 ## Prerequisites

 This is the module the whole back half of the course was load-bearing for. It assumes a lot, on
-purpose — each piece is a wall the autonomous agent has to land behind.
+purpose; each piece is a wall the autonomous agent has to land behind.

- **Module 24** — assistive agents, where the AI helped and *you* decided every step. This module is
+- **Module 24**: assistive agents, where the AI helped and *you* decided every step. This module is
  the escalation: the agent now takes a step on its own. The only reason that's responsible is the
  rest of this list.
- **Module 9** — issues as an agent's task specification, including the `ready` label and the idea of
+- **Module 9**: issues as an agent's task specification, including the `ready` label and the idea of
  an agent as an *assignee*. An issue is the agent's input here.
- **Module 6** — branches. The agent's work goes on a branch, never straight onto `main`.
- **Modules 10 and 11** — the PR review gate and the full issue → branch → implementation → PR →
+- **Module 6**: branches. The agent's work goes on a branch, never straight onto `main`.
+- **Modules 10 and 11**: the PR review gate and the full issue → branch → implementation → PR →
  review → merge → close loop. The PR *is* the unit of supervision in this module.
- **Modules 13 and 14** — tests and CI. The automated gate that runs on the agent's PR.
- **Module 15** — security scanning as another gate on the same pushes. Autonomy makes this
+- **Modules 13 and 14**: tests and CI, the automated gate that runs on the agent's PR.
+- **Module 15**: security scanning as another gate on the same pushes. Autonomy makes this
  non-negotiable.
- **Module 19** — runners. A triggered or scheduled agent is just a runner job; you need to know
+- **Module 19**: runners. A triggered or scheduled agent is just a runner job; you need to know
  what's executing it and whose compute it's burning.
- **Module 12** — revert, reset, recovery. The backstop for when a gate misses something.
- **Module 5** — your committed AI instructions file: the agent's standing brief, the half of the
+- **Module 12**: revert, reset, recovery. The backstop for when a gate misses something.
+- **Module 5**: your committed AI instructions file, the agent's standing brief and the half of the
  spec that isn't in the issue.
- **Modules 16, 17, 22** — containers (sandboxing), secrets (scoped credentials), and the prompt-
+- **Modules 16, 17, 22**: containers (sandboxing), secrets (scoped credentials), and the
  prompt-injection attack surface. An unattended agent with a push token is a security boundary; these are
  why.

-If you skipped straight here, the lesson will read as reckless — because without those gates, it
+If you skipped straight here, the lesson will read as reckless, because without those gates it
 *would* be.

 ---
@@ -48,7 +48,7 @@ By the end of this module you can:
   `main`, and explain why that's *structural* supervision rather than *behavioral*.
 4. Build a bounded self-healing loop: when a gate fails, feed the failure back to the agent for a
   fix, capped at N attempts, with the result landing as a PR you review.
-5. Decide how much autonomy to grant by reasoning about the strength of your gates — not the
+5. Decide how much autonomy to grant by reasoning about the strength of your gates, not the
   intelligence of your model.

 ---
@@ -99,15 +99,15 @@ issue (assigned/labeled)  →  agent reads it  →  branch  →  implement  →

 What the agent reads as its brief is two artifacts you already maintain:

- **The issue** (Module 9) — the *specific* task: title, context, acceptance criteria, scope. The
+- **The issue** (Module 9): the *specific* task (title, context, acceptance criteria, scope). The
  acceptance criteria are the agent's literal definition of done.
- **The committed config** (Module 5) — the *standing* brief: conventions, the build and test
+- **The committed config** (Module 5): the *standing* brief (conventions, the build and test
  commands, "don't touch these files," house style). Every assignee inherits it, including this one.

 Together they're enough for the agent to attempt the work with **no live conversation**. That's the
 point of having spent modules making both artifacts good: a well-formed issue plus a committed config
 is a complete, handoff-ready spec. Hand it a vague issue and you get the Module 9 failure mode at
-full volume — a confident, plausible, wrong PR that costs more to review than the work would have
+full volume: a confident, plausible, wrong PR that costs more to review than the work would have
 taken.

 Crucially: the agent's last step is **open a PR**, not **merge**. The output is a proposal. Nothing
@@ -129,14 +129,14 @@ push  →  CI fails  →  agent reads the failure  →  proposes a fix  →  pus
                                                                       green? PR for review
 ```

-Two design rules make this safe rather than a money-burning loop:
+Two design rules keep this safe instead of letting it become a runaway loop:

 1. **Bound the retries.** Two or three attempts, then stop and tag a human. An agent that can retry
   forever *will*, on a flaky test, producing an endless stream of plausible "fixes" and a runner
   bill to match.
 2. **Watch what it's fixing.** The classic failure mode: the test fails, so the agent "fixes" it by
   *editing the test to pass* instead of fixing the bug. That's why the green result still lands as a
-   **reviewable PR** — a human confirms it fixed the code, not the evidence. Self-healing CI proposes
+   **reviewable PR**: a human confirms it fixed the code, not the evidence. Self-healing CI proposes
   a fix; it doesn't certify one.

 ### Pattern 3 — Triggered and scheduled agent jobs
@@ -145,9 +145,9 @@ How does an agent *start* without you launching it? It runs as a runner job (Mod
 machinery that runs your CI, pointed at an agent instead of a test suite. Two triggers cover almost
 everything:

- **Triggered** — an event fires the job: an issue gets a `ready`/`agent` label, a comment says
+- **Triggered**: an event fires the job, such as an issue getting a `ready`/`agent` label, a comment
  saying `/agent fix this`, or a CI run going red. Event in, agent runs, PR out.
- **Scheduled** — a cron-style timer fires it: "every night, attempt the top `ready`-labelled issue,"
+- **Scheduled**: a cron-style timer fires it, for example "every night, attempt the top `ready`-labelled issue,"
  or "hourly, retry any red `main` build." This is where "the workflow starts running itself" stops
  being a slogan.

@@ -170,7 +170,7 @@ Here's the load-bearing idea of the module, and it's not about the model:
 If your test suite covers 30% of behavior, an autonomous agent can silently break the other 70% and
 still go green. If your only "review" is rubber-stamping the diff, the review gate isn't real and the
 agent is effectively merging unseen. The work of making agents trustworthy is mostly the unglamorous
-work of making your gates strong — which is the work of Modules 10, 13, 14, and 15. Autonomy doesn't
+work of making your gates strong, which is the work of Modules 10, 13, 14, and 15. Autonomy doesn't
 ask you to trust the model more. It asks you to trust your gates more, and to have earned it.

 ---
@@ -181,22 +181,22 @@ Scripting a runner job is ordinary automation. What's specific to AI here is tha
 the job is non-deterministic and persuasive**, and that changes what "automation" has to mean:

 - **The output is a proposal, not a result.** A normal scheduled job (back up the database, rotate
-  logs) you trust to *complete*. An agent job you trust only to *propose* — because its output is a
+  logs) you trust to *complete*. An agent job you trust only to *propose*, because its output is a
  confident artifact that might be subtly wrong. That's why the universal endpoint is a PR behind a
  gate, never a merge. The structure absorbs the non-determinism.
 - **Supervision shifts from the action to the gate.** With deterministic automation you review the
-  *script* once. With an agent you can't, because it writes something new every run — so you review
+  *script* once. With an agent you can't, because it writes something new every run, so you review
  the *output* every run, automatically (CI, security) and by sample (human review). The supervision
  didn't disappear; it moved from watching the agent to hardening the wall it hits.
 - **Self-healing tempts the worst shortcut in the toolkit.** Pointed at a failing test, an agent will
-  cheerfully delete or weaken the test, because that does technically make CI green. A human would
-  feel the dishonesty; the agent just optimizes the objective you gave it. The defense is structural:
-  the fix is a reviewable diff, and the reviewer's job (Module 10) explicitly includes reading the
-  `-` lines on the *test* file.
+  delete or weaken the test, because that does technically make CI green. A human would feel the
+  dishonesty; the agent just optimizes the objective you gave it. The defense is structural: the fix
+  is a reviewable diff, and the reviewer's job (Module 10) explicitly includes reading the `-` lines
+  on the *test* file.
 - **Autonomy multiplies your earlier discipline, for good or ill.** A clean repo with strong gates
-  and a good committed config turns an agent into a tireless contributor. A repo with flaky tests, no
-  security scanning, and an empty config turns the same agent into an automated mess-generator running
-  on a timer. The agent doesn't fix your engineering — it amplifies it.
+  and a good committed config lets an agent contribute real work on a timer. A repo with flaky tests,
+  no security scanning, and an empty config lets the same agent generate mess on a timer. The agent
+  doesn't fix your engineering; it amplifies it.

 ---

@@ -216,11 +216,11 @@ shows how the exact same flow runs on a real forge as a triggered/scheduled job.
  `pytest` and `ruff` installed (`pip install pytest ruff`). The lab runs these as the CI gate,
  locally — the same checks `ci.yml` runs in Module 14.
 - The starter files in this module's `lab/` folder:
-  - `agent_runner.py` — the orchestrator. Drives the agent (real or simulated), then runs the gate,
+  - `agent_runner.py`: the orchestrator. Drives the agent (real or simulated), then runs the gate,
    and only ever produces a branch + PR proposal, never a merge.
-  - `issue-delete-command.md` — a well-formed issue (Module 9 format) for a `delete <index>` command:
+  - `issue-delete-command.md`: a well-formed issue (Module 9 format) for a `delete <index>` command:
    the agent's input.
-  - `agent-job.yml` — a reference forge workflow showing the triggered + scheduled runner version.
+  - `agent-job.yml`: a reference forge workflow showing the triggered + scheduled runner version.
    Read it; you'll run it for real only in Part D.
 - *Optional, for the "for real" path:* an agentic coding tool that has a non-interactive / headless /
  one-shot mode (most expose a flag for running a single prompt without the interactive UI). If you
@@ -240,22 +240,23 @@ shows how the exact same flow runs on a real forge as a triggered/scheduled job.

 Copy `agent_runner.py` and `issue-delete-command.md` into your `tasks-app` folder, along with this
 module's `lab/.gitignore` (append its lines to the `.gitignore` you already have from Module 2 rather
-than overwriting it). Commit that `.gitignore` first — it keeps the lab scaffolding and Python caches
-out of the agent's `git add -A`, so the change you review in Part B is clean. Then, from a clean
-branch:
+than overwriting it). Direct your agent (Claude Code is the worked example; substitute your own) to
+commit that updated `.gitignore` first, then verify the commit with `git log`. The updated
+`.gitignore` keeps the lab scaffolding and Python caches out of the agent's `git add -A`, so the
+change you review in Part B is clean. Then, from `~/ai-workflow-course/tasks-app`, run the orchestrator:

 ```bash
-cd ~/ai-workflow-course/tasks-app
-git checkout -b agent/delete-command
-
 # Simulate an agent that produces a BROKEN change, then run the gate on it:
 python agent_runner.py issue-to-pr issue-delete-command.md --simulate bad
 ```

-Watch the output. The "agent" plants a change, the script runs the gate (`ruff check` then
-`pytest -q`), a test fails, and the script **stops and refuses to call the work ready** — exit code
-non-zero, no PR proposed. That is structural supervision: it didn't matter that the change looked
-plausible; the gate caught it. Nothing reached `main`.
+The orchestrator first creates and switches to its own `agent/issue-delete-command` branch (the same
+`git switch -c` the runner does in `agent-job.yml`), so you direct the automation and verify the
+branch with `git branch` rather than typing `git checkout`. Then watch the output: the "agent" plants
+a change, the script runs the gate (`ruff check` then `pytest -q`), a test fails, and the script
+**stops and refuses to call the work ready**: non-zero exit code, no PR proposed. That is structural
+supervision. It didn't matter that the change looked plausible; the gate caught it, and nothing
+reached `main`.

 ### Part B — See a good change land as a PR proposal

@@ -264,19 +265,21 @@ python agent_runner.py issue-to-pr issue-delete-command.md --simulate good
 ```

 This time the planted change is correct. The gate passes, the script commits to the branch and prints
-the diff for review plus the exact `git push` / open-PR command. **It does not merge.** Open the diff
-and review it with the Module 10 checklist. Remember (from the note above) that the simulated diff is
-the self-contained `discount()` stand-in, not a `delete` command — but the review *motion* is the real
-lesson: you are the human gate, and that step doesn't go away just because an agent did the typing.
+the diff plus the push / open-PR command it would run. **It does not merge.** Review the diff with the
+Module 10 checklist, then direct your agent (Claude Code; substitute your own) to run that push and
+open the PR, and verify that the PR appeared. Remember (from the note above) that the simulated diff
+is the self-contained `discount()` stand-in, not a `delete` command. The review *motion* is the real
+lesson: you are the human gate, and that step doesn't go away just because an agent did the typing.

 ### Part C — Run the self-healing loop

 ```bash
-git checkout -b agent/self-heal
 python agent_runner.py self-heal --simulate bad
 ```

-The script plants a failing change, runs the gate (red), feeds the failure back to the "agent" for a
+The orchestrator switches to its own `agent/self-heal` branch (again, you direct the automation
+rather than typing the git command yourself), then plants a failing change, runs the gate (red),
+feeds the failure back to the "agent" for a
 fix, re-runs the gate, and repeats up to its retry cap. With `--simulate bad` the fix succeeds on the
 second attempt and the result is offered as a PR proposal. Run it with `--simulate stuck` to watch the
 cap trip: after N attempts it gives up and tags the work for a human instead of looping forever.
@@ -311,7 +314,7 @@ Two ways to go from simulation to a genuine autonomous run:
 The honest limits — and for autonomous agents, the limits *are* the lesson:

 - **Your gates are the ceiling, and most gates are weaker than they look.** Thin test coverage,
-  skipped security scans, or review-by-rubber-stamp don't just reduce quality — they directly set how
+  skipped security scans, or review-by-rubber-stamp don't just reduce quality; they directly set how
  much an autonomous agent can quietly break. Don't grant more autonomy than your gates can verify.
  The honest version of "should I let an agent do this unattended?" is "would my CI catch it if it got
  it wrong?"
@@ -352,8 +355,8 @@ The honest limits — and for autonomous agents, the limits *are* the lesson:
 - You can name the three patterns (issue-to-PR, self-healing CI, triggered/scheduled jobs) and the
  four gates that make any of them safe (review M10, CI M14, security M15, recovery M12).

-When "let the agent take the first pass" feels safe because you trust the wall it lands behind — not
-because you trust the model — you've got the model right. Module 26 takes the next step: more than one
+When "let the agent take the first pass" feels safe because you trust the wall it lands behind, not
+because you trust the model, you've got it right. Module 26 takes the next step: more than one
 agent working at once without colliding, which is where the worktrees from Module 7 finally pay off at
 scale.

@@ -161,6 +161,18 @@ def in_git_repo() -> bool:
                          capture_output=True).returncode == 0


+def ensure_branch(name: str) -> None:
+    """Create and switch to the agent's working branch. The orchestrator owns this git step the same
+    way agent-job.yml's runner does (`git switch -c`): you direct the automation and then verify the
+    branch (`git branch`) instead of typing `git checkout` by hand. No-op outside a Git repo."""
+    if not in_git_repo():
+        return
+    exists = subprocess.run(["git", "rev-parse", "--verify", "--quiet", f"refs/heads/{name}"],
+                            capture_output=True).returncode == 0
+    subprocess.run(["git", "switch", name] if exists else ["git", "switch", "-c", name], check=True)
+    print(f"[git] working on branch {name} (the orchestrator created/switched it for you).")
+
+
 def propose_pr(message: str) -> None:
    print("\n" + "=" * 80)
    print("GATE PASSED. Proposing a PR — NOT merging. A human reviews the diff (Module 10).")
@@ -202,6 +214,7 @@ def reject(reason: str, gate_output: str, *, simulated: bool = False) -> None:
 # --------------------------------------------------------------------------------------------------
 def cmd_issue_to_pr(issue_path: Path, simulate: str | None) -> int:
    print(f"[issue-to-pr] brief: {issue_path}")
+    ensure_branch(f"agent/{issue_path.stem}")
    if simulate:
        print(f"[issue-to-pr] simulating a '{simulate}' agent on the self-contained demo target.")
        simulate_implement(simulate)
@@ -218,6 +231,7 @@ def cmd_issue_to_pr(issue_path: Path, simulate: str | None) -> int:


 def cmd_self_heal(simulate: str | None) -> int:
+    ensure_branch("agent/self-heal")
    # Establish a failing state to heal. In a real pipeline this is "CI just went red on a push".
    if simulate:
        print(f"[self-heal] simulating a red build ('{simulate}') on the demo target.")
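Pattern 2's bounded retry fits in a few lines once the gate and the agent are each wrapped as a plain function. This is a hedged sketch, not how `agent_runner.py` implements `self-heal`. `self_heal`, `HealResult`, `run_gate`, `propose_fix`, and the default cap of 3 are hypothetical names and values chosen for illustration.

```python
# Hypothetical sketch of Pattern 2's bounded retry (not agent_runner.py's code).
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class HealResult:
    passed: bool      # did the gate end green?
    attempts: int     # how many fixes the agent was asked for
    last_output: str  # final gate output: the evidence a human reviews


def self_heal(
    run_gate: Callable[[], tuple[bool, str]],  # e.g. ruff + pytest; returns (passed, output)
    propose_fix: Callable[[str], None],        # hands the failure text to the agent
    max_attempts: int = 3,                     # the retry cap: then stop and tag a human
) -> HealResult:
    passed, output = run_gate()
    attempts = 0
    while not passed and attempts < max_attempts:
        propose_fix(output)  # feed the failure back to the agent
        attempts += 1
        passed, output = run_gate()
    # Never merges: the caller turns a pass into a PR proposal, a fail into a human hand-off.
    return HealResult(passed, attempts, output)


# Simulated agent whose second fix turns the gate green.
state = {"fixes": 0}


def fake_gate() -> tuple[bool, str]:
    ok = state["fixes"] >= 2
    return ok, ("1 passed" if ok else "1 failed")


def fake_fix(failure: str) -> None:
    state["fixes"] += 1


result = self_heal(fake_gate, fake_fix)
print(result)  # HealResult(passed=True, attempts=2, last_output='1 passed')
```

The function deliberately returns an outcome instead of acting on it. A green result still goes to review as a PR, and hitting the cap hands the final gate output to a human instead of looping.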