feat(course): build out all 27 modules, capstone, scaffold, and conventions

Scaffold the course repo and author the full curriculum in dependency-chain order, following the
settled build decisions in handoff.md.

- Scaffold: course README, vendor-neutral AGENTS.md (dogfoods Module 5), _TEMPLATE.md (the fixed
  9-section module shape), root .gitignore, ship config.
- Modules 1-2: reference exemplars (locked for tone/depth/lab style).
- Modules 3-27: full lessons + runnable labs, each following the template, respecting the chain,
  vendor/model-agnostic, with "feel the pain" labs.
- Module 8 hosting comparison web-researched and date-stamped (as of 2026-06-22), not written from
  memory; expansion-zone modules carry Verify-before-publish.
- Capstone: the full loop end to end on the running tasks-app example.

Lab code syntax-checked (Python/shell/YAML); every module has the 7 core template sections.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
2026-06-22 12:18:30 -04:00
parent 4bd586bbd0
commit fbec36cb67
117 changed files with 15131 additions and 1 deletions
@@ -0,0 +1,366 @@
+# Module 25 — Autonomous Agents: Issue-to-PR and Self-Healing CI
+
+> **Now the AI acts on its own — takes an assigned issue, opens a pull request, even fixes its own
+> failing build.** The thing that makes that safe isn't watching it work. It's that everything it
+> produces still lands as a reviewable PR behind the same gates you already built.
+
+---
+
+## Prerequisites
+
+This is the module the whole back half of the course was load-bearing for. It assumes a lot, on
+purpose — each piece is a wall the autonomous agent has to land behind.
+
+- **Module 24** — assistive agents, where the AI helped and *you* decided every step. This module is
+  the escalation: the agent now takes a step on its own. The only reason that's responsible is the
+  rest of this list.
+- **Module 9** — issues as an agent's task specification, including the `ready` label and the idea of
+  an agent as an *assignee*. An issue is the agent's input here.
+- **Module 6** — branches. The agent's work goes on a branch, never straight onto `main`.
+- **Modules 10 and 11** — the PR review gate and the full issue → branch → PR → review → merge → close
+  loop. The PR *is* the unit of supervision in this module.
+- **Modules 13 and 14** — tests and CI. The automated gate that runs on the agent's PR.
+- **Module 15** — security scanning as another gate on the same pushes. Autonomy makes this
+  non-optional.
+- **Module 19** — runners. A triggered or scheduled agent is just a runner job; you need to know
+  what's executing it and whose compute it's burning.
+- **Module 12** — revert, reset, recovery. The backstop for when a gate misses something.
+- **Module 5** — your committed AI instructions file: the agent's standing brief, the half of the
+  spec that isn't in the issue.
+- **Modules 16, 17, 22** — containers (sandboxing), secrets (scoped credentials), and the prompt-
+  injection attack surface. An unattended agent with a push token is a security boundary; these are
+  why.
+
+If you skipped straight here, the lesson will read as reckless — because without those gates, it
+*would* be.
+
+---
+
+## Learning objectives
+
+By the end of this module you can:
+
+1. Explain the difference between *assistive* (Module 24) and *autonomous-but-supervised* agents, and
+   state where supervision actually happens in each.
+2. Run an issue-to-PR agent: hand it a well-formed issue and have it produce a change on a branch
+   that arrives as a reviewable pull request — not a merge.
+3. Watch your existing CI / review / security gates catch a bad agent change before it can reach
+   `main`, and explain why that's *structural* supervision rather than *behavioral*.
+4. Build a bounded self-healing loop: when a gate fails, feed the failure back to the agent for a
+   fix, capped at N attempts, with the result landing as a PR you review.
+5. Decide how much autonomy to grant by reasoning about the strength of your gates — not the
+   intelligence of your model.
+
+---
+
+## Key concepts
+
+### The escalation: where supervision moved
+
+In Module 24 the agent *advised*. It commented on a PR; it triaged and labeled an issue. A human
+read the suggestion and took the action. Supervision was **behavioral**: you were in the loop on
+every decision, watching, approving, clicking the button.
+
+That doesn't scale, and watching an agent type is a terrible use of your attention anyway. This
+module makes the agent *take the action* — branch, edit files, commit, open a PR. The obvious worry
+is: if I'm not watching, what stops it from shipping garbage?
+
+The answer is the reframe of the whole unit:
+
+> **You don't supervise an autonomous agent by watching it work. You supervise it structurally — by
+> making everything it produces pass through gates that don't care whether a human or a machine wrote
+> the change.**
+
+You already built those gates, for exactly this reason, before you needed them:
+
+| Gate | Built in | What it catches on an agent's PR |
+|------|----------|----------------------------------|
+| **Review** | Module 10 | Plausible-but-wrong logic, scope creep, dropped edge cases — read the diff, not the agent's summary. |
+| **CI** | Module 14 | Lint failures, broken tests, anything that doesn't build. Runs identically on a human's PR and an agent's. |
+| **Security** | Module 15 | Hardcoded secrets, vulnerable or hallucinated dependencies, SAST findings. |
+| **Recovery** | Module 12 | The backstop: if something slips through and merges, `revert` cleanly undoes it. |
+
+The agent is autonomous *inside* that box and powerless to escape it. It cannot merge past a failing
+check or an unapproved review. That's the entire safety model, and it's why this module sits at the
+end of the course instead of the start: the box had to exist first.
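The box described above can be modeled in a few lines. This is an illustrative sketch, not any forge's actual API: `PullRequest`, `may_merge`, and `blocking_gates` are hypothetical names, and a real forge would enforce the same rule through its branch protection or merge-check settings rather than application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical stand-in for a forge PR; the field names are illustrative."""
    author: str            # "human" or "agent": deliberately ignored by the checks below
    ci_passed: bool
    security_passed: bool
    review_approved: bool


def may_merge(pr: PullRequest) -> bool:
    """A PR is mergeable only when every gate says yes. The author is never consulted."""
    return pr.ci_passed and pr.security_passed and pr.review_approved


def blocking_gates(pr: PullRequest) -> list[str]:
    """Name the gates still standing between this PR and main."""
    gates = {"CI": pr.ci_passed, "security": pr.security_passed, "review": pr.review_approved}
    return [name for name, passed in gates.items() if not passed]
```

An agent-authored PR with green CI and a clean scan but no approval still cannot merge, and `blocking_gates` reports `["review"]`. Changing the `author` field changes nothing, which is the point of structural supervision.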
+
+### Pattern 1 — Issue-to-PR
+
+The headline pattern, and the one Module 9 set up when it called an agent a possible *assignee*. The
+loop is exactly the human collaboration loop from Module 11, with one participant swapped:
+
+```
+issue (assigned/labeled)  →  agent reads it  →  branch  →  implement  →  commit  →  open PR
+                                                                                      │
+                                                                  CI + security + human review
+                                                                                      │
+                                                                              merge → issue closed
+```
+
+What the agent reads as its brief is two artifacts you already maintain:
+
+- **The issue** (Module 9) — the *specific* task: title, context, acceptance criteria, scope. The
+  acceptance criteria are the agent's literal definition of done.
+- **The committed config** (Module 5) — the *standing* brief: conventions, the build and test
+  commands, "don't touch these files," house style. Every assignee inherits it, including this one.
+
+Together they're enough for the agent to attempt the work with **no live conversation**. That's the
+point of having spent modules making both artifacts good: a well-formed issue plus a committed config
+is a complete, handoff-ready spec. Hand it a vague issue and you get the Module 9 failure mode at
+full volume — a confident, plausible, wrong PR that costs more to review than the work would have
+taken.
+
+Crucially: the agent's last step is **open a PR**, not **merge**. The output is a proposal. Nothing
+about "autonomous" means "merges to `main` unseen" — if that's your mental model, this is where you
+fix it.
+
+### Pattern 2 — Self-healing CI
+
+The second pattern points the agent at a *failure* instead of an issue. CI goes red on a branch; an
+agent reads the failing job's logs, proposes a fix, and pushes it back to the same branch so CI runs
+again.
+
+```
+push  →  CI fails  →  agent reads the failure  →  proposes a fix  →  push  →  CI re-runs
+                            ▲                                                     │
+                            └──────────── bounded retry (cap at N) ──────────────┘
+                                                                                  │
+                                                                       still red? hand to a human
+                                                                       green? PR for review
+```
+
+Two design rules make this safe rather than a money-burning loop:
+
+1. **Bound the retries.** Two or three attempts, then stop and tag a human. An agent that can retry
+   forever *will*, on a flaky test, producing an endless stream of plausible "fixes" and a runner
+   bill to match.
+2. **Watch what it's fixing.** The classic failure mode: the test fails, so the agent "fixes" it by
+   *editing the test to pass* instead of fixing the bug. That's why the green result still lands as a
+   **reviewable PR** — a human confirms it fixed the code, not the evidence. Self-healing CI proposes
+   a fix; it doesn't certify one.
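A minimal sketch of the bounded retry rule, assuming the gate and the agent are passed in as plain callables. `self_heal`, `gate`, and `fix` are hypothetical names chosen for illustration and are not claimed to match the lab script's internals; the default cap of 3 mirrors the lab's `RETRY_CAP`.

```python
from typing import Callable

GateResult = tuple[bool, str]  # (passed, log output)


def self_heal(gate: Callable[[], GateResult],
              fix: Callable[[str], None],
              retry_cap: int = 3) -> str:
    """Run the gate; on red, hand the log to `fix` and re-run, at most `retry_cap` times.

    Returns "propose-pr" when the gate goes green (a human still reviews the diff)
    or "needs-human" when the cap trips. It never merges.
    """
    passed, log = gate()
    attempts = 0
    while not passed and attempts < retry_cap:
        attempts += 1
        fix(log)               # the agent edits files in response to the failure log
        passed, log = gate()   # the same gate judges the result every time
    return "propose-pr" if passed else "needs-human"
```

Both outcomes are hand-offs to a person: green produces a proposal to review, and a tripped cap produces a request for help. The loop has no third exit that merges.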
+
+### Pattern 3 — Triggered and scheduled agent jobs
+
+How does an agent *start* without you launching it? It runs as a runner job (Module 19) — the same
+machinery that runs your CI, pointed at an agent instead of a test suite. Two triggers cover almost
+everything:
+
+- **Triggered** — an event fires the job: an issue gets a `ready`/`agent` label, a comment says
+  `/agent fix this`, a CI run goes red. Event in, agent runs, PR out.
+- **Scheduled** — a cron-style timer fires it: "every night, attempt the top `ready`-labelled issue,"
+  or "hourly, retry any red `main` build." This is where "the workflow starts running itself" stops
+  being a slogan.
+
+Either way it's a job on a runner, which means everything Module 19 taught applies: hosted vs.
+self-hosted, whose compute, and — new and important here — **what credentials that job holds.** A
+scheduled agent with a push token and write access is unattended automation acting in your name. It
+needs scoped secrets (Module 17), ideally a sandboxed environment (Module 16), and a healthy
+suspicion of anything it reads, because an issue body or a dependency's README is untrusted input
+that lands straight in its context (prompt injection, Module 22). Triggered autonomy is a real attack
+surface; treat it like one.
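The trigger rules above amount to a small filter in front of the agent. The sketch below is an assumption-laden illustration: the event names (`schedule`, `issue_labeled`, `issue_comment`, `ci_failed`), the `agent` label, and the `/agent` command are hypothetical placeholders, not any forge's real event vocabulary.

```python
from typing import Optional

AGENT_LABELS = {"agent"}   # hypothetical label(s) that opt an issue in
COMMAND = "/agent"         # hypothetical comment command


def should_run_agent(event: str, *, label: Optional[str] = None,
                     comment: Optional[str] = None) -> bool:
    """Decide whether an incoming event should start an agent job."""
    if event in ("schedule", "ci_failed"):
        return True                        # timer fired or a build went red
    if event == "issue_labeled":
        return label in AGENT_LABELS       # label events fire for ANY label, so filter
    if event == "issue_comment":
        words = (comment or "").split()
        return bool(words) and words[0] == COMMAND   # command must lead the comment
    return False                           # unknown events never start an agent
```

Defaulting to `False` for anything unrecognized keeps the set of ways to start an unattended job explicit and short.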
+
+### The one number that actually governs autonomy
+
+Here's the load-bearing idea of the module, and it's not about the model:
+
+> **An autonomous agent is exactly as safe as the gates it lands behind — no safer.** How much
+> autonomy you can responsibly grant is a property of *your CI, review, and security setup*, not of
+> how smart the model is.
+
+If your test suite covers 30% of behavior, an autonomous agent can silently break the other 70% and
+still go green. If your only "review" is rubber-stamping the diff, the review gate isn't real and the
+agent is effectively merging unseen. The work of making agents trustworthy is mostly the unglamorous
+work of making your gates strong — which is the work of Modules 10, 13, 14, and 15. Autonomy doesn't
+ask you to trust the model more. It asks you to trust your gates more, and to have earned it.
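The coverage point can be made concrete with a toy example. Everything here is hypothetical and exists only for illustration: two functions, one of which has no test, and a "suite" expressed as plain functions so the sketch stays self-contained.

```python
def add_task(tasks, title):
    """Covered behavior: append a task."""
    return tasks + [title]


def delete_task(tasks, index):
    """Uncovered behavior, after a plausible-but-wrong change.

    Assumed contract for this sketch: `index` is 1-based, as a user would type it.
    The bug treats it as 0-based, so it removes the wrong task.
    """
    return tasks[:index] + tasks[index + 1:]


def thin_suite():
    """The only test that exists today: it never calls delete_task."""
    return add_task(["a"], "b") == ["a", "b"]


def fuller_suite():
    """The same suite plus one test for the behavior the change touched."""
    return thin_suite() and delete_task(["a", "b", "c"], 1) == ["b", "c"]
```

With the thin suite the broken `delete_task` goes green; only the fuller suite turns the gate red. The agent's change is identical in both cases, so the difference in safety comes entirely from the gate.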
+
+---
+
+## The AI angle
+
+A generic automation lesson would teach you to script a runner job. What's specific to AI here is
+that **the actor inside the job is non-deterministic and persuasive**, and that changes what
+"automation" has to mean:
+
+- **The output is a proposal, not a result.** A normal scheduled job (back up the database, rotate
+  logs) you trust to *complete*. An agent job you trust only to *propose* — because its output is a
+  confident artifact that might be subtly wrong. That's why the universal endpoint is a PR behind a
+  gate, never a merge. The structure absorbs the non-determinism.
+- **Supervision shifts from the action to the gate.** With deterministic automation you review the
+  *script* once. With an agent you can't, because it writes something new every run — so you review
+  the *output* every run, automatically (CI, security) and by sample (human review). The supervision
+  didn't disappear; it moved from watching the agent to hardening the wall it hits.
+- **Self-healing tempts the worst shortcut in the toolkit.** Pointed at a failing test, an agent will
+  cheerfully delete or weaken the test, because that does technically make CI green. A human would
+  feel the dishonesty; the agent just optimizes the objective you gave it. The defense is structural:
+  the fix is a reviewable diff, and the reviewer's job (Module 10) explicitly includes reading the
+  `-` lines on the *test* file.
+- **Autonomy multiplies your earlier discipline, for good or ill.** A clean repo with strong gates
+  and a good committed config turns an agent into a tireless contributor. A repo with flaky tests, no
+  security scanning, and an empty config turns the same agent into an automated mess-generator running
+  on a timer. The agent doesn't fix your engineering — it amplifies it.
+
+---
+
+## Hands-on lab
+
+**Lab language:** Python (one orchestrator script) plus a little shell and Git. It runs on your own
+machine, any OS, against the `tasks-app` repo from Module 1 — no forge account or paid agent required
+to complete it.
+
+You'll drive an issue-to-PR run and a self-healing loop *locally*, so the moving parts are visible
+and reproducible. The "PR" in the local lab is a branch plus a diff you review; the optional Part D
+shows how the exact same flow runs on a real forge as a triggered/scheduled job.
+
+**You'll need:**
+
+- Your `tasks-app` Git repo (Modules 1–2), with the `test_tasks.py` from Module 14 present and
+  `pytest` and `ruff` installed (`pip install pytest ruff`). The lab runs these as the CI gate,
+  locally — the same checks `ci.yml` runs in Module 14.
+- The starter files in this module's `lab/` folder:
+  - `agent_runner.py` — the orchestrator. Drives the agent (real or simulated), then runs the gate,
+    and only ever produces a branch + PR proposal, never a merge.
+  - `issue-delete-command.md` — a well-formed issue (Module 9 format) for a `delete <index>` command:
+    the agent's input.
+  - `agent-job.yml` — a reference forge workflow showing the triggered + scheduled runner version.
+    Read it; you'll run it for real only in Part D.
+- *Optional, for the "for real" path:* an agentic coding tool that has a non-interactive / headless /
+  one-shot mode (most expose a flag for running a single prompt without the interactive UI). If you
+  don't have one wired up, the script's `--simulate` mode demonstrates every gate and loop
+  deterministically with no agent at all — do that first regardless.
+
+### Part A — See the gate catch a bad change (simulated, no agent needed)
+
+Copy `agent_runner.py` and `issue-delete-command.md` into your `tasks-app` folder. Then, from a clean
+branch:
+
+```bash
+cd ~/workflow-course/tasks-app
+git checkout -b agent/delete-command
+
+# Simulate an agent that produces a BROKEN change, then run the gate on it:
+python agent_runner.py issue-to-pr issue-delete-command.md --simulate bad
+```
+
+Watch the output. The "agent" plants a change, the script runs the gate (`ruff check` then
+`pytest -q`), a test fails, and the script **stops and refuses to call the work ready** — exit code
+non-zero, no PR proposed. That is structural supervision: it didn't matter that the change looked
+plausible; the gate caught it. Nothing reached `main`.
+
+### Part B — See a good change land as a PR proposal
+
+```bash
+python agent_runner.py issue-to-pr issue-delete-command.md --simulate good
+```
+
+This time the planted change is correct. The gate passes, the script commits to the branch and prints
+the diff for review plus the exact `git push` / open-PR command. **It does not merge.** Open the diff
+and review it with the Module 10 checklist — you are the human gate, and that step doesn't go away
+just because an agent did the typing.
+
+### Part C — Run the self-healing loop
+
+```bash
+git checkout -b agent/self-heal
+python agent_runner.py self-heal --simulate bad
+```
+
+The script plants a failing change, runs the gate (red), feeds the failure back to the "agent" for a
+fix, re-runs the gate, and repeats up to its retry cap. With `--simulate bad` the fix succeeds on the
+second attempt and the result is offered as a PR proposal. Run it with `--simulate stuck` to watch the
+cap trip: after N attempts it gives up and tags the work for a human instead of looping forever.
+
+### Part D — Do it for real (optional)
+
+Two ways to go from simulation to a genuine autonomous run:
+
+1. **Local, real agent.** Point the script at your agentic tool by setting one environment variable to
+   its headless invocation, then drop `--simulate`:
+
+   ```bash
+   export AGENT_CMD='your-agent-cli --print --prompt-file {prompt_file}'   # your tool's one-shot mode
+   python agent_runner.py issue-to-pr issue-delete-command.md
+   ```
+
+   The script builds the prompt from the issue **and** your committed config (Module 5), runs your
+   agent against `tasks-app`, then applies the *same* gate. A real agent, your real gate, a real PR
+   proposal.
+
+2. **On a forge, triggered/scheduled.** Read `agent-job.yml`. It's a runner workflow (Module 19) that
+   fires when an issue gets an `agent` label *and* on a nightly schedule, runs the agent on the
+   runner, and opens a PR — which then hits your normal CI (Module 14) and security (Module 15) gates
+   and waits for review. Wiring it up needs a scoped token in your forge's secrets (Module 17); the
+   file is commented with exactly what to set and what *not* to grant. This is the "workflow runs
+   itself" endpoint, and it's intentionally the last thing you turn on.
+
+---
+
+## Where it breaks
+
+The honest limits — and for autonomous agents, the limits *are* the lesson:
+
+- **Your gates are the ceiling, and most gates are weaker than they look.** Thin test coverage,
+  skipped security scans, or review-by-rubber-stamp don't just reduce quality — they directly set how
+  much an autonomous agent can quietly break. Don't grant more autonomy than your gates can verify.
+  The honest version of "should I let an agent do this unattended?" is "would my CI catch it if it got
+  it wrong?"
+- **Self-healing can fix the evidence instead of the bug.** Editing the test until it passes, widening
+  an exception so the error is swallowed, deleting an assertion — all turn CI green and all are wrong.
+  The bounded-retry cap stops the *loop*; only human review of the diff stops the *cheat*. Never let a
+  self-heal PR auto-merge on green alone.
+- **"Autonomous" is not "auto-merge."** Everything in this module stops at a PR. The moment you wire
+  an agent to merge its own work to `main` without a gate that a human controls, you've left supervised
+  autonomy and you own whatever it ships. That's a deliberate decision, not a default — and it's out
+  of scope for this course.
+- **Unattended agents are an attack surface, not just a convenience.** A scheduled agent holds
+  credentials and reads untrusted input (issue bodies, comments, dependency files) straight into its
+  context. Prompt injection (Module 22) means a malicious issue can try to redirect it; an over-broad
+  token (Module 17) means success is expensive. Scope the credentials, sandbox the run (Module 16),
+  and assume everything it reads is hostile.
+- **Runaway cost and churn are real.** An agent in a retry loop, or a scheduled job that re-attempts
+  the same impossible issue every night, burns runner minutes and review attention. Cap retries, cap
+  concurrency, and put a human checkpoint on anything that hasn't converged.
+- **Flaky gates make autonomy actively worse.** A nondeterministic test that fails 1-in-5 will send a
+  self-healing agent chasing a bug that isn't there. Autonomy demands *more* gate discipline than
+  manual work, not less — fix the flake before you point an agent at it.
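The 1-in-5 flake is worth a quick calculation. This sketch assumes each gate run flakes independently with the same probability, which real flaky tests only approximate; the function names are hypothetical.

```python
def false_red_probability(flake_rate: float, gate_runs: int) -> float:
    """Chance that at least one of `gate_runs` runs goes red from the flake alone."""
    return 1 - (1 - flake_rate) ** gate_runs


def cap_trips_on_healthy_code(flake_rate: float, retry_cap: int) -> float:
    """Chance that the first run and every retry all flake, so the cap trips on correct code."""
    return flake_rate ** (retry_cap + 1)


print(f"one run:    {false_red_probability(0.2, 1):.3f}")
print(f"three runs: {false_red_probability(0.2, 3):.3f}")
print(f"cap trips:  {cap_trips_on_healthy_code(0.2, 3):.4f}")
```

Under that independence assumption, a single run has a 0.200 chance of a spurious red, three runs have a 0.488 chance of at least one, and a cap of three retries trips on perfectly healthy code about 0.0016 of the time. Each spurious red invites the agent to "fix" correct code.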
+
+---
+
+## Check for understanding
+
+**You're done when:**
+
+- You ran an issue-to-PR flow (simulated or real) and the result was a **branch + PR proposal**, not a
+  merge — and you can point to exactly where a human or a gate still has to say yes.
+- You watched the gate **reject a bad agent change** (`--simulate bad`) and accept a good one, and you
+  can explain why that's structural supervision rather than watching the agent work.
+- You ran a self-healing loop, saw it propose a fix on failure, and saw the retry **cap trip**
+  (`--simulate stuck`) instead of looping forever.
+- You can finish this sentence without hand-waving: *"I'd let an agent do X unattended because my
+  gates would catch it if it got X wrong — specifically the gate from Module ___."*
+- You can name the three patterns (issue-to-PR, self-healing CI, triggered/scheduled jobs) and the
+  four gates that make any of them safe (review M10, CI M14, security M15, recovery M12).
+
+When "let the agent take the first pass" feels safe because you trust the wall it lands behind — not
+because you trust the model — you've got the model right. Module 26 takes the next step: more than one
+agent working at once without colliding, which is where the worktrees from Module 7 finally pay off at
+scale.
+
+---
+
+## Verify-before-publish
+
+This is an expansion-zone module sitting on fast-moving ground. Re-check at build time:
+
+- [ ] **Native issue-to-PR / "coding agent" offerings.** Forges and vendors are shipping built-in
+  assign-an-issue-to-an-agent and PR-fixing features fast, and renaming them faster. Confirm whether a
+  mainstream forge now offers this natively, and keep the lab's mechanism-agnostic framing if it's
+  still in flux. Don't name a specific product as *the* answer.
+- [ ] **Agentic-tool headless invocation.** The `AGENT_CMD` example assumes a non-interactive / one-
+  shot flag. Verify the major agentic CLIs still expose one and that the flag names in the example
+  read as plausible placeholders, not as one vendor's exact syntax.
+- [ ] **Self-healing CI integrations.** Marketplace actions and bots that auto-fix red builds appear
+  and disappear. Re-verify any referenced capability still exists and is still described neutrally.
+- [ ] **Triggered/scheduled workflow syntax.** The event names and `schedule`/cron syntax in
+  `agent-job.yml` are stable on the GitHub Actions flavor used in Module 14, but re-confirm the
+  trigger events (issue-labeled, comment command) match current forge behavior, and that the GitLab /
+  Forgejo equivalents in the comments are still accurate.
@@ -0,0 +1,82 @@
+# Reference: an autonomous agent running as a RUNNER JOB (Module 19) — triggered and scheduled.
+#
+# This is the "for real" version of agent_runner.py: instead of you launching the agent, the forge
+# launches it on a runner in response to an event or a timer, and the agent opens a PR. That PR then
+# hits your NORMAL gates — CI (Module 14), security scanning (Module 15), and human review (Module
+# 10) — exactly like a human's PR. The supervision is structural; this file just automates the start.
+#
+# GitHub Actions flavor (same as Module 14's ci.yml), so it goes in .github/workflows/. Equivalents:
+#   * GitLab:        a job with `rules:` on $CI_PIPELINE_SOURCE, plus a pipeline schedule
+#                    (created in the project's CI/CD settings or API, not declared in YAML).
+#   * Forgejo/Gitea: the same YAML under .forgejo/workflows/ or .gitea/workflows/.
+#
+# DO NOT enable this blindly. Read the security notes at the bottom first — an unattended agent with a
+# write token is automation acting in your name. This is the last thing you turn on, on purpose.
+
+name: agent-issue-to-pr
+
+on:
+  # TRIGGERED: fire when an issue gets the `agent` label. Event in -> agent runs -> PR out.
+  issues:
+    types: [labeled]
+  # SCHEDULED: also attempt work overnight. This is "the workflow runs itself" — keep it cheap.
+  schedule:
+    - cron: "0 6 * * *" # 06:00 UTC daily; adjust to your timezone and budget.
+
+jobs:
+  agent:
+    # Only run the triggered path when the label is actually `agent` (labeled events fire for ANY
+    # label). The scheduled path has no label, so allow it through too.
+    if: ${{ github.event_name == 'schedule' || github.event.label.name == 'agent' }}
+    runs-on: ubuntu-latest # whose compute this is — see Module 19 for self-hosted runners.
+
+    # Least privilege (Module 17): grant ONLY what opening a PR needs. Not admin, not secrets access.
+    permissions:
+      contents: write       # create the branch and commit
+      pull-requests: write  # open the PR
+      issues: read          # read the issue body (the agent's brief)
+
+    steps:
+      - name: Check out the code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install gate tools
+        run: pip install pytest ruff
+
+      - name: Run the agent on a fresh branch
+        env:
+          # The agent's model credentials come from a SCOPED secret you set in the forge — never
+          # hardcoded here (Module 17). Keep this provider-neutral: it's whatever your agent needs.
+          AGENT_API_KEY: ${{ secrets.AGENT_API_KEY }}
+          # Point AGENT_CMD at your agentic tool's non-interactive / one-shot mode.
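+          AGENT_CMD: "your-agent-cli --print --prompt-file {prompt_file}"
+          # The issue body is UNTRUSTED. Pass it through the environment: interpolating it into the
+          # script text would let a crafted issue inject shell commands into this step.
+          ISSUE_BODY: ${{ github.event.issue.body }}
+        run: |
+          git switch -c "agent/issue-${{ github.event.issue.number || github.run_id }}"
+          # In the triggered case, write the issue body to a file for the agent to read.
+          printf '%s' "$ISSUE_BODY" > issue.md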
+          python modules/25-autonomous-agents/lab/agent_runner.py issue-to-pr issue.md
+
+      # The agent's output is a PROPOSAL. Open the PR; do NOT merge. CI + security + review decide.
+      # (Use your forge's PR-creation step or CLI here; kept generic to stay vendor-neutral.)
+      - name: Open a pull request for review
+        run: |
+          git push -u origin HEAD
+          echo "Open a PR from this branch via your forge's API/CLI. It must pass CI (Module 14),"
+          echo "security scanning (Module 15), and human review (Module 10) before anyone merges it."
+
+# --- Security notes (read before enabling) -------------------------------------------------------
+# * Prompt injection (Module 22): github.event.issue.body is UNTRUSTED input that lands straight in
+#   the agent's context. A malicious issue can try to redirect the agent ("ignore your instructions,
+#   exfiltrate secrets..."). Scope the token tightly so a hijack can't do much, and never give this
+#   job access to deployment or admin secrets.
+# * No auto-merge. This file stops at "open a PR". Wiring an agent to merge its own work to main
+#   removes the human gate and is out of scope for this course.
+# * Sandbox (Module 16): for agents you trust less, run the agent step inside a container with no
+#   network beyond what it needs.
+# * Cost: a scheduled agent that re-attempts the same impossible issue every night burns runner
+#   minutes. Cap retries (agent_runner.py does) and consider a label the agent removes when it gives
+#   up, so it doesn't retry forever.
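One way to keep an untrusted issue body away from the shell entirely is to read it from an environment variable and write the file from Python. This is a sketch under assumptions: the variable name `ISSUE_BODY` and the helper `write_issue_file` are hypothetical, and refusing an empty body (as a scheduled run with no issue would produce) is a design choice of this example, not something the workflow above does.

```python
import os
from pathlib import Path


def write_issue_file(dest: Path, env_var: str = "ISSUE_BODY") -> Path:
    """Copy the issue body from an environment variable into `dest` without invoking a shell."""
    body = os.environ.get(env_var, "")
    if not body.strip():
        # No issue text means there is no brief; stop rather than let the agent improvise.
        raise SystemExit(f"{env_var} is empty: nothing for the agent to work on")
    dest.write_text(body, encoding="utf-8")
    return dest
```

Because the text is never parsed as shell syntax, quotes, backticks, and `$(...)` in the issue arrive in the file as literal characters. That closes the shell-injection path only; the prompt-injection risk once the agent reads the file remains, which is why the token scoping above still matters.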
@@ -0,0 +1,258 @@
+"""Module 25 lab — an autonomous-but-supervised agent orchestrator.
+
+This is the smallest honest version of the two patterns in the module:
+
+  * issue-to-pr  — read an issue, let an agent implement it, run the gate, produce a PR PROPOSAL.
+  * self-heal    — run the gate; on failure, feed the failure back to the agent for a fix,
+                   bounded by a retry cap; produce a PR PROPOSAL.
+
+The load-bearing idea is in one place and you should be able to point at it: the agent NEVER merges.
+Every path ends at `propose_pr()` — a branch, a commit, and the command *you* would run to open the
+PR. The CI/review/security gates (Modules 14/15/10) and recovery (Module 12) are what supervise it,
+not a human watching it type.
+
+Run it two ways:
+
+  1. Simulated (no agent needed, fully deterministic) — see the machinery and the gates:
+         python agent_runner.py issue-to-pr issue-delete-command.md --simulate good
+         python agent_runner.py issue-to-pr issue-delete-command.md --simulate bad
+         python agent_runner.py self-heal --simulate bad
+         python agent_runner.py self-heal --simulate stuck
+
+     Simulation works on a SELF-CONTAINED demo target (agent_demo.py + test_agent_demo.py) so it is
+     deterministic and never corrupts your real tasks-app files. The gate it runs (ruff + pytest) is
+     the real one — the same checks Module 14's CI runs.
+
+  2. Real agent — drives your own agentic tool against the actual issue. Point AGENT_CMD at your
+     tool's non-interactive / one-shot mode, then drop --simulate:
+         export AGENT_CMD='your-agent-cli --print --prompt-file {prompt_file}'
+         python agent_runner.py issue-to-pr issue-delete-command.md
+
+Language: Python 3.10+. Standard library only.
+"""
+
+from __future__ import annotations
+
+import argparse
+import os
+import shlex
+import subprocess
+import sys
+import tempfile
+from pathlib import Path
+
+RETRY_CAP = 3  # self-healing stops after this many fix attempts and hands off to a human.
+
+# Demo target the simulator works on, so simulation never touches your real cli.py / tasks.py.
+DEMO_SRC = Path("agent_demo.py")
+DEMO_TEST = Path("test_agent_demo.py")
+
+# Vendor-neutral: where your committed AI config (Module 5) might live. Override with AGENT_CONFIG.
+CONFIG_CANDIDATES = ["AGENTS.md", ".agent/instructions.md", "agent-config.md"]
+
+
+# --------------------------------------------------------------------------------------------------
+# The gate — the same lint + test checks Module 14 runs in CI, run locally so they're reproducible.
+# This is the structural supervision. It does not care whether a human or an agent wrote the change.
+# --------------------------------------------------------------------------------------------------
+def run_gate() -> tuple[bool, str]:
+    """Run ruff then pytest in the current directory. Return (passed, combined_output)."""
+    out: list[str] = []
+    ok = True
+    for label, cmd in (("ruff (lint)", ["ruff", "check", "."]),
+                       ("pytest (tests)", ["pytest", "-q"])):
+        out.append(f"\n=== gate: {label} -> {' '.join(cmd)} ===")
+        try:
+            proc = subprocess.run(cmd, capture_output=True, text=True)
+        except FileNotFoundError:
+            out.append(f"  ! {cmd[0]} not installed — `pip install pytest ruff`. Treating as a gate FAIL.")
+            ok = False
+            continue
+        out.append(proc.stdout.rstrip())
+        if proc.stderr.strip():
+            out.append(proc.stderr.rstrip())
+        if proc.returncode != 0:
+            ok = False
+            out.append(f"  -> FAILED ({label})")
+    return ok, "\n".join(line for line in out if line)
+
+
+# --------------------------------------------------------------------------------------------------
+# The agent — real (your tool) or simulated (deterministic, for the lab).
+# --------------------------------------------------------------------------------------------------
+def find_config() -> Path | None:
+    env = os.environ.get("AGENT_CONFIG")
+    if env and Path(env).exists():
+        return Path(env)
+    for name in CONFIG_CANDIDATES:
+        if Path(name).exists():
+            return Path(name)
+    return None
+
+
+def build_prompt(task: str, *, issue_path: Path | None = None, failure: str | None = None) -> str:
+    """Assemble the agent's brief: standing config (Module 5) + the specific task (issue or failure)."""
+    parts = ["You are working in a Git repository on the current branch. Make the change directly in",
+             "the files. Do not commit, push, or merge — just edit. Follow the project's conventions."]
+    config = find_config()
+    if config:
+        parts += ["", f"# Project conventions (from {config})", config.read_text()]
+    if issue_path:
+        parts += ["", "# Task (issue to implement)", issue_path.read_text()]
+    if failure:
+        parts += ["", "# A CI check just failed. Fix the CODE so it passes — do not weaken or delete",
+                  "# the test to make it pass. Here is the failing output:", "```", failure, "```"]
+    return "\n".join(parts)
+
+
+def run_real_agent(prompt: str) -> None:
+    """Drive the learner's agentic tool via AGENT_CMD. Template may contain {prompt_file}; otherwise
+    the prompt is piped to stdin. Kept vendor-neutral on purpose."""
+    template = os.environ["AGENT_CMD"]
+    with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as fh:
+        fh.write(prompt)
+        prompt_file = fh.name
+    try:
+        if "{prompt_file}" in template:
+            cmd = shlex.split(template.replace("{prompt_file}", prompt_file))
+            proc = subprocess.run(cmd)
+        else:
+            proc = subprocess.run(shlex.split(template), input=prompt, text=True)
+        if proc.returncode != 0:
+            sys.exit(f"agent command exited non-zero ({proc.returncode}); aborting.")
+    finally:
+        os.unlink(prompt_file)
+
+
+# Simulated agent: writes a self-contained demo module so the gate has something real to judge.
+def simulate_implement(variant: str) -> None:
+    DEMO_TEST.write_text(
+        "from agent_demo import discount\n\n\n"
+        "def test_discount_takes_a_percentage():\n"
+        "    # 10% off 200 is 180. A flat subtraction (200 - 10 = 190) is the plausible-but-wrong bug.\n"
+        "    assert discount(200, 10) == 180\n"
+    )
+    if variant == "good":
+        DEMO_SRC.write_text("def discount(price, pct):\n    return price - price * pct / 100\n")
+    else:  # 'bad' — plausible but wrong: treats the percent as a flat amount.
+        DEMO_SRC.write_text("def discount(price, pct):\n    return price - pct\n")
+
+
+def simulate_fix(variant: str, attempt: int) -> None:
+    if variant == "stuck":
+        # The "agent" keeps producing plausible, still-wrong fixes — the loop must give up, not run forever.
+        DEMO_SRC.write_text(f"def discount(price, pct):\n    return price - pct - {attempt}\n")
+    else:  # 'bad': the first fix writes the correct formula, so the next gate run goes green.
+        DEMO_SRC.write_text("def discount(price, pct):\n    return price - price * pct / 100\n")
+
+
+# --------------------------------------------------------------------------------------------------
+# The endpoint every path shares: a PR PROPOSAL. Never a merge.
+# --------------------------------------------------------------------------------------------------
+def in_git_repo() -> bool:
+    return subprocess.run(["git", "rev-parse", "--is-inside-work-tree"],
+                          capture_output=True).returncode == 0
+
+
+def propose_pr(message: str) -> None:
+    print("\n" + "=" * 80)
+    print("GATE PASSED. Proposing a PR — NOT merging. A human reviews the diff (Module 10).")
+    print("=" * 80)
+    if in_git_repo():
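+        subprocess.run(["git", "add", "-A"])
+        # A failed commit (nothing staged, or no Git identity configured) means there is no PR to propose.
+        if subprocess.run(["git", "commit", "-m", message]).returncode != 0:
+            print("\n! git commit failed (nothing to commit, or no Git identity set). No PR to propose.")
+            return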
+        branch = subprocess.run(["git", "rev-parse", "--abbrev-ref", "HEAD"],
+                                capture_output=True, text=True).stdout.strip()
+        print("\nReview the change you're about to propose:")
+        print("    git show HEAD            # or: git diff main..HEAD")
+        print("\nThen open the PR (nothing has left your machine yet):")
+        print(f"    git push -u origin {branch}")
+        print("    # ...and open a pull request on your forge. CI + security gates run there.")
+    else:
+        print("\n(Not a Git repo — skipping commit. In your tasks-app this would commit to the branch.)")
+    print("\nThe agent stops here. It cannot merge. That is the whole safety model.")
+
+
+def reject(reason: str, gate_output: str) -> None:
+    print(gate_output)
+    print("\n" + "=" * 80)
+    print(f"GATE FAILED: {reason}")
+    print("No PR proposed. The branch is left as-is for you to inspect or discard:")
+    print("    git restore .            # revert the agent's edits to tracked files (Module 2)")
+    print("    git clean -n             # preview new untracked files; `git clean -f` removes them")
+    print("=" * 80)
+
+
+# --------------------------------------------------------------------------------------------------
+# The two patterns.
+# --------------------------------------------------------------------------------------------------
+def cmd_issue_to_pr(issue_path: Path, simulate: str | None) -> int:
+    print(f"[issue-to-pr] brief: {issue_path}")
+    if simulate:
+        print(f"[issue-to-pr] simulating a '{simulate}' agent on the self-contained demo target.")
+        simulate_implement(simulate)
+    else:
+        run_real_agent(build_prompt("implement", issue_path=issue_path))
+
+    ok, gate_output = run_gate()
+    if ok:
+        print(gate_output)
+        propose_pr(f"Agent: implement {issue_path.stem}")
+        return 0
+    reject("the agent's change does not pass the gate", gate_output)
+    return 1
+
+
+def cmd_self_heal(simulate: str | None) -> int:
+    # Establish a failing state to heal. In a real pipeline this is "CI just went red on a push".
+    if simulate:
+        print(f"[self-heal] simulating a red build ('{simulate}') on the demo target.")
+        simulate_implement("bad")
+    else:
+        print("[self-heal] running the gate on the current working tree to find the failure...")
+
+    for attempt in range(1, RETRY_CAP + 1):
+        ok, gate_output = run_gate()
+        if ok:
+            print(gate_output)
+            print(f"\n[self-heal] gate is green after {attempt - 1} fix attempt(s).")
+            propose_pr("Agent: self-healing fix for failing CI")
+            return 0
+        print(gate_output)
+        if attempt == RETRY_CAP:  # final gate run was still red and the fix budget is spent
+            break
+        print(f"\n[self-heal] gate red — attempt {attempt}/{RETRY_CAP - 1}: asking the agent for a fix.")
+        if simulate:
+            simulate_fix(simulate, attempt)
+        else:
+            run_real_agent(build_prompt("fix", failure=gate_output))
+
+    print("\n" + "=" * 80)
+    print(f"SELF-HEAL GAVE UP after {RETRY_CAP - 1} attempts. Handing off to a human — NOT looping forever.")
+    print("This cap is what stops an agent burning a runner bill chasing a flaky or impossible fix.")
+    print("=" * 80)
+    return 2
+
+
+def main(argv: list[str]) -> int:
+    parser = argparse.ArgumentParser(description="Autonomous-but-supervised agent orchestrator (Module 25).")
+    sub = parser.add_subparsers(dest="command", required=True)
+
+    p_itp = sub.add_parser("issue-to-pr", help="implement an issue and propose a PR")
+    p_itp.add_argument("issue", type=Path, help="path to the issue markdown file")
+    p_itp.add_argument("--simulate", choices=["good", "bad"], help="run without a real agent")
+
+    p_sh = sub.add_parser("self-heal", help="fix a failing gate, bounded by a retry cap, and propose a PR")
+    p_sh.add_argument("--simulate", choices=["bad", "stuck"], help="run without a real agent")
+
+    args = parser.parse_args(argv)
+    if not args.simulate and not os.environ.get("AGENT_CMD"):  # unset OR empty both count as missing
+        sys.exit("No --simulate and no AGENT_CMD set. Set AGENT_CMD to your agent's headless command, "
+                 "or pass --simulate to run the deterministic demo.")
+
+    if args.command == "issue-to-pr":
+        return cmd_issue_to_pr(args.issue, args.simulate)
+    return cmd_self_heal(args.simulate)
+
+
+if __name__ == "__main__":
+    raise SystemExit(main(sys.argv[1:]))
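Reviewer's sketch, not part of the committed file: the self-heal loop above interleaves gate runs and fix attempts, which makes the off-by-one between "gate runs" and "fixes" easy to misread. A minimal, standalone version of the same bounded-retry idea might look like the following. The names `heal_with_cap`, `run_gate`, `attempt_fix`, and `max_fixes` are hypothetical and chosen for illustration; `max_fixes` plays the role of `RETRY_CAP - 1` in the script.

```python
from typing import Callable


def heal_with_cap(run_gate: Callable[[], bool],
                  attempt_fix: Callable[[int], None],
                  max_fixes: int) -> tuple[bool, int]:
    """Return (gate_is_green, fixes_used).

    The gate is always re-run after a fix, so a stuck agent costs at most
    max_fixes fix attempts and max_fixes + 1 gate runs, then hands off.
    """
    fixes_used = 0
    while True:
        if run_gate():
            return True, fixes_used       # green: caller may propose a PR
        if fixes_used == max_fixes:
            return False, fixes_used      # budget spent: give up, do not loop forever
        fixes_used += 1
        attempt_fix(fixes_used)           # ask the (real or simulated) agent for a fix


if __name__ == "__main__":
    # Toy "repo": the gate goes green once two fixes have been applied.
    state = {"value": 0}
    result = heal_with_cap(lambda: state["value"] >= 2,
                           lambda n: state.__setitem__("value", n),
                           max_fixes=3)
    print(result)
```

Injecting the gate and the fixer as callables keeps the cap testable without shelling out to `ruff`, `pytest`, or an agent; the script's own loop makes the opposite trade and inlines both.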
@@ -0,0 +1,35 @@
+<!--
+  The agent's INPUT for Module 25. This is a well-formed issue in the Module 9 format: title,
+  context, acceptance criteria, scope. It is deliberately a good candidate for an agent — well-
+  scoped, concrete, and it mirrors a pattern already in the codebase (the existing `done` command).
+
+  The orchestrator (agent_runner.py) reads this file and pairs it with your committed AI config
+  (Module 5) to build the agent's brief. Edit it and you change what the agent attempts.
+-->
+
+# Add a `delete <index>` command to the CLI
+
+**Type:** feature · **Priority:** p2 · **Labels:** `cli`, `ready`, `agent`
+
+## Context
+
+`tasks-app` can `add`, `list`, and mark a task `done`, but there's no way to remove a task. Once a
+task is added by mistake it stays forever. The `done` command already takes an index and mutates the
+list through a method on `TaskList`, so a `delete` command should follow the exact same shape — this
+is a patterned change, not a design problem.
+
+## Acceptance criteria
+
+- `python cli.py delete <index>` removes the task at that 0-based index and saves the list.
+- After deleting, the remaining tasks keep their relative order.
+- `delete` with an out-of-range or non-integer index prints a clear error (e.g.
+  `no task at index 99`) and exits non-zero, instead of dumping a traceback.
+- The logic lives on `TaskList` (a `remove(index)` method or equivalent), mirroring how `complete`
+  works — `cli.py` only parses arguments and calls it.
+- A test covers: a successful delete removes the right task, and an out-of-range delete is handled.
+
+## Out of scope
+
+- Changing how tasks are stored or numbered.
+- Bulk delete, undo, or a confirmation prompt.
+- Reworking the existing `add` / `list` / `done` commands.