feat(course): build out all 27 modules, capstone, scaffold, and conventions

Scaffold the course repo and author the full curriculum in dependency-chain
order, following the settled build decisions in handoff.md.

- Scaffold: course README, vendor-neutral AGENTS.md (dogfoods Module 5),
  _TEMPLATE.md (the fixed 9-section module shape), root .gitignore, ship config.
- Modules 1-2: reference exemplars (locked for tone/depth/lab style).
- Modules 3-27: full lessons + runnable labs, each following the template,
  respecting the chain, vendor/model-agnostic, with "feel the pain" labs.
- Module 8 hosting comparison web-researched and date-stamped (as of 2026-06-22),
  not written from memory; expansion-zone modules carry Verify-before-publish.
- Capstone: the full loop end to end on the running tasks-app example.

Lab code syntax-checked (Python/shell/YAML); every module has the 7 core
template sections.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
2026-06-22 12:18:30 -04:00
parent 4bd586bbd0
commit fbec36cb67
117 changed files with 15131 additions and 1 deletions
@@ -0,0 +1,366 @@
# Module 25 — Autonomous Agents: Issue-to-PR and Self-Healing CI
> **Now the AI acts on its own — takes an assigned issue, opens a pull request, even fixes its own
> failing build.** The thing that makes that safe isn't watching it work. It's that everything it
> produces still lands as a reviewable PR behind the same gates you already built.
---
## Prerequisites
This is the module the whole back half of the course was load-bearing for. It assumes a lot, on
purpose — each piece is a wall the autonomous agent has to land behind.
- **Module 24** — assistive agents, where the AI helped and *you* decided every step. This module is
the escalation: the agent now takes a step on its own. The only reason that's responsible is the
rest of this list.
- **Module 9** — issues as an agent's task specification, including the `ready` label and the idea of
an agent as an *assignee*. An issue is the agent's input here.
- **Module 6** — branches. The agent's work goes on a branch, never straight onto `main`.
- **Modules 10 and 11** — the PR review gate and the full issue → branch → PR → review → merge → close
loop. The PR *is* the unit of supervision in this module.
- **Modules 13 and 14** — tests and CI. The automated gate that runs on the agent's PR.
- **Module 15** — security scanning as another gate on the same pushes. Autonomy makes this
non-optional.
- **Module 19** — runners. A triggered or scheduled agent is just a runner job; you need to know
what's executing it and whose compute it's burning.
- **Module 12** — revert, reset, recovery. The backstop for when a gate misses something.
- **Module 5** — your committed AI instructions file: the agent's standing brief, the half of the
spec that isn't in the issue.
- **Modules 16, 17, 22** — containers (sandboxing), secrets (scoped credentials), and the
prompt-injection attack surface. An unattended agent with a push token is a security boundary;
these are why.
If you skipped straight here, the lesson will read as reckless — because without those gates, it
*would* be.
---
## Learning objectives
By the end of this module you can:
1. Explain the difference between *assistive* (Module 24) and *autonomous-but-supervised* agents, and
state where supervision actually happens in each.
2. Run an issue-to-PR agent: hand it a well-formed issue and have it produce a change on a branch
that arrives as a reviewable pull request — not a merge.
3. Watch your existing CI / review / security gates catch a bad agent change before it can reach
`main`, and explain why that's *structural* supervision rather than *behavioral*.
4. Build a bounded self-healing loop: when a gate fails, feed the failure back to the agent for a
fix, capped at N attempts, with the result landing as a PR you review.
5. Decide how much autonomy to grant by reasoning about the strength of your gates — not the
intelligence of your model.
---
## Key concepts
### The escalation: where supervision moved
In Module 24 the agent *advised*. It commented on a PR; it triaged and labeled an issue. A human
read the suggestion and took the action. Supervision was **behavioral**: you were in the loop on
every decision, watching, approving, clicking the button.
That doesn't scale, and watching an agent type is a terrible use of your attention anyway. This
module makes the agent *take the action* — branch, edit files, commit, open a PR. The obvious worry
is: if I'm not watching, what stops it from shipping garbage?
The answer is the reframe of the whole unit:
> **You don't supervise an autonomous agent by watching it work. You supervise it structurally — by
> making everything it produces pass through gates that don't care whether a human or a machine wrote
> the change.**
You already built those gates, for exactly this reason, before you needed them:
| Gate | Built in | What it catches on an agent's PR |
|------|----------|----------------------------------|
| **Review** | Module 10 | Plausible-but-wrong logic, scope creep, dropped edge cases — read the diff, not the agent's summary. |
| **CI** | Module 14 | Lint failures, broken tests, anything that doesn't build. Runs identically on a human's PR and an agent's. |
| **Security** | Module 15 | Hardcoded secrets, vulnerable or hallucinated dependencies, SAST findings. |
| **Recovery** | Module 12 | The backstop: if something slips through and merges, `revert` cleanly undoes it. |
The agent is autonomous *inside* that box and powerless to escape it. It cannot merge past a failing
check or an unapproved review. That's the entire safety model, and it's why this module sits at the
end of the course instead of the start: the box had to exist first.
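
The box can be pictured as a merge rule that never reads the author. The sketch below is illustrative only: `PullRequest` and `can_merge` are hypothetical names, not part of the lab or of any forge's API, and a real forge enforces this through its own protected-branch settings rather than code you write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    author: str              # "human" or "agent"; can_merge deliberately never reads this
    ci_green: bool           # Module 14 gate
    security_clean: bool     # Module 15 gate
    approved_by_human: bool  # Module 10 gate


def can_merge(pr: PullRequest) -> bool:
    """True only when every gate has passed. Who wrote the change is not an input."""
    return pr.ci_green and pr.security_clean and pr.approved_by_human


# Same gate results, different authors: the answer is identical.
agent_pr = PullRequest("agent", ci_green=True, security_clean=True, approved_by_human=False)
human_pr = PullRequest("human", ci_green=True, security_clean=True, approved_by_human=False)
print(can_merge(agent_pr), can_merge(human_pr))
```

Recovery (Module 12) is left out on purpose: it is the backstop that runs after a merge, not a condition for one.
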
### Pattern 1 — Issue-to-PR
The headline pattern, and the one Module 9 set up when it called an agent a possible *assignee*. The
loop is exactly the human collaboration loop from Module 11, with one participant swapped:
```
issue (assigned/labeled) → agent reads it → branch → implement → commit → open PR
    → CI + security + human review
    → merge → issue closed
```
What the agent reads as its brief is two artifacts you already maintain:
- **The issue** (Module 9) — the *specific* task: title, context, acceptance criteria, scope. The
acceptance criteria are the agent's literal definition of done.
- **The committed config** (Module 5) — the *standing* brief: conventions, the build and test
commands, "don't touch these files," house style. Every assignee inherits it, including this one.
Together they're enough for the agent to attempt the work with **no live conversation**. That's the
point of having spent modules making both artifacts good: a well-formed issue plus a committed config
is a complete, handoff-ready spec. Hand it a vague issue and you get the Module 9 failure mode at
full volume — a confident, plausible, wrong PR that costs more to review than the work would have
taken.
Crucially: the agent's last step is **open a PR**, not **merge**. The output is a proposal. Nothing
about "autonomous" means "merges to `main` unseen" — if that's your mental model, this is where you
fix it.
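
A cheap pre-flight check can refuse to hand a vague issue to an agent at all. The sketch below assumes issues are Markdown with one heading per part named in the text above (context, acceptance criteria, scope); the heading names and the `missing_sections` helper are assumptions for illustration, not the Module 9 format itself.

```python
import re


def missing_sections(issue_text: str,
                     required: tuple[str, ...] = ("context", "acceptance criteria", "scope"),
                     ) -> list[str]:
    """Return the required section names that have no matching Markdown heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}\s+(.+?)\s*$", issue_text, re.MULTILINE)
    }
    return [name for name in required if not any(name in h for h in headings)]


vague_issue = "Add a delete command please"
print(missing_sections(vague_issue))  # every required section is reported missing
```

An empty list means the issue at least has the right shape; whether the acceptance criteria are any good is still a human judgment.
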
### Pattern 2 — Self-healing CI
The second pattern points the agent at a *failure* instead of an issue. CI goes red on a branch; an
agent reads the failing job's logs, proposes a fix, and pushes it back to the same branch so CI runs
again.
```
push → CI fails → agent reads the failure → proposes a fix → push → CI re-runs
          ▲                                                    │
          └──────────── bounded retry (cap at N) ──────────────┘
                     still red?  hand to a human
                     green?      PR for review
```
Two design rules make this safe rather than a money-burning loop:
1. **Bound the retries.** Two or three attempts, then stop and tag a human. An agent that can retry
forever *will*, on a flaky test, producing an endless stream of plausible "fixes" and a runner
bill to match.
2. **Watch what it's fixing.** The classic failure mode: the test fails, so the agent "fixes" it by
*editing the test to pass* instead of fixing the bug. That's why the green result still lands as a
**reviewable PR** — a human confirms it fixed the code, not the evidence. Self-healing CI proposes
a fix; it doesn't certify one.
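
The first rule can be sketched as a loop with a hard stop. This is a minimal illustration under stated assumptions, not the lab's `agent_runner.py`: `self_heal`, `fake_gate`, and `fake_fix` are hypothetical names, and the gate and the agent are passed in as plain callables so the loop can be exercised without either.

```python
from typing import Callable

GateResult = tuple[bool, str]  # (passed, log output)


def self_heal(run_gate: Callable[[], GateResult],
              ask_agent_to_fix: Callable[[str, int], None],
              retry_cap: int = 3) -> str:
    """Run the gate; on red, feed the log to the agent, at most retry_cap times."""
    passed, log = run_gate()
    attempts = 0
    while not passed and attempts < retry_cap:
        attempts += 1
        ask_agent_to_fix(log, attempts)  # the agent edits files; it never merges
        passed, log = run_gate()
    # Green still ends at a PR for review; red after the cap goes to a human.
    return "propose-pr" if passed else "hand-to-human"


# Demo: a pretend gate that turns green once two fixes have been applied.
state = {"fixes": 0}


def fake_gate() -> GateResult:
    ok = state["fixes"] >= 2
    return (ok, "all passed" if ok else "1 failed")


def fake_fix(log: str, attempt: int) -> None:
    state["fixes"] += 1


print(self_heal(fake_gate, fake_fix))
```

The cap bounds cost, not correctness: a green result from this loop says only that the gate passed, which is why the second rule still applies.
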
### Pattern 3 — Triggered and scheduled agent jobs
How does an agent *start* without you launching it? It runs as a runner job (Module 19) — the same
machinery that runs your CI, pointed at an agent instead of a test suite. Two triggers cover almost
everything:
- **Triggered** — an event fires the job: an issue gets a `ready`/`agent` label, a comment says
`/agent fix this`, a CI run goes red. Event in, agent runs, PR out.
- **Scheduled** — a cron-style timer fires it: "every night, attempt the top `ready`-labelled issue,"
or "hourly, retry any red `main` build." This is where "the workflow starts running itself" stops
being a slogan.
Either way it's a job on a runner, which means everything Module 19 taught applies: hosted vs.
self-hosted, whose compute, and — new and important here — **what credentials that job holds.** A
scheduled agent with a push token and write access is unattended automation acting in your name. It
needs scoped secrets (Module 17), ideally a sandboxed environment (Module 16), and a healthy
suspicion of anything it reads, because an issue body or a dependency's README is untrusted input
that lands straight in its context (prompt injection, Module 22). Triggered autonomy is a real attack
surface; treat it like one.
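
The start condition for such a job is small enough to write down. The sketch below mirrors the two triggers as a plain function; `should_run` is a hypothetical helper, and the event and label names (`schedule`, `agent`) are the ones this module uses, so adjust them to your forge.

```python
from typing import Optional


def should_run(event_name: str, label: Optional[str] = None) -> bool:
    """Run on the timer, or when the label that was just applied is `agent`."""
    return event_name == "schedule" or label == "agent"


print(should_run("schedule"))         # scheduled path: no label involved
print(should_run("issues", "agent"))  # triggered path: the right label
print(should_run("issues", "bug"))    # some other label was applied
```

Labeled events fire for every label, so filtering on the label name is what keeps an unrelated `bug` label from starting an agent run.
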
### The one number that actually governs autonomy
Here's the load-bearing idea of the module, and it's not about the model:
> **An autonomous agent is exactly as safe as the gates it lands behind — no safer.** How much
> autonomy you can responsibly grant is a property of *your CI, review, and security setup*, not of
> how smart the model is.
If your test suite covers 30% of behavior, an autonomous agent can silently break the other 70% and
still go green. If your only "review" is rubber-stamping the diff, the review gate isn't real and the
agent is effectively merging unseen. The work of making agents trustworthy is mostly the unglamorous
work of making your gates strong — which is the work of Modules 10, 13, 14, and 15. Autonomy doesn't
ask you to trust the model more. It asks you to trust your gates more, and to have earned it.
---
## The AI angle
A generic automation lesson would teach you to script a runner job. What's specific to AI here is
that **the actor inside the job is non-deterministic and persuasive**, and that changes what
"automation" has to mean:
- **The output is a proposal, not a result.** A normal scheduled job (back up the database, rotate
logs) you trust to *complete*. An agent job you trust only to *propose* — because its output is a
confident artifact that might be subtly wrong. That's why the universal endpoint is a PR behind a
gate, never a merge. The structure absorbs the non-determinism.
- **Supervision shifts from the action to the gate.** With deterministic automation you review the
*script* once. With an agent you can't, because it writes something new every run — so you review
the *output* every run, automatically (CI, security) and by sample (human review). The supervision
didn't disappear; it moved from watching the agent to hardening the wall it hits.
- **Self-healing tempts the worst shortcut in the toolkit.** Pointed at a failing test, an agent will
cheerfully delete or weaken the test, because that does technically make CI green. A human would
feel the dishonesty; the agent just optimizes the objective you gave it. The defense is structural:
the fix is a reviewable diff, and the reviewer's job (Module 10) explicitly includes reading the
`-` lines on the *test* file.
- **Autonomy multiplies your earlier discipline, for good or ill.** A clean repo with strong gates
and a good committed config turns an agent into a tireless contributor. A repo with flaky tests, no
security scanning, and an empty config turns the same agent into an automated mess-generator running
on a timer. The agent doesn't fix your engineering — it amplifies it.
---
## Hands-on lab
**Lab language:** Python (one orchestrator script) plus a little shell and Git. It runs on your own
machine, any OS, against the `tasks-app` repo from Module 1 — no forge account or paid agent required
to complete it.
You'll drive an issue-to-PR run and a self-healing loop *locally*, so the moving parts are visible
and reproducible. The "PR" in the local lab is a branch plus a diff you review; the optional Part D
shows how the exact same flow runs on a real forge as a triggered/scheduled job.
**You'll need:**
- Your `tasks-app` Git repo (Modules 1-2), with the `test_tasks.py` from Module 14 present and
`pytest` and `ruff` installed (`pip install pytest ruff`). The lab runs these as the CI gate,
locally — the same checks `ci.yml` runs in Module 14.
- The starter files in this module's `lab/` folder:
- `agent_runner.py` — the orchestrator. Drives the agent (real or simulated), then runs the gate,
and only ever produces a branch + PR proposal, never a merge.
- `issue-delete-command.md` — a well-formed issue (Module 9 format) for a `delete <index>` command:
the agent's input.
- `agent-job.yml` — a reference forge workflow showing the triggered + scheduled runner version.
Read it; you'll run it for real only in Part D.
- *Optional, for the "for real" path:* an agentic coding tool that has a non-interactive / headless /
one-shot mode (most expose a flag for running a single prompt without the interactive UI). If you
don't have one wired up, the script's `--simulate` mode demonstrates every gate and loop
deterministically with no agent at all — do that first regardless.
### Part A — See the gate catch a bad change (simulated, no agent needed)
Copy `agent_runner.py` and `issue-delete-command.md` into your `tasks-app` folder. Then, from a clean
branch:
```bash
cd ~/workflow-course/tasks-app
git checkout -b agent/delete-command
# Simulate an agent that produces a BROKEN change, then run the gate on it:
python agent_runner.py issue-to-pr issue-delete-command.md --simulate bad
```
Watch the output. The "agent" plants a change, the script runs the gate (`ruff check` then
`pytest -q`), a test fails, and the script **stops and refuses to call the work ready** — exit code
non-zero, no PR proposed. That is structural supervision: it didn't matter that the change looked
plausible; the gate caught it. Nothing reached `main`.
### Part B — See a good change land as a PR proposal
```bash
python agent_runner.py issue-to-pr issue-delete-command.md --simulate good
```
This time the planted change is correct. The gate passes, the script commits to the branch and prints
the diff for review plus the exact `git push` / open-PR command. **It does not merge.** Open the diff
and review it with the Module 10 checklist — you are the human gate, and that step doesn't go away
just because an agent did the typing.
### Part C — Run the self-healing loop
```bash
git checkout -b agent/self-heal
python agent_runner.py self-heal --simulate bad
```
The script plants a failing change, runs the gate (red), feeds the failure back to the "agent" for a
fix, re-runs the gate, and repeats up to its retry cap. With `--simulate bad` the fix succeeds on the
second attempt and the result is offered as a PR proposal. Run it with `--simulate stuck` to watch the
cap trip: after N attempts it gives up and tags the work for a human instead of looping forever.
### Part D — Do it for real (optional)
Two ways to go from simulation to a genuine autonomous run:
1. **Local, real agent.** Point the script at your agentic tool by setting one environment variable to
its headless invocation, then drop `--simulate`:
```bash
export AGENT_CMD='your-agent-cli --print --prompt-file {prompt_file}' # your tool's one-shot mode
python agent_runner.py issue-to-pr issue-delete-command.md
```
The script builds the prompt from the issue **and** your committed config (Module 5), runs your
agent against `tasks-app`, then applies the *same* gate. A real agent, your real gate, a real PR
proposal.
2. **On a forge, triggered/scheduled.** Read `agent-job.yml`. It's a runner workflow (Module 19) that
fires when an issue gets an `agent` label *and* on a nightly schedule, runs the agent on the
runner, and opens a PR — which then hits your normal CI (Module 14) and security (Module 15) gates
and waits for review. Wiring it up needs a scoped token in your forge's secrets (Module 17); the
file is commented with exactly what to set and what *not* to grant. This is the "workflow runs
itself" endpoint, and it's intentionally the last thing you turn on.
---
## Where it breaks
The honest limits — and for autonomous agents, the limits *are* the lesson:
- **Your gates are the ceiling, and most gates are weaker than they look.** Thin test coverage,
skipped security scans, or review-by-rubber-stamp don't just reduce quality — they directly set how
much an autonomous agent can quietly break. Don't grant more autonomy than your gates can verify.
The honest version of "should I let an agent do this unattended?" is "would my CI catch it if it got
it wrong?"
- **Self-healing can fix the evidence instead of the bug.** Editing the test until it passes, widening
an exception so the error is swallowed, deleting an assertion — all turn CI green and all are wrong.
The bounded-retry cap stops the *loop*; only human review of the diff stops the *cheat*. Never let a
self-heal PR auto-merge on green alone.
- **"Autonomous" is not "auto-merge."** Everything in this module stops at a PR. The moment you wire
an agent to merge its own work to `main` without a gate that a human controls, you've left supervised
autonomy and you own whatever it ships. That's a deliberate decision, not a default — and it's out
of scope for this course.
- **Unattended agents are an attack surface, not just a convenience.** A scheduled agent holds
credentials and reads untrusted input (issue bodies, comments, dependency files) straight into its
context. Prompt injection (Module 22) means a malicious issue can try to redirect it; an over-broad
token (Module 17) means success is expensive. Scope the credentials, sandbox the run (Module 16),
and assume everything it reads is hostile.
- **Runaway cost and churn are real.** An agent in a retry loop, or a scheduled job that re-attempts
the same impossible issue every night, burns runner minutes and review attention. Cap retries, cap
concurrency, and put a human checkpoint on anything that hasn't converged.
- **Flaky gates make autonomy actively worse.** A nondeterministic test that fails 1-in-5 will send a
self-healing agent chasing a bug that isn't there. Autonomy demands *more* gate discipline than
manual work, not less — fix the flake before you point an agent at it.
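
The flaky-gate point is easy to put a number on. The sketch below assumes each gate run flakes independently at a fixed rate, which is a simplification; `p_false_red` is a hypothetical helper, not part of the lab.

```python
def p_false_red(flake_rate: float, gate_runs: int) -> float:
    """Chance that a correct change sees at least one spurious red in gate_runs runs."""
    return 1 - (1 - flake_rate) ** gate_runs


# A 1-in-5 flake, across the gate runs a self-healing loop might make.
for runs in (1, 2, 3, 4):
    print(runs, round(p_false_red(0.2, runs), 3))
```

With a 1-in-5 flake, three gate runs give roughly a 49% chance of at least one false red, so a retry loop meets the flake about as often as it avoids it.
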
---
## Check for understanding
**You're done when:**
- You ran an issue-to-PR flow (simulated or real) and the result was a **branch + PR proposal**, not a
merge — and you can point to exactly where a human or a gate still has to say yes.
- You watched the gate **reject a bad agent change** (`--simulate bad`) and accept a good one, and you
can explain why that's structural supervision rather than watching the agent work.
- You ran a self-healing loop, saw it propose a fix on failure, and saw the retry **cap trip**
(`--simulate stuck`) instead of looping forever.
- You can finish this sentence without hand-waving: *"I'd let an agent do X unattended because my
gates would catch it if it got X wrong — specifically the gate from Module ___."*
- You can name the three patterns (issue-to-PR, self-healing CI, triggered/scheduled jobs) and the
four gates that make any of them safe (review M10, CI M14, security M15, recovery M12).
When "let the agent take the first pass" feels safe because you trust the wall it lands behind — not
because you trust the model — you've got the model right. Module 26 takes the next step: more than one
agent working at once without colliding, which is where the worktrees from Module 7 finally pay off at
scale.
---
## Verify-before-publish
This is an expansion-zone module sitting on fast-moving ground. Re-check at build time:
- [ ] **Native issue-to-PR / "coding agent" offerings.** Forges and vendors are shipping built-in
assign-an-issue-to-an-agent and PR-fixing features fast, and renaming them faster. Confirm whether a
mainstream forge now offers this natively, and keep the lab's mechanism-agnostic framing if it's
still in flux. Don't name a specific product as *the* answer.
- [ ] **Agentic-tool headless invocation.** The `AGENT_CMD` example assumes a non-interactive /
one-shot flag. Verify the major agentic CLIs still expose one and that the flag names in the example
read as plausible placeholders, not as one vendor's exact syntax.
- [ ] **Self-healing CI integrations.** Marketplace actions and bots that auto-fix red builds appear
and disappear. Re-verify any referenced capability still exists and is still described neutrally.
- [ ] **Triggered/scheduled workflow syntax.** The event names and `schedule`/cron syntax in
`agent-job.yml` are stable on the GitHub Actions flavor used in Module 14, but re-confirm the
trigger events (issue-labeled, comment command) match current forge behavior, and that the GitLab /
Forgejo equivalents in the comments are still accurate.
@@ -0,0 +1,82 @@
# Reference: an autonomous agent running as a RUNNER JOB (Module 19) — triggered and scheduled.
#
# This is the "for real" version of agent_runner.py: instead of you launching the agent, the forge
# launches it on a runner in response to an event or a timer, and the agent opens a PR. That PR then
# hits your NORMAL gates — CI (Module 14), security scanning (Module 15), and human review (Module
# 10) — exactly like a human's PR. The supervision is structural; this file just automates the start.
#
# GitHub Actions flavor (same as Module 14's ci.yml), so it goes in .github/workflows/. Equivalents:
# * GitLab: a job with `rules:` on $CI_PIPELINE_SOURCE, plus a pipeline schedule set up in the project.
# * Forgejo/Gitea: the same YAML under .forgejo/workflows/ or .gitea/workflows/.
#
# DO NOT enable this blindly. Read the security notes at the bottom first — an unattended agent with a
# write token is automation acting in your name. This is the last thing you turn on, on purpose.
name: agent-issue-to-pr
on:
  # TRIGGERED: fire when an issue gets the `agent` label. Event in -> agent runs -> PR out.
  issues:
    types: [labeled]
  # SCHEDULED: also attempt work overnight. This is "the workflow runs itself" — keep it cheap.
  schedule:
    - cron: "0 6 * * *"  # 06:00 UTC daily; adjust to your timezone and budget.
jobs:
  agent:
    # Only run the triggered path when the label is actually `agent` (labeled events fire for ANY
    # label). The scheduled path has no label, so allow it through too.
    if: ${{ github.event_name == 'schedule' || github.event.label.name == 'agent' }}
    runs-on: ubuntu-latest  # whose compute this is — see Module 19 for self-hosted runners.
    # Least privilege (Module 17): grant ONLY what opening a PR needs. Not admin, not secrets access.
    permissions:
      contents: write       # create the branch and commit
      pull-requests: write  # open the PR
      issues: read          # read the issue body (the agent's brief)
    steps:
      - name: Check out the code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install gate tools
        run: pip install pytest ruff
      - name: Run the agent on a fresh branch
        env:
          # The agent's model credentials come from a SCOPED secret you set in the forge — never
          # hardcoded here (Module 17). Keep this provider-neutral: it's whatever your agent needs.
          AGENT_API_KEY: ${{ secrets.AGENT_API_KEY }}
          # Point AGENT_CMD at your agentic tool's non-interactive / one-shot mode.
          AGENT_CMD: "your-agent-cli --print --prompt-file {prompt_file}"
          # The issue body is UNTRUSTED (Module 22). Pass it in as an environment variable; if it
          # were expanded directly inside `run:`, a crafted body could inject shell commands.
          ISSUE_BODY: ${{ github.event.issue.body }}
        run: |
          git switch -c "agent/issue-${{ github.event.issue.number || github.run_id }}"
          # In the triggered case, write the issue body to a file for the agent to read.
          printf '%s' "$ISSUE_BODY" > issue.md
          python modules/25-autonomous-agents/lab/agent_runner.py issue-to-pr issue.md
      # The agent's output is a PROPOSAL. Open the PR; do NOT merge. CI + security + review decide.
      # (Use your forge's PR-creation step or CLI here; kept generic to stay vendor-neutral.)
      - name: Open a pull request for review
        run: |
          git push -u origin HEAD
          echo "Open a PR from this branch via your forge's API/CLI. It must pass CI (Module 14),"
          echo "security scanning (Module 15), and human review (Module 10) before anyone merges it."
# --- Security notes (read before enabling) -------------------------------------------------------
# * Prompt injection (Module 22): github.event.issue.body is UNTRUSTED input that lands straight in
# the agent's context. A malicious issue can try to redirect the agent ("ignore your instructions,
# exfiltrate secrets..."). Scope the token tightly so a hijack can't do much, and never give this
# job access to deployment or admin secrets.
# * No auto-merge. This file stops at "open a PR". Wiring an agent to merge its own work to main
# removes the human gate and is out of scope for this course.
# * Sandbox (Module 16): for agents you trust less, run the agent step inside a container with no
# network beyond what it needs.
# * Cost: a scheduled agent that re-attempts the same impossible issue every night burns runner
# minutes. Cap retries (agent_runner.py does) and consider a label the agent removes when it gives
# up, so it doesn't retry forever.
@@ -0,0 +1,258 @@
"""Module 25 lab — an autonomous-but-supervised agent orchestrator.
This is the smallest honest version of the two patterns in the module:
* issue-to-pr — read an issue, let an agent implement it, run the gate, produce a PR PROPOSAL.
* self-heal — run the gate; on failure, feed the failure back to the agent for a fix,
bounded by a retry cap; produce a PR PROPOSAL.
The load-bearing idea is in one place and you should be able to point at it: the agent NEVER merges.
Every path ends at `propose_pr()` — a branch, a commit, and the command *you* would run to open the
PR. The CI/review/security gates (Modules 14/10/15) and recovery (Module 12) are what supervise it,
not a human watching it type.
Run it two ways:
1. Simulated (no agent needed, fully deterministic) — see the machinery and the gates:
python agent_runner.py issue-to-pr issue-delete-command.md --simulate good
python agent_runner.py issue-to-pr issue-delete-command.md --simulate bad
python agent_runner.py self-heal --simulate bad
python agent_runner.py self-heal --simulate stuck
Simulation works on a SELF-CONTAINED demo target (agent_demo.py + test_agent_demo.py) so it is
deterministic and never corrupts your real tasks-app files. The gate it runs (ruff + pytest) is
the real one — the same checks Module 14's CI runs.
2. Real agent — drives your own agentic tool against the actual issue. Point AGENT_CMD at your
tool's non-interactive / one-shot mode, then drop --simulate:
export AGENT_CMD='your-agent-cli --print --prompt-file {prompt_file}'
python agent_runner.py issue-to-pr issue-delete-command.md
Language: Python 3.10+. Standard library only.
"""
from __future__ import annotations
import argparse
import os
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path
RETRY_CAP = 3 # self-healing stops after this many fix attempts and hands off to a human.
# Demo target the simulator works on, so simulation never touches your real cli.py / tasks.py.
DEMO_SRC = Path("agent_demo.py")
DEMO_TEST = Path("test_agent_demo.py")
# Vendor-neutral: where your committed AI config (Module 5) might live. Override with AGENT_CONFIG.
CONFIG_CANDIDATES = ["AGENTS.md", ".agent/instructions.md", "agent-config.md"]
# --------------------------------------------------------------------------------------------------
# The gate — the same lint + test checks Module 14 runs in CI, run locally so they're reproducible.
# This is the structural supervision. It does not care whether a human or an agent wrote the change.
# --------------------------------------------------------------------------------------------------
def run_gate() -> tuple[bool, str]:
    """Run ruff then pytest in the current directory. Return (passed, combined_output)."""
    out: list[str] = []
    ok = True
    for label, cmd in (("ruff (lint)", ["ruff", "check", "."]),
                       ("pytest (tests)", ["pytest", "-q"])):
        out.append(f"\n=== gate: {label} -> {' '.join(cmd)} ===")
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            out.append(f" ! {cmd[0]} not installed — `pip install pytest ruff`. Treating as a gate FAIL.")
            ok = False
            continue
        out.append(proc.stdout.rstrip())
        if proc.stderr.strip():
            out.append(proc.stderr.rstrip())
        if proc.returncode != 0:
            ok = False
            out.append(f" -> FAILED ({label})")
    return ok, "\n".join(line for line in out if line is not None)
# --------------------------------------------------------------------------------------------------
# The agent — real (your tool) or simulated (deterministic, for the lab).
# --------------------------------------------------------------------------------------------------
def find_config() -> Path | None:
    env = os.environ.get("AGENT_CONFIG")
    if env and Path(env).exists():
        return Path(env)
    for name in CONFIG_CANDIDATES:
        if Path(name).exists():
            return Path(name)
    return None


def build_prompt(task: str, *, issue_path: Path | None = None, failure: str | None = None) -> str:
    """Assemble the agent's brief: standing config (Module 5) + the specific task (issue or failure)."""
    parts = ["You are working in a Git repository on the current branch. Make the change directly in",
             "the files. Do not commit, push, or merge — just edit. Follow the project's conventions."]
    config = find_config()
    if config:
        parts += ["", f"# Project conventions (from {config})", config.read_text()]
    if issue_path:
        parts += ["", "# Task (issue to implement)", issue_path.read_text()]
    if failure:
        parts += ["", "# A CI check just failed. Fix the CODE so it passes — do not weaken or delete",
                  "# the test to make it pass. Here is the failing output:", "```", failure, "```"]
    return "\n".join(parts)


def run_real_agent(prompt: str) -> None:
    """Drive the learner's agentic tool via AGENT_CMD. Template may contain {prompt_file}; otherwise
    the prompt is piped to stdin. Kept vendor-neutral on purpose."""
    template = os.environ["AGENT_CMD"]
    with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as fh:
        fh.write(prompt)
        prompt_file = fh.name
    try:
        if "{prompt_file}" in template:
            cmd = shlex.split(template.replace("{prompt_file}", prompt_file))
            proc = subprocess.run(cmd)
        else:
            proc = subprocess.run(shlex.split(template), input=prompt, text=True)
        if proc.returncode != 0:
            sys.exit(f"agent command exited non-zero ({proc.returncode}); aborting.")
    finally:
        os.unlink(prompt_file)
# Simulated agent: writes a self-contained demo module so the gate has something real to judge.
def simulate_implement(variant: str) -> None:
    DEMO_TEST.write_text(
        "from agent_demo import discount\n\n\n"
        "def test_discount_takes_a_percentage():\n"
        "    # 10% off 200 is 180. A flat subtraction (200 - 10 = 190) is the plausible-but-wrong bug.\n"
        "    assert discount(200, 10) == 180\n"
    )
    if variant == "good":
        DEMO_SRC.write_text("def discount(price, pct):\n    return price - price * pct / 100\n")
    else:  # 'bad' — plausible but wrong: treats the percent as a flat amount.
        DEMO_SRC.write_text("def discount(price, pct):\n    return price - pct\n")


def simulate_fix(variant: str, attempt: int) -> None:
    if variant == "stuck":
        # The "agent" keeps producing plausible, still-wrong fixes — the loop must give up, not run forever.
        DEMO_SRC.write_text(f"def discount(price, pct):\n    return price - pct - {attempt}\n")
    else:  # 'bad' — converges on the second attempt with the correct formula.
        DEMO_SRC.write_text("def discount(price, pct):\n    return price - price * pct / 100\n")


# --------------------------------------------------------------------------------------------------
# The endpoint every path shares: a PR PROPOSAL. Never a merge.
# --------------------------------------------------------------------------------------------------
def in_git_repo() -> bool:
    return subprocess.run(["git", "rev-parse", "--is-inside-work-tree"],
                          capture_output=True).returncode == 0


def propose_pr(message: str) -> None:
    print("\n" + "=" * 80)
    print("GATE PASSED. Proposing a PR — NOT merging. A human reviews the diff (Module 10).")
    print("=" * 80)
    if in_git_repo():
        subprocess.run(["git", "add", "-A"])
        subprocess.run(["git", "commit", "-m", message])
        branch = subprocess.run(["git", "rev-parse", "--abbrev-ref", "HEAD"],
                                capture_output=True, text=True).stdout.strip()
        print("\nReview the change you're about to propose:")
        print("  git show HEAD            # or: git diff main..HEAD")
        print("\nThen open the PR (nothing has left your machine yet):")
        print(f"  git push -u origin {branch}")
        print("  # ...and open a pull request on your forge. CI + security gates run there.")
    else:
        print("\n(Not a Git repo — skipping commit. In your tasks-app this would commit to the branch.)")
    print("\nThe agent stops here. It cannot merge. That is the whole safety model.")


def reject(reason: str, gate_output: str) -> None:
    print(gate_output)
    print("\n" + "=" * 80)
    print(f"GATE FAILED: {reason}")
    print("No PR proposed. The branch is left as-is for you to inspect or discard:")
    print("  git restore .    # throw the agent's change away (Module 2)")
    print("=" * 80)


# --------------------------------------------------------------------------------------------------
# The two patterns.
# --------------------------------------------------------------------------------------------------
def cmd_issue_to_pr(issue_path: Path, simulate: str | None) -> int:
    print(f"[issue-to-pr] brief: {issue_path}")
    if simulate:
        print(f"[issue-to-pr] simulating a '{simulate}' agent on the self-contained demo target.")
        simulate_implement(simulate)
    else:
        run_real_agent(build_prompt("implement", issue_path=issue_path))
    ok, gate_output = run_gate()
    if ok:
        print(gate_output)
        propose_pr(f"Agent: implement {issue_path.stem}")
        return 0
    reject("the agent's change does not pass the gate", gate_output)
    return 1


def cmd_self_heal(simulate: str | None) -> int:
    # Establish a failing state to heal. In a real pipeline this is "CI just went red on a push".
    if simulate:
        print(f"[self-heal] simulating a red build ('{simulate}') on the demo target.")
        simulate_implement("bad")
    else:
        print("[self-heal] running the gate on the current working tree to find the failure...")
    # RETRY_CAP gate runs allow at most RETRY_CAP - 1 fix attempts between them.
    for attempt in range(1, RETRY_CAP + 1):
        ok, gate_output = run_gate()
        if ok:
            print(gate_output)
            print(f"\n[self-heal] gate is green after {attempt - 1} fix attempt(s).")
            propose_pr("Agent: self-healing fix for failing CI")
            return 0
        print(gate_output)
        if attempt == RETRY_CAP:
            # The final gate run is still red and the fix budget is spent: stop asking.
            break
        print(f"\n[self-heal] gate red — attempt {attempt}/{RETRY_CAP - 1}: asking the agent for a fix.")
        if simulate:
            simulate_fix(simulate, attempt)
        else:
            run_real_agent(build_prompt("fix", failure=gate_output))
    print("\n" + "=" * 80)
    print(f"SELF-HEAL GAVE UP after {RETRY_CAP - 1} attempts. Handing off to a human — NOT looping forever.")
    print("This cap is what stops an agent burning a runner bill chasing a flaky or impossible fix.")
    print("=" * 80)
    return 2


def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description="Autonomous-but-supervised agent orchestrator (Module 25).")
    sub = parser.add_subparsers(dest="command", required=True)

    p_itp = sub.add_parser("issue-to-pr", help="implement an issue and propose a PR")
    p_itp.add_argument("issue", type=Path, help="path to the issue markdown file")
    p_itp.add_argument("--simulate", choices=["good", "bad"], help="run without a real agent")

    p_sh = sub.add_parser("self-heal", help="fix a failing gate, bounded by a retry cap, and propose a PR")
    p_sh.add_argument("--simulate", choices=["bad", "stuck"], help="run without a real agent")

    args = parser.parse_args(argv)
    if not args.simulate and "AGENT_CMD" not in os.environ:
        sys.exit("No --simulate and no AGENT_CMD set. Set AGENT_CMD to your agent's headless command, "
                 "or pass --simulate to run the deterministic demo.")

    if args.command == "issue-to-pr":
        return cmd_issue_to_pr(args.issue, args.simulate)
    return cmd_self_heal(args.simulate)


if __name__ == "__main__":
    raise SystemExit(main(sys.argv[1:]))
@@ -0,0 +1,35 @@
<!--
The agent's INPUT for Module 25. This is a well-formed issue in the Module 9 format: title,
context, acceptance criteria, scope. It is deliberately a good candidate for an agent — well-
scoped, concrete, and it mirrors a pattern already in the codebase (the existing `done` command).
The orchestrator (agent_runner.py) reads this file and pairs it with your committed AI config
(Module 5) to build the agent's brief. Edit it and you change what the agent attempts.
-->

# Add a `delete <index>` command to the CLI

**Type:** feature · **Priority:** p2 · **Labels:** `cli`, `ready`, `agent`

## Context

`tasks-app` can `add`, `list`, and mark a task `done`, but there's no way to remove a task. Once a
task is added by mistake it stays forever. The `done` command already takes an index and mutates the
list through a method on `TaskList`, so a `delete` command should follow the exact same shape — this
is a patterned change, not a design problem.

## Acceptance criteria

- `python cli.py delete <index>` removes the task at that 0-based index and saves the list.
- After deleting, the remaining tasks keep their relative order.
- `delete` with an out-of-range or non-integer index prints a clear error (e.g.
  `no task at index 99`) and exits non-zero, instead of dumping a traceback.
- The logic lives on `TaskList` (a `remove(index)` method or equivalent), mirroring how `complete`
  works — `cli.py` only parses arguments and calls it.
- A test covers: a successful delete removes the right task, and an out-of-range delete is handled.

## Out of scope

- Changing how tasks are stored or numbered.
- Bulk delete, undo, or a confirmation prompt.
- Reworking the existing `add` / `list` / `done` commands.
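
## Sketch of the expected shape (illustrative only)

The acceptance criteria above can be read as roughly the following shape. This is a minimal sketch under stated assumptions, not the project's actual code: the `TaskList` here is a hypothetical stand-in holding a plain list, `delete_command` is an invented helper name standing in for the argument handling in `cli.py`, and saving the list is left out because the real storage code is not shown in this issue.

```python
class TaskList:
    """Hypothetical stand-in for the real TaskList; holds only what the sketch needs."""

    def __init__(self, tasks=None):
        self.tasks = list(tasks or [])

    def remove(self, index):
        """Remove and return the task at a 0-based index, or raise IndexError."""
        # Reject negatives explicitly: list.pop(-1) would silently delete the last task.
        if not 0 <= index < len(self.tasks):
            raise IndexError(f"no task at index {index}")
        return self.tasks.pop(index)


def delete_command(task_list, raw_index):
    """Parse the CLI argument, call remove(), and return a process exit code."""
    try:
        index = int(raw_index)
    except ValueError:
        print(f"not a valid index: {raw_index!r}")
        return 1
    try:
        task_list.remove(index)
    except IndexError as exc:
        print(exc)  # a clear one-line error instead of a traceback
        return 1
    return 0
```

Keeping the bounds check on `TaskList` and the parsing in the command helper mirrors the split the criteria ask for: the CLI layer only converts the argument and maps failures to a non-zero exit code.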