Fix Module 27 Part D CI snippet path (won't resolve from repo root) and the frozen always-100% gate fixture #28


2026-06-22T14:23:51-04:00

claude commented


Problem

Two issues in the capstone-evals Part D CI snippet:

1. Path base mismatch. The script path is repo-root-relative (modules/27-evals/lab/run_eval.py) but the candidate argument is lab-relative (candidates/current_model). A CI job runs from the repo root by default, where candidates/ doesn't exist, so the gate the module calls "structural, not a promise" crashes with a false failure ("no tasks.py in candidates/current_model", exit 1).
2. Frozen fixture. Even once the path is fixed, the snippet gates on the bundled current_model candidate, whose tasks.py is the always-correct baseline and scores 100% on every run. It guards nothing, which contradicts the section's own line: "an eval nobody must act on is a dashboard, not a guardrail."
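To see why a frozen fixture guards nothing, here is a minimal sketch of a threshold gate. Everything in it is hypothetical: CASES, pass_rate, gate_exit_code and the two candidate functions are illustrative stand-ins, not the lab's real run_eval.py or eval_set.py. The always-correct candidate plays the role of current_model and yields exit 0 no matter what else changes; only a candidate whose output can vary gives the gate any chance of returning 1.

```python
from typing import Callable

# Hypothetical eval cases as (input, expected output) pairs.
# Illustrative stand-ins, not the lab's eval_set.CASES.
CASES = [(2, 4), (3, 9), (10, 100)]


def pass_rate(candidate: Callable[[int], int]) -> float:
    """Fraction of cases where the candidate's output matches the expected value."""
    passed = sum(1 for x, want in CASES if candidate(x) == want)
    return passed / len(CASES)


def gate_exit_code(candidate: Callable[[int], int], threshold: float) -> int:
    """Mirror a CI step's exit status: 0 lets the pipeline continue, 1 blocks it."""
    return 0 if pass_rate(candidate) >= threshold else 1


def frozen_baseline(x: int) -> int:
    # Always correct, like the bundled current_model: the gate can never fail on it.
    return x * x


def regressed_output(x: int) -> int:
    # A plausible regression in the real, varying output: right only for x == 2.
    return x * 2


baseline_code = gate_exit_code(frozen_baseline, 1.0)
regressed_code = gate_exit_code(regressed_output, 1.0)
print(baseline_code, regressed_code)  # 0 1
```

A gate earns its place only when some plausible change to the output it watches can flip its exit code; pointed at the frozen baseline, that never happens.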

Evidence

modules/27-evals/README.md Part D (~line 294): "run: python modules/27-evals/lab/run_eval.py candidates/current_model --threshold 1.0". Run from the repo root, this fails with "no tasks.py in candidates/current_model" and exit status 1. current_model is the always-passing baseline from Part A; its tasks.py carries the comment "It's correct".

Why it matters

The closing module's flagship "structural, not a promise" example crashes when copy-pasted, and even once fixed it gates on something that can never fail, which undermines the very lesson it is meant to teach.

Proposed change

1. Make both paths repo-root-relative: python3 modules/27-evals/lab/run_eval.py modules/27-evals/lab/candidates/current_model --threshold 1.0 (verified to run and exit 0; from eval_set import CASES still resolves because Python puts the script's directory at sys.path[0]). Alternatively, add working-directory: modules/27-evals/lab to the step and keep the original relative command.
2. Point the gate at the candidate that actually varies (the agent's or repo's real output). Alternatively, for a generic course snippet, add a note that the bundled current_model is an illustrative stand-in and that a real gate should target the varying output.
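The note in the first proposed change, that from eval_set import CASES still resolves when the script is launched from the repo root, can be checked in isolation: when Python runs a script by path, it puts that script's own directory at sys.path[0], whatever the working directory is. The sketch below builds a throwaway copy of the lab layout with hypothetical one-line stand-ins for eval_set.py and run_eval.py (not the real files) and runs the script from the fake repo root.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    lab = root / "modules" / "27-evals" / "lab"
    lab.mkdir(parents=True)
    # Hypothetical stand-ins for the lab's real eval_set.py and run_eval.py.
    (lab / "eval_set.py").write_text("CASES = [1, 2, 3]\n")
    (lab / "run_eval.py").write_text(
        "import sys\n"
        "from eval_set import CASES\n"
        "print(len(CASES), sys.path[0].endswith('lab'))\n"
    )
    # PYTHONSAFEPATH (Python 3.11+) stops the script directory being prepended
    # to sys.path, so drop it to show the default behaviour.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    # Run from the repo root, the way a CI step does unless told otherwise.
    result = subprocess.run(
        [sys.executable, "modules/27-evals/lab/run_eval.py"],
        cwd=root,
        env=env,
        capture_output=True,
        text=True,
    )

print(result.returncode, result.stdout.strip())  # 0 3 True
```

The import works because of where the script lives, not where the job runs; only the candidate argument, which the script resolves against the working directory, needs the repo-root-relative prefix.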

Acceptance criteria

The Part D snippet runs and exits 0 when executed exactly as written (from repo root or with the stated working-directory).
The snippet or its prose makes clear the gate must target a varying candidate, not the frozen 100% fixture.
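The first criterion can be expressed as a smoke test that runs the run: line verbatim from the repo root. This is a sketch under stated assumptions: STUB is a hypothetical run_eval.py that only reproduces the failure message quoted under Evidence, run_as_written is a hypothetical helper, and the interpreter word (python or python3) is swapped for sys.executable so the check doesn't depend on what happens to be on PATH.

```python
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for run_eval.py: exits 1 with the message quoted in
# this issue when the candidate directory has no tasks.py, else prints "ok".
STUB = '''import sys
from pathlib import Path
candidate = Path(sys.argv[1])
if not (candidate / "tasks.py").is_file():
    print(f"no tasks.py in {sys.argv[1]}")
    sys.exit(1)
print("ok")
'''


def run_as_written(command: str, cwd: Path) -> subprocess.CompletedProcess:
    """Run a CI `run:` line verbatim from `cwd`, swapping only the interpreter."""
    argv = shlex.split(command)
    argv[0] = sys.executable  # replace "python"/"python3" with this interpreter
    return subprocess.run(argv, cwd=cwd, capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    lab = repo / "modules" / "27-evals" / "lab"
    model = lab / "candidates" / "current_model"
    model.mkdir(parents=True)
    (model / "tasks.py").write_text("")
    (lab / "run_eval.py").write_text(STUB)

    broken = run_as_written(
        "python modules/27-evals/lab/run_eval.py candidates/current_model --threshold 1.0",
        repo,
    )
    fixed = run_as_written(
        "python3 modules/27-evals/lab/run_eval.py "
        "modules/27-evals/lab/candidates/current_model --threshold 1.0",
        repo,
    )

print(broken.returncode, broken.stdout.strip())  # 1 no tasks.py in candidates/current_model
print(fixed.returncode, fixed.stdout.strip())  # 0 ok
```

A check like this only covers the first criterion; it cannot tell whether the candidate being gated is the frozen fixture or the varying output, which is what the second criterion is about.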

Affected files

modules/27-evals/README.md

References

Source finding F42 (realVotes 3/3).

Filed from an adversarial multi-agent course review (217 raw findings → 54 adversarially-verified survivors). Scoped for manual review; intentionally not auto-assigned to an agent.

claude added the ai-ready bug P1 labels 2026-06-22 14:23:51 -04:00

claude referenced a pull request that will close this issue

2026-06-22 16:07:47 -04:00

Testing/CI/tooling consistency (#9,#20,#21,#22,#23,#28) #59

claude referenced this issue from a commit

2026-06-22 16:07:48 -04:00

fix(testing/ci/tooling): consistent unittest, venv guidance, runnable lab commands

claude closed this issue

2026-06-22 16:07:58 -04:00
