Fix Module 27 Part D CI snippet path (won't resolve from repo root) and the frozen always-100% gate fixture #28
Problem
Two issues in the capstone-evals Part D CI snippet:
- The script path is repo-root-relative (modules/27-evals/lab/run_eval.py) but the candidate arg is lab-relative (candidates/current_model). A CI job runs from the repo root, where candidates/ doesn't exist, so the gate the module calls "structural, not a promise" crashes with a false failure ("no tasks.py in candidates/current_model", exit 1).
- The gate targets the current_model candidate, whose tasks.py is the always-correct baseline that scores 100% on every run forever, guarding nothing and contradicting the section's own "an eval nobody must act on is a dashboard, not a guardrail."
Evidence
- modules/27-evals/README.md Part D (~line 294): "run: python modules/27-evals/lab/run_eval.py candidates/current_model --threshold 1.0". From repo root → "no tasks.py in candidates/current_model", exit 1.
- current_model is the always-passing baseline (Part A); its tasks.py is commented "It's correct".
Why it matters
The closing module's flagship "structural, not a promise" example crashes when copy-pasted, and even once fixed it gates on something that can never fail, which undermines the lesson itself.
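For reviewers who want to see the path mismatch without running CI, here is a minimal sketch of how the candidate argument resolves against the job's working directory. It is pure path arithmetic and does not touch the filesystem or run_eval.py; the /repo checkout location and the resolve_from helper are assumptions made for illustration, and only the relative layout is taken from this issue.

```python
from pathlib import PurePosixPath

# Hypothetical checkout location; only the relative layout comes from the issue.
REPO_ROOT = PurePosixPath("/repo")
LAB_DIR = REPO_ROOT / "modules/27-evals/lab"


def resolve_from(cwd: PurePosixPath, arg: str) -> PurePosixPath:
    """Return the path a process started in `cwd` would open for `arg`."""
    path = PurePosixPath(arg)
    return path if path.is_absolute() else cwd / path


# The candidate directory that exists in the repo layout described above.
expected = LAB_DIR / "candidates/current_model"

# README argument as written: correct only when the job starts in the lab dir.
from_lab = resolve_from(LAB_DIR, "candidates/current_model")
from_root = resolve_from(REPO_ROOT, "candidates/current_model")

# Repo-root-relative argument: correct when the job starts at the repo root.
fixed_from_root = resolve_from(
    REPO_ROOT, "modules/27-evals/lab/candidates/current_model"
)

print(from_lab == expected)         # True
print(from_root == expected)        # False: resolves to /repo/candidates/current_model
print(fixed_from_root == expected)  # True
```

The same reasoning explains why a working-directory setting also works: it changes the directory the relative argument is resolved against rather than the argument itself.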
Proposed change
- Change the command to "python3 modules/27-evals/lab/run_eval.py modules/27-evals/lab/candidates/current_model --threshold 1.0" (verified to run and exit 0; "from eval_set import CASES" still resolves via the script dir on sys.path[0]). Alternatively add "working-directory: modules/27-evals/lab" and keep the original relative command.
- Add a note that current_model is an illustrative stand-in and that a real gate should target the varying output.
Acceptance criteria
Affected files
modules/27-evals/README.md
References
Source finding F42 (realVotes 3/3).
Filed from an adversarial multi-agent course review (217 raw findings → 54 adversarially-verified survivors). Scoped for manual review; intentionally not auto-assigned to an agent.