ai-workflow-course/modules/27-evals/lab/candidates/swapped_model/tasks.py
claude 389ac2e460 style(no-slop): remove every em-dash + banned words across all modules + capstone
Apply the no-ai-slop standard (now binding in AGENTS.md): the em-dash character is
banned outright (restructured, not blind-replaced), plus the banned word/phrase
list (delve, leverage, robust, seamless, truly, unlock, etc.). 0 em-dashes remain
in modules + capstone; the only "robust" left is the planted M10 ai-change.patch
trap. Module H1 titles use a colon separator.

All deliberate teaching devices preserved; labs compile/parse (py/sh/yaml/json);
no junk. AGENTS.md updated with the hard no-slop rules.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
2026-06-22 23:21:09 -04:00

"""Candidate output: a SWAPPED model/prompt.

Same task, different model (or a tweaked prompt). This output "looks right" and
passes a casual manual check: add three tasks, call pending_count(), and it
returns 3. But pending_count() returns the total number of tasks, not the
number of *pending* ones, so it's wrong the moment anything is marked done.

Nobody would notice this by skimming. The eval set notices it instantly. That's
the regression eval catching an unsafe swap, exactly the scenario this module
exists for. Replace this with your own swapped-model output when you run it for
real; you may get lucky and have it pass, or you may catch a regression like
this one.
"""

from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    done: bool = False


@dataclass
class TaskList:
    tasks: list[Task] = field(default_factory=list)

    def add(self, title: str) -> Task:
        task = Task(title=title)
        self.tasks.append(task)
        return task

    def complete(self, index: int) -> None:
        self.tasks[index].done = True

    def pending(self) -> list[Task]:
        return [t for t in self.tasks if not t.done]

    def pending_count(self) -> int:
        # WRONG, but plausibly so: counts every task, not just pending ones.
        return len(self.tasks)

    def render(self) -> str:
        if not self.tasks:
            return "(no tasks yet)"
        lines = []
        for i, task in enumerate(self.tasks):
            box = "[x]" if task.done else "[ ]"
            lines.append(f"{i}. {box} {task.title}")
        return "\n".join(lines)
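
The docstring's claim can be sketched as two checks against the candidate: the casual one that passes, and a regression case that fails once a task is completed. This is a minimal illustration and not the module's actual eval set. `casual_check` and `regression_check` are hypothetical names, and the `TaskList` here is a trimmed copy of the candidate with the planted bug left in on purpose.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    done: bool = False


@dataclass
class TaskList:
    """Trimmed copy of the candidate, planted bug included."""
    tasks: list[Task] = field(default_factory=list)

    def add(self, title: str) -> Task:
        task = Task(title=title)
        self.tasks.append(task)
        return task

    def complete(self, index: int) -> None:
        self.tasks[index].done = True

    def pending_count(self) -> int:
        # Planted bug, kept deliberately: counts completed tasks too.
        return len(self.tasks)


def casual_check(task_list_cls) -> bool:
    """Hypothetical manual skim: add three tasks, expect a count of 3."""
    tl = task_list_cls()
    for title in ("write", "test", "ship"):
        tl.add(title)
    return tl.pending_count() == 3


def regression_check(task_list_cls) -> bool:
    """Hypothetical eval case: complete one task, expect the count to drop to 2."""
    tl = task_list_cls()
    for title in ("write", "test", "ship"):
        tl.add(title)
    tl.complete(0)
    return tl.pending_count() == 2


print("casual check passes:", casual_check(TaskList))
print("regression check passes:", regression_check(TaskList))
```

The only difference between the two checks is the `complete(0)` call, which is the state change the casual check never exercises. Any candidate that derives the count from `pending()` would pass both.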