ai-workflow-course/modules/10-reviewing-code-you-didnt-write/lab/ai-diff-review-checklist.md
claude f925fd9645 fix(M7-27+capstone): apply AI-drives-git reframe, lesson=theory, de-slop course-wide
Phase 2 sweep — all modules are post-pivot, so the learner directs the AI agent
(Claude Code as the worked example) to do the git/setup work and verifies, instead
of typing commands by hand; no re-teaching basics. Lesson sections are theory with
example output; all execution lives in the labs. De-slopped ("prose" etc. gone
course-wide, em-dash density thinned). /path/to placeholders -> ~/ai-workflow-course.

Every deliberate teaching device verified intact: M10 ai-change.patch trap,
M12 bad-clear-snippet, M13/M27 planted pending_count bug, M15 secret+typosquat+MD5,
M18 BREAK=1, M21 absent-.gitignore, M22 poisoned skill, M24 no-op patch, M25 --simulate.
Labs compile/parse (py/sh/yaml/json); no junk.

Closes #83
Closes #86
Closes #89

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
2026-06-22 21:58:17 -04:00


Reviewing an AI-generated diff — working checklist

Keep this open while you read a diff the AI produced. The point is not to re-read the whole file; it's to interrogate the change against the prompt you gave. Work top to bottom.

0. Frame the review

  • What did I actually ask for? Write the request in one sentence. Every changed line should trace back to it.
  • Read the diff, not the summary. Ignore the AI's account of what it did; the diff is the only ground truth. (git diff main..<branch>)

1. Scope — did it change only what was asked?

  • Every hunk maps to the request. Anything outside it is scope creep until proven otherwise.
  • No unrelated files touched (formatting churn, import reshuffles, version bumps).
  • No "while I was here" refactors of code the request never mentioned.
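The scope check can be made partly mechanical by listing the files a diff touches and comparing them with the files you expected. The following is a minimal sketch, assuming a unified diff in git's format. The names `files_touched`, `out_of_scope`, and `SAMPLE_DIFF` are hypothetical, and the sample diff is made up for illustration.

```python
import re

# Hypothetical sample diff, made up for illustration only.
SAMPLE_DIFF = """\
diff --git a/tasks.py b/tasks.py
--- a/tasks.py
+++ b/tasks.py
@@ -1,2 +1,2 @@
-def add(x):
+def add(x, y):
     pass
diff --git a/setup.cfg b/setup.cfg
--- a/setup.cfg
+++ b/setup.cfg
@@ -1 +1 @@
-version = 1.0
+version = 1.1
"""


def files_touched(diff_text):
    """Return the set of paths named in 'diff --git' header lines."""
    pattern = re.compile(r"^diff --git a/(.+?) b/(.+)$", re.MULTILINE)
    return {match.group(2) for match in pattern.finditer(diff_text)}


def out_of_scope(diff_text, expected):
    """Return touched files that are not in the expected set, sorted."""
    return sorted(files_touched(diff_text) - set(expected))


if __name__ == "__main__":
    # The request only concerned tasks.py, so setup.cfg needs an explanation.
    print(out_of_scope(SAMPLE_DIFF, expected={"tasks.py"}))
```

A file showing up in this list is not automatically wrong, but it is a hunk you now have to justify against the one-sentence request.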

2. Deletions — what did it take away?

  • Read every - line. Deletions are higher-risk than additions and easy to skim past.
  • Edge-case handling still there? Bounds checks, None/empty guards, try/except, validation, error returns — confirm none were dropped or weakened.
  • An error that used to be raised or logged is not now silently swallowed (except: pass).
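To force yourself to look at every deletion, it can help to pull the removed lines out of the diff and flag the ones that look like guards. This is a rough sketch, not a substitute for reading: the keyword list in `GUARD_PATTERN` is an assumption, the helper names are hypothetical, and the sample diff is invented for illustration.

```python
import re

# Keywords that often mark edge-case handling. This list is an assumption;
# tune it for your own codebase.
GUARD_PATTERN = re.compile(r"\b(?:if|raise|except|assert|return)\b")

# Hypothetical sample diff, made up for illustration only.
SAMPLE_DIFF = """\
--- a/tasks.py
+++ b/tasks.py
@@ -1,4 +1,5 @@
 def complete(tasks, index):
-    if index < 0 or index >= len(tasks):
-        raise IndexError("no such task")
-    tasks[index].done = True
+    try:
+        tasks[index].done = True
+    except IndexError:
+        pass
"""


def removed_lines(diff_text):
    """Yield deleted lines from a unified diff, skipping '---' file headers.

    Caveat: a deleted line whose own text starts with '--' is also skipped.
    """
    for line in diff_text.splitlines():
        if line.startswith("-") and not line.startswith("---"):
            yield line[1:]


def risky_deletions(diff_text):
    """Return deleted lines that match the guard keywords, stripped."""
    return [
        line.strip()
        for line in removed_lines(diff_text)
        if GUARD_PATTERN.search(line)
    ]


if __name__ == "__main__":
    for line in risky_deletions(SAMPLE_DIFF):
        print("deleted guard?", line)
```

In the sample, the bounds check and the `raise` are both gone, replaced by an `except IndexError: pass`. Because Python lists accept negative indices, a negative `index` no longer fails at all after this change.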

3. Plausibility — does it only look right?

  • Invented APIs. Every function, method, kwarg, attribute, import, env var, CLI flag, config key, and endpoint actually exists. Confidence is not evidence — verify the unfamiliar ones against real docs/source.
  • Invented behavior. The change does not rely on a flag or option that behaves differently from what its name suggests (e.g. assuming list.pop takes a default like dict.pop).
  • Off-by-one / boundary logic. Indexing, ranges, slicing, loop bounds, 0- vs 1-based.
  • Inverted or weakened conditions. if not x vs if x, < vs <=, and vs or, a filter quietly dropped from a comprehension.
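Two of the checks above can be verified in a few lines instead of trusted. The sketch below confirms the `list.pop` versus `dict.pop` difference and shows one way to check whether a keyword argument exists, using the standard library's `inspect.signature`. The function `save_task` and the kwarg names are hypothetical stand-ins for whatever the diff calls.

```python
import inspect


def save_task(title, *, priority=0):
    """Hypothetical function standing in for an API the diff calls."""
    return {"title": title, "priority": priority}


def has_named_param(func, name):
    """Return True if func declares a parameter with exactly this name."""
    return name in inspect.signature(func).parameters


def list_pop_accepts_default():
    """Return True if list.pop(index, default) works like dict.pop does."""
    try:
        ["a", "b"].pop(5, None)
    except TypeError:
        # list.pop takes at most one argument (the index), so this is reached.
        return False
    return True


if __name__ == "__main__":
    settings = {"retries": 3}
    print(settings.pop("timeout", 30))            # dict.pop falls back to the default
    print(list_pop_accepts_default())             # list.pop has no default argument
    print(has_named_param(save_task, "priority")) # a real parameter
    print(has_named_param(save_task, "urgency"))  # a plausible-sounding invented one
```

Note that `inspect.signature` cannot introspect every built-in, and a function that accepts `**kwargs` will take an invented name without complaint, so for unfamiliar library calls the real docs or source remain the authority.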

4. Behavior change — would the happy path hide it?

  • Does any existing command/function behave differently now? Trace one real call through.
  • Run the failure case, not the success case. The trap usually survives the happy path. Feed it bad input, an empty list, a missing file, a duplicate.
  • Return values / exit codes unchanged where callers depend on them.
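A small before-and-after comparison shows why the failure case matters more than the success case. In this sketch, `average_before` and `average_after` are hypothetical functions invented for illustration, and `outcome` is a hypothetical helper that records whether a call returned or raised.

```python
def average_before(values):
    """Original behavior: an empty list is an error the caller must handle."""
    if not values:
        raise ValueError("average of empty list")
    return sum(values) / len(values)


def average_after(values):
    """Hypothetical AI rewrite: shorter, but returns 0 for empty input."""
    return sum(values) / len(values) if values else 0


def outcome(func, *args):
    """Return ('returned', value) or ('raised', exception class name)."""
    try:
        return ("returned", func(*args))
    except Exception as exc:
        return ("raised", type(exc).__name__)


if __name__ == "__main__":
    for label, data in [("happy path", [1, 2, 3]), ("failure case", [])]:
        before = outcome(average_before, data)
        after = outcome(average_after, data)
        status = "same" if before == after else "CHANGED"
        print(f"{label}: before={before} after={after} -> {status}")
```

Both versions agree on `[1, 2, 3]`, so a happy-path run reports nothing. Only the empty list reveals that an error callers may have depended on has turned into a silent `0`.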

5. Decide

  • I can explain, in my own words, what every hunk does and why it's correct.
  • If I can't, I request changes — the burden of proof is on the diff, not on me.

Rule of thumb: a diff is guilty until proven correct. "It runs" is the weakest possible evidence; "I read every - line and ran the failure case" is the bar.