De-slop: remove every em-dash + banned words across all modules + capstone (#94)
Sync course wiki / sync-wiki (push) Successful in 4s
Co-authored-by: claude <claude@jpaul.io>
Co-committed-by: claude <claude@jpaul.io>
This commit was merged in pull request #94.
@@ -1,29 +1,29 @@
# Module 23: Working with Existing Codebases

> **Every module so far quietly assumed you started the project. Most of your real work won't be
> like that.** This module is about pointing AI at a large codebase you *didn't* write, and making
> changes that don't break a system nobody fully understands.

---

## Prerequisites

This module needs only the **Module 4** tooling to *attempt*: an agentic, editor-integrated AI that
can read and edit your files. But it's placed at the back on purpose, because the basics are exactly
what make changing unfamiliar code survivable. Lean on:

- **Module 2: Version control as a safety net.** You're about to let an AI touch code you don't
  understand. The commit you can return to is the only reason that's not reckless.
- **Module 6: Branches.** Every change here happens on a branch, isolated from working code.
- **Module 10: Reviewing code you didn't write.** The core skill of this whole course, now aimed at
  a diff in a codebase you *also* didn't write. Double the unfamiliarity, double the discipline.
- **Module 12: Revert, reset, and recovery.** When a change in a system you don't understand goes
  wrong, recovery is how you get out clean.
- **Module 13: Testing.** The existing test suite is your contract for "did I break anything I
  can't see?"
- **Module 20: MCP servers.** Real, structured access to the code and the tools around it, instead
  of pasting fragments.
- **Module 21: Skills.** Where you codify the navigation and safe-change playbooks this module
  teaches, so you don't re-explain them every session.

---
@@ -34,13 +34,13 @@ By the end of this module you can:

1. Give an AI enough **factual, verifiable context** about a large repo to be useful in it, instead
   of letting it work from a few pasted fragments.
2. Have the AI **map and explain** an unfamiliar area (architecture, entry points, where things
   live) and verify that map against the actual files *before* anything is touched.
3. Scope a change down to the **smallest reviewable diff** that solves the problem, and refuse the
   sweeping rewrite the AI will happily offer.
4. Use **MCP (Module 20)** to give the AI real access to the code and surrounding tools, and
   **skills (Module 21)** to make your navigation and safe-change process repeatable.
5. Make one **small, scoped, tested, reviewable** change to a codebase you didn't write, and know
   why it's safe.

---
@@ -75,21 +75,21 @@ real files, and force every change to stay small and reviewable.**

Three phases, strictly in order. Skipping ahead is the mistake.

**1. Orient: establish ground truth before any opinion.** Before the AI gets to reason about the
codebase, give it facts it can't hallucinate: the actual file list, the real entry points, the
languages by volume, the build and test commands, the biggest files (often the spine of the system),
and the recent commit history. This is mechanical and cheap; a script produces it (the lab's
`orient.py` does exactly this). It anchors everything that follows in reality. You're not asking the
AI "what is this project?" cold; you're handing it the facts and asking it to *interpret* them.
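To make "mechanical and cheap" concrete, here is a minimal sketch of two of those facts: languages by volume and the biggest files. It is not the lab's `orient.py`. The extension table and the function names are assumptions for illustration. So is the choice to count files rather than lines as "volume". Run it from a repo root so the relative paths from `git ls-files` resolve.

```python
import os
import subprocess
from collections import Counter

# Hypothetical, deliberately tiny extension table; orient.py keeps its own, richer tables.
EXT_TO_LANG = {".py": "Python", ".js": "JavaScript", ".ts": "TypeScript",
               ".go": "Go", ".rs": "Rust", ".java": "Java", ".md": "Markdown"}


def tracked_files(repo: str = ".") -> list[str]:
    """List the files git tracks, so build output and untracked clutter stay out of the facts."""
    out = subprocess.run(["git", "ls-files"], cwd=repo, capture_output=True,
                         text=True, check=True).stdout
    return [line for line in out.splitlines() if line]


def languages_by_volume(paths: list[str]) -> list[tuple[str, int]]:
    """Count files per language (file count as a stand-in for volume), most common first."""
    counts = Counter(EXT_TO_LANG.get(os.path.splitext(p)[1], "other") for p in paths)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def biggest_files(sizes: dict[str, int], top: int = 5) -> list[str]:
    """Largest files by byte size, ties broken by path; often the spine of the system."""
    return sorted(sizes, key=lambda p: (-sizes[p], p))[:top]
```

A typical call from the repo root is `files = tracked_files()`, then `languages_by_volume(files)` and `biggest_files({p: os.path.getsize(p) for p in files if os.path.isfile(p)})`. Every number is computed from the repo itself, so the AI gets it as a fact rather than a guess.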

**2. Map: explain the area before touching it.** Now the AI builds a mental model, and the only
acceptable model is one **traced through real files with citations.** Don't accept "the request
flows through the controller layer." Demand: "trace one request from entry point to response, naming
each file it passes through." The deliverable is an architecture summary plus a "where things live"
table and, crucially, a list of **open questions the code didn't answer.** A map with honest gaps is
trustworthy. A map with no gaps is fiction. This phase is **read-only**; nothing changes on disk.

**3. Change: the smallest scoped, tested, reviewable diff.** Only now do you edit. One change, one
branch (Module 6). Find the blast radius first (every caller of what you're touching); if you
can't enumerate them, you're not ready. Make the minimal edit, add a test that fails without it,
run the *full* existing suite, and self-review the diff like it's someone else's PR (Module 10). No
@@ -114,12 +114,12 @@ between pastes. **MCP (Module 20) gives the AI real, structured access to the co
around it** so it can navigate on its own instead of waiting for you to feed it fragments. The kinds
of access that turn a guessing model into a grounded one:

- **The filesystem and code search**, so it can grep for every caller of a function instead of
  assuming it found them all.
- **Language-server intelligence** (go-to-definition, find-references, type info), so "where is this
  used?" is answered by the toolchain, not by the model's guess.
- **The surrounding systems**: the issue tracker (Module 9), CI results (Module 14), and the running
  app's logs, so the AI maps the code *and* the context it lives in.

The orientation pack is the cold start. MCP is how the AI keeps the map accurate as it digs, by
pulling real answers from real tools instead of inferring them.
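Even without an MCP server, the "grep for every caller" idea can be approximated by hand. Here is a minimal sketch, on the assumption that a plain-text, whole-word search is good enough for a first pass. The ignore list and the `find_references` name are hypothetical. It will miss dynamically built references and over-report comments and strings. That gap is exactly why language-server find-references is the better tool when you have it.

```python
import os
import re

# Hypothetical ignore list; adjust it for the repo you're in.
SKIP_DIRS = {".git", "node_modules", ".venv", "__pycache__", "dist", "build"}


def find_references(root: str, name: str,
                    exts: tuple[str, ...] = (".py",)) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for every whole-word mention of `name`.

    Plain text search, not semantic: it can't tell a call from a comment, and it
    misses references built dynamically (getattr, string dispatch, reflection).
    """
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored folders.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for fname in sorted(filenames):
            if not fname.endswith(exts):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if pattern.search(line):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits
```

The word boundaries matter. A search for `parse_date` should not count `parse_dates` as a caller. Neither should a substring match inside a longer identifier.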
@@ -127,13 +127,13 @@ pulling real answers from real tools instead of inferring them.
### Where skills earn their place (Module 21)

The orient/map/change motion is the same on every repo. That makes it a perfect candidate for a
**skill (Module 21)**: a committed, reusable playbook so you don't re-explain "map before you touch,
cite real files, keep the diff small" every single session. This module ships two starter skills in
`lab/skills/`:

- **`map-this-repo`**: the read-only navigation playbook. Orient, find entry points, trace one path
  end to end, and produce a cited architecture summary with honest open questions.
- **`safe-change`**: the safe-change playbook. Branch first, find the blast radius, baseline the
  tests, make the minimal edit, cover it, and self-review, plus a set of **stop conditions** that
  tell the AI to escalate to a human instead of pushing on.

@@ -163,7 +163,7 @@ into a revertable diff.
## Hands-on lab

**Lab language:** shell + the provided Python script (`orient.py`); you run it, you don't write it.
This lab does **not** use `tasks-app`; the entire point is a codebase you *didn't* write.

**You'll need:**

@@ -172,14 +172,14 @@ This lab does **not** use `tasks-app` — the entire point is a codebase you *di
- A real, small-to-medium open-source repo to clone. Pick something with **tests** and a clear
  build/test command, in a language you can at least read. Good traits: a few thousand lines, an
  obvious entry point, a documented install (`pip install -e .`, `npm install`, `go mod download`,
  …), and a test suite that **goes green on a clean clone after that documented install**. Confirm
  that before you rely on it as a baseline. (Avoid giant frameworks for a first run; you want a
  system you can't fully hold in your head, but whose test suite finishes in under a minute.)
  **First time? Pick a small Python repo**, so the Module 13 testing toolchain you already have
  transfers with the least friction.
- The starter files from this module's `lab/` folder: `orient.py` and `skills/`.

### Part A: Clone and orient

1. Clone your chosen repo and copy `orient.py` into its root:

@@ -191,23 +191,23 @@ This lab does **not** use `tasks-app` — the entire point is a codebase you *di
```

2. Read `ORIENT.md` yourself first. In 30 seconds you should know the language, the likely entry
   point, the probable test command, and which files are biggest. These are **facts**; the AI can't
   argue with them. (Don't commit `ORIENT.md`; it's scratch context.)
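If `ORIENT.md` leaves the entry point ambiguous, a quick scan can narrow it down before you bring the AI in. Here is a sketch for Python repos only. It assumes the usual conventions: a `__main__.py` module, an `if __name__ == "__main__":` guard, or a top-level `def main(` marks a candidate. The helper name is hypothetical, and it reports candidates to check, not answers. Packaged CLIs often declare their entry point in `pyproject.toml` instead.

```python
import os
import re

# Common Python entry-point conventions; other languages need other patterns.
MAIN_GUARD = re.compile(r"""^if\s+__name__\s*==\s*['"]__main__['"]\s*:""", re.MULTILINE)
MAIN_FUNC = re.compile(r"^def\s+main\s*\(", re.MULTILINE)


def entry_point_candidates(root: str) -> list[tuple[str, str]]:
    """Return (relative path, reason) for .py files that look like program entry points."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden folders such as .git and .venv; walk the rest in a stable order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for fname in sorted(filenames):
            if not fname.endswith(".py"):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            rel = os.path.relpath(path, root)
            if fname == "__main__.py":
                found.append((rel, "package __main__ module"))
            elif MAIN_GUARD.search(text):
                found.append((rel, "__main__ guard"))
            elif MAIN_FUNC.search(text):
                found.append((rel, "top-level main()"))
    return found
```

Whatever it reports is still a claim. Open the file and confirm it really starts the program before you name it in the map.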

### Part B: Map before you touch (read-only)

3. Start a fresh AI session, load the `map-this-repo` skill (`lab/skills/map-this-repo.md`) or paste
   it as instructions, and give it `ORIENT.md` as the opening context.

4. Ask it to produce the architecture summary: what the project does, a "where things live" table,
   the confirmed build/test command, and a traced path for one real operation end to end,
   **with every claim citing a real file.** Demand the list of open questions it couldn't resolve.

5. **Verify the map.** Open two or three files it cited and confirm they say what it claimed. This is
   the step everyone wants to skip and the one that catches the confident-but-wrong map. If a
   citation doesn't hold up, the map is suspect; push back and make it re-trace.
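Checking citations is the point of step 5, and a small script can do the tedious half. It confirms that each cited file exists and actually contains the text the AI quoted from it. This sketch assumes you asked the AI to cite claims as file-plus-quoted-snippet pairs. The `Citation` type and the `check_citations` name are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass
class Citation:
    path: str      # file the AI cited, relative to the repo root
    snippet: str   # text the AI claims appears in that file


def check_citations(root: str, citations: list[Citation]) -> list[str]:
    """Return a human-readable problem for every citation that doesn't hold up."""
    problems = []
    for c in citations:
        full = os.path.join(root, c.path)
        if not os.path.isfile(full):
            problems.append(f"{c.path}: file does not exist")
            continue
        with open(full, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        # Collapse whitespace so a reflowed quote still matches the original.
        if " ".join(c.snippet.split()) not in " ".join(text.split()):
            problems.append(f"{c.path}: quoted text not found")
    return problems
```

A clean result only proves the quotes are real. It does not prove the AI's reading of them is right, so this tells you where to look first rather than replacing the look.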

### Part C: One small, scoped, tested change

6. Pick a genuinely small change: a clearer error message, a fixed edge case, a tiny missing
   validation, a documented-but-unhandled input. Something a single function owns. Now load the
@@ -256,10 +256,10 @@ This lab does **not** use `tasks-app` — the entire point is a codebase you *di
  architecture summary for a repo it half-read. Fluency is not correctness. The citation-checking in
  Part B isn't optional ceremony; it's the only thing standing between you and changing code based on
  a fiction. Verify at least a few claims by hand, every time.
- **The context window is a hard ceiling.** On a genuinely large monorepo, the AI cannot see everything,
  and it usually won't *tell* you what it didn't read. Its map is only as good as the slice it
  actually loaded. MCP-backed search and language-server tools (Module 20) shrink this problem by
  letting it fetch on demand, but they don't erase it; treat "I've reviewed the whole codebase" as
  a claim to distrust.
- **"Small change" can hide a big blast radius.** A one-line edit to a heavily called function can
  ripple through code you never opened. The blast-radius search in the `safe-change` skill is the
@@ -273,7 +273,7 @@ This lab does **not** use `tasks-app` — the entire point is a codebase you *di
  "match local conventions" rule help, but you'll still catch drift in review.
- **Some changes shouldn't be a small diff.** A genuine architectural problem won't be fixed by the
  smallest-possible edit, and forcing it into one makes things worse. This module's discipline is for
  the common case: a scoped change in a system you don't own. Recognizing when a change is actually
  a *project* (and escalating it as one) is its own judgment call the tooling won't make for you.

---
@@ -283,7 +283,7 @@ This lab does **not** use `tasks-app` — the entire point is a codebase you *di
**You're done when:**

- You can hand an AI a factual orientation pack and get back an architecture summary whose citations
  you've **personally verified** against the real files, including the open questions it couldn't
  resolve.
- You've made one change to a codebase you didn't write that is on its own branch, covered by a test
  that fails without it, passing the full existing suite, and whose `git diff` is *exactly* the
@@ -305,11 +305,11 @@ This is an expansion-zone module; the durable motion is stable, but the tooling
- [ ] Confirm `orient.py` runs unchanged on current Python (3.10+) and a freshly cloned repo on
  macOS, Linux, and Windows (git-bash / PowerShell).
- [ ] Re-check the MCP capabilities cited (filesystem, code search, language-server intelligence,
  issue/CI/log access) against what's actually common in the current MCP ecosystem; the menu of
  available servers changes fast. Keep it described as capabilities, not specific products.
- [ ] Verify the cross-references still point to the right modules if any renumbering happened
  (2, 4, 5, 6, 9, 10, 12, 13, 14, 20, 21).
- [ ] Re-confirm the `SIGNALS`/`TEST_HINTS` tables in `orient.py` still reflect common manifests and
  test runners; add any that have become standard, but keep it language-agnostic.
- [ ] Sanity-check the suggested "small-to-medium repo with a fast test suite" lab guidance still
  lands; recommend nothing by name that could rot.

@@ -1,9 +1,9 @@
#!/usr/bin/env python3
"""orient.py: build a factual orientation pack for a repo you didn't write.

Run it from the root of a cloned repo. It prints a Markdown summary of *ground truth*
about the codebase (size, languages, project signals, the biggest and often most central
files, the top-level layout, and likely build/test commands) that you can paste in as the
opening context for an AI session before asking it to map or change anything.

The point is NOT to replace the AI's own exploration. It's to anchor that exploration in
@@ -46,10 +46,10 @@ SIGNALS: dict[str, str] = {
    ".gitea": "Gitea Actions",
    ".gitlab-ci.yml": "GitLab CI",
    "tox.ini": "Python test matrix",
    "README.md": "Has a README; read it first",
    "CONTRIBUTING.md": "Has contributor guidance; read before changing",
    "ARCHITECTURE.md": "Has an architecture doc; rare and valuable",
    # Committed AI-instruction files. Name the real ones across vendors; singling out one
    # would both miss files and cut against the vendor-neutral point (Module 5).
    "AGENTS.md": "Has a committed AI instructions file (Module 5)",
    "CLAUDE.md": "Has a committed AI instructions file (Module 5)",
@@ -142,9 +142,9 @@ def main() -> int:
    if present:
        for name in SIGNALS:
            if name in present:
                w(f"- `{name}`: {SIGNALS[name]}")
    else:
        w("- (none of the usual manifests/CI/docs at the root; look one level down)")

    # --- likely test command ------------------------------------------------
    hints = [TEST_HINTS[name] for name in TEST_HINTS if name in present]
@@ -175,7 +175,7 @@ def main() -> int:
    w("\n## Top-level layout (entries by tracked-file count)\n")
    for name, n in sorted(top_dirs.items(), key=lambda kv: (-kv[1], kv[0])):
        kind = "dir" if "/" in next(p for p in files if p.split("/", 1)[0] == name) else "file"
        w(f"- `{name}`{'/' if kind == 'dir' else ''}: {n}")

    # --- recent activity ----------------------------------------------------
    recent = git("log", "--oneline", "-10")

@@ -2,7 +2,7 @@

A navigation playbook (a Module 21 skill) for orienting in a codebase you didn't write.
Point your agent (Claude Code or any other) at this file as a skill, or paste it in as instructions.
The goal is a **read-only** mental model; no edits happen here.

## When to use
At the start of any session on an unfamiliar repo, before any change is discussed.
@@ -19,7 +19,7 @@ At the start of any session on an unfamiliar repo, before any change is discusse
   `ARCHITECTURE`, or committed AI-instructions file. Treat these as claims to verify, not truth.
2. Identify the **entry points**: how does this thing start? (CLI `main`, web server, library
   exports.) Name the exact file(s).
3. Trace **one representative request/command end to end**, from entry point to where it does its
   real work and back. List the files it passes through, in order.
4. Produce an **architecture summary** (max ~1 page):
   - One paragraph: what this project does and how it's structured.

@@ -2,7 +2,7 @@

A safe-change playbook (a Module 21 skill) for modifying a codebase you don't fully understand.
Use it only **after** `map-this-repo` has produced an architecture summary. The whole bet of this
skill is: small, scoped, tested, and reviewable; never a sweeping rewrite.

## When to use
When making a concrete change to an unfamiliar repo.
@@ -10,10 +10,10 @@ When making a concrete change to an unfamiliar repo.
## Rules
- **One change, one branch.** Create a branch first (Module 6). Never work on the default branch.
- **Smallest diff that solves it.** Touch the fewest files possible. If the change wants to sprawl,
  stop and re-scope; sprawl in code you don't understand is how you break things invisibly.
- **No drive-by edits.** Do not reformat, rename, or "clean up" unrelated code. Those bury the real
  change and make the diff unreviewable (Module 10).
- **Match local conventions.** Mirror the surrounding code's style, naming, and patterns, not your
  own defaults.
- **Tests are the contract.** A change isn't done until it's covered (Module 13) and the existing
  suite still passes.
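The "smallest diff" and "no drive-by edits" rules can be partly checked by a script. This minimal sketch assumes you kept the list of files you planned to touch, meaning the blast radius you enumerate in the workflow's step 2. It parses `git diff --numstat`, which prints added count, deleted count, and path separated by tabs, with `-` for both counts on binary files. Any file outside the plan gets flagged. The function names and the 50-line budget are hypothetical. The `HEAD` default compares your uncommitted edits and is itself an assumption to adjust.

```python
import subprocess


def parse_numstat(text: str) -> dict[str, int]:
    """Map each changed path to lines touched (added + deleted) from `git diff --numstat`.

    Binary files report '-' for both counts and are recorded as 0 lines.
    Renames are not specially handled in this sketch.
    """
    touched = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        touched[path] = (0 if added == "-" else int(added)) + (0 if deleted == "-" else int(deleted))
    return touched


def scope_report(touched: dict[str, int], planned: set[str], max_lines: int = 50) -> list[str]:
    """List scope problems: unplanned files (likely drive-by edits) and an oversized diff."""
    problems = [f"unplanned file: {p}" for p in sorted(touched) if p not in planned]
    total = sum(touched.values())
    if total > max_lines:  # hypothetical budget; pick one that fits the change
        problems.append(f"diff touches {total} lines (budget {max_lines}); re-scope?")
    return problems


def current_scope(planned: set[str], base: str = "HEAD") -> list[str]:
    """Check the current repo's working tree against `base` (requires git on PATH)."""
    out = subprocess.run(["git", "diff", "--numstat", base], capture_output=True,
                         text=True, check=True).stdout
    return scope_report(parse_numstat(out), planned)
```

An empty report means the diff stayed inside the files you planned. It does not mean the change is correct; the tests and your own review of the diff decide that.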
@@ -22,12 +22,12 @@ When making a concrete change to an unfamiliar repo.
1. **State the change in one sentence** and the acceptance criterion ("done when X").
2. **Find the blast radius first:** search for every caller/usage of what you're about to touch.
   List them. If you can't enumerate them, you're not ready to change it.
3. **Install the project's dependencies, then run the existing tests before touching anything**;
   establish a green baseline. Tell two kinds of failure apart. If the suite errors with missing imports,
   "no module named …", or "no tests ran," that's an **unconfigured environment**, not a baseline.
   Finish the documented install (and pick a different repo if it still won't go green on a clean
   clone). A genuine **pre-existing failure** (install succeeded, but a real test fails) is the other
   case: note it so it doesn't get blamed on you, and don't build on top of it.
4. **Make the minimal edit.** Keep it to the files identified in step 2.
5. **Add or extend a test** that fails without your change and passes with it.
6. **Run the full suite.** All green, including the baseline tests.
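Step 3's distinction between an unconfigured environment and a genuine pre-existing failure can be roughed out as a triage function. This sketch assumes the repo uses pytest. It relies on pytest's documented exit codes: 0 means all passed, 1 means some tests failed, and 5 means no tests were collected. It adds a text check for import errors, because pytest reports those as collection errors. The function names and category strings are hypothetical. Other test runners would need their own signals.

```python
import subprocess
import sys

# Output fragments that usually mean "the environment isn't set up", not "the code is broken".
ENV_SIGNALS = ("ModuleNotFoundError", "No module named", "ImportError")


def classify_baseline(exit_code: int, output: str) -> str:
    """Triage a baseline pytest run into one of four rough categories."""
    if exit_code == 0:
        return "green baseline"
    if exit_code == 5 or any(s in output for s in ENV_SIGNALS):
        # pytest exits 5 when it collected no tests at all.
        return "unconfigured environment: finish the documented install"
    if exit_code == 1:
        # Install and collection worked, but real tests fail.
        return "pre-existing failure: record it before you change anything"
    return "unclear: read the output yourself"


def run_baseline() -> str:
    """Run pytest (assumed installed) in the current directory and classify the result."""
    proc = subprocess.run([sys.executable, "-m", "pytest", "-q"],
                          capture_output=True, text=True)
    return classify_baseline(proc.returncode, proc.stdout + proc.stderr)
```

This is a heuristic. A real test whose failure message happens to mention `ImportError` would be misfiled, so treat the label as a starting point and read the output before you decide which case you are in.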