feat(course): build out all 27 modules, capstone, scaffold, and conventions

Scaffold the course repo and author the full curriculum in dependency-chain order,
following the settled build decisions in handoff.md.

- Scaffold: course README, vendor-neutral AGENTS.md (dogfoods Module 5), _TEMPLATE.md
  (the fixed 9-section module shape), root .gitignore, ship config.
- Modules 1-2: reference exemplars (locked for tone/depth/lab style).
- Modules 3-27: full lessons + runnable labs, each following the template, respecting the
  chain, vendor/model-agnostic, with "feel the pain" labs.
- Module 8 hosting comparison web-researched and date-stamped (as of 2026-06-22), not
  written from memory; expansion-zone modules carry Verify-before-publish.
- Capstone: the full loop end to end on the running tasks-app example.

Lab code syntax-checked (Python/shell/YAML); every module has the 7 core template sections.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
@@ -0,0 +1,299 @@
# Module 23 — Working with Existing Codebases

> **Every module so far quietly assumed you started the project. Most of your real work won't be
> like that.** This module is about pointing AI at a large codebase you *didn't* write — and making
> changes that don't break a system nobody fully understands.

---

## Prerequisites

This module needs only the **Module 4** tooling to *attempt* — an agentic, editor-integrated AI that
can read and edit your files. But it's placed at the back on purpose, because the basics are exactly
what make changing unfamiliar code survivable. Lean on:

- **Module 2 — Version control as a safety net.** You're about to let an AI touch code you don't
  understand. The commit you can return to is the only reason that's not reckless.
- **Module 6 — Branches.** Every change here happens on a branch, isolated from working code.
- **Module 10 — Reviewing code you didn't write.** The core skill of this whole course, now aimed at
  a diff in a codebase you *also* didn't write. Double the unfamiliarity, double the discipline.
- **Module 12 — Revert, reset, and recovery.** When a change in a system you don't understand goes
  wrong, recovery is how you get out clean.
- **Module 13 — Testing.** The existing test suite is your contract for "did I break anything I
  can't see?"
- **Module 20 — MCP servers.** Real, structured access to the code and the tools around it, instead
  of pasting fragments.
- **Module 21 — Skills.** Where you codify the navigation and safe-change playbooks this module
  teaches, so you don't re-explain them every session.

---

## Learning objectives

By the end of this module you can:

1. Give an AI enough **factual, verifiable context** about a large repo to be useful in it, instead
   of letting it work from a few pasted fragments.
2. Have the AI **map and explain** an unfamiliar area — architecture, entry points, where things
   live — and verify that map against the actual files *before* anything is touched.
3. Scope a change down to the **smallest reviewable diff** that solves the problem, and refuse the
   sweeping rewrite the AI will happily offer.
4. Use **MCP (Module 20)** to give the AI real access to the code and surrounding tools, and
   **skills (Module 21)** to make your navigation and safe-change process repeatable.
5. Make one **small, scoped, tested, reviewable** change to a codebase you didn't write — and know
   why it's safe.

---

## Key concepts

### The greenfield assumption, and why it was a lie

Everything up to now used `tasks-app`: a tiny project you stood up, understood completely, and grew.
That made the lessons clean. It also made them unrepresentative. The dominant reality for an IT pro
is the opposite: a codebase that's **large, old, written by people who've left, and load-bearing for
something that matters.** You're not asked to build it. You're asked to change one thing in it
without breaking the other thousand things you've never read.

This is where AI is simultaneously most tempting and most dangerous. Tempting, because "just ask the
AI to figure it out" feels like exactly the leverage you need against 200,000 lines you don't know.
Dangerous, because the AI's two default failure modes get *worse* the bigger and less familiar the
codebase is:

- **It maps from vibes.** A file named `auth.py` becomes "the authentication module" in its mental
  model whether or not the real auth lives there. It confidently describes structure it inferred
  from names, not from reading. In a small repo you'd catch it. In a huge one you won't.
- **It rewrites instead of edits.** Ask for a small change and it hands you a "cleaned-up" version of
  the whole file — reformatted, renamed, restructured — burying your one-line fix in a 300-line diff
  nobody can review. In code you wrote, that's annoying. In code you didn't, it's how an invisible
  regression ships.

The entire job of this module is to deny the AI both of those defaults: **force it to map from the
real files, and force every change to stay small and reviewable.**

### The motion: orient, map, then change

Three phases, strictly in order. Skipping ahead is the mistake.

**1. Orient — establish ground truth before any opinion.** Before the AI gets to reason about the
codebase, give it facts it can't hallucinate: the actual file list, the real entry points, the
languages by volume, the build and test commands, the biggest files (often the spine of the system),
the recent commit history. This is mechanical and cheap — a script produces most of it (the lab's
`orient.py` reports file types by count, project signals, likely test commands, the largest files,
the top-level layout, and the last ten commits; entry points you confirm while mapping). It anchors
everything that follows in reality. You're not asking the AI "what is this project?" cold; you're
handing it the facts and asking it to *interpret* them.

**2. Map — explain the area before touching it.** Now the AI builds a mental model, and the only
acceptable model is one **traced through real files with citations.** Don't accept "the request
flows through the controller layer." Demand: "trace one request from entry point to response, naming
each file it passes through." The deliverable is an architecture summary plus a "where things live"
table — and crucially, a list of **open questions the code didn't answer.** A map with honest gaps is
trustworthy. A map with no gaps is fiction. This phase is **read-only**; nothing changes on disk.

**3. Change — the smallest scoped, tested, reviewable diff.** Only now do you edit. One change, one
branch (Module 6). Find the blast radius first — every caller of what you're touching — and if you
can't enumerate them, you're not ready. Make the minimal edit, add a test that fails without it,
run the *full* existing suite, and self-review the diff like it's someone else's PR (Module 10). No
drive-by reformatting. No "while I was in here." The diff a reviewer sees should be exactly the
change and nothing else.
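
To make "a test that fails without it" concrete, here is a minimal sketch of a scoped edit and its test. The function `parse_port` and the bug it fixes are hypothetical, invented only for this illustration; in a real repo the function, the test framework, and the test layout come from the codebase you are changing.

```python
def parse_port(value: str) -> int:
    """Hypothetical function under change: parse a TCP port from text."""
    port = int(value)
    # The minimal edit: reject out-of-range ports instead of returning them.
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def test_rejects_out_of_range_port() -> None:
    # Without the range check this call returns 70000 and the test fails;
    # with the check it raises ValueError and the test passes.
    try:
        parse_port("70000")
    except ValueError:
        return
    raise AssertionError("expected ValueError for port 70000")


def test_still_accepts_valid_port() -> None:
    # Guards the existing behavior the edit must not disturb.
    assert parse_port("8080") == 8080


if __name__ == "__main__":
    test_rejects_out_of_range_port()
    test_still_accepts_valid_port()
    print("ok")
```

The diff for a change like this is two lines of logic plus the test, which is the shape the Change phase asks for.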

### Context is the bottleneck, not intelligence

A frontier model is plenty smart enough to understand any one file in your repo. What it *can't* do
is hold all 200,000 lines in its head at once — the context window is finite, and stuffing it full of
irrelevant code makes the model worse, not better. So the skill here isn't "give the AI more." It's
**give the AI the right slice, and a way to fetch more on demand.**

That reframes the orientation pack: its job is to be a small, high-signal index that lets the AI
decide what to read next, not a dump of the whole tree. And it's exactly why the next two tools
matter so much in this module.
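
As a rough illustration of "the right slice", the sketch below ranks files by how often they mention a search term and keeps adding them until a line budget is spent. The name `pick_slice`, the scoring rule, and the budget are assumptions made for this example; they are not part of `orient.py` or of any tool used in the lab, and real code search is far less crude.

```python
from __future__ import annotations


def pick_slice(files: dict[str, str], term: str, max_lines: int) -> list[str]:
    """Return the paths most relevant to `term` that fit within `max_lines`.

    `files` maps path -> file text. Relevance is a case-insensitive count of
    `term`, a deliberately simple stand-in for real code search.
    """
    needle = term.lower()
    scored = [(text.lower().count(needle), path) for path, text in files.items()]
    # Highest hit count first; ties broken by path so the result is stable.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))

    chosen: list[str] = []
    used = 0
    for _hits, path in ranked:
        n_lines = len(files[path].splitlines())
        if used + n_lines > max_lines:
            continue  # too big for what's left; a smaller file may still fit
        chosen.append(path)
        used += n_lines
    return chosen


if __name__ == "__main__":
    repo = {
        "billing/invoice.py": "def total(invoice):\n    return sum(invoice.lines)\n",
        "billing/tax.py": "# invoice tax rules\ndef tax(invoice):\n    return invoice.total * 0.2\n",
        "README.md": "Project overview\n",
    }
    print(pick_slice(repo, "invoice", max_lines=4))
```

The point of the budget is the same as the point of the orientation pack: the model gets a small, relevant set of files first and asks for more when it needs them.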

### Where MCP earns its place (Module 20)

Pasting files into a chat doesn't scale past a handful of them, and it makes the AI work blind
between pastes. **MCP (Module 20) gives the AI real, structured access to the codebase and the tools
around it** so it can navigate on its own instead of waiting for you to feed it fragments. The kinds
of access that turn a guessing model into a grounded one:

- **The filesystem and code search** — so it can grep for every caller of a function instead of
  assuming it found them all.
- **Language-server intelligence** — go-to-definition, find-references, type info — so "where is this
  used?" is answered by the toolchain, not by the model's guess.
- **The surrounding systems** — the issue tracker (Module 9), CI results (Module 14), the running
  app's logs — so the AI maps the code *and* the context it lives in.

The orientation pack is the cold-start. MCP is how the AI keeps the map accurate as it digs, by
pulling real answers from real tools instead of inferring them.

### Where skills earn their place (Module 21)

The orient/map/change motion is the same on every repo. That makes it a perfect candidate for a
**skill (Module 21)** — a committed, reusable playbook so you don't re-explain "map before you touch,
cite real files, keep the diff small" every single session. This module ships two starter skills in
`lab/skills/`:

- **`map-this-repo`** — the read-only navigation playbook: orient, find entry points, trace one path
  end to end, produce a cited architecture summary with honest open questions.
- **`safe-change`** — the safe-change playbook: branch first, find the blast radius, baseline the
  tests, make the minimal edit, cover it, self-review, and a set of **stop conditions** that tell the
  AI to escalate to a human instead of pushing on.

These are the structured big siblings of the committed config from Module 5: instead of "be careful
in unfamiliar code," they encode *exactly* what careful means, as steps the AI follows every time.
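
To show what "encode exactly what careful means" can look like, here is a minimal sketch of three of the `safe-change` stop conditions written as a plain check. The function `stop_reasons`, its parameters, and the default limit of three files are illustrative assumptions; the skill itself is a Markdown playbook, and the remaining stop conditions (a failing test you don't understand, an unsettled design decision) need human judgment rather than code.

```python
from __future__ import annotations


def stop_reasons(
    changed_files: list[str],
    core_files: set[str],
    callers_enumerated: bool,
    max_files: int = 3,
) -> list[str]:
    """Return the reasons to escalate to a human; an empty list means proceed."""
    reasons: list[str] = []
    if len(changed_files) > max_files:
        reasons.append(f"touches {len(changed_files)} files (limit {max_files})")
    # "Core" files are whatever the architecture summary marked as central.
    touched_core = sorted(set(changed_files) & core_files)
    if touched_core:
        reasons.append("touches core file(s): " + ", ".join(touched_core))
    if not callers_enumerated:
        reasons.append("callers of the changed code were not enumerated")
    return reasons


if __name__ == "__main__":
    print(stop_reasons(["util/dates.py"], {"core/engine.py"}, callers_enumerated=True))
    print(stop_reasons(["core/engine.py"], {"core/engine.py"}, callers_enumerated=False))
```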

---

## The AI angle

A generic "onboarding to a legacy codebase" guide would tell a human to read the README and ask a
senior dev. What's specific here is that **the AI is both the thing reading the codebase and the
thing most likely to confidently misread it** — and the bigger the repo, the wider that gap between
"sounds authoritative" and "is correct."

So the AI-specific discipline is verification, not exploration. The model is genuinely excellent at
the grunt work of orientation — reading a hundred files, summarizing structure, tracing a call path —
which is exactly the work that's tedious and slow for a human. But it will narrate a wrong map with
the same fluent confidence as a right one. Your job shifts from "explore the code" (let the AI do
that) to "make the AI prove its map against real files, and keep its changes small enough that a
wrong map can't do much damage." The whole earlier toolchain — version control, branches, review,
tests, recovery — is what turns "the AI might be wrong about this huge system" from a catastrophe
into a revertable diff.

---

## Hands-on lab

**Lab language:** shell + the provided Python script (`orient.py`); you run it, you don't write it.
This lab does **not** use `tasks-app` — the entire point is a codebase you *didn't* write.

**You'll need:**

- Git, Python 3.10+, and your agentic AI tool from Module 4.
- A real, small-to-medium open-source repo to clone. Pick something with **tests** and a clear
  build/test command, in a language you can at least read. Good traits: a few thousand lines, an
  obvious entry point, a green test suite. (Avoid giant frameworks for a first run — you want a
  system you can't fully hold in your head, but whose test suite finishes in under a minute.)
- The starter files from this module's `lab/` folder: `orient.py` and `skills/`.

### Part A — Clone and orient

1. Clone your chosen repo and copy `orient.py` into its root:

   ```bash
   git clone <repo-url> unfamiliar-repo
   cd unfamiliar-repo
   # copy modules/23-working-with-existing-codebases/lab/orient.py into this folder
   python orient.py > ORIENT.md
   ```

2. Read `ORIENT.md` yourself first. In 30 seconds you should know the language, the likely entry
   point, the probable test command, and which files are biggest. These are **facts** — the AI can't
   argue with them. (Don't commit `ORIENT.md`; it's scratch context.)

### Part B — Map before you touch (read-only)

3. Start a fresh AI session, load the `map-this-repo` skill (`lab/skills/map-this-repo.md`) or paste
   it as instructions, and give it `ORIENT.md` as the opening context.

4. Ask it to produce the architecture summary: what the project does, a "where things live" table,
   the confirmed build/test command, and a traced path for one real operation end to end —
   **with every claim citing a real file.** Demand the list of open questions it couldn't resolve.

5. **Verify the map.** Open two or three files it cited and confirm they say what it claimed. This is
   the step everyone wants to skip and the one that catches the confident-but-wrong map. If a
   citation doesn't hold up, the map is suspect — push back and make it re-trace.
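
Part of that verification can be mechanical. The sketch below pulls every backticked, path-like token out of the AI's summary and reports the ones that are not tracked files. The regular expression and the name `missing_citations` are assumptions for illustration, not part of the lab files. A path that exists still has to be opened and read, since existence does not prove the claim made about it.

```python
from __future__ import annotations

import re

# A backticked token ending in a dot-extension, optionally followed by
# ":line" or ":start-end" (for example `app/main.py:10-42`).
CITATION = re.compile(r"`([\w./-]+\.\w+)(?::\d+(?:-\d+)?)?`")


def missing_citations(summary: str, tracked: set[str]) -> list[str]:
    """Return cited paths, in first-seen order, that are not in `tracked`."""
    missing: list[str] = []
    for match in CITATION.finditer(summary):
        path = match.group(1)
        if path not in tracked and path not in missing:
            missing.append(path)
    return missing


if __name__ == "__main__":
    summary = (
        "Entry point is `app/main.py:1-20`; auth lives in `auth.py`. "
        "Run `pytest` to test."
    )
    # In the lab, `tracked` would be the lines printed by `git ls-files`.
    print(missing_citations(summary, tracked={"app/main.py"}))
```

Any path this reports is a citation to a file that is not in the repo, which is a strong hint that part of the map was inferred rather than read.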

### Part C — One small, scoped, tested change

6. Pick a genuinely small change — a clearer error message, a fixed edge case, a tiny missing
   validation, a documented-but-unhandled input. Something a single function owns. Run the existing
   tests first to establish a green baseline (`pytest`, `npm test`, `go test ./...` — whatever
   `ORIENT.md` and the README confirmed).

7. Branch, then load the `safe-change` skill (`lab/skills/safe-change.md`) and work the change with
   the AI:

   ```bash
   git switch -c scoped-change
   ```

   Make it find the blast radius (every caller) before editing. Keep the edit minimal. Add a test
   that fails without the change and passes with it. Run the **full** suite.

8. **Review the diff like it's a stranger's PR (Module 10):**

   ```bash
   git diff
   ```

   Every changed line should be necessary and explainable. If the AI snuck in a reformat or a
   rename, revert it — that's the sprawl this whole module exists to prevent. Commit only when the
   diff is exactly the change and nothing more.

9. Write the PR description the `safe-change` skill asks for: what changed, why, the blast radius,
   how you tested it, and what you deliberately did *not* touch.
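
Before reading the diff line by line in step 8, a quick size check can flag sprawl. The sketch below parses the tab-separated output of `git diff --numstat` (added, deleted, path per line, with `-` for binary files). The name `summarize_numstat` is an assumption for this example and is not part of the lab files; what counts as "too big" is your call, guided by the `safe-change` stop conditions.

```python
from __future__ import annotations


def summarize_numstat(numstat: str) -> tuple[int, int, list[str]]:
    """Parse `git diff --numstat` text into (lines added, lines deleted, paths)."""
    added = deleted = 0
    paths: list[str] = []
    for line in numstat.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        a, d, path = parts
        paths.append(path)
        # Binary files are reported as "-" for both counts.
        added += int(a) if a.isdigit() else 0
        deleted += int(d) if d.isdigit() else 0
    return added, deleted, paths


if __name__ == "__main__":
    sample = "3\t1\tsrc/app.py\n12\t0\ttests/test_app.py\n"
    plus, minus, files = summarize_numstat(sample)
    print(f"{len(files)} files, +{plus} -{minus}")
```

A one-function fix that reports many files or hundreds of changed lines is worth re-scoping before anyone is asked to review it.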

---

## Where it breaks

- **A confident map is still just a hypothesis.** The AI will produce a fluent, plausible
  architecture summary for a repo it half-read. Fluency is not correctness. The citation-checking in
  Part B isn't optional ceremony — it's the only thing standing between you and changing code based on
  a fiction. Verify at least a few claims by hand, every time.
- **The context window is a hard ceiling.** On a truly large monorepo, the AI cannot see everything,
  and it usually won't *tell* you what it didn't read. Its map is only as good as the slice it
  actually loaded. MCP-backed search and language-server tools (Module 20) shrink this problem by
  letting it fetch on demand, but they don't erase it — treat "I've reviewed the whole codebase" as
  a claim to distrust.
- **"Small change" can hide a big blast radius.** A one-line edit to a heavily-called function can
  ripple through code you never opened. The blast-radius search in the `safe-change` skill is the
  defense, but it's only as good as the AI's ability to find *every* caller — dynamic dispatch,
  reflection, config-driven wiring, and string-based lookups all defeat naive search. When in doubt,
  the tests are your backstop, which is why a repo *without* tests is genuinely dangerous to change
  this way.
- **The AI doesn't respect house style by default.** It writes in *its* idiom, not the repo's. In an
  existing codebase that's a tell that screams "an outsider touched this" and quietly degrades
  consistency. The committed instructions file (Module 5) and the `safe-change` skill's
  "match local conventions" rule help, but you'll still catch drift in review.
- **Some changes shouldn't be a small diff.** A genuine architectural problem won't be fixed by the
  smallest-possible edit, and forcing it to be makes things worse. This module's discipline is for
  the common case — a scoped change in a system you don't own. Recognizing when a change is actually
  a *project* (and escalating it as one) is its own judgment call the tooling won't make for you.
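
The blast-radius caveat above is easy to reproduce. In the sketch below, a text search for calls to `export_csv` finds nothing, yet the function is called at runtime through a name built from a string. Everything here (`SOURCE`, `text_callers`, the `export_` naming scheme) is a made-up example, not code from the lab.

```python
import re

SOURCE = '''
def export_csv(rows):
    return ",".join(rows)

def run(kind, rows):
    # The function name is built at runtime, so no call site spells it out.
    handler = globals()["export_" + kind]
    return handler(rows)
'''


def text_callers(source: str, name: str) -> list[int]:
    """Line numbers where `name(` appears, excluding `def` lines."""
    pattern = re.compile(rf"\b{re.escape(name)}\s*\(")
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line) and not line.lstrip().startswith("def ")
    ]


# Load the sample module into its own namespace so run() can look up export_csv.
namespace: dict = {}
exec(SOURCE, namespace)

if __name__ == "__main__":
    print("callers found by search:", text_callers(SOURCE, "export_csv"))
    print("runtime result:", namespace["run"]("csv", ["a", "b"]))
```

A search that reports no callers would suggest `export_csv` is safe to change or delete, and only a test that exercises `run` would show otherwise.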

---

## Check for understanding

**You're done when:**

- You can hand an AI a factual orientation pack and get back an architecture summary whose citations
  you've **personally verified** against the real files — including the open questions it couldn't
  resolve.
- You've made one change to a codebase you didn't write that is on its own branch, covered by a test
  that fails without it, passing the full existing suite, and whose `git diff` is *exactly* the
  change with no drive-by edits.
- You can explain why the orient -> map -> change order is non-negotiable, and name the two AI
  failure modes (mapping from vibes, rewriting instead of editing) this module is built to deny.
- You can point to where MCP (Module 20) and skills (Module 21) make this repeatable rather than a
  one-off heroics session.

If your change is a clean, tested, reviewable one-liner in a system you couldn't have described an
hour ago — and you trust it — you've got the motion.

---

## Verify-before-publish

This is an expansion-zone module; the durable motion is stable, but the tooling around it moves.

- [ ] Confirm `orient.py` runs unchanged on current Python (3.10+) and a freshly cloned repo on
      macOS, Linux, and Windows (git-bash / PowerShell).
- [ ] Re-check the MCP capabilities cited (filesystem, code search, language-server intelligence,
      issue/CI/log access) against what's actually common in the current MCP ecosystem — the menu of
      available servers changes fast. Keep it described as capabilities, not specific products.
- [ ] Verify the cross-references still point to the right modules if any renumbering happened
      (2, 4, 5, 6, 9, 10, 12, 13, 14, 20, 21 in this lesson; 5 and 16 in `orient.py`).
- [ ] Re-confirm the `SIGNALS`/`TEST_HINTS` tables in `orient.py` still reflect common manifests and
      test runners; add any that have become standard, but keep it language-agnostic.
- [ ] Sanity-check the suggested "small-to-medium repo with a fast test suite" lab guidance still
      lands — recommend nothing by name that could rot.
@@ -0,0 +1,191 @@
#!/usr/bin/env python3
"""orient.py — build a factual orientation pack for a repo you didn't write.

Run it from the root of a cloned repo. It prints a Markdown summary of *ground truth*
about the codebase — size, languages, project signals, the biggest (often most central)
files, the top-level layout, and likely build/test commands — that you can paste in as the
opening context for an AI session before asking it to map or change anything.

The point is NOT to replace the AI's own exploration. It's to anchor that exploration in
facts the model can't hallucinate: real file names, real counts, real entry points. The AI
then verifies and deepens this; you never let it map from vibes alone.

No dependencies. Standard library only. Works on any OS with Python 3.10+ and git.

    python orient.py              # print the pack
    python orient.py > ORIENT.md  # save it to hand to the AI (don't commit it)
"""

from __future__ import annotations

import subprocess
import sys
from collections import Counter
from pathlib import Path

# Files whose mere presence tells you how the project is built, tested, shipped, and configured.
# (key file/dir -> what its presence means). Kept tool- and language-agnostic on purpose.
SIGNALS: dict[str, str] = {
    "pyproject.toml": "Python project (PEP 621 / poetry / hatch)",
    "setup.py": "Python project (legacy setuptools)",
    "requirements.txt": "Python dependencies (pip)",
    "package.json": "Node/JS project",
    "pnpm-lock.yaml": "Node project (pnpm)",
    "yarn.lock": "Node project (yarn)",
    "go.mod": "Go module",
    "Cargo.toml": "Rust crate",
    "pom.xml": "Java/Maven project",
    "build.gradle": "Java/Kotlin/Gradle project",
    "Gemfile": "Ruby project",
    "composer.json": "PHP project",
    "Makefile": "Make targets (often the real entry point for build/test)",
    "Dockerfile": "Containerized (Module 16)",
    "docker-compose.yml": "Multi-service local stack (Module 16)",
    "compose.yaml": "Multi-service local stack (Module 16)",
    ".github": "GitHub Actions / project meta",
    ".gitea": "Gitea Actions",
    ".gitlab-ci.yml": "GitLab CI",
    "tox.ini": "Python test matrix",
    "README.md": "Has a README — read it first",
    "CONTRIBUTING.md": "Has contributor guidance — read before changing",
    "ARCHITECTURE.md": "Has an architecture doc — rare and valuable",
    "AGENTS.md": "Has a committed AI instructions file (Module 5)",
    "CLAUDE.md": "Has a committed AI instructions file (Module 5)",
}

# Common test-runner hints keyed off a present signal file.
TEST_HINTS: dict[str, str] = {
    "pyproject.toml": "pytest (or: python -m pytest)",
    "tox.ini": "tox",
    "package.json": "npm test (check the \"scripts\" block for the real command)",
    "go.mod": "go test ./...",
    "Cargo.toml": "cargo test",
    "Makefile": "make test (if a 'test' target exists)",
    "pom.xml": "mvn test",
    "Gemfile": "bundle exec rspec (or rake test)",
}

CODE_EXTS = {
    ".py", ".js", ".ts", ".jsx", ".tsx", ".go", ".rs", ".java", ".kt", ".rb",
    ".php", ".c", ".h", ".cc", ".cpp", ".hpp", ".cs", ".swift", ".scala", ".sh",
}


def git(*args: str) -> str:
    """Run a git command, return stdout (stripped), or "" on failure."""
    try:
        out = subprocess.run(
            ["git", *args],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return ""


def tracked_files() -> list[str]:
    listing = git("ls-files")
    return [line for line in listing.splitlines() if line]


def line_count(path: str) -> int:
    try:
        with open(path, "rb") as fh:
            return sum(1 for _ in fh)
    except OSError:
        return 0


def main() -> int:
    if not Path(".git").exists() and not git("rev-parse", "--is-inside-work-tree"):
        print("Not inside a git repository. cd into a cloned repo first.", file=sys.stderr)
        return 1

    files = tracked_files()
    if not files:
        print("No tracked files found (is this an empty or non-git repo?).", file=sys.stderr)
        return 1

    out: list[str] = []
    w = out.append

    # --- identity -----------------------------------------------------------
    remote = git("remote", "get-url", "origin") or "(no origin remote)"
    branch = git("rev-parse", "--abbrev-ref", "HEAD") or "(unknown)"
    total_commits = git("rev-list", "--count", "HEAD") or "?"

    w("# Repo orientation pack\n")
    w(f"- **Origin:** {remote}")
    w(f"- **Branch:** {branch}")
    w(f"- **Total commits:** {total_commits}")
    w(f"- **Tracked files:** {len(files)}")

    # --- languages ----------------------------------------------------------
    ext_counts: Counter[str] = Counter()
    for f in files:
        ext = Path(f).suffix.lower() or "(none)"
        ext_counts[ext] += 1
    w("\n## Languages / file types (top 15 by file count)\n")
    for ext, n in ext_counts.most_common(15):
        marker = " <- code" if ext in CODE_EXTS else ""
        w(f"- `{ext}`: {n}{marker}")

    # --- project signals ----------------------------------------------------
    present = {name for name in SIGNALS if Path(name).exists()}
    w("\n## Project signals (what's present at the root)\n")
    if present:
        for name in SIGNALS:
            if name in present:
                w(f"- `{name}` — {SIGNALS[name]}")
    else:
        w("- (none of the usual manifests/CI/docs at the root — look one level down)")

    # --- likely test command ------------------------------------------------
    hints = [TEST_HINTS[name] for name in TEST_HINTS if name in present]
    w("\n## Likely build/test command (verify before trusting)\n")
    if hints:
        for h in hints:
            w(f"- `{h}`")
    else:
        w("- No obvious runner detected. Search the README and CI config for the real command.")

    # --- biggest files (often the spine) ------------------------------------
    sized = sorted(
        ((line_count(f), f) for f in files if Path(f).suffix.lower() in CODE_EXTS),
        reverse=True,
    )[:15]
    w("\n## Largest code files (often where the core logic lives)\n")
    if sized:
        for n, f in sized:
            w(f"- {n:>6} lines `{f}`")
    else:
        w("- (no recognized source files)")

    # --- top-level layout ---------------------------------------------------
    top_dirs: Counter[str] = Counter()
    for f in files:
        head = f.split("/", 1)[0]
        top_dirs[head] += 1
    w("\n## Top-level layout (entries by tracked-file count)\n")
    for name, n in sorted(top_dirs.items(), key=lambda kv: (-kv[1], kv[0])):
        kind = "dir" if "/" in next(p for p in files if p.split("/", 1)[0] == name) else "file"
        w(f"- `{name}`{'/' if kind == 'dir' else ''} — {n}")

    # --- recent activity ----------------------------------------------------
    recent = git("log", "--oneline", "-10")
    w("\n## Last 10 commits (the project's recent direction)\n")
    w("```")
    w(recent or "(no history)")
    w("```")

    w("\n---")
    w("> Generated by orient.py. These are *facts*, not conclusions. Hand them to the AI as the")
    w("> opening context, then make it verify and map the areas you actually care about before")
    w("> it changes anything.")

    print("\n".join(out))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
@@ -0,0 +1,32 @@
# Skill: Map this repo

A navigation playbook (a Module 21 skill) for orienting in a codebase you didn't write.
Point your agentic tool at this file as a skill, or paste it in as instructions. The goal is a
**read-only** mental model — no edits happen here.

## When to use
At the start of any session on an unfamiliar repo, before any change is discussed.

## Rules
- **Read only.** Do not edit, create, or delete files while mapping. No exceptions.
- **Cite real paths.** Every claim about the code must point to a file and, ideally, a line range.
  If you can't cite it, say "unverified" instead of guessing.
- **Breadth before depth.** Establish the whole shape before diving into any one area.
- **No conclusions from file names alone.** A file called `auth.py` may not be where auth lives.

## Steps
1. Read the orientation pack (from `orient.py`), the README, and any `CONTRIBUTING`,
   `ARCHITECTURE`, or committed AI-instructions file. Treat these as claims to verify, not truth.
2. Identify the **entry points**: how does this thing start? (CLI `main`, web server, library
   exports.) Name the exact file(s).
3. Trace **one representative request/command end to end** — from entry point to where it does its
   real work and back. List the files it passes through, in order.
4. Produce an **architecture summary** (max ~1 page):
   - One paragraph: what this project does and how it's structured.
   - A "where things live" table: concern -> directory/file.
   - The build/test/run commands, confirmed against the README or CI config.
   - 3-5 things that surprised you or look risky to touch.
5. List **open questions** you could not resolve from the code. Do not paper over them.

## Output
A single Markdown summary. End with: "Verified against: <list of files actually read>."
@@ -0,0 +1,39 @@
# Skill: Safe scoped change

A safe-change playbook (a Module 21 skill) for modifying a codebase you don't fully understand.
Use it only **after** `map-this-repo` has produced an architecture summary. The whole bet of this
skill is: small, scoped, tested, reviewable — never a sweeping rewrite.

## When to use
When making a concrete change to an unfamiliar repo.

## Rules
- **One change, one branch.** Create a branch first (Module 6). Never work on the default branch.
- **Smallest diff that solves it.** Touch the fewest files possible. If the change wants to sprawl,
  stop and re-scope — sprawl in code you don't understand is how you break things invisibly.
- **No drive-by edits.** Do not reformat, rename, or "clean up" unrelated code. Those bury the real
  change and make the diff unreviewable (Module 10).
- **Match local conventions.** Mirror the surrounding code's style, naming, and patterns — not your
  own defaults.
- **Tests are the contract.** A change isn't done until it's covered (Module 13) and the existing
  suite still passes.

## Steps
1. **State the change in one sentence** and the acceptance criterion ("done when X").
2. **Find the blast radius first:** search for every caller/usage of what you're about to touch.
   List them. If you can't enumerate them, you're not ready to change it.
3. **Run the existing tests before touching anything** — establish a green baseline. If they were
   already red, note it; don't let a pre-existing failure get blamed on you.
4. **Make the minimal edit.** Keep it to the files identified in step 2.
5. **Add or extend a test** that fails without your change and passes with it.
6. **Run the full suite.** All green, including the baseline tests.
7. **Self-review the diff** as if reviewing someone else's PR (Module 10): is every changed line
   necessary and explained? Revert anything that isn't.
8. **Write the PR description:** what changed, why, blast radius, how it was tested, what you did
   NOT touch and why.

## Stop conditions (escalate to a human instead of pushing on)
- The change requires touching more than ~3 files or a "core" file from the architecture summary.
- You can't enumerate the callers of what you're changing.
- A test you don't understand starts failing.
- The fix needs a design decision the existing code doesn't settle.