ai-workflow-course/modules/03-version-control-for-words/README.md
claude c098933f25
De-slop: remove every em-dash + banned words across all modules + capstone (#94)
Co-authored-by: claude <claude@jpaul.io>
Co-committed-by: claude <claude@jpaul.io>
2026-06-22 23:21:22 -04:00

# Module 3: Version Control for Words, Not Just Code
> **The safest place to practice Git is on words, and it happens to be a genuinely useful skill on
> its own.** Branch an Architecture Decision Record (ADR), let the AI draft it, read the diff, merge
> it. Nothing breaks if it's wrong, so you build the muscle before the agent ever touches code.
---
## Prerequisites
- **Module 1:** you have the `tasks-app` project, an editor, and a terminal.
- **Module 2:** you can `init`, `commit`, read a `diff`, and `restore`. This module adds two new
verbs to that vocabulary: `branch` and `merge`. They're introduced here, in the lowest-stakes
setting possible (a markdown file), and picked up again for real code work in
**Module 6 (Branches: Sandboxes for Experiments)**.
You're still working the way you did in Modules 1 and 2: **AI in a browser tab, copy-paste into the
file.** Editor-integrated AI is Module 4. That's deliberate; practicing branch/merge on documents
is exactly the low-risk on-ramp that makes the copy-paste friction tolerable one more time.
---
## Learning objectives
By the end of this module you can:
1. Explain why plain-text formats (markdown, AsciiDoc) version cleanly while `.docx`/`.pptx` version
uselessly, and make the case to move a runbook or ADR out of Word.
2. Create a branch, do work on it, and merge it back. That's the full branch → diff → commit → merge
loop, run on a document where a mistake costs nothing.
3. Have an AI draft a real engineering document (an ADR or a runbook) and review its work as a diff
before accepting it.
4. Recognize that the wikis on most Git hosts are themselves Git repositories, so the docs you
thought lived "in a web UI" were version-controlled all along.
---
## Key concepts
### The three seams apply to documents too
Module 1 named the three places the copy-paste loop breaks: more than one file, more than one day,
no undo. Documents have every one of those problems, and most teams feel them *worse* than they feel
them in code:
- **More than one document.** A runbook references an ADR that references a spec. Change the decision
and three documents are now subtly out of sync, with no record of which changed when.
- **More than one day.** "Why did we decide to store state as JSON instead of SQLite?" The answer
lived in a meeting, or a Slack thread, or someone's head. Six months later it's gone.
- **No undo.** Someone edits the runbook during an incident, gets it wrong, and there's no clean way
back to the version that was correct an hour ago. `runbook-final-v2-ACTUAL-use-this.docx` is what
"no undo" looks like when it metastasizes.
Git fixes all three for documents the same way it fixes them for code, but only *if* the documents
are in a format Git can actually work with. That "if" is the whole argument.
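To make the "no undo" point concrete, here is a minimal sketch of getting a runbook back to its last good version. The throwaway repo, the file name `runbook.md`, and both versions of the text are made up for illustration, and `git init -b` assumes Git 2.28 or newer.

```bash
#!/usr/bin/env bash
# Sketch: recover the previous version of one document.
# Everything here lives in a temp directory; nothing touches tasks-app.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q -b main                       # -b needs Git 2.28+
git config user.name "Demo User"          # placeholder identity for the demo repo
git config user.email "demo@example.com"

# The version that was correct an hour ago.
echo 'Restart the worker with `systemctl restart tasks-worker`.' > runbook.md
git add runbook.md
git commit -q -m "Add worker restart runbook"

# The bad edit made under pressure, committed before anyone noticed.
echo 'Restart the worker by rebooting the host.' > runbook.md
git commit -q -am "Update restart step during incident"

# Bring back the previous commit's copy of just this file.
git restore --source=HEAD~1 -- runbook.md
restored=$(cat runbook.md)
echo "$restored"
```

The bad commit stays in the history, so the record of what happened during the incident survives too; only the working copy of the file goes back.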
### Why plain text wins: the diff is line-based
Git's core operation is the line-based diff. It compares two snapshots and reports which **lines**
changed. Everything good about Git (readable history, reviewable changes, automatic merges) is
built on that one capability. So a format versions well in exact proportion to how well it maps onto
*lines of text*.
Markdown and AsciiDoc are just text. Change one sentence in a markdown runbook and `git diff` shows
you exactly that:
```diff
-Restart the worker with `systemctl restart tasks-worker`.
+Restart the worker with `systemctl restart tasks-worker`, then tail the log for 30s to confirm.
```
That is a perfect change record. A reviewer reads it in two seconds. Two people can edit different
sections and Git merges them automatically, because the changes touch different lines.
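The automatic-merge claim can be tried in a scratch repo. This is a sketch under assumptions: the branch names `alice` and `bob`, the runbook text, and the `write_runbook` helper are all hypothetical, and `git init -b` assumes Git 2.28 or newer.

```bash
#!/usr/bin/env bash
# Sketch: two people edit different lines of one markdown file; Git merges both.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: rewrites the runbook. $1 = step 1 text, $2 = step 4 text.
write_runbook() {
  printf '%s\n' '# Restart runbook' '' \
    "1. $1" \
    '2. Check the queue is empty.' \
    '3. Start the worker.' \
    "4. $2" > runbook.md
}

write_runbook 'Stop the worker.' 'Tell the team.'
git add runbook.md
git commit -q -m "Add restart runbook"

# First author changes step 1 on their own branch.
git switch -q -c alice
write_runbook 'Stop the worker with `systemctl stop tasks-worker`.' 'Tell the team.'
git commit -q -am "Name the stop command"

# Second author starts from main and changes step 4.
git switch -q main
git switch -q -c bob
write_runbook 'Stop the worker.' 'Tell the team in the incident channel.'
git commit -q -am "Say where to announce"

# Fold both in. The edits touch different lines, so neither merge conflicts.
git switch -q main
git merge -q alice
git merge -q -m "Merge bob's edit" bob
cat runbook.md
```

The final `cat` should show both edits in one file. Nobody had to choose whose version wins.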
Now do the same edit in a `.docx`. A Word document isn't text; it's a zipped bundle of XML, styles,
and metadata. Git happily tracks it, but it can't diff it meaningfully. Ask for the diff and you get:
```
Binary files a/runbook.docx and b/runbook.docx differ
```
That's it. That's the entire change record: *something* changed. You can't see *what*, you can't
review it, and you can't merge two people's edits; Git will force you to pick one whole file and
throw the other away. The version history exists and is **completely useless**. `.pptx` is worse,
because slide decks are even more structure and even less text.
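You can reproduce that one-line "diff" without Word installed. The sketch below fakes a `.docx` with a few bytes that include a NUL character, which is enough for Git to treat the content as binary. It is a stand-in, not a real Word file, and `git init -b` assumes Git 2.28 or newer.

```bash
#!/usr/bin/env bash
# Sketch: what Git reports when a binary file changes.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Fake "Word document": the \000 (NUL) byte makes Git classify it as binary.
printf 'PK\000version one' > runbook.docx
git add runbook.docx
git commit -q -m "Add runbook"

# "Edit" the document.
printf 'PK\000version two' > runbook.docx

binary_diff=$(git diff -- runbook.docx)
echo "$binary_diff"
```

The last line of the output is the same `Binary files ... differ` message shown above, with no hint of which words changed.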
This is a real, defensible engineering argument, not a style preference:
> **Runbooks, ADRs, specs, and changelogs belong in markdown in the repo, not in Word on a shared
> drive.** The moment a document needs history, review, or more than one author, a binary format is
> actively costing you the thing version control exists to provide.
The honest counterpoint, where binary formats still earn their place, is in *Where it breaks*.
### The document types worth versioning
You don't need to convert everything. These are the high-value targets, all naturally plain text:
- **READMEs:** how to run the thing. Already markdown by convention; you saw `tasks-app/README.md`
in Module 1.
- **ADRs (Architecture Decision Records):** short documents that capture *one* decision: the
context, the choice, and the consequences. The point is to make the *reasoning* survive the
meeting. An ADR lives next to the code, gets versioned with it, and answers "why is it like this?"
long after everyone's forgotten.
- **Runbooks:** the step-by-step for an operational task (deploy, restore, rotate a key, respond to
an alert). These get edited under pressure, which is exactly when you want clean history and undo.
- **Changelogs:** what changed in each release. A markdown `CHANGELOG.md` is the standard.
- **Specs / PRDs:** what you're going to build and why, before you build it.
For this audience the ADR is the easiest win: small, structured, high-value, and the kind of thing
that *never* gets written because it feels like overhead, right up until the AI drafts it for you in
ten seconds.
### Branch → diff → commit → merge (the new verbs)
Module 2 worked on a straight line of commits. A **branch** is a second line you can work on without
disturbing the first. The mental model: `main` is the version everyone trusts; a branch is a private
copy where you draft something, and **merge** folds your finished work back into `main`.
Creating a branch is one command, and `git branch` shows you which line you're on:
```console
$ git switch -c docs/adr-storage
Switched to a new branch 'docs/adr-storage'
$ git branch
* docs/adr-storage
main
```
The `*` marks your current branch. From there, the loop for a document is the same handful of verbs
every time: **draft** the doc (with the AI's help), **stage** it, read the **diff**, **commit** it on
the branch, **switch** back to `main`, then **merge** to fold the finished work in and delete the
spent branch. You'll run that whole sequence by hand in the lab; here, just hold the shape.
Two new-command notes for this audience:
- **`git switch -c <name>`** creates and moves onto a branch. (Older docs and muscle memory use
`git checkout -b <name>`; `switch` is the newer, clearer verb for the same thing. Either works.)
- **`git diff` shows nothing for a brand-new file.** New files are "untracked," and plain `git diff`
only compares tracked files against what's staged. That's why the loop above does
`git add` *then* `git diff --staged` (also spelled `--cached`): staging tells Git "track this," and
`--staged` shows what's staged compared with the last commit. For a new file the diff is
all-additions, which is fine; you're still reading every line before it lands.
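The untracked-file behavior is easy to check in a scratch repo. In this sketch the README line and the two-line ADR are placeholders, and `git init -b` assumes Git 2.28 or newer.

```bash
#!/usr/bin/env bash
# Sketch: plain `git diff` vs `git diff --staged` for a brand-new file.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo '# Tasks app' > README.md
git add README.md
git commit -q -m "Add README"

# A new, untracked document.
mkdir -p docs/adr
printf '%s\n' '# ADR 0001' 'We store tasks as JSON.' > docs/adr/0001-task-storage-format.md

unstaged=$(git diff)          # empty: Git is not tracking the file yet
git add docs/adr/0001-task-storage-format.md
still_empty=$(git diff)       # still empty: the staged copy matches the file on disk
staged=$(git diff --staged)   # the whole file, shown as additions

echo "plain diff before add: ${#unstaged} characters"
echo "plain diff after add:  ${#still_empty} characters"
echo "$staged"
```

Plain `git diff` stays silent even after the `git add`, because it compares the working file with the staged copy and the two are identical. `--staged` is the view that shows the new document.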
Because this is one document on its own branch, the merge is trivial: nothing else touched `main`
while you worked, so Git **fast-forwards**; it just slides `main` up to your branch with no
conflict. That clean case is the whole reason we practice here first. What happens when two branches
edit the *same lines* (a merge conflict) is a real skill, and it gets its own treatment in
**Module 6**, on code, where the stakes make it worth the depth. Practice the happy path now; the
hard path is easier once the verbs are reflexes.
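A fast-forward can be confirmed by comparing commit IDs before and after. The sketch below runs the loop in a throwaway repo with placeholder file contents. It adds `--ff-only`, which the lab does not use, so that Git refuses to do anything other than a fast-forward; `git init -b` assumes Git 2.28 or newer.

```bash
#!/usr/bin/env bash
# Sketch: a fast-forward merge just moves main to the branch's latest commit.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo '# Tasks app' > README.md
git add README.md
git commit -q -m "Add README"

# Draft the document on its own branch.
git switch -q -c docs/adr-storage
mkdir -p docs/adr
echo '# ADR 0001: store tasks as JSON' > docs/adr/0001-task-storage-format.md
git add docs/adr
git commit -q -m "Add ADR 0001"

# Back on main, fold it in. --ff-only fails instead of creating a merge commit.
git switch -q main
git merge --ff-only docs/adr-storage

main_tip=$(git rev-parse main)
branch_tip=$(git rev-parse docs/adr-storage)
echo "main:   $main_tip"
echo "branch: $branch_tip"

git branch -d docs/adr-storage
```

The two IDs printed at the end are identical: no new commit was created, `main` simply moved.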
### The aha: your wiki was a Git repo all along
Most Git hosts (GitHub, GitLab, Gitea, and others) ship a **wiki** alongside each repository. It
looks like a web app: you click "New Page," type in a box, hit save. It feels like a different kind
of thing from your code.
It isn't. On essentially every one of these hosts, **the wiki is itself a Git repository**, a
separate repo, usually addressable as something like `your-project.wiki.git`, full of markdown files.
Every page is a `.md` file. Every "save" in the web UI is a commit. The web editor is just a
convenience layer over `git commit`.
The consequence: the documentation you've been editing in a browser textbox has had full version
history (diffs, blame, the works) the entire time. You can clone it, edit the markdown locally with
the same branch/diff/merge loop you're learning here, and push it back. (Cloning and pushing to a
remote repo is **Module 8** (remotes and hosting), so you can't do the clone in *this* lab yet. But
the realization changes how you see every wiki you'll ever touch: it's not a CMS, it's a repo
wearing a web UI.)
---
## The AI angle
Here's why this module is more than "learn Git on easy mode":
- **LLMs are native markdown writers.** Markdown is arguably the *most* fluent output format these
models have; they were trained on oceans of it, and they reach for it by default. Asking an AI to
"write an ADR for this decision" or "turn these rough notes into a runbook" plays directly to its
strengths. The output is genuinely good and genuinely in the right format, with zero conversion.
- **"Draft it, branch it, diff it, merge it" works today.** You don't need new tools, a new model, or
editor integration. The whole workflow (branch, paste the AI's draft into a `.md` file, read the
diff, merge) runs on the browser chat you already have open. Most of the rest of this course is
capability you have to build up to; this part you can put to work right now.
- **Reading the diff is how you review AI writing.** Same skill as reviewing AI code (Module 10), lower
stakes. The AI will write an ADR that *sounds* authoritative and confidently states a rationale it
invented. Reading the diff is how you catch "wait, that's not why we did this." The format makes the
review possible; your judgment makes it correct.
- **It seeds a habit the whole course depends on.** Once "the AI drafts, I review the diff, I decide"
is reflexive on documents, where a mistake costs nothing, you'll apply it without thinking when
the AI starts editing code, opening PRs, and running unattended later on.
---
## Hands-on lab
**Lab language:** shell (Git commands) plus a little markdown writing, on the `tasks-app` from
Modules 1 and 2. The AI stays in the **browser**; you copy its draft into the file yourself, exactly as
in Module 2.
In this lab you'll branch the repo, have the AI draft an **Architecture Decision Record**, review it
as a diff, and merge it into `main`. The document is real and the workflow is real; only the risk is
zero.
**You'll need:**
- Your `tasks-app` folder, already a Git repo with a clean working tree from Module 2
(`git status` should say "nothing to commit, working tree clean").
- Git installed and your AI assistant open in a browser tab.
- The ADR template from this module's `lab/adr-template.md` (and `lab/runbook-template.md` if you
want to do the variant at the end).
### Part A: Branch for the document
1. Confirm you're starting clean, then create a branch for the ADR:
```bash
cd ~/ai-workflow-course/tasks-app
git status # want: "working tree clean"
git switch -c docs/adr-storage # new branch, named for what it's for
git branch # the * shows you're on docs/adr-storage now
```
You're now working on a copy. Nothing you do here touches `main` until you merge.
### Part B: Let the AI draft the ADR
2. Make a home for decision records:
```bash
mkdir -p docs/adr
```
3. Open `adr-template.md` from this module's `lab/` folder in the course repo (wherever you downloaded
it; it lives in the course repo, *not* inside `tasks-app`). In your browser chat, give the AI that
template plus the context and ask for the draft:
> *"Here's an ADR template (paste the contents of `adr-template.md`). Fill it out for this decision:
> the `tasks-app` CLI stores its state in a plain `tasks.json` file next to the code. We chose JSON
> over SQLite or a hosted database because the app is a single-user local tool and zero-setup
> matters more than query power. Keep it concise. Output markdown."*
4. Now create the file and paste the draft in. In your editor, make a new file at this exact path
inside `tasks-app`:
```
docs/adr/0001-task-storage-format.md
```
Paste the AI's markdown into it and save. (This is the copy-paste loop from Module 1, the last
stretch before Module 4 removes it.) The file has to exist on disk before the next part can stage
it.
### Part C: Review the diff before you accept it
5. A brand-new file is untracked, so `git diff` shows nothing yet. Stage it, then review:
```bash
git status # the new file shows as "untracked"
git add docs/adr/0001-task-storage-format.md
git diff --staged # every line of the new doc, as additions
```
**Read it.** This is the point of the whole module: don't accept AI writing you haven't read. Check
the *substance*, not just that it's well-formatted. Did it state a rationale you actually agree
with, or did it invent a confident-sounding reason? If it's wrong, edit the file and `git add`
again.
6. When it's right, commit it on the branch:
```bash
git commit -m "Add ADR 0001: store tasks as JSON"
git log --oneline # your new checkpoint, on this branch
```
### Part D: Make a one-line edit and see the line-based diff
7. Edit one sentence in the ADR (tighten a line, fix a claim, whatever). Save, then:
```bash
git diff
```
Notice the diff shows **only the line you changed**, in context. That clean, surgical record is the
thing a `.docx` can never give you. Commit it:
```bash
git add docs/adr/0001-task-storage-format.md
git commit -m "Tighten ADR 0001 rationale"
```
### Part E: Merge it into main
8. First, switch back to `main` and prove the document isn't there yet. You created the whole
`docs/adr/` directory on the branch, so on `main` it doesn't exist:
```bash
git switch main
ls docs/adr/ # error: "No such file or directory"; the directory exists only on the branch
git log --oneline # and your ADR commits aren't here either
```
That's branch isolation: the work is real and committed, but completely invisible to `main` until
you merge. Now fold it in and watch the file appear:
```bash
git merge docs/adr-storage # fast-forward, no conflict
git log --oneline # the ADR commits are on main now
ls docs/adr/ # and the file is here too
```
9. Clean up the branch. Its work now lives in `main`:
```bash
git branch -d docs/adr-storage
```
You just ran the complete branch → draft → diff → commit → merge loop on a real document, with the AI
doing the writing and you doing the reviewing. That's the loop the rest of the course runs on.
### Optional: do it again as a runbook
Repeat the loop on a different branch (`git switch -c docs/runbook-restore`) using
`runbook-template.md` from this module's `lab/` folder: ask the AI to write a runbook for "restore the
tasks list after someone deletes `tasks.json` by accident," given that the app recreates an empty list
on next run. Same five parts. Doing it twice is what turns the commands into reflexes.
---
## Where it breaks
- **Line-based diffs punish reflowed paragraphs.** Git diffs *lines*. If you (or the AI) rewrap a
paragraph so every line shifts, the diff shows the whole paragraph as changed even if you altered
three words; the clean diff degrades toward `.docx`-style noise. The fix the technical-writing
world uses is **semantic line breaks**: write one sentence (or one clause) per line, so edits stay
local and diffs stay surgical. Worth knowing the AI will *not* do this by default; you can ask it
to.
- **Plain text isn't free of binaries.** A markdown doc with screenshots still carries `.png` files,
and Git diffs those as "binary files differ" just like a `.docx`. Git tracks and stores them fine;
it just can't show you what changed inside them. Diagrams-as-code (text formats that render to
pictures) sidestep this, but that's beyond this module.
- **Word and PowerPoint still exist for reasons.** A pixel-precise client deliverable, a slide deck
with heavy layout, a document a non-technical stakeholder must edit in a tool they already know.
These are real constraints. The argument isn't "markdown for everything." It's "anything that needs
history, review, or multiple authors is paying a steep tax in a binary format." Pick the targets
where that tax actually bites: runbooks, ADRs, specs, changelogs.
- **Merge conflicts are real; you just didn't hit one.** This lab fast-forwarded because nothing else
touched `main`. The moment two branches edit the same lines, Git stops and asks *you* to resolve it.
That's a genuine skill, deferred to **Module 6** on purpose so you learn it where the stakes make it
matter.
- **The wiki-clone aha needs a remote.** You can *see* that a host's wiki is a Git repo now, but
cloning it, editing locally, and pushing back requires remotes, which is **Module 8**. The realization is
yours today; the round trip waits a few modules.
- **The AI writes confident fiction.** It will produce a fluent ADR with a rationale that sounds
exactly like something a senior engineer wrote, and is sometimes simply made up. The format makes
the document reviewable; it does not make the document *true*. Reading the diff is necessary, not
sufficient. You still have to know whether the reasoning is right.
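The reflow problem from the first bullet can be measured. This sketch stores the same three sentences twice, once hard-wrapped and once with one sentence per line, then makes the same small wording change to both. The file names and sentences are invented for the demo, and `git init -b` assumes Git 2.28 or newer.

```bash
#!/usr/bin/env bash
# Sketch: the same edit, counted as changed lines, in two wrapping styles.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Version 1: hard-wrapped paragraph vs one sentence per line.
cat > wrapped.md <<'EOF'
Stop the worker before you touch the data file. Copy
tasks.json to a backup location. Start the worker again
and confirm the task list loads.
EOF
cat > semantic.md <<'EOF'
Stop the worker before you touch the data file.
Copy tasks.json to a backup location.
Start the worker again and confirm the task list loads.
EOF
git add wrapped.md semantic.md
git commit -q -m "Add two copies of the same paragraph"

# Version 2: add "first," to the opening sentence. The wrapped copy gets rewrapped.
cat > wrapped.md <<'EOF'
Stop the worker first, before you touch the data file.
Copy tasks.json to a backup location. Start the worker
again and confirm the task list loads.
EOF
cat > semantic.md <<'EOF'
Stop the worker first, before you touch the data file.
Copy tasks.json to a backup location.
Start the worker again and confirm the task list loads.
EOF

# --numstat prints "added<TAB>deleted<TAB>path"; keep the added-lines count.
wrapped_added=$(git diff --numstat -- wrapped.md | cut -f1)
semantic_added=$(git diff --numstat -- semantic.md | cut -f1)
echo "wrapped.md:  $wrapped_added lines changed"    # 3
echo "semantic.md: $semantic_added lines changed"   # 1
```

Same edit, three times the review surface in the wrapped copy. On a real runbook paragraph the gap is usually wider.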
---
## Check for understanding
**You're done when:**
- Your `tasks-app` repo has a `docs/adr/0001-*.md` on `main`, authored by the AI, reviewed by you,
and delivered there via a branch and a merge.
- You created a branch, committed to it, merged it back, and deleted it; `git log --oneline` on
`main` shows the ADR commits.
- You can explain, to a skeptical colleague, why the team's runbooks shouldn't be `.docx` files on a
shared drive, using the line-based-diff argument, not just "markdown is nicer."
- You know that your Git host's wiki is itself a Git repo, and what that implies.
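If you'd like a quick self-check for the first two items, the function below is one way to do it. It is a hypothetical helper, not part of the course materials. It assumes your branch was named `docs/adr-storage` and that at least one commit message on `main` mentions "ADR", as the lab's suggested messages do.

```bash
#!/usr/bin/env bash
# Hypothetical helper: checks the mechanical "done" criteria for this module.
# Usage: check_module3 <path-to-repo>
check_module3() {
  repo=$1

  # 1. A file matching docs/adr/0001-*.md is committed on main.
  if ! git -C "$repo" ls-tree -r --name-only main | grep -q '^docs/adr/0001-.*\.md$'; then
    echo "FAIL: no docs/adr/0001-*.md on main"
    return 1
  fi

  # 2. The spent branch has been deleted (assumes the lab's branch name).
  if git -C "$repo" show-ref --verify --quiet refs/heads/docs/adr-storage; then
    echo "FAIL: branch docs/adr-storage still exists"
    return 1
  fi

  # 3. main's history mentions the ADR (assumes "ADR" appears in a commit message).
  if ! git -C "$repo" log --oneline main | grep -qi 'adr'; then
    echo "FAIL: no ADR commit found on main"
    return 1
  fi

  echo "OK: ADR is on main and the branch is cleaned up"
}
```

After loading the function into your shell, run it as `check_module3 ~/ai-workflow-course/tasks-app`. It cannot check the last two items; those are about what you can explain, not what the repo contains.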
When branch/diff/commit/merge feels routine on a document, you're ready for **Module 4**, where the AI
finally comes out of the browser and starts editing your files directly, a step that's only safe
because you can now branch, diff, and revert exactly what it does.