claude 2684095e2f Build out all 27 modules + capstone (#1)
Co-authored-by: claude <claude@jpaul.io>
Co-committed-by: claude <claude@jpaul.io>
2026-06-22 12:19:01 -04:00


# Module 21 — Skills: Teaching the AI Your Playbook
> **Stop re-explaining your own procedures.** A skill is a repeatable workflow written down once,
> committed, and invoked on demand — so the AI does the thing *your* way, the same way, every time,
> without you narrating the steps again.
---
## Prerequisites
- **Module 2** — you commit, read diffs, and treat the repo as durable memory. Skills live in that
repo and are versioned exactly like code.
- **Module 3** — markdown-as-versioned-text, and the `CHANGELOG.md` convention this module's lab
writes to.
- **Module 4** — the AI lives in your editor/CLI and reads your files directly. A skill is a file it
loads; a browser chat can't pick one up automatically.
- **Module 5 — the one this builds on directly.** You committed an always-on instructions file that
tells the AI how the project works in general. This module is its **structured big sibling**: the
same write-it-down-and-commit instinct, but for *specific repeatable procedures* invoked on demand.
- **Module 13** — what a real test is (and why "it didn't crash" isn't one). The lab's procedure
includes writing one.
- *Helpful, not required:* **Module 20 (MCP)** — a skill's steps can call the real tools an MCP
server exposes, which is where playbooks get genuinely powerful.
---
## Learning objectives
By the end of this module you can:
1. Explain the difference between an **always-on instructions file (Module 5)** and a **skill** — and
say when each is the right tool.
2. Write a skill: a structured, named, invokable playbook for a recurring task, built from the
format-agnostic essentials every tool shares (when-to-use, inputs, ordered steps, done-criteria).
3. Have the AI **execute** a skill end to end and verify it followed every step.
4. Keep skills in version control so a procedure is shareable, reviewable, and recoverable like any
other artifact.
5. Recognize when a one-off prompt has earned promotion into a durable skill — and when it hasn't.
---
## Key concepts
### The pain: you keep narrating the same procedure
You've written the Module 5 instructions file, and it's working — the AI knows your layout, your test
command, your off-limits files. But there's a class of knowledge it doesn't cover: **multi-step
procedures you run again and again.**

"Add a new CLI command" is the canonical example. Done properly it's never one edit — it's: put the
logic in the right file, wire the CLI, write a test that actually checks the behavior, run the tests,
smoke-test the command, add a changelog line, commit it as one clean change. The AI can do every step.
But left to a bare prompt — *"add a `clear` command"* — it'll usually give you the code and forget the
test, or skip the changelog, or commit `tasks.json` along for the ride. So you spell out the seven
steps. It works. Next week you add another command and **you spell out the same seven steps again.**

That re-narration is the exact pain Module 1 named, one level up: not re-explaining the *project* each
session, but re-explaining the *procedure* each time you run it. A skill is where that procedure stops
being something you retype and becomes something the repo carries.
### What a skill is
A **skill** is a named, structured, invokable set of instructions for one repeatable procedure,
stored as a file in the repo and loaded **on demand** when that procedure is the task at hand.

Strip the vendor branding and every skill has the same four parts:
- **A name and a "when to use it."** So both you and the AI know which playbook applies — and, just as
importantly, when it *doesn't*.
- **Inputs.** The few things the procedure needs to be told (here: the command name and what it does).
- **Ordered steps.** The actual procedure — the commands, the files, the checks, in sequence, with the
non-negotiables marked ("run the tests before claiming success," "don't stage `tasks.json`").
- **Done-criteria.** How the AI (and you) know it's actually finished, not just "produced something."

That's it. A skill is a checklist precise enough that an agent can execute it and you can verify it
did.
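Because the four parts are fixed, you can check a skill file for them mechanically. The sketch below is a hypothetical helper, not part of any tool: it assumes your skill uses `##` headings literally named `When to use`, `Inputs`, `Steps`, and `Done criteria` (the starter skill's actual headings may differ, so adjust `REQUIRED_SECTIONS`) and reports which parts are missing.

```python
from __future__ import annotations

import re

# Hypothetical heading names; change them to match your own skill files.
REQUIRED_SECTIONS = ("When to use", "Inputs", "Steps", "Done criteria")

# A level-2 markdown heading: exactly "##", then spaces/tabs, then the title.
HEADING = re.compile(r"^##[ \t]+(.+)$", re.MULTILINE)


def section_titles(markdown_text: str) -> list[str]:
    """Return the title of every level-2 heading in the skill file."""
    return [match.group(1).strip() for match in HEADING.finditer(markdown_text)]


def missing_sections(markdown_text: str) -> list[str]:
    """List the required parts the skill lacks (headings compared case-insensitively)."""
    present = {title.lower() for title in section_titles(markdown_text)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in present]


skill = """# add-command
## When to use
Adding a new tasks-app command.
## Inputs
The command name and what it does.
## Steps
1. Put the logic in tasks.py.
"""
print(missing_sections(skill))  # ['Done criteria']
```

A skill that passes this check still has to be *correct*; the script only confirms the shape.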
### Skill vs. the Module 5 instructions file
This is the distinction to lock in, because the two are siblings and easy to conflate:

| | **Committed instructions file (Module 5)** | **Skill (this module)** |
|---|---|---|
| Scope | How the project works, *in general* | How to do *one specific procedure* |
| When it loads | **Always on** — read every session | **On demand** — invoked when relevant |
| Shape | Ambient briefing: conventions, commands, don't-touch list | A playbook: when-to-use, inputs, ordered steps, done-criteria |
| Analogy | The standing house rules posted on the wall | A labeled recipe card you pull out when you cook that dish |

They're complementary. The instructions file is the right home for facts true *all the time* ("tests
run with `python -m unittest`"). A skill is the right home for a procedure you run *sometimes* ("here
is exactly how we add a command"). Module 5 even told you this was coming: start with the always-on
file; graduate a procedure into a skill when it earns its own page.
### Why "on demand" is the whole point
Module 5 warned that **bloat kills an instructions file** — a 300-line always-on briefing gets read
the way you read a terms-of-service. So you *can't* solve the re-narration problem by stuffing every
procedure into the always-on file; you'd drown the signal that makes it work.

Skills are the escape hatch. Because a skill loads only when its procedure is the task, you can write
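it in full detail — every step, every guardrail — without taxing every unrelated session. Ten skills
cost the AI next to nothing on a session that invokes none of them (at most, a tool may keep each
skill's name and one-line description in view so it knows what's available). This is **progressive disclosure**: keep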
the always-on context lean, and pull in the deep procedure exactly when it's needed. It's the same
reason you don't tape every recipe you own to the kitchen wall.
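The arithmetic of progressive disclosure is easy to see in a toy model. This is not how any particular tool assembles its context; `build_context` and the strings below are hypothetical. The always-on text goes into every session, and a skill's full text is added only for the session that invokes it.

```python
from __future__ import annotations


def build_context(instructions: str, skills: dict[str, str], invoked: str | None = None) -> str:
    """Toy model: always-on instructions every time, plus one skill's text only when invoked."""
    parts = [instructions]
    if invoked is not None:
        if invoked not in skills:
            raise KeyError(f"no skill named {invoked!r}")
        parts.append(skills[invoked])
    return "\n\n".join(parts)


rules = "Tests run with `python -m unittest`. Never stage tasks.json."
# Ten long playbooks sitting in the repo (placeholder text, 1000 characters each).
skills = {f"skill-{n}": "step\n" * 200 for n in range(10)}

plain = build_context(rules, skills)                      # a session that needs no playbook
adding = build_context(rules, skills, invoked="skill-3")  # a session that invokes one
print(len(plain) == len(rules))  # True: the ten unused skills added nothing
print(len(adding) - len(plain))  # 1002: one skill's 1000 characters plus the "\n\n" separator
```

Adding an eleventh skill changes neither number; only invoking it does.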
### Skills live in version control
This is what makes a skill more than a snippet in a notes app, and it's why this module sits where it
does in the course. A skill is a file in the repo, so everything you already learned about versioned
text applies to it directly:
- **Recoverable and historied (Module 2).** A skill has a `git log`. You can see when a step was added
and why, and `git restore` a botched edit. The procedure is a checkpoint like any other.
- **Shareable (Modules 8 & 11).** Push the repo and the whole team — and every agent that later
operates on it — inherits the same playbook. Nobody runs their own private version of "how we add a
command." It's the Module 5 anti-drift argument, applied to procedures.
- **Reviewable (Module 10).** Changing how the AI performs a procedure arrives as a **diff in a PR**.
Tightening "add a test" into "add a test that asserts the end state, not just no-crash" is a
reviewable change to your team's workflow — not an invisible tweak in one person's setup.

A prompt you keep in your head dies with the session. A skill in the repo is durable, shared
capability. That's the upgrade: from one-off prompting to a versioned, reviewable asset.
### Naming the pattern, not the vendor
"Skills" is one name for this. Tools also call them custom commands, slash commands, recipes, prompts,
playbooks, or modes, and they load them differently — some auto-discover a dedicated folder, some need
you to point at a file, some let your always-on instructions file say *"when asked to add a command,
follow `add-command.md`."* **The durable pattern is the same in all of them: a named, invokable file
of structured steps for a repeatable procedure, kept in the repo.** Learn the pattern; map it onto
whatever your tool calls it. As with everything in this course, the model and the tool are swappable;
the playbook you wrote is the part that lasts.
### Skills compose with your tools
A skill's steps aren't limited to editing files. They can drive the test runner, the CLI, Git — and,
once you have **Module 20's MCP** servers wired up, the real systems behind them (open the issue, hit
the staging API, query the database). A skill is where you encode *"use these hands, in this order, to
get this outcome."* The deeper your toolchain, the more a written playbook is worth — because there
are more steps to get wrong, and more value in getting them right every time.
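Here is a toy model of "these hands, in this order" with a guardrail built in: run each step, check it, and stop at the first failure instead of reporting success. An agent following a skill works from prose rather than code, so treat this as an illustration of the discipline the steps encode; `Step` and `run_playbook` are hypothetical names, and the lambdas stand in for real actions like editing files or running the tests.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    run: Callable[[], bool]  # performs the step and returns True if its check passed


def run_playbook(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order and stop at the first failed check; never skip ahead."""
    log: list[str] = []
    for number, step in enumerate(steps, start=1):
        passed = step.run()
        log.append(f"{number}. {step.description}: {'ok' if passed else 'FAILED'}")
        if not passed:
            return False, log  # later steps (e.g. the commit) never run
    return True, log


# Placeholder steps; a real playbook would edit files, run the tests, and so on.
steps = [
    Step("add clear() to tasks.py", lambda: True),
    Step("run python -m unittest", lambda: False),  # pretend the tests are red
    Step("commit code + test + changelog", lambda: True),
]
done, log = run_playbook(steps)
print(done)            # False
print("\n".join(log))  # two lines: step 1 ok, step 2 FAILED; step 3 never ran
```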

---
## The AI angle
A generic automation course would call this "write a runbook." The AI-specific twist is what makes it
land:
- **The AI will execute the playbook, not just read it.** A runbook for a human is a reminder; a skill
for an agent is something it *performs*. The precision pays off immediately — vague step, vague
result; imperative step ("run `python -m unittest`; do not claim success until it's green"), reliable
result.
- **The AI is confidently incomplete without one.** Asked to "add a command," it'll happily stop at
the code and skip the test, the changelog, the clean commit — and sound finished doing it. The skill
is how you make *complete* the default instead of a thing you have to keep catching.
- **The skill outlives the model.** Swap models next quarter and the playbook carries over unchanged.
You encoded the *procedure*, not the prompt that happened to coax it out of this month's model. The
workflow is the durable skill; the model is the swappable part — here, literally.
---
## Hands-on lab
**Lab language:** markdown (the skill file) plus shell and Python (the `tasks-app`). You'll write a
skill, then have your editor-integrated AI (Module 4) execute it.

You'll write a skill for the procedure from *Key concepts* — **add a new `tasks-app` command, end to
end: code + test + changelog + clean commit** — and then watch the AI run it on a command it's never
seen, producing all four parts without you listing the steps.

**You'll need:**
- Your agentic coding tool from Module 4, and knowledge of how it loads a procedure (a skills/commands
folder it auto-discovers, or simply pointing it at a file by name — check its docs).
- A Python 3.10+ `tasks-app`. Use the snapshot in this module's `lab/tasks-app/` (it has `add`,
`list`, `done`, `count`, a `test_tasks.py`, and a `CHANGELOG.md`), or carry forward your own from
earlier modules. Make it a Git repo if it isn't: `git init && git add . && git commit -m "Start"`.
### Part A — Install the skill
1. Copy this module's starter skill, `lab/add-command-skill.md`, into your `tasks-app` repo wherever
   your tool expects procedures. If your tool auto-discovers a folder, put it there under a clear name
   (e.g. `add-command.md`). If it doesn't, just drop it at the repo root — you'll invoke it by name.
   ```bash
   cd ~/workflow-course/tasks-app
   cp /path/to/modules/21-skills-teaching-the-ai-your-playbook/lab/add-command-skill.md add-command.md
   ```
2. Read it. The whole file is short on purpose — when-to-use, inputs, seven ordered steps, and
   done-criteria. Confirm every project fact in it matches *your* app (test command, file names, the
   off-limits `tasks.json`). A skill with wrong facts misleads the AI more than having no skill at all.
3. **Commit it.** This is the point — the procedure now lives in version control:
   ```bash
   git add add-command.md
   git commit -m "Add skill: add a tasks-app command end to end"
   ```
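Step 2's fact check can be partly automated. This hypothetical helper (not a feature of any tool) pulls every backticked file name ending in `.py`, `.md`, or `.json` out of the skill and reports the ones that don't exist in your repo. It assumes the skill names files in backticks, as this module's prose does, and it can't verify facts like the test command.

```python
from __future__ import annotations

import re
from pathlib import Path

# Backticked names such as `tasks.py`, `CHANGELOG.md`, or `lab/tasks.json`.
FILE_REF = re.compile(r"`([\w./-]+\.(?:py|md|json))`")


def file_refs(skill_text: str) -> list[str]:
    """Every distinct file name the skill mentions in backticks, sorted."""
    return sorted(set(FILE_REF.findall(skill_text)))


def missing_file_refs(skill_text: str, repo_root: str = ".") -> list[str]:
    """File names the skill mentions that don't exist under repo_root."""
    root = Path(repo_root)
    return [ref for ref in file_refs(skill_text) if not (root / ref).exists()]
```

Run from the `tasks-app` root, `missing_file_refs(Path("add-command.md").read_text())` returns each backticked file the skill names that isn't there. A runtime file such as `tasks.json` may legitimately appear if it hasn't been created yet; anything else is a fact to fix before you commit the skill.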
### Part B — Invoke it
4. Start a **fresh** AI session in your editor and invoke the skill the way your tool does it — its
slash command / skill name, or plainly: *"Follow `add-command.md` to add a `clear` command that
removes all tasks."* Crucially, **don't list the steps yourself.** The skill is supposed to supply
them.
5. Watch it perform the procedure. A correctly followed skill will, without you saying any of it:
   - add `clear()` to `tasks.py` and wire a `clear` branch into `cli.py` (logic in the right file);
   - add a real test to `test_tasks.py` that asserts the list is empty afterward (not just "no crash");
   - run `python -m unittest` and show it green;
   - smoke-test `python cli.py clear` and show the output;
   - add a `CHANGELOG.md` line;
   - stage code + test + changelog into one commit, **without** `tasks.json`.
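What separates a real test from a no-crash test is the final assertion. The snapshot's `tasks.py` isn't reproduced in this module, so everything below is a hypothetical stand-in: it assumes tasks live as a JSON list in a file, and that `add`, `clear`, `load_tasks`, and `save_tasks` take that file's path. Adapt the names to your app; the shape of the test is the point. It also writes to a temporary file, so the test never touches your real `tasks.json`.

```python
from __future__ import annotations

import json
import os
import tempfile
import unittest

# Hypothetical storage model: a JSON file holding a list of task dicts.
# The lab's real tasks.py may differ; adapt these helpers to match it.


def load_tasks(path: str) -> list[dict]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_tasks(path: str, tasks: list[dict]) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tasks, f)


def add(path: str, title: str) -> None:
    tasks = load_tasks(path)
    tasks.append({"title": title, "done": False})
    save_tasks(path, tasks)


def clear(path: str) -> None:
    """Remove every task by writing an empty list."""
    save_tasks(path, [])


class ClearTest(unittest.TestCase):
    def setUp(self) -> None:
        # A throwaway file, so the test never reads or writes the real tasks.json.
        fd, self.path = tempfile.mkstemp(suffix=".json")
        os.close(fd)
        save_tasks(self.path, [])

    def tearDown(self) -> None:
        os.remove(self.path)

    def test_clear_leaves_no_tasks(self) -> None:
        add(self.path, "x")
        add(self.path, "y")
        self.assertEqual(len(load_tasks(self.path)), 2)  # precondition: something to clear
        clear(self.path)
        # The real check: assert the end state, not merely that clear() didn't raise.
        self.assertEqual(load_tasks(self.path), [])
```

In your app, the test class would sit in `test_tasks.py` and import the real functions from `tasks.py` instead of defining them.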
### Part C — Verify it followed the playbook
6. Don't take the AI's word for it. Check against the skill's own done-criteria:
   ```bash
   python -m unittest -v   # green, and the test list includes a clear-related test
   python cli.py add "x" && python cli.py clear && python cli.py list   # -> (no tasks yet)
   git show --stat HEAD    # one commit: tasks.py, cli.py, test_tasks.py, CHANGELOG.md — no tasks.json
   ```
   If a step was skipped, that's the lab working: it shows you exactly where your wording was too soft.
   Tighten that line, commit the skill change, and run it again on a second command (`high <index>` to
   flag a task, say). **A skill you improve once and reuse forever is the deliverable** — not the one
   `clear` command.
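The `git show --stat` check in step 6 can also be written down as code, which makes it easy to re-run after every invocation of the skill. A minimal sketch, assuming the app's files sit at the repo root: `files_in_head` asks Git for the names of the files in the latest commit (`git show --name-only --format= HEAD`), and `commit_problems` compares them with the done-criteria from this walkthrough. Both function names are hypothetical.

```python
from __future__ import annotations

import subprocess

# Done-criteria from this walkthrough; adjust for your app's layout.
EXPECTED = {"tasks.py", "cli.py", "test_tasks.py", "CHANGELOG.md"}
FORBIDDEN = {"tasks.json"}


def files_in_head(repo: str = ".") -> list[str]:
    """Paths changed by the latest commit, as Git reports them."""
    out = subprocess.run(
        ["git", "show", "--name-only", "--format=", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def commit_problems(files: list[str]) -> list[str]:
    """Every way a commit's file list misses the done-criteria (empty list = pass)."""
    changed = set(files)
    problems = [f"missing from commit: {name}" for name in sorted(EXPECTED - changed)]
    problems += [f"must not be committed: {name}" for name in sorted(FORBIDDEN & changed)]
    return problems
```

From the `tasks-app` root, `commit_problems(files_in_head())` should come back empty after a correctly followed run; a `must not be committed: tasks.json` entry means the skill's staging step needs tightening.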
### Part D — See it as a reviewable, reusable asset
7. Look at what you built:
   ```bash
   git log --oneline add-command.md   # the procedure's own history
   git log -p add-command.md          # every change to the skill as a diff, including any Part C tightening
   ```
   Any diff in that history *is* a change to how your team adds commands: readable, attributable,
   revertable. In a team repo (Modules 8, 11) it reaches everyone on `git pull`; behind review
   (Module 10) it lands as a PR someone approves. You've turned a procedure you used to narrate into a
   versioned capability.

---
## Where it breaks
- **A skill is guidance, not enforcement — same caveat as Module 5.** It strongly biases the AI; it
doesn't bind it. The agent can still skip a step, especially a soft one, especially late in a long
session. The only steps you can rely on are the ones backed by **CI (Module 14)**: the test the skill
tells the AI to write doesn't gate anything until a pipeline runs it on every push. Write the
done-criteria as hard checks, and let CI be the backstop.
- **Skills rot.** A playbook that says "tests run with X" after you've moved to Y will confidently
march the AI off a cliff. Skills are code-adjacent: review them, update them, delete the ones you no
longer run. Committing them (so changes are visible) is what makes that maintainable.
- **Don't skillify everything.** A skill earns its place when a procedure is *repeated*, *multi-step*,
and *gets done wrong without one*. A one-off task doesn't need a playbook, and a pile of near-duplicate
skills is its own kind of bloat — now you're maintaining ten files and the AI has to pick the right
one. Promote a prompt to a skill the third time you've typed it, not the first.
- **Overlap with the always-on file causes drift.** If a fact lives in both your Module 5 instructions
file *and* a skill, you'll eventually update one and not the other. Keep general facts in the
always-on file and *reference* them from skills; don't duplicate them.
- **A skill is not a security boundary.** "Don't stage `tasks.json`" is a convention, not a permission.
An installed third-party skill is untrusted code that runs against your repo — vetting, permissions,
and prompt-injection defense are **Module 22's** job, immediately next, for exactly this reason.
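The drift risk from overlapping files is partly checkable. This hypothetical helper lists lines that appear, after trimming whitespace and leading `-` or `*` bullet markers, in both your always-on instructions file and a skill; each hit is a fact that should live in one place only. It catches only verbatim duplicates, not paraphrases, so treat it as a tripwire rather than a guarantee.

```python
from __future__ import annotations


def _facts(text: str, min_length: int) -> set[str]:
    """Trimmed lines, minus leading "-" or "*" bullets, long enough to state something."""
    facts = set()
    for line in text.splitlines():
        cleaned = line.strip().lstrip("-*").strip()
        if len(cleaned) >= min_length:
            facts.add(cleaned)
    return facts


def duplicated_facts(instructions: str, skill: str, min_length: int = 20) -> list[str]:
    """Lines present in both files: each one can drift when only one copy is updated."""
    return sorted(_facts(instructions, min_length) & _facts(skill, min_length))


instructions = "- Tests run with `python -m unittest`.\n- Never edit tasks.json by hand.\n"
skill = "## Steps\n1. Add the logic.\n- Tests run with `python -m unittest`.\n"
print(duplicated_facts(instructions, skill))  # ['Tests run with `python -m unittest`.']
```

The fix for a hit is the one the bullet describes: keep the line in the always-on file and have the skill point to it.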
---
## Check for understanding
**You're done when:**
- Your `tasks-app` repo has a committed skill file for "add a command," with `git log` showing the
commit that added it.
- You've invoked that skill and watched a fresh AI session produce **all four** parts — code, a real
test, a changelog entry, and one clean commit — *without you listing the steps that session*.
- You've verified it against the skill's done-criteria (tests green, command works, the commit
contains the right files and not `tasks.json`) rather than trusting the AI's summary.
- You can state, in one sentence, when to put knowledge in the always-on instructions file (Module 5)
versus a skill: general facts go in the file that's always read; a specific repeatable procedure goes
in a playbook invoked on demand.

When adding the *next* command is "invoke the skill" instead of "re-explain the seven steps," the
playbook is doing its job. Module 22 comes next, and not by accident: Unit 4 just gave the AI hands —
MCP servers and skills — and the very next thing is securing them, because an installed skill or
server is untrusted code running in your environment.
---
## Verify-before-publish
This is expansion-zone material; the *concept* is durable but tool specifics drift. Re-check at build
time:
- [ ] **Skill terminology and mechanics.** Confirm how mainstream agentic tools name and load skills
(skills / custom commands / slash commands / recipes / prompts), whether they auto-discover a
folder or need an explicit pointer, and any required file format/frontmatter — without pinning
the lesson to one vendor. Update the "Naming the pattern" paragraph if the common vocabulary has
shifted.
- [ ] **No vendor leaked in.** Verify the module still names the *pattern*, not one implementation, and
that the example skill format stays generic (when-to-use / inputs / steps / done-criteria).
- [ ] **Dependency chain intact.** Confirm Module 20 (MCP) and Module 22 (securing servers/skills) are
still numbered as referenced, and that nothing here leans on a tool introduced after Module 20.
- [ ] **Lab still runs.** `python -m unittest` is green in `lab/tasks-app/`, and the `clear`-command
walkthrough still matches the starter files (`add`/`list`/`done`/`count`, `test_tasks.py`,
`CHANGELOG.md`).