2684095e2f
Co-authored-by: claude <claude@jpaul.io> Co-committed-by: claude <claude@jpaul.io>
307 lines
17 KiB
Markdown
307 lines
17 KiB
Markdown
# Module 21 — Skills: Teaching the AI Your Playbook
|
|
|
|
> **Stop re-explaining your own procedures.** A skill is a repeatable workflow written down once,
|
|
> committed, and invoked on demand — so the AI does the thing *your* way, the same way, every time,
|
|
> without you narrating the steps again.
|
|
|
|
---
|
|
|
|
## Prerequisites
|
|
|
|
- **Module 2** — you commit, read diffs, and treat the repo as durable memory. Skills live in that
|
|
repo and are versioned exactly like code.
|
|
- **Module 3** — markdown-as-versioned-text, and the `CHANGELOG.md` convention this module's lab
|
|
writes to.
|
|
- **Module 4** — the AI lives in your editor/CLI and reads your files directly. A skill is a file it
|
|
loads; a browser chat can't pick one up automatically.
|
|
- **Module 5 — the one this builds on directly.** You committed an always-on instructions file that
|
|
tells the AI how the project works in general. This module is its **structured big sibling**: the
|
|
same write-it-down-and-commit instinct, but for *specific repeatable procedures* invoked on demand.
|
|
- **Module 13** — what a real test is (and why "it didn't crash" isn't one). The lab's procedure
|
|
includes writing one.
|
|
- *Helpful, not required:* **Module 20 (MCP)** — a skill's steps can call the real tools an MCP
|
|
server exposes, which is where playbooks get genuinely powerful.
|
|
|
|
---
|
|
|
|
## Learning objectives
|
|
|
|
By the end of this module you can:
|
|
|
|
1. Explain the difference between an **always-on instructions file (Module 5)** and a **skill** — and
|
|
say when each is the right tool.
|
|
2. Write a skill: a structured, named, invokable playbook for a recurring task, in your tool's
|
|
format-agnostic essentials (when-to-use, inputs, ordered steps, done-criteria).
|
|
3. Have the AI **execute** a skill end to end and verify it followed every step.
|
|
4. Keep skills in version control so a procedure is shareable, reviewable, and recoverable like any
|
|
other artifact.
|
|
5. Recognize when a one-off prompt has earned promotion into a durable skill — and when it hasn't.
|
|
|
|
---
|
|
|
|
## Key concepts
|
|
|
|
### The pain: you keep narrating the same procedure
|
|
|
|
You've written the Module 5 instructions file, and it's working — the AI knows your layout, your test
|
|
command, your off-limits files. But there's a class of knowledge it doesn't cover: **multi-step
|
|
procedures you run again and again.**
|
|
|
|
"Add a new CLI command" is the canonical example. Done properly it's never one edit — it's: put the
|
|
logic in the right file, wire the CLI, write a test that actually checks the behavior, run the tests,
|
|
smoke-test the command, add a changelog line, commit it as one clean change. The AI can do every step.
|
|
But left to a bare prompt — *"add a `clear` command"* — it'll usually give you the code and forget the
|
|
test, or skip the changelog, or commit `tasks.json` along for the ride. So you spell out the seven
|
|
steps. It works. Next week you add another command and **you spell out the same seven steps again.**
|
|
|
|
That re-narration is the exact pain Module 1 named, one level up: not re-explaining the *project* each
|
|
session, but re-explaining the *procedure* each time you run it. A skill is where that procedure stops
|
|
being something you retype and becomes something the repo carries.
|
|
|
|
### What a skill is
|
|
|
|
A **skill** is a named, structured, invokable set of instructions for one repeatable procedure,
|
|
stored as a file in the repo and loaded **on demand** when that procedure is the task at hand.
|
|
|
|
Strip the vendor branding and every skill has the same four parts:
|
|
|
|
- **A name and a "when to use it."** So both you and the AI know which playbook applies — and, just as
|
|
importantly, when it *doesn't*.
|
|
- **Inputs.** The few things the procedure needs to be told (here: the command name and what it does).
|
|
- **Ordered steps.** The actual procedure — the commands, the files, the checks, in sequence, with the
|
|
non-negotiables marked ("run the tests before claiming success," "don't stage `tasks.json`").
|
|
- **Done-criteria.** How the AI (and you) know it's actually finished, not just "produced something."
|
|
|
|
That's it. A skill is a checklist precise enough that an agent can execute it and you can verify it
|
|
did.
|
|
|
|
### Skill vs. the Module 5 instructions file
|
|
|
|
This is the distinction to lock in, because the two are siblings and easy to conflate:
|
|
|
|
| | **Committed instructions file (Module 5)** | **Skill (this module)** |
|
|
|---|---|---|
|
|
| Scope | How the project works, *in general* | How to do *one specific procedure* |
|
|
| When it loads | **Always on** — read every session | **On demand** — invoked when relevant |
|
|
| Shape | Ambient briefing: conventions, commands, don't-touch list | A playbook: when-to-use, inputs, ordered steps, done-criteria |
|
|
| Analogy | The standing house rules posted on the wall | A labeled recipe card you pull out when you cook that dish |
|
|
|
|
They're complementary. The instructions file is the right home for facts true *all the time* ("tests
|
|
run with `python -m unittest`"). A skill is the right home for a procedure you run *sometimes* ("here
|
|
is exactly how we add a command"). Module 5 even told you this was coming: start with the always-on
|
|
file; graduate a procedure into a skill when it earns its own page.
|
|
|
|
### Why "on demand" is the whole point
|
|
|
|
Module 5 warned that **bloat kills an instructions file** — a 300-line always-on briefing gets read
|
|
the way you read a terms-of-service. So you *can't* solve the re-narration problem by stuffing every
|
|
procedure into the always-on file; you'd drown the signal that makes it work.
|
|
|
|
Skills are the escape hatch. Because a skill loads only when its procedure is the task, you can write
|
|
it in full detail — every step, every guardrail — without taxing every unrelated session. Ten skills
|
|
cost the AI nothing on a session that invokes none of them. This is **progressive disclosure**: keep
|
|
the always-on context lean, and pull in the deep procedure exactly when it's needed. It's the same
|
|
reason you don't tape every recipe you own to the kitchen wall.
|
|
|
|
### Skills live in version control
|
|
|
|
This is what makes a skill more than a snippet in a notes app, and it's why this module sits where it
|
|
does in the course. A skill is a file in the repo, so everything you already learned about versioned
|
|
text applies to it directly:
|
|
|
|
- **Recoverable and historied (Module 2).** A skill has a `git log`. You can see when a step was added
|
|
and why, and `git restore` a botched edit. The procedure is a checkpoint like any other.
|
|
- **Shareable (Modules 8 & 11).** Push the repo and the whole team — and every agent that later
|
|
operates on it — inherits the same playbook. Nobody runs their own private version of "how we add a
|
|
command." It's the Module 5 anti-drift argument, applied to procedures.
|
|
- **Reviewable (Module 10).** Changing how the AI performs a procedure arrives as a **diff in a PR**.
|
|
Tightening "add a test" into "add a test that asserts the end state, not just no-crash" is a
|
|
reviewable change to your team's workflow — not an invisible tweak in one person's setup.
|
|
|
|
A prompt you keep in your head dies with the session. A skill in the repo is durable, shared
|
|
capability. That's the upgrade: from one-off prompting to a versioned, reviewable asset.
|
|
|
|
### Naming the pattern, not the vendor
|
|
|
|
"Skills" is one name for this. Tools also call them custom commands, slash commands, recipes, prompts,
|
|
playbooks, or modes, and they load them differently — some auto-discover a dedicated folder, some need
|
|
you to point at a file, some let your always-on instructions file say *"when asked to add a command,
|
|
follow `add-command.md`."* **The durable pattern is the same in all of them: a named, invokable file
|
|
of structured steps for a repeatable procedure, kept in the repo.** Learn the pattern; map it onto
|
|
whatever your tool calls it. As with everything in this course, the model and the tool are swappable;
|
|
the playbook you wrote is the part that lasts.
|
|
|
|
### Skills compose with your tools
|
|
|
|
A skill's steps aren't limited to editing files. They can drive the test runner, the CLI, Git — and,
|
|
once you have **Module 20's MCP** servers wired up, the real systems behind them (open the issue, hit
|
|
the staging API, query the database). A skill is where you encode *"use these hands, in this order, to
|
|
get this outcome."* The deeper your toolchain, the more a written playbook is worth — because there
|
|
are more steps to get wrong, and more value in getting them right every time.
|
|
|
|
---
|
|
|
|
## The AI angle
|
|
|
|
A generic automation course would call this "write a runbook." The AI-specific twist is what makes it
|
|
land:
|
|
|
|
- **The AI will execute the playbook, not just read it.** A runbook for a human is a reminder; a skill
|
|
for an agent is something it *performs*. The precision pays off immediately — vague step, vague
|
|
result; imperative step ("run `python -m unittest`; do not claim success until it's green"), reliable
|
|
result.
|
|
- **The AI is confidently incomplete without one.** Asked to "add a command," it'll happily stop at
|
|
the code and skip the test, the changelog, the clean commit — and sound finished doing it. The skill
|
|
is how you make *complete* the default instead of a thing you have to keep catching.
|
|
- **The skill outlives the model.** Swap models next quarter and the playbook carries over unchanged.
|
|
You encoded the *procedure*, not the prompt that happened to coax it out of this month's model. The
|
|
workflow is the durable skill; the model is the swappable part — here, literally.
|
|
|
|
---
|
|
|
|
## Hands-on lab
|
|
|
|
**Lab language:** markdown (the skill file) plus shell and Python (the `tasks-app`). You'll write a
|
|
skill, then have your editor-integrated AI (Module 4) execute it.
|
|
|
|
You'll write a skill for the procedure from *Key concepts* — **add a new `tasks-app` command, end to
|
|
end: code + test + changelog + clean commit** — and then watch the AI run it on a command it's never
|
|
seen, producing all four parts without you listing the steps.
|
|
|
|
**You'll need:**
|
|
|
|
- Your agentic coding tool from Module 4, and knowledge of how it loads a procedure (a skills/commands
|
|
folder it auto-discovers, or simply pointing it at a file by name — check its docs).
|
|
- A Python 3.10+ `tasks-app`. Use the snapshot in this module's `lab/tasks-app/` (it has `add`,
|
|
`list`, `done`, `count`, a `test_tasks.py`, and a `CHANGELOG.md`), or carry forward your own from
|
|
earlier modules. Make it a Git repo if it isn't: `git init && git add . && git commit -m "Start"`.
|
|
|
|
### Part A — Install the skill
|
|
|
|
1. Copy this module's starter skill, `lab/add-command-skill.md`, into your `tasks-app` repo wherever
|
|
your tool expects procedures. If your tool auto-discovers a folder, put it there under a clear name
|
|
(e.g. `add-command.md`). If it doesn't, just drop it at the repo root — you'll invoke it by name.
|
|
|
|
```bash
|
|
cd ~/workflow-course/tasks-app
|
|
cp /path/to/modules/21-skills-teaching-the-ai-your-playbook/lab/add-command-skill.md add-command.md
|
|
```
|
|
|
|
2. Read it. The whole file is short on purpose — when-to-use, inputs, seven ordered steps, and
|
|
done-criteria. Confirm every project fact in it matches *your* app (test command, file names, the
|
|
off-limits `tasks.json`). A skill with wrong facts misdirects the AI worse than no skill.
|
|
|
|
3. **Commit it.** This is the point — the procedure now lives in version control:
|
|
|
|
```bash
|
|
git add add-command.md
|
|
git commit -m "Add skill: add a tasks-app command end to end"
|
|
```
|
|
|
|
### Part B — Invoke it
|
|
|
|
4. Start a **fresh** AI session in your editor and invoke the skill the way your tool does it — its
|
|
slash command / skill name, or plainly: *"Follow `add-command.md` to add a `clear` command that
|
|
removes all tasks."* Crucially, **don't list the steps yourself.** The skill is supposed to supply
|
|
them.
|
|
|
|
5. Watch it perform the procedure. A correctly-followed skill will, without you saying any of it:
|
|
- add `clear()` to `tasks.py` and wire a `clear` branch into `cli.py` (logic in the right file);
|
|
- add a real test to `test_tasks.py` that asserts the list is empty afterward (not just "no crash");
|
|
- run `python -m unittest` and show it green;
|
|
- smoke-test `python cli.py clear` and show the output;
|
|
- add a `CHANGELOG.md` line;
|
|
- stage code + test + changelog into one commit, **without** `tasks.json`.
|
|
|
|
### Part C — Verify it followed the playbook
|
|
|
|
6. Don't take the AI's word for it. Check against the skill's own done-criteria:
|
|
|
|
```bash
|
|
python -m unittest # green, and a clear-related test is present
|
|
python cli.py add "x" && python cli.py clear && python cli.py list # -> (no tasks yet)
|
|
git show --stat HEAD # one commit: tasks.py, cli.py, test_tasks.py, CHANGELOG.md — no tasks.json
|
|
```
|
|
|
|
If a step was skipped, that's the lab working: it shows you exactly where your wording was too soft.
|
|
Tighten that line, commit the skill change, and run it again on a second command (`high <index>` to
|
|
flag a task, say). **A skill you improve once and reuse forever is the deliverable** — not the one
|
|
`clear` command.
|
|
|
|
### Part D — See it as a reviewable, reusable asset
|
|
|
|
7. Look at what you built:
|
|
|
|
```bash
|
|
git log --oneline add-command.md # the procedure's own history
|
|
git diff HEAD~1 add-command.md # if you tightened it in Part C — your workflow change as a diff
|
|
```
|
|
|
|
That diff *is* a change to how your team adds commands — readable, attributable, revertable. In a
|
|
team repo (Modules 8, 11) it reaches everyone on `git pull`; behind review (Module 10) it lands as a
|
|
PR someone approves. You've turned a procedure you used to narrate into a versioned capability.
|
|
|
|
---
|
|
|
|
## Where it breaks
|
|
|
|
- **A skill is guidance, not enforcement — same caveat as Module 5.** It strongly biases the AI; it
|
|
doesn't bind it. The agent can still skip a step, especially a soft one, especially late in a long
|
|
session. The steps that *can't* be skipped are the ones backed by **CI (Module 14)** — the test the
|
|
skill tells it to write only truly gates anything once a pipeline runs it on every push. Write the
|
|
done-criteria as hard checks, and let CI be the backstop.
|
|
- **Skills rot.** A playbook that says "tests run with X" after you've moved to Y will confidently
|
|
march the AI off a cliff. Skills are code-adjacent: review them, update them, delete the ones you no
|
|
longer run. Committing them (so changes are visible) is what makes that maintainable.
|
|
- **Don't skillify everything.** A skill earns its place when a procedure is *repeated*, *multi-step*,
|
|
and *gets done wrong without one*. A one-off task doesn't need a playbook, and a pile of near-duplicate
|
|
skills is its own kind of bloat — now you're maintaining ten files and the AI has to pick the right
|
|
one. Promote a prompt to a skill the third time you've typed it, not the first.
|
|
- **Overlap with the always-on file causes drift.** If a fact lives in both your Module 5 instructions
|
|
file *and* a skill, you'll eventually update one and not the other. Keep general facts in the
|
|
always-on file and *reference* them from skills; don't duplicate them.
|
|
- **A skill is not a security boundary.** "Don't stage `tasks.json`" is a convention, not a permission.
|
|
An installed third-party skill is untrusted code that runs against your repo — vetting, permissions,
|
|
and prompt-injection defense are **Module 22's** job, immediately next, for exactly this reason.
|
|
|
|
---
|
|
|
|
## Check for understanding
|
|
|
|
**You're done when:**
|
|
|
|
- Your `tasks-app` repo has a committed skill file for "add a command," with `git log` showing the
|
|
commit that added it.
|
|
- You've invoked that skill and watched a fresh AI session produce **all four** parts — code, a real
|
|
test, a changelog entry, and one clean commit — *without you listing the steps that session*.
|
|
- You've verified it against the skill's done-criteria (tests green, command works, the commit
|
|
contains the right files and not `tasks.json`) rather than trusting the AI's summary.
|
|
- You can state, in one sentence, when to put knowledge in the always-on instructions file (Module 5)
|
|
versus a skill: general facts go in the file that's always read; a specific repeatable procedure goes
|
|
in a playbook invoked on demand.
|
|
|
|
When adding the *next* command is "invoke the skill" instead of "re-explain the seven steps," the
|
|
playbook is doing its job. Module 22 comes next, and not by accident: Unit 4 just gave the AI hands —
|
|
MCP servers and skills — and the very next thing is securing them, because an installed skill or
|
|
server is untrusted code running in your environment.
|
|
|
|
---
|
|
|
|
## Verify-before-publish
|
|
|
|
This is expansion-zone material; the *concept* is durable but tool specifics drift. Re-check at build
|
|
time:
|
|
|
|
- [ ] **Skill terminology and mechanics.** Confirm how mainstream agentic tools name and load skills
|
|
(skills / custom commands / slash commands / recipes / prompts), whether they auto-discover a
|
|
folder or need an explicit pointer, and any required file format/frontmatter — without pinning
|
|
the lesson to one vendor. Update the "Naming the pattern" paragraph if the common vocabulary has
|
|
shifted.
|
|
- [ ] **No vendor leaked in.** Verify the module still names the *pattern*, not one implementation, and
|
|
that the example skill format stays generic (when-to-use / inputs / steps / done-criteria).
|
|
- [ ] **Dependency chain intact.** Confirm Module 20 (MCP) and Module 22 (securing servers/skills) are
|
|
still numbered as referenced, and that nothing here leans on a tool introduced after Module 20.
|
|
- [ ] **Lab still runs.** `python -m unittest` is green in `lab/tasks-app/`, and the `clear`-command
|
|
walkthrough still matches the starter files (`add`/`list`/`done`/`count`, `test_tasks.py`,
|
|
`CHANGELOG.md`).
|