ai-workflow-course/modules/21-skills-teaching-the-ai-your-playbook/README.md

# Module 21: Skills: Teaching the AI Your Playbook

> **Stop re-explaining your own procedures.** A skill is a repeatable workflow written down once,
> committed, and invoked on demand, so the AI does the thing *your* way, the same way, every time,
> without you narrating the steps again.

---

## Prerequisites

- **Module 2:** you commit, read diffs, and treat the repo as durable memory. Skills live in that
  repo and are versioned exactly like code.
- **Module 3:** markdown-as-versioned-text, and the `CHANGELOG.md` convention this module's lab
  writes to.
- **Module 4:** the AI lives in your editor/CLI and reads your files directly. A skill is a file it
  loads; a browser chat can't pick one up automatically.
- **Module 5, the one this builds on directly.** You committed an always-on instructions file that
  tells the AI how the project works in general. This module is its **structured big sibling**: the
  same write-it-down-and-commit instinct, but for *specific repeatable procedures* invoked on demand.
- **Module 13:** what a real test is (and why "it didn't crash" isn't one). The lab's procedure
  includes writing one.
- *Helpful, not required:* **Module 20 (MCP).** A skill's steps can call the real tools an MCP
  server exposes, which is where a playbook reaches beyond editing files into live systems.

---

## Learning objectives

By the end of this module you can:

1. Explain the difference between an **always-on instructions file (Module 5)** and a **skill**, and
   say when each is the right tool.
2. Write a skill: a structured, named, invokable playbook for a recurring task, in your tool's
   format-agnostic essentials (when-to-use, inputs, ordered steps, done-criteria).
3. Have the AI **execute** a skill end to end and verify it followed every step.
4. Keep skills in version control so a procedure is shareable, reviewable, and recoverable like any
   other artifact.
5. Recognize when a one-off prompt has earned promotion into a durable skill, and when it hasn't.

---

## Key concepts

### The pain: you keep narrating the same procedure

You've written the Module 5 instructions file, and it's working. The AI knows your layout, your test
command, your off-limits files. But there's a class of knowledge it doesn't cover: **multi-step
procedures you run again and again.**

"Add a new CLI command" is the canonical example. Done properly it's never one edit. It's: put the
logic in the right file, wire the CLI, write a test that actually checks the behavior, run the tests,
smoke-test the command, add a changelog line, commit it as one clean change. The AI can do every step.
But left to a bare prompt (*"add a `clear` command"*) it'll usually give you the code and forget the
test, or skip the changelog, or commit `tasks.json` along for the ride. So you spell out the seven
steps. It works. Next week you add another command and **you spell out the same seven steps again.**

That re-narration is the exact pain Module 1 named, one level up: not re-explaining the *project* each
session, but re-explaining the *procedure* each time you run it. A skill is where that procedure stops
being something you retype and becomes something the repo carries.

### What a skill is

A **skill** is a named, structured, invokable set of instructions for one repeatable procedure,
stored as a file in the repo and loaded **on demand** when that procedure is the task at hand.

Strip the vendor branding and every skill has the same four parts:

- **A name and a "when to use it."** So both you and the AI know which playbook applies and, just as
  importantly, when it *doesn't*.
- **Inputs.** The few things the procedure needs to be told (here: the command name and what it does).
- **Ordered steps.** The actual procedure: the commands, the files, the checks, in sequence, with the
  non-negotiables marked ("run the tests before claiming success," "don't stage `tasks.json`").
- **Done-criteria.** How the AI (and you) know it's actually finished, not just "produced something."

That's it. A skill is a checklist precise enough that an agent can execute it and you can verify it
did.

### Skill vs. the Module 5 instructions file

This is the distinction to lock in, because the two are siblings and easy to conflate:

| | **Committed instructions file (Module 5)** | **Skill (this module)** |
|---|---|---|
| Scope | How the project works, *in general* | How to do *one specific procedure* |
| When it loads | **Always on**: read every session | **On demand**: invoked when relevant |
| Shape | Ambient briefing: conventions, commands, don't-touch list | A playbook: when-to-use, inputs, ordered steps, done-criteria |
| Analogy | The standing house rules posted on the wall | A labeled recipe card you pull out when you cook that dish |

They're complementary. The instructions file is the right home for facts true *all the time* ("tests
run with `python3 -m unittest`"). A skill is the right home for a procedure you run *sometimes* ("here
is exactly how we add a command"). Module 5 even told you this was coming: start with the always-on
file; graduate a procedure into a skill when it earns its own page.

### Why "on demand" is the whole point

Module 5 warned that **bloat kills an instructions file**: a 300-line always-on briefing gets read
the way you read a terms-of-service. So you *can't* solve the re-narration problem by stuffing every
procedure into the always-on file; you'd drown the signal that makes it work.

A skill solves that. Because a skill loads only when its procedure is the task, you can write
it in full detail, every step and every guardrail, without taxing every unrelated session. Ten skills
cost the AI nothing on a session that invokes none of them. This is **progressive disclosure**: keep
the always-on context lean, and pull in the deep procedure exactly when it's needed. It's the same
reason you don't tape every recipe you own to the kitchen wall.

### Skills live in version control

This is what makes a skill more than a snippet in a notes app, and it's why this module sits where it
does in the course. A skill is a file in the repo, so everything you already learned about versioned
text applies to it directly:

- **Recoverable and historied (Module 2).** A skill has a `git log`. You can see when a step was added
  and why, and `git restore` a botched edit. The procedure is a checkpoint like any other.
- **Shareable (Modules 8 & 11).** Push the repo and the whole team, plus every agent that later
  operates on it, inherits the same playbook. Nobody runs their own private version of "how we add a
  command." It's the Module 5 anti-drift argument, applied to procedures.
- **Reviewable (Module 10).** Changing how the AI performs a procedure arrives as a **diff in a PR**.
  Tightening "add a test" into "add a test that asserts the end state, not just no-crash" is a
  reviewable change to your team's workflow, not an invisible tweak in one person's setup.

A prompt you keep in your head dies with the session. A skill in the repo is durable, shared
capability. That's the upgrade: from one-off prompting to a versioned, reviewable asset.

### Naming the pattern, not the vendor

"Skills" is one name for this. Tools also call them custom commands, slash commands, recipes, prompts,
playbooks, or modes, and they load them differently: some auto-discover a dedicated folder, some need
you to point at a file, some let your always-on instructions file say *"when asked to add a command,
follow `add-command.md`."* **The durable pattern is the same in all of them: a named, invokable file
of structured steps for a repeatable procedure, kept in the repo.** Learn the pattern; map it onto
whatever your tool calls it. As with everything in this course, the model and the tool are swappable;
the playbook you wrote is the part that lasts.

### Skills compose with your tools

A skill's steps aren't limited to editing files. They can drive the test runner, the CLI, Git, and,
once you have **Module 20's MCP** servers wired up, the real systems behind them (open the issue, hit
the staging API, query the database). A skill is where you encode *"use these hands, in this order, to
get this outcome."* The deeper your toolchain, the more a written playbook is worth, because there
are more steps to get wrong, and more value in getting them right every time.

---

## The AI angle

On paper this is just "write a runbook." The AI-specific twist is what changes the stakes:

- **The AI will execute the playbook, not just read it.** A runbook for a human is a reminder; a skill
  for an agent is something it *performs*. The precision pays off immediately: vague step, vague
  result; imperative step ("run `python3 -m unittest`; do not claim success until it's green"), reliable
  result.
- **The AI is confidently incomplete without one.** Asked to "add a command," it'll happily stop at
  the code and skip the test, the changelog, the clean commit, and sound finished doing it. The skill
  is how you make *complete* the default instead of a thing you have to keep catching.
- **The skill outlives the model.** Swap models next quarter and the playbook carries over unchanged.
  You encoded the *procedure*, not the prompt that happened to coax it out of this month's model. The
  workflow is the durable skill; the model is the swappable part; here, literally.

---

## Hands-on lab


> **Starting point (this lab is skip-friendly).** You do not need to have done the earlier labs.
> To begin from a clean, known state, copy this module's snapshot into a fresh `tasks-app` and
> make the first commit:
>
> ```bash
> mkdir -p ~/ai-workflow-course/tasks-app
> cp -r ~/ai-workflow-course/modules/21-skills-teaching-the-ai-your-playbook/lab/start/. ~/ai-workflow-course/tasks-app/
> cd ~/ai-workflow-course/tasks-app && git init -b main && git add -A && git commit -m "start: module 21"
> ```
>
> Already carrying your `tasks-app` from earlier modules? Keep using it and ignore this box.
**Lab language:** markdown (the skill file) plus shell and Python (the `tasks-app`). You'll write a
skill, then have your editor-integrated AI (Module 4) execute it.

You'll write a skill for the procedure from *Key concepts*, **add a new `tasks-app` command, end to
end: code + test + changelog + clean commit**, and then watch the AI run it on a command it's never
seen, producing all four parts without you listing the steps.

**You'll need:**

- Your agentic coding tool from Module 4, and knowledge of how it loads a procedure (a skills/commands
  folder it auto-discovers, or simply pointing it at a file by name; check its docs).
- A Python 3.10+ `tasks-app`. Use the snapshot in this module's `lab/tasks-app/` (it has `add`,
  `list`, `done`, `count`, a `test_tasks.py`, and a `CHANGELOG.md`), or carry forward your own from
  earlier modules. It should already be a Git repo from earlier modules; if you're starting fresh,
  ask Claude Code (`claude` in the project; sub your own agent) to initialize it and commit a
  baseline, then confirm with `git log` that the first commit landed.

### Part A: Install the skill

1. Copy this module's starter skill, `lab/add-command-skill.md`, into your `tasks-app` repo wherever
   your tool expects procedures. If your tool auto-discovers a folder, put it there under a clear name
   (e.g. `add-command.md`). If it doesn't, just drop it at the repo root and invoke it by name.

   ```bash
   cd ~/ai-workflow-course/tasks-app
   cp ~/ai-workflow-course/modules/21-skills-teaching-the-ai-your-playbook/lab/add-command-skill.md add-command.md
   ```

2. Read it. The whole file is short on purpose: when-to-use, inputs, seven ordered steps, and
   done-criteria. Confirm every project fact in it matches *your* app (test command, file names, the
   off-limits `tasks.json`). A skill with wrong facts misdirects the AI worse than no skill.

3. **Commit it.** This is the point: the procedure now lives in version control. Ask Claude Code
   (sub your own agent) to commit the new skill file with a message like "Add skill: add a tasks-app
   command end to end," then verify it landed:

   ```bash
   git log --oneline -1   # the skill commit, by name
   ```

### Part B: Invoke it

4. Start a **fresh** AI session in your editor and invoke the skill the way your tool does it: its
   slash command / skill name, or plainly: *"Follow `add-command.md` to add a `clear` command that
   removes all tasks."* Crucially, **don't list the steps yourself.** The skill is supposed to supply
   them.

5. Watch it perform the procedure. A correctly-followed skill will, without you saying any of it:
   - add `clear()` to `tasks.py` and wire a `clear` branch into `cli.py` (logic in the right file);
   - add a real test to `test_tasks.py` that asserts the list is empty afterward (not just "no crash");
   - run `python3 -m unittest` and show it green;
   - smoke-test `python3 cli.py clear` and show the output;
   - add a `CHANGELOG.md` line;
   - stage code + test + changelog into one commit, **without** `tasks.json`.

### Part C: Verify it followed the playbook

6. Don't take the AI's word for it. Check against the skill's own done-criteria:

   ```bash
   python3 -m unittest          # green, and a clear-related test is present
   python3 cli.py add "x" && python3 cli.py clear && python3 cli.py list   # -> (no tasks yet)
   git show --stat HEAD        # one commit: tasks.py, cli.py, test_tasks.py, CHANGELOG.md; no tasks.json
   ```

   If a step was skipped, that's the lab working: it shows you exactly where your wording was too soft.
   Tighten that line, have Claude Code (sub your own agent) commit the skill edit while you verify the
   diff, and run it again on a second command (`high <index>` to flag a task, say). **A skill you
   improve once and reuse forever is the deliverable**, not the one `clear` command.

### Part D: See it as a reviewable, reusable asset

7. Look at what you built:

   ```bash
   git log --oneline add-command.md   # the procedure's own history
   git log -p -- add-command.md        # full patch history: the file's creation, plus the Part C tighten if you made one
   ```

   (`git log -p` surfaces the skill's own patches no matter what you committed *after* tightening it,
   unlike `git diff HEAD~1`, which would be empty here because the most recent commit added the second
   *command*, not a change to the skill.) Each entry in that history *is* a change to how your team adds
   commands: readable, attributable, revertable. In a
   team repo (Modules 8, 11) it reaches everyone on `git pull`; behind review (Module 10) it lands as a
   PR someone approves. You've turned a procedure you used to narrate into a versioned capability.

---

## Where it breaks

- **A skill is guidance, not enforcement; same caveat as Module 5.** It strongly biases the AI; it
  doesn't bind it. The agent can still skip a step, especially a soft one, especially late in a long
  session. The steps that *can't* be skipped are the ones backed by **CI (Module 14)**: the test the
  skill tells it to write only gates anything once a pipeline runs it on every push. Write the
  done-criteria as hard checks, and let CI be the backstop.
- **Skills rot.** A playbook that says "tests run with X" after you've moved to Y will confidently
  march the AI off a cliff. Skills are code-adjacent: review them, update them, delete the ones you no
  longer run. Committing them (so changes are visible) is what makes that maintainable.
- **Don't skillify everything.** A skill earns its place when a procedure is *repeated*, *multi-step*,
  and *gets done wrong without one*. A one-off task doesn't need a playbook, and a pile of near-duplicate
  skills is its own kind of bloat: now you're maintaining ten files and the AI has to pick the right
  one. Promote a prompt to a skill the third time you've typed it, not the first.
- **Overlap with the always-on file causes drift.** If a fact lives in both your Module 5 instructions
  file *and* a skill, you'll eventually update one and not the other. Keep general facts in the
  always-on file and *reference* them from skills; don't duplicate them.
- **A skill is not a security boundary.** "Don't stage `tasks.json`" is a convention, not a permission.
  An installed third-party skill is untrusted code that runs against your repo; vetting, permissions,
  and prompt-injection defense are **Module 22's** job, immediately next, for exactly this reason.

---

## Check for understanding

**You're done when:**

- Your `tasks-app` repo has a committed skill file for "add a command," with `git log` showing the
  commit that added it.
- You've invoked that skill and watched a fresh AI session produce **all four** parts (code, a real
  test, a changelog entry, and one clean commit) *without you listing the steps that session*.
- You've verified it against the skill's done-criteria (tests green, command works, the commit
  contains the right files and not `tasks.json`) rather than trusting the AI's summary.
- You can state, in one sentence, when to put knowledge in the always-on instructions file (Module 5)
  versus a skill: general facts go in the file that's always read; a specific repeatable procedure goes
  in a playbook invoked on demand.

When adding the *next* command is "invoke the skill" instead of "re-explain the seven steps," the
playbook is doing its job. Module 22 comes next, and not by accident: Unit 4 just gave the AI hands,
MCP servers and skills, and the very next thing is securing them, because an installed skill or
server is untrusted code running in your environment.

---

## Verify-before-publish

This is expansion-zone material; the *concept* is durable but tool specifics drift. Re-check at build
time:

- [ ] **Skill terminology and mechanics.** Confirm how mainstream agentic tools name and load skills
      (skills / custom commands / slash commands / recipes / prompts), whether they auto-discover a
      folder or need an explicit pointer, and any required file format/frontmatter, without pinning
      the lesson to one vendor. Update the "Naming the pattern" paragraph if the common vocabulary has
      shifted.
- [ ] **No vendor leaked in.** Verify the module still names the *pattern*, not one implementation, and
      that the example skill format stays generic (when-to-use / inputs / steps / done-criteria).
- [ ] **Dependency chain intact.** Confirm Module 20 (MCP) and Module 22 (securing servers/skills) are
      still numbered as referenced, and that nothing here leans on a tool introduced after Module 20.
- [ ] **Lab still runs.** `python3 -m unittest` is green in `lab/tasks-app/`, and the `clear`-command
      walkthrough still matches the starter files (`add`/`list`/`done`/`count`, `test_tasks.py`,
      `CHANGELOG.md`).