Files
ai-workflow-course/modules/20-mcp-servers-giving-the-ai-hands/README.md
T
claude 2684095e2f Build out all 27 modules + capstone (#1)
Co-authored-by: claude <claude@jpaul.io>
Co-committed-by: claude <claude@jpaul.io>
2026-06-22 12:19:01 -04:00

426 lines
24 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Module 20 — MCP Servers: Giving the AI Hands
> **Until now the AI could read and write files in your repo and nothing else. MCP lets it reach
> your real tools, data, and systems — your task tracker, your database, your docs, your APIs —
> through a standard interface instead of working blind.** And because MCP is an open protocol, not
> a vendor feature, the connections you build outlive whichever model you're running.
---
## Prerequisites
- **Module 1** — the `tasks-app` running example, an editor, and a terminal. The lab gives the AI
hands on this exact app.
- **Module 2** — you read a project's state from Git and you trust `git restore` to undo a mess.
That safety net matters more here than anywhere so far: you're about to let the AI *act on real
systems*, not just edit files.
- **Module 4** — the AI lives in your editor or CLI (an "agentic tool") and edits files directly.
That same tool is the **MCP client** in this module; MCP is how you extend what it can reach.
- **Module 5** — you commit the AI's config to the repo. MCP server configuration is more config
worth committing, and the same "make it travel with the repo" instinct applies.
Helpful but not required: **Module 16** (containers) and **Module 17** (secrets) get referenced when
we talk about *where* a server runs and *what it's allowed to touch*. You can read this module
without them.
This is the opener of **Unit 4 — Extend the AI into your systems.** Units 13 got the AI safely
editing your code and shipping it. Unit 4 is about giving it reach beyond the repo.
---
## Learning objectives
By the end of this module you can:
1. Explain the MCP client/server model — what a server exposes (tools, resources, prompts), what the
client (your agentic tool) does, and why "it's a protocol, not a vendor feature" is the whole
point.
2. Connect an existing MCP server to your agentic tool and confirm the AI can call its tools.
3. Build a tiny MCP server in Python that exposes one real capability over the `tasks-app`, and wire
it into your tool.
4. Watch the AI *use* that server — read and change real state through a tool call — and verify the
effect outside the chat.
5. State precisely what MCP does and doesn't give you, including the one caveat this module
deliberately defers: **installing an MCP server is installing code that runs with access to your
systems** (handled in Module 22).
---
## Key concepts
### The wall the AI keeps hitting
Everything so far has given the AI exactly one kind of reach: **files in your repo.** Module 4 let
it read and write `cli.py`; Module 2 let it read your Git history. That's a lot — but watch where it
stops.
Ask your agentic tool, *"how many tasks are in my list and which are done?"* and it can answer,
because the data happens to live in a file it can read. Now ask it something one inch further out:
- *"How many active users signed up this week?"* — the answer is in a database it can't query.
- *"Is this docs page out of date versus the changelog?"* — the docs live in a system it can't read.
- *"File a ticket for this bug."* — the tracker is an API it can't call.
The AI's response to all three is some flavour of *"I can't access that, but here's a script you
could run"* — and you're back in the copy-paste loop from Module 1, just one level up. The model is
plenty smart enough to do the work. It's **blind and handless** beyond your files. It can reason
about your systems; it can't *touch* them.
You could solve this the bad way: paste a database dump into the chat, copy the AI's SQL out and run
it yourself, paste the results back. That's Module 1's seam all over again — you as the integration
layer, manually shuttling data between the AI and the real system. MCP exists to delete that loop.
### What MCP is
The **Model Context Protocol (MCP)** is an open standard for connecting AI applications to external
tools and data through a uniform interface. Two roles:
- An **MCP server** exposes capabilities — "here are the things I can do and the data I can provide."
- An **MCP client** (embedded in your agentic tool) discovers those capabilities and calls them on
the AI's behalf.
That's the entire shape: **servers offer, clients call.** Your editor-integrated AI tool is the
client. A small program you (or someone else) writes is the server. When the AI decides it needs to
add a task, the client calls the server's `add_task` tool, the server does the work against the real
system, and the result comes back into the AI's context. No pasting, no scripts you run by hand.
If you've ever written or consumed an HTTP API, the instinct transfers cleanly: a server advertises
a set of operations; a client calls them with arguments and gets structured results back. The
difference is what it's *for* — MCP is shaped specifically so an AI can **discover** what's available
at runtime (names, descriptions, argument schemas) and decide which call to make, rather than a human
reading docs and hardcoding the call.
### Why "a protocol, not a vendor feature" is the whole point
This is the course thesis showing up in the architecture itself. MCP is a **standard**, like HTTP or
SQL — not a button inside one company's product. The consequences are exactly the ones this course
keeps promising:
- **Write a server once; every compliant client can use it.** The `tasks` server you'll build in the
lab works with any agentic tool that speaks MCP — today's and next year's. You are not building for
a vendor; you're building for the protocol.
- **Swap the model underneath and your servers don't care.** The server exposes `add_task`; it has
no idea which model is on the other end of the client. Change models — which you will — and every
connection you built keeps working. That's the durable-skill payoff stated in Module 1, now load-
bearing instead of aspirational.
- **The ecosystem compounds.** Because it's a shared standard, there's a large and growing catalogue
of servers other people already wrote — for databases, cloud providers, ticket trackers, docs,
browsers, your own internal tools. Connecting one is usually configuration, not coding.
MCP originated with one vendor and was released as an open spec; it's since been adopted across major
AI tooling regardless of who makes the model. We name no vendor on purpose: the skill is "wire a
server to a client," and it's the same skill everywhere.
### What a server actually exposes: tools, resources, prompts
An MCP server can offer three kinds of things. You'll mostly care about the first:
- **Tools** — *actions the AI can take.* A tool is a named function with typed arguments and a
description: `add_task(title)`, `run_query(sql)`, `create_issue(title, body)`. The AI reads the
description, decides to call it, supplies the arguments, and gets a result. This is the "hands"
half of the module title — tools are how the AI *does* things. (Tools can have side effects: they
write to your database, hit your API, change real state. That power is exactly why Module 22
exists.)
- **Resources** — *data the AI can read.* Read-only context the server makes available: a file, a
database record, a docs page, the contents of a config. Where tools *do*, resources *inform*
they're how the AI gets eyes on a system, the parallel to "durable memory it can read" from
Module 2, extended past your repo.
- **Prompts** — *reusable prompt templates the server offers* for common operations against it (e.g.
"summarize this incident from these logs"). Useful, but the least-used of the three; don't worry
about them while you're learning.
For the lab you'll build **tools**, because tools are where MCP earns the module title. One function,
one decorator, and the AI has a new verb.
### How the client and server talk: transports
The client has to launch or reach the server and exchange messages with it. Two shapes dominate, and
the distinction is practical:
- **stdio (local).** The client launches the server as a subprocess on your machine and talks to it
over standard input/output — the same pipes a normal command-line program uses. This is the right
default for anything local: your `tasks` server, a server that reads your filesystem, one that
drives a local tool. No network, no ports, no auth to set up. **This is what the lab uses.**
- **HTTP-based (remote).** For a server running somewhere else — a shared internal service, a
vendor's hosted server — the client reaches it over HTTP. This is where authentication and network
access enter the picture, and where the security stakes climb.
You don't pick the transport at random; it follows from where the server runs. Local tool over a
real system on your box → stdio. Shared or third-party service → HTTP. (The exact name of the HTTP
transport in the spec has changed more than once — see *Verify-before-publish* — but the local-vs-
remote split is the durable idea.)
### Configuring a server: where the wiring lives
To connect a server, you tell your agentic tool how to start it (for stdio) or reach it (for HTTP).
Most tools read this from a small JSON config. The *de facto* common shape for a local server looks
like this:
```json
{
"mcpServers": {
"tasks": {
"command": "python",
"args": ["/absolute/path/to/tasks-app/tasks_mcp_server.py"]
}
}
}
```
Read it plainly: *"there's a server called `tasks`; to start it, run `python <that file>` and talk to
it over stdio."* That's the whole contract for a local server.
Two honest notes, both flowing from the course's core promises:
- **The filename and location of this config are tool-specific, and we won't pin them.** Some tools
keep it in a project file, some in a user-level file, some let you add servers from a UI. The
`mcpServers` *shape* above is widely shared, but check your tool's docs for where it reads it. The
principle — "a server is a name plus how to launch or reach it" — outlives any one tool's filename,
exactly like the committed-instructions file in Module 5.
- **This config is worth committing — with care.** A project-level MCP config means every teammate
and every agent that opens the repo gets the same tools wired up, which is the Module 5 instinct
applied one level out. But MCP config often points at paths or, for HTTP servers, endpoints and
credentials — and **credentials never go in the repo** (that's Module 17, and it's a hard rule).
Commit the wiring; keep the secrets in the environment.
### Where this is in the repo's reach, and where it's heading
Stack the units up and the picture is clear. Module 4 put the AI in your editor. This module gives
that same AI hands beyond the repo. The next three modules build directly on it:
- **Module 21 (Skills)** teaches the AI *playbooks* — repeatable procedures it runs your way. Skills
and MCP compose: MCP gives the AI the tools; a skill tells it *how and when* to use them.
- **Module 22 (Securing third-party MCP servers and skills)** handles the danger this module is
deliberately deferring (see *Where it breaks*). Read it before you install anything you didn't
write.
- **Module 23 (Working with existing codebases)** leans on MCP to give the AI real access to a large
repo and the systems around it, so it can orient before it changes anything.
---
## The AI angle
A generic integration course would teach you to wire systems together for *programs* to use —
fixed clients calling fixed endpoints. MCP is shaped for a different consumer: **an AI that decides
at runtime what it needs.** That changes what matters about the integration.
- **Discovery, not hardcoding.** A traditional client is written against specific API calls by a
human. An MCP client hands the AI a *menu* — tool names, descriptions, argument schemas — and the
AI picks. Which means the **description you write for a tool is part of the interface**: it's how
the model knows when to reach for `add_task` versus `list_tasks`. A vague docstring is a vague tool.
(You'll feel this in the lab — the docstrings on the server functions are not decoration; they're
what the AI reads.)
- **It closes Module 1's loop at the systems layer.** The original copy-paste pain was shuttling code
between a chat and a file. The same pain reappears one level out: shuttling *data* between the AI
and your database, your tracker, your docs. MCP is the editor-integration moment for systems — the
AI reaches them directly instead of you being the integration layer.
- **It's the model-agnostic bet made concrete.** Every other module argues the workflow outlasts the
model. MCP *is* that argument in protocol form: the server you write is bound to a standard, not a
model. Swap the model and your hands stay attached.
- **The reach is the risk.** The very thing that makes MCP powerful — real access to real systems —
is why it needs its own security module. An AI with hands can do real damage as easily as real
work. That's not a reason to avoid it; it's the reason Module 22 comes right after.
---
## Hands-on lab
**Lab language:** Python (a ~15-line MCP server) plus your agentic tool's config. Runs on your own
machine, any OS.
You'll do two things: **connect an existing MCP server** to confirm the client/server wiring works
at all, then **build your own tiny server** over the `tasks-app` and watch the AI use it. The second
is the one that lands the concept.
**You'll need:**
- The `tasks-app` from Module 1/2 (a folder with `tasks.py`, `cli.py`, and ideally a Git repo so you
can see and undo what the AI does — Module 2).
- Your agentic coding tool from Module 4, which is the **MCP client**. Find, in its docs, *where it
reads MCP server configuration* and *how it shows that a server is connected* (often a list of
connected servers or available tools).
- Python 3.10+ and the official MCP Python SDK: `pip install "mcp[cli]"`.
- The starter files in this module's `lab/` folder: `tasks_mcp_server.py` and
`mcp-config-example.json`.
### Part A — Connect an existing server (warm-up, ~10 min)
Before building anything, prove the plumbing works by connecting a server someone else already
wrote. The MCP ecosystem ships a set of **reference servers** (filesystem, fetch, git, and more) —
pick a simple, read-only one your tool's docs point you at (a "filesystem" or "fetch" server is a
good first choice).
1. Add the server to your tool's MCP config, following the tool's docs. Most reference servers are
launched the same stdio way as the JSON shape shown in *Key concepts* — a `command` and `args`.
2. Restart or reload your agentic tool so it picks up the config. Confirm it reports the server as
**connected** and lists its tools.
3. Ask the AI to do something only that server enables — e.g. with a fetch server, *"fetch
example.com and summarize it"*; with a filesystem server scoped to a folder, *"list the files in
that folder."* Watch the AI **call a tool** rather than tell you it can't.
That's the entire client/server loop, end to end, with zero code you wrote. Now make your own.
> **Stop before you install anything you don't fully trust.** A reference server from the protocol's
> own maintainers is a reasonable warm-up. A random server off the internet is untrusted code that
> will run with your permissions — vetting that is **Module 22's** job, and it's not optional. For
> now, stick to first-party reference servers or the one you write next.
### Part B — Build a one-tool server over the tasks-app
1. Copy this module's `lab/tasks_mcp_server.py` into your `tasks-app` folder, next to `tasks.py` and
`cli.py`. (It reuses `tasks.py` and shares the same `tasks.json`, so anything it changes shows up
in `python cli.py list`.) The whole server is two tools:
```python
@mcp.tool()
def list_tasks() -> str:
"""List every task in the tasks-app, with its index and whether it's done."""
return _load().render()
@mcp.tool()
def add_task(title: str) -> str:
"""Add a new task to the tasks-app. `title` is the text of the task to add."""
tlist = _load()
tlist.add(title)
_save(tlist)
return f"added: {title}"
```
That's it — a tool is a normal function plus the docstring the AI reads to decide when to use it.
2. Sanity-check it starts. From inside `tasks-app`:
```bash
pip install "mcp[cli]" # once
python tasks_mcp_server.py # it will sit there waiting for a client — that's correct
```
It looks like it's hanging. It isn't — a stdio server waits for a client on its stdin/stdout.
Press Ctrl-C; you don't run it by hand, the client launches it.
### Part C — Wire it into your agentic tool
3. Open `lab/mcp-config-example.json`. Copy the `tasks` entry into wherever your tool reads MCP
config, and replace the path with the **absolute** path to your `tasks_mcp_server.py`. (Use
`python3` or a venv's python if that's what runs the SDK on your system.)
```json
"tasks": {
"command": "python",
"args": ["/ABSOLUTE/PATH/TO/workflow-course/tasks-app/tasks_mcp_server.py"]
}
```
4. Reload your agentic tool and confirm it shows the `tasks` server **connected**, with `list_tasks`
and `add_task` among its available tools. If it doesn't connect, the usual culprits are a wrong
path, the wrong `python`, or the SDK not installed for that interpreter — check the tool's MCP
logs.
### Part D — Watch the AI use its new hands
5. In the AI chat, **don't** mention files or `tasks.json`. Ask in terms of the *system*:
> *"What's on my task list right now?"*
The AI should call `list_tasks` and answer from the live result — not from reading a file, not
from memory. Many tools show the tool call inline ("called `tasks.list_tasks`"); watch for it.
6. Now have it act:
> *"Add a task: review the Module 20 lab."*
It should call `add_task("review the Module 20 lab")`. Then **verify the effect outside the AI**,
which is the whole point — the change is real:
```bash
python cli.py list # the new task is there, because the server wrote the same tasks.json
git diff # the change shows up in your repo, exactly like any other edit (Module 2)
```
The AI just changed real state in a real system through a tool call. No copy-paste, no script you
ran by hand, no pasting `tasks.json` into a chat. That's "hands."
7. (Optional, to feel the discovery point.) Edit the docstring on `add_task` to be vague — change it
to just `"""Adds something."""` — reload, and try the same request. Notice the AI gets *less*
reliable about choosing the tool. The description is part of the interface; the model reads it to
decide. Restore the good docstring.
---
## Where it breaks
The honest caveats — and one of them is large enough that it gets its own module.
- **Installing an MCP server is installing code that runs with your access — and this module does not
secure it.** A server you connect runs on your machine (stdio) or is trusted by your client (HTTP),
with whatever permissions you give it: your files, your network, your credentials. A malicious or
compromised server is malware with an AI driving it, and a server's tool descriptions can even
carry instructions that try to steer the model (prompt injection). **This module deliberately
stops here.** The attack surface — vetting servers, pinning versions, least-privilege, prompt
injection — is **Module 22 (Securing Third-Party MCP Servers and Skills)**, and you should treat
it as required reading before connecting anything you didn't write. In this module: only first-
party reference servers and the one you build yourself.
- **A tool with side effects can do real damage as easily as real work.** Your `add_task` writes to
real state. A `run_query` or `delete_user` tool does too. An AI that confidently calls the wrong
tool with the wrong arguments isn't a typo in a file you can `git restore` — it might be a row
deleted from a database Git never backed up (Module 12's limit). Keep destructive tools behind
confirmation, scope them narrowly, and lean on the safety net: do this against test data first.
- **The AI still has to *choose* the tool correctly.** MCP gives the model hands; it doesn't give it
judgment. It can call the wrong tool, pass bad arguments, or ignore a perfectly good tool and
hallucinate an answer instead. Good tool names and descriptions reduce this a lot (Part D step 7);
they don't eliminate it.
- **More servers, more tools, more noise.** Every connected tool is something the model has to
consider on every turn. Wire up thirty tools and you dilute the model's attention and slow it down.
Connect what a task needs; disconnect what it doesn't. (This is the MCP echo of Module 5's "bloat
kills it.")
- **The spec and SDKs move fast.** This is expansion-zone material. Transport names, SDK APIs, and
config conventions have all churned and will again. The *client/server, servers-offer-clients-call*
model is durable; specific commands and field names are not — verify them at build time.
- **stdio servers are local-only by nature.** The lab's server runs on your machine for you. Sharing
a server with a team, or reaching one that needs to run elsewhere, means the HTTP transport, which
drags in auth, network access, and the containerization story from Module 16. Don't reach for that
until you need it.
---
## Check for understanding
**You're done when:**
- You connected an **existing** MCP server to your agentic tool and watched the AI call one of its
tools (Part A).
- You built `tasks_mcp_server.py`, wired it into your tool, and saw the `tasks` server report as
connected with `list_tasks` and `add_task` available.
- You asked the AI a question and it answered by **calling a tool** against the live system, and you
asked it to add a task and then **verified the change outside the AI** with `python cli.py list`
and `git diff`.
- You can explain the client/server model in one breath — *servers expose tools/resources/prompts;
the client (your agentic tool) discovers and calls them on the AI's behalf* — and why "it's a
protocol, not a vendor feature" means your server survives a model swap.
- You can state the one caveat this module defers: connecting an MCP server is running code with
access to your systems, and **Module 22** is where that risk gets handled.
When "the AI can't reach that system" stops being a wall and becomes "so I'll give it a tool," you've
got it. Module 21 takes the next step: teaching the AI the *playbook* for using these hands well.
---
## Verify-before-publish
MCP is moving fast; re-check these at build/publish time rather than trusting this draft:
- [ ] **Python SDK install + API.** Confirm `pip install "mcp[cli]"` is still the package, and that
`from mcp.server.fastmcp import FastMCP`, the `@mcp.tool()` decorator, and `mcp.run()` are
still the current FastMCP surface. Run `tasks_mcp_server.py` end to end against a real client.
- [ ] **Transport naming.** The HTTP transport has been renamed in the spec before (an SSE-based
transport gave way to a "streamable HTTP" one). Verify the current name and any deprecation
before describing remote transports.
- [ ] **The `mcpServers` config shape.** Confirm it's still the widely-shared convention for stdio
servers, and that the `command`/`args` fields are current. Keep the lesson tool-agnostic about
*where* the config file lives.
- [ ] **Reference servers (Part A).** Verify which first-party reference servers exist and how
they're launched today; the catalogue and launch commands change. Don't name a specific server
that may have moved or been retired without checking.
- [ ] **Adoption framing.** Re-confirm the "open standard, adopted across vendors regardless of
model" claim is still accurate and still vendor-neutral; update if the ecosystem has shifted.