fix(M7-27+capstone): apply AI-drives-git reframe, lesson=theory, de-slop course-wide

Phase 2 sweep — all modules are post-pivot, so the learner directs the AI agent
(Claude Code as the worked example) to do the git/setup work and verifies, instead
of typing commands by hand; no re-teaching basics. Lesson sections are theory with
example output; all execution lives in the labs. De-slopped ("prose" etc. gone
course-wide, em-dash density thinned). /path/to placeholders -> ~/ai-workflow-course.
Every deliberate teaching device verified intact: M10 ai-change.patch trap,
M12 bad-clear-snippet, M13/M27 planted pending_count bug, M15 secret+typosquat+MD5,
M18 BREAK=1, M21 absent-.gitignore, M22 poisoned skill, M24 no-op patch, M25 --simulate.
Labs compile/parse (py/sh/yaml/json); no junk.

Closes #83
Closes #86
Closes #89
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
@@ -1,6 +1,6 @@
 # Module 18 — Continuous Delivery and Deployment

-> **Merged isn't running.** This module closes the last gap in the pipeline — getting approved code
+> **Merged isn't running.** This module closes the last gap in the pipeline: getting approved code
 > from `main` to something actually serving traffic, automatically, with a way back when it's wrong.

 ---
@@ -51,14 +51,15 @@ Walk the pipeline you've built so far. A change gets proposed (Module 9), implem
 (Module 15). It merges. `main` is now correct, tested, and clean.

 And then nothing happens. The code that's "done" is sitting in a Git history. The thing your users
-touch is still running last week's version. Somebody — usually you, usually at 6pm — has to SSH in,
+touch is still running last week's version. Somebody (usually you, usually at 6pm) has to SSH in,
 pull, build, restart, and pray. That manual last mile is where most outages are actually born:
 inconsistent steps, a forgotten config flag, a half-restarted service, "wait, which version is in
 prod right now?"

 CI answered *"is this change good?"* CD answers the next question: ***"now get the good change
-running, the same way every time."*** It's the same instinct that made CI worth it — replace an
-error-prone manual ritual with an automated, repeatable one — pointed at the last step.
+running, the same way every time."*** It's the same instinct that made CI worth it, the one that
+replaces an error-prone manual ritual with an automated, repeatable one, now pointed at the last
+step.

 ### Delivery vs. deployment: the distinction that matters

@@ -145,17 +146,17 @@ A deploy that can't tell whether it worked isn't a deploy, it's a gamble. The si
 thing CD adds over "SSH in and restart" is that **the pipeline verifies the new version is alive
 before trusting it, and reverses itself when it isn't.**

-A health check is a cheap, honest signal that the new version is actually serving — typically an
+A health check is a cheap, honest signal that the new version is actually serving: typically an
 endpoint like `/health` that returns `200` only when the app has started clean. The deploy step
 hits it after starting the new version and **waits for green before cutting over.**

-Rollback is the other half: if the health check fails, the deploy stops the broken new version and
+Rollback is the other half. If the health check fails, the deploy stops the broken new version and
 brings the **previous known-good image tag** back up. Because you deploy immutable tags, rollback is
-trivial — you still have `tasks-app:<previous-sha>`, so "go back" is just "run the old tag again."
+trivial: you still have `tasks-app:<previous-sha>`, so "go back" is just "run the old tag again."
 No rebuild, no git revert race, no scramble. (Reverting the *source* is still Module 12's job for the
-code; rollback here is about the *running artifact*.) The strategies have names you'll meet —
-blue-green (run old and new side by side, flip a switch), canary (send 5% of traffic to new, watch,
-ramp) — but they're all variations on "keep the old one ready until the new one proves itself."
+code; rollback here is about the *running artifact*.) The strategies have names you'll meet:
+blue-green (run old and new side by side, flip a switch) and canary (send 5% of traffic to new,
+watch, ramp). They're all variations on "keep the old one ready until the new one proves itself."

 > **Reframe for the ops reader:** you already know this instinct. It's the deployment equivalent of
 > a maintenance window with a back-out plan — except the back-out plan is automated, tested on every
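The health-check-then-rollback flow described in the hunk above can be sketched in a few lines of bash. This is a minimal sketch under stated assumptions, not the lab's `deploy.sh`: the function names `wait_for_health` and `deploy_with_rollback`, and the `START_CMD`, `CHECK_CMD`, `HEALTH_ATTEMPTS`, and `HEALTH_DELAY` variables, are hypothetical and exist only so the logic can run without Docker.

```bash
#!/usr/bin/env bash
# Sketch only: wait_for_health, deploy_with_rollback, START_CMD and CHECK_CMD are
# hypothetical names for illustration, not taken from the lab's deploy.sh.

# Poll a check command until it succeeds or the attempts run out.
# usage: wait_for_health <attempts> <delay_seconds> <command...>
wait_for_health() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0              # green: the new version answered
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1                  # never went green
}

# Start the new tag, verify it, and fall back to the previous tag on failure.
# usage: deploy_with_rollback <new_tag> <previous_tag>
deploy_with_rollback() {
  local new_tag="$1" prev_tag="$2"
  "$START_CMD" "$new_tag"
  if wait_for_health "${HEALTH_ATTEMPTS:-5}" "${HEALTH_DELAY:-1}" "$CHECK_CMD" "$new_tag"; then
    echo "deployed $new_tag"
  else
    "$START_CMD" "$prev_tag"  # rollback is "run the old tag again"
    echo "rolled back to $prev_tag"
    return 1
  fi
}
```

In a real script, `START_CMD` would presumably wrap `docker run` for the given tag and `CHECK_CMD` would wrap something like `curl -fsS http://localhost:8000/health`. Both are left as hooks here because the actual commands live in the lab.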
@@ -172,7 +173,7 @@ the merged-to-prod gate.
 AI writes and ships changes dramatically faster. More PRs open, more merge, and they merge sooner.
 That's the upside — and it means the volume of code flowing toward production goes *up*, while the
 human attention available to babysit each deploy stays flat. The gap between "merged" and "in prod"
-stops being a quiet formality and becomes the place where the speed either pays off or hurts you.
+stops being a quiet formality and becomes the place where that speed either pays off or hurts you.

 Two consequences follow, and they pull in opposite directions:

@@ -180,10 +181,10 @@ Two consequences follow, and they pull in opposite directions:
 the manual last mile becomes the bottleneck that eats all the speed AI just gave you. CD is what
 lets the throughput actually reach users.
 - **The gate matters more.** Faster shipping of code that *looks right* (the recurring AI failure
-mode from Modules 1 and 14) means a bad change reaches prod faster too — unless something catches
+mode from Modules 1 and 14) means a bad change reaches prod faster too, unless something catches
 it. This is the crucial point: **continuous deployment is only survivable because of the gates in
 front of it.** Review (Module 10), CI tests (Module 14), and security scanning (Module 15) are not
-bureaucracy you tolerate — they are the *entire reason* you're allowed to remove the human from the
+bureaucracy you tolerate. They are the *entire reason* you're allowed to remove the human from the
 deploy button. Take auto-deploy without those gates and you've built a machine that ships AI
 mistakes to production at full speed.

@@ -214,7 +215,9 @@ account. The five deploy steps are real; only the *target* is your laptop instea
 `docker info` first, or `deploy.sh`'s build step fails with "Cannot connect to the Docker daemon."
 - The `tasks-app` from Modules 1–2, now a Git repo.
 - `curl` (for the health check) and a bash-capable shell. On Windows, use WSL or Git Bash.
-- Your AI assistant — by now, ideally editor-integrated (Module 4).
+- Claude Code (sub your own agent), editor-integrated as of Module 4. From here you **direct it** to
+do the setup, commit, build, and deploy work, then you **verify** the result; you don't type those
+commands by hand.

 Starter files are in this module's `lab/` folder:

@@ -229,11 +232,13 @@ Starter files are in this module's `lab/` folder:

 A CLI that exits immediately is awkward to "deploy." Give the app a long-running face.

-1. Copy `lab/serve.py` and `lab/Dockerfile` into your `tasks-app` folder next to `tasks.py` and
-`cli.py`. Read `serve.py` — it's ~40 lines wrapping the `TaskList` you already have in a stdlib
-HTTP server with two routes: `/health` and `/tasks`.
+1. Direct Claude Code to bring the starter files into your `tasks-app` folder next to `tasks.py` and
+`cli.py`: *"Copy `serve.py`, `Dockerfile`, and `deploy.sh` from this module's `lab/` into the
+tasks-app folder."* Then **read `serve.py` yourself** — it's ~40 lines wrapping the `TaskList` you
+already have in a stdlib HTTP server with two routes, `/health` and `/tasks`. Verify the three
+files landed next to `tasks.py`/`cli.py`.

-2. Run it locally first, no container, to see it work:
+2. Run the service locally first, no container, to see it work:

 ```bash
 python serve.py # serves on http://localhost:8000
@@ -246,51 +251,52 @@ A CLI that exits immediately is awkward to "deploy." Give the app a long-running
 curl localhost:8000/tasks # your tasks as JSON
 ```

-Stop it with Ctrl-C. Commit this (`git add . && git commit -m "Add HTTP service + Dockerfile"`).
+Stop it with Ctrl-C. Now have Claude Code commit the new files: *"Stage and commit the HTTP
+service and Dockerfile with a clear message."* **Verify** the commit before moving on — read the
+diff it staged and confirm no secret, state file, or junk got swept in (it should be just
+`serve.py`, `Dockerfile`, and `deploy.sh`).

 ### Part B — Build and tag the artifact

-3. Build the image and tag it with the current commit SHA — the immutable, traceable tag:
+3. Have Claude Code build the image and tag it with the current commit SHA, the immutable, traceable
+tag: *"Build the container image and tag it with the short commit SHA and also `:latest`."*
+Getting the SHA is git work the agent drives. **Verify** the result yourself:

 ```bash
-SHA=$(git rev-parse --short HEAD)
-docker build -t tasks-app:$SHA -t tasks-app:latest .
-docker images tasks-app # see both tags pointing at one image
+docker images tasks-app # both tags point at one image; note the SHA
 ```

-That `:$SHA` tag is the unit of deploy. Everything downstream refers to *this exact image*.
+That `:<sha>` tag is the unit of deploy. Everything downstream refers to *this exact image*.

 ### Part C — Deploy it (with a net)

-4. Read `lab/deploy.sh`. It does the five steps: stops any running `tasks-app` container, starts the
-new image with runtime config injected as env vars (Module 17 — note the `APP_VERSION` and the
-*absence* of any secret baked into the image), polls `/health` until green, and on failure rolls
-back to the previous tag it recorded. Make it executable and run it:
+4. **Read `lab/deploy.sh` yourself** before running it. It does the five steps: stops any running
+`tasks-app` container, starts the new image with runtime config injected as env vars (Module 17,
+note the `APP_VERSION` and the *absence* of any secret baked into the image), polls `/health`
+until green, and on failure rolls back to the previous tag it recorded.

-```bash
-chmod +x deploy.sh
-./deploy.sh $SHA
-```
-
-Watch it build, run, health-check, and report the deploy healthy. Hit it:
+Now direct Claude Code to run the deploy against the SHA you just built: *"Run `deploy.sh` for the
+current commit SHA and report whether it came up healthy."* The agent makes the script executable
+and runs it. **Verify** the deploy yourself:

 ```bash
 curl localhost:8000/health # now reports the SHA you deployed
 ```

-Run `./deploy.sh` again after another commit and notice it records the prior version as the
+Ask the agent to commit a trivial change and deploy again, then read back what it recorded as the
 rollback target. You now have continuous *delivery* in miniature: one command turns a commit into
 a running, version-tagged service.

 ### Part D — Break a deploy and watch it roll back

-5. Now prove the net works. The service honors a `BREAK=1` env var that makes `/health` return `500`
-— a stand-in for "this build starts but is actually broken." Deploy a healthy version first so
-there's a known-good to fall back to, then force a bad one:
+5. Now prove the net works. The service honors a `BREAK=1` env var that makes `/health` return
+`500`, a stand-in for "this build starts but is actually broken." First have the agent deploy a
+healthy version so there's a known-good to fall back to, then trigger the broken one yourself so
+you watch it happen:

 ```bash
-./deploy.sh $SHA # healthy baseline
-BREAK=1 ./deploy.sh $SHA # same image, but the new instance fails its health check
+./deploy.sh # healthy baseline (defaults to the current commit SHA)
+BREAK=1 ./deploy.sh # same image, but the new instance fails its health check
 ```

 The script starts the "new" version, the health check fails, and it **automatically stops the
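Part C says the deploy script "records the prior version as the rollback target." One way that bookkeeping could work is sketched below. The state-file layout (`current_tag`, `previous_tag`) and the function names `record_deploy` and `rollback_target` are assumptions made for illustration; the lab's `deploy.sh` may well track this differently.

```bash
#!/usr/bin/env bash
# Sketch only: the state-file layout here is an assumption for illustration;
# the lab's deploy.sh may record its rollback target differently.

# Record a newly deployed tag, keeping the one it replaces as the rollback target.
# usage: record_deploy <state_dir> <tag>
record_deploy() {
  local state_dir="$1" tag="$2"
  mkdir -p "$state_dir"
  if [ -f "$state_dir/current_tag" ]; then
    # whatever was running until now becomes the known-good fallback
    cp "$state_dir/current_tag" "$state_dir/previous_tag"
  fi
  printf '%s\n' "$tag" > "$state_dir/current_tag"
}

# Print the rollback target, or return non-zero if nothing was deployed before.
# usage: rollback_target <state_dir>
rollback_target() {
  local state_dir="$1"
  [ -f "$state_dir/previous_tag" ] && cat "$state_dir/previous_tag"
}
```

After `record_deploy "$dir" tasks-app:abc1234` followed by `record_deploy "$dir" tasks-app:def5678`, calling `rollback_target "$dir"` prints `tasks-app:abc1234`. Redeploying the same tag twice, as Part D does, makes the fallback the same image, which is what lets the `BREAK=1` run revert to a healthy instance.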
@@ -300,7 +306,7 @@ A CLI that exits immediately is awkward to "deploy." Give the app a long-running
 curl localhost:8000/health # ok — the bad deploy reverted itself
 ```

-That automatic reversal — not the build, not the run — is the part that makes auto-deploy
+That automatic reversal, not the build and not the run, is the part that makes auto-deploy
 something you can sleep through.

 ### Part E — Wire it into the pipeline (read + reason)
@@ -312,9 +318,9 @@ A CLI that exits immediately is awkward to "deploy." Give the app a long-running

 7. Find the one line that is the delivery-vs-deployment switch — the deploy-to-prod step gated behind
 a manual approval (`environment:` with a required reviewer, commented in the file). Decide, for
-the `tasks-app`, which side you'd choose and why, and ask your AI assistant to make the case for
-the *other* choice. The goal isn't a "right" answer; it's being able to articulate the risk
-posture either way.
+the `tasks-app`, which side you'd choose and why, and ask Claude Code to make the case for the
+*other* choice. The goal isn't a "right" answer; it's being able to articulate the risk posture
+either way.

 > **A note on running the full pipeline:** actually executing `cd-starter.yml` end to end needs a
 > forge with a container registry and a deploy target wired up — that's environment-specific and
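The delivery-vs-deployment switch in step 7 comes down to a single question: may the prod step run without a human saying go? The sketch below expresses that decision in plain bash. `DEPLOY_MODE`, `APPROVED_BY`, and `may_deploy_to_prod` are hypothetical stand-ins for a forge's protected-environment approval and are not taken from `cd-starter.yml`.

```bash
#!/usr/bin/env bash
# Sketch only: DEPLOY_MODE and APPROVED_BY are hypothetical variables standing in
# for a forge's required-reviewer approval; they are not from cd-starter.yml.

# Decide whether the deploy-to-prod step may run.
# "deployment" mode ships every green build; "delivery" mode waits for a human.
may_deploy_to_prod() {
  local mode="${DEPLOY_MODE:-delivery}" approver="${APPROVED_BY:-}"
  case "$mode" in
    deployment) return 0 ;;               # no human on the deploy button
    delivery)   [ -n "$approver" ] ;;     # ready to ship, but someone must approve
    *)          echo "unknown DEPLOY_MODE: $mode" >&2; return 2 ;;
  esac
}
```

Defaulting to `delivery` is a deliberate choice in this sketch: if the mode is left unset, the safer posture applies and nothing reaches prod without an approver.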