feat(course): build out all 27 modules, capstone, scaffold, and conventions

Scaffold the course repo and author the full curriculum in dependency-chain
order, following the settled build decisions in handoff.md.

- Scaffold: course README, vendor-neutral AGENTS.md (dogfoods Module 5),
  _TEMPLATE.md (the fixed 9-section module shape), root .gitignore, ship config.
- Modules 1-2: reference exemplars (locked for tone/depth/lab style).
- Modules 3-27: full lessons + runnable labs, each following the template,
  respecting the chain, vendor/model-agnostic, with "feel the pain" labs.
- Module 8 hosting comparison web-researched and date-stamped (as of 2026-06-22),
  not written from memory; expansion-zone modules carry Verify-before-publish.
- Capstone: the full loop end to end on the running tasks-app example.

Lab code syntax-checked (Python/shell/YAML); every module has the 7 core
template sections.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
2026-06-22 12:18:30 -04:00
parent 4bd586bbd0
commit fbec36cb67
117 changed files with 15131 additions and 1 deletions
@@ -0,0 +1,384 @@
# Module 18 — Continuous Delivery and Deployment
> **Merged isn't running.** This module closes the last gap in the pipeline — getting approved code
> from `main` to something actually serving traffic, automatically, with a way back when it's wrong.
---
## Prerequisites
- **Module 10 — Reviewing Code You Didn't Write.** The PR review gate. Auto-deploy is only safe
because a human (or an agent under supervision) signed off on the diff first.
- **Module 14 — Continuous Integration.** You already have a pipeline that lints, builds, and tests
on every push. CD is not a new system — it's **more stages on that same pipeline**, after the
checks pass.
- **Module 15 — Security Scanning.** Dependency, secret, and static-analysis gates on the same
pushes. These are part of what makes shipping without a human in the loop survivable.
- **Module 16 — Containers and Reproducible Environments.** The container image is *what you ship*.
CD takes that image and runs it somewhere. This module assumes you can already build and tag an
image of the `tasks-app`.
- **Module 17 — Secrets, Config, and Environments.** A running service needs configuration and
secrets at runtime — *what it needs to run*. CD wires those into the deploy step instead of baking
them into the image.
If you've done 14-17, you have all the parts. This module is the assembly.
---
## Learning objectives
By the end of this module you can:
1. State the precise difference between continuous **delivery** and continuous **deployment**, and
decide which one a given project should use.
2. Extend your CI pipeline with build-and-publish stages that turn a merge into a versioned,
deployable artifact.
3. Wire a deploy step that takes that artifact, injects runtime config/secrets, and brings up the
new version — provider-neutrally.
4. Add a health check and an automatic **rollback** so a bad deploy reverts itself instead of
staying down.
5. Reason about the deploy gate the way this audience already reasons about change windows: what's
automated, what's manual, and where the stop button is.
---
## Key concepts
### The gap nobody automated yet
Walk the pipeline you've built so far. A change gets proposed (Module 9), implemented on a branch
(Module 6), reviewed as a PR (Module 10), checked by CI (Module 14), scanned for vulnerabilities
(Module 15). It merges. `main` is now correct, tested, and clean.
And then nothing happens. The code that's "done" is sitting in a Git history. The thing your users
touch is still running last week's version. Somebody — usually you, usually at 6pm — has to SSH in,
pull, build, restart, and pray. That manual last mile is where most outages are actually born:
inconsistent steps, a forgotten config flag, a half-restarted service, "wait, which version is in
prod right now?"
CI answered *"is this change good?"* CD answers the next question: ***"now get the good change
running, the same way every time."*** It's the same instinct that made CI worth it — replace an
error-prone manual ritual with an automated, repeatable one — pointed at the last step.
### Delivery vs. deployment: the distinction that matters
These two terms get used interchangeably and they are not the same thing. The difference is exactly
one decision: **who pushes the button to prod.**
- **Continuous Delivery** — every merge to `main` automatically produces a **deployable artifact**
(a built, tagged, tested container image, sitting in a registry) and deploys it as far as a
staging/pre-prod environment. Production deploy is **one click by a human**. The pipeline
guarantees the artifact is *ready to ship at any moment*; a person decides *when*.
- **Continuous Deployment** — same pipeline, but there's **no button**. If it passes every gate, it
goes all the way to production automatically. Merge is the last human action.
```
                    merge to main
            ┌─────────────┴──────────────┐
   CONTINUOUS DELIVERY         CONTINUOUS DEPLOYMENT
            │                            │
   build + test + scan          build + test + scan
            │                            │
    publish artifact             publish artifact
            │                            │
    deploy to staging            deploy to staging
            │                            │
  [human clicks "ship"] ──► deploy to prod (automatic)
            │                            │
     deploy to prod                    done
```
Both are "CD." When someone says "we do CD," ask which one — the operational risk is completely
different. Continuous deployment is not the more advanced/better option you graduate to; it's a
different risk posture that's appropriate for some systems and reckless for others. A blog,
internal dashboard, or stateless web service with good tests is a fine candidate. A billing engine,
a database migration, or anything with a regulatory change-control requirement usually is not — and
"a human clicks deploy" is a perfectly mature answer there, not a failure to automate.
The honest default for most teams adopting this: **start with continuous *delivery*.** Get the
artifact and the deploy step fully automated and trustworthy, keep the human on the prod button, and
remove that button only once you trust the gates more than you trust the click.
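To make the "exactly one decision" point concrete, here is a toy model of the pipeline. It is a sketch only: `run_pipeline`, its parameters, and the stage strings are hypothetical names invented for illustration, not part of the lab files or of any CI system's API.

```python
def run_pipeline(gates_pass: bool, mode: str, human_approved: bool = False) -> list[str]:
    """Return the stages a merge to main reaches, in order.

    mode is "delivery" (a human approves prod) or "deployment" (no prod button).
    Every name here is an illustrative assumption, not a real CI API.
    """
    if mode not in ("delivery", "deployment"):
        raise ValueError(f"unknown mode: {mode!r}")
    stages = ["build + test + scan"]
    if not gates_pass:
        return stages  # a failed gate stops everything downstream
    stages += ["publish artifact", "deploy to staging"]
    # The only difference between the two modes is this one condition.
    if mode == "deployment" or human_approved:
        stages.append("deploy to prod")
    return stages


print(run_pipeline(True, "delivery"))                       # stops at staging
print(run_pipeline(True, "delivery", human_approved=True))  # a person shipped it
print(run_pipeline(True, "deployment"))                     # no button at all
```

Everything before the final `if` is shared; switching modes changes who satisfies that last condition, not what the pipeline builds.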
### The artifact is the unit of deploy
Here's the discipline that makes CD reliable, and it comes straight from Module 16: **you deploy a
built image, not a Git ref.** "Deploy `main`" is ambiguous — it means "go to the prod box, pull,
and rebuild," and that rebuild can pull a different base image or dependency version than CI tested.
"Deploy `tasks-app:9f3a2c1`" is not ambiguous. It's the exact bytes CI built and tested.
So the build-and-publish stage does this once, centrally:
1. Build the image from the merged code.
2. Tag it with something **immutable and traceable** — the Git commit SHA is the standard choice
(`tasks-app:9f3a2c1`). Optionally also a moving tag like `:latest` or `:staging` for convenience,
but the SHA tag is the one you trust.
3. Push it to a container registry — the durable, shared home for images, the same way a Git remote
(Module 8) is the durable home for commits.
Every later deploy — to staging, to prod, a rollback — just says "run *this* tag." Build once, run
the identical artifact everywhere. That single property is what kills "works on my machine" at the
deploy layer.
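The difference between an immutable SHA tag and a moving tag can be sketched with an in-memory stand-in for a registry. `ToyRegistry`, the second SHA, and the digest strings below are made up for illustration; real registries differ in whether and how they enforce tag immutability, so treat the enforcement here as an assumption that makes the point visible.

```python
class ToyRegistry:
    """In-memory stand-in for a container registry (illustration only)."""

    def __init__(self) -> None:
        self._tags: dict[str, str] = {}  # tag -> image digest

    def push(self, tag: str, digest: str, *, moving: bool = False) -> None:
        existing = self._tags.get(tag)
        # An immutable tag may never be re-pointed at different bytes.
        if existing is not None and existing != digest and not moving:
            raise ValueError(f"{tag} is immutable and already points at {existing}")
        self._tags[tag] = digest

    def resolve(self, tag: str) -> str:
        return self._tags[tag]


registry = ToyRegistry()
registry.push("tasks-app:9f3a2c1", "sha256:aaa")              # first merge
registry.push("tasks-app:latest", "sha256:aaa", moving=True)
registry.push("tasks-app:1b7e4d0", "sha256:bbb")              # second merge
registry.push("tasks-app:latest", "sha256:bbb", moving=True)  # :latest moves on

print(registry.resolve("tasks-app:9f3a2c1"))  # still the first build's digest
print(registry.resolve("tasks-app:latest"))   # now the second build's digest
```

After the second merge, `:latest` no longer tells you what it used to mean, while the SHA tag still resolves to the image CI tested for that commit. That is why rollback targets the SHA tag.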
### The deploy step, provider-neutrally
The shape of a deploy is the same everywhere, whatever the target — a cloud platform, a Kubernetes
cluster, a single VM, a PaaS:
1. **Pull** the specific image tag onto the target.
2. **Inject runtime config and secrets** (Module 17) — environment variables, mounted secret files,
a secrets-manager lookup. Never baked into the image; supplied at run time so the *same* image
runs in staging and prod with different config.
3. **Start the new version** alongside or in place of the old one.
4. **Health-check** it before sending real traffic.
5. **Cut over** if healthy; **roll back** if not.
This module is deliberately provider-agnostic on *where* — the same way Module 8 stayed neutral on
hosts. The mechanics differ (a `kubectl` apply, a platform CLI, a `docker run`, a `compose up`), but
the five steps don't. The lab does the simplest possible real version: a local container run. The
logic is identical at scale.
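The five steps can be written once against an abstract target. The sketch below assumes two hypothetical hooks, `start` and `is_healthy`, that a real target would implement with `kubectl`, a platform CLI, or `docker run`; `FakeTarget` exists only so the example runs without any container tooling.

```python
from typing import Callable, Optional


def deploy(
    tag: str,
    previous: Optional[str],
    start: Callable[[str, dict], None],
    is_healthy: Callable[[], bool],
    base_config: dict,
) -> str:
    """Run the five deploy steps and return the tag that ends up live."""
    # Steps 1-3: pull the tag, inject runtime config, start the new version.
    start(tag, {**base_config, "APP_VERSION": tag})
    if is_healthy():                       # step 4: verify before trusting
        return tag                         # step 5a: cut over
    if previous is None:
        raise RuntimeError("new version unhealthy and nothing to roll back to")
    start(previous, {**base_config, "APP_VERSION": previous})  # step 5b: roll back
    if not is_healthy():
        raise RuntimeError(f"rollback to {previous} also failed")
    return previous


class FakeTarget:
    """Pretend deploy target: remembers what is running and which tags are broken."""

    def __init__(self, broken: set[str]) -> None:
        self.broken = broken
        self.running: Optional[str] = None
        self.config: dict = {}

    def start(self, tag: str, config: dict) -> None:
        self.running, self.config = tag, dict(config)

    def is_healthy(self) -> bool:
        return self.running is not None and self.running not in self.broken


target = FakeTarget(broken={"tasks-app:bad"})
live = deploy("tasks-app:good", None, target.start, target.is_healthy, {})
live = deploy("tasks-app:bad", live, target.start, target.is_healthy, {})
print(live)  # tasks-app:good
```

Swapping the target means swapping the two hooks; the control flow in `deploy` stays the same.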
### Health checks and rollback: the part beginners skip
A deploy that can't tell whether it worked isn't a deploy, it's a gamble. The single most important
thing CD adds over "SSH in and restart" is that **the pipeline verifies the new version is alive
before trusting it, and reverses itself when it isn't.**
A health check is a cheap, honest signal that the new version is actually serving — typically an
endpoint like `/health` that returns `200` only when the app has started clean. The deploy step
hits it after starting the new version and **waits for green before cutting over.**
Rollback is the other half: if the health check fails, the deploy stops the broken new version and
brings the **previous known-good image tag** back up. Because you deploy immutable tags, rollback is
trivial — you still have `tasks-app:<previous-sha>`, so "go back" is just "run the old tag again."
No rebuild, no git revert race, no scramble. (Reverting the *source* is still Module 12's job for the
code; rollback here is about the *running artifact*.) The strategies have names you'll meet —
blue-green (run old and new side by side, flip a switch), canary (send 5% of traffic to new, watch,
ramp) — but they're all variations on "keep the old one ready until the new one proves itself."
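"Waits for green" usually means polling with a bounded number of attempts. The following is a minimal sketch of that loop using only the standard library. `http_ok` and `wait_healthy` are hypothetical helper names, and the defaults (10 attempts, one second apart) mirror the lab's `deploy.sh` rather than any recommended values.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """True only if `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Not listening yet, timed out, or answered with an error status such as 500.
        return False


def wait_healthy(probe: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Call `probe` until it succeeds, giving up after `attempts` tries."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False


# A probe that fails twice while the service "starts", then goes green.
answers = iter([False, False, True])
print(wait_healthy(lambda: next(answers), attempts=5, delay=0))  # True
```

Against the lab service the probe would be `lambda: http_ok("http://localhost:8000/health")`. The bound on attempts matters as much as the check: a deploy that waits forever never reaches the rollback branch.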
> **Reframe for the ops reader:** you already know this instinct. It's the deployment equivalent of
> a maintenance window with a back-out plan — except the back-out plan is automated, tested on every
> single deploy, and takes seconds instead of a panicked hour. CD doesn't remove the discipline you
> already have; it encodes it so it runs every time instead of only when someone remembers.
---
## The AI angle
CI existed long before AI, and so did CD. What changed is the **rate**, and rate is everything for
the merged-to-prod gate.
With AI assistance, changes get written and merged much faster: more PRs open, more of them merge, and they merge sooner.
That's the upside — and it means the volume of code flowing toward production goes *up*, while the
human attention available to babysit each deploy stays flat. The gap between "merged" and "in prod"
stops being a quiet formality and becomes the place where the speed either pays off or hurts you.
Two consequences follow, and they pull in opposite directions:
- **Automating the deploy matters more.** If a human has to hand-deploy every AI-generated change,
the manual last mile becomes the bottleneck that eats all the speed AI just gave you. CD is what
lets the throughput actually reach users.
- **The gate matters more.** Faster shipping of code that *looks right* (the recurring AI failure
mode from Modules 1 and 14) means a bad change reaches prod faster too — unless something catches
it. This is the crucial point: **continuous deployment is only survivable because of the gates in
front of it.** Review (Module 10), CI tests (Module 14), and security scanning (Module 15) are not
bureaucracy you tolerate — they are the *entire reason* you're allowed to remove the human from the
deploy button. Take auto-deploy without those gates and you've built a machine that ships AI
mistakes to production at full speed.
So the AI-era posture is specific: **strengthen the early gates, then automate the late ones.** The
more you trust review + CI + scanning, the further right you can safely push automation — up to and
including no human on the prod button. The strength of the gates is the dial that decides whether
continuous *deployment* is responsible or reckless for a given repo. And when an agent itself is the
one merging (Unit 5), this stops being theoretical: the deploy gate is the last thing standing
between an autonomous contributor and your users.
---
## Hands-on lab
**Lab language:** shell, driving the container tooling from Module 16. You'll extend the `tasks-app`
into a tiny running service, then build a deploy script that ships it locally with a health check and
automatic rollback — the whole CD motion, simulated on your own machine.
This lab simulates deployment with a **local container run** so it works on any machine with no cloud
account. The five deploy steps are real; only the *target* is your laptop instead of a server.
**You'll need:**
- A container runtime from Module 16 — Docker or Podman. (Commands below use `docker`; if you run
Podman, `alias docker=podman` or substitute.)
- The `tasks-app` from Modules 1-2, now a Git repo.
- `curl` (for the health check) and a bash-capable shell. On Windows, use WSL or Git Bash.
- Your AI assistant — by now, ideally editor-integrated (Module 4).
Starter files are in this module's `lab/` folder:
- `serve.py` — turns the `tasks-app` into a minimal HTTP service with a `/health` endpoint, using
only the Python standard library (no dependencies). This is the long-running thing CD deploys.
- `Dockerfile` — the Module 16 container image, adjusted to run the service.
- `deploy.sh` — the deploy step: build, tag, run, health-check, cut over or roll back.
- `cd-starter.yml` — the CD pipeline stages, written as GitHub Actions and extending the Module 14
CI file. GitLab/other-forge notes are in the comments.
### Part A — Make something worth deploying
A CLI that exits immediately is awkward to "deploy." Give the app a long-running face.
1. Copy `lab/serve.py` and `lab/Dockerfile` into your `tasks-app` folder next to `tasks.py` and
`cli.py`. Read `serve.py` — it's ~40 lines wrapping the `TaskList` you already have in a stdlib
HTTP server with two routes: `/health` and `/tasks`.
2. Run it locally first, no container, to see it work:
```bash
python serve.py # serves on http://localhost:8000
```
In another terminal:
```bash
curl localhost:8000/health # {"status": "ok", "version": "dev"}
curl localhost:8000/tasks # your tasks as JSON
```
Stop it with Ctrl-C. Commit this (`git add . && git commit -m "Add HTTP service + Dockerfile"`).
### Part B — Build and tag the artifact
3. Build the image and tag it with the current commit SHA — the immutable, traceable tag:
```bash
SHA=$(git rev-parse --short HEAD)
docker build -t tasks-app:$SHA -t tasks-app:latest .
docker images tasks-app # see both tags pointing at one image
```
That `:$SHA` tag is the unit of deploy. Everything downstream refers to *this exact image*.
### Part C — Deploy it (with a net)
4. Read `lab/deploy.sh`. It does the five steps: stops any running `tasks-app` container, starts the
new image with runtime config injected as env vars (Module 17 — note the `APP_VERSION` and the
*absence* of any secret baked into the image), polls `/health` until green, and on failure rolls
back to the previous tag it recorded. Make it executable and run it:
```bash
chmod +x deploy.sh
./deploy.sh $SHA
```
Watch it build, run, health-check, and report the deploy healthy. Hit it:
```bash
curl localhost:8000/health # now reports the SHA you deployed
```
Run `./deploy.sh` again after another commit and notice it records the prior version as the
rollback target. You now have continuous *delivery* in miniature: one command turns a commit into
a running, version-tagged service.
### Part D — Break a deploy and watch it roll back
5. Now prove the net works. The service honors a `BREAK=1` env var that makes `/health` return `500`
— a stand-in for "this build starts but is actually broken." Deploy a healthy version first so
there's a known-good to fall back to, then force a bad one:
```bash
./deploy.sh $SHA # healthy baseline
BREAK=1 ./deploy.sh $SHA # same image, but the new instance fails its health check
```
The script starts the "new" version, the health check fails, and it **automatically stops the
broken instance and brings the previous good one back up.** Confirm you're still serving:
```bash
curl localhost:8000/health # ok — the bad deploy reverted itself
```
That automatic reversal — not the build, not the run — is the part that makes auto-deploy
something you can sleep through.
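A caller (a pipeline step, or you) learns what happened from the script's exit status. The sketch below maps the statuses that `deploy.sh` sets explicitly; `describe_exit` and `run_deploy` are hypothetical helpers, and because the script runs under `set -e`, a failed command such as `docker build` also surfaces as a non-zero status, so read `1` as "the requested tag did not ship" rather than as proof that a rollback happened.

```python
import subprocess

# Exit statuses set explicitly by the lab's deploy.sh.
OUTCOMES = {
    0: "shipped: the requested tag is live and healthy",
    1: "not shipped: health check failed (rolled back, or nothing to roll back to)",
    2: "down: the rollback itself failed its health check",
}


def describe_exit(code: int) -> str:
    """Translate a deploy.sh exit status into a one-line outcome."""
    return OUTCOMES.get(code, f"unexpected exit code {code}")


def run_deploy(tag: str, script: str = "./deploy.sh") -> str:
    """Run the deploy script for `tag` and describe how it ended."""
    result = subprocess.run([script, tag], check=False)
    return describe_exit(result.returncode)


print(describe_exit(1))
```

Exiting non-zero after a successful rollback is deliberate in the script: the service recovered, but the deploy you asked for did not happen, and a pipeline should show that as a failed step.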
### Part E — Wire it into the pipeline (read + reason)
6. Open `lab/cd-starter.yml` and compare it to the Module 14 `ci-starter.yml`. It's the **same
pipeline with stages appended**: the lint/test gates run first (unchanged; the Module 15 scan is
noted as a comment), and the build-publish-deploy stages run only on a push to `main` (a merge),
enforced by `if: github.ref == 'refs/heads/main'` on each of those jobs. Trace the `needs:`
chain that makes deploy run *only after* the checks pass.
7. Find the one line that is the delivery-vs-deployment switch — the deploy-to-prod step gated behind
a manual approval (`environment:` with a required reviewer, commented in the file). Decide, for
the `tasks-app`, which side you'd choose and why, and ask your AI assistant to make the case for
the *other* choice. The goal isn't a "right" answer; it's being able to articulate the risk
posture either way.
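Tracing the `needs:` chain by hand can be checked against a small model. The job names below are the four jobs in `cd-starter.yml`; `jobs_that_succeed` is a hypothetical helper that assumes the default behavior where a job is skipped when any job it needs did not succeed.

```python
# needs: relationships copied from cd-starter.yml.
NEEDS = {
    "check": [],
    "build-and-publish": ["check"],
    "deploy-staging": ["build-and-publish"],
    "deploy-prod": ["deploy-staging"],
}


def jobs_that_succeed(needs: dict[str, list[str]], failed: set[str]) -> list[str]:
    """Return the jobs that complete, in order, given which jobs fail."""
    done: list[str] = []
    remaining = dict(needs)
    progressed = True
    while remaining and progressed:
        progressed = False
        for job, deps in list(remaining.items()):
            if job in failed:
                continue  # a failed job never completes, so its dependents never start
            if all(dep in done for dep in deps):
                done.append(job)
                del remaining[job]
                progressed = True
    return done


print(jobs_that_succeed(NEEDS, failed=set()))
print(jobs_that_succeed(NEEDS, failed={"check"}))           # nothing ships
print(jobs_that_succeed(NEEDS, failed={"deploy-staging"}))  # prod is never reached
```

The chain is linear on purpose: every stage to the right inherits every gate to its left, which is what lets the prod job assume the checks passed.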
> **A note on running the full pipeline:** actually executing `cd-starter.yml` end to end needs a
> forge with a container registry and a deploy target wired up — that's environment-specific and
> partly Module 19's territory (the runners and compute underneath). Parts A-D give you the deploy
> *logic* runnable today on your own machine; the YAML shows how it slots into the automated
> pipeline you already started in Module 14.
---
## Where it breaks
Be honest about the edges — this is where teams get burned.
- **The deploy is only as safe as the gates in front of it.** Continuous deployment with weak tests
and no review isn't "moving fast," it's an automated mistake-shipping machine. If you haven't done
the Module 10/14/15 work, do *delivery* (human on the button), not *deployment*. Auto-deploy is a
reward you earn by trusting your gates, not a default you turn on.
- **Health checks lie.** A `200` from `/health` means "the process started," not "the feature
works." A shallow health check passes while the app returns garbage to users. Make the check
meaningful (does it reach its database? can it serve a real request?) and lean on canary/gradual
rollout for anything important — but know that no health check replaces real tests and real
monitoring.
- **Rollback isn't free, and some things don't roll back.** Reverting the *running image* is cheap.
Reverting a **database migration**, a sent email, a charged credit card, or a published message is
not — those are forward-only. The cleaner the separation between code deploys and irreversible
state changes, the more rollback actually saves you. Don't assume "we can always roll back" covers
data.
- **This lab simulates the target.** A local `docker run` is the deploy logic, not the deploy
reality. Real targets add networking, DNS cutover, load balancers, zero-downtime orchestration,
and multiple instances. The five steps hold; the operational surface around them is larger. The
*compute* that runs all of this — and why you might run your own — is Module 19.
- **"Build once" only holds if you actually do.** The instant someone rebuilds on the prod box "just
to be sure," you've lost the guarantee that prod runs what CI tested. Deploy the artifact CI built.
No rebuilds downstream.
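On the "health checks lie" point, one common way to make `/health` more honest is to have it run a set of named dependency checks and report each one. The sketch below is an assumption about how that could look, not what the lab's `serve.py` does: `health_report` and the check names are hypothetical, and the `500` status simply matches what the lab service returns when unhealthy.

```python
from typing import Callable


def health_report(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict]:
    """Run named dependency checks; return an HTTP status and a JSON-ready body."""
    results: dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a check that crashes counts as unhealthy
    ok = all(results.values())
    body = {"status": "ok" if ok else "unhealthy", "checks": results}
    return (200 if ok else 500), body


def broken_database() -> bool:
    raise ConnectionError("database unreachable")  # simulated outage


status, body = health_report({
    "process_started": lambda: True,
    "database": broken_database,
})
print(status, body)
```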
---
## Check for understanding
**You're done when:**
- You can state the difference between continuous delivery and continuous deployment in one sentence
— *who clicks the prod button* — and say which one `tasks-app` should use and why.
- `./deploy.sh` builds, tags by commit SHA, runs the container, and reports a healthy deploy you can
`curl`.
- You have **watched a bad deploy roll itself back** to the previous good version, and the service
stayed up.
- You can point at the line in `cd-starter.yml` that turns delivery into deployment, and explain what
gates have to be trustworthy before you'd flip it.
When a deploy is one command, a bad one reverts itself, and you can argue the delivery-vs-deployment
call for a given repo, you've closed the merged-to-running gap. Module 19 goes underneath all of
this — the runners and compute actually executing your CI/CD, and why you'd own them.
---
## Verify-before-publish
This is expansion-zone material (Module 15+); some specifics drift. Re-check at build/publish time:
- [ ] **Action/runner versions** in `cd-starter.yml` (`actions/checkout`, `actions/setup-python`,
any build/login/push actions) — pin to current major versions and confirm they still exist.
- [ ] **Registry login + push syntax** — the standard build-and-push action names and auth flow
change; verify against current forge docs rather than the comments here.
- [ ] **Manual-approval mechanism** — the way a forge gates a job behind human approval
(GitHub `environment` protection rules, GitLab `when: manual`, others) shifts in naming/UI.
Confirm the delivery-vs-deployment switch still maps to the current feature.
- [ ] **Container runtime commands** — confirm `docker`/`podman` flags used in `deploy.sh`
(`build`, `run`, `rm`) and the `Dockerfile` `HEALTHCHECK` options (`--interval`, `--timeout`, `--retries`) match current behavior.
- [ ] **Cross-references** to Modules 16, 17, and 19 still match those modules' final content.
@@ -0,0 +1,24 @@
# The Module 16 container image for the tasks-app, set to run the HTTP service from serve.py.
#
# This is *what you ship* (Module 16). Continuous delivery/deployment (this module) builds this
# once, tags it with the commit SHA, and runs that exact artifact everywhere.
#
# Note what is NOT here: no secrets, no environment-specific config. Those are injected at run time
# (Module 17), which is why the same image can run in staging and prod unchanged.
FROM python:3.12-slim
WORKDIR /app
# The app is dependency-free (stdlib only), so there is nothing to pip install. Copy the source.
COPY tasks.py cli.py serve.py ./
# Document the port the service listens on.
EXPOSE 8000
# A built-in container health check. The deploy step also checks /health from outside, but this
# lets the runtime itself know whether the container is healthy.
HEALTHCHECK --interval=5s --timeout=3s --retries=3 \
  CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/health').status==200 else 1)"
CMD ["python", "serve.py"]
@@ -0,0 +1,87 @@
# Starter CD pipeline for the tasks-app — GitHub Actions flavor, extending the Module 14 CI file.
#
# The whole idea: CD is not a new system. It is MORE STAGES on the SAME pipeline, after the checks
# pass. The lint/test gates below are the Module 14 pipeline, unchanged. Everything from the
# `build-and-publish` job down is new in this module.
#
# Where this file goes: .github/workflows/cd.yml (or fold it into your existing ci.yml). On GitLab,
# the same shape is stages in .gitlab-ci.yml with `needs:`/`rules:`; Forgejo/Gitea use Actions-
# compatible YAML. The concept — gated stages from merge to running — is identical everywhere.
#
# VERIFY BEFORE PUBLISH: action versions, the registry login/build-push action names, and the
# manual-approval mechanism all drift. Check current forge docs at build time (see README checklist).
name: CD

on:
  push:
    branches: [main]        # only a MERGE to main triggers a deploy
  pull_request:             # PRs still run the gates, but never deploy

jobs:
  # ---- The Module 14 gates: nothing ships without passing these first. ----------------------------
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install pytest ruff
      - run: ruff check .            # lint
      - run: pytest -q               # test
      # In a real pipeline a security-scan job (Module 15) would also gate here.

  # ---- Build the artifact ONCE and publish it. The unit of deploy is an immutable, SHA-tagged image.
  build-and-publish:
    needs: check                     # only runs if the gates passed
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Log in to your container registry (Module 16's images need a durable home, like a Git remote
      # is for commits). Registry/credentials are provider-specific; supply them as secrets,
      # never inline (Module 17).
      # - uses: docker/login-action@v3
      #   with:
      #     registry: ${{ vars.REGISTRY }}
      #     username: ${{ secrets.REGISTRY_USER }}
      #     password: ${{ secrets.REGISTRY_TOKEN }}
      # Build and push, tagging with the commit SHA (immutable + traceable) and :staging (moving).
      # - uses: docker/build-push-action@v6
      #   with:
      #     push: true
      #     tags: |
      #       ${{ vars.REGISTRY }}/tasks-app:${{ github.sha }}
      #       ${{ vars.REGISTRY }}/tasks-app:staging
      - run: echo "build + push tasks-app:${{ github.sha }} (wire up the registry steps above)"

  # ---- Deploy to a NON-prod environment automatically. Safe to do on every merge. ----------------
  deploy-staging:
    needs: build-and-publish
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      # The five deploy steps live in deploy.sh in this folder. On a real target this would run the
      # platform's deploy (kubectl / platform CLI / compose) against the SHA-tagged image, inject
      # runtime config + secrets (Module 17), health-check, and roll back on failure.
      - run: echo "deploy tasks-app:${{ github.sha }} to STAGING, health-check, roll back if red"

  # ---- THIS JOB IS THE DELIVERY-vs-DEPLOYMENT SWITCH. ---------------------------------------------
  #
  # As written, `environment: production` requires a human to approve before this job runs (set a
  # required reviewer on the 'production' environment in the forge). That is CONTINUOUS DELIVERY:
  # the artifact is auto-built and staged; a person clicks to ship to prod.
  #
  # Delete the `environment:` block and this becomes CONTINUOUS DEPLOYMENT: merge -> prod, no human.
  # Only remove it once you trust your review + CI + security gates (Modules 10/14/15) more than you
  # trust the click. On GitLab the equivalent switch is `when: manual` vs. automatic.
  deploy-prod:
    needs: deploy-staging
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production          # <-- required-reviewer gate = delivery. Remove = deployment.
    steps:
      - run: echo "deploy tasks-app:${{ github.sha }} to PRODUCTION (gated on human approval)"
@@ -0,0 +1,95 @@
#!/usr/bin/env bash
#
# deploy.sh — the deploy step of CD, simulated with a local container run.
#
# The five steps of any deploy, provider-neutral (see the module README):
# 1. build/pull the specific image tag 4. health-check before trusting it
# 2. inject runtime config + secrets 5. cut over if healthy, ROLL BACK if not
# 3. start the new version
#
# The *target* here is your own machine instead of a server, but the logic is the real thing.
#
# Usage:
#   ./deploy.sh <tag>            # e.g. ./deploy.sh $(git rev-parse --short HEAD)
#   BREAK=1 ./deploy.sh <tag>    # force the new version's health check to fail, to demo rollback
#
# Requires: docker (or `alias docker=podman`), curl.
set -euo pipefail
IMAGE="tasks-app"
CONTAINER="tasks-app"
PORT="8000"
STATE_FILE=".deploy-state" # records the last good tag, for rollback
TAG="${1:-$(git rev-parse --short HEAD)}"
say() { printf '\n=== %s\n' "$*"; }
# --- Step 1: build the artifact for this tag (in real CD this was already built+pushed by CI) -----
say "Building ${IMAGE}:${TAG}"
docker build -t "${IMAGE}:${TAG}" .
# Remember what is currently running so we can roll back to it.
PREVIOUS=""
if [ -f "${STATE_FILE}" ]; then
  PREVIOUS="$(cat "${STATE_FILE}")"
fi
# --- Steps 2 + 3: start the new version with runtime config/secrets injected (Module 17) ----------
# Note: APP_VERSION is config supplied at run time, NOT baked into the image. A real deploy would
# also pass secrets here (e.g. --env-file, a mounted secret, or a secrets-manager lookup) — never
# committed, never in the image.
start_version() {
  local tag="$1"
  docker rm -f "${CONTAINER}" >/dev/null 2>&1 || true
  docker run -d --name "${CONTAINER}" \
    -p "${PORT}:8000" \
    -e "APP_VERSION=${tag}" \
    ${BREAK:+-e "BREAK=${BREAK}"} \
    "${IMAGE}:${tag}" >/dev/null
}
say "Starting ${IMAGE}:${TAG}"
start_version "${TAG}"
# --- Step 4: health-check the new version before trusting it --------------------------------------
healthy() {
  for _ in $(seq 1 10); do
    if curl -fs "http://localhost:${PORT}/health" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  return 1
}
say "Health-checking http://localhost:${PORT}/health"
if healthy; then
  # --- Step 5a: cut over. Record this as the new known-good for the next deploy's rollback target.
  echo "${TAG}" > "${STATE_FILE}"
  say "DEPLOY OK — ${IMAGE}:${TAG} is live and healthy"
  curl -s "http://localhost:${PORT}/health"; echo
  exit 0
fi
# --- Step 5b: ROLLBACK. The new version failed its health check. ----------------------------------
say "HEALTH CHECK FAILED for ${IMAGE}:${TAG} — rolling back"
docker rm -f "${CONTAINER}" >/dev/null 2>&1 || true
if [ -z "${PREVIOUS}" ]; then
  echo "No previous known-good version to roll back to. Service is DOWN." >&2
  echo "(Deploy a healthy version first, then re-run the BREAK=1 deploy to see rollback work.)" >&2
  exit 1
fi
# Rollback is trivial because we deploy immutable tags: just run the old one again. No rebuild.
say "Restoring previous good version ${IMAGE}:${PREVIOUS}"
BREAK="" start_version "${PREVIOUS}" # clear BREAK so the good version comes up clean
if healthy; then
  say "ROLLED BACK — ${IMAGE}:${PREVIOUS} is live and healthy. The bad deploy reverted itself."
  curl -s "http://localhost:${PORT}/health"; echo
  exit 1   # exit non-zero: the deploy you asked for did NOT ship, even though service recovered
else
  echo "Rollback FAILED — service is DOWN. Investigate ${IMAGE}:${PREVIOUS}." >&2
  exit 2
fi
@@ -0,0 +1,67 @@
"""Minimal HTTP face for the tasks-app, so there is something long-running to *deploy*.
Standard library only — no pip install, so the container image stays tiny and the lab has no
dependencies to drift. It reuses the TaskList from tasks.py (Modules 1-2) unchanged.
Run it:
python serve.py # serves on http://localhost:8000
Endpoints:
GET /health -> {"status": "ok", "version": <APP_VERSION>} (200)
GET /tasks -> the current tasks as JSON
Two environment knobs make this realistic for the CD lab (config injected at run time, Module 17):
APP_VERSION what /health reports as the running version (set by deploy.sh to the commit SHA)
BREAK=1 force /health to return 500 — a stand-in for "this build starts but is broken",
used in Part D to trigger an automatic rollback.
"""
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

from tasks import Task, TaskList

STATE = Path(__file__).parent / "tasks.json"
PORT = int(os.environ.get("PORT", "8000"))
APP_VERSION = os.environ.get("APP_VERSION", "dev")
BREAK = os.environ.get("BREAK") == "1"


def load() -> TaskList:
    if not STATE.exists():
        return TaskList()
    raw = json.loads(STATE.read_text())
    return TaskList(tasks=[Task(**t) for t in raw])


class Handler(BaseHTTPRequestHandler):
    def _send(self, code: int, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/health":
            # A real health check would also confirm dependencies (db, etc.) are reachable.
            if BREAK:
                self._send(500, {"status": "unhealthy", "version": APP_VERSION})
            else:
                self._send(200, {"status": "ok", "version": APP_VERSION})
        elif self.path == "/tasks":
            tlist = load()
            self._send(200, {"tasks": [t.__dict__ for t in tlist.tasks]})
        else:
            self._send(404, {"error": "not found"})

    def log_message(self, *args) -> None:  # keep the lab output clean
        pass


if __name__ == "__main__":
    print(f"serving tasks-app version={APP_VERSION} on http://localhost:{PORT}")
    ThreadingHTTPServer(("0.0.0.0", PORT), Handler).serve_forever()