fbec36cb67
Scaffold the course repo and author the full curriculum in dependency-chain order, following the settled build decisions in handoff.md. - Scaffold: course README, vendor-neutral AGENTS.md (dogfoods Module 5), _TEMPLATE.md (the fixed 9-section module shape), root .gitignore, ship config. - Modules 1-2: reference exemplars (locked for tone/depth/lab style). - Modules 3-27: full lessons + runnable labs, each following the template, respecting the chain, vendor/model-agnostic, with "feel the pain" labs. - Module 8 hosting comparison web-researched and date-stamped (as of 2026-06-22), not written from memory; expansion-zone modules carry Verify-before-publish. - Capstone: the full loop end to end on the running tasks-app example. Lab code syntax-checked (Python/shell/YAML); every module has the 7 core template sections. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01TfzV5QvtPDz8LJS3Pu5VLT
362 lines
18 KiB
Markdown
# Module 14 — Continuous Integration

> **The AI writes code that looks right. CI is the tireless reviewer that checks whether it actually
> is — automatically, on every single push, before anyone trusts it.** This module turns the tests
> you wrote in Module 13 into a gate that runs itself.

---

## Prerequisites

- **Module 8 — Remotes and Hosting.** CI runs *on the forge*, triggered by pushes. You need a repo
  pushed to a remote (any forge — GitHub, GitLab, a self-hosted Forgejo/Gitea, whatever you set up
  in Module 8) for there to be anything to trigger.
- **Module 13 — Testing in the AI Era.** CI is mostly "run the tests, automatically." You need tests
  to run. If you skipped writing them, this module's lab ships a small suite so you're not blocked,
  but the real payoff is automating *your* tests.
- **Module 2 — Version Control.** Pushes, commits, and the diff habit are the substrate CI sits on.

You do **not** need Docker, secrets management, or your own runner yet — those are Modules 16, 17,
and 19. This module uses the forge's hosted runners, which require zero setup.

---

## Learning objectives

By the end of this module you can:

1. Explain what CI actually is — automated checks bound to a trigger — and why "on every push" is the
   part that makes it valuable.
2. Write a forge-native CI workflow that checks out your code, installs its tools, and runs a linter
   and your test suite.
3. Read a CI run: find which step failed, read the log, and reproduce the failure locally.
4. Watch CI catch a breaking change *before* it reaches anyone who would trust the broken code.
5. Recognize that CI is the same concept on every forge, and port a pipeline from one to another.

---

## Key concepts

### What CI is, stripped down

Continuous Integration has a grand-sounding name and a mundane core: **a set of checks that run
automatically whenever you push code, on a clean machine you don't control.** That's it. The checks
are usually the same commands you'd run by hand — lint, build, test — and the magic is entirely in
the word *automatically*.

You already run checks. Before you commit, you (sometimes) run the tests, (sometimes) run the
linter, (sometimes) remember to. CI removes every "sometimes." It runs the checks the same way,
every time, on every push, whether you remember or not, whether you're tired or not, whether it's a
one-line fix you're *sure* about or not. The discipline you can't reliably enforce on yourself, a
machine enforces for free.

Three properties make CI more than a glorified shell script:

- **It's triggered, not invoked.** You don't run CI; pushing runs it. The check is bound to the
  event, so it can't be skipped by forgetting.
- **It runs on a clean machine.** The forge spins up a fresh, throwaway runner with nothing of yours
  on it — no half-installed dependency, no environment variable you set six months ago and forgot.
  If your code only works because of something special about your laptop, CI finds out immediately.
  ("Works on my machine" dies here. Module 16 takes the reproducibility idea further with
  containers.)
- **Its result is visible and shared.** A green check or a red X shows up on the commit and on the
  pull request (Module 10), where everyone — every human reviewer and, later, every agent — can see
  whether this code passed the gate.

### The pipeline: checkout → setup → checks

Almost every CI configuration, on every forge, is the same four moves:

1. **Check out the code** onto the runner. The runner starts empty; first you put your repo on it.
2. **Set up the environment** — install the language runtime, pin its version.
3. **Install the tools** the checks need — the test runner, the linter.
4. **Run the checks** — lint, then test. Any check that exits non-zero fails the whole run.

That last point is the load-bearing one. CI's entire enforcement mechanism is the **exit code**.
Every tool you'd run in a terminal returns 0 for success and non-zero for failure. `pytest` exits
non-zero if a test fails. `ruff check` exits non-zero if it finds a lint problem. CI runs your
commands and watches those exit codes; one failure turns the run red. You're not learning a new
testing system — you're wiring the tools you already have to a trigger.

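To make the exit-code mechanism concrete, here is a minimal sketch of what a runner does with a list of steps. It is an illustration only, not how any forge implements its runner: `run_steps` and `fake` are hypothetical helpers, and the commands are stand-ins that exit with a chosen code rather than real `ruff` or `pytest` calls.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, command) steps in order and stop at the first non-zero exit code.

    Returns (ok, results), where results maps each step name to
    "passed", "failed", or "skipped".
    """
    results = {}
    failed = False
    for name, command in steps:
        if failed:
            # Steps after the first failure never run.
            results[name] = "skipped"
            continue
        completed = subprocess.run(command)
        if completed.returncode == 0:
            results[name] = "passed"
        else:
            results[name] = "failed"
            failed = True
    return (not failed, results)


def fake(exit_code):
    """Stand-in command (an assumption for this sketch): exits with the given code."""
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]


ok, results = run_steps([
    ("Lint", fake(0)),
    ("Test", fake(1)),        # pretend a test failed
    ("Later step", fake(0)),  # hypothetical step that should be skipped
])
print(ok, results)
```

Swapping the stand-ins for `["ruff", "check", "."]` and `["pytest", "-q"]` would give you a local approximation of the run, minus the clean machine.
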
### What goes in a CI run for this audience

Three tiers of check, cheapest first, because a fast check that fails early saves you waiting on a
slow one:

- **Lint** — static checks that don't run your code: style, unused imports, obvious mistakes. Fast,
  cheap, catches a surprising amount. We use a linter as the example here; the principle is
  tool-agnostic.
- **Build** — does the code even assemble? For an interpreted language like our Python example
  there's no compile step, so "build" often collapses into "does it import without erroring." For
  compiled languages this is where a broken type or missing symbol gets caught.
- **Test** — the Module 13 suite. The expensive, high-value tier: it actually runs your code and
  checks behavior.

Order them cheap-to-expensive so the fast checks fail fast. There's no reason to spend two minutes
running the test suite if the linter would have rejected the push in three seconds.

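For the build tier in Python, the cheapest version of "does it even assemble" is asking the interpreter to parse the source. The sketch below shows that check on two made-up snippets. `compiles` is a hypothetical helper, and it only catches syntax errors, not failures that happen at import time.

```python
def compiles(source: str, filename: str = "<example>") -> bool:
    """Return True if the source parses as valid Python, False on a syntax error."""
    try:
        compile(source, filename, "exec")
    except SyntaxError:
        return False
    return True


# Two illustrative snippets (not the real tasks.py).
good = "def pending(tasks):\n    return [t for t in tasks if not t['done']]\n"
broken = "def pending(tasks)\n    return tasks\n"  # missing colon after the signature

print(compiles(good))    # True
print(compiles(broken))  # False
```

In a pipeline, a step such as `python -m compileall .` plays a similar role across a whole directory. Treat that as one possible choice, not something the lab workflow includes.
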
### The worked example: a forge-native workflow

Here's a complete, real CI pipeline for the `tasks-app`. This is GitHub Actions YAML — the most
common dialect, and our default example — but **read it as a concept, not a product.** Every forge
has the exact same pipeline in its own dialect; the GitLab version is in the lab folder, and it's
the same four moves.

```yaml
name: CI

on:
  push:
  pull_request:

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - name: Check out the code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install tools
        run: pip install pytest ruff
      - name: Lint
        run: ruff check .
      - name: Test
        run: pytest -q
```

Reading it top to bottom: `on:` is the trigger (push and pull request). `runs-on:` picks the clean
machine. The `steps:` are the four moves — checkout, set up Python, install the tools, then the two
checks. `uses:` pulls in a pre-built action (someone else's reusable step); `run:` is just a shell
command. The linter runs first because it's cheap; the tests run last because they're the
expensive, decisive check.

This file lives *in the repo*, committed and versioned like everything else. That's deliberate and
on-thesis: your pipeline is code, it's reviewed as a diff in a PR (Module 10), and a teammate or an
agent inherits it automatically by cloning. The same logic as committing the AI's config in
Module 5 — the automation around your work is itself a durable, shared artifact.

### Reading a failed run

When CI goes red, the skill is triage, and it's fast once you know the shape:

1. **Open the run.** The forge shows the job as a list of steps with a red X on the one that failed.
2. **The first red step is the cause.** Steps run in order and stop at the first failure; everything
   after it is skipped, not broken. Don't get distracted by the skipped steps.
3. **Read that step's log.** It's the same output the tool prints in your terminal — a failing
   `pytest` assertion, a `ruff` finding with a file and line number. CI didn't invent a new error
   format; it's showing you the command's own output.
4. **Reproduce it locally.** Run the exact command from the failed step (`pytest -q` or
   `ruff check .`) on your machine. It will fail the same way, because CI ran the same command. Fix
   it locally, confirm it's green locally, push again.

That loop — red on the forge, reproduce locally, fix, push — is the entire day-to-day of working
with CI. The clean-machine runner occasionally surfaces a failure you *can't* reproduce locally;
that's not CI being flaky, that's CI correctly catching that your machine has something the clean
one doesn't. (See "Where it breaks.")

---

## The AI angle

This is the module where CI stops being generic devops hygiene and becomes specifically, urgently
about AI-assisted work.

AI generates code that **looks right.** That's not a knock on the models — it's their defining
property. They produce fluent, plausible, well-formatted code that passes a human skim, because
"looks like correct code" is close to what they're optimizing for. The failure mode isn't garbage
that obviously won't run; it's the function that's 95% right with a flipped comparison, the refactor
that quietly drops an edge case, the "cleanup" that breaks one path you didn't think to re-check.
A human reviewer skimming a confident-looking diff is exactly the reviewer that misses these
(Module 10 is the whole skill of *not* missing them — and it's hard).

CI is the reviewer that doesn't skim. It runs the code. It doesn't care how clean the diff looks or
how confidently the commit message is worded — it executes the tests and reports the exit code. The
flipped comparison fails an assertion. The dropped edge case fails the test that covered it. The
plausibility that fools a human is invisible to a process that only checks behavior.

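A small sketch of that difference between appearance and behavior, using the `pending()` example this module's lab breaks on purpose. The `TaskList` class here is a hypothetical stand-in, since the real `tasks.py` may be shaped differently, and `pending_excludes_completed` is an illustrative check rather than the lab's actual test.

```python
class TaskList:
    """Hypothetical stand-in for the tasks-app model; the real tasks.py may differ."""

    def __init__(self):
        self.tasks = []

    def add(self, title, done=False):
        self.tasks.append({"title": title, "done": done})

    def pending(self):
        # Intended behavior: only tasks that are not done.
        return [t for t in self.tasks if not t["done"]]


class SimplifiedTaskList(TaskList):
    def pending(self):
        # The plausible "cleanup": shorter, reads fine, returns every task.
        return self.tasks


def pending_excludes_completed(task_list_cls) -> bool:
    """Behavior check: a completed task must not appear in pending()."""
    task_list = task_list_cls()
    task_list.add("write tests", done=True)
    task_list.add("set up CI", done=False)
    return [t["title"] for t in task_list.pending()] == ["set up CI"]


print(pending_excludes_completed(TaskList))            # True
print(pending_excludes_completed(SimplifiedTaskList))  # False
```

Both versions of `pending()` would survive a skim of the diff. Only running the check tells them apart, which is the job CI does on every push.
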
This compounds with everything else AI changes about your workflow:

- **AI raises your push rate.** You're making more changes, faster, more of them generated. Manual
  pre-push checking scales with discipline and doesn't survive volume. The automated gate scales
  for free — it doesn't get tired on the fortieth push of the day.
- **AI can fix what CI catches.** A red CI run is a precise, machine-readable problem statement: the
  exact command, the exact failing assertion, the exact line. That's ideal input for an agent —
  paste the failed log and ask it to fix the failure. (Module 25 automates this into agents that
  respond to a failing pipeline on their own. CI is the trigger that makes self-healing possible.)
- **CI is the gate that makes letting agents run safely possible at all.** Every later module that
  hands the AI more autonomy — issue-to-PR agents, unattended runs — relies on the fact that nothing
  the agent produces reaches anyone without passing CI first. The supervision is structural: it's
  this gate, not a human watching the agent type.

You don't add CI *despite* using AI. The faster and more confidently the AI writes plausible code,
the more you need a reviewer that checks behavior instead of believing the diff.

---

## Hands-on lab

**Lab language:** YAML (the CI config) plus the Python `tasks-app` and shell commands. You won't
write much by hand — you'll commit a starter workflow, watch it pass, then break it on purpose.

**You'll need:**

- The `tasks-app` from Modules 1–2, **pushed to a forge** (Module 8). Any forge works.
- The starter files in this module's `lab/`:
  - `ci-starter.yml` — the workflow (GitHub Actions flavor).
  - `gitlab-ci-starter.yml` — the same pipeline for GitLab, if that's your forge.
  - `test_tasks.py` — a small test suite (use your Module 13 tests instead if you have them).
- Python 3.10+ locally, and your AI assistant.

### Part A — Run the checks locally first

Never push a workflow you haven't run by hand. CI just runs the same commands — prove they work on
your machine first.

1. Copy `lab/test_tasks.py` into your `tasks-app` folder (next to `tasks.py`). Install the tools and
   run both checks exactly as CI will:

   ```bash
   cd ~/workflow-course/tasks-app
   pip install pytest ruff
   pytest -q          # should report all tests passing
   ruff check .       # should report no issues (or fix what it flags)
   ```

   If both are clean locally, CI will be green. If not, fix it here — it's faster than waiting on a
   runner.

### Part B — Add the workflow and watch it pass

2. Put the workflow where your forge looks for it:
   - **GitHub / Forgejo / Gitea:** copy `lab/ci-starter.yml` to `.github/workflows/ci.yml` in your
     repo (Forgejo/Gitea also read `.forgejo/workflows/` or `.gitea/workflows/` — check yours).
   - **GitLab:** copy `lab/gitlab-ci-starter.yml` to `.gitlab-ci.yml` at the repo root.

3. Commit and push it:

   ```bash
   git add .github/workflows/ci.yml test_tasks.py   # adjust path for your forge
   git commit -m "Add CI: lint and test on every push"
   git push
   ```

4. Open your repo in the forge's web UI and find the run (usually an "Actions," "CI/CD," or
   "Pipelines" tab, and a status icon on the commit). Watch the steps execute and turn green.
   **That green check is the gate now standing guard on every future push.**

### Part C — Break it on purpose and watch CI catch it

This is the whole point. You're going to ship the kind of plausible-but-wrong change AI produces,
and watch CI stop it.

5. Introduce a breaking change. Ask your AI assistant — in the browser, or with your
   editor-integrated tool from Module 4 — for something that *sounds* like a cleanup but changes
   behavior. For example: *"Refactor `pending()` in tasks.py to be simpler"* and, if it stays
   correct, nudge it until the logic actually changes — or just make the change yourself to feel it.
   A classic plausible break: have `pending()` return `self.tasks` (all tasks) instead of filtering
   out the done ones. It reads fine. It's wrong.

6. **Notice it still looks right.** Glance at the diff. The function is short, clean, plausible.
   This is exactly the trap from "The AI angle" — nothing in the *appearance* warns you.

7. Commit and push it:

   ```bash
   git add tasks.py
   git commit -m "Simplify pending()"
   git push
   ```

8. Watch CI go red. Open the run, find the first failed step (`Test`), and read the log:
   `test_pending_excludes_completed_tasks` failed, with the assertion and the actual-vs-expected
   values. CI caught in seconds what a skim would have waved through.

9. Reproduce and fix. The bad change is already committed, so restore `tasks.py` from the commit
   before it rather than from the working tree:

   ```bash
   pytest -q                              # fails locally too — same command, same failure
   git restore --source=HEAD~1 tasks.py   # bring back the last good version (Module 2's safety net)
   pytest -q                              # green again locally
   git commit -am "Revert: pending() must exclude completed tasks"
   git push                               # CI goes green again
   ```

10. *(Optional, to feel the linter tier.)* Add an obviously unused import to `cli.py`
    (`import os` at the top, unused), commit, and push. Watch the **Lint** step fail *before* the
    tests even run — the cheap check failing fast. Remove it and push again.

You've now seen both halves: CI passing as a quiet guardrail, and CI failing as the reviewer that
caught a change you might have trusted.

---

## Where it breaks

The honest caveats, because a skeptical audience trusts the limits more than the pitch:

- **CI only catches what your checks check.** A green run means "the linter found nothing and the
  tests passed" — not "the code is correct." If the AI broke behavior you have no test for, CI is
  cheerfully green while the bug ships. CI is exactly as good as your test suite (Module 13), and no
  better. The `pending()` bug in the lab got caught *because a test covered it.*
- **Green CI is not "reviewed."** It checks behavior, not design, intent, security, or whether the
  feature is even the right one. It does not replace human review (Module 10) or the security gates
  in Module 15 — it sits alongside them. Treating a green check as sign-off is how plausible-wrong
  code with no failing test sails straight through.
- **The clean machine is a feature that feels like a bug.** Sooner or later CI fails in a way you
  can't reproduce locally — a dependency you have installed but never declared, a file outside the
  repo your code quietly reads, a path that only exists on your machine. That's not flakiness; it's
  CI correctly catching that your code depends on something that isn't in the repo. Fix the
  dependency, don't blame the runner. (Module 16's containers make local and CI environments
  identical, which kills most of these.)
- **Slow CI gets ignored.** If the run takes fifteen minutes, people stop waiting for it and start
  merging around it, and the gate is worthless. Keep it fast: cheap checks first, and don't put
  things in CI that don't need to run on every push.
- **CI is not free compute, and it's not infinite.** Hosted runners have usage limits and queue
  times, and a workflow that triggers on every push to every branch can burn through them. (Module
  19 is where you understand and own that compute.)
- **A committed workflow runs code from the repo.** A pull request from an untrusted fork can
  propose changes to the workflow itself. Forges have settings for how CI handles fork PRs; the
  defaults are usually safe, but it's a real attack surface worth knowing exists (the supply-chain
  thread picks up in Modules 15 and 22).


---

## Check for understanding

**You're done when:**

- Your `tasks-app` has a committed CI workflow that runs a linter and your tests on every push, and
  you've watched it go green on the forge.
- You pushed a plausible-but-wrong change and watched CI catch it — found the failed step, read the
  log, reproduced the failure locally, and fixed it.
- You can explain, in your own words, why CI specifically matters for AI-generated code (it checks
  behavior, not appearance) and the one thing a green check does *not* tell you (that the code is
  correct — only that your checks passed).
- You can point at the same pipeline in two forge dialects and see it's the same four moves.

When pushing a change and *expecting* the gate to either bless it or stop it feels automatic — when
you'd be uneasy merging code that hadn't been through CI — you've got it. Module 15 adds the next
gates on the same pushes: scanning for vulnerable dependencies, leaked secrets, and the packages AI
hallucinates into existence.

---

## Verify-before-publish

CI YAML and the actions it references drift faster than the rest of this durable-core material.
Re-check at build time:

- [ ] **Action versions.** Confirm `actions/checkout` and `actions/setup-python` major versions in
  `ci-starter.yml` are current and not deprecated. Pinned majors (`@v4`, `@v5`) age.
- [ ] **Runner labels.** Confirm `ubuntu-latest` (and any GitLab `image:` tag) still resolves to a
  supported image; default runner OS versions roll forward.
- [ ] **Trigger and config syntax.** Verify the `on:` keys and overall workflow schema against the
  forge's current docs — Actions YAML keys do change.
- [ ] **Forge UI labels.** The tab names in the lab ("Actions," "CI/CD," "Pipelines") and the
  workflow file locations (`.github/workflows/`, `.gitlab-ci.yml`, `.forgejo/`, `.gitea/`) match
  what the current forge versions actually use.
- [ ] **Tool names.** The example linter and test runner (`ruff`, `pytest`) are current, installable,
  and still behave as described — or swap in the equivalents the rest of the course uses.