
# Module 17 — Secrets, Config, and Environments

> **Ask an AI to "connect to the API" and it will cheerfully paste your secret key straight into
> a source file — the one place it must never go.** This module gives you the standard, boring,
> correct place to put secrets and per-environment config instead, and a reflex for catching the
> AI when it does the wrong thing.

---

## Prerequisites

- **Module 2 — Version Control as a Safety Net.** You need `.gitignore` and the habit of reading
  `git diff` before you commit. Both are load-bearing here.
- **Module 12 — Revert, Reset, and Recovery.** You learned that Git history is forever and that
  secrets *don't belong in it* — this module is the practical follow-through on that promise.
- **Module 15 — Security Scanning for AI-Generated Code.** Secret scanning is the automated gate
  that catches a hardcoded key after the fact. This module is the *prevention* that means the gate
  rarely has to fire.
- **Module 16 — Containers and Reproducible Environments.** A container is a sealed box; config and
  secrets are how you pass the outside world *into* it at run time. That handoff is environment
  variables, which is exactly what this module is about.

You can attempt the lab with only Modules 1–2, but the *why* leans on 12, 15, and 16.

---

## Learning objectives

By the end of this module you can:

1. Explain why a secret in source code is a different and worse problem than a bug — and why Git
   makes it permanent.
2. Move a secret out of code and into the **environment** (an environment variable or a gitignored
   `.env` file), and have the app read it back at run time.
3. Keep config you *can* commit (a committed template) separate from secrets you *can't* (the real
   `.env`), so a teammate or a fresh AI session knows exactly what to supply.
4. Apply the 12-factor rule — *config lives in the environment, not the build* — to run one codebase
   unchanged across dev, staging, and prod.
5. Describe what a secrets manager buys you over `.env` files, in vendor-neutral terms, and know
   when you've outgrown a file on disk.

---

## Key concepts

### A secret in source is not a bug — it's a leak

A bug is a wrong behavior you can fix and move on from. A hardcoded secret is different: the moment
it's written to a file in a repo, you've started a countdown. Commit it and it's in your history
**forever** — Module 12 was blunt about this: `git revert` writes a *new* commit undoing the
change, but the old commit, with the key in plain text, is still right there in the log for anyone
who clones the repo. Push it (Module 8) and it's now on a server, in every teammate's clone, and in
every backup. "Delete the line and commit again" does not help: the key still sits in the earlier
commit's snapshot, even though the current file no longer shows it.

So the only real fix after a leak is **rotation**: revoke the exposed key at the provider and issue
a new one, treating the old one as compromised. That's expensive and easy to forget, which is why
the entire discipline is built around *never writing the secret to a tracked file in the first
place.* Prevention is the whole game.

What counts as a secret: API keys and tokens, database passwords and connection strings, private
keys and certificates, signing/encryption keys, OAuth client secrets, webhook signing secrets. The
test is simple — *if this string leaked, would someone have to scramble?* If yes, it's a secret and
it does not go in code.

### Config vs. secrets vs. code

Three things often get jumbled into source files. Pulling them apart is the whole mental model:

| Kind | Example | Where it lives | Goes in Git? |
|------|---------|----------------|--------------|
| **Code** | The logic of your app | Source files | **Yes** — that's the point |
| **Config** | Which backend URL, log level, feature flags, timeouts | The environment (often a `.env` *template* you commit, plus real values you don't) | The *template*, yes; the real *values*, only when they aren't sensitive |
| **Secrets** | API keys, passwords, tokens | The environment, sourced from a secret store in real deployments | **Never** |

The dividing line that matters: **config and secrets depend on *where* the app runs, not on *what*
the app does.** Your dev laptop, the staging server, and production all run the same code; they
differ only in config (different URLs) and secrets (different keys). That observation is the entire
12-factor idea below.

### The environment: where config and secrets actually go

An **environment variable** is a named value the operating system hands to a process when it
starts. Every OS has them; your shell is full of them right now (`PATH`, `HOME`). They're the
universal, language-agnostic channel for passing config *into* a program without putting it *in* the
program.

Set one for a single command:

```bash
# macOS / Linux
TASKS_API_KEY="sk-live-..." python sync.py

# Windows PowerShell (the variable stays set for the rest of the session)
$env:TASKS_API_KEY="sk-live-..."; python sync.py
```

Read it back in code — and **fail loudly if it's missing**, because a silent empty string is worse
than a crash:

```python
import os

api_key = os.environ.get("TASKS_API_KEY")
if not api_key:
    raise SystemExit("TASKS_API_KEY is not set. Copy .env.example to .env and fill it in.")
```

That's the whole pattern. The secret never appears in the file; the file only *asks the environment*
for it. Anyone reading the source learns *that a key is needed* but not *what the key is* — which is
exactly the property you want.
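
When an app needs more than one variable, the same fail-loudly idea can be applied to all of them at once. The sketch below is one way to do that; the helper name `require_env` and its optional `env` parameter are illustrative assumptions, not part of the lab files.

```python
import os
from typing import Mapping, Optional


def require_env(names: list[str], env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the requested variables, or exit naming every one that is missing or empty."""
    source = os.environ if env is None else env
    missing = [name for name in names if not source.get(name)]
    if missing:
        raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")
    return {name: source[name] for name in names}


# A plain dict stands in for the real environment in this demo.
settings = require_env(["TASKS_API_KEY"], env={"TASKS_API_KEY": "placeholder-value"})
print(sorted(settings))  # print the names only, never the values
```

Reporting every missing name in one message saves the fix-one-rerun-find-the-next loop when a teammate sets up a fresh checkout.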

### `.env` files: the developer-friendly middle ground

Typing `TASKS_API_KEY=...` before every command gets old, and exported shell variables vanish when
you close the terminal. The conventional fix is a **`.env` file** — a flat list of `KEY=value`
lines, sitting in your project, that gets loaded into the environment when the app starts:

```
APP_ENV=dev
TASKS_API_KEY=sk-live-9f8a7b6c5d4e3f2a1b0c9d8e7f6a5b4c
```

Two non-negotiable rules come with it:

1. **The real `.env` is gitignored. Always.** Add `.env` to your `.gitignore` (Module 2) *before*
   you create the file, so there's never a window where it could be committed. This is the single
   most important line in this module:

   ```gitignore
   # secrets and local config — never commit
   .env
   .env.*
   !.env.example
   ```

   Together, these three patterns say: ignore `.env` and any `.env.something`, **but** keep
   tracking `.env.example` (the `!` un-ignores it). More on that next.

2. **Commit a template, not the secrets.** A `.env.example` (or `.env.template`) lists every
   variable the app needs with **placeholder** values and no real secrets. *This* file you commit.
   It's the documentation that tells a teammate — or the next AI session reading the repo as memory
   (Module 2) — exactly what to supply:

   ```
   # .env.example  (committed)
   APP_ENV=dev
   TASKS_API_KEY=replace-me
   ```
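
To see why the order of the ignore patterns matters, here is a toy model of the "last matching pattern wins" behavior. It is a deliberately simplified sketch that only looks at file names in the repo root; Git's real matcher also handles directories, slashes, and more. The function name `is_ignored` is made up for this example.

```python
from fnmatch import fnmatchcase

# The three patterns from the .gitignore snippet, in file order.
RULES = (".env", ".env.*", "!.env.example")


def is_ignored(filename: str, rules: tuple[str, ...] = RULES) -> bool:
    """Toy model: the last rule that matches decides; a leading '!' un-ignores."""
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if fnmatchcase(filename, pattern):
            ignored = not negated
    return ignored


for name in (".env", ".env.local", ".env.example", "sync.py"):
    print(f"{name}: {'ignored' if is_ignored(name) else 'not ignored'}")
```

If the `!.env.example` line came *before* `.env.*`, the later pattern would win and the template would be ignored too, which is why the negation goes last.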

Loading a `.env` is usually one line via a small library (every major language has one). You can
also load it with a few lines of your own code and zero dependencies — the lab shows the
dependency-free version so it runs anywhere with just the language installed.

> **Naming, not values, is the contract.** Standardize the variable *names* across the team and
> commit them in the template. The values are local and secret; the names are shared and public.
> When the AI writes `os.environ["TASKS_API_KEY"]`, it should match what's in `.env.example`
> exactly — a mismatch is the most common "works on my machine" failure in this whole area.
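
One way to catch a name mismatch early is to compare the variable names in the template against the names in a local file. The following is a minimal sketch under that assumption; `parse_env_names` and `compare_names` are hypothetical helpers, and the sample text uses placeholder values only.

```python
def parse_env_names(text: str) -> set[str]:
    """Collect variable names from KEY=value lines, skipping blanks and comments."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        names.add(line.partition("=")[0].strip())
    return names


def compare_names(template_text: str, local_text: str) -> tuple[set[str], set[str]]:
    """Return (names missing from the local file, names absent from the template)."""
    template, local = parse_env_names(template_text), parse_env_names(local_text)
    return template - local, local - template


example = "# .env.example\nAPP_ENV=dev\nTASKS_API_KEY=replace-me\n"
local = "APP_ENV=dev\nTASK_API_KEY=placeholder\n"  # note the misspelled name
missing, extra = compare_names(example, local)
print("missing from .env:", sorted(missing))
print("not in .env.example:", sorted(extra))
```

The check compares names only and never reads or prints values, so it is safe to run in a shared terminal or a CI log.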

### 12-factor: config in the environment, one build everywhere

The principle behind all of this comes from the [12-factor app](https://12factor.net) guidelines,
and factor III states it plainly: **store config in the environment.** The payoff for this audience:

> You build the artifact **once** and run the *same* artifact in every environment. Nothing about
> dev, staging, or prod is baked into the code or the container image — the differences are injected
> at run time as environment variables.

This is why it pairs so tightly with containers (Module 16). A container image is your immutable,
built-once artifact. You don't build a "staging image" and a "prod image" — you build *one* image
and start it with different environment variables:

```bash
docker run -e APP_ENV=staging -e TASKS_API_KEY="$STAGING_KEY" tasks-app
docker run -e APP_ENV=prod    -e TASKS_API_KEY="$PROD_KEY"    tasks-app
```

Same image, different environment. That's the whole idea, and it's what makes the delivery pipeline
in Module 18 sane: promote one artifact through environments instead of rebuilding per stage.
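
The same idea can be shown without a container. In this sketch, a one-line Python program stands in for the built-once artifact, and the only thing that varies between runs is the environment handed to the process. `PROGRAM` and `run_in` are names invented for the illustration.

```python
import os
import subprocess
import sys

# Stand-in for the built-once artifact: a tiny program that only reads its environment.
PROGRAM = 'import os; print(os.environ["APP_ENV"])'


def run_in(app_env: str) -> str:
    """Run the unchanged program with a different APP_ENV and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", PROGRAM],
        env={**os.environ, "APP_ENV": app_env},  # inherit everything, override one name
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


for stage in ("staging", "prod"):
    print(stage, "->", run_in(stage))
```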

### Per-environment config: dev, staging, prod

"Environments" here means the distinct places your code runs, each with its own config and its own
secrets. The standard three:

- **dev** — your machine. A dev backend, a dev key with low privileges, verbose logging.
- **staging** — a production-like rehearsal. Separate backend, separate key, real-ish data.
- **prod** — the real thing. Real users, the powerful key, conservative settings.

The rule that catches people: **each environment gets its own secrets, and they never mix.** A dev
key must not be able to touch prod data, and a prod key must never sit in a developer's `.env`. The
clean pattern is one variable that *names* the environment (`APP_ENV`), which the code uses to pick
the right URLs and behavior, plus per-environment secret *values* supplied separately:

```python
import os

ENVIRONMENTS = {
    "dev":     "https://api.dev.example-tasks.com/v1",
    "staging": "https://api.staging.example-tasks.com/v1",
    "prod":    "https://api.example-tasks.com/v1",
}

app_env = os.environ.get("APP_ENV", "dev")
backend_url = ENVIRONMENTS[app_env]   # config selected by environment, not hardcoded
```

The *non-secret* per-environment config (which URL goes with which env) is fine to keep in code
like this — it's not sensitive and it's the same everywhere the code runs. Only the *secret values*
and the *choice of which environment this process is* come from outside.
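
A plain dictionary lookup raises a bare `KeyError` if `APP_ENV` holds a value the code doesn't know, such as a typo. A small guard, sketched below, turns that into a readable message in the same fail-loudly spirit as the key check. The function name `select_backend` and its `env` parameter are assumptions made for this example.

```python
import os
from typing import Mapping, Optional

ENVIRONMENTS = {
    "dev":     "https://api.dev.example-tasks.com/v1",
    "staging": "https://api.staging.example-tasks.com/v1",
    "prod":    "https://api.example-tasks.com/v1",
}


def select_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the backend URL for APP_ENV, exiting with a clear message if it is unknown."""
    source = os.environ if env is None else env
    app_env = source.get("APP_ENV", "dev")
    if app_env not in ENVIRONMENTS:
        valid = ", ".join(sorted(ENVIRONMENTS))
        raise SystemExit(f"Unknown APP_ENV {app_env!r}. Expected one of: {valid}.")
    return ENVIRONMENTS[app_env]


print(select_backend({"APP_ENV": "staging"}))
```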

### Secret stores: when a file on disk isn't enough

A gitignored `.env` is the right tool on your laptop. It does not scale to a running fleet, for
reasons that show up fast in real operations:

- A plaintext file on a server is readable by anything that compromises that box.
- You can't **rotate** a key across fifty machines by editing fifty files.
- You get no **audit trail** — no record of who read which secret when.
- There's no **access control** — "this service can read the DB password but not the signing key."

A **secret manager** (the category is also called a secrets store or a vault) solves these. It's a
dedicated service that stores secrets encrypted at rest, hands them out only to authenticated
callers, logs every access, and supports rotation and fine-grained access policies. At run time your
app — or the platform it runs on — fetches the secret from the manager into memory instead of
reading a file. The categories you'll encounter:

- **Cloud-provider managers** — every major cloud has one, tightly integrated with that cloud's
  identity system.
- **Standalone / self-hostable vaults** — dedicated secret-management products you run yourself, a
  good fit for the on-prem and air-gapped scenarios this audience often lives in (the same
  self-host instinct from Module 8).
- **Platform-native secrets** — your container orchestrator and your CI/CD system both have a
  built-in concept of "secrets" you can inject as environment variables, which is how secrets reach
  a pipeline (Module 14) or a deployment (Module 18) without ever touching the repo.

You don't need a manager for the lab or for a solo project. You need it the moment a secret has to
be available to *more than one machine you don't personally babysit*. The mental upgrade is the same
either way: **the app reads its secret from the environment; what populates the environment grows
up from a file to a service.** Your code doesn't change — that's the point of reading from the
environment all along.
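
To make "your code doesn't change" concrete, here is a rough sketch in which two different sources fill an environment mapping and the application-side read stays identical. Everything in it is hypothetical: `from_manager` takes a fetch-by-name function as a stand-in for whatever client a real secret manager provides, and the values are placeholders.

```python
from typing import Callable, Mapping, MutableMapping

# A "source" is any callable returning name -> value pairs. Both sources here are
# hypothetical stand-ins; a real manager would be reached through its own client.
SecretSource = Callable[[], Mapping[str, str]]


def from_dotenv_text(text: str) -> SecretSource:
    """Source backed by KEY=value lines, as in a local .env file."""
    def load() -> Mapping[str, str]:
        pairs = {}
        for line in text.splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                pairs[key.strip()] = value.strip()
        return pairs
    return load


def from_manager(fetch: Callable[[str], str], names: list[str]) -> SecretSource:
    """Source backed by a fetch-by-name function, standing in for a secret manager."""
    return lambda: {name: fetch(name) for name in names}


def populate(target: MutableMapping[str, str], source: SecretSource) -> None:
    """Copy values into the target environment without overwriting existing ones."""
    for name, value in source().items():
        target.setdefault(name, value)


def read_api_key(env: Mapping[str, str]) -> str:
    """Application-side read: identical whichever source filled the environment."""
    return env["TASKS_API_KEY"]


file_env: dict[str, str] = {}
populate(file_env, from_dotenv_text("TASKS_API_KEY=dev-placeholder\n"))

fake_store = {"TASKS_API_KEY": "managed-placeholder"}  # pretend remote service
managed_env: dict[str, str] = {}
populate(managed_env, from_manager(fake_store.__getitem__, ["TASKS_API_KEY"]))

print(read_api_key(file_env) != read_api_key(managed_env))  # different values, same read
```

In practice the platform often does the populating for you and the app just sees ordinary environment variables; the sketch only shows where the seam sits.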

---

## The AI angle

This module exists because of one specific, relentless AI failure mode: **AI loves to hardcode
secrets.** Ask any coding assistant to "add authentication," "connect to the database," or "call
the API," and a large fraction of the time it will write the key, token, or password directly into
the source file — often with a cheerful comment like `# your API key here`. It does this because
its training data is full of tutorials and quick examples that do exactly that, and because a
literal value is the path of least resistance to working code. The code *runs*, the demo *works*,
and a leak is now one `git commit` away.

This is the textbook case of the recurring course theme: **AI output that looks right and runs is
not the same as output that's safe.** A human who knows better still has to catch it, because the
model will keep offering it. Concretely:

- **Make "where did the secret go?" a review reflex.** Every time the AI touches auth, config, or a
  network call, read the `git diff` (Module 2) and grep the change for anything that looks like a
  key before you commit. The diff is where you catch it cheaply — *before* it's in history.
- **Tell the AI the pattern up front.** Put the rule in your committed instructions file (Module 5):
  *"Never hardcode secrets. Read all keys and config from environment variables; add new ones to
  `.env.example`."* A model given that house rule will usually write the `os.environ` version on the
  first try. This is the prevention-by-config payoff Module 5 promised.
- **Let the AI do the refactor — it's good at it.** The same model that hardcodes a key on the way
  in is genuinely good at pulling it back out when you ask: "move every hardcoded secret and
  environment-specific value into environment variables, fail loudly if they're missing, and update
  `.env.example`." That's exactly the lab.
- **Secret scanning is the backstop, not the plan (Module 15).** A scanner in CI catches the key
  you missed — but by then it may already be in a commit. Treat a scanner hit as a *rotation event*,
  not a code-review comment. The goal of this module is that the scanner stays quiet because the
  secret never reached the repo.
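
The "grep the change" reflex can be roughed out in a few lines of Python. This is a heuristic sketch, not a replacement for the Module 15 scanner: the two patterns are assumptions chosen for this module's examples, and `find_suspects` is a made-up name. It expects unified diff text such as the output of `git diff`.

```python
import re

# Heuristic patterns only; assumptions for illustration, not a complete rule set.
SUSPICIOUS = [
    re.compile(r"sk-live-[A-Za-z0-9-]+"),  # the key prefix used in this module's examples
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b\s*=\s*[\"'][^\"']{8,}[\"']"),
]


def find_suspects(diff_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for added diff lines that look like they hold a secret."""
    hits = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        if not line.startswith("+") or line.startswith("+++"):
            continue  # only inspect lines the change adds, not file headers
        if any(pattern.search(line) for pattern in SUSPICIOUS):
            hits.append((number, line))
    return hits


sample_diff = "\n".join([
    "+++ b/sync.py",
    '+API_KEY = "sk-live-EXAMPLE-NOT-REAL"',
    '+api_key = os.environ.get("TASKS_API_KEY")',
    "-old line",
])
for number, line in find_suspects(sample_diff):
    print(f"line {number}: {line}")
```

The line that reads from `os.environ` passes, while the literal assignment is flagged. Expect both false positives and misses from patterns this simple; reading the diff yourself is still the primary check.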

---

## Hands-on lab

**Lab language:** Python + shell, on a new `sync` feature for the `tasks-app` from Module 1.

You'll take a file that hardcodes a secret — the exact thing an AI hands you — and refactor it so
the secret lives in the environment and the real values never enter Git. Then you'll make it select
config per environment.

**You'll need:**

- The `tasks-app` folder from Modules 1–2 (a Git repo with a `.gitignore`).
- Python 3.10+ and a terminal.
- The starter files in this module's `lab/starter/`: `sync.py` (the before) and `.env.example`.
- Your AI assistant (browser or editor-integrated — by now, your choice).

### Part A — See the smell

1. Copy `lab/starter/sync.py` and `lab/starter/.env.example` into your `tasks-app` folder, then run
   the before-picture:

   ```bash
   cd ~/workflow-course/tasks-app
   python sync.py
   ```

   It prints a simulated request — including `Authorization: Bearer sk-live-...`. Open `sync.py` and
   find the two hardcoded lines: `API_KEY` and `BACKEND_URL`. **This is the AI default.** Picture
   this getting committed and pushed: the key is now in history forever (Module 12) and a secret
   scanner (Module 15) would light up — if you were lucky enough to have one.

### Part B — Gitignore the secret *first*

2. Before any real secret exists, close the door. Add these lines to your `.gitignore`:

   ```gitignore
   # secrets and local config — never commit
   .env
   .env.*
   !.env.example
   ```

3. Confirm Git will ignore a real `.env` but still track the template:

   ```bash
   printf 'APP_ENV=dev\nTASKS_API_KEY=sk-live-test-0000\n' > .env
   git status        # .env must NOT appear; .env.example and your .gitignore change SHOULD
   ```

   If `.env` shows up in `git status`, stop and fix the ignore rule before going further. This is
   the step that prevents the leak.

### Part C — Refactor the secret into the environment

4. Now move the secret and the environment-specific URL out of the code. Ask your AI:

   > *"Refactor `sync.py` so it reads `TASKS_API_KEY` and `APP_ENV` from environment variables
   > instead of hardcoding them. Pick the backend URL from `APP_ENV` (dev/staging/prod). Fail loudly
   > with a clear message if `TASKS_API_KEY` is missing. Don't add any third-party dependency — load
   > the `.env` file with a few lines of plain Python."*

   You're looking for a result shaped like this (read the diff before you accept it):

   ```python
   import os
   from pathlib import Path

   def load_dotenv(path: Path) -> None:
       """Minimal .env loader — no dependency. Real projects use a library for this."""
       if not path.exists():
           return
       for line in path.read_text().splitlines():
           line = line.strip()
           if not line or line.startswith("#") or "=" not in line:
               continue
           key, _, value = line.partition("=")
           os.environ.setdefault(key.strip(), value.strip())

   load_dotenv(Path(__file__).parent / ".env")

   ENVIRONMENTS = {
       "dev":     "https://api.dev.example-tasks.com/v1",
       "staging": "https://api.staging.example-tasks.com/v1",
       "prod":    "https://api.example-tasks.com/v1",
   }

   app_env = os.environ.get("APP_ENV", "dev")
   api_key = os.environ.get("TASKS_API_KEY")
   if not api_key:
       raise SystemExit("TASKS_API_KEY is not set. Copy .env.example to .env and fill it in.")
   backend_url = ENVIRONMENTS[app_env]
   ```

   Confirm there is **no literal key left anywhere** in `sync.py`:

   ```bash
   grep -n "sk-live" sync.py     # should print nothing
   ```

### Part D — Run it from the environment

5. Run it reading from your `.env`:

   ```bash
   python sync.py                # loads .env -> dev URL, key from the file
   ```

6. Now prove the 12-factor point: **same code, different environment, no edit.** Override at the
   command line to act like staging, then prod:

   ```bash
   # macOS / Linux
   APP_ENV=staging python sync.py
   APP_ENV=prod    TASKS_API_KEY="sk-live-prod-key" python sync.py
   ```

   ```powershell
   # Windows PowerShell (stays set for this session; clear it with: Remove-Item Env:APP_ENV)
   $env:APP_ENV="staging"; python sync.py
   ```

   Watch the backend URL change with `APP_ENV` while the source never does. That's config in the
   environment.
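
The override works because the loader shown in Part C uses `setdefault`: a value already present in the real environment is kept, and the `.env` file only fills in what is missing. The sketch below isolates that precedence rule on plain dictionaries, assuming your refactor kept the `setdefault` call; `merge_dotenv` is a name invented for the demonstration.

```python
def merge_dotenv(environ: dict[str, str], dotenv_text: str) -> dict[str, str]:
    """Apply KEY=value lines with setdefault, so values already present are kept."""
    merged = dict(environ)
    for line in dotenv_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        merged.setdefault(key.strip(), value.strip())
    return merged


dotenv = "APP_ENV=dev\nTASKS_API_KEY=from-file\n"
print(merge_dotenv({}, dotenv)["APP_ENV"])                      # file value is used
print(merge_dotenv({"APP_ENV": "staging"}, dotenv)["APP_ENV"])  # command-line value wins
```

If the AI's version assigns with `os.environ[key] = value` instead, the file would silently overwrite your command-line override, so it's worth checking for in the diff.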

### Part E — Commit, and verify the secret didn't tag along

7. Stage and **read the diff before committing** — the review reflex from the AI angle:

   ```bash
   git add -A
   git diff --cached            # the refactored sync.py + .gitignore + .env.example
   ```

   Confirm the diff contains the *template* and the *code that reads the environment*, and **not**
   the real key or your `.env`. Then:

   ```bash
   git commit -m "Read secrets and per-env config from the environment, not source"
   git status                   # clean; .env is ignored, so it does not appear
   ```

You've now done the exact refactor that turns the AI's default mistake into the correct pattern —
and left behind a `.env.example` so the next person (or agent) knows what to supply.

---

## Where it breaks

- **`.env` is not encryption.** A `.env` file is plaintext on disk. Gitignoring it keeps it out of
  *Git*, not out of reach of anything with access to your machine. It's the right tool for local
  dev and the wrong tool for a shared server — that's where a secret manager earns its place.
- **Environment variables leak in their own ways.** They can show up in process listings, crash
  dumps, log lines that print the whole environment, and child processes that inherit them. Reading
  from the environment is far better than hardcoding, but it's not a force field — don't log the
  environment, and scrub secrets from error reports.
- **A committed template can still leak by accident.** The whole scheme depends on `.env.example`
  staying free of real values. It's easy to "just fill it in to test" and commit it. Keep the
  placeholder discipline, and lean on the Module 15 scanner as the backstop for the day you slip.
- **The damage may already be done.** If a secret was *ever* committed — even in a commit you later
  reverted — assume it's compromised and **rotate it**. Removing it from current files does not
  remove it from history. Scrubbing history is possible but disruptive (and Module 12 warned you
  about rewriting shared history); rotation is the reliable fix.
- **Managed secrets aren't automatically safe.** A secret manager with over-broad access policies,
  or one whose secrets you copy into a `.env` "just for now," gives back everything it was supposed
  to protect. The tool only helps if least-privilege access and rotation are actually configured.
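
For the "don't log the environment" point, a cautious approach is to mask values by variable name before anything is printed or attached to an error report. This is a minimal sketch; the name fragments treated as sensitive are an assumption you would tune per project, and `redact_env` is a hypothetical helper.

```python
import re
from typing import Mapping

# Name fragments treated as sensitive here are an assumption; extend them for your project.
SENSITIVE = re.compile(r"KEY|TOKEN|SECRET|PASSWORD", re.IGNORECASE)


def redact_env(env: Mapping[str, str]) -> dict[str, str]:
    """Return a copy that is safer to log: sensitive-looking names get a masked value."""
    return {name: "***" if SENSITIVE.search(name) else value for name, value in env.items()}


print(redact_env({"APP_ENV": "dev", "TASKS_API_KEY": "placeholder-value"}))
```

Matching on names is only as good as your naming discipline: a secret stored under an innocuous name slips through, which is one more reason to standardize names in `.env.example`.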

---

## Check for understanding

**You're done when:**

- `sync.py` runs entirely from the environment, and `grep "sk-live" sync.py` prints nothing.
- A real `.env` exists, contains your secret, and does **not** appear in `git status` — while
  `.env.example` is tracked.
- `APP_ENV=staging python sync.py` and the default run hit different backend URLs with **zero**
  source edits between them.
- You can state, in one sentence, why deleting a committed secret and re-committing does not fix the
  leak — and what the actual fix is (rotation).
- You've added a "never hardcode secrets; read from the environment" rule to your committed
  instructions file (Module 5), so the AI stops reintroducing the problem.

When the AI hands you a hardcoded key and your first instinct is "that goes in the environment, and
the diff has to prove it didn't reach Git," the reflex is installed. Module 18 takes this artifact —
built once, configured per environment — and ships it.

---

## Verify-before-publish

This is an expansion-zone module; the durable concepts (env vars, `.env`, 12-factor, the
config/secret/code split) are stable, but anything naming a specific product drifts. Before
publishing:

- [ ] **Keep secret-manager references categorical.** The text deliberately names *categories*
      (cloud-provider managers, standalone/self-hostable vaults, platform-native secrets), not
      products. If you add specific product names, re-verify each still exists, is current, and
      isn't pinned as *the* answer (vendor-neutral rule, AGENTS.md).
- [ ] **Re-check the 12-factor reference.** Confirm the [12factor.net](https://12factor.net) link
      resolves and that "factor III — config" is still phrased as "store config in the environment."
- [ ] **Re-verify `.gitignore` negation behavior.** Confirm `!.env.example` still un-ignores the
      template under the `.env.*` rule with a current Git, and that `git status` behaves as the lab
      claims.
- [ ] **Re-verify the Windows PowerShell syntax** (`$env:VAR="..."`) and the inline
      `VAR=value command` syntax for macOS/Linux against current shells.
- [ ] **Confirm dependency-free `.env` loading still reads correctly** under the current Python
      version, so the lab runs with no `pip install`.
- [ ] **Confirm cross-references** to Modules 2, 5, 8, 12, 14, 15, 16, and 18 still match those
      modules' final numbering and titles.