hvm-docs/PLAN.md

# Docs MCP Server — Build Guide

A reusable recipe for building a hosted MCP server over a product's
public documentation. Distilled from one production build; everything
product-specific has been factored out.

The end product is a streamable-HTTP MCP server with ~15 tools that
any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
call to answer questions against the docs, surface what changed
recently, find inconsistencies, and (optionally) submit doc bugs
back upstream.

---

## What you're building

A pipeline with these stages:

```
upstream docs portal
        │
        ▼
   scrape  ──► corpus/<bundle>/<page>.md + .json sidecar
        │
        ▼
    chunk + embed  ──► chroma/  (dense vectors)
        │           ──► bm25/   (FTS5 lexical index)
        ▼
   MCP server  ──► search_docs / get_page / diff_versions / weekly_digest /
                   find_doc_inconsistencies / submit_doc_bug / ...
        │
        ▼
   reverse proxy / Cloudflare Tunnel ──► public endpoint
```

Two CI cadences:

- **Weekly cron** (~40 min): full re-scrape, re-chunk, re-embed,
  image build & push.
- **On-demand image-only** (~18 min): code-only rebuild from
  committed corpus, image build & push.

A container registry (self-hosted Gitea works well), a host running
Docker Compose, Watchtower auto-updating from `:latest`, and a
reverse proxy in front.

---

## Build phases

Each phase is a discrete, shippable unit. Build them in order; each
one is useful on its own and unlocks the next. Realistic effort per
phase is given as a rough order of magnitude. Total: roughly 2–3
weeks of focused work for the full stack.

### Phase 0 — Project skeleton  *(half a day)*

Goals: directory layout, dependency manifest, virtualenv.

- Top-level dirs: `scrape/`, `corpus/` (gitignored), `rag/`,
  `docs_mcp/`, `eval/`, `scripts/`, `deploy/`, `.gitea/workflows/`.
- `requirements.txt` with the dependencies you'll need across all
  phases (FastMCP, chromadb, httpx, beautifulsoup4 or whatever HTML
  parser, ollama or sentence-transformers client, etc.).
- `python -m venv venv` and pin Python version (3.11 or 3.12 — be
  conservative; some embedding libraries have version-specific
  wheels).
- `.gitignore`: `venv/`, `corpus/` (regenerable), `chroma/`
  (regenerable), `bm25/` (regenerable), `*.pyc`, `__pycache__/`,
  `.pytest_cache/`.

### Phase 1 — Scraper  *(2–4 days, product-specific)*

This is the most product-dependent phase. The goal is to write a
scraper that produces a normalized corpus layout regardless of
upstream portal shape.

Output shape (mandatory):

```
corpus/
  <bundle_id>/             # one dir per "doc bundle" — see Glossary
    <page_id>.md           # markdown body
    <page_id>.json         # sidecar with structured metadata
  ...
bundles.json               # catalog of bundles with metadata
```

**Bundle metadata** (`bundles.json` is a list of these):

```json
{
  "slug":          "<bundle_id>",
  "title":         "User-facing title",
  "version":       "10.9",
  "platform":      "VMware vSphere",   // may be null
  "product":       "Admin Guide",       // optional but useful
  "language":      "en-US",
  "page_count":    127,
  "dates": {
    "Added on":    "2024-01-15",
    "Updated on":  "2026-05-20"
  },
  "landing_page":  "<page_id>"
}
```

**Per-page sidecar** (`<page_id>.json`) carries page-level metadata.
The one field that matters cross-cutting is `topic_cluster` (see
Phase 9):

```json
{
  "bundle_id":     "<bundle_id>",
  "page_id":       "<page_id>",
  "title":         "How to ...",
  "ordinal":       42,
  "topic_cluster": {
    "clustering_title": "How to ...",
    "clustered_topics": [
      {"bundle_id": "...10.8", "page_id": "How_to_X.htm", "clustering_title": "..."},
      {"bundle_id": "...10.9", "page_id": "How_to_X.htm", "clustering_title": "..."}
    ]
  }
}
```

If the portal exposes a cross-version "this page corresponds to that
page" mapping, capture it here. If it doesn't, you can synthesize a
filename-based fallback (same filename across bundle versions = same
topic) and live without the editor-curated mapping. The features that
read `topic_cluster` (`list_cluster`, `diff_versions`,
`find_doc_inconsistencies`, parts of `weekly_digest`) will work
either way; they're more accurate with real clusters.

**Patterns that recur across doc portals:**

- Most modern doc portals are SPAs. Plain `requests.get` won't see
  rendered content. Either find the underlying API the SPA calls (the
  cheapest, most reliable path), or fall back to a headless browser
  (Playwright). The API path is almost always available; sniff the
  network tab.
- Portals usually expose a "bundle/topic" hierarchy under the hood
  (Zoomin, Madcap Flare, Paligo, GitBook, Docusaurus all do). Map
  it to `bundles.json` + `corpus/<bundle>/<page>`.
- Many portals expose `?save_local=` or `.pdf` rendered versions; the
  HTML they serve is structurally cleaner than what the page shows
  through the SPA shell.

**`scrape/changelog.py`** (~250 LOC; see Phase 13) — provides
`summarize_diff()`, `render_human()`, `walk_history()` and the
`--json` / `--history-out` modes. Mostly reusable as-is; the only
product-specific bit is the path layout assumption.

### Phase 2 — Chunking + embeddings + Chroma  *(2 days)*

Goal: build a queryable dense index from the scraped corpus.

- `rag/chunk.py` — split each page's markdown into ~400-600 token
  chunks. Strategy that works: paragraph-aware splitter with a
  rich "chunk 0" containing the page title + 1-sentence summary +
  bag-of-words from key terms. Chunk 0 is what dense retrieval lands
  on first; getting it right dominates retrieval quality.
- `rag/embeddings.py` — pluggable embedder. Recommended start:
  Ollama-hosted `nomic-embed-text` (768-dim, free, good baseline).
  Other defensible choices: `text-embedding-3-small` (OpenAI),
  `bge-m3` (also via Ollama). The embedder is a Chroma
  `EmbeddingFunction` that returns `list[list[float]]` for a list
  of texts.
- `rag/index.py` — orchestrates: read corpus → emit chunks (with
  metadata: bundle_id, page_id, version, platform, ordinal) →
  upsert into Chroma collection. `--rebuild` flag for a clean
  reindex. Run via `python -m rag.index --rebuild`.

Chroma settings: `PersistentClient(path="chroma/")` and
`Settings(anonymized_telemetry=False)`. Single collection
(`<product>_docs`).

**GPU note**: embedding 70K chunks on CPU takes hours; on a GPU
(via Ollama with `NVIDIA_VISIBLE_DEVICES`) takes ~10 minutes. Two
GPUs in parallel: ~5 minutes. The orchestrator just needs to load-
balance HTTP requests across multiple Ollama endpoints.

### Phase 3 — MCP server skeleton  *(1 day)*

Goal: working FastMCP server with three tools — `search_docs`,
`get_page`, `list_versions`.

- `docs_mcp/server.py` — `FastMCP("<product>-docs", stateless_http=True)`.
  `stateless_http=True` is critical for production hosting: every
  request creates an ephemeral session, so container recreates don't
  produce a 404 storm from stale `mcp-session-id` headers on
  clients.
- Lazy initialization for everything expensive (Chroma client,
  embedder, bundles catalog) so the server starts cleanly even when
  Ollama is briefly unreachable.
- Tool: `search_docs(query, version=None, platform=None,
  bundle_id=None, k=10)`. Returns markdown of top-k chunks with full
  source URLs.
- Tool: `get_page(bundle_id, page_id)`. Returns full page markdown +
  metadata.
- Tool: `list_versions()`. Returns the version/platform facets
  available, drawn from `bundles.json`. Helps the LLM pick filter
  values.

Transports: stdio (for local Claude Desktop dev), streamable-HTTP
(for hosted production). One argparse switch.

```python
@mcp.tool()
def search_docs(
    query: Annotated[str, Field(description="Natural-language query about <product>.")],
    version: Annotated[str | None, Field(description="Restrict to one version")] = None,
    ...
) -> str:
    ...
```

The tool descriptions are first-class context — the LLM reads them
and decides whether to call the tool. Treat them as button labels;
use "Call when..." / "Use proactively whenever..." phrasings.

### Phase 4 — Containerization  *(1 day)*

Goal: image you can run anywhere.

- `Dockerfile`: Python 3.12-slim base, install requirements, COPY
  `scrape rag diff docs_mcp` + `bundles.json` + `corpus/ chroma/`
  + (later) `bm25/`. Don't COPY `scripts/` — those stay external
  for ops use only.
- `ENTRYPOINT ["python", "-m", "docs_mcp.server",
  "--transport", "streamable-http"]`. Configurable host/port via env.
- `deploy/docker-compose.yml`: one service, named volumes for usage
  logs and any state, Watchtower label, depends_on for the reranker
  sidecar (Phase 6).

Smoke-test locally: `docker compose up` should expose
`http://localhost:8000/mcp` and respond to an MCP `initialize` JSON-RPC.

### Phase 5 — CI on self-hosted Gitea Actions  *(1–2 days)*

Goal: weekly cron rebuild + on-demand code-only ship cycle.

**Two workflows, two cadences:**

| Workflow | Trigger | Steps | Runtime |
|---|---|---|---|
| `refresh.yml` | Monday cron + manual dispatch | scrape → commit corpus → rebuild indexes → build & push image | ~40 min |
| `image-only.yml` | manual dispatch only | rebuild indexes from committed corpus → build & push image | ~18 min |

**Critical settings (learned the hard way):**

- `fetch-depth: 0` on `actions/checkout@v4`. The default depth is 1
  (shallow), which breaks any step that walks git history (changelog,
  digest history walker). Pay the ~10 second cost; never debug a
  "0-byte history file" mystery.
- `runs-on: docker` (Gitea convention, not `ubuntu-latest`).
- Runner shell is `/bin/sh` (dash), not bash. `${VAR::N}` substring
  expansion doesn't exist; use `cut` / `printf` / `awk`.

**Retry-on-race pattern for long-running scrapes:**

```bash
attempt=1
while [ $attempt -le 3 ]; do
  if git push; then
    echo "pushed (attempt $attempt)"
    break
  fi
  [ $attempt -eq 3 ] && { echo "still failing"; exit 1; }
  git fetch origin main
  git rebase origin/main || { echo "conflict — bail"; exit 1; }
  attempt=$((attempt + 1))
done
```

Works because scrape commits only touch `corpus/` + `bundles.json`,
and code merges only touch `.py` / `.yml` — disjoint paths, trivially
clean rebases.

**Image tagging — three tags per build:**

| Tag | Purpose |
|---|---|
| `:latest` | Watchtower watches this for auto-deploy |
| `:<sha12>` | Immutable; rollback target |
| `:<YYYY.MM.DD>` | Human-readable in incident notes |

Same tag set on every build; rollback is a one-line compose edit
to pin `:<sha>` instead of `:latest`.

**Container registry behind Cloudflare:**

Cloudflare's free tier has a 100 MB request body limit. Big image
layers (Chroma index can easily be 800+ MB) exceed it on push. The
fix is a LAN registry endpoint for push, public hostname for pull:

```yaml
env:
  REGISTRY_PUSH: <lan-ip>:<port>     # bypasses Cloudflare
  REGISTRY_PULL: <public-hostname>   # response bodies aren't capped
```

Runner needs the LAN endpoint in `/etc/docker/daemon.json`
`insecure-registries`. Costs nothing operationally; saves hours
of debugging.

**Registry GC:** weekly cron in the workflow that walks the package
versions, keeps `:latest` + N most-recent date tags + anything
pushed in the last 90 days, deletes the rest. Worth ~50 LOC; the
package GC on the Gitea side reclaims disk after.

### Phase 6 — Reranker  *(half a day)*

Goal: lift retrieval quality 3× by cross-encoder reranking the top-N
dense candidates.

- A `/v1/rerank` HTTP endpoint backed by `llama.cpp` serving
  `jina-reranker-v2-base` (GGUF). Runs as a sidecar in compose.
  GPU strongly recommended (CPU latency is unworkable for live
  queries).
- `_rerank(query, docs)` helper in the server: POST to the endpoint,
  apply the scores, re-sort the top-N candidates. Defensive: on any
  failure log a warning and fall through to dense-only.
- Env: `RERANK_URL` (off by default), `RERANK_POOL` (how deep to
  pull candidates for reranking; 200 is a good default),
  `RERANK_TIMEOUT` (30s for cold-start tolerance).
- **Watch the per-pair token limit.** Jina's GGUF reports
  `n_ctx_train=1024` and llama.cpp will reject the ENTIRE batch if
  any pair exceeds it. Truncate doc text to ~2000 chars before
  reranking. The full untruncated chunk still goes back to the user;
  truncation is only for the reranker scoring path.

### Phase 7 — Eval harness  *(1 day)*

Goal: hand-curated golden queries + standard metrics so you can
measure the impact of any retrieval change.

- `eval/queries.jsonl`: 20–25 hand-curated queries with expected
  pages. Spread across versions, platforms, and difficulty levels.
  Include the queries that "obviously" should work and DON'T —
  those are the ones to track.
- `eval/retrievers.py`: a `Retriever` protocol with concrete
  implementations: `DenseRetriever`, `RerankedRetriever`,
  `BM25Retriever` (Phase 8), `HybridRetriever` (Phase 8). One
  matrix dimension per knob.
- `eval/run_eval.py`: computes MRR / Recall@5 / nDCG@5 across all
  retrievers; emits a markdown comparison table at
  `eval/results/<baseline>.md`. Commit the result so PRs land with
  the A/B evidence in the diff.

Three numbers are enough — don't overengineer. The hand-curated
queries are the value; the metrics are just a stable way to score
them.

### Phase 8 — BM25 + Hybrid retrieval  *(half a day, conditional)*

**Skip unless your eval shows specific failure modes.** Dense
embeddings + cross-encoder reranker handle most queries. The case
where they don't: queries with rare technical tokens (filenames,
language names, error codes) get buried at dense rank 1000+ by a
much larger prose corpus that's semantically nearby. The reranker
only sees top-200, so it never gets a shot.

- `rag/bm25.py`: SQLite FTS5 index, in the stdlib, on-disk
  (`bm25/<product>.db`). Two tables — metadata table keyed by
  rowid, FTS5 virtual table for full-text. Sanitize the query
  (strip FTS5 reserved keywords, OR-join tokens for recall). ~210
  LOC.
- `_rrf_fuse()` in the server — Reciprocal Rank Fusion with `k=60`.
  Per-id score = `sum_over_retrievers(1 / (k + rank))`. Returns
  ordered ids plus per-retriever contribution dict for telemetry.
- `search_docs` hybrid path: run dense + BM25 in parallel,
  RRF-fuse, hand the merged top-200 to the reranker. Env-gated:
  `HYBRID_SEARCH=true`.
- Log `top1_source` per call (`dense_only` / `bm25_only` / `both`)
  to usage logs so you can measure whether BM25 is actually earning
  its keep on production traffic.

If after 4–6 weeks of production data you see `bm25_only >= 80%`,
you can simplify to BM25-only (much less infrastructure). If
`both >= 50%`, hybrid is acting as tie-breaker not rescue — keep it
or simplify depending on how much you care about the long tail.

### Phase 9 — Multi-version diff tooling  *(1 day, if applicable)*

**Only relevant if the product has multiple maintained versions.**

- `diff_versions(bundle_id, page_id, against_bundle_id)`: unified
  diff between two versions of the same page. Two matching
  strategies: editor-curated `topic_cluster` peer (if the portal
  exposes it), or same-filename fallback.
- `list_cluster(bundle_id, page_id)`: list cross-version peers
  for one page.
- `bundle_changelog(bundle_id_new, bundle_id_old)`: added /
  removed / changed pages between two bundles, sorted by churn.
- `_diff_churn(a, b)`: small helper, ~15 LOC of `difflib.unified_diff
  --unified=0` line counting. Used by `bundle_changelog`,
  `find_doc_inconsistencies`, and `weekly_digest`.

### Phase 10 — Usage logging  *(half a day)*

Goal: per-call JSONL telemetry so you can answer "what are people
actually asking for" and "is the new feature getting used."

- `docs_mcp/usage.py`: `TimedCall` context manager that captures
  tool name, args, elapsed time, hits returned, any extra fields
  set by the tool via `_call.set(key=value)`. Writes JSONL to
  `var/logs/usage.jsonl`, rotated daily, kept 90 days.
- Mount the log dir as a named compose volume so logs survive
  container recreates.
- `scripts/usage_report.py` (standalone, no docs_mcp deps): reads
  the JSONL files, prints per-tool counts, top queries, 0-hit
  queries, filter usage histogram, reranker activity. Markdown
  output flag for piping into weekly digest emails.

What to log: query text, filters, hits returned, elapsed_ms,
reranker_fired flag, hybrid top1_source, retrieval_mode. What NOT
to log: anything PII-shaped. The corpus is public, queries are
usually about the product, not personal — but be deliberate.

### Phase 11 — Curated knowledge layer  *(2 days)*

The "RAG can't tell you what isn't in the docs" gap. Surfaces:

- **API quickstart repos** if the product has them. Ingest the
  example scripts (Python, PowerShell, curl) into the corpus.
  Rewrite chunk-0 for each script to embed naturally — explicit
  natural-language H1, task description sentence, keyword bag.
  Dense embeddings need an anchor.
- **A curated `<product>_api_lessons` markdown doc** for things
  the swagger / OpenAPI doesn't say: auth flow gotchas, async-task
  patterns, schema bugs you've hit, platform-detection quirks.
  Surface as a dedicated MCP tool whose description tells the LLM:
  *"Call proactively whenever the user asks you to write a script
  / integrate with the API / debug a 4xx response."*
- **An auto-hint banner** in `search_docs` results — when the
  query matches a script/API trigger word, render a one-line nudge
  at the top of results pointing at the dedicated tool. Belt-and-
  suspenders for queries where the LLM doesn't think to call it
  proactively.

### Phase 12 — Doc-bug workflow tools  *(1 day, optional)*

Two tools that pair up to enable a *"check the docs for
inconsistencies, draft bugs, confirm, submit"* workflow.

- `find_doc_inconsistencies(scope_query, version=None, platform=None,
  max_pages=30, checks=None)`: deterministic, read-only. Two checks:
  cross-version drift (pages whose content shifted between immediate-
  previous versions in the actionable 10–60% churn band) and
  redirect-chain detection (short pages whose body is just a "see
  [other page] for details" pointer). Heavy lifting is line-level
  diff (`difflib`) against editor-curated cluster peers; the model
  judges which findings are real bugs.

- `submit_doc_bug(page_url, content, email=None, rating=None,
  like=None)`: POSTs to the docs portal's feedback endpoint.
  Env-gated by `DOC_BUG_SUBMIT_ENABLED=true` so dev/staging
  deployments can't accidentally hit the upstream. The tool's
  docstring is loud about a mandatory operator-confirmation
  workflow per submission — LLM must draft, show, ask, then
  submit. Explicit *"do not loop"* instruction. Defensive
  validation upfront (URL host matches expected portal, content
  non-empty, etc.) so the LLM gets a clean error instead of a
  rejected POST.

**You'll need to find the docs portal's feedback endpoint.** Most
portals route the "Was this helpful?" widget through a backend
API; sniff the browser network tab on the live site. The payload
shape varies; common fields: content/body, page url/href, optional
email, optional rating, optional thumbs. Most accept anonymous
POSTs with no captcha at the JSON-API layer (even if the widget
shows a captcha). Validate before you ship — and if the endpoint
has rate limits or captcha enforcement, the tool returns a clean
"submission rejected — paste manually at <url>" fallback.

The whole point is the per-bug operator confirmation in the
LLM-side conversation flow; the tool description enforces it. Do
not bypass.

### Phase 13 — Weekly digest tool  *(half a day)*

Goal: a tool that answers *"what changed in the docs in the last N
days?"* with no runtime git dependency (the prod container has no
git).

- Extend `scrape/changelog.py` with `--json` (one-shot structured
  output) and `--history-out PATH` (walks `git log --first-parent
  --since="<N> days ago"` for corpus-touching commits, writes one
  JSON line per commit to a JSONL file).
- CI workflows write the JSONL file into the image at build time:
  `corpus/.digest/history.jsonl`. Both `refresh.yml` and
  `image-only.yml`. **`fetch-depth: 0` is required** — see Phase 5.
- New MCP tool `weekly_digest(days=7, version=None, platform=None,
  max_bundles=25, max_pages_per_bundle=10)`: reads the JSONL,
  filters to the window, applies version/platform via
  `bundles.json` metadata, aggregates per-bundle change counts and
  page lists, renders markdown.
- Post-filter totals are critical: the headline "X page changes
  across Y bundles" must compute X from the filtered set, not the
  raw record count. Otherwise filtered calls look wrong to the
  reader.

Out of scope but trivial bolt-ons: scheduled HTML email of the
digest, auto-publish to a blog, per-page diff excerpts as a
follow-up tool.

---

## Standard tool set

By the end you'll have ~15 tools registered. Production-tested
shape:

| Tool | What it does |
|---|---|
| `search_docs` | Semantic search with version/platform/bundle filters |
| `get_page` | Full markdown + metadata for one page |
| `list_versions` | Discover available facet values |
| `list_cluster` | Cross-version peers for one page (if applicable) |
| `diff_versions` | Unified diff of a page across two versions |
| `bundle_changelog` | Added / removed / changed pages between two bundles |
| `weekly_digest` | What changed in the last N days, with filters |
| `corpus_status` | Freshness + size of the knowledge base |
| `find_doc_inconsistencies` | Scoped scan for doc bugs |
| `submit_doc_bug` | Submit a drafted bug (env-gated, operator-confirmed) |
| `<product>_api_lessons` | Curated API gotchas, proactively-called |
| product-specific tools | Interop matrix, lifecycle queries, etc. |

---

## Per-product customization checklist

When applying this template to a new product, here's what you have
to figure out yourself — everything else is shared infrastructure:

- **Doc portal mechanics**
  - URL pattern for pages
  - Bundle/version concept (Zoomin "bundle", Madcap "project",
    GitBook "space", Docusaurus "docs version" — same idea, different
    name)
  - SPA backing API (sniff the network tab) or fallback to
    headless browser
  - How `topic_cluster` -equivalent cross-version peers are exposed
    (or whether you synthesize them from filenames)
- **Bundle metadata schema**
  - What does `version` look like? Semver, calendar, named?
  - What does `platform` mean for this product? Is there a useful
    facet at all?
  - Other useful facets (language, product line, edition)?
- **Filterable facets** for `search_docs`
  - One filter per high-cardinality facet
  - Skip filters that have <5 distinct values — they're not worth
    the surface area
- **Feedback endpoint** (for `submit_doc_bug`, if you want it)
  - URL of the POST endpoint
  - Required + optional payload fields
  - Captcha / rate-limit behavior
  - Whether anonymous submissions are accepted
- **Curated knowledge** for the `_api_lessons` tool
  - What does the product's API documentation NOT say that you've
    learned from real integration work?
- **Quickstart / example repos**
  - Does the vendor publish working code? Ingest it; rewrite
    chunk-0 for natural-language retrieval.

---

## Decisions worth carrying forward

Things you'll save time on by deciding the same way again:

- **Tool descriptions are user interface.** The LLM reads them
  verbatim and decides whether to call the tool. *"Use when..."*
  and *"Call proactively whenever..."* are real surfaces; treat
  them like button labels. Most retrieval improvements turn out
  to be tool-description rewrites in disguise.
- **`stateless_http=True`** on the FastMCP server. Eliminates
  whole categories of session-ID-related 404 storms after
  container recreates.
- **Pre-bake everything at CI time.** No runtime calls to git,
  external services, or anything you wouldn't trust on a
  Cloudflare outage. If the digest needs git history, write a
  JSONL file at CI time. If the lessons doc needs to load fast,
  bake it into the image.
- **Env-gate every side-effecting tool.** Off by default in dev;
  on only in production compose. Belt and suspenders against
  accidental writes from staging environments.
- **Operator-confirmation pattern for side-effecting tools.**
  The tool docstring is the only place to enforce
  human-in-the-loop. Make it loud. "MANDATORY", "Do not loop",
  "show-confirm-then-submit" — those phrasings work.
- **Verify with hand-curated golden queries before shipping any
  retrieval change.** Numbers in the diff, in the commit message.
  Don't ship retrieval changes on vibes.
- **Two-cadence CI** (weekly scrape vs on-demand code-only)
  saves hours per code iteration once you're past the
  one-iteration-a-week stage.
- **Rolling tag + sha-pinned tag** deploy pattern. `:latest` is
  what Watchtower watches; `:<sha>` is your safety net. Rollback
  is a one-line compose edit, not a redeploy.
- **Usage logging is non-negotiable.** You will be wrong about
  what people use. Capture the truth from day one; let it tell
  you which features to keep building and which to delete.

---

## Glossary

- **Bundle** — one logical doc set in the portal. Zoomin calls
  them bundles; Madcap calls them projects; the concept is the
  same: a versioned, titled collection of pages. One dir under
  `corpus/`.
- **Page** — one HTML page in a bundle. One `.md` + one `.json`
  sidecar under the bundle dir.
- **Topic cluster** — Zoomin's name for "this page in version
  10.9 corresponds to that page in version 10.8." Stored in the
  per-page sidecar. The portal-agnostic concept is "cross-version
  peer mapping."
- **Chunk** — a unit of text that gets independently embedded and
  stored in Chroma. Target ~400-600 tokens; preserve paragraph
  boundaries.
- **RRF** — Reciprocal Rank Fusion. The way to merge two ranked
  lists from independent retrievers without score calibration.

---

## What's deliberately NOT in this template

Decisions you should make per-product (not copy from the original
build):

- The reverse proxy and TLS termination layer. Could be Caddy,
  nginx, Traefik, Cloudflare Tunnel — pick what your infra uses.
- The Gateway / aggregator in front of multiple MCPs (MetaMCP is one
  option; you may not need any aggregator if you're running a
  single product MCP).
- The specific embedding model — `nomic-embed-text` is a strong
  default but newer / domain-specific models may be better for
  some products.
- The Ollama containers / GPU setup — depends on what hardware you
  have. The pattern is one container per GPU with explicit
  `NVIDIA_VISIBLE_DEVICES` pinning; the indexer load-balances
  across them.
- Whether to publish a blog series alongside the build. Strongly
  recommended (forces clarity, builds an audience), but optional.