initial: docs-mcp-template — build guide + scaffolded server

Template for building hosted MCP servers over a product's public documentation. Distilled from one production build; everything product-specific has been factored out. Contents: - PLAN.md — comprehensive build guide. 13 phases from project skeleton through weekly_digest. Includes the gotchas ("fetch-depth: 0 always", reranker per-pair token limit, Cloudflare body cap, dash-not-bash on Gitea runners), the decisions worth carrying forward, and a per-product customization checklist. - CLAUDE.md — guidance for Claude Code working in a clone of this template. Phase identification table, conventions (env-gating + operator confirmation for side-effecting tools, defensive fallback for retrieval components), common commands. - README.md — quick-start summary. Scaffolded code (all signature-stable, with NotImplementedError stubs where phase-specific work is required): docs_mcp/server.py FastMCP server, stateless_http=True, with search_docs / get_page / list_versions baseline tools and commented stubs for the rest of the phase set. docs_mcp/usage.py TimedCall telemetry, JSONL, daily rotation, 90-day retention. Reusable as-is. rag/embeddings.py Ollama embedder (nomic-embed-text default), load-balanced across N URLs. Reusable. rag/chunk.py Paragraph-aware chunker with synthetic chunk 0. Per-product tunable. rag/index.py Chroma + BM25 builder. --rebuild and --bm25-only flags. rag/bm25.py SQLite FTS5 lexical index. Reusable. scrape/changelog.py --cached / --ref / --json / --history-out. Reusable. scrape/README.md What you write per-product. eval/queries.jsonl.example Curate ~25 hand-labeled queries here. eval/retrievers.py Retriever protocol + stub classes. eval/run_eval.py MRR / Recall@K / nDCG@K harness skeleton. scripts/usage_report.py Standalone log analyzer; the FOLLOW-UP CHECKS pattern noted in the module docstring. scripts/registry_gc.py Gitea container registry cleanup. Reusable. Deployment + CI: Dockerfile Python 3.12-slim; COPY corpus + chroma + bm25 last for cache efficiency. deploy/docker-compose.yml MCP + reranker sidecar + Watchtower. Templated with <placeholders>. .gitea/workflows/refresh.yml Weekly cron + manual dispatch. fetch-depth: 0, retry-on-race, three-tag image scheme. .gitea/workflows/image-only.yml Code-only ship cycle, ~18min. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:18:17 -04:00
commit 9ba615c8ee
26 changed files with 3280 additions and 0 deletions
@@ -0,0 +1,647 @@
+# Docs MCP Server — Build Guide
+
+A reusable recipe for building a hosted MCP server over a product's
+public documentation. Distilled from one production build; everything
+product-specific has been factored out.
+
+The end product is a streamable-HTTP MCP server with ~15 tools that
+any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
+call to answer questions against the docs, surface what changed
+recently, find inconsistencies, and (optionally) submit doc bugs
+back upstream.
+
+---
+
+## What you're building
+
+A pipeline with these stages:
+
+```
+upstream docs portal
+        │
+        ▼
+   scrape  ──► corpus/<bundle>/<page>.md + .json sidecar
+        │
+        ▼
+    chunk + embed  ──► chroma/  (dense vectors)
+        │           ──► bm25/   (FTS5 lexical index)
+        ▼
+   MCP server  ──► search_docs / get_page / diff_versions / weekly_digest /
+                   find_doc_inconsistencies / submit_doc_bug / ...
+        │
+        ▼
+   reverse proxy / Cloudflare Tunnel ──► public endpoint
+```
+
+Two CI cadences:
+
+- **Weekly cron** (~40 min): full re-scrape, re-chunk, re-embed,
+  image build & push.
+- **On-demand image-only** (~18 min): code-only rebuild from
+  committed corpus, image build & push.
+
+A container registry (self-hosted Gitea works well), a host running
+Docker Compose, Watchtower auto-updating from `:latest`, and a
+reverse proxy in front.
+
+---
+
+## Build phases
+
+Each phase is a discrete, shippable unit. Build them in order; each
+one is useful on its own and unlocks the next. Realistic effort per
+phase is given as a rough order of magnitude. Total: roughly 2–3
+weeks of focused work for the full stack.
+
+### Phase 0 — Project skeleton  *(half a day)*
+
+Goals: directory layout, dependency manifest, virtualenv.
+
+- Top-level dirs: `scrape/`, `corpus/` (gitignored), `rag/`,
+  `docs_mcp/`, `eval/`, `scripts/`, `deploy/`, `.gitea/workflows/`.
+- `requirements.txt` with the dependencies you'll need across all
+  phases (FastMCP, chromadb, httpx, beautifulsoup4 or whatever HTML
+  parser, ollama or sentence-transformers client, etc.).
+- `python -m venv venv` and pin Python version (3.11 or 3.12 — be
+  conservative; some embedding libraries have version-specific
+  wheels).
+- `.gitignore`: `venv/`, `corpus/` (regenerable), `chroma/`
+  (regenerable), `bm25/` (regenerable), `*.pyc`, `__pycache__/`,
+  `.pytest_cache/`.
+
+### Phase 1 — Scraper  *(2–4 days, product-specific)*
+
+This is the most product-dependent phase. The goal is to write a
+scraper that produces a normalized corpus layout regardless of
+upstream portal shape.
+
+Output shape (mandatory):
+
+```
+corpus/
+  <bundle_id>/             # one dir per "doc bundle" — see Glossary
+    <page_id>.md           # markdown body
+    <page_id>.json         # sidecar with structured metadata
+  ...
+bundles.json               # catalog of bundles with metadata
+```
+
+**Bundle metadata** (`bundles.json` is a list of these):
+
+```json
+{
+  "slug":          "<bundle_id>",
+  "title":         "User-facing title",
+  "version":       "10.9",
+  "platform":      "VMware vSphere",   // may be null
+  "product":       "Admin Guide",       // optional but useful
+  "language":      "en-US",
+  "page_count":    127,
+  "dates": {
+    "Added on":    "2024-01-15",
+    "Updated on":  "2026-05-20"
+  },
+  "landing_page":  "<page_id>"
+}
+```
+
+**Per-page sidecar** (`<page_id>.json`) carries page-level metadata.
+The one field that matters cross-cutting is `topic_cluster` (see
+Phase 9):
+
+```json
+{
+  "bundle_id":     "<bundle_id>",
+  "page_id":       "<page_id>",
+  "title":         "How to ...",
+  "ordinal":       42,
+  "topic_cluster": {
+    "clustering_title": "How to ...",
+    "clustered_topics": [
+      {"bundle_id": "...10.8", "page_id": "How_to_X.htm", "clustering_title": "..."},
+      {"bundle_id": "...10.9", "page_id": "How_to_X.htm", "clustering_title": "..."}
+    ]
+  }
+}
+```
+
+If the portal exposes a cross-version "this page corresponds to that
+page" mapping, capture it here. If it doesn't, you can synthesize a
+filename-based fallback (same filename across bundle versions = same
+topic) and live without the editor-curated mapping. The features that
+read `topic_cluster` (`list_cluster`, `diff_versions`,
+`find_doc_inconsistencies`, parts of `weekly_digest`) will work
+either way; they're more accurate with real clusters.
+
+**Patterns that recur across doc portals:**
+
+- Most modern doc portals are SPAs. Plain `requests.get` won't see
+  rendered content. Either find the underlying API the SPA calls (the
+  cheapest, most reliable path), or fall back to a headless browser
+  (Playwright). The API path is almost always available; sniff the
+  network tab.
+- Portals usually expose a "bundle/topic" hierarchy under the hood
+  (Zoomin, Madcap Flare, Paligo, GitBook, Docusaurus all do). Map
+  it to `bundles.json` + `corpus/<bundle>/<page>`.
+- Many portals expose `?save_local=` or `.pdf` rendered versions; the
+  HTML they serve is structurally cleaner than what the page shows
+  through the SPA shell.
+
+**`scrape/changelog.py`** (~250 LOC; see Phase 13) — provides
+`summarize_diff()`, `render_human()`, `walk_history()` and the
+`--json` / `--history-out` modes. Mostly reusable as-is; the only
+product-specific bit is the path layout assumption.
+
+### Phase 2 — Chunking + embeddings + Chroma  *(2 days)*
+
+Goal: build a queryable dense index from the scraped corpus.
+
+- `rag/chunk.py` — split each page's markdown into ~400-600 token
+  chunks. Strategy that works: paragraph-aware splitter with a
+  rich "chunk 0" containing the page title + 1-sentence summary +
+  bag-of-words from key terms. Chunk 0 is what dense retrieval lands
+  on first; getting it right dominates retrieval quality.
+- `rag/embeddings.py` — pluggable embedder. Recommended start:
+  Ollama-hosted `nomic-embed-text` (768-dim, free, good baseline).
+  Other defensible choices: `text-embedding-3-small` (OpenAI),
+  `bge-m3` (also via Ollama). The embedder is a Chroma
+  `EmbeddingFunction` that returns `list[list[float]]` for a list
+  of texts.
+- `rag/index.py` — orchestrates: read corpus → emit chunks (with
+  metadata: bundle_id, page_id, version, platform, ordinal) →
+  upsert into Chroma collection. `--rebuild` flag for a clean
+  reindex. Run via `python -m rag.index --rebuild`.
+
+Chroma settings: `PersistentClient(path="chroma/")` and
+`Settings(anonymized_telemetry=False)`. Single collection
+(`<product>_docs`).
+
+**GPU note**: embedding 70K chunks on CPU takes hours; on a GPU
+(via Ollama with `NVIDIA_VISIBLE_DEVICES`) takes ~10 minutes. Two
+GPUs in parallel: ~5 minutes. The orchestrator just needs to load-
+balance HTTP requests across multiple Ollama endpoints.
+
+### Phase 3 — MCP server skeleton  *(1 day)*
+
+Goal: working FastMCP server with three tools — `search_docs`,
+`get_page`, `list_versions`.
+
+- `docs_mcp/server.py` — `FastMCP("<product>-docs", stateless_http=True)`.
+  `stateless_http=True` is critical for production hosting: every
+  request creates an ephemeral session, so container recreates don't
+  produce a 404 storm from stale `mcp-session-id` headers on
+  clients.
+- Lazy initialization for everything expensive (Chroma client,
+  embedder, bundles catalog) so the server starts cleanly even when
+  Ollama is briefly unreachable.
+- Tool: `search_docs(query, version=None, platform=None,
+  bundle_id=None, k=10)`. Returns markdown of top-k chunks with full
+  source URLs.
+- Tool: `get_page(bundle_id, page_id)`. Returns full page markdown +
+  metadata.
+- Tool: `list_versions()`. Returns the version/platform facets
+  available, drawn from `bundles.json`. Helps the LLM pick filter
+  values.
+
+Transports: stdio (for local Claude Desktop dev), streamable-HTTP
+(for hosted production). One argparse switch.
+
+```python
+@mcp.tool()
+def search_docs(
+    query: Annotated[str, Field(description="Natural-language query about <product>.")],
+    version: Annotated[str | None, Field(description="Restrict to one version")] = None,
+    ...
+) -> str:
+    ...
+```
+
+The tool descriptions are first-class context — the LLM reads them
+and decides whether to call the tool. Treat them as button labels;
+use "Call when..." / "Use proactively whenever..." phrasings.
+
+### Phase 4 — Containerization  *(1 day)*
+
+Goal: image you can run anywhere.
+
+- `Dockerfile`: Python 3.12-slim base, install requirements, COPY
+  `scrape rag diff docs_mcp` + `bundles.json` + `corpus/ chroma/`
+  + (later) `bm25/`. Don't COPY `scripts/` — those stay external
+  for ops use only.
+- `ENTRYPOINT ["python", "-m", "docs_mcp.server",
+  "--transport", "streamable-http"]`. Configurable host/port via env.
+- `deploy/docker-compose.yml`: one service, named volumes for usage
+  logs and any state, Watchtower label, depends_on for the reranker
+  sidecar (Phase 6).
+
+Smoke-test locally: `docker compose up` should expose
+`http://localhost:8000/mcp` and respond to an MCP `initialize` JSON-RPC.
+
+### Phase 5 — CI on self-hosted Gitea Actions  *(1–2 days)*
+
+Goal: weekly cron rebuild + on-demand code-only ship cycle.
+
+**Two workflows, two cadences:**
+
+| Workflow | Trigger | Steps | Runtime |
+|---|---|---|---|
+| `refresh.yml` | Monday cron + manual dispatch | scrape → commit corpus → rebuild indexes → build & push image | ~40 min |
+| `image-only.yml` | manual dispatch only | rebuild indexes from committed corpus → build & push image | ~18 min |
+
+**Critical settings (learned the hard way):**
+
+- `fetch-depth: 0` on `actions/checkout@v4`. The default depth is 1
+  (shallow), which breaks any step that walks git history (changelog,
+  digest history walker). Pay the ~10 second cost; never debug a
+  "0-byte history file" mystery.
+- `runs-on: docker` (Gitea convention, not `ubuntu-latest`).
+- Runner shell is `/bin/sh` (dash), not bash. `${VAR::N}` substring
+  expansion doesn't exist; use `cut` / `printf` / `awk`.
+
+**Retry-on-race pattern for long-running scrapes:**
+
+```bash
+attempt=1
+while [ $attempt -le 3 ]; do
+  if git push; then
+    echo "pushed (attempt $attempt)"
+    break
+  fi
+  [ $attempt -eq 3 ] && { echo "still failing"; exit 1; }
+  git fetch origin main
+  git rebase origin/main || { echo "conflict — bail"; exit 1; }
+  attempt=$((attempt + 1))
+done
+```
+
+Works because scrape commits only touch `corpus/` + `bundles.json`,
+and code merges only touch `.py` / `.yml` — disjoint paths, trivially
+clean rebases.
+
+**Image tagging — three tags per build:**
+
+| Tag | Purpose |
+|---|---|
+| `:latest` | Watchtower watches this for auto-deploy |
+| `:<sha12>` | Immutable; rollback target |
+| `:<YYYY.MM.DD>` | Human-readable in incident notes |
+
+Same tag set on every build; rollback is a one-line compose edit
+to pin `:<sha>` instead of `:latest`.
+
+**Container registry behind Cloudflare:**
+
+Cloudflare's free tier has a 100 MB request body limit. Big image
+layers (Chroma index can easily be 800+ MB) exceed it on push. The
+fix is a LAN registry endpoint for push, public hostname for pull:
+
+```yaml
+env:
+  REGISTRY_PUSH: <lan-ip>:<port>     # bypasses Cloudflare
+  REGISTRY_PULL: <public-hostname>   # response bodies aren't capped
+```
+
+Runner needs the LAN endpoint in `/etc/docker/daemon.json`
+`insecure-registries`. Costs nothing operationally; saves hours
+of debugging.
+
+**Registry GC:** weekly cron in the workflow that walks the package
+versions, keeps `:latest` + N most-recent date tags + anything
+pushed in the last 90 days, deletes the rest. Worth ~50 LOC; the
+package GC on the Gitea side reclaims disk after.
+
+### Phase 6 — Reranker  *(half a day)*
+
+Goal: lift retrieval quality 3× by cross-encoder reranking the top-N
+dense candidates.
+
+- A `/v1/rerank` HTTP endpoint backed by `llama.cpp` serving
+  `jina-reranker-v2-base` (GGUF). Runs as a sidecar in compose.
+  GPU strongly recommended (CPU latency is unworkable for live
+  queries).
+- `_rerank(query, docs)` helper in the server: POST to the endpoint,
+  apply the scores, re-sort the top-N candidates. Defensive: on any
+  failure log a warning and fall through to dense-only.
+- Env: `RERANK_URL` (off by default), `RERANK_POOL` (how deep to
+  pull candidates for reranking; 200 is a good default),
+  `RERANK_TIMEOUT` (30s for cold-start tolerance).
+- **Watch the per-pair token limit.** Jina's GGUF reports
+  `n_ctx_train=1024` and llama.cpp will reject the ENTIRE batch if
+  any pair exceeds it. Truncate doc text to ~2000 chars before
+  reranking. The full untruncated chunk still goes back to the user;
+  truncation is only for the reranker scoring path.
+
+### Phase 7 — Eval harness  *(1 day)*
+
+Goal: hand-curated golden queries + standard metrics so you can
+measure the impact of any retrieval change.
+
+- `eval/queries.jsonl`: 20–25 hand-curated queries with expected
+  pages. Spread across versions, platforms, and difficulty levels.
+  Include the queries that "obviously" should work and DON'T —
+  those are the ones to track.
+- `eval/retrievers.py`: a `Retriever` protocol with concrete
+  implementations: `DenseRetriever`, `RerankedRetriever`,
+  `BM25Retriever` (Phase 8), `HybridRetriever` (Phase 8). One
+  matrix dimension per knob.
+- `eval/run_eval.py`: computes MRR / Recall@5 / nDCG@5 across all
+  retrievers; emits a markdown comparison table at
+  `eval/results/<baseline>.md`. Commit the result so PRs land with
+  the A/B evidence in the diff.
+
+Three numbers are enough — don't overengineer. The hand-curated
+queries are the value; the metrics are just a stable way to score
+them.
+
+### Phase 8 — BM25 + Hybrid retrieval  *(half a day, conditional)*
+
+**Skip unless your eval shows specific failure modes.** Dense
+embeddings + cross-encoder reranker handle most queries. The case
+where they don't: queries with rare technical tokens (filenames,
+language names, error codes) get buried at dense rank 1000+ by a
+much larger prose corpus that's semantically nearby. The reranker
+only sees top-200, so it never gets a shot.
+
+- `rag/bm25.py`: SQLite FTS5 index, in the stdlib, on-disk
+  (`bm25/<product>.db`). Two tables — metadata table keyed by
+  rowid, FTS5 virtual table for full-text. Sanitize the query
+  (strip FTS5 reserved keywords, OR-join tokens for recall). ~210
+  LOC.
+- `_rrf_fuse()` in the server — Reciprocal Rank Fusion with `k=60`.
+  Per-id score = `sum_over_retrievers(1 / (k + rank))`. Returns
+  ordered ids plus per-retriever contribution dict for telemetry.
+- `search_docs` hybrid path: run dense + BM25 in parallel,
+  RRF-fuse, hand the merged top-200 to the reranker. Env-gated:
+  `HYBRID_SEARCH=true`.
+- Log `top1_source` per call (`dense_only` / `bm25_only` / `both`)
+  to usage logs so you can measure whether BM25 is actually earning
+  its keep on production traffic.
+
+If after 4–6 weeks of production data you see `bm25_only >= 80%`,
+you can simplify to BM25-only (much less infrastructure). If
+`both >= 50%`, hybrid is acting as tie-breaker not rescue — keep it
+or simplify depending on how much you care about the long tail.
+
+### Phase 9 — Multi-version diff tooling  *(1 day, if applicable)*
+
+**Only relevant if the product has multiple maintained versions.**
+
+- `diff_versions(bundle_id, page_id, against_bundle_id)`: unified
+  diff between two versions of the same page. Two matching
+  strategies: editor-curated `topic_cluster` peer (if the portal
+  exposes it), or same-filename fallback.
+- `list_cluster(bundle_id, page_id)`: list cross-version peers
+  for one page.
+- `bundle_changelog(bundle_id_new, bundle_id_old)`: added /
+  removed / changed pages between two bundles, sorted by churn.
+- `_diff_churn(a, b)`: small helper, ~15 LOC of `difflib.unified_diff
+  --unified=0` line counting. Used by `bundle_changelog`,
+  `find_doc_inconsistencies`, and `weekly_digest`.
+
+### Phase 10 — Usage logging  *(half a day)*
+
+Goal: per-call JSONL telemetry so you can answer "what are people
+actually asking for" and "is the new feature getting used."
+
+- `docs_mcp/usage.py`: `TimedCall` context manager that captures
+  tool name, args, elapsed time, hits returned, any extra fields
+  set by the tool via `_call.set(key=value)`. Writes JSONL to
+  `var/logs/usage.jsonl`, rotated daily, kept 90 days.
+- Mount the log dir as a named compose volume so logs survive
+  container recreates.
+- `scripts/usage_report.py` (standalone, no docs_mcp deps): reads
+  the JSONL files, prints per-tool counts, top queries, 0-hit
+  queries, filter usage histogram, reranker activity. Markdown
+  output flag for piping into weekly digest emails.
+
+What to log: query text, filters, hits returned, elapsed_ms,
+reranker_fired flag, hybrid top1_source, retrieval_mode. What NOT
+to log: anything PII-shaped. The corpus is public, queries are
+usually about the product, not personal — but be deliberate.
+
+### Phase 11 — Curated knowledge layer  *(2 days)*
+
+The "RAG can't tell you what isn't in the docs" gap. Surfaces:
+
+- **API quickstart repos** if the product has them. Ingest the
+  example scripts (Python, PowerShell, curl) into the corpus.
+  Rewrite chunk-0 for each script to embed naturally — explicit
+  natural-language H1, task description sentence, keyword bag.
+  Dense embeddings need an anchor.
+- **A curated `<product>_api_lessons` markdown doc** for things
+  the swagger / OpenAPI doesn't say: auth flow gotchas, async-task
+  patterns, schema bugs you've hit, platform-detection quirks.
+  Surface as a dedicated MCP tool whose description tells the LLM:
+  *"Call proactively whenever the user asks you to write a script
+  / integrate with the API / debug a 4xx response."*
+- **An auto-hint banner** in `search_docs` results — when the
+  query matches a script/API trigger word, render a one-line nudge
+  at the top of results pointing at the dedicated tool. Belt-and-
+  suspenders for queries where the LLM doesn't think to call it
+  proactively.
+
+### Phase 12 — Doc-bug workflow tools  *(1 day, optional)*
+
+Two tools that pair up to enable a *"check the docs for
+inconsistencies, draft bugs, confirm, submit"* workflow.
+
+- `find_doc_inconsistencies(scope_query, version=None, platform=None,
+  max_pages=30, checks=None)`: deterministic, read-only. Two checks:
+  cross-version drift (pages whose content shifted between immediate-
+  previous versions in the actionable 10–60% churn band) and
+  redirect-chain detection (short pages whose body is just a "see
+  [other page] for details" pointer). Heavy lifting is line-level
+  diff (`difflib`) against editor-curated cluster peers; the model
+  judges which findings are real bugs.
+
+- `submit_doc_bug(page_url, content, email=None, rating=None,
+  like=None)`: POSTs to the docs portal's feedback endpoint.
+  Env-gated by `DOC_BUG_SUBMIT_ENABLED=true` so dev/staging
+  deployments can't accidentally hit the upstream. The tool's
+  docstring is loud about a mandatory operator-confirmation
+  workflow per submission — LLM must draft, show, ask, then
+  submit. Explicit *"do not loop"* instruction. Defensive
+  validation upfront (URL host matches expected portal, content
+  non-empty, etc.) so the LLM gets a clean error instead of a
+  rejected POST.
+
+**You'll need to find the docs portal's feedback endpoint.** Most
+portals route the "Was this helpful?" widget through a backend
+API; sniff the browser network tab on the live site. The payload
+shape varies; common fields: content/body, page url/href, optional
+email, optional rating, optional thumbs. Most accept anonymous
+POSTs with no captcha at the JSON-API layer (even if the widget
+shows a captcha). Validate before you ship — and if the endpoint
+has rate limits or captcha enforcement, the tool returns a clean
+"submission rejected — paste manually at <url>" fallback.
+
+The whole point is the per-bug operator confirmation in the
+LLM-side conversation flow; the tool description enforces it. Do
+not bypass.
+
+### Phase 13 — Weekly digest tool  *(half a day)*
+
+Goal: a tool that answers *"what changed in the docs in the last N
+days?"* with no runtime git dependency (the prod container has no
+git).
+
+- Extend `scrape/changelog.py` with `--json` (one-shot structured
+  output) and `--history-out PATH` (walks `git log --first-parent
+  --since="<N> days ago"` for corpus-touching commits, writes one
+  JSON line per commit to a JSONL file).
+- CI workflows write the JSONL file into the image at build time:
+  `corpus/.digest/history.jsonl`. Both `refresh.yml` and
+  `image-only.yml`. **`fetch-depth: 0` is required** — see Phase 5.
+- New MCP tool `weekly_digest(days=7, version=None, platform=None,
+  max_bundles=25, max_pages_per_bundle=10)`: reads the JSONL,
+  filters to the window, applies version/platform via
+  `bundles.json` metadata, aggregates per-bundle change counts and
+  page lists, renders markdown.
+- Post-filter totals are critical: the headline "X page changes
+  across Y bundles" must compute X from the filtered set, not the
+  raw record count. Otherwise filtered calls look wrong to the
+  reader.
+
+Out of scope but trivial bolt-ons: scheduled HTML email of the
+digest, auto-publish to a blog, per-page diff excerpts as a
+follow-up tool.
+
+---
+
+## Standard tool set
+
+By the end you'll have ~15 tools registered. Production-tested
+shape:
+
+| Tool | What it does |
+|---|---|
+| `search_docs` | Semantic search with version/platform/bundle filters |
+| `get_page` | Full markdown + metadata for one page |
+| `list_versions` | Discover available facet values |
+| `list_cluster` | Cross-version peers for one page (if applicable) |
+| `diff_versions` | Unified diff of a page across two versions |
+| `bundle_changelog` | Added / removed / changed pages between two bundles |
+| `weekly_digest` | What changed in the last N days, with filters |
+| `corpus_status` | Freshness + size of the knowledge base |
+| `find_doc_inconsistencies` | Scoped scan for doc bugs |
+| `submit_doc_bug` | Submit a drafted bug (env-gated, operator-confirmed) |
+| `<product>_api_lessons` | Curated API gotchas, proactively-called |
+| product-specific tools | Interop matrix, lifecycle queries, etc. |
+
+---
+
+## Per-product customization checklist
+
+When applying this template to a new product, here's what you have
+to figure out yourself — everything else is shared infrastructure:
+
+- **Doc portal mechanics**
+  - URL pattern for pages
+  - Bundle/version concept (Zoomin "bundle", Madcap "project",
+    GitBook "space", Docusaurus "docs version" — same idea, different
+    name)
+  - SPA backing API (sniff the network tab) or fallback to
+    headless browser
+  - How `topic_cluster` -equivalent cross-version peers are exposed
+    (or whether you synthesize them from filenames)
+- **Bundle metadata schema**
+  - What does `version` look like? Semver, calendar, named?
+  - What does `platform` mean for this product? Is there a useful
+    facet at all?
+  - Other useful facets (language, product line, edition)?
+- **Filterable facets** for `search_docs`
+  - One filter per high-cardinality facet
+  - Skip filters that have <5 distinct values — they're not worth
+    the surface area
+- **Feedback endpoint** (for `submit_doc_bug`, if you want it)
+  - URL of the POST endpoint
+  - Required + optional payload fields
+  - Captcha / rate-limit behavior
+  - Whether anonymous submissions are accepted
+- **Curated knowledge** for the `_api_lessons` tool
+  - What does the product's API documentation NOT say that you've
+    learned from real integration work?
+- **Quickstart / example repos**
+  - Does the vendor publish working code? Ingest it; rewrite
+    chunk-0 for natural-language retrieval.
+
+---
+
+## Decisions worth carrying forward
+
+Things you'll save time on by deciding the same way again:
+
+- **Tool descriptions are user interface.** The LLM reads them
+  verbatim and decides whether to call the tool. *"Use when..."*
+  and *"Call proactively whenever..."* are real surfaces; treat
+  them like button labels. Most retrieval improvements turn out
+  to be tool-description rewrites in disguise.
+- **`stateless_http=True`** on the FastMCP server. Eliminates
+  whole categories of session-ID-related 404 storms after
+  container recreates.
+- **Pre-bake everything at CI time.** No runtime calls to git,
+  external services, or anything you wouldn't trust on a
+  Cloudflare outage. If the digest needs git history, write a
+  JSONL file at CI time. If the lessons doc needs to load fast,
+  bake it into the image.
+- **Env-gate every side-effecting tool.** Off by default in dev;
+  on only in production compose. Belt and suspenders against
+  accidental writes from staging environments.
+- **Operator-confirmation pattern for side-effecting tools.**
+  The tool docstring is the only place to enforce
+  human-in-the-loop. Make it loud. "MANDATORY", "Do not loop",
+  "show-confirm-then-submit" — those phrasings work.
+- **Verify with hand-curated golden queries before shipping any
+  retrieval change.** Numbers in the diff, in the commit message.
+  Don't ship retrieval changes on vibes.
+- **Two-cadence CI** (weekly scrape vs on-demand code-only)
+  saves hours per code iteration once you're past the
+  one-iteration-a-week stage.
+- **Rolling tag + sha-pinned tag** deploy pattern. `:latest` is
+  what Watchtower watches; `:<sha>` is your safety net. Rollback
+  is a one-line compose edit, not a redeploy.
+- **Usage logging is non-negotiable.** You will be wrong about
+  what people use. Capture the truth from day one; let it tell
+  you which features to keep building and which to delete.
+
+---
+
+## Glossary
+
+- **Bundle** — one logical doc set in the portal. Zoomin calls
+  them bundles; Madcap calls them projects; the concept is the
+  same: a versioned, titled collection of pages. One dir under
+  `corpus/`.
+- **Page** — one HTML page in a bundle. One `.md` + one `.json`
+  sidecar under the bundle dir.
+- **Topic cluster** — Zoomin's name for "this page in version
+  10.9 corresponds to that page in version 10.8." Stored in the
+  per-page sidecar. The portal-agnostic concept is "cross-version
+  peer mapping."
+- **Chunk** — a unit of text that gets independently embedded and
+  stored in Chroma. Target ~400-600 tokens; preserve paragraph
+  boundaries.
+- **RRF** — Reciprocal Rank Fusion. The way to merge two ranked
+  lists from independent retrievers without score calibration.
+
+---
+
+## What's deliberately NOT in this template
+
+Decisions you should make per-product (not copy from the original
+build):
+
+- The reverse proxy and TLS termination layer. Could be Caddy,
+  nginx, Traefik, Cloudflare Tunnel — pick what your infra uses.
+- The Gateway / aggregator in front of multiple MCPs (MetaMCP is one
+  option; you may not need any aggregator if you're running a
+  single product MCP).
+- The specific embedding model — `nomic-embed-text` is a strong
+  default but newer / domain-specific models may be better for
+  some products.
+- The Ollama containers / GPU setup — depends on what hardware you
+  have. The pattern is one container per GPU with explicit
+  `NVIDIA_VISIBLE_DEVICES` pinning; the indexer load-balances
+  across them.
+- Whether to publish a blog series alongside the build. Strongly
+  recommended (forces clarity, builds an audience), but optional.