initial: docs-mcp-template — build guide + scaffolded server

Template for building hosted MCP servers over a product's public
documentation. Distilled from one production build; everything
product-specific has been factored out.

Contents:

- PLAN.md — comprehensive build guide. Phases 0-13, from project
  skeleton through weekly_digest. Includes the gotchas
  ("fetch-depth: 0 always", reranker per-pair token limit,
  Cloudflare body cap, dash-not-bash on Gitea runners), the
  decisions worth carrying forward, and a per-product
  customization checklist.

- CLAUDE.md — guidance for Claude Code working in a clone of this
  template. Phase identification table, conventions (env-gating +
  operator confirmation for side-effecting tools, defensive
  fallback for retrieval components), common commands.

- README.md — quick-start summary.

Scaffolded code (all signature-stable, with NotImplementedError
stubs where phase-specific work is required):

  docs_mcp/server.py    FastMCP server, stateless_http=True, with
                        search_docs / get_page / list_versions
                        baseline tools and commented stubs for the
                        rest of the phase set.
  docs_mcp/usage.py     TimedCall telemetry, JSONL, daily rotation,
                        90-day retention. Reusable as-is.
  rag/embeddings.py     Ollama embedder (nomic-embed-text default),
                        load-balanced across N URLs. Reusable.
  rag/chunk.py          Paragraph-aware chunker with synthetic
                        chunk 0. Per-product tunable.
  rag/index.py          Chroma + BM25 builder. --rebuild and
                        --bm25-only flags.
  rag/bm25.py           SQLite FTS5 lexical index. Reusable.
  scrape/changelog.py   --cached / --ref / --json / --history-out.
                        Reusable.
  scrape/README.md      What you write per-product.
  eval/queries.jsonl.example
                        Curate ~25 hand-labeled queries here.
  eval/retrievers.py    Retriever protocol + stub classes.
  eval/run_eval.py      MRR / Recall@K / nDCG@K harness skeleton.
  scripts/usage_report.py
                        Standalone log analyzer; the
                        FOLLOW-UP CHECKS pattern noted in the
                        module docstring.
  scripts/registry_gc.py
                        Gitea container registry cleanup. Reusable.

Deployment + CI:

  Dockerfile               Python 3.12-slim; COPY corpus + chroma
                           + bm25 last for cache efficiency.
  deploy/docker-compose.yml MCP + reranker sidecar + Watchtower.
                           Templated with <placeholders>.
  .gitea/workflows/refresh.yml    Weekly cron + manual dispatch.
                                  fetch-depth: 0, retry-on-race,
                                  three-tag image scheme.
  .gitea/workflows/image-only.yml Code-only ship cycle, ~18min.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:18:17 -04:00
commit 9ba615c8ee
26 changed files with 3280 additions and 0 deletions
@@ -0,0 +1,647 @@
# Docs MCP Server — Build Guide
A reusable recipe for building a hosted MCP server over a product's
public documentation. Distilled from one production build; everything
product-specific has been factored out.
The end product is a streamable-HTTP MCP server with ~15 tools that
any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
call to answer questions against the docs, surface what changed
recently, find inconsistencies, and (optionally) submit doc bugs
back upstream.
---
## What you're building
A pipeline with these stages:
```
upstream docs portal
      │
      ▼
   scrape        ──► corpus/<bundle>/<page>.md + .json sidecar
      │
      ▼
   chunk + embed ──► chroma/  (dense vectors)
      │          ──► bm25/    (FTS5 lexical index)
      ▼
   MCP server    ──► search_docs / get_page / diff_versions / weekly_digest /
      │              find_doc_inconsistencies / submit_doc_bug / ...
      ▼
   reverse proxy / Cloudflare Tunnel ──► public endpoint
```
Two CI cadences:
- **Weekly cron** (~40 min): full re-scrape, re-chunk, re-embed,
image build & push.
- **On-demand image-only** (~18 min): code-only rebuild from
committed corpus, image build & push.
Hosting needs a container registry (self-hosted Gitea works well), a
host running Docker Compose, Watchtower auto-updating from `:latest`,
and a reverse proxy in front.
---
## Build phases
Each phase is a discrete, shippable unit. Build them in order; each
one is useful on its own and unlocks the next. Realistic effort per
phase is given as a rough order of magnitude. Total: roughly 2-3
weeks of focused work for the full stack.
### Phase 0 — Project skeleton *(half a day)*
Goals: directory layout, dependency manifest, virtualenv.
- Top-level dirs: `scrape/`, `corpus/` (gitignored), `rag/`,
`docs_mcp/`, `eval/`, `scripts/`, `deploy/`, `.gitea/workflows/`.
- `requirements.txt` with the dependencies you'll need across all
phases (FastMCP, chromadb, httpx, beautifulsoup4 or whatever HTML
parser, ollama or sentence-transformers client, etc.).
- `python -m venv venv` and pin Python version (3.11 or 3.12 — be
conservative; some embedding libraries have version-specific
wheels).
- `.gitignore`: `venv/`, `corpus/` (regenerable), `chroma/`
(regenerable), `bm25/` (regenerable), `*.pyc`, `__pycache__/`,
`.pytest_cache/`.
### Phase 1 — Scraper *(2-4 days, product-specific)*
This is the most product-dependent phase. The goal is to write a
scraper that produces a normalized corpus layout regardless of
upstream portal shape.
Output shape (mandatory):
```
corpus/
  <bundle_id>/         # one dir per "doc bundle" (see Glossary)
    <page_id>.md       # markdown body
    <page_id>.json     # sidecar with structured metadata
    ...
bundles.json           # catalog of bundles with metadata
```
**Bundle metadata** (`bundles.json` is a list of these):
```json
{
  "slug": "<bundle_id>",
  "title": "User-facing title",
  "version": "10.9",
  "platform": "VMware vSphere",   // may be null
  "product": "Admin Guide",       // optional but useful
  "language": "en-US",
  "page_count": 127,
  "dates": {
    "Added on": "2024-01-15",
    "Updated on": "2026-05-20"
  },
  "landing_page": "<page_id>"
}
```
**Per-page sidecar** (`<page_id>.json`) carries page-level metadata.
The one field that matters cross-cutting is `topic_cluster` (see
Phase 9):
```json
{
  "bundle_id": "<bundle_id>",
  "page_id": "<page_id>",
  "title": "How to ...",
  "ordinal": 42,
  "topic_cluster": {
    "clustering_title": "How to ...",
    "clustered_topics": [
      {"bundle_id": "...10.8", "page_id": "How_to_X.htm", "clustering_title": "..."},
      {"bundle_id": "...10.9", "page_id": "How_to_X.htm", "clustering_title": "..."}
    ]
  }
}
```
If the portal exposes a cross-version "this page corresponds to that
page" mapping, capture it here. If it doesn't, you can synthesize a
filename-based fallback (same filename across bundle versions = same
topic) and live without the editor-curated mapping. The features that
read `topic_cluster` (`list_cluster`, `diff_versions`,
`find_doc_inconsistencies`, parts of `weekly_digest`) will work
either way; they're more accurate with real clusters.
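
The filename-based fallback described above can be sketched roughly as follows. This is a minimal illustration, not the original build's code: `synthesize_clusters` is a hypothetical helper, and it assumes each sidecar has already been loaded into a dict with `bundle_id`, `page_id`, and `title`.

```python
from collections import defaultdict


def synthesize_clusters(pages: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group pages that share a filename across bundles into peer lists.

    Returns a mapping from (bundle_id, page_id) to that page's peers,
    shaped like the `clustered_topics` entries in the sidecar.
    """
    by_filename: dict[str, list[dict]] = defaultdict(list)
    for page in pages:
        by_filename[page["page_id"]].append(page)

    clusters: dict[tuple[str, str], list[dict]] = {}
    for peers in by_filename.values():
        if len(peers) < 2:
            continue  # no same-named page in another bundle, so no cluster
        topics = [
            {
                "bundle_id": p["bundle_id"],
                "page_id": p["page_id"],
                "clustering_title": p["title"],
            }
            for p in sorted(peers, key=lambda p: p["bundle_id"])
        ]
        for p in peers:
            clusters[(p["bundle_id"], p["page_id"])] = topics
    return clusters
```

One caveat with this approach: generic filenames (an `Overview.htm` that exists in unrelated guides) will be clustered together, so in practice you would probably restrict the grouping to bundles that share a `product` value.
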
**Patterns that recur across doc portals:**
- Most modern doc portals are SPAs. Plain `requests.get` won't see
rendered content. Either find the underlying API the SPA calls (the
cheapest, most reliable path), or fall back to a headless browser
(Playwright). The API path is almost always available; sniff the
network tab.
- Portals usually expose a "bundle/topic" hierarchy under the hood
(Zoomin, Madcap Flare, Paligo, GitBook, Docusaurus all do). Map
it to `bundles.json` + `corpus/<bundle>/<page>`.
- Many portals expose `?save_local=` or `.pdf` rendered versions; the
HTML they serve is structurally cleaner than what the page shows
through the SPA shell.
**`scrape/changelog.py`** (~250 LOC; see Phase 13) — provides
`summarize_diff()`, `render_human()`, `walk_history()` and the
`--json` / `--history-out` modes. Mostly reusable as-is; the only
product-specific bit is the path layout assumption.
### Phase 2 — Chunking + embeddings + Chroma *(2 days)*
Goal: build a queryable dense index from the scraped corpus.
- `rag/chunk.py` — split each page's markdown into ~400-600 token
chunks. Strategy that works: paragraph-aware splitter with a
rich "chunk 0" containing the page title + 1-sentence summary +
bag-of-words from key terms. Chunk 0 is what dense retrieval lands
on first; getting it right dominates retrieval quality.
- `rag/embeddings.py` — pluggable embedder. Recommended start:
Ollama-hosted `nomic-embed-text` (768-dim, free, good baseline).
Other defensible choices: `text-embedding-3-small` (OpenAI),
`bge-m3` (also via Ollama). The embedder is a Chroma
`EmbeddingFunction` that returns `list[list[float]]` for a list
of texts.
- `rag/index.py` — orchestrates: read corpus → emit chunks (with
metadata: bundle_id, page_id, version, platform, ordinal) →
upsert into Chroma collection. `--rebuild` flag for a clean
reindex. Run via `python -m rag.index --rebuild`.
Chroma settings: `PersistentClient(path="chroma/")` and
`Settings(anonymized_telemetry=False)`. Single collection
(`<product>_docs`).
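
As an illustration of the paragraph-aware splitter with a synthetic chunk 0, here is a minimal sketch. Several things in it are assumptions rather than the template's actual behavior: token counts are approximated by whitespace-separated words, the "1-sentence summary" is simply the first sentence of the body, and the key-term bag is the most frequent longer words.

```python
import re
from collections import Counter


def build_chunk0(title: str, body: str, n_terms: int = 12) -> str:
    """Synthetic first chunk: title, first sentence, and a bag of key terms."""
    first_sentence = re.split(r"(?<=[.!?])\s+", body.strip(), maxsplit=1)[0]
    words = re.findall(r"[A-Za-z][A-Za-z0-9_-]{3,}", body.lower())
    key_terms = [w for w, _ in Counter(words).most_common(n_terms)]
    return f"{title}\n{first_sentence}\nKey terms: {' '.join(key_terms)}"


def chunk_page(title: str, body: str, max_words: int = 350) -> list[str]:
    """Pack whole paragraphs into chunks of at most ~max_words words."""
    chunks = [build_chunk0(title, body)]
    current: list[str] = []
    count = 0
    for para in re.split(r"\n\s*\n", body.strip()):
        if not para.strip():
            continue
        n = len(para.split())
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)  # an oversized paragraph is kept whole
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

The `max_words` default is a guess at what lands in the ~400-600 token range for English prose; tune it against your embedder's tokenizer.
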
**GPU note**: embedding 70K chunks on CPU takes hours; on a GPU
(via Ollama with `NVIDIA_VISIBLE_DEVICES`) takes ~10 minutes. Two
GPUs in parallel: ~5 minutes. The orchestrator just needs to
load-balance HTTP requests across multiple Ollama endpoints.
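
One simple way to spread embedding work across several endpoints is static round-robin over batches. The sketch below assumes a caller-supplied `embed_batch(url, texts)` function (hypothetical; in a real build it would POST to one Ollama endpoint) so the balancing logic stays independent of the HTTP client.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

EmbedBatch = Callable[[str, list[str]], list[list[float]]]


def embed_balanced(
    texts: list[str],
    urls: list[str],
    embed_batch: EmbedBatch,
    batch_size: int = 32,
) -> list[list[float]]:
    """Embed texts in batches, assigning batches to endpoints round-robin."""
    if not urls:
        raise ValueError("at least one endpoint URL is required")
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    targets = itertools.cycle(urls)
    jobs = [(next(targets), batch) for batch in batches]
    # One worker per endpoint keeps each GPU busy with one request at a time.
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        results = pool.map(lambda job: embed_batch(*job), jobs)
        # pool.map yields results in submission order, so vectors line up with texts.
        return [vec for batch_vecs in results for vec in batch_vecs]
```
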
### Phase 3 — MCP server skeleton *(1 day)*
Goal: working FastMCP server with three tools — `search_docs`,
`get_page`, `list_versions`.
- `docs_mcp/server.py`: `FastMCP("<product>-docs", stateless_http=True)`.
`stateless_http=True` is critical for production hosting: every
request creates an ephemeral session, so container recreates don't
produce a 404 storm from stale `mcp-session-id` headers on
clients.
- Lazy initialization for everything expensive (Chroma client,
embedder, bundles catalog) so the server starts cleanly even when
Ollama is briefly unreachable.
- Tool: `search_docs(query, version=None, platform=None,
bundle_id=None, k=10)`. Returns markdown of top-k chunks with full
source URLs.
- Tool: `get_page(bundle_id, page_id)`. Returns full page markdown +
metadata.
- Tool: `list_versions()`. Returns the version/platform facets
available, drawn from `bundles.json`. Helps the LLM pick filter
values.
Transports: stdio (for local Claude Desktop dev), streamable-HTTP
(for hosted production). One argparse switch.
```python
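from typing import Annotated

from pydantic import Field

# `mcp` is the FastMCP instance created in docs_mcp/server.py.
@mcp.tool()
def search_docs(
    query: Annotated[str, Field(description="Natural-language query about <product>.")],
    version: Annotated[str | None, Field(description="Restrict to one version")] = None,
    # ...platform, bundle_id, and k follow the same Annotated pattern.
) -> str:
    ...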
```
The tool descriptions are first-class context — the LLM reads them
and decides whether to call the tool. Treat them as button labels;
use "Call when..." / "Use proactively whenever..." phrasings.
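
The lazy-initialization point above can be illustrated with the bundle catalog, which needs only the standard library. This is a sketch under assumptions: `get_bundles` and `version_facets` are hypothetical names, and the same `functools.cache` pattern would wrap the Chroma client and the embedder.

```python
import json
from functools import cache
from pathlib import Path


@cache
def get_bundles(path: str = "bundles.json") -> list[dict]:
    """Load the bundle catalog on first use, then reuse the parsed list."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def version_facets(path: str = "bundles.json") -> dict[str, list[str]]:
    """Facet values that a list_versions-style tool could report."""
    bundles = get_bundles(path)
    return {
        "versions": sorted({b["version"] for b in bundles if b.get("version")}),
        "platforms": sorted({b["platform"] for b in bundles if b.get("platform")}),
    }
```

Because `functools.cache` stores only successful return values, a first call that fails (file missing, dependency briefly unreachable) is retried on the next request instead of caching the failure. Note that the version sort here is plain string order, which may not match semantic version order.
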
### Phase 4 — Containerization *(1 day)*
Goal: image you can run anywhere.
- `Dockerfile`: Python 3.12-slim base, install requirements, COPY
`scrape rag diff docs_mcp` + `bundles.json` + `corpus/ chroma/`
+ (later) `bm25/`. Don't COPY `scripts/` — those stay external
for ops use only.
- `ENTRYPOINT ["python", "-m", "docs_mcp.server",
"--transport", "streamable-http"]`. Configurable host/port via env.
- `deploy/docker-compose.yml`: one service, named volumes for usage
logs and any state, Watchtower label, depends_on for the reranker
sidecar (Phase 6).
Smoke-test locally: `docker compose up` should expose
`http://localhost:8000/mcp` and respond to an MCP `initialize` JSON-RPC.
### Phase 5 — CI on self-hosted Gitea Actions *(1-2 days)*
Goal: weekly cron rebuild + on-demand code-only ship cycle.
**Two workflows, two cadences:**
| Workflow | Trigger | Steps | Runtime |
|---|---|---|---|
| `refresh.yml` | Monday cron + manual dispatch | scrape → commit corpus → rebuild indexes → build & push image | ~40 min |
| `image-only.yml` | manual dispatch only | rebuild indexes from committed corpus → build & push image | ~18 min |
**Critical settings (learned the hard way):**
- `fetch-depth: 0` on `actions/checkout@v4`. The default depth is 1
(shallow), which breaks any step that walks git history (changelog,
digest history walker). Pay the ~10 second cost; never debug a
"0-byte history file" mystery.
- `runs-on: docker` (Gitea convention, not `ubuntu-latest`).
- Runner shell is `/bin/sh` (dash), not bash. `${VAR::N}` substring
expansion doesn't exist; use `cut` / `printf` / `awk`.
**Retry-on-race pattern for long-running scrapes:**
```bash
attempt=1
while [ "$attempt" -le 3 ]; do
  if git push; then
    echo "pushed (attempt $attempt)"
    break
  fi
  [ "$attempt" -eq 3 ] && { echo "still failing"; exit 1; }
  git fetch origin main
  git rebase origin/main || { echo "conflict — bail"; exit 1; }
  attempt=$((attempt + 1))
done
```
Works because scrape commits only touch `corpus/` + `bundles.json`,
and code merges only touch `.py` / `.yml` — disjoint paths, trivially
clean rebases.
**Image tagging — three tags per build:**
| Tag | Purpose |
|---|---|
| `:latest` | Watchtower watches this for auto-deploy |
| `:<sha12>` | Immutable; rollback target |
| `:<YYYY.MM.DD>` | Human-readable in incident notes |
Same tag set on every build; rollback is a one-line compose edit
to pin `:<sha>` instead of `:latest`.
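
For illustration, the three-tag scheme can be computed like this. The helper name `image_tags` is hypothetical, and it assumes `<sha12>` means the first 12 characters of the commit SHA.

```python
from datetime import date


def image_tags(image: str, commit_sha: str, build_date: date) -> list[str]:
    """Return the rolling, immutable, and date tags for one build."""
    return [
        f"{image}:latest",
        f"{image}:{commit_sha[:12]}",
        f"{image}:{build_date:%Y.%m.%d}",
    ]
```
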
**Container registry behind Cloudflare:**
Cloudflare's free tier has a 100 MB request body limit. Big image
layers (Chroma index can easily be 800+ MB) exceed it on push. The
fix is a LAN registry endpoint for push, public hostname for pull:
```yaml
env:
  REGISTRY_PUSH: <lan-ip>:<port>      # bypasses Cloudflare
  REGISTRY_PULL: <public-hostname>    # response bodies aren't capped
```
Runner needs the LAN endpoint in `/etc/docker/daemon.json`
`insecure-registries`. Costs nothing operationally; saves hours
of debugging.
**Registry GC:** weekly cron in the workflow that walks the package
versions, keeps `:latest` + N most-recent date tags + anything
pushed in the last 90 days, deletes the rest. Worth ~50 LOC; the
package GC on the Gitea side reclaims disk after.
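
The retention rule can be expressed as a small pure function, separate from whatever API calls list and delete package versions. This sketch assumes the caller has already fetched `(tag, pushed_at)` pairs; `tags_to_delete` and its parameter names are illustrative, not taken from `scripts/registry_gc.py`.

```python
import re
from datetime import datetime, timedelta, timezone

DATE_TAG = re.compile(r"^\d{4}\.\d{2}\.\d{2}$")


def tags_to_delete(
    versions: list[tuple[str, datetime]],
    now: datetime,
    keep_recent_dates: int = 5,
    keep_days: int = 90,
) -> list[str]:
    """Keep `latest`, the N newest date tags, and anything pushed recently."""
    cutoff = now - timedelta(days=keep_days)
    # Zero-padded YYYY.MM.DD tags sort chronologically as plain strings.
    date_tags = sorted((t for t, _ in versions if DATE_TAG.match(t)), reverse=True)
    keep = {"latest", *date_tags[:keep_recent_dates]}
    return [t for t, pushed in versions if t not in keep and pushed < cutoff]
```
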
### Phase 6 — Reranker *(half a day)*
Goal: lift retrieval quality 3× by cross-encoder reranking the top-N
dense candidates.
- A `/v1/rerank` HTTP endpoint backed by `llama.cpp` serving
`jina-reranker-v2-base` (GGUF). Runs as a sidecar in compose.
GPU strongly recommended (CPU latency is unworkable for live
queries).
- `_rerank(query, docs)` helper in the server: POST to the endpoint,
apply the scores, re-sort the top-N candidates. Defensive: on any
failure log a warning and fall through to dense-only.
- Env: `RERANK_URL` (off by default), `RERANK_POOL` (how deep to
pull candidates for reranking; 200 is a good default),
`RERANK_TIMEOUT` (30s for cold-start tolerance).
- **Watch the per-pair token limit.** Jina's GGUF reports
`n_ctx_train=1024` and llama.cpp will reject the ENTIRE batch if
any pair exceeds it. Truncate doc text to ~2000 chars before
reranking. The full untruncated chunk still goes back to the user;
truncation is only for the reranker scoring path.
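
A minimal sketch of the defensive rerank helper, with truncation applied only on the scoring path. It assumes a caller-supplied `score_fn(query, texts)` that returns one score per text (hypothetical; in the real server this would wrap the POST to `RERANK_URL`), which keeps the fallback logic testable without a running sidecar.

```python
import logging
from typing import Callable

log = logging.getLogger("rerank")

ScoreFn = Callable[[str, list[str]], list[float]]


def rerank(query: str, docs: list[str], score_fn: ScoreFn, max_chars: int = 2000) -> list[str]:
    """Re-sort docs by reranker score; keep the input order on any failure."""
    truncated = [d[:max_chars] for d in docs]  # only the scorer sees truncated text
    try:
        scores = score_fn(query, truncated)
        if len(scores) != len(docs):
            raise ValueError("reranker returned a different number of scores")
    except Exception as exc:
        log.warning("reranker failed, falling back to dense order: %s", exc)
        return list(docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order]  # full, untruncated chunks go back out
```
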
### Phase 7 — Eval harness *(1 day)*
Goal: hand-curated golden queries + standard metrics so you can
measure the impact of any retrieval change.
- `eval/queries.jsonl`: 20-25 hand-curated queries with expected
pages. Spread across versions, platforms, and difficulty levels.
Include the queries that "obviously" should work and DON'T —
those are the ones to track.
- `eval/retrievers.py`: a `Retriever` protocol with concrete
implementations: `DenseRetriever`, `RerankedRetriever`,
`BM25Retriever` (Phase 8), `HybridRetriever` (Phase 8). One
matrix dimension per knob.
- `eval/run_eval.py`: computes MRR / Recall@5 / nDCG@5 across all
retrievers; emits a markdown comparison table at
`eval/results/<baseline>.md`. Commit the result so PRs land with
the A/B evidence in the diff.
Three numbers are enough — don't overengineer. The hand-curated
queries are the value; the metrics are just a stable way to score
them.
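
For reference, the three metrics can be computed per query as below, assuming binary relevance (a page is either in the expected set or not) and unique ids in each ranked list. The function names are illustrative; MRR is the mean of `reciprocal_rank` over all queries.

```python
import math


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the expected pages that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Binary-relevance nDCG: DCG of the ranking over DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0


def mean(values: list[float]) -> float:
    return sum(values) / len(values) if values else 0.0
```
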
### Phase 8 — BM25 + Hybrid retrieval *(half a day, conditional)*
**Skip unless your eval shows specific failure modes.** Dense
embeddings + cross-encoder reranker handle most queries. The case
where they don't: queries with rare technical tokens (filenames,
language names, error codes) get buried at dense rank 1000+ by a
much larger prose corpus that's semantically nearby. The reranker
only sees top-200, so it never gets a shot.
- `rag/bm25.py`: SQLite FTS5 index, in the stdlib, on-disk
(`bm25/<product>.db`). Two tables — metadata table keyed by
rowid, FTS5 virtual table for full-text. Sanitize the query
(strip FTS5 reserved keywords, OR-join tokens for recall). ~210
LOC.
- `_rrf_fuse()` in the server — Reciprocal Rank Fusion with `k=60`.
Per-id score = `sum_over_retrievers(1 / (k + rank))`. Returns
ordered ids plus per-retriever contribution dict for telemetry.
- `search_docs` hybrid path: run dense + BM25 in parallel,
RRF-fuse, hand the merged top-200 to the reranker. Env-gated:
`HYBRID_SEARCH=true`.
- Log `top1_source` per call (`dense_only` / `bm25_only` / `both`)
to usage logs so you can measure whether BM25 is actually earning
its keep on production traffic.
If after 4-6 weeks of production data you see `bm25_only >= 80%`,
you can simplify to BM25-only (much less infrastructure). If
`both >= 50%`, hybrid is acting as tie-breaker not rescue — keep it
or simplify depending on how much you care about the long tail.
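
The fusion step and the `top1_source` label can be sketched as follows. This is an illustration of standard Reciprocal Rank Fusion with the retriever names `dense` and `bm25` assumed; the `"none"` label for an empty result is an addition for completeness, not something the guide defines.

```python
def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> tuple[list[str], dict[str, dict[str, float]]]:
    """Fuse ranked id lists; score(id) = sum over retrievers of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    contrib: dict[str, dict[str, float]] = {}
    for name, ranked in rankings.items():
        for rank, doc_id in enumerate(ranked, start=1):
            part = 1.0 / (k + rank)
            scores[doc_id] = scores.get(doc_id, 0.0) + part
            contrib.setdefault(doc_id, {})[name] = part
    ordered = sorted(scores, key=lambda d: scores[d], reverse=True)
    return ordered, contrib


def top1_source(ordered: list[str], contrib: dict[str, dict[str, float]]) -> str:
    """Which retriever(s) surfaced the top fused result."""
    if not ordered:
        return "none"
    names = set(contrib[ordered[0]])
    if names == {"dense"}:
        return "dense_only"
    if names == {"bm25"}:
        return "bm25_only"
    return "both"
```

Because RRF uses only ranks, the dense distances and BM25 scores never need to be put on a common scale.
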
### Phase 9 — Multi-version diff tooling *(1 day, if applicable)*
**Only relevant if the product has multiple maintained versions.**
- `diff_versions(bundle_id, page_id, against_bundle_id)`: unified
diff between two versions of the same page. Two matching
strategies: editor-curated `topic_cluster` peer (if the portal
exposes it), or same-filename fallback.
- `list_cluster(bundle_id, page_id)`: list cross-version peers
for one page.
- `bundle_changelog(bundle_id_new, bundle_id_old)`: added /
removed / changed pages between two bundles, sorted by churn.
- `_diff_churn(a, b)`: small helper, ~15 LOC that counts added and
removed lines from `difflib.unified_diff(..., n=0)` (zero context
lines). Used by `bundle_changelog`,
`find_doc_inconsistencies`, and `weekly_digest`.
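
A minimal sketch of such a churn helper, assuming it returns raw added and removed line counts (how the original turns these into a percentage is not specified here):

```python
import difflib
import itertools


def diff_churn(a: str, b: str) -> tuple[int, int]:
    """Count (added, removed) lines between two page bodies."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), n=0, lineterm="")
    added = removed = 0
    # The first two lines are the "---" / "+++" file headers; skipping them by
    # position avoids miscounting content lines that happen to start with "--".
    for line in itertools.islice(diff, 2, None):
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed
```
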
### Phase 10 — Usage logging *(half a day)*
Goal: per-call JSONL telemetry so you can answer "what are people
actually asking for" and "is the new feature getting used."
- `docs_mcp/usage.py`: `TimedCall` context manager that captures
tool name, args, elapsed time, hits returned, any extra fields
set by the tool via `_call.set(key=value)`. Writes JSONL to
`var/logs/usage.jsonl`, rotated daily, kept 90 days.
- Mount the log dir as a named compose volume so logs survive
container recreates.
- `scripts/usage_report.py` (standalone, no docs_mcp deps): reads
the JSONL files, prints per-tool counts, top queries, 0-hit
queries, filter usage histogram, reranker activity. Markdown
output flag for piping into weekly digest emails.
What to log: query text, filters, hits returned, elapsed_ms,
reranker_fired flag, hybrid top1_source, retrieval_mode. What NOT
to log: anything PII-shaped. The corpus is public, queries are
usually about the product, not personal — but be deliberate.
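
As a rough sketch of the `TimedCall` idea: a context manager that times the tool body and appends one JSON line on exit. Assumptions here: daily rotation is done by writing to a date-stamped file (`usage-YYYY-MM-DD.jsonl`), and 90-day pruning is left out; the real `docs_mcp/usage.py` may handle both differently.

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path


class TimedCall:
    """Context manager that appends one JSON line per tool call."""

    def __init__(self, tool: str, log_dir: str = "var/logs", **args):
        self.log_dir = Path(log_dir)
        self.record = {"tool": tool, "args": args}

    def set(self, **fields) -> None:
        """Attach extra fields (hits, reranker_fired, top1_source, ...)."""
        self.record.update(fields)

    def __enter__(self) -> "TimedCall":
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        now = datetime.now(timezone.utc)
        self.record["ts"] = now.isoformat()
        self.record["elapsed_ms"] = round((time.perf_counter() - self._start) * 1000, 1)
        self.record["error"] = exc_type.__name__ if exc_type else None
        self.log_dir.mkdir(parents=True, exist_ok=True)
        path = self.log_dir / f"usage-{now:%Y-%m-%d}.jsonl"
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(self.record) + "\n")
        return False  # never swallow the tool's own exception
```

A tool body would then look like `with TimedCall("search_docs", query=query) as call: ...; call.set(hits=len(results))`.
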
### Phase 11 — Curated knowledge layer *(2 days)*
The "RAG can't tell you what isn't in the docs" gap. Surfaces:
- **API quickstart repos** if the product has them. Ingest the
example scripts (Python, PowerShell, curl) into the corpus.
Rewrite chunk-0 for each script to embed naturally — explicit
natural-language H1, task description sentence, keyword bag.
Dense embeddings need an anchor.
- **A curated `<product>_api_lessons` markdown doc** for things
the swagger / OpenAPI doesn't say: auth flow gotchas, async-task
patterns, schema bugs you've hit, platform-detection quirks.
Surface as a dedicated MCP tool whose description tells the LLM:
*"Call proactively whenever the user asks you to write a script
/ integrate with the API / debug a 4xx response."*
- **An auto-hint banner** in `search_docs` results — when the
query matches a script/API trigger word, render a one-line nudge
at the top of results pointing at the dedicated tool.
Belt-and-suspenders for queries where the LLM doesn't think to call
it proactively.
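
The auto-hint banner amounts to a keyword check on the query before the results are returned. In this sketch the trigger-word list and the banner wording are placeholders; pick triggers from your own usage logs.

```python
import re

# Hypothetical trigger words; "4\d\d" catches HTTP 4xx status codes.
TRIGGERS = re.compile(r"\b(script|api|curl|powershell|sdk|4\d\d)\b", re.IGNORECASE)


def with_api_hint(query: str, results_md: str, tool_name: str = "<product>_api_lessons") -> str:
    """Prepend a one-line nudge when the query looks like scripting or API work."""
    if TRIGGERS.search(query):
        hint = f"> Hint: for scripting or API questions, also call `{tool_name}`."
        return f"{hint}\n\n{results_md}"
    return results_md
```
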
### Phase 12 — Doc-bug workflow tools *(1 day, optional)*
Two tools that pair up to enable a *"check the docs for
inconsistencies, draft bugs, confirm, submit"* workflow.
- `find_doc_inconsistencies(scope_query, version=None, platform=None,
max_pages=30, checks=None)`: deterministic, read-only. Two checks:
cross-version drift (pages whose content shifted between a version
and its immediate predecessor, within the actionable 10-60% churn
band) and
redirect-chain detection (short pages whose body is just a "see
[other page] for details" pointer). Heavy lifting is line-level
diff (`difflib`) against editor-curated cluster peers; the model
judges which findings are real bugs.
- `submit_doc_bug(page_url, content, email=None, rating=None,
like=None)`: POSTs to the docs portal's feedback endpoint.
Env-gated by `DOC_BUG_SUBMIT_ENABLED=true` so dev/staging
deployments can't accidentally hit the upstream. The tool's
docstring is loud about a mandatory operator-confirmation
workflow per submission — LLM must draft, show, ask, then
submit. Explicit *"do not loop"* instruction. Defensive
validation upfront (URL host matches expected portal, content
non-empty, etc.) so the LLM gets a clean error instead of a
rejected POST.
**You'll need to find the docs portal's feedback endpoint.** Most
portals route the "Was this helpful?" widget through a backend
API; sniff the browser network tab on the live site. The payload
shape varies; common fields: content/body, page url/href, optional
email, optional rating, optional thumbs. Most accept anonymous
POSTs with no captcha at the JSON-API layer (even if the widget
shows a captcha). Validate before you ship — and if the endpoint
has rate limits or captcha enforcement, the tool returns a clean
"submission rejected — paste manually at <url>" fallback.
The whole point is the per-bug operator confirmation in the
LLM-side conversation flow; the tool description enforces it. Do
not bypass.
### Phase 13 — Weekly digest tool *(half a day)*
Goal: a tool that answers *"what changed in the docs in the last N
days?"* with no runtime git dependency (the prod container has no
git).
- Extend `scrape/changelog.py` with `--json` (one-shot structured
output) and `--history-out PATH` (walks `git log --first-parent
--since="<N> days ago"` for corpus-touching commits, writes one
JSON line per commit to a JSONL file).
- CI workflows write the JSONL file into the image at build time:
`corpus/.digest/history.jsonl`. Both `refresh.yml` and
`image-only.yml`. **`fetch-depth: 0` is required** — see Phase 5.
- New MCP tool `weekly_digest(days=7, version=None, platform=None,
max_bundles=25, max_pages_per_bundle=10)`: reads the JSONL,
filters to the window, applies version/platform via
`bundles.json` metadata, aggregates per-bundle change counts and
page lists, renders markdown.
- Post-filter totals are critical: the headline "X page changes
across Y bundles" must compute X from the filtered set, not the
raw record count. Otherwise filtered calls look wrong to the
reader.
Out of scope but trivial bolt-ons: scheduled HTML email of the
digest, auto-publish to a blog, per-page diff excerpts as a
follow-up tool.
---
## Standard tool set
By the end you'll have ~15 tools registered. Production-tested
shape:
| Tool | What it does |
|---|---|
| `search_docs` | Semantic search with version/platform/bundle filters |
| `get_page` | Full markdown + metadata for one page |
| `list_versions` | Discover available facet values |
| `list_cluster` | Cross-version peers for one page (if applicable) |
| `diff_versions` | Unified diff of a page across two versions |
| `bundle_changelog` | Added / removed / changed pages between two bundles |
| `weekly_digest` | What changed in the last N days, with filters |
| `corpus_status` | Freshness + size of the knowledge base |
| `find_doc_inconsistencies` | Scoped scan for doc bugs |
| `submit_doc_bug` | Submit a drafted bug (env-gated, operator-confirmed) |
| `<product>_api_lessons` | Curated API gotchas, proactively-called |
| product-specific tools | Interop matrix, lifecycle queries, etc. |
---
## Per-product customization checklist
When applying this template to a new product, here's what you have
to figure out yourself — everything else is shared infrastructure:
- **Doc portal mechanics**
  - URL pattern for pages
  - Bundle/version concept (Zoomin "bundle", Madcap "project",
    GitBook "space", Docusaurus "docs version": same idea, different
    name)
  - SPA backing API (sniff the network tab) or fallback to
    headless browser
  - How `topic_cluster`-equivalent cross-version peers are exposed
    (or whether you synthesize them from filenames)
- **Bundle metadata schema**
  - What does `version` look like? Semver, calendar, named?
  - What does `platform` mean for this product? Is there a useful
    facet at all?
  - Other useful facets (language, product line, edition)?
- **Filterable facets** for `search_docs`
  - One filter per high-cardinality facet
  - Skip filters that have <5 distinct values; they're not worth
    the surface area
- **Feedback endpoint** (for `submit_doc_bug`, if you want it)
  - URL of the POST endpoint
  - Required + optional payload fields
  - Captcha / rate-limit behavior
  - Whether anonymous submissions are accepted
- **Curated knowledge** for the `_api_lessons` tool
  - What does the product's API documentation NOT say that you've
    learned from real integration work?
- **Quickstart / example repos**
  - Does the vendor publish working code? Ingest it; rewrite
    chunk-0 for natural-language retrieval.
---
## Decisions worth carrying forward
Things you'll save time on by deciding the same way again:
- **Tool descriptions are user interface.** The LLM reads them
verbatim and decides whether to call the tool. *"Use when..."*
and *"Call proactively whenever..."* are real surfaces; treat
them like button labels. Most retrieval improvements turn out
to be tool-description rewrites in disguise.
- **`stateless_http=True`** on the FastMCP server. Eliminates
whole categories of session-ID-related 404 storms after
container recreates.
- **Pre-bake everything at CI time.** No runtime calls to git,
external services, or anything you wouldn't trust on a
Cloudflare outage. If the digest needs git history, write a
JSONL file at CI time. If the lessons doc needs to load fast,
bake it into the image.
- **Env-gate every side-effecting tool.** Off by default in dev;
on only in production compose. Belt and suspenders against
accidental writes from staging environments.
- **Operator-confirmation pattern for side-effecting tools.**
The tool docstring is the only place to enforce
human-in-the-loop. Make it loud. "MANDATORY", "Do not loop",
"show-confirm-then-submit" — those phrasings work.
- **Verify with hand-curated golden queries before shipping any
retrieval change.** Numbers in the diff, in the commit message.
Don't ship retrieval changes on vibes.
- **Two-cadence CI** (weekly scrape vs on-demand code-only)
saves hours per code iteration once you're past the
one-iteration-a-week stage.
- **Rolling tag + sha-pinned tag** deploy pattern. `:latest` is
what Watchtower watches; `:<sha>` is your safety net. Rollback
is a one-line compose edit, not a redeploy.
- **Usage logging is non-negotiable.** You will be wrong about
what people use. Capture the truth from day one; let it tell
you which features to keep building and which to delete.
---
## Glossary
- **Bundle** — one logical doc set in the portal. Zoomin calls
them bundles; Madcap calls them projects; the concept is the
same: a versioned, titled collection of pages. One dir under
`corpus/`.
- **Page** — one HTML page in a bundle. One `.md` + one `.json`
sidecar under the bundle dir.
- **Topic cluster** — Zoomin's name for "this page in version
10.9 corresponds to that page in version 10.8." Stored in the
per-page sidecar. The portal-agnostic concept is "cross-version
peer mapping."
- **Chunk** — a unit of text that gets independently embedded and
stored in Chroma. Target ~400-600 tokens; preserve paragraph
boundaries.
- **RRF** — Reciprocal Rank Fusion. The way to merge two ranked
lists from independent retrievers without score calibration.
---
## What's deliberately NOT in this template
Decisions you should make per-product (not copy from the original
build):
- The reverse proxy and TLS termination layer. Could be Caddy,
nginx, Traefik, Cloudflare Tunnel — pick what your infra uses.
- The Gateway / aggregator in front of multiple MCPs (MetaMCP is one
option; you may not need any aggregator if you're running a
single product MCP).
- The specific embedding model — `nomic-embed-text` is a strong
default but newer / domain-specific models may be better for
some products.
- The Ollama containers / GPU setup — depends on what hardware you
have. The pattern is one container per GPU with explicit
`NVIDIA_VISIBLE_DEVICES` pinning; the indexer load-balances
across them.
- Whether to publish a blog series alongside the build. Strongly
recommended (forces clarity, builds an audience), but optional.