Template for building hosted MCP servers over a product's public
documentation. Distilled from one production build; everything
product-specific has been factored out.
Contents:
- PLAN.md — comprehensive build guide. 13 phases from project
skeleton through weekly_digest. Includes the gotchas
("fetch-depth: 0 always", reranker per-pair token limit,
Cloudflare body cap, dash-not-bash on Gitea runners), the
decisions worth carrying forward, and a per-product
customization checklist.
- CLAUDE.md — guidance for Claude Code working in a clone of this
template. Phase identification table, conventions (env-gating +
operator confirmation for side-effecting tools, defensive
fallback for retrieval components), common commands.
- README.md — quick-start summary.
Scaffolded code (all signature-stable, with NotImplementedError
stubs where phase-specific work is required):
  docs_mcp/server.py            FastMCP server, stateless_http=True, with
                                search_docs / get_page / list_versions
                                baseline tools and commented stubs for the
                                rest of the phase set.
  docs_mcp/usage.py             TimedCall telemetry, JSONL, daily rotation,
                                90-day retention. Reusable as-is.
  rag/embeddings.py             Ollama embedder (nomic-embed-text default),
                                load-balanced across N URLs. Reusable.
  rag/chunk.py                  Paragraph-aware chunker with synthetic
                                chunk 0. Per-product tunable.
  rag/index.py                  Chroma + BM25 builder. --rebuild and
                                --bm25-only flags.
  rag/bm25.py                   SQLite FTS5 lexical index. Reusable.
  scrape/changelog.py           --cached / --ref / --json / --history-out.
                                Reusable.
  scrape/README.md              What you write per-product.
  eval/queries.jsonl.example    Curate ~25 hand-labeled queries here.
  eval/retrievers.py            Retriever protocol + stub classes.
  eval/run_eval.py              MRR / Recall@K / nDCG@K harness skeleton.
  scripts/usage_report.py       Standalone log analyzer; the
                                FOLLOW-UP CHECKS pattern noted in the
                                module docstring.
  scripts/registry_gc.py        Gitea container registry cleanup. Reusable.
Deployment + CI:
  Dockerfile                        Python 3.12-slim; COPY corpus + chroma
                                    + bm25 last for cache efficiency.
  deploy/docker-compose.yml         MCP + reranker sidecar + Watchtower.
                                    Templated with <placeholders>.
  .gitea/workflows/refresh.yml      Weekly cron + manual dispatch.
                                    fetch-depth: 0, retry-on-race,
                                    three-tag image scheme.
  .gitea/workflows/image-only.yml   Code-only ship cycle, ~18min.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Docs MCP Server — Build Guide
A reusable recipe for building a hosted MCP server over a product's public documentation. Distilled from one production build; everything product-specific has been factored out.
The end product is a streamable-HTTP MCP server with ~15 tools that any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can call to answer questions against the docs, surface what changed recently, find inconsistencies, and (optionally) submit doc bugs back upstream.
What you're building
A pipeline with these stages:
    upstream docs portal
            │
            ▼
         scrape ──► corpus/<bundle>/<page>.md + .json sidecar
            │
            ▼
      chunk + embed ──► chroma/ (dense vectors)
            │       ──► bm25/ (FTS5 lexical index)
            ▼
       MCP server ──► search_docs / get_page / diff_versions / weekly_digest /
                      find_doc_inconsistencies / submit_doc_bug / ...
            │
            ▼
    reverse proxy / Cloudflare Tunnel ──► public endpoint
Two CI cadences:
- Weekly cron (~40 min): full re-scrape, re-chunk, re-embed, image build & push.
- On-demand image-only (~18 min): code-only rebuild from committed corpus, image build & push.
Supporting infrastructure: a container registry (self-hosted Gitea
works well), a host running Docker Compose, Watchtower auto-updating
from :latest, and a reverse proxy in front.
Build phases
Each phase is a discrete, shippable unit. Build them in order; each one is useful on its own and unlocks the next. Realistic effort per phase is given as a rough order of magnitude. Total: roughly 2–3 weeks of focused work for the full stack.
Phase 0 — Project skeleton (half a day)
Goals: directory layout, dependency manifest, virtualenv.
- Top-level dirs: scrape/, corpus/ (gitignored), rag/, docs_mcp/, eval/, scripts/, deploy/, .gitea/workflows/.
- requirements.txt with the dependencies you'll need across all phases (FastMCP, chromadb, httpx, beautifulsoup4 or whatever HTML parser, ollama or sentence-transformers client, etc.).
- python -m venv venv and pin Python version (3.11 or 3.12; be conservative, since some embedding libraries have version-specific wheels).
- .gitignore: venv/, corpus/ (regenerable), chroma/ (regenerable), bm25/ (regenerable), *.pyc, __pycache__/, .pytest_cache/.
Phase 1 — Scraper (2–4 days, product-specific)
This is the most product-dependent phase. The goal is to write a scraper that produces a normalized corpus layout regardless of upstream portal shape.
Output shape (mandatory):
corpus/
<bundle_id>/ # one dir per "doc bundle" — see Glossary
<page_id>.md # markdown body
<page_id>.json # sidecar with structured metadata
...
bundles.json # catalog of bundles with metadata
Bundle metadata (bundles.json is a list of these):
{
  "slug": "<bundle_id>",
  "title": "User-facing title",
  "version": "10.9",
  "platform": "VMware vSphere",   // may be null
  "product": "Admin Guide",       // optional but useful
  "language": "en-US",
  "page_count": 127,
  "dates": {
    "Added on": "2024-01-15",
    "Updated on": "2026-05-20"
  },
  "landing_page": "<page_id>"
}
Per-page sidecar (<page_id>.json) carries page-level metadata.
The one field that matters cross-cutting is topic_cluster (see
Phase 9):
{
  "bundle_id": "<bundle_id>",
  "page_id": "<page_id>",
  "title": "How to ...",
  "ordinal": 42,
  "topic_cluster": {
    "clustering_title": "How to ...",
    "clustered_topics": [
      {"bundle_id": "...10.8", "page_id": "How_to_X.htm", "clustering_title": "..."},
      {"bundle_id": "...10.9", "page_id": "How_to_X.htm", "clustering_title": "..."}
    ]
  }
}
If the portal exposes a cross-version "this page corresponds to that
page" mapping, capture it here. If it doesn't, you can synthesize a
filename-based fallback (same filename across bundle versions = same
topic) and live without the editor-curated mapping. The features that
read topic_cluster (list_cluster, diff_versions,
find_doc_inconsistencies, parts of weekly_digest) will work
either way; they're more accurate with real clusters.
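The filename-based fallback could be sketched roughly as below. This is a minimal sketch, not the template's implementation: the function name synthesize_clusters and the family_of mapping (which bundles count as versions of the same guide) are assumptions introduced for illustration.

```python
from collections import defaultdict


def synthesize_clusters(pages, family_of):
    """Group pages that share a filename within one bundle family.

    pages: iterable of (bundle_id, page_id) tuples.
    family_of: dict of bundle_id -> family key (hypothetical; for example the
        bundle's product + platform, so only versions of one guide pair up).
    Returns {(bundle_id, page_id): [peer dicts]} shaped like clustered_topics.
    """
    groups = defaultdict(list)
    for bundle_id, page_id in pages:
        groups[(family_of[bundle_id], page_id)].append(bundle_id)

    clusters = {}
    for (_family, page_id), bundle_ids in groups.items():
        if len(bundle_ids) < 2:
            continue  # no cross-version peer, so nothing to record
        peers = [{"bundle_id": b, "page_id": page_id} for b in sorted(bundle_ids)]
        for b in bundle_ids:
            clusters[(b, page_id)] = peers
    return clusters
```

Grouping by family first matters because generic filenames (an introduction page, say) tend to repeat across unrelated guides, and those should not be treated as peers.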
Patterns that recur across doc portals:
- Most modern doc portals are SPAs. Plain requests.get won't see rendered content. Either find the underlying API the SPA calls (the cheapest, most reliable path), or fall back to a headless browser (Playwright). The API path is almost always available; sniff the network tab.
- Portals usually expose a "bundle/topic" hierarchy under the hood (Zoomin, Madcap Flare, Paligo, GitBook, Docusaurus all do). Map it to bundles.json + corpus/<bundle>/<page>.
- Many portals expose ?save_local= or .pdf rendered versions; the HTML they serve is structurally cleaner than what the page shows through the SPA shell.
scrape/changelog.py (~250 LOC; see Phase 13) — provides
summarize_diff(), render_human(), walk_history() and the
--json / --history-out modes. Mostly reusable as-is; the only
product-specific bit is the path layout assumption.
Phase 2 — Chunking + embeddings + Chroma (2 days)
Goal: build a queryable dense index from the scraped corpus.
- rag/chunk.py: split each page's markdown into ~400-600 token chunks. Strategy that works: paragraph-aware splitter with a rich "chunk 0" containing the page title + 1-sentence summary + bag-of-words from key terms. Chunk 0 is what dense retrieval lands on first; getting it right dominates retrieval quality.
- rag/embeddings.py: pluggable embedder. Recommended start: Ollama-hosted nomic-embed-text (768-dim, free, good baseline). Other defensible choices: text-embedding-3-small (OpenAI), bge-m3 (also via Ollama). The embedder is a Chroma EmbeddingFunction that returns list[list[float]] for a list of texts.
- rag/index.py: orchestrates the build: read corpus → emit chunks (with metadata: bundle_id, page_id, version, platform, ordinal) → upsert into Chroma collection. --rebuild flag for a clean reindex. Run via python -m rag.index --rebuild.
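A paragraph-aware splitter with a synthetic chunk 0 might look roughly like this. It is a sketch under stated assumptions: token counts are approximated by whitespace-separated words, the "key terms" are simply the most frequent longer words, and chunk_page is a hypothetical name rather than the function in rag/chunk.py.

```python
import re
from collections import Counter


def chunk_page(title, markdown, max_tokens=500, n_keywords=8):
    """Return a list of chunk strings; chunk 0 is a synthetic summary chunk.

    Tokens are approximated by whitespace-separated words (an assumption of
    this sketch, not a real tokenizer).
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", markdown) if p.strip()]

    # Chunk 0: title + first sentence + most frequent longer words.
    first_sentence = re.split(r"(?<=[.!?])\s+", paragraphs[0])[0] if paragraphs else ""
    words = re.findall(r"[A-Za-z][A-Za-z0-9_-]{4,}", markdown.lower())
    keywords = [w for w, _ in Counter(words).most_common(n_keywords)]
    chunks = [f"{title}\n{first_sentence}\nKey terms: {', '.join(keywords)}"]

    # Body chunks: pack whole paragraphs until the budget would be exceeded.
    current, current_len = [], 0
    for para in paragraphs:
        n = len(para.split())
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Paragraphs are never split in this sketch, so a single paragraph longer than max_tokens becomes its own oversized chunk; a real chunker would likely need a sentence-level fallback for that case.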
Chroma settings: PersistentClient(path="chroma/") and
Settings(anonymized_telemetry=False). Single collection
(<product>_docs).
GPU note: embedding 70K chunks on CPU takes hours; on a GPU
(via Ollama with NVIDIA_VISIBLE_DEVICES) takes ~10 minutes. Two
GPUs in parallel: ~5 minutes. The orchestrator just needs to load-
balance HTTP requests across multiple Ollama endpoints.
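The load-balancing idea can be sketched as round-robin batch assignment over a thread pool. The embed_batch callable is a hypothetical stand-in for whatever HTTP call reaches one embedding endpoint, and embed_all is an illustrative name; neither comes from the template.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def embed_all(texts, urls, embed_batch, batch_size=32):
    """Embed texts by spreading fixed-size batches round-robin across urls.

    embed_batch(url, batch) -> list of vectors is supplied by the caller
    (hypothetical; in practice it would POST to one embedding endpoint).
    urls must be non-empty. Output order matches input order.
    """
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    assignments = list(zip(cycle(urls), batches))
    # max_workers=len(urls) caps concurrency at the number of endpoints.
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        results = pool.map(lambda pair: embed_batch(*pair), assignments)
        return [vec for batch_vectors in results for vec in batch_vectors]
```

Static round-robin is the simplest option; if one GPU is noticeably slower, a work queue that each endpoint pulls from would balance better.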
Phase 3 — MCP server skeleton (1 day)
Goal: working FastMCP server with three tools — search_docs,
get_page, list_versions.
- docs_mcp/server.py: FastMCP("<product>-docs", stateless_http=True). stateless_http=True is critical for production hosting: every request creates an ephemeral session, so container recreates don't produce a 404 storm from stale mcp-session-id headers on clients.
- Lazy initialization for everything expensive (Chroma client, embedder, bundles catalog) so the server starts cleanly even when Ollama is briefly unreachable.
- Tool: search_docs(query, version=None, platform=None, bundle_id=None, k=10). Returns markdown of top-k chunks with full source URLs.
- Tool: get_page(bundle_id, page_id). Returns full page markdown + metadata.
- Tool: list_versions(). Returns the version/platform facets available, drawn from bundles.json. Helps the LLM pick filter values.
Transports: stdio (for local Claude Desktop dev), streamable-HTTP (for hosted production). One argparse switch.
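    from typing import Annotated

    from pydantic import Field

    @mcp.tool()
    def search_docs(
        query: Annotated[str, Field(description="Natural-language query about <product>.")],
        version: Annotated[str | None, Field(description="Restrict to one version")] = None,
        # ... platform, bundle_id, and k follow the same pattern
    ) -> str:
        ...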
The tool descriptions are first-class context — the LLM reads them and decides whether to call the tool. Treat them as button labels; use "Call when..." / "Use proactively whenever..." phrasings.
Phase 4 — Containerization (1 day)
Goal: image you can run anywhere.
- Dockerfile: Python 3.12-slim base, install requirements, COPY scrape rag diff docs_mcp + bundles.json + corpus/ chroma/ + (later) bm25/. Don't COPY scripts/; those stay external for ops use only.
- ENTRYPOINT ["python", "-m", "docs_mcp.server", "--transport", "streamable-http"]. Configurable host/port via env.
- deploy/docker-compose.yml: one service, named volumes for usage logs and any state, Watchtower label, depends_on for the reranker sidecar (Phase 6).
Smoke-test locally: docker compose up should expose
http://localhost:8000/mcp and respond to an MCP initialize JSON-RPC.
Phase 5 — CI on self-hosted Gitea Actions (1–2 days)
Goal: weekly cron rebuild + on-demand code-only ship cycle.
Two workflows, two cadences:
| Workflow | Trigger | Steps | Runtime |
|---|---|---|---|
| refresh.yml | Monday cron + manual dispatch | scrape → commit corpus → rebuild indexes → build & push image | ~40 min |
| image-only.yml | manual dispatch only | rebuild indexes from committed corpus → build & push image | ~18 min |
Critical settings (learned the hard way):
- fetch-depth: 0 on actions/checkout@v4. The default depth is 1 (shallow), which breaks any step that walks git history (changelog, digest history walker). Pay the ~10 second cost; never debug a "0-byte history file" mystery.
- runs-on: docker (Gitea convention, not ubuntu-latest).
- Runner shell is /bin/sh (dash), not bash. ${VAR::N} substring expansion doesn't exist; use cut / printf / awk.
Retry-on-race pattern for long-running scrapes:
    # POSIX sh (dash-compatible): push, and on rejection rebase and retry.
    attempt=1
    while [ "$attempt" -le 3 ]; do
      if git push; then
        echo "pushed (attempt $attempt)"
        break
      fi
      # Third failure: give up instead of looping forever.
      [ "$attempt" -eq 3 ] && { echo "still failing"; exit 1; }
      git fetch origin main
      git rebase origin/main || { echo "conflict, bailing"; exit 1; }
      attempt=$((attempt + 1))
    done
Works because scrape commits only touch corpus/ + bundles.json,
and code merges only touch .py / .yml — disjoint paths, trivially
clean rebases.
Image tagging — three tags per build:
| Tag | Purpose |
|---|---|
| :latest | Watchtower watches this for auto-deploy |
| :<sha12> | Immutable; rollback target |
| :<YYYY.MM.DD> | Human-readable in incident notes |
Same tag set on every build; rollback is a one-line compose edit
to pin :<sha> instead of :latest.
Container registry behind Cloudflare:
Cloudflare's free tier has a 100 MB request body limit. Big image layers (Chroma index can easily be 800+ MB) exceed it on push. The fix is a LAN registry endpoint for push, public hostname for pull:
    env:
      REGISTRY_PUSH: <lan-ip>:<port>      # bypasses Cloudflare
      REGISTRY_PULL: <public-hostname>    # response bodies aren't capped
Runner needs the LAN endpoint in /etc/docker/daemon.json
insecure-registries. Costs nothing operationally; saves hours
of debugging.
Registry GC: weekly cron in the workflow that walks the package
versions, keeps :latest + N most-recent date tags + anything
pushed in the last 90 days, deletes the rest. Worth ~50 LOC; the
package GC on the Gitea side reclaims disk after.
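The retention rule above can be sketched as a pure selection function, leaving the registry API calls out. The input shape (tag, pushed_at), the function name tags_to_delete, and the default of 5 retained date tags are assumptions for illustration; the text only says "N most-recent".

```python
import re
from datetime import datetime, timedelta, timezone

DATE_TAG = re.compile(r"^\d{4}\.\d{2}\.\d{2}$")


def tags_to_delete(versions, now, keep_recent_dates=5, keep_days=90):
    """Pick registry tags to delete.

    versions: list of (tag, pushed_at) with timezone-aware datetimes.
    Keeps "latest", the newest keep_recent_dates date tags, and anything
    pushed within keep_days. Everything else is returned for deletion.
    """
    cutoff = now - timedelta(days=keep_days)
    # YYYY.MM.DD tags sort chronologically when sorted as strings.
    date_tags = sorted((t for t, _ in versions if DATE_TAG.match(t)), reverse=True)
    keep = {"latest", *date_tags[:keep_recent_dates]}
    return sorted(t for t, pushed in versions if t not in keep and pushed < cutoff)
```

Keeping the selection separate from the deletion calls makes the policy easy to dry-run: print the returned list in CI before wiring up any destructive request.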
Phase 6 — Reranker (half a day)
Goal: lift retrieval quality 3× by cross-encoder reranking the top-N dense candidates.
- A /v1/rerank HTTP endpoint backed by llama.cpp serving jina-reranker-v2-base (GGUF). Runs as a sidecar in compose. GPU strongly recommended (CPU latency is unworkable for live queries).
- _rerank(query, docs) helper in the server: POST to the endpoint, apply the scores, re-sort the top-N candidates. Defensive: on any failure log a warning and fall through to dense-only.
- Env: RERANK_URL (off by default), RERANK_POOL (how deep to pull candidates for reranking; 200 is a good default), RERANK_TIMEOUT (30s for cold-start tolerance).
- Watch the per-pair token limit. Jina's GGUF reports n_ctx_train=1024 and llama.cpp will reject the ENTIRE batch if any pair exceeds it. Truncate doc text to ~2000 chars before reranking. The full untruncated chunk still goes back to the user; truncation is only for the reranker scoring path.
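The defensive shape of the rerank helper (truncate for scoring only, fall back to the dense order on any failure) could look roughly like this. The score_fn callable is a hypothetical stand-in for the HTTP call to the rerank endpoint; its request and response format are deliberately left out because they depend on the server in use.

```python
import logging

log = logging.getLogger("rerank")


def rerank(query, docs, score_fn, max_chars=2000):
    """Re-sort docs by cross-encoder score; keep input order on any failure.

    score_fn(query, texts) -> list[float] is supplied by the caller
    (hypothetical; in practice it would POST to the rerank endpoint).
    Only the text sent for scoring is truncated; returned docs are untouched.
    """
    try:
        scores = score_fn(query, [d[:max_chars] for d in docs])
        if len(scores) != len(docs):
            raise ValueError("score count does not match doc count")
    except Exception as exc:  # any failure degrades to dense-only ordering
        log.warning("rerank failed, keeping dense order: %s", exc)
        return list(docs)
    # Highest score first; ties keep the original (dense) order.
    ranked = sorted(zip(scores, range(len(docs))), key=lambda p: (-p[0], p[1]))
    return [docs[i] for _, i in ranked]
```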
Phase 7 — Eval harness (1 day)
Goal: hand-curated golden queries + standard metrics so you can measure the impact of any retrieval change.
- eval/queries.jsonl: 20–25 hand-curated queries with expected pages. Spread across versions, platforms, and difficulty levels. Include the queries that "obviously" should work and DON'T; those are the ones to track.
- eval/retrievers.py: a Retriever protocol with concrete implementations: DenseRetriever, RerankedRetriever, BM25Retriever (Phase 8), HybridRetriever (Phase 8). One matrix dimension per knob.
- eval/run_eval.py: computes MRR / Recall@5 / nDCG@5 across all retrievers; emits a markdown comparison table at eval/results/<baseline>.md. Commit the result so PRs land with the A/B evidence in the diff.
Three numbers are enough — don't overengineer. The hand-curated queries are the value; the metrics are just a stable way to score them.
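For reference, the three metrics can be computed in a few lines. This sketch assumes binary relevance (each query has a non-empty set of expected page ids) and uses illustrative function names; the harness in eval/run_eval.py may structure this differently.

```python
import math


def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant hit, or 0.0 if none was retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked, relevant, k=5):
    """Fraction of the relevant ids that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k=5):
    """Binary-relevance nDCG: DCG of the ranking over DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked[:k], start=1) if item in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal


def evaluate(results, k=5):
    """results: list of (ranked_ids, relevant_id_set) pairs, one per query."""
    n = len(results)
    return {
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in results) / n,
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in results) / n,
        f"ndcg@{k}": sum(ndcg_at_k(r, rel, k) for r, rel in results) / n,
    }
```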
Phase 8 — BM25 + Hybrid retrieval (half a day, conditional)
Skip unless your eval shows specific failure modes. Dense embeddings + cross-encoder reranker handle most queries. The case where they don't: queries with rare technical tokens (filenames, language names, error codes) get buried at dense rank 1000+ by a much larger prose corpus that's semantically nearby. The reranker only sees top-200, so it never gets a shot.
- rag/bm25.py: SQLite FTS5 index, in the stdlib, on-disk (bm25/<product>.db). Two tables: a metadata table keyed by rowid and an FTS5 virtual table for full-text. Sanitize the query (strip FTS5 reserved keywords, OR-join tokens for recall). ~210 LOC.
- _rrf_fuse() in the server: Reciprocal Rank Fusion with k=60. Per-id score = sum_over_retrievers(1 / (k + rank)). Returns ordered ids plus per-retriever contribution dict for telemetry.
- search_docs hybrid path: run dense + BM25 in parallel, RRF-fuse, hand the merged top-200 to the reranker. Env-gated: HYBRID_SEARCH=true.
- Log top1_source per call (dense_only / bm25_only / both) to usage logs so you can measure whether BM25 is actually earning its keep on production traffic.
If after 4–6 weeks of production data you see bm25_only >= 80%,
you can simplify to BM25-only (much less infrastructure). If
both >= 50%, hybrid is acting as tie-breaker not rescue — keep it
or simplify depending on how much you care about the long tail.
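The fusion step and the top1_source label fit in a short sketch. It assumes ranks start at 1 and that ids are strings (used only to break score ties deterministically); the function names are illustrative rather than the server's own.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked id lists with Reciprocal Rank Fusion.

    rankings: dict of retriever name -> list of ids, best first.
    Returns (fused_ids, contributions) where contributions[id] maps
    retriever name -> that retriever's 1 / (k + rank) share of the score.
    """
    contributions = {}
    for name, ids in rankings.items():
        for rank, doc_id in enumerate(ids, start=1):
            contributions.setdefault(doc_id, {})[name] = 1.0 / (k + rank)
    # Highest summed score first; equal scores fall back to id order.
    fused = sorted(contributions, key=lambda d: (-sum(contributions[d].values()), d))
    return fused, contributions


def top1_source(fused, contributions):
    """Label which retrievers surfaced the top hit, e.g. dense_only / bm25_only / both.

    Assumes fused is non-empty.
    """
    sources = set(contributions[fused[0]])
    return "both" if len(sources) > 1 else f"{next(iter(sources))}_only"
```

Because RRF only looks at ranks, the dense distances and BM25 scores never need to be put on a common scale, which is what makes it a convenient default for merging independent retrievers.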
Phase 9 — Multi-version diff tooling (1 day, if applicable)
Only relevant if the product has multiple maintained versions.
- diff_versions(bundle_id, page_id, against_bundle_id): unified diff between two versions of the same page. Two matching strategies: editor-curated topic_cluster peer (if the portal exposes it), or same-filename fallback.
- list_cluster(bundle_id, page_id): list cross-version peers for one page.
- bundle_changelog(bundle_id_new, bundle_id_old): added / removed / changed pages between two bundles, sorted by churn.
- _diff_churn(a, b): small helper, ~15 LOC of difflib.unified_diff line counting with zero context lines (n=0, the equivalent of diff --unified=0). Used by bundle_changelog, find_doc_inconsistencies, and weekly_digest.
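A churn helper along these lines is short enough to show in full. The normalization (changed lines over the combined line count of both versions) is an assumption of this sketch; the original helper may define the ratio differently.

```python
import difflib
from itertools import islice


def diff_churn(a, b):
    """Return (added, removed, churn_ratio) between two page bodies.

    churn_ratio = (added + removed) / (lines in a + lines in b), which is
    this sketch's normalization and stays within 0.0 to 1.0.
    """
    a_lines, b_lines = a.splitlines(), b.splitlines()
    added = removed = 0
    diff = difflib.unified_diff(a_lines, b_lines, n=0, lineterm="")
    # Skip the two file-header lines, then count + and - lines; hunk
    # markers start with "@@" and n=0 means there are no context lines.
    for line in islice(diff, 2, None):
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    total = max(len(a_lines) + len(b_lines), 1)
    return added, removed, (added + removed) / total
```

Skipping the headers by position rather than by prefix avoids miscounting content lines that themselves begin with "--" or "++".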
Phase 10 — Usage logging (half a day)
Goal: per-call JSONL telemetry so you can answer "what are people actually asking for" and "is the new feature getting used."
- docs_mcp/usage.py: TimedCall context manager that captures tool name, args, elapsed time, hits returned, any extra fields set by the tool via _call.set(key=value). Writes JSONL to var/logs/usage.jsonl, rotated daily, kept 90 days.
- Mount the log dir as a named compose volume so logs survive container recreates.
- scripts/usage_report.py (standalone, no docs_mcp deps): reads the JSONL files, prints per-tool counts, top queries, 0-hit queries, filter usage histogram, reranker activity. Markdown output flag for piping into weekly digest emails.
What to log: query text, filters, hits returned, elapsed_ms, reranker_fired flag, hybrid top1_source, retrieval_mode. What NOT to log: anything PII-shaped. The corpus is public, queries are usually about the product, not personal — but be deliberate.
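A context manager in the spirit of TimedCall might be sketched as follows. The per-day filename (usage-YYYY-MM-DD.jsonl) is this sketch's stand-in for daily rotation and the record field names are assumptions; retention cleanup is omitted.

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path


class TimedCall:
    """Context manager that appends one JSONL record per tool call.

    Writing to a per-day file is a simple form of daily rotation; the
    filename pattern is an assumption of this sketch.
    """

    def __init__(self, tool, log_dir, **args):
        self.record = {"tool": tool, "args": args}
        self.log_dir = Path(log_dir)

    def set(self, **fields):
        self.record.update(fields)  # extra fields set by the tool body

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        now = datetime.now(timezone.utc)
        self.record["elapsed_ms"] = round((time.perf_counter() - self._start) * 1000, 2)
        self.record["ts"] = now.isoformat()
        self.record["error"] = exc_type.__name__ if exc_type else None
        self.log_dir.mkdir(parents=True, exist_ok=True)
        path = self.log_dir / f"usage-{now:%Y-%m-%d}.jsonl"
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(self.record) + "\n")
        return False  # never swallow the tool's own exception
```

Inside a tool this would be used as `with TimedCall("search_docs", log_dir, query=query) as call:` followed by `call.set(hits=len(results))`, so the record is written even when the tool raises.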
Phase 11 — Curated knowledge layer (2 days)
Goal: close the "RAG can't tell you what isn't in the docs" gap. Three surfaces:
- API quickstart repos if the product has them. Ingest the example scripts (Python, PowerShell, curl) into the corpus. Rewrite chunk-0 for each script to embed naturally — explicit natural-language H1, task description sentence, keyword bag. Dense embeddings need an anchor.
- A curated <product>_api_lessons markdown doc for things the swagger / OpenAPI doesn't say: auth flow gotchas, async-task patterns, schema bugs you've hit, platform-detection quirks. Surface as a dedicated MCP tool whose description tells the LLM: "Call proactively whenever the user asks you to write a script / integrate with the API / debug a 4xx response."
- An auto-hint banner in search_docs results: when the query matches a script/API trigger word, render a one-line nudge at the top of results pointing at the dedicated tool. Belt-and-suspenders for queries where the LLM doesn't think to call it proactively.
Phase 12 — Doc-bug workflow tools (1 day, optional)
Two tools that pair up to enable a "check the docs for inconsistencies, draft bugs, confirm, submit" workflow.
- find_doc_inconsistencies(scope_query, version=None, platform=None, max_pages=30, checks=None): deterministic, read-only. Two checks: cross-version drift (pages whose content shifted between immediate-previous versions in the actionable 10–60% churn band) and redirect-chain detection (short pages whose body is just a "see [other page] for details" pointer). Heavy lifting is line-level diff (difflib) against editor-curated cluster peers; the model judges which findings are real bugs.
- submit_doc_bug(page_url, content, email=None, rating=None, like=None): POSTs to the docs portal's feedback endpoint. Env-gated by DOC_BUG_SUBMIT_ENABLED=true so dev/staging deployments can't accidentally hit the upstream. The tool's docstring is loud about a mandatory operator-confirmation workflow per submission: the LLM must draft, show, ask, then submit. Explicit "do not loop" instruction. Defensive validation upfront (URL host matches expected portal, content non-empty, etc.) so the LLM gets a clean error instead of a rejected POST.
You'll need to find the docs portal's feedback endpoint. Most portals route the "Was this helpful?" widget through a backend API; sniff the browser network tab on the live site. The payload shape varies; common fields: content/body, page url/href, optional email, optional rating, optional thumbs. Most accept anonymous POSTs with no captcha at the JSON-API layer (even if the widget shows a captcha). Validate before you ship — and if the endpoint has rate limits or captcha enforcement, the tool returns a clean "submission rejected — paste manually at " fallback.
The whole point is the per-bug operator confirmation in the LLM-side conversation flow; the tool description enforces it. Do not bypass.
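The env gate and upfront validation for submit_doc_bug can be sketched as a single check that runs before any network call. The function name validate_submission, the error strings, and the https requirement are assumptions for illustration; only the DOC_BUG_SUBMIT_ENABLED variable and the host / non-empty checks come from the description above.

```python
import os
from urllib.parse import urlparse


def validate_submission(page_url, content, expected_host, env=os.environ):
    """Return an error string, or None when the submission may proceed.

    expected_host is the docs portal hostname (supplied by the caller).
    Requiring https is an assumption of this sketch.
    """
    if env.get("DOC_BUG_SUBMIT_ENABLED", "").lower() != "true":
        return "submission disabled: DOC_BUG_SUBMIT_ENABLED is not 'true'"
    parsed = urlparse(page_url)
    if parsed.scheme != "https" or parsed.hostname != expected_host:
        return f"page_url must be an https URL on {expected_host}"
    if not content or not content.strip():
        return "content must be non-empty"
    return None
```

Returning a message instead of raising lets the tool hand the LLM a readable explanation, which is the "clean error" behavior the phase asks for. The operator-confirmation step itself still lives in the tool description, not in code.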
Phase 13 — Weekly digest tool (half a day)
Goal: a tool that answers "what changed in the docs in the last N days?" with no runtime git dependency (the prod container has no git).
- Extend scrape/changelog.py with --json (one-shot structured output) and --history-out PATH (walks git log --first-parent --since="<N> days ago" for corpus-touching commits, writes one JSON line per commit to a JSONL file).
- CI workflows write the JSONL file into the image at build time: corpus/.digest/history.jsonl. Both refresh.yml and image-only.yml. fetch-depth: 0 is required (see Phase 5).
- New MCP tool weekly_digest(days=7, version=None, platform=None, max_bundles=25, max_pages_per_bundle=10): reads the JSONL, filters to the window, applies version/platform via bundles.json metadata, aggregates per-bundle change counts and page lists, renders markdown.
- Post-filter totals are critical: the headline "X page changes across Y bundles" must compute X from the filtered set, not the raw record count. Otherwise filtered calls look wrong to the reader.
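The filter-then-total order can be illustrated with a small aggregation sketch. The history record shape used here ({"date": ..., "changes": [...]}) is hypothetical, since the real history.jsonl layout is not spelled out, and digest is an illustrative name.

```python
from collections import defaultdict
from datetime import date, timedelta


def digest(records, bundles, today, days=7, version=None, platform=None):
    """Aggregate per-bundle page changes inside the window, after filtering.

    records: dicts like {"date": "2026-05-20",
                         "changes": [{"bundle_id": ..., "page_id": ...}]}
        (a hypothetical record shape; the real file may differ).
    bundles: dict of bundle_id -> bundles.json entry with "version" / "platform".
    """
    since = today - timedelta(days=days)
    pages = defaultdict(set)
    for rec in records:
        if date.fromisoformat(rec["date"]) < since:
            continue
        for change in rec["changes"]:
            meta = bundles.get(change["bundle_id"], {})
            if version is not None and meta.get("version") != version:
                continue
            if platform is not None and meta.get("platform") != platform:
                continue
            pages[change["bundle_id"]].add(change["page_id"])
    # Headline totals come from the filtered set, not the raw record count.
    total_pages = sum(len(p) for p in pages.values())
    headline = f"{total_pages} page changes across {len(pages)} bundles"
    return headline, {b: sorted(p) for b, p in pages.items()}
```

Using a set per bundle means a page touched by several commits in the window counts once; whether that matches the intended headline is a product decision.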
Out of scope but trivial bolt-ons: scheduled HTML email of the digest, auto-publish to a blog, per-page diff excerpts as a follow-up tool.
Standard tool set
By the end you'll have ~15 tools registered. Production-tested shape:
| Tool | What it does |
|---|---|
| search_docs | Semantic search with version/platform/bundle filters |
| get_page | Full markdown + metadata for one page |
| list_versions | Discover available facet values |
| list_cluster | Cross-version peers for one page (if applicable) |
| diff_versions | Unified diff of a page across two versions |
| bundle_changelog | Added / removed / changed pages between two bundles |
| weekly_digest | What changed in the last N days, with filters |
| corpus_status | Freshness + size of the knowledge base |
| find_doc_inconsistencies | Scoped scan for doc bugs |
| submit_doc_bug | Submit a drafted bug (env-gated, operator-confirmed) |
| <product>_api_lessons | Curated API gotchas, proactively-called |
| product-specific tools | Interop matrix, lifecycle queries, etc. |
Per-product customization checklist
When applying this template to a new product, here's what you have to figure out yourself — everything else is shared infrastructure:
- Doc portal mechanics
  - URL pattern for pages
  - Bundle/version concept (Zoomin "bundle", Madcap "project", GitBook "space", Docusaurus "docs version": same idea, different name)
  - SPA backing API (sniff the network tab) or fallback to headless browser
  - How topic_cluster-equivalent cross-version peers are exposed (or whether you synthesize them from filenames)
- Bundle metadata schema
  - What does version look like? Semver, calendar, named?
  - What does platform mean for this product? Is there a useful facet at all?
  - Other useful facets (language, product line, edition)?
- Filterable facets for search_docs
  - One filter per high-cardinality facet
  - Skip filters that have <5 distinct values; they're not worth the surface area
- Feedback endpoint (for submit_doc_bug, if you want it)
  - URL of the POST endpoint
  - Required + optional payload fields
  - Captcha / rate-limit behavior
  - Whether anonymous submissions are accepted
- Curated knowledge for the _api_lessons tool
  - What does the product's API documentation NOT say that you've learned from real integration work?
- Quickstart / example repos
  - Does the vendor publish working code? Ingest it; rewrite chunk-0 for natural-language retrieval.
Decisions worth carrying forward
Things you'll save time on by deciding the same way again:
- Tool descriptions are user interface. The LLM reads them verbatim and decides whether to call the tool. "Use when..." and "Call proactively whenever..." are real surfaces; treat them like button labels. Most retrieval improvements turn out to be tool-description rewrites in disguise.
- stateless_http=True on the FastMCP server. Eliminates whole categories of session-ID-related 404 storms after container recreates.
- Pre-bake everything at CI time. No runtime calls to git, external services, or anything you wouldn't trust on a Cloudflare outage. If the digest needs git history, write a JSONL file at CI time. If the lessons doc needs to load fast, bake it into the image.
- Env-gate every side-effecting tool. Off by default in dev; on only in production compose. Belt and suspenders against accidental writes from staging environments.
- Operator-confirmation pattern for side-effecting tools. The tool docstring is the only place to enforce human-in-the-loop. Make it loud. "MANDATORY", "Do not loop", "show-confirm-then-submit" — those phrasings work.
- Verify with hand-curated golden queries before shipping any retrieval change. Numbers in the diff, in the commit message. Don't ship retrieval changes on vibes.
- Two-cadence CI (weekly scrape vs on-demand code-only) saves hours per code iteration once you're past the one-iteration-a-week stage.
- Rolling tag + sha-pinned tag deploy pattern. :latest is what Watchtower watches; :<sha> is your safety net. Rollback is a one-line compose edit, not a redeploy.
- Usage logging is non-negotiable. You will be wrong about what people use. Capture the truth from day one; let it tell you which features to keep building and which to delete.
Glossary
- Bundle: one logical doc set in the portal. Zoomin calls them bundles; Madcap calls them projects; the concept is the same: a versioned, titled collection of pages. One dir under corpus/.
- Page: one HTML page in a bundle. One .md + one .json sidecar under the bundle dir.
- Topic cluster: Zoomin's name for "this page in version 10.9 corresponds to that page in version 10.8." Stored in the per-page sidecar. The portal-agnostic concept is "cross-version peer mapping."
- Chunk: a unit of text that gets independently embedded and stored in Chroma. Target ~400-600 tokens; preserve paragraph boundaries.
- RRF: Reciprocal Rank Fusion. The way to merge two ranked lists from independent retrievers without score calibration.
What's deliberately NOT in this template
Decisions you should make per-product (not copy from the original build):
- The reverse proxy and TLS termination layer. Could be Caddy, nginx, Traefik, Cloudflare Tunnel — pick what your infra uses.
- The Gateway / aggregator in front of multiple MCPs (MetaMCP is one option; you may not need any aggregator if you're running a single product MCP).
- The specific embedding model: nomic-embed-text is a strong default but newer / domain-specific models may be better for some products.
- The Ollama containers / GPU setup: depends on what hardware you have. The pattern is one container per GPU with explicit NVIDIA_VISIBLE_DEVICES pinning; the indexer load-balances across them.
- Whether to publish a blog series alongside the build. Strongly recommended (forces clarity, builds an audience), but optional.