Files
hvm-docs/PLAN.md
justin 9ba615c8ee initial: docs-mcp-template — build guide + scaffolded server
Template for building hosted MCP servers over a product's public
documentation. Distilled from one production build; everything
product-specific has been factored out.

Contents:

- PLAN.md — comprehensive build guide. 13 phases from project
  skeleton through weekly_digest. Includes the gotchas
  ("fetch-depth: 0 always", reranker per-pair token limit,
  Cloudflare body cap, dash-not-bash on Gitea runners), the
  decisions worth carrying forward, and a per-product
  customization checklist.

- CLAUDE.md — guidance for Claude Code working in a clone of this
  template. Phase identification table, conventions (env-gating +
  operator confirmation for side-effecting tools, defensive
  fallback for retrieval components), common commands.

- README.md — quick-start summary.

Scaffolded code (all signature-stable, with NotImplementedError
stubs where phase-specific work is required):

  docs_mcp/server.py    FastMCP server, stateless_http=True, with
                        search_docs / get_page / list_versions
                        baseline tools and commented stubs for the
                        rest of the phase set.
  docs_mcp/usage.py     TimedCall telemetry, JSONL, daily rotation,
                        90-day retention. Reusable as-is.
  rag/embeddings.py     Ollama embedder (nomic-embed-text default),
                        load-balanced across N URLs. Reusable.
  rag/chunk.py          Paragraph-aware chunker with synthetic
                        chunk 0. Per-product tunable.
  rag/index.py          Chroma + BM25 builder. --rebuild and
                        --bm25-only flags.
  rag/bm25.py           SQLite FTS5 lexical index. Reusable.
  scrape/changelog.py   --cached / --ref / --json / --history-out.
                        Reusable.
  scrape/README.md      What you write per-product.
  eval/queries.jsonl.example
                        Curate ~25 hand-labeled queries here.
  eval/retrievers.py    Retriever protocol + stub classes.
  eval/run_eval.py      MRR / Recall@K / nDCG@K harness skeleton.
  scripts/usage_report.py
                        Standalone log analyzer; the
                        FOLLOW-UP CHECKS pattern noted in the
                        module docstring.
  scripts/registry_gc.py
                        Gitea container registry cleanup. Reusable.

Deployment + CI:

  Dockerfile               Python 3.12-slim; COPY corpus + chroma
                           + bm25 last for cache efficiency.
  deploy/docker-compose.yml MCP + reranker sidecar + Watchtower.
                           Templated with <placeholders>.
  .gitea/workflows/refresh.yml    Weekly cron + manual dispatch.
                                  fetch-depth: 0, retry-on-race,
                                  three-tag image scheme.
  .gitea/workflows/image-only.yml Code-only ship cycle, ~18min.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:18:17 -04:00

26 KiB
Raw Permalink Blame History

Docs MCP Server — Build Guide

A reusable recipe for building a hosted MCP server over a product's public documentation. Distilled from one production build; everything product-specific has been factored out.

The end product is a streamable-HTTP MCP server with ~15 tools that any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can call to answer questions against the docs, surface what changed recently, find inconsistencies, and (optionally) submit doc bugs back upstream.


What you're building

A pipeline with these stages:

upstream docs portal
        │
        ▼
   scrape  ──► corpus/<bundle>/<page>.md + .json sidecar
        │
        ▼
    chunk + embed  ──► chroma/  (dense vectors)
        │           ──► bm25/   (FTS5 lexical index)
        ▼
   MCP server  ──► search_docs / get_page / diff_versions / weekly_digest /
                   find_doc_inconsistencies / submit_doc_bug / ...
        │
        ▼
   reverse proxy / Cloudflare Tunnel ──► public endpoint

Two CI cadences:

  • Weekly cron (~40 min): full re-scrape, re-chunk, re-embed, image build & push.
  • On-demand image-only (~18 min): code-only rebuild from committed corpus, image build & push.

A container registry (self-hosted Gitea works well), a host running Docker Compose, Watchtower auto-updating from :latest, and a reverse proxy in front.


Build phases

Each phase is a discrete, shippable unit. Build them in order; each one is useful on its own and unlocks the next. Realistic effort per phase is given as a rough order of magnitude. Total: roughly 23 weeks of focused work for the full stack.

Phase 0 — Project skeleton (half a day)

Goals: directory layout, dependency manifest, virtualenv.

  • Top-level dirs: scrape/, corpus/ (gitignored), rag/, docs_mcp/, eval/, scripts/, deploy/, .gitea/workflows/.
  • requirements.txt with the dependencies you'll need across all phases (FastMCP, chromadb, httpx, beautifulsoup4 or whatever HTML parser, ollama or sentence-transformers client, etc.).
  • python -m venv venv and pin Python version (3.11 or 3.12 — be conservative; some embedding libraries have version-specific wheels).
  • .gitignore: venv/, corpus/ (regenerable), chroma/ (regenerable), bm25/ (regenerable), *.pyc, __pycache__/, .pytest_cache/.

Phase 1 — Scraper (24 days, product-specific)

This is the most product-dependent phase. The goal is to write a scraper that produces a normalized corpus layout regardless of upstream portal shape.

Output shape (mandatory):

corpus/
  <bundle_id>/             # one dir per "doc bundle" — see Glossary
    <page_id>.md           # markdown body
    <page_id>.json         # sidecar with structured metadata
  ...
bundles.json               # catalog of bundles with metadata

Bundle metadata (bundles.json is a list of these):

{
  "slug":          "<bundle_id>",
  "title":         "User-facing title",
  "version":       "10.9",
  "platform":      "VMware vSphere",   // may be null
  "product":       "Admin Guide",       // optional but useful
  "language":      "en-US",
  "page_count":    127,
  "dates": {
    "Added on":    "2024-01-15",
    "Updated on":  "2026-05-20"
  },
  "landing_page":  "<page_id>"
}

Per-page sidecar (<page_id>.json) carries page-level metadata. The one field that matters cross-cutting is topic_cluster (see Phase 9):

{
  "bundle_id":     "<bundle_id>",
  "page_id":       "<page_id>",
  "title":         "How to ...",
  "ordinal":       42,
  "topic_cluster": {
    "clustering_title": "How to ...",
    "clustered_topics": [
      {"bundle_id": "...10.8", "page_id": "How_to_X.htm", "clustering_title": "..."},
      {"bundle_id": "...10.9", "page_id": "How_to_X.htm", "clustering_title": "..."}
    ]
  }
}

If the portal exposes a cross-version "this page corresponds to that page" mapping, capture it here. If it doesn't, you can synthesize a filename-based fallback (same filename across bundle versions = same topic) and live without the editor-curated mapping. The features that read topic_cluster (list_cluster, diff_versions, find_doc_inconsistencies, parts of weekly_digest) will work either way; they're more accurate with real clusters.

Patterns that recur across doc portals:

  • Most modern doc portals are SPAs. Plain requests.get won't see rendered content. Either find the underlying API the SPA calls (the cheapest, most reliable path), or fall back to a headless browser (Playwright). The API path is almost always available; sniff the network tab.
  • Portals usually expose a "bundle/topic" hierarchy under the hood (Zoomin, Madcap Flare, Paligo, GitBook, Docusaurus all do). Map it to bundles.json + corpus/<bundle>/<page>.
  • Many portals expose ?save_local= or .pdf rendered versions; the HTML they serve is structurally cleaner than what the page shows through the SPA shell.

scrape/changelog.py (~250 LOC; see Phase 13) — provides summarize_diff(), render_human(), walk_history() and the --json / --history-out modes. Mostly reusable as-is; the only product-specific bit is the path layout assumption.

Phase 2 — Chunking + embeddings + Chroma (2 days)

Goal: build a queryable dense index from the scraped corpus.

  • rag/chunk.py — split each page's markdown into ~400-600 token chunks. Strategy that works: paragraph-aware splitter with a rich "chunk 0" containing the page title + 1-sentence summary + bag-of-words from key terms. Chunk 0 is what dense retrieval lands on first; getting it right dominates retrieval quality.
  • rag/embeddings.py — pluggable embedder. Recommended start: Ollama-hosted nomic-embed-text (768-dim, free, good baseline). Other defensible choices: text-embedding-3-small (OpenAI), bge-m3 (also via Ollama). The embedder is a Chroma EmbeddingFunction that returns list[list[float]] for a list of texts.
  • rag/index.py — orchestrates: read corpus → emit chunks (with metadata: bundle_id, page_id, version, platform, ordinal) → upsert into Chroma collection. --rebuild flag for a clean reindex. Run via python -m rag.index --rebuild.

Chroma settings: PersistentClient(path="chroma/") and Settings(anonymized_telemetry=False). Single collection (<product>_docs).

GPU note: embedding 70K chunks on CPU takes hours; on a GPU (via Ollama with NVIDIA_VISIBLE_DEVICES) takes ~10 minutes. Two GPUs in parallel: ~5 minutes. The orchestrator just needs to load- balance HTTP requests across multiple Ollama endpoints.

Phase 3 — MCP server skeleton (1 day)

Goal: working FastMCP server with three tools — search_docs, get_page, list_versions.

  • docs_mcp/server.pyFastMCP("<product>-docs", stateless_http=True). stateless_http=True is critical for production hosting: every request creates an ephemeral session, so container recreates don't produce a 404 storm from stale mcp-session-id headers on clients.
  • Lazy initialization for everything expensive (Chroma client, embedder, bundles catalog) so the server starts cleanly even when Ollama is briefly unreachable.
  • Tool: search_docs(query, version=None, platform=None, bundle_id=None, k=10). Returns markdown of top-k chunks with full source URLs.
  • Tool: get_page(bundle_id, page_id). Returns full page markdown + metadata.
  • Tool: list_versions(). Returns the version/platform facets available, drawn from bundles.json. Helps the LLM pick filter values.

Transports: stdio (for local Claude Desktop dev), streamable-HTTP (for hosted production). One argparse switch.

@mcp.tool()
def search_docs(
    query: Annotated[str, Field(description="Natural-language query about <product>.")],
    version: Annotated[str | None, Field(description="Restrict to one version")] = None,
    ...
) -> str:
    ...

The tool descriptions are first-class context — the LLM reads them and decides whether to call the tool. Treat them as button labels; use "Call when..." / "Use proactively whenever..." phrasings.

Phase 4 — Containerization (1 day)

Goal: image you can run anywhere.

  • Dockerfile: Python 3.12-slim base, install requirements, COPY scrape rag diff docs_mcp + bundles.json + corpus/ chroma/
    • (later) bm25/. Don't COPY scripts/ — those stay external for ops use only.
  • ENTRYPOINT ["python", "-m", "docs_mcp.server", "--transport", "streamable-http"]. Configurable host/port via env.
  • deploy/docker-compose.yml: one service, named volumes for usage logs and any state, Watchtower label, depends_on for the reranker sidecar (Phase 6).

Smoke-test locally: docker compose up should expose http://localhost:8000/mcp and respond to an MCP initialize JSON-RPC.

Phase 5 — CI on self-hosted Gitea Actions (12 days)

Goal: weekly cron rebuild + on-demand code-only ship cycle.

Two workflows, two cadences:

Workflow Trigger Steps Runtime
refresh.yml Monday cron + manual dispatch scrape → commit corpus → rebuild indexes → build & push image ~40 min
image-only.yml manual dispatch only rebuild indexes from committed corpus → build & push image ~18 min

Critical settings (learned the hard way):

  • fetch-depth: 0 on actions/checkout@v4. The default depth is 1 (shallow), which breaks any step that walks git history (changelog, digest history walker). Pay the ~10 second cost; never debug a "0-byte history file" mystery.
  • runs-on: docker (Gitea convention, not ubuntu-latest).
  • Runner shell is /bin/sh (dash), not bash. ${VAR::N} substring expansion doesn't exist; use cut / printf / awk.

Retry-on-race pattern for long-running scrapes:

attempt=1
while [ $attempt -le 3 ]; do
  if git push; then
    echo "pushed (attempt $attempt)"
    break
  fi
  [ $attempt -eq 3 ] && { echo "still failing"; exit 1; }
  git fetch origin main
  git rebase origin/main || { echo "conflict — bail"; exit 1; }
  attempt=$((attempt + 1))
done

Works because scrape commits only touch corpus/ + bundles.json, and code merges only touch .py / .yml — disjoint paths, trivially clean rebases.

Image tagging — three tags per build:

Tag Purpose
:latest Watchtower watches this for auto-deploy
:<sha12> Immutable; rollback target
:<YYYY.MM.DD> Human-readable in incident notes

Same tag set on every build; rollback is a one-line compose edit to pin :<sha> instead of :latest.

Container registry behind Cloudflare:

Cloudflare's free tier has a 100 MB request body limit. Big image layers (Chroma index can easily be 800+ MB) exceed it on push. The fix is a LAN registry endpoint for push, public hostname for pull:

env:
  REGISTRY_PUSH: <lan-ip>:<port>     # bypasses Cloudflare
  REGISTRY_PULL: <public-hostname>   # response bodies aren't capped

Runner needs the LAN endpoint in /etc/docker/daemon.json insecure-registries. Costs nothing operationally; saves hours of debugging.

Registry GC: weekly cron in the workflow that walks the package versions, keeps :latest + N most-recent date tags + anything pushed in the last 90 days, deletes the rest. Worth ~50 LOC; the package GC on the Gitea side reclaims disk after.

Phase 6 — Reranker (half a day)

Goal: lift retrieval quality 3× by cross-encoder reranking the top-N dense candidates.

  • A /v1/rerank HTTP endpoint backed by llama.cpp serving jina-reranker-v2-base (GGUF). Runs as a sidecar in compose. GPU strongly recommended (CPU latency is unworkable for live queries).
  • _rerank(query, docs) helper in the server: POST to the endpoint, apply the scores, re-sort the top-N candidates. Defensive: on any failure log a warning and fall through to dense-only.
  • Env: RERANK_URL (off by default), RERANK_POOL (how deep to pull candidates for reranking; 200 is a good default), RERANK_TIMEOUT (30s for cold-start tolerance).
  • Watch the per-pair token limit. Jina's GGUF reports n_ctx_train=1024 and llama.cpp will reject the ENTIRE batch if any pair exceeds it. Truncate doc text to ~2000 chars before reranking. The full untruncated chunk still goes back to the user; truncation is only for the reranker scoring path.

Phase 7 — Eval harness (1 day)

Goal: hand-curated golden queries + standard metrics so you can measure the impact of any retrieval change.

  • eval/queries.jsonl: 2025 hand-curated queries with expected pages. Spread across versions, platforms, and difficulty levels. Include the queries that "obviously" should work and DON'T — those are the ones to track.
  • eval/retrievers.py: a Retriever protocol with concrete implementations: DenseRetriever, RerankedRetriever, BM25Retriever (Phase 8), HybridRetriever (Phase 8). One matrix dimension per knob.
  • eval/run_eval.py: computes MRR / Recall@5 / nDCG@5 across all retrievers; emits a markdown comparison table at eval/results/<baseline>.md. Commit the result so PRs land with the A/B evidence in the diff.

Three numbers are enough — don't overengineer. The hand-curated queries are the value; the metrics are just a stable way to score them.

Phase 8 — BM25 + Hybrid retrieval (half a day, conditional)

Skip unless your eval shows specific failure modes. Dense embeddings + cross-encoder reranker handle most queries. The case where they don't: queries with rare technical tokens (filenames, language names, error codes) get buried at dense rank 1000+ by a much larger prose corpus that's semantically nearby. The reranker only sees top-200, so it never gets a shot.

  • rag/bm25.py: SQLite FTS5 index, in the stdlib, on-disk (bm25/<product>.db). Two tables — metadata table keyed by rowid, FTS5 virtual table for full-text. Sanitize the query (strip FTS5 reserved keywords, OR-join tokens for recall). ~210 LOC.
  • _rrf_fuse() in the server — Reciprocal Rank Fusion with k=60. Per-id score = sum_over_retrievers(1 / (k + rank)). Returns ordered ids plus per-retriever contribution dict for telemetry.
  • search_docs hybrid path: run dense + BM25 in parallel, RRF-fuse, hand the merged top-200 to the reranker. Env-gated: HYBRID_SEARCH=true.
  • Log top1_source per call (dense_only / bm25_only / both) to usage logs so you can measure whether BM25 is actually earning its keep on production traffic.

If after 46 weeks of production data you see bm25_only >= 80%, you can simplify to BM25-only (much less infrastructure). If both >= 50%, hybrid is acting as tie-breaker not rescue — keep it or simplify depending on how much you care about the long tail.

Phase 9 — Multi-version diff tooling (1 day, if applicable)

Only relevant if the product has multiple maintained versions.

  • diff_versions(bundle_id, page_id, against_bundle_id): unified diff between two versions of the same page. Two matching strategies: editor-curated topic_cluster peer (if the portal exposes it), or same-filename fallback.
  • list_cluster(bundle_id, page_id): list cross-version peers for one page.
  • bundle_changelog(bundle_id_new, bundle_id_old): added / removed / changed pages between two bundles, sorted by churn.
  • _diff_churn(a, b): small helper, ~15 LOC of difflib.unified_diff --unified=0 line counting. Used by bundle_changelog, find_doc_inconsistencies, and weekly_digest.

Phase 10 — Usage logging (half a day)

Goal: per-call JSONL telemetry so you can answer "what are people actually asking for" and "is the new feature getting used."

  • docs_mcp/usage.py: TimedCall context manager that captures tool name, args, elapsed time, hits returned, any extra fields set by the tool via _call.set(key=value). Writes JSONL to var/logs/usage.jsonl, rotated daily, kept 90 days.
  • Mount the log dir as a named compose volume so logs survive container recreates.
  • scripts/usage_report.py (standalone, no docs_mcp deps): reads the JSONL files, prints per-tool counts, top queries, 0-hit queries, filter usage histogram, reranker activity. Markdown output flag for piping into weekly digest emails.

What to log: query text, filters, hits returned, elapsed_ms, reranker_fired flag, hybrid top1_source, retrieval_mode. What NOT to log: anything PII-shaped. The corpus is public, queries are usually about the product, not personal — but be deliberate.

Phase 11 — Curated knowledge layer (2 days)

The "RAG can't tell you what isn't in the docs" gap. Surfaces:

  • API quickstart repos if the product has them. Ingest the example scripts (Python, PowerShell, curl) into the corpus. Rewrite chunk-0 for each script to embed naturally — explicit natural-language H1, task description sentence, keyword bag. Dense embeddings need an anchor.
  • A curated <product>_api_lessons markdown doc for things the swagger / OpenAPI doesn't say: auth flow gotchas, async-task patterns, schema bugs you've hit, platform-detection quirks. Surface as a dedicated MCP tool whose description tells the LLM: "Call proactively whenever the user asks you to write a script / integrate with the API / debug a 4xx response."
  • An auto-hint banner in search_docs results — when the query matches a script/API trigger word, render a one-line nudge at the top of results pointing at the dedicated tool. Belt-and- suspenders for queries where the LLM doesn't think to call it proactively.

Phase 12 — Doc-bug workflow tools (1 day, optional)

Two tools that pair up to enable a "check the docs for inconsistencies, draft bugs, confirm, submit" workflow.

  • find_doc_inconsistencies(scope_query, version=None, platform=None, max_pages=30, checks=None): deterministic, read-only. Two checks: cross-version drift (pages whose content shifted between immediate- previous versions in the actionable 1060% churn band) and redirect-chain detection (short pages whose body is just a "see [other page] for details" pointer). Heavy lifting is line-level diff (difflib) against editor-curated cluster peers; the model judges which findings are real bugs.

  • submit_doc_bug(page_url, content, email=None, rating=None, like=None): POSTs to the docs portal's feedback endpoint. Env-gated by DOC_BUG_SUBMIT_ENABLED=true so dev/staging deployments can't accidentally hit the upstream. The tool's docstring is loud about a mandatory operator-confirmation workflow per submission — LLM must draft, show, ask, then submit. Explicit "do not loop" instruction. Defensive validation upfront (URL host matches expected portal, content non-empty, etc.) so the LLM gets a clean error instead of a rejected POST.

You'll need to find the docs portal's feedback endpoint. Most portals route the "Was this helpful?" widget through a backend API; sniff the browser network tab on the live site. The payload shape varies; common fields: content/body, page url/href, optional email, optional rating, optional thumbs. Most accept anonymous POSTs with no captcha at the JSON-API layer (even if the widget shows a captcha). Validate before you ship — and if the endpoint has rate limits or captcha enforcement, the tool returns a clean "submission rejected — paste manually at " fallback.

The whole point is the per-bug operator confirmation in the LLM-side conversation flow; the tool description enforces it. Do not bypass.

Phase 13 — Weekly digest tool (half a day)

Goal: a tool that answers "what changed in the docs in the last N days?" with no runtime git dependency (the prod container has no git).

  • Extend scrape/changelog.py with --json (one-shot structured output) and --history-out PATH (walks git log --first-parent --since="<N> days ago" for corpus-touching commits, writes one JSON line per commit to a JSONL file).
  • CI workflows write the JSONL file into the image at build time: corpus/.digest/history.jsonl. Both refresh.yml and image-only.yml. fetch-depth: 0 is required — see Phase 5.
  • New MCP tool weekly_digest(days=7, version=None, platform=None, max_bundles=25, max_pages_per_bundle=10): reads the JSONL, filters to the window, applies version/platform via bundles.json metadata, aggregates per-bundle change counts and page lists, renders markdown.
  • Post-filter totals are critical: the headline "X page changes across Y bundles" must compute X from the filtered set, not the raw record count. Otherwise filtered calls look wrong to the reader.

Out of scope but trivial bolt-ons: scheduled HTML email of the digest, auto-publish to a blog, per-page diff excerpts as a follow-up tool.


Standard tool set

By the end you'll have ~15 tools registered. Production-tested shape:

Tool What it does
search_docs Semantic search with version/platform/bundle filters
get_page Full markdown + metadata for one page
list_versions Discover available facet values
list_cluster Cross-version peers for one page (if applicable)
diff_versions Unified diff of a page across two versions
bundle_changelog Added / removed / changed pages between two bundles
weekly_digest What changed in the last N days, with filters
corpus_status Freshness + size of the knowledge base
find_doc_inconsistencies Scoped scan for doc bugs
submit_doc_bug Submit a drafted bug (env-gated, operator-confirmed)
<product>_api_lessons Curated API gotchas, proactively-called
product-specific tools Interop matrix, lifecycle queries, etc.

Per-product customization checklist

When applying this template to a new product, here's what you have to figure out yourself — everything else is shared infrastructure:

  • Doc portal mechanics
    • URL pattern for pages
    • Bundle/version concept (Zoomin "bundle", Madcap "project", GitBook "space", Docusaurus "docs version" — same idea, different name)
    • SPA backing API (sniff the network tab) or fallback to headless browser
    • How topic_cluster -equivalent cross-version peers are exposed (or whether you synthesize them from filenames)
  • Bundle metadata schema
    • What does version look like? Semver, calendar, named?
    • What does platform mean for this product? Is there a useful facet at all?
    • Other useful facets (language, product line, edition)?
  • Filterable facets for search_docs
    • One filter per high-cardinality facet
    • Skip filters that have <5 distinct values — they're not worth the surface area
  • Feedback endpoint (for submit_doc_bug, if you want it)
    • URL of the POST endpoint
    • Required + optional payload fields
    • Captcha / rate-limit behavior
    • Whether anonymous submissions are accepted
  • Curated knowledge for the _api_lessons tool
    • What does the product's API documentation NOT say that you've learned from real integration work?
  • Quickstart / example repos
    • Does the vendor publish working code? Ingest it; rewrite chunk-0 for natural-language retrieval.

Decisions worth carrying forward

Things you'll save time on by deciding the same way again:

  • Tool descriptions are user interface. The LLM reads them verbatim and decides whether to call the tool. "Use when..." and "Call proactively whenever..." are real surfaces; treat them like button labels. Most retrieval improvements turn out to be tool-description rewrites in disguise.
  • stateless_http=True on the FastMCP server. Eliminates whole categories of session-ID-related 404 storms after container recreates.
  • Pre-bake everything at CI time. No runtime calls to git, external services, or anything you wouldn't trust on a Cloudflare outage. If the digest needs git history, write a JSONL file at CI time. If the lessons doc needs to load fast, bake it into the image.
  • Env-gate every side-effecting tool. Off by default in dev; on only in production compose. Belt and suspenders against accidental writes from staging environments.
  • Operator-confirmation pattern for side-effecting tools. The tool docstring is the only place to enforce human-in-the-loop. Make it loud. "MANDATORY", "Do not loop", "show-confirm-then-submit" — those phrasings work.
  • Verify with hand-curated golden queries before shipping any retrieval change. Numbers in the diff, in the commit message. Don't ship retrieval changes on vibes.
  • Two-cadence CI (weekly scrape vs on-demand code-only) saves hours per code iteration once you're past the one-iteration-a-week stage.
  • Rolling tag + sha-pinned tag deploy pattern. :latest is what Watchtower watches; :<sha> is your safety net. Rollback is a one-line compose edit, not a redeploy.
  • Usage logging is non-negotiable. You will be wrong about what people use. Capture the truth from day one; let it tell you which features to keep building and which to delete.

Glossary

  • Bundle — one logical doc set in the portal. Zoomin calls them bundles; Madcap calls them projects; the concept is the same: a versioned, titled collection of pages. One dir under corpus/.
  • Page — one HTML page in a bundle. One .md + one .json sidecar under the bundle dir.
  • Topic cluster — Zoomin's name for "this page in version 10.9 corresponds to that page in version 10.8." Stored in the per-page sidecar. The portal-agnostic concept is "cross-version peer mapping."
  • Chunk — a unit of text that gets independently embedded and stored in Chroma. Target ~400-600 tokens; preserve paragraph boundaries.
  • RRF — Reciprocal Rank Fusion. The way to merge two ranked lists from independent retrievers without score calibration.

What's deliberately NOT in this template

Decisions you should make per-product (not copy from the original build):

  • The reverse proxy and TLS termination layer. Could be Caddy, nginx, Traefik, Cloudflare Tunnel — pick what your infra uses.
  • The Gateway / aggregator in front of multiple MCPs (MetaMCP is one option; you may not need any aggregator if you're running a single product MCP).
  • The specific embedding model — nomic-embed-text is a strong default but newer / domain-specific models may be better for some products.
  • The Ollama containers / GPU setup — depends on what hardware you have. The pattern is one container per GPU with explicit NVIDIA_VISIBLE_DEVICES pinning; the indexer load-balances across them.
  • Whether to publish a blog series alongside the build. Strongly recommended (forces clarity, builds an audience), but optional.