hvm-docs/CLAUDE.md
justin 33b0fd652e ci: derive image name + package linking from repo, add link step
Both workflows had a static IMAGE env (<owner>/<product>-docs-mcp)
and a static --package arg in the GC step. Switch both to Gitea
Actions context variables so a clone of the template into any repo
name works on the first CI run without find/replace:

  IMAGE: ${{ github.repository_owner }}/${{ github.event.repository.name }}
  --owner ${{ github.repository_owner }}
  --package ${{ github.event.repository.name }}

Also add the "Link container package to this repo" step that was
missing from the template (and which, naively copy-pasted from the
reference build, would have linked everything back to
docs-mcp-template). The new step derives owner + package + link-target
all from the running repo's context.

The github.* namespace is Gitea Actions' inherited GitHub-Actions
context — values come from the Gitea server, not github.com. Same
mechanism the reference build's $GITHUB_SHA tag-builder uses.

CLAUDE.md updated to note that image and package naming are
repo-derived; only registry endpoints and the Ollama URL need
per-clone editing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:34:26 -04:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Purpose

This is a template for building an MCP server over a product's public documentation. When you (Claude) are working in a clone of this repo, you are helping the user implement one specific product's docs MCP — not editing the template itself.

Read PLAN.md first. It's the canonical build guide and lays out 13 phases. Most user requests will be "implement Phase N" or "we hit a bug in Phase N." Identify the phase before doing anything else.

Working with this template

Identifying the current phase

When the user clones this template and starts working, figure out which phase they're on by inspecting:

Signal | Likely phase
corpus/ doesn't exist | Phase 1 (scraper); they need to build it before anything else works
corpus/ exists, chroma/ doesn't | Phase 2 (indexing)
Indexes exist, only search_docs / get_page / list_versions implemented | Phase 3 (server skeleton done; next: Dockerfile + CI)
Dockerfile or .gitea/workflows/ not yet updated | Phase 4-5
RERANK_URL env unset in compose | Phase 6 not done
HYBRID_SEARCH env unset, no rag/bm25.py content | Phase 8 not done
No eval/results/ directory | Phase 7 not done
find_doc_inconsistencies / submit_doc_bug are commented-out stubs in docs_mcp/server.py | Phase 12
No corpus/.digest/ produced by CI | Phase 13

When in doubt, ask the user: "Which phase from PLAN.md are we working on?"
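The filesystem signals in the table above can be checked mechanically. The following is a minimal sketch only: guess_phase is a hypothetical helper, not part of the template, and it covers just the signals that are visible as files or directories. Env-based and code-based signals (RERANK_URL, HYBRID_SEARCH, stubbed tools) still need a manual look.

```python
from pathlib import Path


def guess_phase(repo_root: str = ".") -> str:
    """Return a rough guess at the current PLAN.md phase from on-disk signals.

    Hypothetical helper: the checks mirror the signal table and run in
    phase order, so the first missing artifact wins.
    """
    root = Path(repo_root)
    if not (root / "corpus").is_dir():
        return "Phase 1 (scraper)"
    if not (root / "chroma").is_dir():
        return "Phase 2 (indexing)"
    if not (root / "Dockerfile").is_file():
        return "Phase 4-5 (Dockerfile + CI)"
    if not (root / "eval" / "results").is_dir():
        return "Phase 7 (eval harness)"
    if not (root / "corpus" / ".digest").is_dir():
        return "Phase 13 (digest)"
    return "unclear: ask the user which phase they are on"
```

Treat the result as a starting hypothesis to confirm with the user, not as a verdict.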

The scaffolded server has stubs

docs_mcp/server.py ships with three working tools (search_docs, get_page, list_versions) and signature-only stubs for the phase-specific tools. The stubs raise NotImplementedError with a phase hint in the docstring. When implementing a phase, you'll be filling these bodies in — DO NOT change the signatures unless the user has a specific reason. Signatures are the public contract between the MCP and its clients (Claude Desktop, Claude Code, Cursor, etc.).
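For orientation, a signature-only stub of the kind described above might look like the sketch below. The parameter list and return type are assumptions made for illustration; the real signatures live in docs_mcp/server.py and take precedence. In the server the function would also carry the @mcp.tool() decorator, which is omitted here so the snippet runs on its own.

```python
def find_doc_inconsistencies(topic: str) -> list[dict]:
    """Find pages that contradict each other on a topic.

    Phase 12: not implemented yet. See PLAN.md Phase 12.
    (Hypothetical signature, shown only to illustrate the stub shape.)
    """
    raise NotImplementedError("find_doc_inconsistencies: implement in Phase 12")
```

Implementing the phase means replacing the raise with a real body while leaving the def line untouched.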

Layout

.
├── PLAN.md                       # Read first. Phase-by-phase build guide.
├── README.md                     # Quick-start summary.
├── CLAUDE.md                     # This file.
├── requirements.txt
├── Dockerfile
├── .gitea/workflows/
│   ├── refresh.yml               # Weekly cron: scrape + index + image
│   └── image-only.yml            # On-demand: code-only ship cycle
├── scrape/                       # Phase 1 — product-specific scraper here
│   └── changelog.py              # Reusable: --json, --history-out
├── rag/                          # Phase 2/8 — indexing
│   ├── embeddings.py             # Ollama embedder (swappable)
│   ├── chunk.py                  # Page → chunks (adjust per page format)
│   ├── index.py                  # Builds Chroma + BM25
│   └── bm25.py                   # SQLite FTS5 lexical index
├── docs_mcp/                     # Phase 3+ — MCP server
│   ├── server.py                 # FastMCP + tool definitions
│   └── usage.py                  # TimedCall telemetry
├── eval/                         # Phase 7 — golden-query harness
│   ├── queries.jsonl.example
│   ├── retrievers.py
│   └── run_eval.py
├── scripts/                      # Standalone ops scripts
│   ├── usage_report.py
│   └── registry_gc.py
└── deploy/
    └── docker-compose.yml

Conventions

Tool docstrings are user interface

The text in @mcp.tool() docstrings is what the LLM sees and uses to decide whether to call the tool. Treat it like a button label. "Use when...", "Call proactively whenever..." phrasings work well. Don't bury the headline in implementation notes.

Side-effecting tools must be env-gated AND operator-confirmed

Any tool that POSTs to an external service (submit_doc_bug being the canonical example):

  1. Must check an env flag at call time and return a "disabled, manual fallback at " message if unset.
  2. Must have a loud docstring requiring per-call operator confirmation in the LLM conversation flow (the LLM drafts, shows the operator the exact payload, asks yes/no, only then calls).
  3. Must do upfront validation (URL allowlist, content length, etc.) so the LLM gets a clean error instead of a wire-level failure.

Match the submit_doc_bug patterns documented in PLAN.md Phase 12.
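The env gate and upfront validation could be sketched as follows. Every name here (DOC_BUG_SUBMIT_ENABLED, ALLOWED_HOSTS, MAX_BODY_CHARS, validate_doc_bug) is a hypothetical placeholder; PLAN.md Phase 12 defines the real ones. The sketch stops short of the network call and does not model the operator-confirmation step, which happens in the conversation, not in code.

```python
import os
from urllib.parse import urlparse

# Hypothetical names for illustration only.
ENABLE_FLAG = "DOC_BUG_SUBMIT_ENABLED"
ALLOWED_HOSTS = {"docs.example.com"}
MAX_BODY_CHARS = 4000


def validate_doc_bug(page_url: str, body: str) -> dict:
    """Gate and validate a doc-bug submission before any network call."""
    # 1. Env gate, read at call time rather than at import time.
    if os.environ.get(ENABLE_FLAG, "").lower() not in {"1", "true", "yes"}:
        return {"ok": False, "error": "disabled: submit manually via the docs portal"}
    # 2. Upfront validation so the LLM sees a clean error, not a wire failure.
    host = urlparse(page_url).hostname or ""
    if host not in ALLOWED_HOSTS:
        return {"ok": False, "error": f"host not allowed: {host!r}"}
    if not body.strip():
        return {"ok": False, "error": "body is empty"}
    if len(body) > MAX_BODY_CHARS:
        return {"ok": False, "error": f"body exceeds {MAX_BODY_CHARS} chars"}
    # 3. Only now would the real tool POST; this sketch returns the payload.
    return {"ok": True, "payload": {"url": page_url, "body": body}}
```

Returning a structured error instead of raising keeps the failure readable for the calling LLM.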

Defensive fallback for retrieval components

The reranker, BM25 index, and any external dependency must fail gracefully:

  • Catch the specific exception type
  • Log a warning with enough info to debug
  • Fall back to a working baseline (dense-only, no reranker, etc.)
  • Never block a search_docs call on a single failure

The user's MCP is in front of real people; partial degradation beats a 500.
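The four rules above fit in a small wrapper. This is a sketch under assumptions: RerankError and rerank_or_passthrough are hypothetical names, and the reranker is passed in as a callable so the example stays self-contained.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("docs_mcp.retrieval")


class RerankError(Exception):
    """Hypothetical error type raised by the rerank client."""


def rerank_or_passthrough(
    query: str,
    hits: Sequence[str],
    rerank: Callable[[str, Sequence[str]], list[str]],
) -> list[str]:
    """Return reranked hits, or the original order if the reranker fails."""
    try:
        return rerank(query, hits)
    except (RerankError, TimeoutError, ConnectionError) as exc:
        # Specific exception types only; unexpected bugs still surface.
        logger.warning("rerank failed for %d hits, keeping input order: %r", len(hits), exc)
        return list(hits)
```

Catching a narrow set of exceptions keeps genuine programming errors visible while an unreachable sidecar degrades to the baseline ranking.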

Verify retrieval changes with the eval harness

Any change that touches retrieval (new embedder, chunker tweak, reranker model, filter shape) ships with eval numbers in the commit message. Don't ship retrieval changes on vibes. If eval/queries.jsonl isn't populated yet, populate it before changing retrieval — it's the most important file in the repo.
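As an illustration of the kind of number a commit message could carry, here is a hit-rate-at-k sketch. The JSONL schema (query and expected_url fields) and the function name are assumptions for this example; the actual format is whatever eval/queries.jsonl.example and eval/run_eval.py define.

```python
import json
from typing import Callable, Iterable


def hit_rate_at_k(
    lines: Iterable[str],
    retrieve: Callable[[str], list[str]],
    k: int = 5,
) -> float:
    """Fraction of golden queries whose expected URL appears in the top k.

    Assumes one JSON object per line with hypothetical fields
    "query" and "expected_url".
    """
    total = hits = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        row = json.loads(line)
        total += 1
        if row["expected_url"] in retrieve(row["query"])[:k]:
            hits += 1
    return hits / total if total else 0.0
```

Running the same function before and after a retrieval change gives a comparable pair of numbers.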

Standard infrastructure choices

These are reasoned defaults — only deviate if you have a specific need:

  • Embedding model: nomic-embed-text via Ollama (768-dim, free, on-prem)
  • Reranker: jina-reranker-v2-base GGUF via llama.cpp /v1/rerank endpoint
  • Vector store: Chroma PersistentClient
  • Lexical store: SQLite FTS5 (stdlib)
  • Fusion: Reciprocal Rank Fusion with k=60
  • Transport: streamable-HTTP in prod, stdio for local dev
  • MCP framework: FastMCP with stateless_http=True
  • Container deploy: Watchtower auto-pull on :latest, rollback via :<sha12> pin
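For reference, Reciprocal Rank Fusion with k=60 scores each document as the sum of 1 / (k + rank) over every ranked list it appears in. The sketch below shows the technique in isolation; rrf_fuse is a hypothetical name, and the template's own fusion code may be organized differently.

```python
from collections import defaultdict
from typing import Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["a", "b", "c"]    # e.g. Chroma results
lexical = ["b", "d", "a"]  # e.g. BM25 results
print(rrf_fuse([dense, lexical]))  # ['b', 'a', 'd', 'c']
```

Because only ranks are used, the dense and lexical scores never need to be put on a common scale.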

Naming the product

The template uses the PRODUCT_NAME env var (default: "myproduct") throughout. Set it on first build. It shows up in:

  • docs_mcp/server.py: FastMCP(f"{PRODUCT_NAME}-docs", ...)
  • Collection name (<product>_docs)
  • BM25 db filename
  • Tool names that include the product name (e.g., the _api_lessons tool — convention is to name it <product>_api_lessons)

Use lowercase, underscores not hyphens, since it ends up in tool identifiers that the LLM reads.
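One way to enforce the naming rule early is to validate PRODUCT_NAME once and derive the dependent identifiers from it. This is a sketch: product_identifiers is a hypothetical helper, and the BM25 filename is left out because its pattern is not specified here.

```python
import os
import re
from typing import Optional

# Lowercase letters, digits, and underscores; must start with a letter.
_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")


def product_identifiers(product_name: Optional[str] = None) -> dict:
    """Validate the product name and derive the identifiers built from it."""
    name = product_name or os.environ.get("PRODUCT_NAME", "myproduct")
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"PRODUCT_NAME must be lowercase with underscores, got {name!r}")
    return {
        "server_name": f"{name}-docs",          # FastMCP server name
        "collection": f"{name}_docs",           # Chroma collection
        "lessons_tool": f"{name}_api_lessons",  # product-specific tool name
    }
```

Failing at startup on a name like "My-Product" is cheaper than discovering a malformed tool identifier from a client.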

Image name and package linking are repo-name-derived

You do NOT need to edit the IMAGE env or the --package arg in the workflows. Both derive from the repo at runtime via ${{ github.repository_owner }} and ${{ github.event.repository.name }}. So a clone into a repo named my-product-docs automatically pushes the container as <owner>/my-product-docs:latest and links the package to its own repo. (github.* is Gitea Actions' inherited GitHub-Actions namespace — the values come from the Gitea server, no github.com involvement.)

The only workflow placeholders you still have to replace per clone are the registry endpoints (REGISTRY_PUSH, REGISTRY_PULL) and the Ollama URL, because those depend on the deployment environment, not the repo identity.

Common commands

# Set up dev environment
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Run the MCP server locally for Claude Desktop dev
python -m docs_mcp.server --transport stdio

# Run as HTTP for integration testing
python -m docs_mcp.server --transport streamable-http --port 8000

# Rebuild Chroma + BM25 indexes from corpus
python -m rag.index --rebuild

# Rebuild only BM25 (fast iteration)
python -m rag.index --bm25-only

# Run the eval harness
python -m eval.run_eval --queries eval/queries.jsonl --output eval/results/baseline.md

# Generate changelog summary (called by CI, useful locally too)
python -m scrape.changelog --cached
python -m scrape.changelog --history-out corpus/.digest/history.jsonl --history-days 120

Gotchas (carried forward from the reference build)

  • Set fetch-depth: 0 on actions/checkout@v4 in both workflows. The default checkout is shallow (fetch-depth: 1), so history-walking steps (changelog, digest) silently produce empty output without it. This is the #1 thing people miss.
  • Reranker per-pair token limit: jina-reranker GGUF rejects the ENTIRE batch if any doc exceeds n_ctx_train=1024. Truncate docs to ~2000 chars before sending to rerank. Full chunk text still goes back to the user; truncation is reranking-only.
  • FastMCP stateless_http=True: critical for production hosting behind Watchtower auto-updates. Without it, every container recreate produces a 404 storm from clients with stale session IDs.
  • Runner shell is /bin/sh (dash): no ${VAR::N} substring expansion in workflow scripts. Use cut/awk/printf.
  • Cloudflare 100 MB body cap: if pushing through a Cloudflare-fronted registry, push via LAN endpoint, pull via public hostname. Same registry, different URLs.
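The reranker truncation gotcha above comes down to scoring shortened copies while returning the originals. A minimal sketch, assuming a score callable that stands in for the /v1/rerank request and returns one score per document; rerank_full_chunks is a hypothetical name.

```python
from typing import Callable, Sequence


def rerank_full_chunks(
    query: str,
    chunks: Sequence[str],
    score: Callable[[str, list[str]], list[float]],
    max_chars: int = 2000,
) -> list[str]:
    """Score truncated copies, then return the full chunks in score order."""
    # Truncation applies only to what the reranker sees.
    truncated = [chunk[:max_chars] for chunk in chunks]
    scores = score(query, truncated)
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    # Index back into the untouched originals so users get full text.
    return [chunks[i] for i in order]
```

The 2000-character default mirrors the rough budget in the note; it is a character proxy for the token limit, so a tokenizer-heavy corpus may need a smaller value.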

When the user says...

User says | You do
"Let's start building" / "set up the project" | Read PLAN.md Phase 0; create dirs, requirements.txt, etc. Confirm Python version and existing tooling.
"Build the scraper" / "scrape the docs" | Read PLAN.md Phase 1. Find the upstream portal's underlying API by sniffing; AVOID headless-browser solutions unless the API path is truly closed.
"Get retrieval working" / "make search work" | Read PLAN.md Phase 2-3. Implement chunking, embedder, Chroma indexer, then the three baseline tools.
"Add a reranker" | Read PLAN.md Phase 6. Stand up the llama.cpp sidecar, implement _rerank(). Verify with the eval harness.
"Search is missing X queries" | Run the eval harness first to confirm the failure. Then consider: rich chunk-0 rewrites, hybrid retrieval, curated knowledge layer. Don't just tune cosine.
"Let's add hybrid search" | Read PLAN.md Phase 8. Only after you've established the failure mode with eval queries; hybrid is not free.
"Make a tool that submits doc bugs" | Read PLAN.md Phase 12. Find the docs portal's feedback endpoint by sniffing. Build with operator confirmation as a hard requirement in the tool docstring.
"I want a 'what changed' tool" | Read PLAN.md Phase 13. Don't try to do this at runtime; pre-bake the history JSONL at CI time.

Out-of-scope concerns (don't try to solve here)

  • Reverse proxy / TLS termination — outside the repo. User picks Caddy / Cloudflare Tunnel / nginx / Traefik based on their infra.
  • MetaMCP or other gateway — outside the repo. Optional, only matters when running multiple MCPs.
  • GPU container orchestration — outside the repo. Pattern is one Ollama container per GPU; the indexer load-balances. Document it in deploy/ but don't build it in this template.
  • Email/blog delivery for weekly_digest — out of scope per PLAN.md ("Out of scope" section). Add a separate script in scripts/ if/when the user asks.