justin 9ba615c8ee initial: docs-mcp-template — build guide + scaffolded server
Template for building hosted MCP servers over a product's public
documentation. Distilled from one production build; everything
product-specific has been factored out.

Contents:

- PLAN.md — comprehensive build guide. 13 phases from project
  skeleton through weekly_digest. Includes the gotchas
  ("fetch-depth: 0 always", reranker per-pair token limit,
  Cloudflare body cap, dash-not-bash on Gitea runners), the
  decisions worth carrying forward, and a per-product
  customization checklist.

- CLAUDE.md — guidance for Claude Code working in a clone of this
  template. Phase identification table, conventions (env-gating +
  operator confirmation for side-effecting tools, defensive
  fallback for retrieval components), common commands.

- README.md — quick-start summary.

Scaffolded code (all signature-stable, with NotImplementedError
stubs where phase-specific work is required):

  docs_mcp/server.py    FastMCP server, stateless_http=True, with
                        search_docs / get_page / list_versions
                        baseline tools and commented stubs for the
                        rest of the phase set.
  docs_mcp/usage.py     TimedCall telemetry, JSONL, daily rotation,
                        90-day retention. Reusable as-is.
  rag/embeddings.py     Ollama embedder (nomic-embed-text default),
                        load-balanced across N URLs. Reusable.
  rag/chunk.py          Paragraph-aware chunker with synthetic
                        chunk 0. Per-product tunable.
  rag/index.py          Chroma + BM25 builder. --rebuild and
                        --bm25-only flags.
  rag/bm25.py           SQLite FTS5 lexical index. Reusable.
  scrape/changelog.py   --cached / --ref / --json / --history-out.
                        Reusable.
  scrape/README.md      What you write per-product.
  eval/queries.jsonl.example
                        Curate ~25 hand-labeled queries here.
  eval/retrievers.py    Retriever protocol + stub classes.
  eval/run_eval.py      MRR / Recall@K / nDCG@K harness skeleton.
  scripts/usage_report.py
                        Standalone log analyzer; the
                        FOLLOW-UP CHECKS pattern noted in the
                        module docstring.
  scripts/registry_gc.py
                        Gitea container registry cleanup. Reusable.

Deployment + CI:

  Dockerfile               Python 3.12-slim; COPY corpus + chroma
                           + bm25 last for cache efficiency.
  deploy/docker-compose.yml MCP + reranker sidecar + Watchtower.
                           Templated with <placeholders>.
  .gitea/workflows/refresh.yml    Weekly cron + manual dispatch.
                                  fetch-depth: 0, retry-on-race,
                                  three-tag image scheme.
  .gitea/workflows/image-only.yml Code-only ship cycle, ~18min.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:18:17 -04:00

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when
working with code in this repository.
## Purpose
This is a **template** for building an MCP server over a product's
public documentation. When you (Claude) are working in a clone of this
repo, you are helping the user implement one specific product's docs
MCP — not editing the template itself.
**Read `PLAN.md` first.** It's the canonical build guide and lays out
13 phases. Most user requests will be "implement Phase N" or "we hit
a bug in Phase N." Identify the phase before doing anything else.
## Working with this template
### Identifying the current phase
When the user clones this template and starts working, figure out
which phase they're on by inspecting:
| Signal | Likely phase |
|---|---|
| `corpus/` doesn't exist | Phase 1 (scraper) — they need to build it before anything else works |
| `corpus/` exists, `chroma/` doesn't | Phase 2 (indexing) |
| Indexes exist, only `search_docs` / `get_page` / `list_versions` implemented | Phase 3 (server skeleton done; next: Dockerfile + CI) |
| `Dockerfile` or `.gitea/workflows/` not yet updated | Phase 4-5 |
| `RERANK_URL` env unset in compose | Phase 6 not done |
| No `eval/results/` directory | Phase 7 not done |
| `HYBRID_SEARCH` env unset, no `rag/bm25.py` content | Phase 8 not done |
| `find_doc_inconsistencies` / `submit_doc_bug` are commented-out stubs in `docs_mcp/server.py` | Phase 12 |
| No `corpus/.digest/` produced by CI | Phase 13 |
When in doubt, ask the user: *"Which phase from PLAN.md are we
working on?"*
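
The filesystem signals in the table can be checked mechanically. The following is a minimal sketch, assuming the directory names from the table; `guess_phase` and its return strings are hypothetical and not part of the template. It covers only the directory-existence rows, so the env-var and stub rows still need a manual look.

```python
from pathlib import Path


def guess_phase(root: str) -> str:
    """Return a rough phase hint from which artifact directories exist.

    Hypothetical helper: only the directory-based signals are checked.
    """
    repo = Path(root)
    if not (repo / "corpus").is_dir():
        return "Phase 1 (scraper)"
    if not (repo / "chroma").is_dir():
        return "Phase 2 (indexing)"
    if not (repo / "eval" / "results").is_dir():
        return "Phase 7 not done"
    if not (repo / "corpus" / ".digest").is_dir():
        return "Phase 13"
    # Nothing conclusive on disk: fall back to asking the user.
    return "unclear; ask the user"
```

A result of `"unclear; ask the user"` is expected for any repo that has passed all four checks; it is not an error.
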
### The scaffolded server has stubs
`docs_mcp/server.py` ships with three working tools (`search_docs`,
`get_page`, `list_versions`) and signature-only stubs for the
phase-specific tools. The stubs `raise NotImplementedError` with a
phase hint in the docstring. When implementing a phase, you'll be
filling these bodies in — DO NOT change the signatures unless the
user has a specific reason. Signatures are the public contract
between the MCP and its clients (Claude Desktop, Claude Code,
Cursor, etc.).
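
As an illustration of the stub pattern, here is a sketch with a made-up tool. The name `example_tool`, its parameters, and the phase label are assumptions for the example only; the real stubs and their signatures live in `docs_mcp/server.py`. It shows one way to check that an implementation kept the stub's signature.

```python
import inspect


def example_tool_stub(query: str, limit: int = 5) -> list[str]:
    """Use when the user asks for example results.

    Phase N stub (hypothetical): see PLAN.md before implementing.
    """
    raise NotImplementedError("Phase N: see PLAN.md")


def example_tool_impl(query: str, limit: int = 5) -> list[str]:
    """Same docstring headline; only the body was filled in."""
    return [f"result for {query}"][:limit]


def same_signature(a, b) -> bool:
    """True when parameters, defaults, and return annotation all match."""
    return inspect.signature(a) == inspect.signature(b)
```

Calling `same_signature(example_tool_stub, example_tool_impl)` before and after a phase is a cheap guard against accidentally changing the client-facing contract.
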
## Layout
```
.
├── PLAN.md                 # Read first. Phase-by-phase build guide.
├── README.md               # Quick-start summary.
├── CLAUDE.md               # This file.
├── requirements.txt
├── Dockerfile
├── .gitea/workflows/
│   ├── refresh.yml         # Weekly cron: scrape + index + image
│   └── image-only.yml      # On-demand: code-only ship cycle
├── scrape/                 # Phase 1: product-specific scraper here
│   └── changelog.py        # Reusable: --json, --history-out
├── rag/                    # Phase 2/8: indexing
│   ├── embeddings.py       # Ollama embedder (swappable)
│   ├── chunk.py            # Page → chunks (adjust per page format)
│   ├── index.py            # Builds Chroma + BM25
│   └── bm25.py             # SQLite FTS5 lexical index
├── docs_mcp/               # Phase 3+: MCP server
│   ├── server.py           # FastMCP + tool definitions
│   └── usage.py            # TimedCall telemetry
├── eval/                   # Phase 7: golden-query harness
│   ├── queries.jsonl.example
│   ├── retrievers.py
│   └── run_eval.py
├── scripts/                # Standalone ops scripts
│   ├── usage_report.py
│   └── registry_gc.py
└── deploy/
    └── docker-compose.yml
```
## Conventions
### Tool docstrings are user interface
The text in `@mcp.tool()` docstrings is what the LLM sees and uses to
decide whether to call the tool. Treat it like a button label.
*"Use when..."*, *"Call proactively whenever..."* phrasings work
well. Don't bury the headline in implementation notes.
### Side-effecting tools must be env-gated AND operator-confirmed
Any tool that POSTs to an external service (`submit_doc_bug` being the
canonical example):
1. Must check an env flag at call time and return a "disabled,
manual fallback at <URL>" message if unset.
2. Must have a loud docstring requiring per-call operator
confirmation in the LLM conversation flow (the LLM drafts, shows
the operator the exact payload, asks yes/no, only then calls).
3. Must do upfront validation (URL allowlist, content length, etc.)
so the LLM gets a clean error instead of a wire-level failure.
Match the `submit_doc_bug` patterns documented in PLAN.md Phase 12.
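
The three requirements above might look roughly like this. This is a sketch under stated assumptions, not the Phase 12 implementation: the env flag `DOC_BUG_SUBMIT_ENABLED`, the allowlisted host, the length cap, and the fallback URL are all placeholders, and the actual POST is left out.

```python
import os
from urllib.parse import urlparse

# All four values below are hypothetical placeholders.
ENABLE_FLAG = "DOC_BUG_SUBMIT_ENABLED"
ALLOWED_HOSTS = {"docs.example.com"}
MAX_DESCRIPTION_CHARS = 2000
FALLBACK_URL = "https://docs.example.com/feedback"


def submit_doc_bug_sketch(page_url: str, description: str) -> str:
    """Report a documentation bug to the docs team.

    REQUIRES OPERATOR CONFIRMATION ON EVERY CALL: draft the report, show
    the operator the exact payload, ask yes/no, and call only after an
    explicit yes.
    """
    # 1. Env gate, checked at call time rather than at import time.
    if os.environ.get(ENABLE_FLAG) != "1":
        return f"disabled, manual fallback at {FALLBACK_URL}"
    # 2. Upfront validation so the caller gets a readable error.
    host = urlparse(page_url).hostname
    if host not in ALLOWED_HOSTS:
        return f"error: {host!r} is not an allowed docs host"
    if not description.strip():
        return "error: description is empty"
    if len(description) > MAX_DESCRIPTION_CHARS:
        return f"error: description exceeds {MAX_DESCRIPTION_CHARS} chars"
    # 3. The real POST would go here; omitted in this sketch.
    return "validated; ready to submit"
```

Returning error strings instead of raising keeps the failure visible to the LLM as ordinary tool output, which is one way to satisfy the "clean error" requirement.
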
### Defensive fallback for retrieval components
The reranker, BM25 index, and any external dependency must fail
gracefully:
- Catch the specific exception type
- Log a warning with enough info to debug
- Fall back to a working baseline (dense-only, no reranker, etc.)
- Never block a `search_docs` call on a single failure
The user's MCP is in front of real people; partial degradation
beats a 500.
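
A minimal sketch of the fallback shape described above. The function name, the logger name, and the choice of `ConnectionError`/`TimeoutError` are assumptions; a real HTTP client raises its own exception types, which should be caught instead.

```python
import logging
from typing import Callable, Sequence

log = logging.getLogger("docs_mcp.search")  # hypothetical logger name


def rerank_or_fallback(
    query: str,
    candidates: Sequence[str],
    rerank: Callable[[str, Sequence[str]], list],
) -> list:
    """Try the reranker; on a known failure, keep the dense-only order."""
    try:
        return rerank(query, candidates)
    except (ConnectionError, TimeoutError) as exc:
        # Specific exceptions only: programming errors should still surface.
        log.warning(
            "reranker unavailable (%s: %s); falling back to dense order, query=%r",
            type(exc).__name__, exc, query,
        )
        return list(candidates)
```

The caller passes the reranker in as a function, so the same wrapper can guard a BM25 lookup or any other optional stage.
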
### Verify retrieval changes with the eval harness
Any change that touches retrieval (new embedder, chunker tweak,
reranker model, filter shape) ships with eval numbers in the commit
message. Don't ship retrieval changes on vibes. If `eval/queries.jsonl`
isn't populated yet, populate it before changing retrieval — it's
the most important file in the repo.
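
For orientation, two of the metrics the harness reports can be computed as below. This is a generic sketch of MRR and Recall@K, not the code in `eval/run_eval.py`; the function names and the use of plain string IDs are assumptions.

```python
def reciprocal_rank(ranked: list, relevant: set) -> float:
    """1/rank of the first relevant hit, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of the relevant set that appears in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs: list) -> float:
    """Average reciprocal rank over (ranked, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```

With ~25 hand-labeled queries, a single query moving from rank 1 to rank 2 shifts MRR by 0.02, so small deltas are worth reading per query rather than only in aggregate.
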
### Standard infrastructure choices
These are reasoned defaults — only deviate if you have a specific
need:
- **Embedding model**: `nomic-embed-text` via Ollama (768-dim,
free, on-prem)
- **Reranker**: `jina-reranker-v2-base` GGUF via llama.cpp
`/v1/rerank` endpoint
- **Vector store**: Chroma `PersistentClient`
- **Lexical store**: SQLite FTS5 (stdlib)
- **Fusion**: Reciprocal Rank Fusion with k=60
- **Transport**: streamable-HTTP in prod, stdio for local dev
- **MCP framework**: FastMCP with `stateless_http=True`
- **Container deploy**: Watchtower auto-pull on `:latest`, rollback
via `:<sha12>` pin
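
Reciprocal Rank Fusion, listed above with k=60, scores each document as the sum of `1 / (k + rank)` over the rankings it appears in. A small sketch, assuming each retriever returns an ordered list of chunk IDs; `rrf_fuse` is a hypothetical name, not a function in the template.

```python
from collections import defaultdict


def rrf_fuse(rankings: list, k: int = 60) -> list:
    """Fuse several ranked ID lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["a", "b", "c"]      # e.g. Chroma order
lexical = ["b", "c", "a"]    # e.g. BM25 order
fused = rrf_fuse([dense, lexical])  # ["b", "a", "c"]
```

Because only ranks enter the formula, the dense and lexical scores never need to be normalized against each other.
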
### Naming the product
The template uses `PRODUCT_NAME` env var (defaults to `"myproduct"`)
throughout. Set it on first build. References show up in:
- `docs_mcp/server.py`: `FastMCP(f"{PRODUCT_NAME}-docs", ...)`
- Collection name (`<product>_docs`)
- BM25 db filename
- Tool names that include the product name (e.g., the `_api_lessons`
tool — convention is to name it `<product>_api_lessons`)
Use lowercase, underscores not hyphens, since it ends up in tool
identifiers that the LLM reads.
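
One way to enforce the naming rule and derive the dependent names in a single place is sketched below. `derived_names` and the dictionary keys are hypothetical; the BM25 filename is left out because its format is not stated here.

```python
import os
import re
from typing import Optional


def derived_names(product: Optional[str] = None) -> dict:
    """Validate the product name and derive the identifiers built from it."""
    name = product or os.environ.get("PRODUCT_NAME", "myproduct")
    # Lowercase letters, digits, and underscores only; no hyphens.
    if not re.fullmatch(r"[a-z][a-z0-9_]*", name):
        raise ValueError(f"PRODUCT_NAME {name!r} must be lowercase with underscores")
    return {
        "server_name": f"{name}-docs",           # FastMCP server name
        "collection": f"{name}_docs",            # Chroma collection
        "lessons_tool": f"{name}_api_lessons",   # product-specific tool name
    }
```

Failing fast on a bad name at startup is cheaper than discovering a mismatched collection name after a full index rebuild.
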
## Common commands
```bash
# Set up dev environment
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Run the MCP server locally for Claude Desktop dev
python -m docs_mcp.server --transport stdio
# Run as HTTP for integration testing
python -m docs_mcp.server --transport streamable-http --port 8000
# Rebuild Chroma + BM25 indexes from corpus
python -m rag.index --rebuild
# Rebuild only BM25 (fast iteration)
python -m rag.index --bm25-only
# Run the eval harness
python -m eval.run_eval --queries eval/queries.jsonl --output eval/results/baseline.md
# Generate changelog summary (called by CI, useful locally too)
python -m scrape.changelog --cached
python -m scrape.changelog --history-out corpus/.digest/history.jsonl --history-days 120
```
## Gotchas (carried forward from the reference build)
- **`fetch-depth: 0` on `actions/checkout@v4`** in both workflows.
Default is shallow; history-walking steps (changelog, digest)
silently produce empty output otherwise. This is the #1 thing
people miss.
- **Reranker per-pair token limit**: the jina-reranker GGUF rejects the
ENTIRE batch if any doc exceeds the `n_ctx_train=1024` token limit.
Truncate docs to ~2000 chars before sending them to rerank. The full
chunk text still goes back to the user; truncation is reranking-only.
- **FastMCP `stateless_http=True`**: critical for production
hosting behind Watchtower auto-updates. Without it, every
container recreate produces a 404 storm from clients with
stale session IDs.
- **Runner shell is `/bin/sh` (dash)**: no `${VAR::N}` substring
expansion in workflow scripts. Use `cut`/`awk`/`printf`.
- **Cloudflare 100 MB body cap**: if pushing through a Cloudflare-
fronted registry, push via LAN endpoint, pull via public
hostname. Same registry, different URLs.
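
The reranker gotcha above amounts to scoring truncated copies while returning the originals. A sketch, assuming a `score_fn(query, docs)` callable that returns one score per doc; the function name and the 2000-char constant's name are placeholders.

```python
RERANK_MAX_CHARS = 2000  # rough character budget, per the gotcha above


def rerank_with_truncation(query: str, chunks: list, score_fn) -> list:
    """Score truncated copies, but return the original full-text chunks."""
    truncated = [chunk[:RERANK_MAX_CHARS] for chunk in chunks]
    scores = score_fn(query, truncated)
    # Sort indices by score so the untruncated chunks can be returned.
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in order]
```

Character truncation is only a proxy for the token limit, so token-dense text (code, non-Latin scripts) may need a smaller budget.
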
## When the user says...
| User says | You do |
|---|---|
| "Let's start building" / "set up the project" | Read PLAN.md Phase 0; create dirs, requirements.txt, etc. Confirm Python version and existing tooling. |
| "Build the scraper" / "scrape the docs" | Read PLAN.md Phase 1. Find the upstream portal's underlying API by inspecting its network requests; AVOID headless-browser solutions unless the API path is truly closed. |
| "Get retrieval working" / "make search work" | Read PLAN.md Phase 2-3. Implement chunking, embedder, Chroma indexer, then the three baseline tools. |
| "Add a reranker" | Read PLAN.md Phase 6. Stand up the llama.cpp sidecar, implement `_rerank()`. Verify with the eval harness. |
| "Search is missing X queries" | Run the eval harness first to confirm the failure. Then consider: rich chunk-0 rewrites, hybrid retrieval, curated knowledge layer. Don't just tune cosine. |
| "Let's add hybrid search" | Read PLAN.md Phase 8. Only after you've established the failure mode with eval queries — hybrid is not free. |
| "Make a tool that submits doc bugs" | Read PLAN.md Phase 12. Find the docs portal's feedback endpoint by inspecting its network requests. Build with operator confirmation as a hard requirement in the tool docstring. |
| "I want a 'what changed' tool" | Read PLAN.md Phase 13. Don't try to do this at runtime — pre-bake the history JSONL at CI time. |
## Out-of-scope concerns (don't try to solve here)
- **Reverse proxy / TLS termination** — outside the repo. User
picks Caddy / Cloudflare Tunnel / nginx / Traefik based on their
infra.
- **MetaMCP or other gateway** — outside the repo. Optional, only
matters when running multiple MCPs.
- **GPU container orchestration** — outside the repo. Pattern is
one Ollama container per GPU; the indexer load-balances. Document
it in `deploy/` but don't build it in this template.
- **Email/blog delivery for weekly_digest** — out of scope per
PLAN.md ("Out of scope" section). Add a separate script in
`scripts/` if/when the user asks.