justin dda044eb95 search: BM25-default + cross-encoder rerank, hybrid behind env gate
Phases 3, 6, 7, and 8 land in one pass because they depend on each other.

* docs_mcp/server.py
  - Wire search_docs / get_page / list_versions tool bodies.
  - search_docs flow: BM25 first (rag.bm25 FTS5) → over-fetch RERANK_POOL
    chunks → POST to RERANK_URL/v1/rerank → return top-k. Dense is the
    fallback when BM25 finds nothing. HYBRID_SEARCH=true switches to
    dense+BM25+RRF (fused via the new _rrf_fuse helper).
  - Every retrieval failure is caught and the search falls back to the
    next layer, so a dead reranker or missing BM25 db never blocks a search.
  - Source URLs built from the bundle's docId so results link straight
    into support.hpe.com.
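
The flow above can be sketched roughly as follows. This is an illustrative sketch, not the code in docs_mcp/server.py: the `search`, `rrf_fuse`, and `_safe` names, the callable-based retriever interface, the `k=60` constant, and reranking the fused pool in hybrid mode are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, List, Sequence

# Hypothetical interfaces: a retriever maps (query, limit) to ranked chunk IDs,
# a reranker maps (query, candidates, top_k) to the reordered top-k IDs.
Retriever = Callable[[str, int], List[str]]
Reranker = Callable[[str, Sequence[str], int], List[str]]


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def _safe(retriever: Retriever, query: str, limit: int) -> List[str]:
    """Run one retrieval layer; any failure is treated as 'no hits'."""
    try:
        return retriever(query, limit)
    except Exception:
        return []


def search(query: str, bm25: Retriever, dense: Retriever, rerank: Reranker,
           top_k: int = 5, pool: int = 30, hybrid: bool = False) -> List[str]:
    lexical = _safe(bm25, query, pool)
    if hybrid:
        # Hybrid mode: fuse dense and BM25 rankings, keep at most `pool` IDs.
        candidates = rrf_fuse([_safe(dense, query, pool), lexical])[:pool]
    else:
        # Default mode: dense only runs when BM25 returns nothing.
        candidates = lexical or _safe(dense, query, pool)
    if not candidates:
        return []
    try:
        return rerank(query, candidates, top_k)
    except Exception:
        # Dead reranker: keep the first-stage order instead of failing.
        return candidates[:top_k]


# Toy rankings (made up for the example).
print(rrf_fuse([["c3", "c1", "c7"], ["c1", "c9", "c3"]]))
```

Passing the retrievers in as plain callables keeps the fallback logic testable without a BM25 database, a vector store, or a running reranker.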

* eval/
  - 22 hand-curated golden queries grounded in real corpus page titles.
  - DenseRetriever / BM25Retriever / HybridRetriever / RerankedRetriever
    + MRR/Recall@K/nDCG@K harness. RERANK_URL env activates the
    reranked variants.
  - Committed eval/results/baseline.md. On this corpus:
        dense:                MRR 0.539
        bm25:                 MRR 0.880
        hybrid_rrf:           MRR 0.692
        bm25+rerank:          MRR 0.920  (winner)
        hybrid_rrf+rerank:    MRR 0.875
    HPE structured docs use controlled vocabulary, so lexical match
    dominates. Hybrid loses to plain BM25, with or without rerank, because
    weak dense candidates dilute the fused pool.
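
For readers unfamiliar with the metrics, here is a minimal sketch of how MRR and Recall@K are conventionally computed. The function names and the toy data are assumptions for illustration; they are not taken from eval/run_eval.py, and the numbers are unrelated to the baseline above.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    rr: List[float] = [reciprocal_rank(ranked, rel) for ranked, rel in runs]
    return sum(rr) / len(rr) if rr else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant set that appears in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Toy example: three queries with made-up page IDs.
runs = [
    (["p1", "p2"], {"p1"}),  # first hit at rank 1 -> 1.0
    (["p4", "p3"], {"p3"}),  # first hit at rank 2 -> 0.5
    (["p5"], {"p9"}),        # no hit            -> 0.0
]
print(mean_reciprocal_rank(runs))
```

Because MRR only rewards the first relevant hit, a single irrelevant chunk promoted to rank 1 halves that query's score, which is one way a fused pool can underperform a lexical-only ranking.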

* scripts/rerank_server.py
  - Minimal HTTP /v1/rerank over sentence-transformers
    cross-encoder/ms-marco-MiniLM-L-6-v2. Cohere-style request/response.
  - This is the dev/CPU fallback; production replaces it with the
    llama.cpp + jina-reranker-v2-base GGUF sidecar (same wire protocol).
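
A client for this endpoint might look like the sketch below. It assumes the field names of Cohere's public rerank API (`query`, `documents`, `top_n` in the request; `results` entries with `index` and `relevance_score` in the response); the exact subset the server accepts is not shown here, so treat the payload shape and the helper names as assumptions.

```python
import json
import os
import urllib.request
from typing import Any, Dict, List, Optional, Sequence


def build_payload(query: str, documents: Sequence[str], top_n: int) -> Dict[str, Any]:
    """Request body in the assumed Cohere-style shape."""
    return {"query": query, "documents": list(documents), "top_n": top_n}


def order_by_response(documents: Sequence[str], response: Dict[str, Any]) -> List[str]:
    """Map result indices back to documents, best relevance score first."""
    results = sorted(response["results"],
                     key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in results]


def rerank(query: str, documents: Sequence[str], top_n: int = 5,
           base_url: Optional[str] = None, timeout: float = 10.0) -> List[str]:
    """POST to <base_url>/v1/rerank; base_url defaults to the RERANK_URL env var."""
    base_url = base_url or os.environ["RERANK_URL"]
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/rerank",
        data=json.dumps(build_payload(query, documents, top_n)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return order_by_response(documents, json.load(resp))


# Offline illustration with a hand-written response (no server required).
docs = ["chunk a", "chunk b", "chunk c"]
fake_response = {"results": [{"index": 2, "relevance_score": 0.9},
                             {"index": 0, "relevance_score": 0.4}]}
print(order_by_response(docs, fake_response))
```

Keeping payload construction and response parsing separate from the HTTP call means the same parsing works against either backend, provided both honor the same wire protocol.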

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 13:06:51 -04:00

docs-mcp-template

A reusable template for building hosted MCP servers over a product's public documentation. Distilled from one production build; everything product-specific has been factored out.

The end product is a streamable-HTTP MCP server with ~15 tools that any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can call to answer questions against the docs, surface what changed recently, find inconsistencies, and (optionally) submit doc bugs back upstream.

What's here

  • PLAN.md — comprehensive build guide. Phased approach (13 phases, ~23 weeks of focused work for the full stack). Includes the design decisions, the gotchas, and a per-product customization checklist.
  • Scaffolded skeleton — working FastMCP server with stub tools, Dockerfile, docker-compose, CI workflows, eval harness layout, usage logging. Everything you need to git clone and start filling in the product-specific bits.

Quick start

git clone https://git.jpaul.io/justin/docs-mcp-template.git my-product-docs
cd my-product-docs
git remote remove origin  # detach from template
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Read PLAN.md before doing anything else. Pay particular attention to
# Phase 1 (scraper) — that's the most product-specific phase.

# Run the stub server (no corpus yet — just verifies the wiring):
python -m docs_mcp.server --transport stdio

Repo layout

.
├── PLAN.md                        # The build guide. Read first.
├── README.md
├── requirements.txt
├── Dockerfile
├── .gitignore
├── .gitea/workflows/
│   ├── refresh.yml                # Weekly scrape + index + image push
│   └── image-only.yml             # On-demand code-only ship
├── scrape/
│   ├── README.md                  # Product-specific scraper goes here
│   └── changelog.py               # Reusable: --json, --history-out
├── rag/
│   ├── embeddings.py              # Ollama embedder, swappable
│   ├── chunk.py                   # Chunker — adjust per page format
│   ├── index.py                   # Builds Chroma + (optionally) BM25
│   └── bm25.py                    # SQLite FTS5 lexical index
├── docs_mcp/
│   ├── server.py                  # FastMCP server with stub tools
│   └── usage.py                   # TimedCall + JSONL telemetry
├── eval/
│   ├── queries.jsonl.example      # Curate ~25 hand-labeled queries
│   ├── retrievers.py              # Retriever protocol + implementations
│   └── run_eval.py                # MRR / Recall@k / nDCG@k harness
├── scripts/
│   ├── usage_report.py            # Standalone log analyzer
│   └── registry_gc.py             # Container registry cleanup
└── deploy/
    └── docker-compose.yml         # Hosting stack: MCP + reranker + Watchtower

What's product-specific (must implement)

  • scrape/ — the scraper itself. The template gives you the corpus layout contract and a working changelog.py; the actual extraction logic is yours.
  • The corpus on disk (gitignored; rebuilt by CI).
  • The reranker GGUF model and llama.cpp container (commented in deploy/docker-compose.yml).
  • The reverse proxy / TLS layer in front of the public endpoint.
  • The hand-curated knowledge surface (your product's API gotchas, example scripts, anything the LLM should know that the docs don't say).

What's NOT product-specific (works as-is)

  • FastMCP server skeleton + tool decoration pattern
  • Chroma + Ollama embedding pipeline
  • BM25 / SQLite FTS5 lexical index
  • Hybrid retrieval (RRF) + reranker integration
  • Eval harness (Retriever protocol, MRR/Recall/nDCG)
  • Usage logging (TimedCall, JSONL, daily rotation)
  • CI workflow shape (weekly + on-demand, retry-on-race, three-tag image scheme)
  • Registry GC script
  • Standard tools: search_docs, get_page, list_versions, diff_versions, bundle_changelog, weekly_digest, find_doc_inconsistencies, submit_doc_bug, etc.
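
As one illustration of the reusable pieces, the usage-logging pattern (a timed call appended to a daily-rotated JSONL file) could look roughly like this. The class below is a hypothetical stand-in, not the `TimedCall` in docs_mcp/usage.py; its name, constructor arguments, record fields, and file naming are all assumptions.

```python
import json
import tempfile
import time
from datetime import datetime, timezone
from pathlib import Path


class TimedCallSketch:
    """Context manager that appends one JSON line per tool call.

    Hypothetical sketch: field names and the usage-YYYY-MM-DD.jsonl naming
    are illustrative choices, not the template's actual format.
    """

    def __init__(self, tool: str, log_dir, **fields):
        self.tool = tool
        self.log_dir = Path(log_dir)
        self.fields = fields

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        now = datetime.now(timezone.utc)
        record = {
            "ts": now.isoformat(),
            "tool": self.tool,
            "duration_ms": round((time.perf_counter() - self._start) * 1000, 3),
            "ok": exc_type is None,
            **self.fields,
        }
        self.log_dir.mkdir(parents=True, exist_ok=True)
        # One file per UTC day gives daily rotation without a background job.
        path = self.log_dir / f"usage-{now:%Y-%m-%d}.jsonl"
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        return False  # never swallow the tool's own exception


# Usage: wrap a tool body; the log directory here is a throwaway temp dir.
with tempfile.TemporaryDirectory() as log_dir:
    with TimedCallSketch("search_docs", log_dir, query="example query"):
        pass  # the tool's real work would run here
    print(sorted(p.name for p in Path(log_dir).iterdir()))
```

Writing the record in `__exit__` means failed calls are logged too, with `ok` set to false, which keeps error rates visible in the usage report.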

License

Internal template. Adjust before publishing.
