8766d73327f0c1492e02fc710db1722b96cb3485
Image pushed to git.jpaul.io/justin/crop-chem-docs with three tags:
:latest — Watchtower auto-pull target
:a97107de4636 — commit-sha rollback pin
:corpus-2026.05.24 — corpus-snapshot pin (prod-recommended)
Drawbar compose snippet at deploy/drawbar-compose-snippet.md.
Wires the container against the existing infra:
- Ollama pool: 192.168.0.2:11434, 192.168.0.2:11435,
192.168.0.125:11434, 10.10.1.65:11434
- Reranker: http://10.10.1.65:8082
- HYBRID_SEARCH=true (production retrieval — BM25 + dense + rerank)
- Exposes streamable-HTTP MCP on port 8000
Pull path uses git.jpaul.io (public hostname, CF-fronted; pull
response bodies aren't capped). Push path uses 192.168.0.2:1234
(LAN endpoint, bypasses CF 100MB body cap). Same registry,
different URLs — per the template gotcha doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
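As a rough illustration of the tagging scheme and the push/pull split described in the commit message above, the sketch below builds the three image references for either registry URL. The helper names (`image_tags`, `image_refs`) are hypothetical, and treating the sha tag as the first 12 characters of the commit hash is an assumption inferred from the length of the tag shown, not something the message states.

```python
from datetime import date

# Hosts and repository path are taken from the commit message above.
PULL_HOST = "git.jpaul.io"       # public hostname, CF-fronted
PUSH_HOST = "192.168.0.2:1234"   # LAN endpoint, avoids the CF body cap
REPO = "justin/crop-chem-docs"


def image_tags(commit_sha: str, corpus_date: date) -> list[str]:
    """Return the three tags: latest, short commit sha, corpus snapshot."""
    return [
        "latest",
        commit_sha[:12],  # assumption: 12-character short sha
        f"corpus-{corpus_date:%Y.%m.%d}",
    ]


def image_refs(host: str, commit_sha: str, corpus_date: date) -> list[str]:
    """Build fully qualified image references for one registry URL."""
    return [f"{host}/{REPO}:{tag}" for tag in image_tags(commit_sha, corpus_date)]
```

Calling `image_refs(PUSH_HOST, ...)` would give the references to push over the LAN, while `image_refs(PULL_HOST, ...)` gives the references a host would pull through the public hostname. Both point at the same registry.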
docs-mcp-template
A reusable template for building hosted MCP servers over a product's public documentation. Distilled from one production build; everything product-specific has been factored out.
The end product is a streamable-HTTP MCP server with ~15 tools that any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can call to answer questions against the docs, surface what changed recently, and flag likely inconsistencies.
What's here
- PLAN.md: comprehensive build guide. Phased approach (13 phases, ~2–3 weeks of focused work for the full stack). Includes the design decisions, the gotchas, and a per-product customization checklist.
- Scaffolded skeleton: working FastMCP server with stub tools, Dockerfile, docker-compose, CI workflows, eval harness layout, usage logging. Everything you need to git clone and start filling in the product-specific bits.
Quick start
git clone https://git.jpaul.io/justin/docs-mcp-template.git my-product-docs
cd my-product-docs
git remote remove origin # detach from template
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Read PLAN.md before doing anything else. Pay particular attention to
# Phase 1 (scraper) — that's the most product-specific phase.
# Run the stub server (no corpus yet — just verifies the wiring):
python -m docs_mcp.server --transport stdio
Repo layout
.
├── PLAN.md # The build guide. Read first.
├── README.md
├── requirements.txt
├── Dockerfile
├── .gitignore
├── .gitea/workflows/
│ ├── refresh.yml # Weekly scrape + index + image push
│ └── image-only.yml # On-demand code-only ship
├── scrape/
│ ├── README.md # Product-specific scraper goes here
│ └── changelog.py # Reusable: --json, --history-out
├── rag/
│ ├── embeddings.py # Ollama embedder, swappable
│ ├── chunk.py # Chunker — adjust per page format
│ ├── index.py # Builds Chroma + (optionally) BM25
│ └── bm25.py # SQLite FTS5 lexical index
├── docs_mcp/
│ ├── server.py # FastMCP server with stub tools
│ └── usage.py # TimedCall + JSONL telemetry
├── eval/
│ ├── queries.jsonl.example # Curate ~25 hand-labeled queries
│ ├── retrievers.py # Retriever protocol + implementations
│ └── run_eval.py # MRR / Recall@k / nDCG@k harness
├── scripts/
│ ├── usage_report.py # Standalone log analyzer
│ └── registry_gc.py # Container registry cleanup
└── deploy/
    └── docker-compose.yml # Hosting stack: MCP + reranker + Watchtower
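For orientation, the three metrics named next to eval/run_eval.py can be computed along the following lines. This is a minimal sketch with binary relevance, not the harness's actual code: the function names, the `Retriever.retrieve` signature, and the use of document IDs as labels are all assumptions made for illustration.

```python
import math
from typing import Protocol, Sequence


class Retriever(Protocol):
    """Hypothetical interface: ranked document IDs for a query."""

    def retrieve(self, query: str, k: int) -> list[str]: ...


def reciprocal_rank(ranked: Sequence[str], relevant: set[str]) -> float:
    """1/rank of the first relevant hit, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG@k; assumes the ranked IDs are unique."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def evaluate(
    retriever: Retriever,
    labeled: list[tuple[str, set[str]]],
    k: int = 10,
) -> dict[str, float]:
    """Average each metric over (query, relevant_ids) pairs."""
    if not labeled:
        return {"mrr": 0.0, f"recall@{k}": 0.0, f"ndcg@{k}": 0.0}
    runs = [(retriever.retrieve(query, k), relevant) for query, relevant in labeled]
    n = len(runs)
    return {
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        f"ndcg@{k}": sum(ndcg_at_k(r, rel, k) for r, rel in runs) / n,
    }
```

Because `Retriever` is a structural protocol, any object with a matching `retrieve` method can be scored this way, which makes it straightforward to compare dense-only, lexical-only, and hybrid retrieval on the same labeled queries.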
What's product-specific (must implement)
- scrape/: the scraper itself. The template gives you the corpus layout contract and a working changelog.py; the actual extraction logic is yours.
- The corpus on disk (gitignored; rebuilt by CI).
- The reranker GGUF model and llama.cpp container (commented in deploy/docker-compose.yml).
- The reverse proxy / TLS layer in front of the public endpoint.
- The hand-curated knowledge surface (your product's API gotchas, example scripts, anything the LLM should know that the docs don't say).
What's NOT product-specific (works as-is)
- FastMCP server skeleton + tool decoration pattern
- Chroma + Ollama embedding pipeline
- BM25 / SQLite FTS5 lexical index
- Hybrid retrieval (RRF) + reranker integration
- Eval harness (Retriever protocol, MRR/Recall/nDCG)
- Usage logging (TimedCall, JSONL, daily rotation)
- CI workflow shape (weekly + on-demand, retry-on-race, three-tag image scheme)
- Registry GC script
- Standard tools: search_docs, get_page, list_versions, diff_versions, bundle_changelog, weekly_digest, find_doc_inconsistencies, etc.
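Reciprocal rank fusion (RRF), named in the list above, merges several ranked lists by summing 1 / (k + rank) for each document across the lists. A minimal sketch follows; the function name and the choice of k = 60 (a conventional default) are assumptions, and the template's own fusion and reranker hand-off may differ.

```python
from collections import defaultdict
from typing import Iterable, Sequence


def rrf_fuse(rankings: Iterable[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document, with rank starting
    at 1. Documents missing from a list contribute nothing for that list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative inputs: one lexical (BM25) ranking and one dense ranking.
bm25_hits = ["a", "b", "c"]
dense_hits = ["b", "c", "d"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

Only ranks are used, so the lexical and dense scores never need to be normalized against each other. In a hybrid setup the top of the fused list would typically be passed to the reranker for final ordering.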
License
Internal template. Adjust before publishing.
Description
MCP server over US row-crop pesticide labels (EPA PPLS + manufacturer sites). Feeds Drawbar farmer advisor.