docs: replace template README with HVM-specific content #4

Merged
justin merged 1 commit from docs/hvm-readme into main 2026-05-22 14:04:16 -04:00
+152 -75
@@ -1,104 +1,181 @@
# docs-mcp-template
# hvm-docs
A reusable template for building hosted MCP servers over a product's
public documentation. Distilled from one production build; everything
product-specific has been factored out.
A hosted MCP server over the public documentation for **HPE Morpheus
VM Essentials Software** (HVM) — the KVM-based hypervisor platform
from HPE. Lets any MCP-aware client (Claude Desktop, Claude Code,
Cursor, Copilot, MetaMCP) answer questions against the User Manual,
Release Notes, and Deployment Guide; diff pages across 8.1.x
versions; surface what changed recently; and (when enabled) submit
documentation bugs back to HPE.
The end product is a streamable-HTTP MCP server with ~15 tools that
any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
call to answer questions against the docs, surface what changed
recently, find inconsistencies, and (optionally) submit doc bugs
back upstream.
Live behind MetaMCP at `https://mcp.jpaul.io/metamcp/hvm-docs/mcp`
once deployed.
## What's here
## Tools
- **[PLAN.md](PLAN.md)** — comprehensive build guide. Phased
approach (13 phases, ~23 weeks of focused work for the full
stack). Includes the design decisions, the gotchas, and a
per-product customization checklist.
- **Scaffolded skeleton** — working FastMCP server with stub tools,
Dockerfile, docker-compose, CI workflows, eval harness layout,
usage logging. Everything you need to `git clone` and start
filling in the product-specific bits.
11 tools, registered over MCP streamable-HTTP:
## Quick start
| Tool | Use |
|---|---|
| `search_docs` | BM25-default search with optional version / platform / bundle filters; cross-encoder reranked when `RERANK_URL` is set |
| `get_page` | Full markdown of one page with metadata header + source URL |
| `list_versions` | Discover available versions, doc types, and bundle slugs |
| `list_cluster` | Cross-version peers of a page (synthesized from same-GUID overlap) |
| `diff_versions` | Unified diff of one topic between two bundles |
| `bundle_changelog` | Added / removed / churn-ranked changed pages between two bundles |
| `weekly_digest` | "What changed in the docs in the last N days" — reads CI-baked history.jsonl |
| `corpus_status` | Image build time, upstream Published date, total bundles/pages/chunks |
| `hvm_api_lessons` | Curated operator gotchas (manager sizing, upgrade ordering, plugin/worker compat, console keyboards, backups setup) |
| `find_doc_inconsistencies` | Scoped scan for cross-version drift + redirect-chain stub pages |
| `submit_doc_bug` | Env-gated draft → confirm → submit workflow to HPE's docs feedback (endpoint TBD; currently refuses with manual-fallback) |
## Corpus
Confirmed bundles (scraped 2026-05-22 from HPE Support DocPortal):
| Bundle | docId | Pages |
|---|---|---|
| `hvm_user_manual_8_1_0` | `sd00007520en_us` | 374 |
| `hvm_user_manual_8_1_1` | `sd00007620en_us` | 376 |
| `hvm_user_manual_8_1_2` | `sd00007735en_us` | 376 |
| `hvm_release_notes_8_1_0` | `sd00007497en_us` | 1 |
| `hvm_release_notes_8_1_1` | `sd00007609en_us` | 1 |
| `hvm_release_notes_8_1_2` | `sd00007734en_us` | 1 |
| `hvm_deployment_guide` | `sd00007332en_us` | 32 |
Total: 1,161 pages → 2,650 chunks in Chroma, with the same chunks
also indexed in SQLite FTS5 (BM25).
GUIDs are stable across HVM versions, so `topic_cluster` cross-version
peer mapping is free (no fuzzy matching needed).
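
As a rough illustration of why stable GUIDs make the peer mapping cheap, the sketch below groups page paths by GUID across bundles. It assumes the `corpus/<bundle>/<GUID>.{md,json}` layout described in this README; the function name `cluster_by_guid` and the sample GUIDs are hypothetical, and the project's own implementation may differ.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def cluster_by_guid(paths):
    """Group corpus page paths by GUID (the file stem).

    Returns {guid: sorted list of bundle names that contain that page}.
    """
    clusters = defaultdict(set)
    for raw in paths:
        p = PurePosixPath(raw)
        if p.suffix != ".md":
            continue  # skip the .json metadata sidecars
        clusters[p.stem].add(p.parent.name)
    return {guid: sorted(bundles) for guid, bundles in clusters.items()}


# Hypothetical sample paths; real GUIDs come from the scraped corpus.
sample = [
    "corpus/hvm_user_manual_8_1_0/GUID-AAAA.md",
    "corpus/hvm_user_manual_8_1_1/GUID-AAAA.md",
    "corpus/hvm_user_manual_8_1_1/GUID-AAAA.json",
    "corpus/hvm_user_manual_8_1_2/GUID-BBBB.md",
]
peers = cluster_by_guid(sample)
```

Because the join key is an exact string match, no similarity threshold has to be tuned; a page that exists in only one bundle simply yields a single-entry cluster.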
## Retrieval
Eval against 22 hand-curated golden queries — see
[`eval/results/baseline.md`](eval/results/baseline.md):
| Retriever | MRR | Recall@5 | nDCG@5 | latency |
|---|---:|---:|---:|---:|
| dense (Ollama nomic-embed-text) | 0.539 | 0.621 | 0.558 | 88 ms |
| BM25 (SQLite FTS5) | 0.880 | 0.909 | 0.883 | 3 ms |
| hybrid (dense + BM25 + RRF) | 0.692 | 0.818 | 0.713 | 69 ms |
| **bm25 + jina-rerank** | **0.920** | **0.939** | **0.927** | 490 ms (CPU) / ~50 ms (GPU) |
HPE docs use controlled vocabulary, so lexical match dominates; the
cross-encoder cleans up the long tail. See PLAN.md Phase 7/8 for the
reasoning.
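
For readers unfamiliar with the terms in the table, here is a minimal sketch of reciprocal rank fusion (the "RRF" in the hybrid row) and of the MRR metric. It is not taken from `eval/retrievers.py` or `eval/run_eval.py`; the function names and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def mrr(results, relevant):
    """Mean reciprocal rank over queries.

    results:  {query: ranked list of doc ids}
    relevant: {query: set of relevant doc ids}
    """
    total = 0.0
    for query, ranked in results.items():
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant[query]:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(results)


# Toy rankings standing in for the dense and BM25 retrievers.
dense = ["a", "b", "c"]
bm25 = ["b", "c", "a"]
fused = rrf_fuse([dense, bm25])
```

Fusion rewards documents that rank well in both lists, which also means a weak retriever can pull a strong one down; that is consistent with the hybrid row scoring below plain BM25 in the table above.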
## Architecture
```
HPE Support DocPortal (sniff-the-API, no auth)
scrape/ ──► corpus/<bundle>/<GUID>.{md,json} (committed)
rag/index ──► chroma/ (dense, 768-dim nomic-embed-text)
──► bm25/ (SQLite FTS5)
docs_mcp.server (FastMCP, streamable-HTTP)
├── BM25 → reranker (jina-reranker-v2-base GGUF, GPU sidecar)
deploy/docker-compose.yml
├── MetaMCP gateway ── public at mcp.jpaul.io behind Cloudflare Tunnel
├── jina-rerank ── shared GPU sidecar (1080 Ti)
└── Watchtower ── auto-pulls :latest on weekly refresh
```
## CI (Gitea Actions on `git.jpaul.io`)
Two cadences:
- **`refresh.yml`** — weekly Monday 06:00 UTC cron + manual dispatch.
Re-scrapes upstream, commits corpus diffs, rebuilds Chroma + BM25,
builds & pushes image. ~58 min on the GPU pool.
- **`image-only.yml`** — manual dispatch. Skips scrape; rebuilds
indexes from committed corpus and ships a new image. ~3 min.
Image: `git.jpaul.io/justin/hvm-docs:latest` (Watchtower target),
plus rolling `:<sha7>` and `:YYYY.MM.DD` tags.
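
The three-tag scheme can be sketched as follows. This is an illustration only: the helper name `image_tags` and the sample commit hash are hypothetical, and the workflows may derive the tags differently (for example, directly in shell).

```python
from datetime import date


def image_tags(repo, commit_sha, build_date):
    """Return the three tags: :latest, :<sha7>, and :YYYY.MM.DD."""
    sha7 = commit_sha[:7]
    dated = build_date.strftime("%Y.%m.%d")
    return [f"{repo}:latest", f"{repo}:{sha7}", f"{repo}:{dated}"]


tags = image_tags(
    "git.jpaul.io/justin/hvm-docs",
    "0123456789abcdef0123456789abcdef01234567",  # hypothetical commit
    date(2026, 5, 22),
)
```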
Embeddings fan out across the two GPU-pinned Ollama containers on
the Gitea host (`192.168.0.2:11435` Titan X, `:11436` 1080 Ti) — same
infra zerto-docs uses; see `OLLAMA_URLS` in both workflows.
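
One simple way to fan embedding work out across several endpoints is round-robin batch assignment, sketched below. The comma-separated format for `OLLAMA_URLS`, the batch size, and the helper name `assign_batches` are all assumptions; `rag/embeddings.py` may distribute work differently.

```python
from itertools import cycle


def assign_batches(texts, urls, batch_size=32):
    """Split texts into batches and pair each batch with an endpoint, round-robin."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    return list(zip(cycle(urls), batches))


# Assumed format: a comma-separated list of base URLs.
ollama_urls = "http://192.168.0.2:11435,http://192.168.0.2:11436".split(",")
plan = assign_batches(["t0", "t1", "t2", "t3", "t4"], ollama_urls, batch_size=2)
```

Each `(url, batch)` pair could then be sent to its endpoint concurrently, so both GPUs stay busy during an index rebuild.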
## Local dev
```bash
git clone https://git.jpaul.io/justin/docs-mcp-template.git my-product-docs
cd my-product-docs
git remote remove origin # detach from template
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Read PLAN.md before doing anything else. Pay particular attention to
# Phase 1 (scraper) — that's the most product-specific phase.
# (Optional) the CPU dev reranker — pulls PyTorch (~2 GB); skip if
# you'll just be running stdio queries.
pip install -r requirements-rerank.txt
# Run the stub server (no corpus yet — just verifies the wiring):
# Build / refresh the corpus + indexes
python -m scrape.bundles
python -m scrape.runner --all --force --concurrency 6
python -m rag.index --rebuild
# Local stdio server (Claude Desktop dev)
python -m docs_mcp.server --transport stdio
# Local streamable-HTTP for integration testing
python -m docs_mcp.server --transport streamable-http --port 8000
# Run the eval harness (without reranker)
python -m eval.run_eval --k 5
# With the dev reranker
python -m scripts.rerank_server &
RERANK_URL=http://127.0.0.1:8001 python -m eval.run_eval --k 5
```
## Repo layout
```
.
├── PLAN.md # The build guide. Read first.
├── README.md
├── requirements.txt
├── PLAN.md # 13-phase build guide (template-shared)
├── CLAUDE.md # Claude Code guidance
├── README.md # this file
├── Dockerfile
├── .gitignore
├── .gitea/workflows/
│ ├── refresh.yml # Weekly scrape + index + image push
│ └── image-only.yml # On-demand code-only ship
├── requirements.txt # production deps
├── requirements-rerank.txt # dev CPU reranker only
├── bundles.json # bundle catalog (committed)
├── corpus/ # 1,161 scraped pages (committed)
├── .gitea/workflows/ # refresh.yml + image-only.yml
├── scrape/
│ ├── README.md # Product-specific scraper goes here
│ └── changelog.py # Reusable: --json, --history-out
│ ├── bundles.py # HVM bundle catalog + discovery
│ ├── runner.py # TOC + single-doc page scraper
│ └── changelog.py # git-history → digest JSONL
├── rag/
│ ├── embeddings.py # Ollama embedder, swappable
│ ├── chunk.py # Chunker — adjust per page format
│ ├── index.py # Builds Chroma + (optionally) BM25
│ └── bm25.py # SQLite FTS5 lexical index
│ ├── chunk.py # paragraph-aware splitter w/ 6 KB hard cap
│ ├── embeddings.py # OLLAMA_URLS (zerto-style fan-out)
│ ├── index.py # builds Chroma + BM25
│ └── bm25.py # FTS5 lexical index
├── docs_mcp/
│ ├── server.py # FastMCP server with stub tools
│ └── usage.py # TimedCall + JSONL telemetry
│ ├── server.py # FastMCP + 11 tools
│ ├── usage.py # TimedCall JSONL telemetry
│ └── api_lessons.md # curated HVM operator gotchas
├── eval/
│ ├── queries.jsonl.example # Curate ~25 hand-labeled queries
│ ├── retrievers.py # Retriever protocol + implementations
│ └── run_eval.py # MRR / Recall@k / nDCG@k harness
│ ├── queries.jsonl # 22 hand-curated golden queries
│ ├── retrievers.py # Dense/BM25/Hybrid/Reranked
│ ├── run_eval.py # MRR / Recall@K / nDCG@K
│ └── results/baseline.md # committed eval results
├── scripts/
│ ├── usage_report.py # Standalone log analyzer
│ └── registry_gc.py # Container registry cleanup
│ ├── rerank_server.py # dev/CPU cross-encoder /v1/rerank
│ ├── usage_report.py # log summarizer
│ └── registry_gc.py # Gitea container-registry cleanup
└── deploy/
└── docker-compose.yml # Hosting stack: MCP + reranker + Watchtower
└── docker-compose.yml # production hosting (MCP + reranker + Watchtower)
```
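
The layout above describes `rag/chunk.py` as a paragraph-aware splitter with a 6 KB hard cap. A minimal sketch of that idea follows, assuming paragraphs are separated by blank lines and the cap is measured in UTF-8 bytes; the function name `chunk_paragraphs` is hypothetical, and the real chunker may handle headings, overlap, or oversized paragraphs differently.

```python
def chunk_paragraphs(text, max_bytes=6 * 1024):
    """Pack blank-line-separated paragraphs into chunks of at most max_bytes."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate  # paragraph still fits in the open chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        # A single paragraph over the cap is hard-split on byte boundaries.
        # errors="ignore" drops any multi-byte character cut by the split.
        data = para.encode("utf-8")
        while len(data) > max_bytes:
            chunks.append(data[:max_bytes].decode("utf-8", errors="ignore"))
            data = data[max_bytes:]
        current = data.decode("utf-8", errors="ignore")
    if current:
        chunks.append(current)
    return chunks
```

Packing whole paragraphs first keeps related sentences in one chunk; the byte-level split is only a fallback, so no chunk exceeds the cap even for pathological pages.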
## What's product-specific (must implement)
- `scrape/` — the scraper itself. The template gives you the corpus
layout contract and a working `changelog.py`; the actual extraction
logic is yours.
- The corpus on disk (gitignored; rebuilt by CI).
- The reranker GGUF model and llama.cpp container (commented in
`deploy/docker-compose.yml`).
- The reverse proxy / TLS layer in front of the public endpoint.
- The hand-curated knowledge surface (your product's API gotchas,
example scripts, anything the LLM should know that the docs
don't say).
## What's NOT product-specific (works as-is)
- FastMCP server skeleton + tool decoration pattern
- Chroma + Ollama embedding pipeline
- BM25 / SQLite FTS5 lexical index
- Hybrid retrieval (RRF) + reranker integration
- Eval harness (Retriever protocol, MRR/Recall/nDCG)
- Usage logging (TimedCall, JSONL, daily rotation)
- CI workflow shape (weekly + on-demand, retry-on-race, three-tag
image scheme)
- Registry GC script
- Standard tools: `search_docs`, `get_page`, `list_versions`,
`diff_versions`, `bundle_changelog`, `weekly_digest`,
`find_doc_inconsistencies`, `submit_doc_bug`, etc.
## License
Internal template. Adjust before publishing.
Internal — HVM is HPE's product; the docs MCP is a side project, not
HPE-sanctioned.