docs: replace template README with HVM-specific content #4

Merged
justin merged 1 commit from docs/hvm-readme into main 2026-05-22 14:04:16 -04:00
+152 -75
@@ -1,104 +1,181 @@
# docs-mcp-template # hvm-docs
A reusable template for building hosted MCP servers over a product's A hosted MCP server over the public documentation for **HPE Morpheus
public documentation. Distilled from one production build; everything VM Essentials Software** (HVM) — the KVM-based hypervisor platform
product-specific has been factored out. from HPE. Lets any MCP-aware client (Claude Desktop, Claude Code,
Cursor, Copilot, MetaMCP) answer questions against the User Manual,
Release Notes, and Deployment Guide; diff pages across 8.1.x
versions; surface what changed recently; and (when enabled) submit
documentation bugs back to HPE.
The end product is a streamable-HTTP MCP server with ~15 tools that Live behind MetaMCP at `https://mcp.jpaul.io/metamcp/hvm-docs/mcp`
any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can once deployed.
call to answer questions against the docs, surface what changed
recently, find inconsistencies, and (optionally) submit doc bugs
back upstream.
## What's here ## Tools
- **[PLAN.md](PLAN.md)** — comprehensive build guide. Phased 11 tools, registered over MCP streamable-HTTP:
approach (13 phases, ~23 weeks of focused work for the full
stack). Includes the design decisions, the gotchas, and a
per-product customization checklist.
- **Scaffolded skeleton** — working FastMCP server with stub tools,
Dockerfile, docker-compose, CI workflows, eval harness layout,
usage logging. Everything you need to `git clone` and start
filling in the product-specific bits.
## Quick start | Tool | Use |
|---|---|
| `search_docs` | BM25-default search with optional version / platform / bundle filters; cross-encoder reranked when `RERANK_URL` is set |
| `get_page` | Full markdown of one page with metadata header + source URL |
| `list_versions` | Discover available versions, doc types, and bundle slugs |
| `list_cluster` | Cross-version peers of a page (synthesized from same-GUID overlap) |
| `diff_versions` | Unified diff of one topic between two bundles |
| `bundle_changelog` | Added / removed / churn-ranked changed pages between two bundles |
| `weekly_digest` | "What changed in the docs in the last N days" — reads CI-baked history.jsonl |
| `corpus_status` | Image build time, upstream Published date, total bundles/pages/chunks |
| `hvm_api_lessons` | Curated operator gotchas (manager sizing, upgrade ordering, plugin/worker compat, console keyboards, backups setup) |
| `find_doc_inconsistencies` | Scoped scan for cross-version drift + redirect-chain stub pages |
| `submit_doc_bug` | Env-gated draft → confirm → submit workflow to HPE's docs feedback (endpoint TBD; currently refuses with manual-fallback) |
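
As an illustration of what a `diff_versions`-style tool returns, here is a minimal sketch built on Python's standard `difflib`. The helper name `unified_page_diff` and the sample page text are made up for this example; the server's actual implementation may differ.

```python
import difflib


def unified_page_diff(old_md: str, new_md: str, old_label: str, new_label: str) -> str:
    """Return a unified diff between two markdown renderings of one topic."""
    old_lines = old_md.splitlines(keepends=True)
    new_lines = new_md.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_label, tofile=new_label)
    )


# Hypothetical page bodies, not taken from the real corpus.
old_page = "# Backups\n\nConfigure a backup target.\n"
new_page = "# Backups\n\nConfigure a backup target and schedule.\n"

diff_text = unified_page_diff(
    old_page, new_page, "hvm_user_manual_8_1_1", "hvm_user_manual_8_1_2"
)
print(diff_text)
```

Labelling the two sides with bundle slugs keeps the diff header readable when the same topic is compared across versions.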
## Corpus
Confirmed bundles (scraped 2026-05-22 from HPE Support DocPortal):
| Bundle | docId | Pages |
|---|---|---|
| `hvm_user_manual_8_1_0` | `sd00007520en_us` | 374 |
| `hvm_user_manual_8_1_1` | `sd00007620en_us` | 376 |
| `hvm_user_manual_8_1_2` | `sd00007735en_us` | 376 |
| `hvm_release_notes_8_1_0` | `sd00007497en_us` | 1 |
| `hvm_release_notes_8_1_1` | `sd00007609en_us` | 1 |
| `hvm_release_notes_8_1_2` | `sd00007734en_us` | 1 |
| `hvm_deployment_guide` | `sd00007332en_us` | 32 |
Total: 1,161 pages → 2,650 chunks in Chroma, with the same chunks indexed in
SQLite FTS5 (BM25).
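
To show how a lexical index of this kind works, here is a minimal sketch of BM25 ranking over SQLite FTS5 using only the standard library. The table name, column names, and sample rows are assumptions for illustration, not the schema `rag/bm25.py` uses, and the sketch assumes the local SQLite build includes FTS5 (most Python distributions do).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# bundle and guid are stored but not tokenized; only body is searchable.
conn.execute(
    "CREATE VIRTUAL TABLE chunks USING fts5(bundle UNINDEXED, guid UNINDEXED, body)"
)
rows = [
    ("hvm_user_manual_8_1_2", "guid-a", "Configure backups for the manager appliance."),
    ("hvm_user_manual_8_1_2", "guid-b", "Upgrade ordering for cluster hosts."),
    ("hvm_deployment_guide", "guid-c", "Network requirements before deployment."),
]
conn.executemany("INSERT INTO chunks (bundle, guid, body) VALUES (?, ?, ?)", rows)


def search(query: str, limit: int = 5):
    """Return (guid, score) pairs, best match first."""
    # FTS5's bm25() gives better matches a lower score, so sort ascending.
    return conn.execute(
        "SELECT guid, bm25(chunks) AS score FROM chunks "
        "WHERE chunks MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()


hits = search("backups")
print(hits)
```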
GUIDs are stable across HVM versions, so `topic_cluster` cross-version
peer mapping is free (no fuzzy matching needed).
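
Because the same GUID identifies a topic in every bundle, peer lookup reduces to a group-by. The sketch below assumes pages are available as `(bundle, guid)` pairs; the function names and GUID values are placeholders, not the project's real data.

```python
from collections import defaultdict


def build_clusters(pages):
    """Map each GUID to the sorted list of bundles that contain it."""
    clusters = defaultdict(list)
    for bundle, guid in pages:
        clusters[guid].append(bundle)
    return {guid: sorted(bundles) for guid, bundles in clusters.items()}


def peers(clusters, bundle, guid):
    """Other bundles that carry the same GUID as the given page."""
    return [b for b in clusters.get(guid, []) if b != bundle]


# Placeholder GUIDs for illustration only.
pages = [
    ("hvm_user_manual_8_1_0", "GUID-EXAMPLE-1"),
    ("hvm_user_manual_8_1_1", "GUID-EXAMPLE-1"),
    ("hvm_user_manual_8_1_2", "GUID-EXAMPLE-1"),
    ("hvm_user_manual_8_1_2", "GUID-EXAMPLE-2"),
]
clusters = build_clusters(pages)
print(peers(clusters, "hvm_user_manual_8_1_1", "GUID-EXAMPLE-1"))
```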
## Retrieval
Eval against 22 hand-curated golden queries — see
[`eval/results/baseline.md`](eval/results/baseline.md):
| Retriever | MRR | Recall@5 | nDCG@5 | latency |
|---|---:|---:|---:|---:|
| dense (Ollama nomic-embed-text) | 0.539 | 0.621 | 0.558 | 88 ms |
| BM25 (SQLite FTS5) | 0.880 | 0.909 | 0.883 | 3 ms |
| hybrid (dense + BM25 + RRF) | 0.692 | 0.818 | 0.713 | 69 ms |
| **bm25 + jina-rerank** | **0.920** | **0.939** | **0.927** | 490 ms (CPU) / ~50 ms (GPU) |
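
For readers unfamiliar with the columns above, here is a minimal sketch of the three metrics for a single query, assuming binary relevance labels. The definitions in `eval/run_eval.py` may differ in detail. MRR is the mean of `reciprocal_rank` over all golden queries.

```python
import math


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant pages that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-gain nDCG: discounted gain divided by the best achievable gain."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


# Made-up ranking and labels for one query.
ranked = ["p3", "p1", "p7", "p2", "p9"]
relevant = {"p1", "p2"}
print(reciprocal_rank(ranked, relevant))
print(recall_at_k(ranked, relevant, 5))
print(round(ndcg_at_k(ranked, relevant, 5), 3))
```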
HPE docs use controlled vocabulary, so lexical match dominates; the
cross-encoder cleans up the long tail. See PLAN.md Phase 7/8 for the
reasoning.
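
The hybrid row in the table fuses the dense and BM25 rankings with reciprocal rank fusion (RRF). A minimal sketch follows. The constant `k=60` is a commonly used default and is an assumption here, not a value confirmed by this repository.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists: each list adds 1 / (k + rank) to a doc's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up result lists for one query.
dense = ["p1", "p2", "p3"]
bm25 = ["p3", "p1", "p4"]
fused = rrf_fuse([dense, bm25])
print(fused)
```

RRF uses only rank positions, so a weak dense ranking can pull down a page that BM25 ranked first. That is consistent with hybrid scoring below plain BM25 on this corpus.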
## Architecture
```
HPE Support DocPortal (sniff-the-API, no auth)
scrape/ ──► corpus/<bundle>/<GUID>.{md,json} (committed)
rag/index ──► chroma/ (dense, 768-dim nomic-embed-text)
──► bm25/ (SQLite FTS5)
docs_mcp.server (FastMCP, streamable-HTTP)
├── BM25 → reranker (jina-reranker-v2-base GGUF, GPU sidecar)
deploy/docker-compose.yml
├── MetaMCP gateway ── public at mcp.jpaul.io behind Cloudflare Tunnel
├── jina-rerank ── shared GPU sidecar (1080 Ti)
└── Watchtower ── auto-pulls :latest on weekly refresh
```
## CI (Gitea Actions on `git.jpaul.io`)
Two cadences:
- **`refresh.yml`** — weekly Monday 06:00 UTC cron + manual dispatch.
Re-scrapes upstream, commits corpus diffs, rebuilds Chroma + BM25,
builds & pushes image. ~58 min on the GPU pool.
- **`image-only.yml`** — manual dispatch. Skips scrape; rebuilds
indexes from committed corpus and ships a new image. ~3 min.
Image: `git.jpaul.io/justin/hvm-docs:latest` (Watchtower target),
plus rolling `:<sha7>` and `:YYYY.MM.DD` tags.
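
The three-tag scheme can be sketched as a small helper. The function name and the sample commit hash are hypothetical; the real workflows may compute the tags in shell instead.

```python
from datetime import date


def image_tags(repo: str, commit_sha: str, build_date: date) -> list[str]:
    """Return the :latest, :<sha7>, and :YYYY.MM.DD tags for one build."""
    return [
        f"{repo}:latest",
        f"{repo}:{commit_sha[:7]}",
        f"{repo}:{build_date:%Y.%m.%d}",
    ]


# Placeholder commit hash for illustration.
tags = image_tags(
    "git.jpaul.io/justin/hvm-docs",
    "0123456789abcdef0123456789abcdef01234567",
    date(2026, 5, 22),
)
print(tags)
```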
Embeddings fan out across the two GPU-pinned Ollama containers on
the Gitea host (`192.168.0.2:11435` Titan X, `:11436` 1080 Ti) — same
infra zerto-docs uses; see `OLLAMA_URLS` in both workflows.
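
One simple way to fan embedding work out across several endpoints is round-robin assignment of batches. The sketch below assumes `OLLAMA_URLS` is a comma-separated list of `http://` URLs, which this README does not confirm. It only plans the assignment and makes no network calls.

```python
from itertools import cycle


def parse_urls(raw: str) -> list[str]:
    """Split a comma-separated endpoint list, dropping blanks."""
    return [u.strip() for u in raw.split(",") if u.strip()]


def assign_batches(batches, urls):
    """Pair each batch with an endpoint, cycling through the endpoints in order."""
    if not urls:
        raise ValueError("no Ollama endpoints configured")
    return list(zip(cycle(urls), batches))


# Assumed format for the OLLAMA_URLS value; chunk ids are placeholders.
urls = parse_urls("http://192.168.0.2:11435, http://192.168.0.2:11436")
plan = assign_batches([["c1", "c2"], ["c3", "c4"], ["c5"]], urls)
for url, batch in plan:
    print(url, batch)
```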
## Local dev
```bash ```bash
git clone https://git.jpaul.io/justin/docs-mcp-template.git my-product-docs
cd my-product-docs
git remote remove origin # detach from template
python -m venv venv && source venv/bin/activate python -m venv venv && source venv/bin/activate
pip install -r requirements.txt pip install -r requirements.txt
# Read PLAN.md before doing anything else. Pay particular attention to # (Optional) the CPU dev reranker — pulls PyTorch (~2 GB); skip if
# Phase 1 (scraper) — that's the most product-specific phase. # you'll just be running stdio queries.
pip install -r requirements-rerank.txt
# Run the stub server (no corpus yet — just verifies the wiring): # Build / refresh the corpus + indexes
python -m scrape.bundles
python -m scrape.runner --all --force --concurrency 6
python -m rag.index --rebuild
# Local stdio server (Claude Desktop dev)
python -m docs_mcp.server --transport stdio python -m docs_mcp.server --transport stdio
# Local streamable-HTTP for integration testing
python -m docs_mcp.server --transport streamable-http --port 8000
# Run the eval harness (without reranker)
python -m eval.run_eval --k 5
# With the dev reranker
python -m scripts.rerank_server &
RERANK_URL=http://127.0.0.1:8001 python -m eval.run_eval --k 5
``` ```
## Repo layout ## Repo layout
``` ```
. .
├── PLAN.md # The build guide. Read first. ├── PLAN.md # 13-phase build guide (template-shared)
├── README.md ├── CLAUDE.md # Claude Code guidance
├── requirements.txt ├── README.md # this file
├── Dockerfile ├── Dockerfile
├── .gitignore ├── requirements.txt # production deps
├── .gitea/workflows/ ├── requirements-rerank.txt # dev CPU reranker only
│ ├── refresh.yml # Weekly scrape + index + image push ├── bundles.json # bundle catalog (committed)
│ └── image-only.yml # On-demand code-only ship ├── corpus/ # 1,161 scraped pages (committed)
├── .gitea/workflows/ # refresh.yml + image-only.yml
├── scrape/ ├── scrape/
│ ├── README.md # Product-specific scraper goes here │ ├── bundles.py # HVM bundle catalog + discovery
│ └── changelog.py # Reusable: --json, --history-out │ ├── runner.py # TOC + single-doc page scraper
│ └── changelog.py # git-history → digest JSONL
├── rag/ ├── rag/
│ ├── embeddings.py # Ollama embedder, swappable │ ├── chunk.py # paragraph-aware splitter w/ 6 KB hard cap
│ ├── chunk.py # Chunker — adjust per page format │ ├── embeddings.py # OLLAMA_URLS (zerto-style fan-out)
│ ├── index.py # Builds Chroma + (optionally) BM25 │ ├── index.py # builds Chroma + BM25
│ └── bm25.py # SQLite FTS5 lexical index │ └── bm25.py # FTS5 lexical index
├── docs_mcp/ ├── docs_mcp/
│ ├── server.py # FastMCP server with stub tools │ ├── server.py # FastMCP + 11 tools
│ └── usage.py # TimedCall + JSONL telemetry │ ├── usage.py # TimedCall JSONL telemetry
│ └── api_lessons.md # curated HVM operator gotchas
├── eval/ ├── eval/
│ ├── queries.jsonl.example # Curate ~25 hand-labeled queries │ ├── queries.jsonl # 22 hand-curated golden queries
│ ├── retrievers.py # Retriever protocol + implementations │ ├── retrievers.py # Dense/BM25/Hybrid/Reranked
│ └── run_eval.py # MRR / Recall@k / nDCG@k harness │ ├── run_eval.py # MRR / Recall@K / nDCG@K
│ └── results/baseline.md # committed eval results
├── scripts/ ├── scripts/
│ ├── usage_report.py # Standalone log analyzer │ ├── rerank_server.py # dev/CPU cross-encoder /v1/rerank
│ └── registry_gc.py # Container registry cleanup │ ├── usage_report.py # log summarizer
│ └── registry_gc.py # Gitea container-registry cleanup
└── deploy/ └── deploy/
└── docker-compose.yml # Hosting stack: MCP + reranker + Watchtower └── docker-compose.yml # production hosting (MCP + reranker + Watchtower)
``` ```
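
The layout describes `rag/chunk.py` as a paragraph-aware splitter with a 6 KB hard cap. A minimal sketch of that idea follows, assuming the cap means 6 × 1024 UTF-8 bytes and that paragraphs are separated by blank lines. The real chunker may treat headings, tables, or overlap differently.

```python
HARD_CAP = 6 * 1024  # assumed interpretation of "6 KB": bytes of UTF-8


def chunk_markdown(text: str, cap: int = HARD_CAP) -> list[str]:
    """Pack whole paragraphs into chunks; hard-split any paragraph over the cap."""
    if cap < 4:
        raise ValueError("cap must fit at least one UTF-8 character")
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate.encode("utf-8")) <= cap:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Oversized paragraph: cut on byte boundaries without splitting a character.
        data = para.encode("utf-8")
        while len(data) > cap:
            piece = data[:cap].decode("utf-8", errors="ignore")
            chunks.append(piece)
            data = data[len(piece.encode("utf-8")):]
        current = data.decode("utf-8")
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 4000 + "\n\n" + "b" * 4000 + "\n\n" + "c" * 7000
chunks = chunk_markdown(sample)
print([len(c) for c in chunks])
```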
## What's product-specific (must implement)
- `scrape/` — the scraper itself. The template gives you the corpus
layout contract and a working `changelog.py`; the actual extraction
logic is yours.
- The corpus on disk (gitignored; rebuilt by CI).
- The reranker GGUF model and llama.cpp container (commented in
`deploy/docker-compose.yml`).
- The reverse proxy / TLS layer in front of the public endpoint.
- The hand-curated knowledge surface (your product's API gotchas,
example scripts, anything the LLM should know that the docs
don't say).
## What's NOT product-specific (works as-is)
- FastMCP server skeleton + tool decoration pattern
- Chroma + Ollama embedding pipeline
- BM25 / SQLite FTS5 lexical index
- Hybrid retrieval (RRF) + reranker integration
- Eval harness (Retriever protocol, MRR/Recall/nDCG)
- Usage logging (TimedCall, JSONL, daily rotation)
- CI workflow shape (weekly + on-demand, retry-on-race, three-tag
image scheme)
- Registry GC script
- Standard tools: `search_docs`, `get_page`, `list_versions`,
`diff_versions`, `bundle_changelog`, `weekly_digest`,
`find_doc_inconsistencies`, `submit_doc_bug`, etc.
## License ## License
Internal template. Adjust before publishing. Internal — HVM is HPE's product; the docs MCP is a side project, not
HPE-sanctioned.