2026-05-22 14:04:16 -04:00
1 changed files with 152 additions and 75 deletions
@@ -1,104 +1,181 @@
-# docs-mcp-template
+# hvm-docs

-A reusable template for building hosted MCP servers over a product's
-public documentation. Distilled from one production build; everything
-product-specific has been factored out.
+A hosted MCP server over the public documentation for **HPE Morpheus
+VM Essentials Software** (HVM) — the KVM-based hypervisor platform
+from HPE. Lets any MCP-aware client (Claude Desktop, Claude Code,
+Cursor, Copilot, MetaMCP) answer questions against the User Manual,
+Release Notes, and Deployment Guide; diff pages across 8.1.x
+versions; surface what changed recently; and (when enabled) submit
+documentation bugs back to HPE.

-The end product is a streamable-HTTP MCP server with ~15 tools that
-any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
-call to answer questions against the docs, surface what changed
-recently, find inconsistencies, and (optionally) submit doc bugs
-back upstream.
+Live behind MetaMCP at `https://mcp.jpaul.io/metamcp/hvm-docs/mcp`
+once deployed.

-## What's here
+## Tools

- **[PLAN.md](PLAN.md)** — comprehensive build guide. Phased
-  approach (13 phases, ~2–3 weeks of focused work for the full
-  stack). Includes the design decisions, the gotchas, and a
-  per-product customization checklist.
- **Scaffolded skeleton** — working FastMCP server with stub tools,
-  Dockerfile, docker-compose, CI workflows, eval harness layout,
-  usage logging. Everything you need to `git clone` and start
-  filling in the product-specific bits.
+11 tools, registered over MCP streamable-HTTP:

-## Quick start
+| Tool | Use |
+|---|---|
+| `search_docs` | BM25-default search with optional version / platform / bundle filters; cross-encoder reranked when `RERANK_URL` is set |
+| `get_page` | Full markdown of one page with metadata header + source URL |
+| `list_versions` | Discover available versions, doc types, and bundle slugs |
+| `list_cluster` | Cross-version peers of a page (synthesized from same-GUID overlap) |
+| `diff_versions` | Unified diff of one topic between two bundles |
+| `bundle_changelog` | Added / removed / churn-ranked changed pages between two bundles |
+| `weekly_digest` | "What changed in the docs in the last N days" — reads CI-baked history.jsonl |
+| `corpus_status` | Image build time, upstream Published date, total bundles/pages/chunks |
+| `hvm_api_lessons` | Curated operator gotchas (manager sizing, upgrade ordering, plugin/worker compat, console keyboards, backups setup) |
+| `find_doc_inconsistencies` | Scoped scan for cross-version drift + redirect-chain stub pages |
+| `submit_doc_bug` | Env-gated draft → confirm → submit workflow to HPE's docs feedback (endpoint TBD; currently refuses with manual-fallback) |
+
+## Corpus
+
+Confirmed bundles (scraped 2026-05-22 from HPE Support DocPortal):
+
+| Bundle | docId | Pages |
+|---|---|---|
+| `hvm_user_manual_8_1_0` | `sd00007520en_us` | 374 |
+| `hvm_user_manual_8_1_1` | `sd00007620en_us` | 376 |
+| `hvm_user_manual_8_1_2` | `sd00007735en_us` | 376 |
+| `hvm_release_notes_8_1_0` | `sd00007497en_us` | 1 |
+| `hvm_release_notes_8_1_1` | `sd00007609en_us` | 1 |
+| `hvm_release_notes_8_1_2` | `sd00007734en_us` | 1 |
+| `hvm_deployment_guide` | `sd00007332en_us` | 32 |
+
+Total: ~1,161 pages → 2,650 chunks in Chroma + same chunks indexed in
+SQLite FTS5 (BM25).
+
+GUIDs are stable across HVM versions, so `topic_cluster` cross-version
+peer mapping is free (no fuzzy matching needed).
+
+## Retrieval
+
+Eval against 22 hand-curated golden queries — see
+[`eval/results/baseline.md`](eval/results/baseline.md):
+
+| Retriever | MRR | Recall@5 | nDCG@5 | latency |
+|---|---:|---:|---:|---:|
+| dense (Ollama nomic-embed-text) | 0.539 | 0.621 | 0.558 | 88 ms |
+| BM25 (SQLite FTS5) | 0.880 | 0.909 | 0.883 | 3 ms |
+| hybrid (dense + BM25 + RRF) | 0.692 | 0.818 | 0.713 | 69 ms |
+| **bm25 + jina-rerank** | **0.920** | **0.939** | **0.927** | 490 ms (CPU) / ~50 ms (GPU) |
+
+HPE docs use controlled vocabulary, so lexical match dominates; the
+cross-encoder cleans up the long tail. See PLAN.md Phase 7/8 for the
+reasoning.
+
+## Architecture
+
+```
+HPE Support DocPortal (sniff-the-API, no auth)
+        │
+        ▼
+   scrape/        ──► corpus/<bundle>/<GUID>.{md,json}  (committed)
+        │
+        ▼
+   rag/index      ──► chroma/  (dense, 768-dim nomic-embed-text)
+                  ──► bm25/    (SQLite FTS5)
+        │
+        ▼
+   docs_mcp.server (FastMCP, streamable-HTTP)
+        │
+        ├── BM25 → reranker (jina-reranker-v2-base GGUF, GPU sidecar)
+        │
+        ▼
+   deploy/docker-compose.yml
+        │
+        ├── MetaMCP gateway   ── public at mcp.jpaul.io behind Cloudflare Tunnel
+        ├── jina-rerank       ── shared GPU sidecar (1080 Ti)
+        └── Watchtower        ── auto-pulls :latest on weekly refresh
+```
+
+## CI (Gitea Actions on `git.jpaul.io`)
+
+Two cadences:
+
+- **`refresh.yml`** — weekly Monday 06:00 UTC cron + manual dispatch.
+  Re-scrapes upstream, commits corpus diffs, rebuilds Chroma + BM25,
+  builds & pushes image. ~5–8 min on the GPU pool.
+- **`image-only.yml`** — manual dispatch. Skips scrape; rebuilds
+  indexes from committed corpus and ships a new image. ~3 min.
+
+Image: `git.jpaul.io/justin/hvm-docs:latest` (Watchtower target),
+plus rolling `:<sha7>` and `:YYYY.MM.DD` tags.
+
+Embeddings fan out across the two GPU-pinned Ollama containers on
+the Gitea host (`192.168.0.2:11435` Titan X, `:11436` 1080 Ti) — same
+infra zerto-docs uses; see `OLLAMA_URLS` in both workflows.
+
+## Local dev

 ```bash
-git clone https://git.jpaul.io/justin/docs-mcp-template.git my-product-docs
-cd my-product-docs
-git remote remove origin  # detach from template
 python -m venv venv && source venv/bin/activate
 pip install -r requirements.txt

-# Read PLAN.md before doing anything else. Pay particular attention to
-# Phase 1 (scraper) — that's the most product-specific phase.
+# (Optional) the CPU dev reranker — pulls PyTorch (~2 GB); skip if
+# you'll just be running stdio queries.
+pip install -r requirements-rerank.txt

-# Run the stub server (no corpus yet — just verifies the wiring):
+# Build / refresh the corpus + indexes
+python -m scrape.bundles
+python -m scrape.runner --all --force --concurrency 6
+python -m rag.index --rebuild
+
+# Local stdio server (Claude Desktop dev)
 python -m docs_mcp.server --transport stdio
+
+# Local streamable-HTTP for integration testing
+python -m docs_mcp.server --transport streamable-http --port 8000
+
+# Run the eval harness (without reranker)
+python -m eval.run_eval --k 5
+
+# With the dev reranker
+python -m scripts.rerank_server &
+RERANK_URL=http://127.0.0.1:8001 python -m eval.run_eval --k 5
 ```

 ## Repo layout

 ```
 .
-├── PLAN.md                        # The build guide. Read first.
-├── README.md
-├── requirements.txt
+├── PLAN.md                       # 13-phase build guide (template-shared)
+├── CLAUDE.md                     # Claude Code guidance
+├── README.md                     # this file
 ├── Dockerfile
-├── .gitignore
-├── .gitea/workflows/
-│   ├── refresh.yml                # Weekly scrape + index + image push
-│   └── image-only.yml             # On-demand code-only ship
+├── requirements.txt              # production deps
+├── requirements-rerank.txt       # dev CPU reranker only
+├── bundles.json                  # bundle catalog (committed)
+├── corpus/                       # 1,161 scraped pages (committed)
+├── .gitea/workflows/             # refresh.yml + image-only.yml
 ├── scrape/
-│   ├── README.md                  # Product-specific scraper goes here
-│   └── changelog.py               # Reusable: --json, --history-out
+│   ├── bundles.py                # HVM bundle catalog + discovery
+│   ├── runner.py                 # TOC + single-doc page scraper
+│   └── changelog.py              # git-history → digest JSONL
 ├── rag/
-│   ├── embeddings.py              # Ollama embedder, swappable
-│   ├── chunk.py                   # Chunker — adjust per page format
-│   ├── index.py                   # Builds Chroma + (optionally) BM25
-│   └── bm25.py                    # SQLite FTS5 lexical index
+│   ├── chunk.py                  # paragraph-aware splitter w/ 6 KB hard cap
+│   ├── embeddings.py             # OLLAMA_URLS (zerto-style fan-out)
+│   ├── index.py                  # builds Chroma + BM25
+│   └── bm25.py                   # FTS5 lexical index
 ├── docs_mcp/
-│   ├── server.py                  # FastMCP server with stub tools
-│   └── usage.py                   # TimedCall + JSONL telemetry
+│   ├── server.py                 # FastMCP + 11 tools
+│   ├── usage.py                  # TimedCall JSONL telemetry
+│   └── api_lessons.md            # curated HVM operator gotchas
 ├── eval/
-│   ├── queries.jsonl.example      # Curate ~25 hand-labeled queries
-│   ├── retrievers.py              # Retriever protocol + implementations
-│   └── run_eval.py                # MRR / Recall@k / nDCG@k harness
+│   ├── queries.jsonl             # 22 hand-curated golden queries
+│   ├── retrievers.py             # Dense/BM25/Hybrid/Reranked
+│   ├── run_eval.py               # MRR / Recall@K / nDCG@K
+│   └── results/baseline.md       # committed eval results
 ├── scripts/
-│   ├── usage_report.py            # Standalone log analyzer
-│   └── registry_gc.py             # Container registry cleanup
+│   ├── rerank_server.py          # dev/CPU cross-encoder /v1/rerank
+│   ├── usage_report.py           # log summarizer
+│   └── registry_gc.py            # Gitea container-registry cleanup
 └── deploy/
-    └── docker-compose.yml         # Hosting stack: MCP + reranker + Watchtower
+    └── docker-compose.yml        # production hosting (MCP + reranker + Watchtower)
 ```

-## What's product-specific (must implement)
-
- `scrape/` — the scraper itself. The template gives you the corpus
-  layout contract and a working `changelog.py`; the actual extraction
-  logic is yours.
- The corpus on disk (gitignored; rebuilt by CI).
- The reranker GGUF model and llama.cpp container (commented in
-  `deploy/docker-compose.yml`).
- The reverse proxy / TLS layer in front of the public endpoint.
- The hand-curated knowledge surface (your product's API gotchas,
-  example scripts, anything the LLM should know that the docs
-  don't say).
-
-## What's NOT product-specific (works as-is)
-
- FastMCP server skeleton + tool decoration pattern
- Chroma + Ollama embedding pipeline
- BM25 / SQLite FTS5 lexical index
- Hybrid retrieval (RRF) + reranker integration
- Eval harness (Retriever protocol, MRR/Recall/nDCG)
- Usage logging (TimedCall, JSONL, daily rotation)
- CI workflow shape (weekly + on-demand, retry-on-race, three-tag
-  image scheme)
- Registry GC script
- Standard tools: `search_docs`, `get_page`, `list_versions`,
-  `diff_versions`, `bundle_changelog`, `weekly_digest`,
-  `find_doc_inconsistencies`, `submit_doc_bug`, etc.
-
 ## License

-Internal template. Adjust before publishing.
+Internal — HVM is HPE's product; the docs MCP is a side project, not
+HPE-sanctioned.