2026-05-22 14:04:16 -04:00
1 changed files with 152 additions and 75 deletions
@@ -1,104 +1,181 @@
-# docs-mcp-template
+# hvm-docs
-A reusable template for building hosted MCP servers over a product's
+A hosted MCP server over the public documentation for **HPE Morpheus
-public documentation. Distilled from one production build; everything
+VM Essentials Software** (HVM) — the KVM-based hypervisor platform
-product-specific has been factored out.
+from HPE. Lets any MCP-aware client (Claude Desktop, Claude Code,
 Cursor, Copilot, MetaMCP) answer questions against the User Manual,
 Release Notes, and Deployment Guide; diff pages across 8.1.x
 versions; surface what changed recently; and (when enabled) submit
 documentation bugs back to HPE.
-The end product is a streamable-HTTP MCP server with ~15 tools that
+Live behind MetaMCP at `https://mcp.jpaul.io/metamcp/hvm-docs/mcp`
-any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
+once deployed.
 call to answer questions against the docs, surface what changed
 recently, find inconsistencies, and (optionally) submit doc bugs
 back upstream.
-## What's here
+## Tools
- **[PLAN.md](PLAN.md)** — comprehensive build guide. Phased
+11 tools, registered over MCP streamable-HTTP:
  approach (13 phases, ~2–3 weeks of focused work for the full
  stack). Includes the design decisions, the gotchas, and a
  per-product customization checklist.
 - **Scaffolded skeleton** — working FastMCP server with stub tools,
  Dockerfile, docker-compose, CI workflows, eval harness layout,
  usage logging. Everything you need to `git clone` and start
  filling in the product-specific bits.
-## Quick start
+| Tool | Use |
 |---|---|
 | `search_docs` | BM25-default search with optional version / platform / bundle filters; cross-encoder reranked when `RERANK_URL` is set |
 | `get_page` | Full markdown of one page with metadata header + source URL |
 | `list_versions` | Discover available versions, doc types, and bundle slugs |
 | `list_cluster` | Cross-version peers of a page (synthesized from same-GUID overlap) |
 | `diff_versions` | Unified diff of one topic between two bundles |
 | `bundle_changelog` | Added / removed / churn-ranked changed pages between two bundles |
 | `weekly_digest` | "What changed in the docs in the last N days" — reads CI-baked history.jsonl |
 | `corpus_status` | Image build time, upstream Published date, total bundles/pages/chunks |
 | `hvm_api_lessons` | Curated operator gotchas (manager sizing, upgrade ordering, plugin/worker compat, console keyboards, backups setup) |
 | `find_doc_inconsistencies` | Scoped scan for cross-version drift + redirect-chain stub pages |
 | `submit_doc_bug` | Env-gated draft → confirm → submit workflow to HPE's docs feedback (endpoint TBD; currently refuses with manual-fallback) |
 ## Corpus
 Confirmed bundles (scraped 2026-05-22 from HPE Support DocPortal):
 | Bundle | docId | Pages |
 |---|---|---|
 | `hvm_user_manual_8_1_0` | `sd00007520en_us` | 374 |
 | `hvm_user_manual_8_1_1` | `sd00007620en_us` | 376 |
 | `hvm_user_manual_8_1_2` | `sd00007735en_us` | 376 |
 | `hvm_release_notes_8_1_0` | `sd00007497en_us` | 1 |
 | `hvm_release_notes_8_1_1` | `sd00007609en_us` | 1 |
 | `hvm_release_notes_8_1_2` | `sd00007734en_us` | 1 |
 | `hvm_deployment_guide` | `sd00007332en_us` | 32 |
 Total: ~1,161 pages → 2,650 chunks in Chroma + same chunks indexed in
 SQLite FTS5 (BM25).
 GUIDs are stable across HVM versions, so `topic_cluster` cross-version
 peer mapping is free (no fuzzy matching needed).
 ## Retrieval
 Eval against 22 hand-curated golden queries — see
 [`eval/results/baseline.md`](eval/results/baseline.md):
 | Retriever | MRR | Recall@5 | nDCG@5 | latency |
 |---|---:|---:|---:|---:|
 | dense (Ollama nomic-embed-text) | 0.539 | 0.621 | 0.558 | 88 ms |
 | BM25 (SQLite FTS5) | 0.880 | 0.909 | 0.883 | 3 ms |
 | hybrid (dense + BM25 + RRF) | 0.692 | 0.818 | 0.713 | 69 ms |
 | **bm25 + jina-rerank** | **0.920** | **0.939** | **0.927** | 490 ms (CPU) / ~50 ms (GPU) |
 HPE docs use controlled vocabulary, so lexical match dominates; the
 cross-encoder cleans up the long tail. See PLAN.md Phase 7/8 for the
 reasoning.
 ## Architecture
 ```
 HPE Support DocPortal (sniff-the-API, no auth)
        │
        ▼
   scrape/        ──► corpus/<bundle>/<GUID>.{md,json}  (committed)
        │
        ▼
   rag/index      ──► chroma/  (dense, 768-dim nomic-embed-text)
                  ──► bm25/    (SQLite FTS5)
        │
        ▼
   docs_mcp.server (FastMCP, streamable-HTTP)
        │
        ├── BM25 → reranker (jina-reranker-v2-base GGUF, GPU sidecar)
        │
        ▼
   deploy/docker-compose.yml
        │
        ├── MetaMCP gateway   ── public at mcp.jpaul.io behind Cloudflare Tunnel
        ├── jina-rerank       ── shared GPU sidecar (1080 Ti)
        └── Watchtower        ── auto-pulls :latest on weekly refresh
 ```
 ## CI (Gitea Actions on `git.jpaul.io`)
 Two cadences:
 - **`refresh.yml`** — weekly Monday 06:00 UTC cron + manual dispatch.
  Re-scrapes upstream, commits corpus diffs, rebuilds Chroma + BM25,
  builds & pushes image. ~5–8 min on the GPU pool.
 - **`image-only.yml`** — manual dispatch. Skips scrape; rebuilds
  indexes from committed corpus and ships a new image. ~3 min.
 Image: `git.jpaul.io/justin/hvm-docs:latest` (Watchtower target),
 plus rolling `:<sha7>` and `:YYYY.MM.DD` tags.
 Embeddings fan out across the two GPU-pinned Ollama containers on
 the Gitea host (`192.168.0.2:11435` Titan X, `:11436` 1080 Ti) — same
 infra zerto-docs uses; see `OLLAMA_URLS` in both workflows.
 ## Local dev
 ```bash
 git clone https://git.jpaul.io/justin/docs-mcp-template.git my-product-docs
 cd my-product-docs
 git remote remove origin  # detach from template
 python -m venv venv && source venv/bin/activate
 pip install -r requirements.txt
-# Read PLAN.md before doing anything else. Pay particular attention to
+# (Optional) the CPU dev reranker — pulls PyTorch (~2 GB); skip if
-# Phase 1 (scraper) — that's the most product-specific phase.
+# you'll just be running stdio queries.
 pip install -r requirements-rerank.txt
-# Run the stub server (no corpus yet — just verifies the wiring):
+# Build / refresh the corpus + indexes
 python -m scrape.bundles
 python -m scrape.runner --all --force --concurrency 6
 python -m rag.index --rebuild
 # Local stdio server (Claude Desktop dev)
 python -m docs_mcp.server --transport stdio
 # Local streamable-HTTP for integration testing
 python -m docs_mcp.server --transport streamable-http --port 8000
 # Run the eval harness (without reranker)
 python -m eval.run_eval --k 5
 # With the dev reranker
 python -m scripts.rerank_server &
 RERANK_URL=http://127.0.0.1:8001 python -m eval.run_eval --k 5
 ```
 ## Repo layout
 ```
 .
-├── PLAN.md                        # The build guide. Read first.
+├── PLAN.md                       # 13-phase build guide (template-shared)
-├── README.md
+├── CLAUDE.md                     # Claude Code guidance
-├── requirements.txt
+├── README.md                     # this file
 ├── Dockerfile
-├── .gitignore
+├── requirements.txt              # production deps
-├── .gitea/workflows/
+├── requirements-rerank.txt       # dev CPU reranker only
-│   ├── refresh.yml                # Weekly scrape + index + image push
+├── bundles.json                  # bundle catalog (committed)
-│   └── image-only.yml             # On-demand code-only ship
+├── corpus/                       # 1,161 scraped pages (committed)
 ├── .gitea/workflows/             # refresh.yml + image-only.yml
 ├── scrape/
-│   ├── README.md                  # Product-specific scraper goes here
+│   ├── bundles.py                # HVM bundle catalog + discovery
-│   └── changelog.py               # Reusable: --json, --history-out
+│   ├── runner.py                 # TOC + single-doc page scraper
 │   └── changelog.py              # git-history → digest JSONL
 ├── rag/
-│   ├── embeddings.py              # Ollama embedder, swappable
+│   ├── chunk.py                  # paragraph-aware splitter w/ 6 KB hard cap
-│   ├── chunk.py                   # Chunker — adjust per page format
+│   ├── embeddings.py             # OLLAMA_URLS (zerto-style fan-out)
-│   ├── index.py                   # Builds Chroma + (optionally) BM25
+│   ├── index.py                  # builds Chroma + BM25
-│   └── bm25.py                    # SQLite FTS5 lexical index
+│   └── bm25.py                   # FTS5 lexical index
 ├── docs_mcp/
-│   ├── server.py                  # FastMCP server with stub tools
+│   ├── server.py                 # FastMCP + 11 tools
-│   └── usage.py                   # TimedCall + JSONL telemetry
+│   ├── usage.py                  # TimedCall JSONL telemetry
 │   └── api_lessons.md            # curated HVM operator gotchas
 ├── eval/
-│   ├── queries.jsonl.example      # Curate ~25 hand-labeled queries
+│   ├── queries.jsonl             # 22 hand-curated golden queries
-│   ├── retrievers.py              # Retriever protocol + implementations
+│   ├── retrievers.py             # Dense/BM25/Hybrid/Reranked
-│   └── run_eval.py                # MRR / Recall@k / nDCG@k harness
+│   ├── run_eval.py               # MRR / Recall@K / nDCG@K
 │   └── results/baseline.md       # committed eval results
 ├── scripts/
-│   ├── usage_report.py            # Standalone log analyzer
+│   ├── rerank_server.py          # dev/CPU cross-encoder /v1/rerank
-│   └── registry_gc.py             # Container registry cleanup
+│   ├── usage_report.py           # log summarizer
 │   └── registry_gc.py            # Gitea container-registry cleanup
 └── deploy/
-    └── docker-compose.yml         # Hosting stack: MCP + reranker + Watchtower
+    └── docker-compose.yml        # production hosting (MCP + reranker + Watchtower)
 ```
 ## What's product-specific (must implement)
 - `scrape/` — the scraper itself. The template gives you the corpus
  layout contract and a working `changelog.py`; the actual extraction
  logic is yours.
 - The corpus on disk (gitignored; rebuilt by CI).
 - The reranker GGUF model and llama.cpp container (commented in
  `deploy/docker-compose.yml`).
 - The reverse proxy / TLS layer in front of the public endpoint.
 - The hand-curated knowledge surface (your product's API gotchas,
  example scripts, anything the LLM should know that the docs
  don't say).
 ## What's NOT product-specific (works as-is)
 - FastMCP server skeleton + tool decoration pattern
 - Chroma + Ollama embedding pipeline
 - BM25 / SQLite FTS5 lexical index
 - Hybrid retrieval (RRF) + reranker integration
 - Eval harness (Retriever protocol, MRR/Recall/nDCG)
 - Usage logging (TimedCall, JSONL, daily rotation)
 - CI workflow shape (weekly + on-demand, retry-on-race, three-tag
  image scheme)
 - Registry GC script
 - Standard tools: `search_docs`, `get_page`, `list_versions`,
  `diff_versions`, `bundle_changelog`, `weekly_digest`,
  `find_doc_inconsistencies`, `submit_doc_bug`, etc.
 ## License
-Internal template. Adjust before publishing.
+Internal — HVM is HPE's product; the docs MCP is a side project, not
 HPE-sanctioned.