hvm-docs-refresh
7b2c39bb0a
weekly refresh: 2026-06-29T06:04Z — 0 content change(s) across 0 bundle(s)
0 content change(s) across 0 bundle(s)
0 sidecar metadata update(s)
2026-06-29 06:04:21 +00:00
hvm-docs-refresh
12298d1c7d
weekly refresh: 2026-06-22T06:18Z — 1 content change(s) across 1 bundle(s)
1 content change(s) across 1 bundle(s)
0 sidecar metadata update(s)
Bundles with content changes:
morpheus_quickspecs: 1 page(s)
- a50009231enw
2026-06-22 06:18:14 +00:00
justin
b29f5636fc
rag: resilient embedder (rotate/split on endpoint errors); 4-GPU embed pool (#3)
2026-06-10 15:47:03 -04:00
hvm-docs-refresh
5b929e40c2
weekly refresh: 2026-05-25T06:09Z — 0 content change(s) across 0 bundle(s)
0 content change(s) across 0 bundle(s)
0 sidecar metadata update(s)
2026-05-25 06:09:23 +00:00
justin
c8362953bb
remove submit_doc_bug tool
2026-05-24 07:44:40 -04:00
hvm-docs-refresh
797750a4dc
weekly refresh: 2026-05-22T19:31Z — 1599 content change(s) across 7 bundle(s)
1599 content change(s) across 7 bundle(s)
1599 sidecar metadata update(s)
7 new bundle(s) added
Bundles with content changes:
morpheus_quickspecs (NEW): 1 page(s)
- a50009231enw
morpheus_release_notes_8_1_0 (NEW): 1 page(s)
- sd00007496en_us
morpheus_release_notes_8_1_1 (NEW): 1 page(s)
- sd00007610en_us
morpheus_release_notes_8_1_2 (NEW): 1 page(s)
- sd00007733en_us
morpheus_user_manual_8_1_0 (NEW): 531 page(s)
- GUID-008B6CB0-CC99-4E40-89CE-2EA5D3389D8F
- GUID-00CE7A1B-7CBC-4227-B285-2E8E4FFC73B1
- GUID-00E2C383-D602-4EAD-BE7F-64E69F5DD5C1
- GUID-014CB518-CCE4-449A-9BCC-241FF8FEA3C5
- GUID-016909A4-5AF9-4DE7-8088-DB62BBC6B00C
... and 526 more
morpheus_user_manual_8_1_1 (NEW): 532 page(s)
- GUID-008B6CB0-CC99-4E40-89CE-2EA5D3389D8F
- GUID-00CE7A1B-7CBC-4227-B285-2E8E4FFC73B1
- GUID-00E2C383-D602-4EAD-BE7F-64E69F5DD5C1
- GUID-014CB518-CCE4-449A-9BCC-241FF8FEA3C5
- GUID-016909A4-5AF9-4DE7-8088-DB62BBC6B00C
... and 527 more
morpheus_user_manual_8_1_2 (NEW): 532 page(s)
- GUID-008B6CB0-CC99-4E40-89CE-2EA5D3389D8F
- GUID-00CE7A1B-7CBC-4227-B285-2E8E4FFC73B1
- GUID-00E2C383-D602-4EAD-BE7F-64E69F5DD5C1
- GUID-014CB518-CCE4-449A-9BCC-241FF8FEA3C5
- GUID-016909A4-5AF9-4DE7-8088-DB62BBC6B00C
... and 527 more
2026-05-22 19:31:35 +00:00
justin
a9889ffdd6
fix: un-ignore corpus/ and tag commits as morpheus-docs-refresh (#1)
2026-05-22 15:29:52 -04:00
justin
fa448f94e1
build out morpheus-docs MCP stack, mirroring hvm-docs through Phases 1-13
Initial scaffold: the docs-mcp-template clone with all the
HVM-validated stack ported across, customized for Morpheus
Enterprise (PRODUCT_NAME=morpheus, server name morpheus-docs).
Bundles (live-discovered 2026-05-22; 1710 cataloged pages total):
* morpheus_user_manual_8_1_0 sd00007510en_us 568 pages (Feb 2026)
* morpheus_user_manual_8_1_1 sd00007621en_us 569 pages (Mar 2026)
* morpheus_user_manual_8_1_2 sd00007732en_us 569 pages (Apr 2026)
* morpheus_release_notes_8_1_0 sd00007496en_us single-doc
* morpheus_release_notes_8_1_1 sd00007610en_us single-doc
* morpheus_release_notes_8_1_2 sd00007733en_us single-doc
* morpheus_quickspecs a50009231enw html-file (live
curl_cffi against www.hpe.com; all 12+ Enterprise SKUs captured —
S6E64..S6E73AAE for new/renewal/upgrade × 1/3/5-yr terms, plus
services SKUs HA124A1#V38/V39 and H46SBA1).
No Deployment Guide or Qualification Matrix on HPE Support for
Morpheus Enterprise specifically — the only QM (sd00006551en_us)
covers HVM clusters managed by Morpheus and lives in hvm-docs.
Stack carried forward from hvm-docs:
* rag/{index,chunk,embeddings,bm25}.py — including the
MAX_CHARS=4000 chunk-cap fix for table-dense content
* docs_mcp/{server,usage}.py — 11 MCP tools, BM25-default search,
cross-encoder rerank, hybrid behind HYBRID_SEARCH=true,
morpheus_api_lessons (renamed from hvm_api_lessons), env-gated
submit_doc_bug
* docs_mcp/api_lessons.md — Morpheus-specific scaffold covering
licensing model, HVM elevation path, REST vs Plugin API, with
TODO markers for sections to flesh out from real ops experience
* scrape/{runner,quickspecs,changelog,bundles}.py — TOC + single-doc
+ html-file modes, curl_cffi Chrome120 for www.hpe.com edge bypass
* eval/{retrievers,run_eval}.py + queries.jsonl scaffold (4 placeholder
queries; populate after first scrape)
* scripts/{rerank_server,usage_report,registry_gc}.py
* .gitea/workflows/{refresh,image-only}.yml — same Gitea Actions
setup zerto-docs uses (push LAN, pull public-URL, GPU Ollama pool)
* deploy/docker-compose.yml — morpheus-docs-mcp service definition,
shared jina-rerank sidecar, Watchtower-labeled
* Dockerfile, requirements.txt, requirements-rerank.txt
Verified locally: scrape produced 1599 .md pages (some TOC entries
are parent-only and yield no body), 6353 chunks all under the 4 KB
cap, MCP server boots and lists 11 tools cleanly.
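The chunk cap referred to above can be illustrated with a minimal Python sketch. It assumes a paragraph-aware splitter that packs blank-line-separated paragraphs up to a character limit and hard-splits any single paragraph (a dense table, for instance) that exceeds the limit on its own. The function name `chunk_text` and the splitting strategy are hypothetical; only the `MAX_CHARS=4000` value comes from this commit message, and the real rag/chunk.py may work differently.

```python
MAX_CHARS = 4000  # cap named in the commit message; everything else is assumed


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # A single oversized paragraph is hard-split so no chunk exceeds the cap.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # current is non-empty here because para alone fits under the cap.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Hard-splitting by character count is the simplest way to guarantee the cap; a production chunker would more likely split oversized tables on row boundaries.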
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:26:24 -04:00
justin
43728320bf
ci: default PRODUCT_NAME to repo name (caught by template dispatch test)
First dispatch on the empty template failed at Chroma collection
creation because PRODUCT_NAME was the literal string "<product>"
(YAML doesn't expand placeholders), and Chroma rejects collection
names containing characters outside [a-zA-Z0-9._-]:
    chromadb.errors.InvalidArgumentError: Validation error: name:
    Expected a name containing 3-512 characters from [a-zA-Z0-9._-],
    starting and ending with a character in [a-zA-Z0-9]. Got:
    <product>_docs
Same fix as the IMAGE env: derive from the repo name dynamically
via ${{ github.event.repository.name }}. Cloners can still override
explicitly, but a fresh clone now runs the index-rebuild step
cleanly out of the box.
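The naming rule quoted in the error above can be checked before any collection is created. The following Python sketch mirrors that rule with a regular expression; it is not chromadb's own validation code. The helper names `is_valid_collection_name` and `collection_name` are hypothetical, and the `<PRODUCT_NAME>_docs` naming scheme is inferred from the `<product>_docs` value in the error message.

```python
import re

# Mirrors the rule quoted in the error message: 3-512 characters from
# [a-zA-Z0-9._-], starting and ending with an alphanumeric character.
# This is an illustration, not chromadb's implementation.
_NAME_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9._-]{1,510}[a-zA-Z0-9]")


def is_valid_collection_name(name: str) -> bool:
    """Return True if name satisfies the quoted naming rule."""
    return _NAME_RE.fullmatch(name) is not None


def collection_name(product: str) -> str:
    """Build '<product>_docs' (assumed scheme) and fail fast on a bad name."""
    name = f"{product}_docs"
    if not is_valid_collection_name(name):
        raise ValueError(f"invalid collection name: {name!r}")
    return name
```

With this check, an unexpanded placeholder such as `<product>` raises a `ValueError` immediately, while `morpheus` yields `morpheus_docs`.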
Verified by re-dispatch: the run should now fail at docker login
(placeholder REGISTRY_PUSH hostname), which is the next expected
failure point and a real per-deployment setting the cloner has to
fill in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:37:07 -04:00
justin
33b0fd652e
ci: derive image name + package linking from repo, add link step
Both workflows had a static IMAGE env (<owner>/<product>-docs-mcp)
and a static --package arg in the GC step. Switch both to Gitea
Actions context variables so a clone of the template into any repo
name works on the first CI run without find/replace:
    IMAGE: ${{ github.repository_owner }}/${{ github.event.repository.name }}
    --owner ${{ github.repository_owner }}
    --package ${{ github.event.repository.name }}
Also add the "Link container package to this repo" step that was
missing from the template (and which, naively copy-pasted from the
reference build, would have linked everything back to
docs-mcp-template). The new step derives owner + package +
link-target all from the running repo's context.
The github.* namespace is Gitea Actions' inherited GitHub-Actions
context — values come from the Gitea server, not github.com. Same
mechanism the reference build's $GITHUB_SHA tag-builder uses.
CLAUDE.md updated to note that image and package naming are
repo-derived; only registry endpoints and the Ollama URL need
per-clone editing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:34:26 -04:00
justin
9ba615c8ee
initial: docs-mcp-template — build guide + scaffolded server
Template for building hosted MCP servers over a product's public
documentation. Distilled from one production build; everything
product-specific has been factored out.
Contents:
- PLAN.md — comprehensive build guide. 13 phases from project
skeleton through weekly_digest. Includes the gotchas
("fetch-depth: 0 always", reranker per-pair token limit,
Cloudflare body cap, dash-not-bash on Gitea runners), the
decisions worth carrying forward, and a per-product
customization checklist.
- CLAUDE.md — guidance for Claude Code working in a clone of this
template. Phase identification table, conventions (env-gating +
operator confirmation for side-effecting tools, defensive
fallback for retrieval components), common commands.
- README.md — quick-start summary.
Scaffolded code (all signature-stable, with NotImplementedError
stubs where phase-specific work is required):
docs_mcp/server.py          FastMCP server, stateless_http=True, with
                            search_docs / get_page / list_versions
                            baseline tools and commented stubs for the
                            rest of the phase set.
docs_mcp/usage.py           TimedCall telemetry, JSONL, daily rotation,
                            90-day retention. Reusable as-is.
rag/embeddings.py           Ollama embedder (nomic-embed-text default),
                            load-balanced across N URLs. Reusable.
rag/chunk.py                Paragraph-aware chunker with synthetic
                            chunk 0. Per-product tunable.
rag/index.py                Chroma + BM25 builder. --rebuild and
                            --bm25-only flags.
rag/bm25.py                 SQLite FTS5 lexical index. Reusable.
scrape/changelog.py         --cached / --ref / --json / --history-out.
                            Reusable.
scrape/README.md            What you write per-product.
eval/queries.jsonl.example  Curate ~25 hand-labeled queries here.
eval/retrievers.py          Retriever protocol + stub classes.
eval/run_eval.py            MRR / Recall@K / nDCG@K harness skeleton.
scripts/usage_report.py     Standalone log analyzer; the
                            FOLLOW-UP CHECKS pattern noted in the
                            module docstring.
scripts/registry_gc.py      Gitea container registry cleanup. Reusable.
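The three retrieval metrics named for eval/run_eval.py can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the harness itself: relevance is binary, each query has a set of relevant page IDs, and the function names and signatures are hypothetical.

```python
import math


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal = sum(
        1.0 / math.log2(rank + 1)
        for rank in range(1, min(len(relevant), k) + 1)
    )
    return dcg / ideal if ideal else 0.0


def mean(values: list[float]) -> float:
    """Average per-query scores; MRR is the mean of reciprocal_rank."""
    return sum(values) / len(values) if values else 0.0
```

MRR over a query set is then `mean([reciprocal_rank(r, rel) for r, rel in runs])`, where `runs` is a hypothetical list of (ranking, relevant-set) pairs built from the hand-labeled queries.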
Deployment + CI:
Dockerfile                       Python 3.12-slim; COPY corpus + chroma
                                 + bm25 last for cache efficiency.
deploy/docker-compose.yml        MCP + reranker sidecar + Watchtower.
                                 Templated with <placeholders>.
.gitea/workflows/refresh.yml     Weekly cron + manual dispatch.
                                 fetch-depth: 0, retry-on-race,
                                 three-tag image scheme.
.gitea/workflows/image-only.yml  Code-only ship cycle, ~18min.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:18:17 -04:00