feat: implement Phases 9 / 11 / 12 / 13 — diff/lessons/inconsistencies/digest
Eight new MCP tools on top of the Phase 3 baseline. Each one uses
TimedCall so calls show up in usage.jsonl alongside search/get/list.
Phase 9 — multi-version diff:
* list_cluster(bundle_id, page_id) — cross-version peers from the
synthesized topic_cluster (same GUID across 8.1.x versions).
* diff_versions(bundle_id, page_id, against_bundle_id) — unified
diff between two bundles; uses topic_cluster first, falls back to
same page_id (which works because HVM GUIDs are stable cross-version).
* bundle_changelog(new, old) — page-level adds/removes/churn summary,
sorted by lines moved; uses _diff_churn helper.
Phase 11 — curated knowledge:
* hvm_api_lessons(topic?) — surfaces docs_mcp/api_lessons.md (manager
sizing, upgrade ordering, plugin/worker version compat, backups
setup, console keyboard, elevation, ops gotchas). topic= filters to
matching H2 sections. Marked "call proactively for HVM scripting /
integration / upgrade questions" in the docstring so the LLM uses it.
Phase 12 — doc-bug workflow:
* find_doc_inconsistencies(scope_query, ...) — read-only scan with two
checks: cross_version_drift (line-diff vs cluster peers, in-band
10-60% of file = high confidence) and redirect_chain (short body
that's mostly a "see [other page]" pointer).
* submit_doc_bug(page_url, content, ...) — env-gated OFF
(DOC_BUG_SUBMIT_ENABLED) AND requires DOC_BUG_API_URL. Refuses
cleanly with a manual-fallback message when either is unset.
Allowlist: support.hpe.com only. Mandatory operator-confirmation
pattern in the docstring; loud "do not loop" warning. The actual
HPE feedback endpoint hasn't been sniffed yet — when it is, set
both env vars and verify the payload shape against the schema.
Phase 13 — weekly digest:
* _digest_history() reads corpus/.digest/history.jsonl (built by
scrape.changelog --history-out in the CI refresh workflow).
* weekly_digest(days, version?, platform?, ...) aggregates corpus-
touching commits in the window. Post-filter totals so version /
platform filters give honest "X page changes" numbers, not the
pre-filter commit count.
* corpus_status() reports image build time, latest upstream Published
date, total bundles/pages/chunks, and the 5 most-recently-edited
bundles.
Tool count now: 11 registered (search_docs, get_page, list_versions,
list_cluster, diff_versions, bundle_changelog, weekly_digest,
corpus_status, hvm_api_lessons, find_doc_inconsistencies, submit_doc_bug).
Verified end-to-end via MCP stdio tools/list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,236 @@
|
||||
# HPE Morpheus VM Essentials — Lessons
|
||||
|
||||
Notes and gotchas about running, integrating with, and upgrading
|
||||
**HPE Morpheus VM Essentials (HVM)** that aren't obvious from the
|
||||
official docs alone. The official docs (User Manual, Release Notes,
|
||||
Deployment Guide) describe the product as designed; this file is
|
||||
what experienced operators actually learn.
|
||||
|
||||
> Treat this as living context. Update it when you (or the LLM
|
||||
> driving this MCP) discover something non-obvious that the docs
|
||||
> don't say or don't make findable. Each section is an H2 so the
|
||||
> `hvm_api_lessons(topic=...)` tool can return just the relevant
|
||||
> piece.
|
||||
|
||||
## TL;DR
|
||||
|
||||
- **HVM is KVM under the hood.** The "VM Essentials manager" is a
|
||||
control-plane VM that orchestrates KVM hosts ("HVM Hosts"). The
|
||||
manager runs the same web UI you'd see in HPE Morpheus Enterprise,
|
||||
but scoped to the HVM hypervisor capability set. Elevating
|
||||
unlocks the full Morpheus Enterprise feature set.
|
||||
- **Versioning is independent per component.** The manager,
|
||||
HVM Host agents, and the Plugin API are versioned separately. After
|
||||
every manager upgrade, re-check HVM Host agent versions — they
|
||||
don't auto-upgrade with the manager.
|
||||
- **Compatibility matrix lives in the Release Notes.** Each version's
|
||||
Release Notes lists the compatible Plugin API + Worker versions
|
||||
for that build. Use `weekly_digest` or `get_page` on the Release
|
||||
Notes when planning an upgrade.
|
||||
- **Two consumer surfaces:** the web UI on the manager (port 443),
|
||||
and the REST API (same host, `/api/` path). For automation use
|
||||
the REST API; for ad-hoc admin use the UI.
|
||||
|
||||
## Manager sizing — pick based on cluster count
|
||||
|
||||
The Deployment Guide page `VME Manager sizing` is the authoritative
|
||||
table. Quick mental model:
|
||||
|
||||
| Size   | vCPU | RAM   | Max HVM clusters (prod-realistic) |
|--------|------|-------|------------------------------------|
| Small  | 2    | 12 GB | 1 (test/PoC only)                  |
| Medium | 4    | 16 GB | 3                                  |
| Large  | 4    | 32 GB | up to 10                           |
|
||||
|
||||
**Pick Medium by default for production.** Small is genuinely
|
||||
underpowered for anything beyond a lab — UI latency degrades fast
|
||||
when concurrent provisioning runs.
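
The sizing table and the "Medium by default" rule can be sketched as a small lookup. The numbers are copied from the table in this file, not from the Deployment Guide itself, and `pick_manager_size` is a hypothetical helper, so re-check the authoritative page before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagerSize:
    name: str
    vcpu: int
    ram_gb: int
    max_clusters: int


# Values mirror the quick-reference table in this file (assumption: still current).
SIZES = (
    ManagerSize("Small", 2, 12, 1),
    ManagerSize("Medium", 4, 16, 3),
    ManagerSize("Large", 4, 32, 10),
)


def pick_manager_size(clusters: int, production: bool = True) -> ManagerSize:
    """Smallest size that fits `clusters`; production never gets Small."""
    if clusters < 1:
        raise ValueError("clusters must be at least 1")
    candidates = SIZES[1:] if production else SIZES
    for size in candidates:
        if clusters <= size.max_clusters:
            return size
    raise ValueError("more clusters than Large covers; consult the Deployment Guide")


prod_choice = pick_manager_size(1)
lab_choice = pick_manager_size(1, production=False)
```

Skipping Small for production encodes the advice above: a single-cluster production deployment still gets Medium.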
|
||||
|
||||
You CAN'T resize the manager in place by editing the VM. Back up, deploy
|
||||
a new manager at the new size, restore. See `Backing Up and Restoring
|
||||
the HPE Morpheus VM Essentials Manager` in the User Manual.
|
||||
|
||||
## Upgrade ordering — manager first, then host agents
|
||||
|
||||
HVM doesn't auto-upgrade host agents along with the manager. Every
|
||||
manager upgrade hop is two steps:
|
||||
|
||||
1. **Upgrade the manager** (e.g., 8.1.0 → 8.1.1). Standard upgrade
|
||||
process per the Deployment Guide.
|
||||
2. **Upgrade each HVM Host's agent.** Manager → HVM Host detail →
|
||||
ACTIONS menu → "Upgrade Agent". Alternative: download the agent
|
||||
upgrade script and run it on each host. Scripts are UNIQUE PER
|
||||
HOST — don't reuse one script across multiple hosts.
|
||||
|
||||
Some upgrade hops won't actually require a host-agent update (the
|
||||
manager release didn't change the agent). Check the Release Notes for
|
||||
the target version to see whether a host agent upgrade is included.
|
||||
|
||||
> Skipping the agent upgrade step is the #1 cause of cluster issues
|
||||
> right after a manager upgrade. Symptoms: hosts appear "out of
|
||||
> sync" or "agent disconnected" in the manager UI, even though the
|
||||
> hosts themselves are healthy.
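
A rough sketch of the post-upgrade check, assuming you already have each host's agent version in hand (from the UI or your own inventory) and that agent versions are plain dotted numbers. The host names and version strings below are made up for illustration; nothing here calls the manager.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.0.9' into (2, 0, 9) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def hosts_needing_agent_upgrade(host_agents: dict[str, str], target: str) -> list[str]:
    """Return host names whose agent version is older than `target`."""
    wanted = parse_version(target)
    return sorted(
        host
        for host, version in host_agents.items()
        if parse_version(version) < wanted
    )


# Hypothetical inventory: host name -> agent version seen after the manager upgrade.
inventory = {"hvm-host-01": "2.1.0", "hvm-host-02": "2.0.9", "hvm-host-03": "2.1.0"}
stale = hosts_needing_agent_upgrade(inventory, target="2.1.0")
```

Comparing integer tuples rather than strings avoids ranking `2.0.10` below `2.0.9`.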
|
||||
|
||||
## Plugin API + Worker version compatibility
|
||||
|
||||
Each HVM version pins a specific compatible Plugin API version and
|
||||
minimum HVM Host Worker version. The Release Notes for that version
|
||||
state both. Examples (verify against the live Release Notes):
|
||||
|
||||
- HVM 8.1.2 → Plugin API 1.3.4, HVM Host Worker ≥ 5.4.8
|
||||
- (Earlier versions: check the corresponding `hvm_release_notes_*`
|
||||
bundle.)
|
||||
|
||||
When writing a plugin or integration:
|
||||
|
||||
- Pin the Plugin API version explicitly in your integration's deps.
|
||||
- Test against the matrix combinations of the HVM versions you're
|
||||
promising support for.
|
||||
- Be aware that minor versions can change the Plugin API surface;
|
||||
don't assume drop-in compat across 8.1.x.
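
One way to keep the matrix honest in an integration's test suite is a small table plus a check, sketched below. The single 8.1.2 row is the example from this section and still needs verifying against the live Release Notes; `COMPAT` and `check_compat` are hypothetical names.

```python
def _version(text: str) -> tuple[int, ...]:
    """Parse a dotted version string into a comparable integer tuple."""
    return tuple(int(part) for part in text.split("."))


# HVM version -> pinned Plugin API and minimum HVM Host Worker (unverified example row).
COMPAT = {
    "8.1.2": {"plugin_api": "1.3.4", "min_worker": "5.4.8"},
}


def check_compat(hvm_version: str, plugin_api: str, worker: str) -> list[str]:
    """Return a list of human-readable problems; empty means compatible."""
    entry = COMPAT.get(hvm_version)
    if entry is None:
        raise KeyError(f"no compatibility row recorded for HVM {hvm_version}")
    problems = []
    if plugin_api != entry["plugin_api"]:
        problems.append(
            f"Plugin API {plugin_api} != pinned {entry['plugin_api']} for HVM {hvm_version}"
        )
    if _version(worker) < _version(entry["min_worker"]):
        problems.append(
            f"Worker {worker} is older than minimum {entry['min_worker']}"
        )
    return problems


ok = check_compat("8.1.2", "1.3.4", "5.4.10")
bad = check_compat("8.1.2", "1.3.3", "5.4.7")
```

The Plugin API is compared for exact equality because the text says each release pins a specific version, while the Worker is treated as a minimum.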
|
||||
|
||||
## Backups — the "global settings first" trap
|
||||
|
||||
`Initial Backups Setup` says to configure Global Backup Settings,
|
||||
Storage Providers, and Execution Schedules **before** creating any
|
||||
backup jobs. This is real — if you skip global settings and just
|
||||
create a backup, the job runs with defaults that may not match what
|
||||
you want (retention, schedule alignment, synthetic-full cadence).
|
||||
|
||||
Order of operations:
|
||||
|
||||
1. **Global Backup Settings** (Administration → Settings → Backups)
|
||||
— enable scheduled backups globally, set defaults.
|
||||
2. **Storage Providers** (Infrastructure → Storage) — configure
|
||||
where backups land.
|
||||
3. **Execution Schedules** (Library → Automation → Execute
|
||||
Scheduling) — define when jobs run.
|
||||
4. **Then** create backup jobs (Backups → Backups → ADD).
|
||||
|
||||
### Synthetic Full backups
|
||||
|
||||
Only meaningful for HVM Instance backups (not VMware/etc.). When
|
||||
enabled, the manager periodically synthesizes a "new full" from a
|
||||
chain of incrementals so restore time stays bounded as the chain
|
||||
grows. Recommended on for production HVM Instance backups.
|
||||
|
||||
### VMware-specific quirk
|
||||
|
||||
For VMware Cloud-type backups, the manager merges + consolidates
|
||||
snapshots BEFORE the OVF export. That means a long-held snapshot
|
||||
chain gets squashed into the backup — fine, but the source VM's
|
||||
snapshot list changes as a side effect of the backup run. Coordinate
|
||||
with anyone who relied on those snapshots existing.
|
||||
|
||||
## Console access — hypervisor vs guest, and keyboard layouts
|
||||
|
||||
HVM VMs expose **two** console paths:
|
||||
|
||||
- **Guest console** — what's inside the VM (the OS desktop /
|
||||
console). Subject to whatever the guest OS provides.
|
||||
- **Hypervisor console** — the KVM-level console (pre-OS, BIOS,
|
||||
bootloader). Useful for OS install, kernel-panic recovery,
|
||||
GRUB editing.
|
||||
|
||||
Per-VM keyboard layout is set on the VM, then applied to each
|
||||
session. Choose based on the keyboard the operator who'll actually
|
||||
use the console has. Mismatches show up as characters being typed
"wrong" in the console even though the keyboard is fine. The usual
cause is that the operator picked the layout once during VM
provisioning and never revisited it.
|
||||
|
||||
### 8.1.2 added Japanese 101/102 keyboard layout
|
||||
|
||||
Only in 8.1.2 and later: Japanese 101/102 hardware keyboards are
|
||||
supported. The 106/109 layout (the more common JP keyboard form
|
||||
factor) was announced as a planned future enhancement at 8.1.2.
|
||||
|
||||
The 8.1.2 console UI also gained a "keyboard tips" pop-over (the
|
||||
"?" button) using the HPE Morpheus key extension. Works for VMware,
|
||||
vCD, and HVM/KVM consoles.
|
||||
|
||||
## Elevating to HPE Morpheus Enterprise
|
||||
|
||||
HVM ships as the VM-only subset of HPE Morpheus's platform. Elevating
|
||||
upgrades the manager into the full Morpheus Enterprise SKU, which
|
||||
unlocks multi-cloud, container, automation, and policy features.
|
||||
|
||||
Elevation is an in-place upgrade — you don't redeploy the manager.
|
||||
Per the User Manual page `Elevating to HPE Morpheus Enterprise`:
|
||||
|
||||
- Acquire/install a Morpheus Enterprise license.
|
||||
- Run the elevation procedure (UI-driven from the manager).
|
||||
- HVM clusters keep working unchanged after elevation.
|
||||
|
||||
> When elevated, the docs you want to read shift to the Morpheus
|
||||
> Enterprise documentation set (sd0000{7510,7621,7732}en_us). A
|
||||
> separate `morpheus-docs` MCP exists for that surface; treat HVM
|
||||
> docs as a strict subset.
|
||||
|
||||
## REST API surface
|
||||
|
||||
**TODO — fill in as we learn it.** The Deployment Guide includes an
|
||||
"API reference" entry; the actual API is the Morpheus REST API
|
||||
(scoped to the HVM-relevant subset on a non-elevated manager). Until
|
||||
this section gets real content, the practical advice is:
|
||||
|
||||
- Hit `https://<manager>/api/` — same shape as Morpheus Enterprise's
|
||||
API. Use bearer-token auth (issue tokens from the user profile /
|
||||
api-tokens UI).
|
||||
- The Morpheus public OpenAPI / API docs document the full surface;
|
||||
HVM-only managers reject Enterprise-only endpoints with 404.
|
||||
- Plugin API and REST API are SEPARATE surfaces. Plugins extend the
|
||||
manager; the REST API is for external automation.
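
Until this section has real content, here is a minimal sketch of the request shape described above, using only the standard library. The host name and the `instances` path are placeholders, not confirmed HVM endpoints; the builder only constructs the request and sends nothing.

```python
import urllib.request


def build_api_request(manager_host: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a bearer-authenticated request under /api/."""
    url = f"https://{manager_host}/api/{path.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )


# Placeholder host, path, and token for illustration only.
req = build_api_request("manager.example.internal", "/instances", "abc")
```

To send it, pass `req` to `urllib.request.urlopen` and catch `urllib.error.HTTPError`; per the note above, a 404 on a non-elevated manager may simply mean the endpoint is Enterprise-only.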
|
||||
|
||||
If you've integrated against HVM REST in anger and have specific
|
||||
notes (auth quirks, undocumented endpoints, schema gotchas,
|
||||
pagination behavior, etc.), drop them here.
|
||||
|
||||
## Networking & deployment scenarios
|
||||
|
||||
The Deployment Guide outlines two recommended network topologies:
|
||||
|
||||
- **Scenario 1: Converged networking** (one bonded interface group
|
||||
carries management + storage + VM traffic, segmented by VLAN).
|
||||
Simpler to deploy; recommended for most environments.
|
||||
- **Scenario 2: Decoupled networking** (separate physical NICs for
|
||||
management vs storage vs VM traffic). More throughput for
|
||||
high-density VM hosts but more cabling.
|
||||
|
||||
`Network Bonding` covers the bond configuration. Active-Backup is the
|
||||
safe default; LACP requires switch coordination.
|
||||
|
||||
The **Required ports** page is the authoritative list — check it
|
||||
before firewalling anything between the manager and HVM Hosts. The
|
||||
list changes as features evolve; don't memorize.
|
||||
|
||||
## Common operational gotchas
|
||||
|
||||
- **Storage Buckets vs Backup Targets.** Same concept, different
|
||||
context. A "Storage Bucket" is a configured storage provider in
|
||||
Morpheus terms; a "Backup Target" is what the backup wizard calls
|
||||
the same thing during the create-backup flow. They're the same
|
||||
underlying object.
|
||||
- **Disabling 2FA.** Per-user, set from the user's profile page.
|
||||
There's no global "disable 2FA for everyone" toggle — for
|
||||
environments that need to back out a 2FA rollout, disable
|
||||
per-user.
|
||||
- **VM Essentials Network Pool.** Defined in Infrastructure →
|
||||
Network → Pools. The HVM provisioning flow allocates IPs from
|
||||
this pool. If the pool is exhausted, provisioning fails with a
|
||||
confusing error message in the activity log; check pool free
|
||||
count first when troubleshooting.
|
||||
|
||||
## Adding to this doc
|
||||
|
||||
Two ways:
|
||||
|
||||
1. Manually edit `docs_mcp/api_lessons.md` in this repo and commit.
|
||||
The next image build picks it up.
|
||||
2. Use `submit_doc_bug` for upstream issues, and append the
|
||||
takeaway here once the docs team responds.
|
||||
|
||||
The point of this doc is to surface the kind of context an
|
||||
experienced operator would mention in a hallway conversation but
|
||||
that doesn't quite fit anywhere in the formal product docs. Keep
|
||||
sections tight — one H2 = one topic the LLM can return on demand.
|
||||
+806
-20
@@ -18,6 +18,8 @@ stable across products — clients depend on them.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import datetime as _dt
|
||||
import difflib
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
@@ -48,6 +50,8 @@ CORPUS = ROOT / "corpus"
|
||||
CHROMA_DIR = ROOT / "chroma"
|
||||
BM25_DB = Path(os.environ.get("BM25_DB", str(ROOT / "bm25" / f"{PRODUCT_NAME}_docs.db")))
|
||||
BUNDLES_JSON = ROOT / "bundles.json"
|
||||
DIGEST_HISTORY_PATH = CORPUS / ".digest" / "history.jsonl"
|
||||
API_LESSONS_MD = Path(__file__).resolve().parent / "api_lessons.md"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Feature flags (Phase 6 / 8 / 12 enable these as you ship each phase).
|
||||
@@ -455,34 +459,816 @@ def list_versions() -> str:
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Stubs for later phases — keep the signatures in this file so refactors
|
||||
# don't lose the contracts. Implementations come per phase.
|
||||
# ---------------------------------------------------------------------------
|
||||
# ===========================================================================
|
||||
# Phase 9 — cross-version tools
|
||||
# ===========================================================================
|
||||
|
||||
# @mcp.tool() # Phase 9
|
||||
# def list_cluster(bundle_id: str, page_id: str) -> str: ...
|
||||
def _bundle_pages(bundle_id: str) -> set[str]:
    """Page IDs (= GUID-XXXX) on disk in a bundle. Mirrors rag.index's md_path.stem."""
    bd = CORPUS / bundle_id
    if not bd.is_dir():
        return set()
    return {p.stem for p in bd.glob("*.md")}
|
||||
|
||||
# @mcp.tool() # Phase 9
|
||||
# def diff_versions(bundle_id: str, page_id: str, against_bundle_id: str, context: int = 3) -> str: ...
|
||||
|
||||
# @mcp.tool() # Phase 9
|
||||
# def bundle_changelog(bundle_id_new: str, bundle_id_old: str, min_churn: int = 5, max_changed: int = 50) -> str: ...
|
||||
def _diff_churn(a: str, b: str) -> tuple[int, int]:
    """Cheap (added, removed) line counts for a pair of markdown bodies."""
    diff = difflib.unified_diff(a.splitlines(keepends=False),
                                b.splitlines(keepends=False), n=0)
    added = removed = 0
    for line in diff:
        # Skip the file headers and hunk markers; count only content lines.
        if line.startswith(("+++", "---", "@@")):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed
|
||||
|
||||
# @mcp.tool() # Phase 13
|
||||
# def weekly_digest(days: int = 7, version: str | None = None, platform: str | None = None, ...) -> str: ...
|
||||
|
||||
# @mcp.tool() # Phase 9 (or 3 — useful early)
|
||||
# def corpus_status() -> str: ...
|
||||
@mcp.tool()
|
||||
def list_cluster(
|
||||
bundle_id: Annotated[str, Field(description="Bundle slug of the source topic.")],
|
||||
page_id: Annotated[str, Field(description="Page id (GUID-XXXX) of the source topic.")],
|
||||
) -> str:
|
||||
"""List cross-version peers of a topic in the HVM docs.
|
||||
|
||||
# @mcp.tool() # Phase 11
|
||||
# def myproduct_api_lessons(topic: str | None = None) -> str: ...
|
||||
HPE re-mints the docId per product version but keeps page GUIDs stable,
|
||||
so the scrape pipeline synthesizes `topic_cluster.clustered_topics`
|
||||
from same-GUID overlap (374/376/376 pages overlap across 8.1.0/.1/.2).
|
||||
"""
|
||||
with TimedCall("list_cluster", {"bundle_id": bundle_id, "page_id": page_id}) as _call:
|
||||
out = _read_page(bundle_id, page_id)
|
||||
if out is None:
|
||||
_call.set(found=False)
|
||||
return f"Not found: {bundle_id}/{page_id}"
|
||||
_, side = out
|
||||
cluster = side.get("topic_cluster") or {}
|
||||
peers = cluster.get("clustered_topics") or []
|
||||
_call.set(hits_returned=len(peers))
|
||||
src_label = cluster.get("clustering_title") or side.get("title") or page_id
|
||||
lines = [f"# Cluster for {bundle_id}/{page_id} ({src_label})", ""]
|
||||
if not peers:
|
||||
lines.append("_No peer topics in cluster._")
|
||||
return "\n".join(lines)
|
||||
for p in peers:
|
||||
lines.append(f"- `{p['bundle_id']}/{p['page_id']}` — {p.get('clustering_title') or ''}")
|
||||
return "\n".join(lines)
|
||||
|
||||
# @mcp.tool() # Phase 12
|
||||
# def find_doc_inconsistencies(scope_query: str, ...) -> str: ...
|
||||
|
||||
# @mcp.tool() # Phase 12
|
||||
# def submit_doc_bug(page_url: str, content: str, email: str | None = None, ...) -> str: ...
|
||||
@mcp.tool()
|
||||
def diff_versions(
|
||||
bundle_id: Annotated[str, Field(description="Bundle slug of the source topic (the 'new' side).")],
|
||||
page_id: Annotated[str, Field(description="Page id of the source topic.")],
|
||||
against_bundle_id: Annotated[str, Field(description="Bundle slug to diff against. Must be in the source's cluster, or share the same page_id.")],
|
||||
context: Annotated[int, Field(description="Lines of context around each hunk.", ge=0, le=10)] = 3,
|
||||
) -> str:
|
||||
"""Unified diff of one topic between two bundles (typically two HVM versions).
|
||||
|
||||
Two matching strategies, tried in order:
|
||||
|
||||
1. `topic_cluster` peer (synthesized from same-GUID overlap by the scraper).
|
||||
2. Same `page_id` fallback (works because GUIDs are stable across HVM versions).
|
||||
"""
|
||||
with TimedCall("diff_versions", {
|
||||
"bundle_id": bundle_id, "page_id": page_id,
|
||||
"against_bundle_id": against_bundle_id, "context": context,
|
||||
}) as _call:
|
||||
src = _read_page(bundle_id, page_id)
|
||||
if src is None:
|
||||
_call.set(matched_via=None, reason="source_not_found")
|
||||
return f"Source not found: {bundle_id}/{page_id}"
|
||||
src_md, side = src
|
||||
cluster = side.get("topic_cluster") or {}
|
||||
peers = {p["bundle_id"]: p for p in (cluster.get("clustered_topics") or [])}
|
||||
|
||||
peer = peers.get(against_bundle_id)
|
||||
if peer is not None:
|
||||
peer_page_id = peer["page_id"]
|
||||
matched_via = "topic_cluster"
|
||||
elif _read_page(against_bundle_id, page_id) is not None:
|
||||
peer_page_id = page_id
|
||||
matched_via = "filename"
|
||||
else:
|
||||
_call.set(matched_via=None, reason="no_peer")
|
||||
valid = list(peers) or ["(no peers)"]
|
||||
return (f"No match for {bundle_id}/{page_id} in {against_bundle_id}.\n"
|
||||
f"- No cluster peer. Available peers: {valid}\n"
|
||||
f"- No page {page_id!r} in {against_bundle_id} either.")
|
||||
|
||||
_call.set(matched_via=matched_via)
|
||||
peer_data = _read_page(against_bundle_id, peer_page_id)
|
||||
if peer_data is None:
|
||||
return f"Peer not found in corpus: {against_bundle_id}/{peer_page_id}"
|
||||
peer_md, _ = peer_data
|
||||
diff = difflib.unified_diff(peer_md.splitlines(keepends=True),
|
||||
src_md.splitlines(keepends=True),
|
||||
fromfile=f"{against_bundle_id}/{peer_page_id}",
|
||||
tofile=f"{bundle_id}/{page_id}",
|
||||
n=context)
|
||||
body = "".join(diff)
|
||||
header = f"_matched via {matched_via}_\n\n"
|
||||
if not body.strip():
|
||||
return header + f"No differences between {bundle_id}/{page_id} and {against_bundle_id}/{peer_page_id}."
|
||||
return header + "```diff\n" + body.rstrip("\n") + "\n```"
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def bundle_changelog(
|
||||
bundle_id_new: Annotated[str, Field(description="New-side bundle slug, e.g. 'hvm_user_manual_8_1_2'.")],
|
||||
bundle_id_old: Annotated[str, Field(description="Old-side bundle slug, e.g. 'hvm_user_manual_8_1_1'.")],
|
||||
min_churn: Annotated[int, Field(description="Min (added + removed) lines to flag a page as changed.", ge=1, le=1000)] = 5,
|
||||
max_changed: Annotated[int, Field(description="Max changed pages to list (sorted by churn desc).", ge=1, le=500)] = 50,
|
||||
) -> str:
|
||||
"""High-level diff between two HVM bundles.
|
||||
|
||||
Lists pages added, removed, and changed between an old bundle and a
|
||||
new one. Match is by page_id (which is the stable GUID — same GUID
|
||||
across versions = same topic). Use after `list_versions` to discover
|
||||
valid bundle slugs.
|
||||
"""
|
||||
with TimedCall("bundle_changelog", {
|
||||
"bundle_id_new": bundle_id_new, "bundle_id_old": bundle_id_old,
|
||||
"min_churn": min_churn, "max_changed": max_changed,
|
||||
}) as _call:
|
||||
new_pages = _bundle_pages(bundle_id_new)
|
||||
old_pages = _bundle_pages(bundle_id_old)
|
||||
if not new_pages and not old_pages:
|
||||
_call.set(reason="both_empty")
|
||||
return f"Neither bundle has pages on disk: {bundle_id_new}, {bundle_id_old}"
|
||||
if not new_pages:
|
||||
return f"Bundle not found or empty: {bundle_id_new}"
|
||||
if not old_pages:
|
||||
return f"Bundle not found or empty: {bundle_id_old}"
|
||||
|
||||
added = sorted(new_pages - old_pages)
|
||||
removed = sorted(old_pages - new_pages)
|
||||
common = sorted(new_pages & old_pages)
|
||||
|
||||
changed: list[tuple[str, int, int]] = []
|
||||
for pid in common:
|
||||
n = _read_page(bundle_id_new, pid)
|
||||
o = _read_page(bundle_id_old, pid)
|
||||
if n is None or o is None:
|
||||
continue
|
||||
a_lines, r_lines = _diff_churn(o[0], n[0])
|
||||
if a_lines + r_lines >= min_churn:
|
||||
changed.append((pid, a_lines, r_lines))
|
||||
changed.sort(key=lambda t: -(t[1] + t[2]))
|
||||
_call.set(added=len(added), removed=len(removed),
|
||||
changed=len(changed), unchanged=len(common) - len(changed))
|
||||
|
||||
lines = [
|
||||
f"# Bundle changelog: {bundle_id_new} vs {bundle_id_old}", "",
|
||||
f"- pages in new: **{len(new_pages)}**",
|
||||
f"- pages in old: **{len(old_pages)}**",
|
||||
f"- common: **{len(common)}**",
|
||||
f"- **added** (in new only): {len(added)}",
|
||||
f"- **removed** (in old only): {len(removed)}",
|
||||
f"- **changed** (≥{min_churn} lines): {len(changed)} of {len(common)} common",
|
||||
f"- unchanged: {len(common) - len(changed)}", "",
|
||||
]
|
||||
if added:
|
||||
lines += [f"## Added pages ({len(added)})", *(f"- `{p}`" for p in added), ""]
|
||||
if removed:
|
||||
lines += [f"## Removed pages ({len(removed)})", *(f"- `{p}`" for p in removed), ""]
|
||||
if changed:
|
||||
shown = changed[:max_changed]
|
||||
lines += [
|
||||
f"## Changed pages — top {len(shown)} of {len(changed)} by churn", "",
|
||||
"| page | +lines | -lines | total |", "|---|---|---|---|",
|
||||
]
|
||||
for p, a, r in shown:
|
||||
lines.append(f"| `{p}` | +{a} | -{r} | {a + r} |")
|
||||
if len(changed) > max_changed:
|
||||
lines.append(f"\n_({len(changed) - max_changed} more changed pages omitted; raise `max_changed` to see them.)_")
|
||||
lines.append("\nInspect a specific page: `diff_versions(bundle_id_new, page_id, bundle_id_old)`.")
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Phase 13 — weekly digest from corpus/.digest/history.jsonl (built in CI)
|
||||
# ===========================================================================
|
||||
|
||||
_digest_cache: list[dict] | None = None
|
||||
|
||||
|
||||
def _digest_history() -> list[dict]:
    """Lazy load of the digest history JSONL written by scrape.changelog at CI time."""
    global _digest_cache
    if _digest_cache is not None:
        return _digest_cache
    if not DIGEST_HISTORY_PATH.exists():
        log.warning("digest history not found at %s — weekly_digest will return empty.",
                    DIGEST_HISTORY_PATH)
        _digest_cache = []
        return _digest_cache
    records: list[dict] = []
    try:
        # Read as UTF-8 explicitly so parsing doesn't depend on the container locale.
        with open(DIGEST_HISTORY_PATH, encoding="utf-8") as fh:
            for ln, line in enumerate(fh, start=1):
                line = line.strip()
                if not line:
                    continue
                try:
                    records.append(json.loads(line))
                except json.JSONDecodeError as e:
                    log.warning("digest history: skipping malformed line %d: %s", ln, e)
    except OSError as e:
        log.warning("digest history read failed: %s", e)
    _digest_cache = records
    return _digest_cache
|
||||
|
||||
|
||||
@mcp.tool()
|
||||
def weekly_digest(
|
||||
days: Annotated[int, Field(description="How far back to summarize. 7=last week, 30=last month. Horizon ~120 days.", ge=1, le=120)] = 7,
|
||||
version: Annotated[str | None, Field(description="OPTIONAL version filter, e.g. '8.1.2'.")] = None,
|
||||
platform: Annotated[str | None, Field(description="OPTIONAL platform filter (HVM bundles don't set platform — leave None).")] = None,
|
||||
max_bundles: Annotated[int, Field(description="Cap on per-bundle detail blocks.", ge=1, le=100)] = 25,
|
||||
max_pages_per_bundle: Annotated[int, Field(description="Pages to list per bundle.", ge=1, le=50)] = 10,
|
||||
) -> str:
|
||||
"""Summarize what changed in the HVM docs over the past N days.
|
||||
|
||||
Call when the user asks *"what's new in HVM docs this week?"*,
|
||||
*"what changed in 8.1.2?"*, or *"is there anything new since the
|
||||
last release?"*. Reads the pre-baked digest history JSONL written
|
||||
by CI from git log over corpus-touching commits.
|
||||
"""
|
||||
with TimedCall("weekly_digest", {
|
||||
"days": days, "version": version, "platform": platform,
|
||||
"max_bundles": max_bundles, "max_pages_per_bundle": max_pages_per_bundle,
|
||||
}) as _call:
|
||||
records = _digest_history()
|
||||
if not records:
|
||||
_call.set(returned="empty_no_history", record_count=0)
|
||||
return ("# Weekly digest\n\n"
|
||||
f"_No digest history on this image. `{DIGEST_HISTORY_PATH}` is "
|
||||
"missing — it's populated by the weekly refresh workflow._")
|
||||
|
||||
now = _dt.datetime.now(_dt.timezone.utc)
|
||||
cutoff = now - _dt.timedelta(days=days)
|
||||
filtered: list[dict] = []
|
||||
for r in records:
|
||||
try:
|
||||
ts = _dt.datetime.fromisoformat(r["timestamp"])
|
||||
except (KeyError, ValueError):
|
||||
continue
|
||||
if ts.tzinfo is None:
|
||||
ts = ts.replace(tzinfo=_dt.timezone.utc)
|
||||
if ts >= cutoff:
|
||||
filtered.append({**r, "_ts": ts})
|
||||
|
||||
if not filtered:
|
||||
_call.set(returned="empty_window", record_count=0)
|
||||
covers = ""
|
||||
if records:
|
||||
oldest = min(records, key=lambda r: r.get("timestamp", ""))
|
||||
newest = max(records, key=lambda r: r.get("timestamp", ""))
|
||||
covers = (f"\n\n_(History on this image covers "
|
||||
f"{oldest.get('timestamp','?')[:10]} through "
|
||||
f"{newest.get('timestamp','?')[:10]}.)_")
|
||||
return (f"# Weekly digest — last {days} day{'s' if days != 1 else ''}\n\n"
|
||||
f"_No corpus changes recorded in this window._" + covers)
|
||||
|
||||
cat = _bundles()
|
||||
def _passes(bid: str) -> bool:
|
||||
if not (version or platform):
|
||||
return True
|
||||
b = cat.get(bid)
|
||||
if b is None:
|
||||
return False
|
||||
if version and b.get("version") != version:
|
||||
return False
|
||||
if platform and b.get("platform") != platform:
|
||||
return False
|
||||
return True
|
||||
|
||||
filtered.sort(key=lambda r: r["_ts"], reverse=True)
|
||||
per_bundle_pages: dict[str, list[str]] = {}
|
||||
new_bundles_set: set[str] = set()
|
||||
drift_bundles_set: set[str] = set()
|
||||
commits_in_window = 0
|
||||
for r in filtered:
|
||||
commits_in_window += 1
|
||||
for bid in r.get("new_bundles", []):
|
||||
if _passes(bid):
|
||||
new_bundles_set.add(bid)
|
||||
for bid in r.get("json_only_bundles", []):
|
||||
if _passes(bid):
|
||||
drift_bundles_set.add(bid)
|
||||
for bid, pages in (r.get("content_bundles") or {}).items():
|
||||
if not _passes(bid):
|
||||
continue
|
||||
seen = set(per_bundle_pages.get(bid, []))
|
||||
fresh = [p for p in pages if p not in seen]
|
||||
if fresh:
|
||||
per_bundle_pages.setdefault(bid, []).extend(fresh)
|
||||
|
||||
total_md = sum(len(p) for p in per_bundle_pages.values())
|
||||
bundles_ranked = sorted(per_bundle_pages.items(), key=lambda kv: (-len(kv[1]), kv[0]))
|
||||
_call.set(returned="ok", record_count=commits_in_window,
|
||||
bundles_changed=len(per_bundle_pages),
|
||||
new_bundles=len(new_bundles_set))
|
||||
|
||||
ts_oldest = filtered[-1]["_ts"].date().isoformat()
|
||||
ts_newest = filtered[0]["_ts"].date().isoformat()
|
||||
lines = [
|
||||
f"# HVM docs digest — last {days} day{'s' if days != 1 else ''}", "",
|
||||
f"_Window: {ts_oldest} → {ts_newest}_ • _Filters: version={version}, platform={platform}_", "",
|
||||
"## Headline", "",
|
||||
f"- **{total_md}** page change(s) across **{len(per_bundle_pages)}** bundle(s)",
|
||||
f"- **{commits_in_window}** corpus-touching commit(s) in this window",
|
||||
f"- **{len(new_bundles_set)}** bundle(s) newly added",
|
||||
f"- **{len(drift_bundles_set)}** bundle(s) with sidecar-only drift", "",
|
||||
]
|
||||
if not per_bundle_pages and not new_bundles_set:
|
||||
lines.append("_No bundle changes matched the filter in this window._")
|
||||
return "\n".join(lines)
|
||||
if new_bundles_set:
|
||||
lines += ["## New bundles added", ""]
|
||||
for bid in sorted(new_bundles_set):
|
||||
b = cat.get(bid, {})
|
||||
t = b.get("title") or ""
|
||||
tag = f" *({b.get('version') or '?'})*" if b.get("version") else ""
|
||||
lines.append(f"- `{bid}`{tag} {t}")
|
||||
lines.append("")
|
||||
if bundles_ranked:
|
||||
top = bundles_ranked[:max_bundles]
|
||||
remainder = len(bundles_ranked) - len(top)
|
||||
lines += [f"## Bundles with content changes — top {len(top)}" +
|
||||
(f" of {len(bundles_ranked)}" if remainder else ""), ""]
|
||||
for bid, pages in top:
|
||||
b = cat.get(bid, {})
|
||||
tag = f" *({b.get('version') or ''})*" if b.get("version") else ""
|
||||
lines.append(f"### `{bid}`{tag}")
|
||||
if b.get("title"):
|
||||
lines.append(f"_{b['title']}_")
|
||||
lines.append(f"{len(pages)} page change(s).")
|
||||
for p in pages[:max_pages_per_bundle]:
|
||||
lines.append(f"- `{p}`")
|
||||
if len(pages) > max_pages_per_bundle:
|
||||
lines.append(f" _(+{len(pages) - max_pages_per_bundle} more)_")
|
||||
lines.append("")
|
||||
lines.append("\nInspect a specific page: `get_page(bundle_id, page_id)` or `diff_versions(...)`.")
|
||||
return "\n".join(lines)


@mcp.tool()
def corpus_status() -> str:
    """Freshness + size of the knowledge base.

    Combines: (1) image build time (bundles.json mtime in container),
    (2) most-recent upstream Published date across bundles, (3) total
    bundles / pages / Chroma chunks.
    """
    lines: list[str] = ["# Corpus status", ""]
    try:
        ts = _dt.datetime.fromtimestamp(BUNDLES_JSON.stat().st_mtime, tz=_dt.timezone.utc).isoformat(timespec="seconds")
        lines.append(f"- This image built at: **{ts}**")
    except OSError:
        lines.append("- This image build time: _unknown_")

    cat = _bundles()
    latest_pub: str | None = None
    per_bundle: list[tuple[str, str]] = []
    for slug, b in cat.items():
        pub = (b.get("dates") or {}).get("Published")
        if pub:
            if latest_pub is None or pub > latest_pub:
                latest_pub = pub
            per_bundle.append((slug, pub))
    if latest_pub:
        lines.append(f"- Most-recent upstream Published date (any bundle): **{latest_pub}**")
    lines.append("")
    try:
        chunk_count = _collection().count()
    except Exception:
        chunk_count = -1
    pages_count = sum(1 for d in (CORPUS.iterdir() if CORPUS.exists() else [])
                      if d.is_dir() for _ in d.glob("*.md"))
    lines += [
        f"- Bundles indexed: **{len(cat)}**",
        f"- Pages in corpus: **{pages_count}**",
        f"- Chunks in Chroma: **{chunk_count}**" if chunk_count >= 0 else "- Chunks in Chroma: _(query failed)_",
        "",
    ]
    if per_bundle:
        per_bundle.sort(key=lambda kv: kv[1], reverse=True)
        lines.append("## Most-recently-edited bundles (by HPE)")
        for slug, when in per_bundle[:5]:
            b = cat.get(slug, {})
            lines.append(f"- `{slug}` — {b.get('title') or slug} (published {when})")
    return "\n".join(lines)


# ===========================================================================
# Phase 11 — curated knowledge: hvm_api_lessons
# ===========================================================================

def _split_lessons_sections(md: str) -> list[tuple[str, str]]:
    """Split a markdown doc into (title, body) pairs at H2 headings.

    Text before the first H2 is titled "(prelude)". Each body keeps its own
    heading line, so joining all bodies reproduces the input.
    """
    sections: list[tuple[str, str]] = []
    current_title: str | None = None
    current_lines: list[str] = []
    for line in md.splitlines(keepends=True):
        m = re.match(r"^##\s+(.+?)\s*$", line)
        if m:
            if current_lines:
                sections.append((current_title or "(prelude)", "".join(current_lines)))
            current_title = m.group(1).strip()
            current_lines = [line]
        else:
            current_lines.append(line)
    if current_lines:
        sections.append((current_title or "(prelude)", "".join(current_lines)))
    return sections
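
A small self-contained sketch of how the splitter treats the prelude and H2 boundaries. The function is a standalone copy of `_split_lessons_sections` so the block runs on its own; the sample document and its headings are placeholders, not content from `api_lessons.md`.

```python
import re


def split_lessons_sections(md):
    """Standalone copy of the H2 splitter, for demonstration only."""
    sections = []
    current_title = None
    current_lines = []
    for line in md.splitlines(keepends=True):
        # Only "## Heading" lines start a new section; "###" does not match.
        m = re.match(r"^##\s+(.+?)\s*$", line)
        if m:
            if current_lines:
                sections.append((current_title or "(prelude)", "".join(current_lines)))
            current_title = m.group(1).strip()
            current_lines = [line]
        else:
            current_lines.append(line)
    if current_lines:
        sections.append((current_title or "(prelude)", "".join(current_lines)))
    return sections


# Hypothetical sample document; headings and text are placeholders.
SAMPLE = (
    "# Lessons\n"
    "Intro text.\n"
    "## First topic\n"
    "Body of the first topic.\n"
    "### A sub-heading stays inside its H2 section\n"
    "## Second topic\n"
    "Body of the second topic.\n"
)

sections = split_lessons_sections(SAMPLE)
titles = [title for title, _ in sections]
```

Because every body keeps its heading line, a `topic=` filter can return matched bodies verbatim without re-adding headings.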


@mcp.tool()
def hvm_api_lessons(
    topic: Annotated[str | None, Field(description="Optional keyword filter — returns only H2 sections whose heading or body contains this substring. Examples: 'manager', 'agent upgrade', 'plugin api', 'worker', 'console keyboard'. Omit for the full doc.")] = None,
) -> str:
    """Curated lessons about HPE Morpheus VM Essentials — non-obvious bits
    that aren't in the official docs and gotchas learned from real
    integration / operation work.

    **Call this proactively whenever the user asks you to:**
    - install, upgrade, or troubleshoot an HVM cluster or manager
    - integrate with HVM (REST API, automation, scripting)
    - upgrade across versions (8.1.0 → 8.1.1 → 8.1.2)
    - work with HVM Host agents
    - configure backups, networking, or storage
    - elevate to HPE Morpheus Enterprise

    With ``topic=...`` you'll get just the relevant H2 section(s). With
    no argument you'll get the full doc — usually the right call when
    starting on a new task since the TL;DR at the top primes the rest.
    """
    with TimedCall("hvm_api_lessons", {"topic": topic}) as _call:
        try:
            # Explicit encoding so the result doesn't depend on the container locale.
            md = API_LESSONS_MD.read_text(encoding="utf-8")
        except OSError as e:
            _call.set(error=str(e))
            return f"Lessons doc not present at {API_LESSONS_MD}: {e}"
        if not topic:
            _call.set(returned="full")
            return md
        needle = topic.lower()
        sections = _split_lessons_sections(md)
        kept: list[str] = []
        for title, body in sections:
            if needle in title.lower() or needle in body.lower():
                kept.append(body)
        if not kept:
            _call.set(returned="empty", topic_matched=False)
            return (f"_No sections matched topic={topic!r}. Returning the full document._\n\n" + md)
        _call.set(returned="filtered", sections_matched=len(kept))
        return f"_Filtered to {len(kept)} section(s) matching topic={topic!r}._\n\n" + "".join(kept)


# ===========================================================================
# Phase 12 — find_doc_inconsistencies + submit_doc_bug
# ===========================================================================

_REDIRECT_PHRASE_RE = re.compile(
    r"\bsee\s+(?:the\s+)?[A-Z`\[][^.!?\n]{2,80}(?:for|topic|section|chapter|guide)\b",
    re.IGNORECASE,
)
_VERSION_SUFFIX_RE = re.compile(r"_(\d+_\d+_\d+)$")


def _bundle_family(bundle_id: str) -> str:
    """Strip a trailing `_X_Y_Z` version suffix from an HVM bundle slug.

    `hvm_user_manual_8_1_0` → `hvm_user_manual`
    `hvm_deployment_guide` → `hvm_deployment_guide` (no version)

    Same-family bundles are version peers; cross-family pairs (User Manual
    vs Release Notes) are intentionally different content.
    """
    return _VERSION_SUFFIX_RE.sub("", bundle_id)
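
As a quick illustration of the family rule, the sketch below groups bundle slugs into version-peer sets. It reuses the same regex under public names so it runs standalone; `group_by_family` is a hypothetical helper, and `hvm_user_manual_8_1_2` is an assumed slug chosen only to show two peers in one family.

```python
import re
from collections import defaultdict

VERSION_SUFFIX_RE = re.compile(r"_(\d+_\d+_\d+)$")


def bundle_family(bundle_id):
    """Drop a trailing _X_Y_Z version suffix, if present."""
    return VERSION_SUFFIX_RE.sub("", bundle_id)


def group_by_family(bundle_ids):
    """Hypothetical helper: map each family to its member slugs, in input order."""
    families = defaultdict(list)
    for bid in bundle_ids:
        families[bundle_family(bid)].append(bid)
    return dict(families)


slugs = ["hvm_user_manual_8_1_0", "hvm_user_manual_8_1_2", "hvm_deployment_guide"]
peers = group_by_family(slugs)
```

Only slugs that land in the same family are treated as version peers; an unversioned slug forms a family of its own.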


def _check_cross_version_drift(bundle_id: str, page_id: str, md: str, meta: dict) -> dict | None:
    """Compare a page against its same-family cluster peers.

    A peer whose churn is 10-60% of the longer file is a "high" confidence
    finding; any other peer with at least 5 churned lines is "low". The
    lowest-churn peer in the best available band is reported.
    """
    cluster = (meta.get("topic_cluster") or {}).get("clustered_topics") or []
    if not cluster:
        return None
    src_family = _bundle_family(bundle_id)
    src_lines = max(1, len(md.splitlines()))
    in_band: list[tuple[int, str, str, int]] = []
    out_band: list[tuple[int, str, str, int]] = []
    for peer in cluster:
        peer_bid = peer.get("bundle_id")
        peer_pid = peer.get("page_id")
        if not (peer_bid and peer_pid) or peer_bid == bundle_id:
            continue
        if _bundle_family(peer_bid) != src_family:
            continue
        peer_data = _read_page(peer_bid, peer_pid)
        if peer_data is None:
            continue
        peer_md, _ = peer_data
        added, removed = _diff_churn(md, peer_md)
        churn = added + removed
        peer_lines = max(1, len(peer_md.splitlines()))
        denom = max(src_lines, peer_lines)
        pct = (churn * 100) // denom if denom else 0
        tup = (churn, peer_bid, peer_pid, peer_lines)
        if 10 <= pct <= 60:
            in_band.append(tup)
        elif churn >= 5:
            out_band.append(tup)
    if in_band:
        chosen = min(in_band, key=lambda t: t[0])
        confidence = "high"
    elif out_band:
        chosen = min(out_band, key=lambda t: t[0])
        confidence = "low"
    else:
        return None
    churn, peer_bid, peer_pid, peer_lines = chosen
    denom = max(src_lines, peer_lines)
    churn_pct = min(100, (churn * 100) // denom) if denom else 0
    return {
        "check": "cross_version_drift",
        "bundle_id": bundle_id, "page_id": page_id,
        "page_url": _source_url(bundle_id, page_id),
        "peer_bundle_id": peer_bid, "peer_page_id": peer_pid,
        "churn_lines": churn, "churn_pct_of_file": churn_pct,
        "confidence": confidence,
        "summary": (f"Drifts {churn} lines (~{churn_pct}% of file) vs peer "
                    f"{peer_bid}/{peer_pid}. Inspect with "
                    f"diff_versions({bundle_id!r}, {page_id!r}, {peer_bid!r})."),
    }
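
The banding rule can be sketched in isolation. The `_diff_churn` helper is not shown in this part of the diff, so `diff_churn` below is a hypothetical stand-in built on `difflib.SequenceMatcher`; the real helper may count lines differently. `classify_drift` is likewise an illustrative name, not a function in this module.

```python
import difflib


def diff_churn(a, b):
    """Hypothetical stand-in for _diff_churn: (lines added, lines removed) from a to b."""
    added = removed = 0
    sm = difflib.SequenceMatcher(None, a.splitlines(), b.splitlines(), autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


def classify_drift(src_md, peer_md):
    """Return "high", "low", or None using the 10-60% band and the 5-line floor."""
    added, removed = diff_churn(src_md, peer_md)
    churn = added + removed
    denom = max(1, len(src_md.splitlines()), len(peer_md.splitlines()))
    pct = (churn * 100) // denom
    if 10 <= pct <= 60:
        return "high"
    if churn >= 5:
        return "low"
    return None


base_lines = [f"line {i}" for i in range(20)]
edited = list(base_lines)
for idx in (2, 9, 15):
    edited[idx] = f"changed {idx}"          # 3 replaced lines -> churn 6, 30% of 20
rewritten = [f"other {i}" for i in range(20)]  # nothing in common -> churn 40

base = "\n".join(base_lines)
moderate = classify_drift(base, "\n".join(edited))
wholesale = classify_drift(base, "\n".join(rewritten))
unchanged = classify_drift(base, base)
```

The band reflects the idea stated in the commit message: moderate drift between version peers is the most likely sign of an unintended divergence, while a wholesale rewrite is more likely deliberate and only earns low confidence.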


def _check_redirect_chain(bundle_id: str, page_id: str, md: str, meta: dict) -> dict | None:
    """Flag short pages whose body is mostly a "see ... for ..." pointer.

    Pages with code fences or more than 600 characters of plain text are
    skipped; the first matching redirect phrase is returned as evidence.
    """
    body = re.sub(r"^#[^\n]*\n", "", md, count=1).strip()
    if "```" in body:
        return None
    text_only = re.sub(r"[`\[\]()*_>#-]", "", body)
    text_only = re.sub(r"\s+", " ", text_only).strip()
    if len(text_only) > 600:
        return None
    redirect_matches = list(_REDIRECT_PHRASE_RE.finditer(body))
    if not redirect_matches:
        return None
    evidence = redirect_matches[0].group(0).strip()
    return {
        "check": "redirect_chain",
        "bundle_id": bundle_id, "page_id": page_id,
        "page_url": _source_url(bundle_id, page_id),
        "body_chars": len(text_only),
        "redirect_phrase": evidence[:200],
        "confidence": "medium",
        "summary": (f"Page is {len(text_only)} chars of body text with a "
                    f'"see ... for ..." redirect: "{evidence[:120]}". '
                    "Inspect with get_page to confirm."),
    }
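
To see what the redirect heuristic accepts and rejects, here is a minimal standalone sketch. It reuses the same regex and thresholds; `looks_like_redirect_stub` is a hypothetical name, and both sample pages are invented for illustration.

```python
import re

REDIRECT_PHRASE_RE = re.compile(
    r"\bsee\s+(?:the\s+)?[A-Z`\[][^.!?\n]{2,80}(?:for|topic|section|chapter|guide)\b",
    re.IGNORECASE,
)
FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fence


def looks_like_redirect_stub(md, max_chars=600):
    """Hypothetical boolean version of the redirect_chain check."""
    body = re.sub(r"^#[^\n]*\n", "", md, count=1).strip()  # drop the title line
    if FENCE in body:
        return False  # pages with code samples are not stubs
    text_only = re.sub(r"[`\[\]()*_>#-]", "", body)
    text_only = re.sub(r"\s+", " ", text_only).strip()
    if len(text_only) > max_chars:
        return False
    return REDIRECT_PHRASE_RE.search(body) is not None


# Invented sample pages.
stub_page = "# Backups\n\nSee the Backup Guide for details.\n"
long_page = "# Backups\n\n" + "word " * 200 + "See the Backup Guide for details.\n"

phrase = REDIRECT_PHRASE_RE.search(stub_page).group(0)
stub_flagged = looks_like_redirect_stub(stub_page)
long_flagged = looks_like_redirect_stub(long_page)
```

The length cap is what keeps ordinary pages that happen to contain a cross-reference out of the findings.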


@mcp.tool()
def find_doc_inconsistencies(
    scope_query: Annotated[str, Field(description="Natural-language scope describing what slice to scan. Used as a search to pick candidate pages. Examples: 'backup configuration', 'HVM cluster setup', 'VME manager installation'.")],
    version: Annotated[str | None, Field(description="OPTIONAL version filter — e.g. '8.1.2'.")] = None,
    platform: Annotated[str | None, Field(description="OPTIONAL platform filter (HVM bundles don't set platform — usually leave None).")] = None,
    bundle_id: Annotated[str | None, Field(description="OPTIONAL specific bundle slug to restrict scanning to.")] = None,
    max_pages: Annotated[int, Field(description="How many candidate pages to inspect.", ge=5, le=200)] = 30,
    checks: Annotated[list[str] | None, Field(description="Which checks to run. Available: 'cross_version_drift', 'redirect_chain'. Defaults to all.")] = None,
) -> str:
    """Scan a scoped set of HVM docs pages for likely documentation bugs.

    Surfaces concrete candidates for human review — NOT a stream of
    bugs to auto-submit. Workflow:

    1. Run this against a focused scope.
    2. Review each finding; many will be false positives.
    3. For real bugs, drill in with `get_page` / `diff_versions`.
    4. Draft a bug report; show the operator; ask explicitly.
    5. Only then call `submit_doc_bug`. One bug = one confirmation.

    **Do NOT loop submissions.** Even on "submit them all", confirm each
    one individually. HPE's docs queue is a shared resource.
    """
    with TimedCall("find_doc_inconsistencies", {
        "scope_query": scope_query, "version": version, "platform": platform,
        "bundle_id": bundle_id, "max_pages": max_pages, "checks": checks,
    }) as _call:
        all_checks = {"cross_version_drift", "redirect_chain"}
        requested = all_checks if checks is None else {c for c in checks if c in all_checks}
        if not requested:
            _call.set(error="no_valid_checks")
            return f"No valid checks requested. Available: {sorted(all_checks)}."
        try:
            col = _collection()
        except Exception as e:
            _call.set(error=f"collection: {e}")
            return f"Couldn't open Chroma collection: {e}"
        where = _build_where(version, platform, bundle_id)
        try:
            # Over-fetch chunks (3x) because several chunks can map to one page.
            res = col.query(query_texts=[scope_query], n_results=max_pages * 3,
                            where=where, include=["metadatas"])
        except Exception as e:
            _call.set(error=f"query: {e}")
            return f"Scope query failed: {e}"
        seen: set[tuple[str, str]] = set()
        candidates: list[tuple[str, str]] = []
        for meta in (res.get("metadatas") or [[]])[0]:
            key = (meta.get("bundle_id") or "", meta.get("page_id") or "")
            if not key[0] or not key[1] or key in seen:
                continue
            seen.add(key)
            candidates.append(key)
            if len(candidates) >= max_pages:
                break
        _call.set(pages_inspected=len(candidates), checks=sorted(requested))
        if not candidates:
            return f"No pages matched scope `{scope_query}`."
        findings: dict[str, list[dict]] = {c: [] for c in requested}
        for bid, pid in candidates:
            data = _read_page(bid, pid)
            if data is None:
                continue
            md, meta = data
            if "cross_version_drift" in requested:
                f = _check_cross_version_drift(bid, pid, md, meta)
                if f:
                    findings["cross_version_drift"].append(f)
            if "redirect_chain" in requested:
                f = _check_redirect_chain(bid, pid, md, meta)
                if f:
                    findings["redirect_chain"].append(f)
        # Sort only the checks that were requested, so findings_by_check
        # doesn't report zero counts for checks that never ran.
        if "cross_version_drift" in findings:
            findings["cross_version_drift"].sort(
                key=lambda f: (-(1 if f["confidence"] == "high" else 0), -f["churn_lines"]))
        if "redirect_chain" in findings:
            findings["redirect_chain"].sort(key=lambda f: f["body_chars"])
        total = sum(len(v) for v in findings.values())
        _call.set(findings_total=total,
                  findings_by_check={k: len(v) for k, v in findings.items()})
        lines = [
            f"# Doc inconsistency scan — {len(candidates)} pages inspected", "",
            f"_Scope_: `{scope_query}` • _Filters_: version={version}, platform={platform}, bundle_id={bundle_id} • _Checks_: {sorted(requested)}", "",
            f"**{total} candidate finding{'' if total == 1 else 's'}.** Review each individually. "
            "For real bugs, follow up with `get_page` / `diff_versions`, draft the report, "
            "show the operator, and only call `submit_doc_bug` after explicit confirmation.", "",
        ]
        if not total:
            lines.append("_No findings in this scope._")
            return "\n".join(lines)
        for check in sorted(requested):
            items = findings.get(check, [])
            lines += [f"## {check} ({len(items)})", ""]
            if not items:
                lines.append("_No findings for this check._\n")
                continue
            for i, f in enumerate(items, 1):
                lines.append(f"### {i}. `{f['bundle_id']}/{f['page_id']}` *({f['confidence']} confidence)*")
                lines.append(f"- URL: {f['page_url']}")
                lines.append(f"- {f['summary']}")
                if check == "cross_version_drift":
                    lines.append(f"- Peer: `{f['peer_bundle_id']}/{f['peer_page_id']}` • churn: {f['churn_lines']} lines ({f['churn_pct_of_file']}% of file)")
                elif check == "redirect_chain":
                    lines.append(f"- Body length: {f['body_chars']} chars • Phrase: *\"{f['redirect_phrase']}\"*")
                lines.append("")
        lines += ["---",
                  "_Reminder: `submit_doc_bug` has a real side effect. Draft → show → confirm → submit, one at a time. Do not loop._"]
        return "\n".join(lines)


# --- submit_doc_bug ----------------------------------------------------------
# HPE Support DocPortal's "Was this helpful?" widget POSTs to an endpoint
# we haven't sniffed yet. Until DOC_BUG_API_URL is set AND
# DOC_BUG_SUBMIT_ENABLED=true, this tool refuses submission and tells the
# operator to paste manually. When you sniff the endpoint, set both env
# vars and verify the payload shape against the schema below.

_DOC_BUG_ALLOWED_HOSTS = {"support.hpe.com"}
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@mcp.tool()
def submit_doc_bug(
    page_url: Annotated[str, Field(description="Full URL of the support.hpe.com page the bug is about. Must be a support.hpe.com URL.")],
    content: Annotated[str, Field(description="Body of the bug report. Be specific: what the page says, what's wrong, what it should say. Cite exact passages. The docs team reads it verbatim.")],
    email: Annotated[str | None, Field(description="OPTIONAL submitter email for follow-up. Omit if anonymous.")] = None,
    rating: Annotated[int | None, Field(description="OPTIONAL star rating 1-5 (1-2 for serious bugs, 3 unclear, 4-5 only on explicit request).")] = None,
    like: Annotated[bool | None, Field(description="OPTIONAL thumbs-up/down. False for bugs, True for positive feedback.")] = None,
) -> str:
    """Submit a documentation bug to HPE's docs feedback channel.

    **⚠️ THIS TOOL HAS A REAL SIDE EFFECT (when enabled). It POSTs to
    HPE's docs feedback endpoint and the submission lands in their queue.**

    **MANDATORY operator-confirmation workflow:**

    1. Draft the bug content yourself. Show the operator the exact text
       you intend to submit + the page URL + any rating/email fields.
    2. Ask explicitly: *"Submit this bug? (yes/no)"*
    3. Only call submit_doc_bug AFTER they answer yes.
    4. If they say *"submit them all"*, STILL confirm each one. This
       tool MUST NOT be called in a loop without per-bug consent.

    **Do not call this autonomously.** Don't preemptively submit while
    exploring inconsistencies. Don't call inside an agent loop without
    a human in the loop. Misuse will get this MCP blocked at HPE's WAF.

    **What makes a good bug report:**
    - Specific page URL. One bug = one page.
    - Concrete quote of the problem text + version/platform context.
    - Suggested correction when you have one.
    - Avoid editorializing; factual bugs and broken links work best.
    """
    with TimedCall("submit_doc_bug", {
        "page_url": page_url, "content_len": len(content or ""),
        "email_present": bool(email), "rating": rating, "like": like,
    }) as _call:
        if not DOC_BUG_SUBMIT_ENABLED:
            _call.set(error="disabled", outcome="refused_disabled")
            return (
                "submit_doc_bug is disabled on this MCP deployment "
                "(DOC_BUG_SUBMIT_ENABLED is not set). The operator's draft is good — "
                f"they can paste it into the feedback widget on {page_url} themselves.\n\n"
                "_(For maintainers: sniff HPE's feedback endpoint, set DOC_BUG_API_URL "
                "to the POST target, and DOC_BUG_SUBMIT_ENABLED=true to activate.)_"
            )
        if not DOC_BUG_API_URL:
            _call.set(error="no_endpoint", outcome="refused_disabled")
            return ("submit_doc_bug is enabled but DOC_BUG_API_URL is empty. "
                    f"Operator should paste manually at {page_url}.")
        if not content or not content.strip():
            _call.set(error="empty_content", outcome="refused_invalid")
            return "Refused: empty `content`."
        if len(content) > 10000:
            _call.set(error="content_too_long", outcome="refused_invalid")
            return f"Refused: `content` is {len(content)} chars (cap 10000)."
        try:
            from urllib.parse import urlparse
            parsed = urlparse(page_url)
        except Exception as e:
            _call.set(error=f"url_parse: {e}", outcome="refused_invalid")
            return f"Refused: couldn't parse page_url ({e})."
        if parsed.scheme not in ("http", "https"):
            _call.set(error="bad_scheme", outcome="refused_invalid")
            return f"Refused: scheme must be http(s), got {parsed.scheme!r}."
        if parsed.hostname not in _DOC_BUG_ALLOWED_HOSTS:
            _call.set(error=f"bad_host: {parsed.hostname}", outcome="refused_invalid")
            return (f"Refused: page_url host {parsed.hostname!r} isn't a "
                    f"support.hpe.com URL. submit_doc_bug only accepts bugs against HPE Support pages.")
        if email is not None and not _EMAIL_RE.match(email):
            _call.set(error="bad_email", outcome="refused_invalid")
            return f"Refused: email {email!r} doesn't look valid. Omit if anonymous."
        if rating is not None and not (1 <= rating <= 5):
            _call.set(error="bad_rating", outcome="refused_invalid")
            return f"Refused: rating must be 1-5, got {rating}."

        # Rebuild the URL from parsed parts: drops userinfo, port, and fragment.
        href = f"{parsed.scheme}://{parsed.hostname}{parsed.path}{('?' + parsed.query) if parsed.query else ''}"
        payload: dict = {"content": content, "href": href}
        if email:
            payload["email"] = email
        if rating is not None:
            payload["rating"] = rating
        if like is not None:
            payload["like"] = like

        try:
            import httpx
        except ImportError:
            _call.set(error="httpx_missing", outcome="refused_runtime")
            return "Refused: httpx not available."

        headers = {
            "Content-Type": "application/json",
            "Accept": "application/json",
            "User-Agent": "hvm-docs-mcp submit_doc_bug",
            "Origin": "https://support.hpe.com",
            "Referer": href,
        }
        try:
            with httpx.Client(timeout=DOC_BUG_TIMEOUT) as c:
                r = c.post(DOC_BUG_API_URL, json=payload, headers=headers)
        except httpx.RequestError as e:
            _call.set(error=f"transport: {e}", outcome="failed_transport")
            return f"Submission failed (transport): {e}"

        comment_id: object = None
        body_summary = ""
        try:
            resp_json = r.json()
            comment_id = resp_json.get("commentId") or resp_json.get("id")
            body_summary = json.dumps(resp_json)[:300]
        except (ValueError, AttributeError):
            # ValueError covers JSON decode errors; AttributeError covers a
            # JSON body that isn't an object (e.g. a list or string).
            body_summary = (r.text or "")[:300]
        _call.set(http_status=r.status_code, comment_id=comment_id,
                  outcome=("submitted" if r.is_success else "rejected_upstream"))
        if r.is_success:
            id_note = f" (commentId={comment_id})" if comment_id else ""
            return f"Submitted. HTTP {r.status_code}{id_note}. HPE docs team will see this for {href}."
        if r.status_code in (401, 403, 429):
            return (f"Submission rejected upstream (HTTP {r.status_code}). "
                    "Likely captcha/auth/rate-limit on anonymous POSTs. "
                    f"Operator can paste manually at {href}.\n\nResponse (truncated): {body_summary}")
        return f"Submission rejected upstream (HTTP {r.status_code}). Response (truncated): {body_summary}"
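
The allowlist and URL-normalisation steps can be exercised without any network access. The sketch below mirrors the scheme and host checks and the `href` rebuild; `normalize_page_url` is a hypothetical helper name, and the example paths are invented rather than real HPE document URLs.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"support.hpe.com"}


def normalize_page_url(page_url):
    """Return a cleaned URL for an allowlisted host, or None if it should be refused."""
    try:
        parsed = urlparse(page_url)
    except ValueError:
        return None
    if parsed.scheme not in ("http", "https"):
        return None
    # .hostname is lower-cased and excludes userinfo and port, so an exact
    # set-membership test rejects look-alike hosts such as
    # "support.hpe.com.evil.example".
    if parsed.hostname not in ALLOWED_HOSTS:
        return None
    query = f"?{parsed.query}" if parsed.query else ""
    return f"{parsed.scheme}://{parsed.hostname}{parsed.path}{query}"


ok = normalize_page_url("https://support.hpe.com/docs/example?id=1")
cleaned = normalize_page_url("https://user@SUPPORT.HPE.COM:8443/docs/example#frag")
lookalike = normalize_page_url("https://support.hpe.com.evil.example/x")
wrong_scheme = normalize_page_url("ftp://support.hpe.com/x")
```

Matching on the parsed hostname rather than on a substring of the raw URL is what makes the allowlist robust against prefix and userinfo tricks.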


# ===========================================================================