Strip submit_doc_bug tool and gate (Zerto-specific, not applicable to label MCP)

2026-05-23 17:51:56 -04:00
parent 43728320bf
commit 3ca96a3716
5 changed files with 10 additions and 71 deletions
+1 -17
@@ -30,7 +30,7 @@ which phase they're on by inspecting:
| `RERANK_URL` env unset in compose | Phase 6 not done |
| `HYBRID_SEARCH` env unset, no `rag/bm25.py` content | Phase 8 not done |
| No `eval/results/` directory | Phase 7 not done |
-| `find_doc_inconsistencies` / `submit_doc_bug` are commented-out stubs in `docs_mcp/server.py` | Phase 12 |
+| `find_doc_inconsistencies` is a commented-out stub in `docs_mcp/server.py` | Phase 12 |
| No `corpus/.digest/` produced by CI | Phase 13 |
When in doubt, ask the user: *"Which phase from PLAN.md are we
@@ -90,21 +90,6 @@ decide whether to call the tool. Treat it like a button label.
*"Use when..."*, *"Call proactively whenever..."* phrasings work
well. Don't bury the headline in implementation notes.
-### Side-effecting tools must be env-gated AND operator-confirmed
-Any tool that POSTs to an external service (submit_doc_bug being the
-canonical example):
-1. Must check an env flag at call time and return a "disabled,
-manual fallback at <URL>" message if unset.
-2. Must have a loud docstring requiring per-call operator
-confirmation in the LLM conversation flow (the LLM drafts, shows
-the operator the exact payload, asks yes/no, only then calls).
-3. Must do upfront validation (URL allowlist, content length, etc.)
-so the LLM gets a clean error instead of a wire-level failure.
-Match the `submit_doc_bug` patterns documented in PLAN.md Phase 12.
### Defensive fallback for retrieval components
The reranker, BM25 index, and any external dependency must fail
@@ -231,7 +216,6 @@ python -m scrape.changelog --history-out corpus/.digest/history.jsonl --history-
| "Add a reranker" | Read PLAN.md Phase 6. Stand up the llama.cpp sidecar, implement `_rerank()`. Verify with the eval harness. |
| "Search is missing X queries" | Run the eval harness first to confirm the failure. Then consider: rich chunk-0 rewrites, hybrid retrieval, curated knowledge layer. Don't just tune cosine. |
| "Let's add hybrid search" | Read PLAN.md Phase 8. Only after you've established the failure mode with eval queries — hybrid is not free. |
-| "Make a tool that submits doc bugs" | Read PLAN.md Phase 12. Find the docs portal's feedback endpoint by sniffing. Build with operator confirmation as a hard requirement in the tool docstring. |
| "I want a 'what changed' tool" | Read PLAN.md Phase 13. Don't try to do this at runtime — pre-bake the history JSONL at CI time. |
## Out-of-scope concerns (don't try to solve here)
+5 -37
@@ -7,8 +7,7 @@ product-specific has been factored out.
The end product is a streamable-HTTP MCP server with ~15 tools that
any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
call to answer questions against the docs, surface what changed
-recently, find inconsistencies, and (optionally) submit doc bugs
-back upstream.
+recently, and flag likely inconsistencies.
---
@@ -27,7 +26,7 @@ upstream docs portal
│ ──► bm25/ (FTS5 lexical index)
MCP server ──► search_docs / get_page / diff_versions / weekly_digest /
-find_doc_inconsistencies / submit_doc_bug / ...
+find_doc_inconsistencies / ...
reverse proxy / Cloudflare Tunnel ──► public endpoint
@@ -440,10 +439,10 @@ The "RAG can't tell you what isn't in the docs" gap. Surfaces:
suspenders for queries where the LLM doesn't think to call it
proactively.
-### Phase 12 — Doc-bug workflow tools *(1 day, optional)*
+### Phase 12 — Doc-inconsistency tool *(half a day, optional)*
-Two tools that pair up to enable a *"check the docs for
-inconsistencies, draft bugs, confirm, submit"* workflow.
+A *"scan the corpus for likely doc bugs"* tool the model can call
+when an operator asks "is this section reliable?"
- `find_doc_inconsistencies(scope_query, version=None, platform=None,
max_pages=30, checks=None)`: deterministic, read-only. Two checks:
@@ -454,31 +453,6 @@ inconsistencies, draft bugs, confirm, submit"* workflow.
diff (`difflib`) against editor-curated cluster peers; the model
judges which findings are real bugs.
-- `submit_doc_bug(page_url, content, email=None, rating=None,
-like=None)`: POSTs to the docs portal's feedback endpoint.
-Env-gated by `DOC_BUG_SUBMIT_ENABLED=true` so dev/staging
-deployments can't accidentally hit the upstream. The tool's
-docstring is loud about a mandatory operator-confirmation
-workflow per submission — LLM must draft, show, ask, then
-submit. Explicit *"do not loop"* instruction. Defensive
-validation upfront (URL host matches expected portal, content
-non-empty, etc.) so the LLM gets a clean error instead of a
-rejected POST.
-**You'll need to find the docs portal's feedback endpoint.** Most
-portals route the "Was this helpful?" widget through a backend
-API; sniff the browser network tab on the live site. The payload
-shape varies; common fields: content/body, page url/href, optional
-email, optional rating, optional thumbs. Most accept anonymous
-POSTs with no captcha at the JSON-API layer (even if the widget
-shows a captcha). Validate before you ship — and if the endpoint
-has rate limits or captcha enforcement, the tool returns a clean
-"submission rejected — paste manually at <url>" fallback.
-The whole point is the per-bug operator confirmation in the
-LLM-side conversation flow; the tool description enforces it. Do
-not bypass.
### Phase 13 — Weekly digest tool *(half a day)*
Goal: a tool that answers *"what changed in the docs in the last N
@@ -524,7 +498,6 @@ shape:
| `weekly_digest` | What changed in the last N days, with filters |
| `corpus_status` | Freshness + size of the knowledge base |
| `find_doc_inconsistencies` | Scoped scan for doc bugs |
-| `submit_doc_bug` | Submit a drafted bug (env-gated, operator-confirmed) |
| `<product>_api_lessons` | Curated API gotchas, proactively-called |
| product-specific tools | Interop matrix, lifecycle queries, etc. |
@@ -553,11 +526,6 @@ to figure out yourself — everything else is shared infrastructure:
- One filter per high-cardinality facet
- Skip filters that have <5 distinct values — they're not worth
the surface area
-- **Feedback endpoint** (for `submit_doc_bug`, if you want it)
-- URL of the POST endpoint
-- Required + optional payload fields
-- Captcha / rate-limit behavior
-- Whether anonymous submissions are accepted
- **Curated knowledge** for the `_api_lessons` tool
- What does the product's API documentation NOT say that you've
learned from real integration work?
+2 -3
@@ -7,8 +7,7 @@ product-specific has been factored out.
The end product is a streamable-HTTP MCP server with ~15 tools that
any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
call to answer questions against the docs, surface what changed
-recently, find inconsistencies, and (optionally) submit doc bugs
-back upstream.
+recently, and flag likely inconsistencies.
## What's here
@@ -97,7 +96,7 @@ python -m docs_mcp.server --transport stdio
- Registry GC script
- Standard tools: `search_docs`, `get_page`, `list_versions`,
`diff_versions`, `bundle_changelog`, `weekly_digest`,
-`find_doc_inconsistencies`, `submit_doc_bug`, etc.
+`find_doc_inconsistencies`, etc.
## License
-5
@@ -45,11 +45,6 @@ services:
# Phase 10 — usage telemetry.
USAGE_LOG_DIR: /app/var/logs
USAGE_LOG_KEEP_DAYS: "90"
-# Phase 12 — doc-bug submission gate. Off by default; on only
-# in production after you've verified the endpoint contract.
-DOC_BUG_SUBMIT_ENABLED: "false"
-# DOC_BUG_API_URL: "https://docs-be.example.com/api/feedback"
volumes:
# Usage logs persist across container recreates.
- ./<product>-docs-mcp-logs:/app/var/logs
+2 -9
@@ -9,7 +9,7 @@ PLAN.md add or extend pieces of this file:
Phase 9 — diff_versions, list_cluster, bundle_changelog
Phase 10 — TimedCall wiring (already imported below)
Phase 11 — <product>_api_lessons tool
-Phase 12 — find_doc_inconsistencies, submit_doc_bug
+Phase 12 — find_doc_inconsistencies
Phase 13 — weekly_digest + _digest_history reader
Every stub below has a docstring + `raise NotImplementedError`. Replace
@@ -47,7 +47,7 @@ BM25_DB = Path(os.environ.get("BM25_DB", str(ROOT / "bm25" / f"{PRODUCT_NAME}_do
BUNDLES_JSON = ROOT / "bundles.json"
# ---------------------------------------------------------------------------
-# Feature flags (Phase 6 / 8 / 12 enable these as you ship each phase).
+# Feature flags (Phase 6 / 8 enable these as you ship each phase).
# ---------------------------------------------------------------------------
RERANK_URL = os.environ.get("RERANK_URL", "").rstrip("/") or None
RERANK_POOL = int(os.environ.get("RERANK_POOL", "50"))
@@ -56,10 +56,6 @@ RERANK_TIMEOUT = float(os.environ.get("RERANK_TIMEOUT", "30"))
HYBRID_SEARCH = os.environ.get("HYBRID_SEARCH", "").lower() in ("true", "1", "yes", "on")
RRF_K = int(os.environ.get("RRF_K", "60"))
-DOC_BUG_SUBMIT_ENABLED = os.environ.get("DOC_BUG_SUBMIT_ENABLED", "").lower() in ("true", "1", "yes", "on")
-DOC_BUG_API_URL = os.environ.get("DOC_BUG_API_URL", "") # product-specific endpoint
-DOC_BUG_TIMEOUT = float(os.environ.get("DOC_BUG_TIMEOUT", "15"))
# ---------------------------------------------------------------------------
# FastMCP setup.
@@ -230,9 +226,6 @@ def list_versions() -> str:
# @mcp.tool() # Phase 12
# def find_doc_inconsistencies(scope_query: str, ...) -> str: ...
-# @mcp.tool() # Phase 12
-# def submit_doc_bug(page_url: str, content: str, email: str | None = None, ...) -> str: ...
# ===========================================================================
# Entry point