Strip submit_doc_bug tool and gate (Zerto-specific, not applicable to label MCP)

2026-05-23 17:51:56 -04:00
parent 43728320bf
commit 3ca96a3716
5 changed files with 10 additions and 71 deletions
@@ -30,7 +30,7 @@ which phase they're on by inspecting:
 | `RERANK_URL` env unset in compose | Phase 6 not done |
 | `HYBRID_SEARCH` env unset, no `rag/bm25.py` content | Phase 8 not done |
 | No `eval/results/` directory | Phase 7 not done |
-| `find_doc_inconsistencies` / `submit_doc_bug` are commented-out stubs in `docs_mcp/server.py` | Phase 12 |
+| `find_doc_inconsistencies` is a commented-out stub in `docs_mcp/server.py` | Phase 12 |
 | No `corpus/.digest/` produced by CI | Phase 13 |

 When in doubt, ask the user: *"Which phase from PLAN.md are we
@@ -90,21 +90,6 @@ decide whether to call the tool. Treat it like a button label.
 *"Use when..."*, *"Call proactively whenever..."* phrasings work
 well. Don't bury the headline in implementation notes.

-### Side-effecting tools must be env-gated AND operator-confirmed
-
-Any tool that POSTs to an external service (submit_doc_bug being the
-canonical example):
-
-1. Must check an env flag at call time and return a "disabled,
-   manual fallback at <URL>" message if unset.
-2. Must have a loud docstring requiring per-call operator
-   confirmation in the LLM conversation flow (the LLM drafts, shows
-   the operator the exact payload, asks yes/no, only then calls).
-3. Must do upfront validation (URL allowlist, content length, etc.)
-   so the LLM gets a clean error instead of a wire-level failure.
-
-Match the `submit_doc_bug` patterns documented in PLAN.md Phase 12.
-
 ### Defensive fallback for retrieval components

 The reranker, BM25 index, and any external dependency must fail
@@ -231,7 +216,6 @@ python -m scrape.changelog --history-out corpus/.digest/history.jsonl --history-
 | "Add a reranker" | Read PLAN.md Phase 6. Stand up the llama.cpp sidecar, implement `_rerank()`. Verify with the eval harness. |
 | "Search is missing X queries" | Run the eval harness first to confirm the failure. Then consider: rich chunk-0 rewrites, hybrid retrieval, curated knowledge layer. Don't just tune cosine. |
 | "Let's add hybrid search" | Read PLAN.md Phase 8. Only after you've established the failure mode with eval queries — hybrid is not free. |
-| "Make a tool that submits doc bugs" | Read PLAN.md Phase 12. Find the docs portal's feedback endpoint by sniffing. Build with operator confirmation as a hard requirement in the tool docstring. |
 | "I want a 'what changed' tool" | Read PLAN.md Phase 13. Don't try to do this at runtime — pre-bake the history JSONL at CI time. |

 ## Out-of-scope concerns (don't try to solve here)
@@ -7,8 +7,7 @@ product-specific has been factored out.
 The end product is a streamable-HTTP MCP server with ~15 tools that
 any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
 call to answer questions against the docs, surface what changed
-recently, find inconsistencies, and (optionally) submit doc bugs
-back upstream.
+recently, and flag likely inconsistencies.

 ---

@@ -27,7 +26,7 @@ upstream docs portal
        │           ──► bm25/   (FTS5 lexical index)
        ▼
   MCP server  ──► search_docs / get_page / diff_versions / weekly_digest /
-                   find_doc_inconsistencies / submit_doc_bug / ...
+                   find_doc_inconsistencies / ...
        │
        ▼
   reverse proxy / Cloudflare Tunnel ──► public endpoint
@@ -440,10 +439,10 @@ The "RAG can't tell you what isn't in the docs" gap. Surfaces:
  suspenders for queries where the LLM doesn't think to call it
  proactively.

-### Phase 12 — Doc-bug workflow tools  *(1 day, optional)*
+### Phase 12 — Doc-inconsistency tool  *(half a day, optional)*

-Two tools that pair up to enable a *"check the docs for
-inconsistencies, draft bugs, confirm, submit"* workflow.
+A *"scan the corpus for likely doc bugs"* tool the model can call
+when an operator asks "is this section reliable?"

 - `find_doc_inconsistencies(scope_query, version=None, platform=None,
  max_pages=30, checks=None)`: deterministic, read-only. Two checks:
@@ -454,31 +453,6 @@ inconsistencies, draft bugs, confirm, submit"* workflow.
  diff (`difflib`) against editor-curated cluster peers; the model
  judges which findings are real bugs.

-- `submit_doc_bug(page_url, content, email=None, rating=None,
-  like=None)`: POSTs to the docs portal's feedback endpoint.
-  Env-gated by `DOC_BUG_SUBMIT_ENABLED=true` so dev/staging
-  deployments can't accidentally hit the upstream. The tool's
-  docstring is loud about a mandatory operator-confirmation
-  workflow per submission — LLM must draft, show, ask, then
-  submit. Explicit *"do not loop"* instruction. Defensive
-  validation upfront (URL host matches expected portal, content
-  non-empty, etc.) so the LLM gets a clean error instead of a
-  rejected POST.
-
-**You'll need to find the docs portal's feedback endpoint.** Most
-portals route the "Was this helpful?" widget through a backend
-API; sniff the browser network tab on the live site. The payload
-shape varies; common fields: content/body, page url/href, optional
-email, optional rating, optional thumbs. Most accept anonymous
-POSTs with no captcha at the JSON-API layer (even if the widget
-shows a captcha). Validate before you ship — and if the endpoint
-has rate limits or captcha enforcement, the tool returns a clean
-"submission rejected — paste manually at <url>" fallback.
-
-The whole point is the per-bug operator confirmation in the
-LLM-side conversation flow; the tool description enforces it. Do
-not bypass.
-
 ### Phase 13 — Weekly digest tool  *(half a day)*

 Goal: a tool that answers *"what changed in the docs in the last N
@@ -524,7 +498,6 @@ shape:
 | `weekly_digest` | What changed in the last N days, with filters |
 | `corpus_status` | Freshness + size of the knowledge base |
 | `find_doc_inconsistencies` | Scoped scan for doc bugs |
-| `submit_doc_bug` | Submit a drafted bug (env-gated, operator-confirmed) |
 | `<product>_api_lessons` | Curated API gotchas, proactively-called |
 | product-specific tools | Interop matrix, lifecycle queries, etc. |

@@ -553,11 +526,6 @@ to figure out yourself — everything else is shared infrastructure:
  - One filter per high-cardinality facet
  - Skip filters that have <5 distinct values — they're not worth
    the surface area
-- **Feedback endpoint** (for `submit_doc_bug`, if you want it)
-  - URL of the POST endpoint
-  - Required + optional payload fields
-  - Captcha / rate-limit behavior
-  - Whether anonymous submissions are accepted
 - **Curated knowledge** for the `_api_lessons` tool
  - What does the product's API documentation NOT say that you've
    learned from real integration work?
@@ -7,8 +7,7 @@ product-specific has been factored out.
 The end product is a streamable-HTTP MCP server with ~15 tools that
 any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
 call to answer questions against the docs, surface what changed
-recently, find inconsistencies, and (optionally) submit doc bugs
-back upstream.
+recently, and flag likely inconsistencies.

 ## What's here

@@ -97,7 +96,7 @@ python -m docs_mcp.server --transport stdio
 - Registry GC script
 - Standard tools: `search_docs`, `get_page`, `list_versions`,
  `diff_versions`, `bundle_changelog`, `weekly_digest`,
-  `find_doc_inconsistencies`, `submit_doc_bug`, etc.
+  `find_doc_inconsistencies`, etc.

 ## License

@@ -45,11 +45,6 @@ services:
      # Phase 10 — usage telemetry.
      USAGE_LOG_DIR: /app/var/logs
      USAGE_LOG_KEEP_DAYS: "90"
-
-      # Phase 12 — doc-bug submission gate. Off by default; on only
-      # in production after you've verified the endpoint contract.
-      DOC_BUG_SUBMIT_ENABLED: "false"
-      # DOC_BUG_API_URL: "https://docs-be.example.com/api/feedback"
    volumes:
      # Usage logs persist across container recreates.
      - ./<product>-docs-mcp-logs:/app/var/logs
@@ -9,7 +9,7 @@ PLAN.md add or extend pieces of this file:
  Phase 9  — diff_versions, list_cluster, bundle_changelog
  Phase 10 — TimedCall wiring (already imported below)
  Phase 11 — <product>_api_lessons tool
-  Phase 12 — find_doc_inconsistencies, submit_doc_bug
+  Phase 12 — find_doc_inconsistencies
  Phase 13 — weekly_digest + _digest_history reader

 Every stub below has a docstring + `raise NotImplementedError`. Replace
@@ -47,7 +47,7 @@ BM25_DB = Path(os.environ.get("BM25_DB", str(ROOT / "bm25" / f"{PRODUCT_NAME}_do
 BUNDLES_JSON = ROOT / "bundles.json"

 # ---------------------------------------------------------------------------
-# Feature flags (Phase 6 / 8 / 12 enable these as you ship each phase).
+# Feature flags (Phase 6 / 8 enable these as you ship each phase).
 # ---------------------------------------------------------------------------
 RERANK_URL = os.environ.get("RERANK_URL", "").rstrip("/") or None
 RERANK_POOL = int(os.environ.get("RERANK_POOL", "50"))
@@ -56,10 +56,6 @@ RERANK_TIMEOUT = float(os.environ.get("RERANK_TIMEOUT", "30"))
 HYBRID_SEARCH = os.environ.get("HYBRID_SEARCH", "").lower() in ("true", "1", "yes", "on")
 RRF_K = int(os.environ.get("RRF_K", "60"))

-DOC_BUG_SUBMIT_ENABLED = os.environ.get("DOC_BUG_SUBMIT_ENABLED", "").lower() in ("true", "1", "yes", "on")
-DOC_BUG_API_URL = os.environ.get("DOC_BUG_API_URL", "")  # product-specific endpoint
-DOC_BUG_TIMEOUT = float(os.environ.get("DOC_BUG_TIMEOUT", "15"))
-

 # ---------------------------------------------------------------------------
 # FastMCP setup.
@@ -230,9 +226,6 @@ def list_versions() -> str:
 # @mcp.tool()  # Phase 12
 # def find_doc_inconsistencies(scope_query: str, ...) -> str: ...

-# @mcp.tool()  # Phase 12
-# def submit_doc_bug(page_url: str, content: str, email: str | None = None, ...) -> str: ...
-

 # ===========================================================================
 # Entry point