diff --git a/CLAUDE.md b/CLAUDE.md
index 30a8b46..ac3bd88 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -30,7 +30,7 @@ which phase they're on by inspecting:
 | `RERANK_URL` env unset in compose | Phase 6 not done |
 | `HYBRID_SEARCH` env unset, no `rag/bm25.py` content | Phase 8 not done |
 | No `eval/results/` directory | Phase 7 not done |
-| `find_doc_inconsistencies` / `submit_doc_bug` are commented-out stubs in `docs_mcp/server.py` | Phase 12 |
+| `find_doc_inconsistencies` is a commented-out stub in `docs_mcp/server.py` | Phase 12 |
 | No `corpus/.digest/` produced by CI | Phase 13 |
 
 When in doubt, ask the user: *"Which phase from PLAN.md are we
@@ -90,21 +90,6 @@ decide whether to call the tool. Treat it like a button label.
 *"Use when..."*, *"Call proactively whenever..."* phrasings work
 well. Don't bury the headline in implementation notes.
 
-### Side-effecting tools must be env-gated AND operator-confirmed
-
-Any tool that POSTs to an external service (submit_doc_bug being the
-canonical example):
-
-1. Must check an env flag at call time and return a "disabled,
-   manual fallback at " message if unset.
-2. Must have a loud docstring requiring per-call operator
-   confirmation in the LLM conversation flow (the LLM drafts, shows
-   the operator the exact payload, asks yes/no, only then calls).
-3. Must do upfront validation (URL allowlist, content length, etc.)
-   so the LLM gets a clean error instead of a wire-level failure.
-
-Match the `submit_doc_bug` patterns documented in PLAN.md Phase 12.
-
 ### Defensive fallback for retrieval components
 
 The reranker, BM25 index, and any external dependency must fail
@@ -231,7 +216,6 @@ python -m scrape.changelog --history-out corpus/.digest/history.jsonl --history-
 | "Add a reranker" | Read PLAN.md Phase 6. Stand up the llama.cpp sidecar, implement `_rerank()`. Verify with the eval harness. |
 | "Search is missing X queries" | Run the eval harness first to confirm the failure. Then consider: rich chunk-0 rewrites, hybrid retrieval, curated knowledge layer. Don't just tune cosine. |
 | "Let's add hybrid search" | Read PLAN.md Phase 8. Only after you've established the failure mode with eval queries — hybrid is not free. |
-| "Make a tool that submits doc bugs" | Read PLAN.md Phase 12. Find the docs portal's feedback endpoint by sniffing. Build with operator confirmation as a hard requirement in the tool docstring. |
 | "I want a 'what changed' tool" | Read PLAN.md Phase 13. Don't try to do this at runtime — pre-bake the history JSONL at CI time. |
 
 ## Out-of-scope concerns (don't try to solve here)
diff --git a/PLAN.md b/PLAN.md
index ca0f327..e369d88 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -7,8 +7,7 @@ product-specific has been factored out.
 The end product is a streamable-HTTP MCP server with ~15 tools that
 any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
 call to answer questions against the docs, surface what changed
-recently, find inconsistencies, and (optionally) submit doc bugs
-back upstream.
+recently, and flag likely inconsistencies.
 
 ---
 
@@ -27,7 +26,7 @@ upstream docs portal
        │            ──► bm25/ (FTS5 lexical index)
        ▼
 MCP server ──► search_docs / get_page / diff_versions / weekly_digest /
-               find_doc_inconsistencies / submit_doc_bug / ...
+               find_doc_inconsistencies / ...
        │
        ▼
 reverse proxy / Cloudflare Tunnel ──► public endpoint
@@ -440,10 +439,10 @@ The "RAG can't tell you what isn't in the docs" gap. Surfaces:
 suspenders for queries where the LLM doesn't think to call it
 proactively.
 
-### Phase 12 — Doc-bug workflow tools *(1 day, optional)*
+### Phase 12 — Doc-inconsistency tool *(half a day, optional)*
 
-Two tools that pair up to enable a *"check the docs for
-inconsistencies, draft bugs, confirm, submit"* workflow.
+A *"scan the corpus for likely doc bugs"* tool the model can call
+when an operator asks "is this section reliable?"
 
 - `find_doc_inconsistencies(scope_query, version=None, platform=None,
   max_pages=30, checks=None)`: deterministic, read-only. Two checks:
@@ -454,31 +453,6 @@ inconsistencies, draft bugs, confirm, submit"* workflow.
     diff (`difflib`) against editor-curated cluster peers; the model
     judges which findings are real bugs.
 
-- `submit_doc_bug(page_url, content, email=None, rating=None,
-  like=None)`: POSTs to the docs portal's feedback endpoint.
-  Env-gated by `DOC_BUG_SUBMIT_ENABLED=true` so dev/staging
-  deployments can't accidentally hit the upstream. The tool's
-  docstring is loud about a mandatory operator-confirmation
-  workflow per submission — LLM must draft, show, ask, then
-  submit. Explicit *"do not loop"* instruction. Defensive
-  validation upfront (URL host matches expected portal, content
-  non-empty, etc.) so the LLM gets a clean error instead of a
-  rejected POST.
-
-**You'll need to find the docs portal's feedback endpoint.** Most
-portals route the "Was this helpful?" widget through a backend
-API; sniff the browser network tab on the live site. The payload
-shape varies; common fields: content/body, page url/href, optional
-email, optional rating, optional thumbs. Most accept anonymous
-POSTs with no captcha at the JSON-API layer (even if the widget
-shows a captcha). Validate before you ship — and if the endpoint
-has rate limits or captcha enforcement, the tool returns a clean
-"submission rejected — paste manually at " fallback.
-
-The whole point is the per-bug operator confirmation in the
-LLM-side conversation flow; the tool description enforces it. Do
-not bypass.
-
 ### Phase 13 — Weekly digest tool *(half a day)*
 
 Goal: a tool that answers *"what changed in the docs in the last N
@@ -524,7 +498,6 @@ shape:
 | `weekly_digest` | What changed in the last N days, with filters |
 | `corpus_status` | Freshness + size of the knowledge base |
 | `find_doc_inconsistencies` | Scoped scan for doc bugs |
-| `submit_doc_bug` | Submit a drafted bug (env-gated, operator-confirmed) |
 | `_api_lessons` | Curated API gotchas, proactively-called |
 | product-specific tools | Interop matrix, lifecycle queries, etc. |
 
@@ -553,11 +526,6 @@ to figure out yourself — everything else is shared infrastructure:
   - One filter per high-cardinality facet
   - Skip filters that have <5 distinct values — they're not worth
     the surface area
-- **Feedback endpoint** (for `submit_doc_bug`, if you want it)
-  - URL of the POST endpoint
-  - Required + optional payload fields
-  - Captcha / rate-limit behavior
-  - Whether anonymous submissions are accepted
 - **Curated knowledge** for the `_api_lessons` tool
   - What does the product's API documentation NOT say that you've
     learned from real integration work?
diff --git a/README.md b/README.md
index 52cd328..a27ae47 100644
--- a/README.md
+++ b/README.md
@@ -7,8 +7,7 @@ product-specific has been factored out.
 The end product is a streamable-HTTP MCP server with ~15 tools that
 any LLM client (Claude Desktop, Claude Code, Cursor, Copilot) can
 call to answer questions against the docs, surface what changed
-recently, find inconsistencies, and (optionally) submit doc bugs
-back upstream.
+recently, and flag likely inconsistencies.
 
 ## What's here
 
@@ -97,7 +96,7 @@ python -m docs_mcp.server --transport stdio
 - Registry GC script
 - Standard tools: `search_docs`, `get_page`, `list_versions`,
   `diff_versions`, `bundle_changelog`, `weekly_digest`,
-  `find_doc_inconsistencies`, `submit_doc_bug`, etc.
+  `find_doc_inconsistencies`, etc.
 
 ## License
 
diff --git a/deploy/docker-compose.yml b/deploy/docker-compose.yml
index 0aa05a8..b0cdf59 100644
--- a/deploy/docker-compose.yml
+++ b/deploy/docker-compose.yml
@@ -45,11 +45,6 @@ services:
       # Phase 10 — usage telemetry.
       USAGE_LOG_DIR: /app/var/logs
       USAGE_LOG_KEEP_DAYS: "90"
-
-      # Phase 12 — doc-bug submission gate. Off by default; on only
-      # in production after you've verified the endpoint contract.
-      DOC_BUG_SUBMIT_ENABLED: "false"
-      # DOC_BUG_API_URL: "https://docs-be.example.com/api/feedback"
     volumes:
       # Usage logs persist across container recreates.
       - ./-docs-mcp-logs:/app/var/logs
diff --git a/docs_mcp/server.py b/docs_mcp/server.py
index 28b1345..9a9387c 100644
--- a/docs_mcp/server.py
+++ b/docs_mcp/server.py
@@ -9,7 +9,7 @@ PLAN.md add or extend pieces of this file:
   Phase 9 — diff_versions, list_cluster, bundle_changelog
   Phase 10 — TimedCall wiring (already imported below)
   Phase 11 — _api_lessons tool
-  Phase 12 — find_doc_inconsistencies, submit_doc_bug
+  Phase 12 — find_doc_inconsistencies
   Phase 13 — weekly_digest + _digest_history reader
 
 Every stub below has a docstring + `raise NotImplementedError`. Replace
@@ -47,7 +47,7 @@ BM25_DB = Path(os.environ.get("BM25_DB", str(ROOT / "bm25" / f"{PRODUCT_NAME}_do
 BUNDLES_JSON = ROOT / "bundles.json"
 
 # ---------------------------------------------------------------------------
-# Feature flags (Phase 6 / 8 / 12 enable these as you ship each phase).
+# Feature flags (Phase 6 / 8 enable these as you ship each phase).
 # ---------------------------------------------------------------------------
 RERANK_URL = os.environ.get("RERANK_URL", "").rstrip("/") or None
 RERANK_POOL = int(os.environ.get("RERANK_POOL", "50"))
@@ -56,10 +56,6 @@ RERANK_TIMEOUT = float(os.environ.get("RERANK_TIMEOUT", "30"))
 HYBRID_SEARCH = os.environ.get("HYBRID_SEARCH", "").lower() in ("true", "1", "yes", "on")
 RRF_K = int(os.environ.get("RRF_K", "60"))
 
-DOC_BUG_SUBMIT_ENABLED = os.environ.get("DOC_BUG_SUBMIT_ENABLED", "").lower() in ("true", "1", "yes", "on")
-DOC_BUG_API_URL = os.environ.get("DOC_BUG_API_URL", "")  # product-specific endpoint
-DOC_BUG_TIMEOUT = float(os.environ.get("DOC_BUG_TIMEOUT", "15"))
-
 
 # ---------------------------------------------------------------------------
 # FastMCP setup.
@@ -230,9 +226,6 @@ def list_versions() -> str:
 # @mcp.tool()  # Phase 12
 # def find_doc_inconsistencies(scope_query: str, ...) -> str: ...
 
-# @mcp.tool()  # Phase 12
-# def submit_doc_bug(page_url: str, content: str, email: str | None = None, ...) -> str: ...
-
 
 # ===========================================================================
 # Entry point