From 4009dc0b78f5ed524a48ee185704906064592815 Mon Sep 17 00:00:00 2001 From: Justin Paul Date: Mon, 25 May 2026 13:18:57 -0400 Subject: [PATCH] Phase 11: crop_seed_api_lessons tool + Pioneer fallback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add the fifth MCP tool — crop_seed_api_lessons(topic?) — backed by docs_mcp/lessons.md, the ONLY source of opinionated content in the server. Everything else (search_docs, get_page, lookup_variety) returns verbatim from vendor catalogs; lessons.md fills the gaps the corpus can't cover. The Pioneer fallback is the critical anti-hallucination piece: Pioneer's ToS bans automation, so the corpus has no Pioneer data. Without this tool, an agent might surface Bayer/Asgrow chunks as mediocre matches for a Pioneer query. The tool's docstring tells the agent to call it on any Pioneer / P-series question; the 'pioneer' section says clearly: "I don't have Pioneer's variety data indexed... please consult Pioneer or an extension service." "Do NOT invent Pioneer hybrid ratings." Other lesson sections cover knowledge the agent needs to interpret search_docs / get_page output correctly: - rating-scales: Bayer 1-9, Golden Harvest 9-to-1, what R/MR/S/Rps1c/R3 mean in soybean disease columns - maturity-semantics: corn RM days vs soybean MG vs wheat class + qualitative early/medium/late - trait-glossary: SSRIB, VT2PRIB, XF, E3, Conkesta, Clearfield, etc. - scn-resistance: race coverage + Peking vs PI 88788 source - regional-listings: how to interpret Bayer's "local profiles" - sources-not-yet-indexed: which vendors aren't in the corpus yet - checking-your-work: always call lookup_variety before quoting Lesson lookup prefers slug-match (returns just `rating-scales` for topic="rating", not every section that mentions ratings); falls back to body-match only when no slug matches. Smoke-tested with topic=pioneer, topic=rating, topic=trait, topic=zzzzzz (no match), and topic=None (full index = 10K chars, 8 sections). Co-Authored-By: Claude Opus 4.7 (1M context) --- docs_mcp/lessons.md | 259 ++++++++++++++++++++++++++++++++++++++++++++ docs_mcp/server.py | 106 ++++++++++++++++++ 2 files changed, 365 insertions(+) create mode 100644 docs_mcp/lessons.md diff --git a/docs_mcp/lessons.md b/docs_mcp/lessons.md new file mode 100644 index 00000000..8936c5cf --- /dev/null +++ b/docs_mcp/lessons.md @@ -0,0 +1,259 @@ +# crop_seed API lessons + +Curated knowledge that **does not live in the scraped corpus** but +that an agent needs to interpret search_docs / get_page results +correctly. This file is the source for `crop_seed_api_lessons(topic)`. + +Each section starts with a `## ` heading; the tool returns +sections whose slug matches the topic (substring match) or all +sections when `topic` is None. + +This file is the *only* place the MCP injects opinionated content. +Everything else returned by the server comes verbatim from a vendor +catalog. Lessons here should be: factual, well-cited, and conservative +about what the corpus actually contains. + +--- + +## pioneer + +**Pioneer (Corteva) is NOT in the seed-mcp corpus.** The MCP intentionally +does not scrape pioneer.com. Per Pioneer's published Terms of Service: + +> "you shall not use any manual or automated software, devices or +> other processes (including but not limited to spiders, robots, +> scrapers, crawlers, avatars, data mining tools or the like) to +> 'scrape' or download data from the Services" + +Pioneer additionally runs Imperva WAF on their catalog API and +login-walls dealer lookup, so a polite scrape isn't technically +feasible either. + +**Agent behavior when a farmer asks about Pioneer / P-series hybrids:** + +1. Tell the farmer plainly: "I don't have Pioneer's variety data + indexed — Pioneer doesn't allow automated indexing of their + catalog." +2. Recommend pioneer.com → *Find a Sales Rep* (or call 1-800-247-6803) + for hybrid specs and local recommendations. +3. If the farmer wants an independent third-party rating for a + specific Pioneer hybrid, mention that **university extension + performance trials** (Iowa State, Illinois Crop Sciences, Purdue, + Nebraska, Ohio State) publish multi-location yield data on + Pioneer hybrids alongside competitors — useful if they want + apples-to-apples data without going through Pioneer's own + marketing. +4. **Do NOT invent Pioneer hybrid ratings.** If asked "what's the + disease tolerance of P1142AM?", the only correct answer is + "I don't have that data — please consult Pioneer or an + extension service." + +This is the canonical anti-hallucination policy for the seed-mcp. +There is no Pioneer data; there is no inference. Direct the farmer +to a primary source. + +--- + +## rating-scales + +Different vendors publish ratings on different conventions. The +chunker normalizes the *labels* in the chunk preamble but always +preserves the source's `_scale_direction` field in the sidecar. + +**Bayer (DEKALB / Asgrow / WestBred)**: `1-9 (9 = best)`. A +GRAY LEAF SPOT rating of 8 means EXCELLENT tolerance. A rating of 2 +means SUSCEPTIBLE. + +**Syngenta Golden Harvest**: `9-to-1 (9 = best, 1 = worst)` — +this is the *direction* Golden Harvest publishes, but the *meaning* +of high numbers is the same: high = best. Where the chunker says +"normalize" for Golden Harvest, that just means we've already +re-stated it as `1-9 (9 = best)` in the chunk preamble; the source's +`_scale_direction` field still says `9-to-1` so you can detect the +provenance. + +**Syngenta NK / AgriPro**: `1-9 (9 = best)`. Same as Bayer. + +**Beck's**: ratings live behind SeedIQ login; only identity-level +data is publicly available, so most disease/agronomic ratings are +absent from Beck's records in this corpus. + +**Always check the chunk's "Rating scale" line or call +`lookup_variety(source_key)` and look at `_scale_direction` if you +are unsure.** Cross-vendor comparisons are valid AFTER you've +confirmed each side uses the same direction. + +**Non-numeric values** appear for some characteristics and should be +read literally: +- `R`, `MR`, `S` for soybean disease resistance = Resistant / Moderately + Resistant / Susceptible (not 1-9). +- `Rps1c`, `Rps3a`, `Rps1k`, etc. = specific Phytophthora resistance + gene present. +- `R1`, `R3` (under SOYBEAN CYST NEMATODE) = effective against + SCN race 1 / race 3. +- `A`, `B`, `C` under HERBICIDE sensitivity = grade letters where A + is most tolerant. + +--- + +## maturity-semantics + +Maturity is encoded differently per crop. Don't conflate the units. + +**Corn — Relative maturity (RM days)**: integer roughly 75-120. +Lower = shorter season, suitable for higher latitudes / shorter +growing windows. 110 RM is a Central Iowa default; 85 RM suits +northern Minnesota or short-season silage; 115+ RM fits southern +Indiana / southern Illinois / Missouri Delta. The number is +**Pioneer-style RM days**, normalized across the industry. + +**Soybeans — Maturity group (MG)**: float 00 (zero-zero) to 9.0 +expressed with one decimal. A "3.5 MG" soybean is for central +Iowa. Northern North Dakota / Minnesota plant 0.0–1.5 MG. Mid-South +plants 5.0+. Each tenth of an MG ≈ 7-10 days of additional season. +Sidecar field: `maturity_group` (e.g. "3.5", "0.7"). + +**Wheat — Class + heading**: Winter / spring decision is separate +from "class" (HRW / HRS / SRW / SWW / SWS / durum): +- HRW = Hard Red Winter — Plains states bread wheat +- HRS = Hard Red Spring — Northern Plains, North Dakota, Montana +- SRW = Soft Red Winter — Eastern Corn Belt, Ohio Valley +- SWW = Soft White Winter — Pacific Northwest +- SWS = Soft White Spring — Pacific Northwest +- Durum — North Dakota / Montana, pasta wheat +Maturity is qualitative: Early / Medium-Early / Medium / Medium-Late / Late. +**WestBred's product page JSON does not always expose the wheat class +as a structured field** — sometimes it's only in the marketing +narrative (e.g. "WB1376CLP is a Soft White Winter Clearfield® Plus +Wheat variety"). Read `positioning_statement` carefully when the +sidecar's `wheat_class` is null. + +--- + +## trait-glossary + +Common trait codes that appear in `trait_stack`: + +**Corn:** +- `SSRIB` — SmartStax® RIB Complete® corn blend (above + below-ground + insect protection + Roundup Ready + LibertyLink, with refuge-in-bag) +- `VT2PRIB` — VT Double PRO® RIB Complete® (above-ground insect + protection + Roundup Ready, refuge-in-bag) +- `VT4PRIB` — VT4 PRO® RIB Complete® (newer above-ground protection) +- `Trecepta` — Trecepta® (Trecepta + Roundup Ready + LibertyLink, for + earworm + western bean cutworm pressure) +- `SmartStax PRO` — SmartStax® PRO® (RNAi corn rootworm) +- `PowerCore` — PowerCore® Refuge Advanced (older above-ground stack) +- `Conventional` — no biotech traits (organic / specialty channels) + +**Soybeans:** +- `XF` — XtendFlex® (Roundup Ready 2 Xtend + dicamba + glufosinate) +- `Xtend` — Roundup Ready 2 Xtend® (dicamba + glyphosate) +- `RR2Y` — Roundup Ready 2 Yield® (glyphosate only) +- `E3` — Enlist E3® (2,4-D + glyphosate + glufosinate) +- `LL/LL+GT27` — LibertyLink® / LibertyLink + GT27 (glufosinate + + glyphosate + isoxaflutole) +- `Conkesta E3` — Bt-stack for caterpillar pressure (BR/AR markets) +- `SR` — SR® (sulfonylurea-tolerant, Asgrow-specific) + +**Wheat:** +- `Clearfield` / `CLP` — Clearfield® / Clearfield® Plus (imazamox + tolerance) +- `CoAXium` — CoAXium® (quizalofop tolerance) — note: AgriPro's + catalog flag, NOT in the WestBred corpus. + +Always render the full trait name (`trait_descriptions`) when telling +the farmer "this variety has X trait" — bare trait codes are +ambiguous in print. + +--- + +## scn-resistance + +Soybean Cyst Nematode resistance ratings are critical for fields +with SCN pressure (most of the Corn Belt). Read carefully: + +- `R3` under SOYBEAN CYST NEMATODE = Resistant to race 3 (the most + common race nationally). Most "SCN-resistant" soybeans on the + market are R3. +- `R1, R3` = Resistant to both race 1 AND race 3. Higher value; + useful in long-rotation SCN fields where race shifts have occurred. +- `MR3` = Moderately Resistant to race 3. Some yield loss expected + under high SCN pressure. +- `S` = Susceptible. +- Some Bayer Asgrow XF lines (e.g. AG29XF4) use **Peking-type SCN + resistance**, which is genetically distinct from the more common + PI 88788 source. Peking is more durable when SCN populations + have eroded PI 88788 effectiveness. Look for "Peking type" in the + positioning statement. + +**Recommended workflow when a farmer asks about SCN:** call +`search_docs` with the user's MG range + "SCN-resistant", then +`lookup_variety` on the top 2-3 candidates to verify the exact race +coverage and resistance source. + +--- + +## regional-listings + +The `regional_recommendations` array in each sidecar is sourced from +Bayer's "local profiles" — varieties get assigned to regional Seed +Guide bundles (e.g. *"2026 Washington, Oregon, SEED GUIDE"*) with a +named regional agronomist contact. This is the closest signal we have +to *"is this variety recommended for the farmer's geography?"* but +note: + +- A variety being absent from a regional listing **does not** mean + it's unsuitable — Bayer's local agronomists curate these lists. +- Listings are vendor-side recommendations, not third-party trial + data. +- When the farmer mentions a region, try filtering or scanning for + varieties whose `regional_recommendations[].product_list_name` + mentions that region. + +Other vendors handle regional placement differently. Golden Harvest +publishes a separate "plot report" system per state/year/site; +NK publishes ratings as PDF tech sheets without regional flags. + +--- + +## sources-not-yet-indexed + +These vendors are planned but not yet in the corpus. Don't assume +their data is present: + +- **Golden Harvest (Syngenta)** — ~175 varieties, sitemap-driven + scrape pending. +- **NK (Syngenta)** — 29 varieties. +- **AgriPro (Syngenta wheat)** — 24 wheat varieties (HRW, HRS, HWS, + SWW, SWS). The only wheat coverage we expect to have outside + WestBred. +- **Beck's PFR (research)** — 2,089 head-to-head trial documents. + Different shape from variety records — these are studies, not + hybrids. +- **Beck's products** — 860 products. Identity-only (SeedIQ login + gates the ratings). + +If `list_versions()` doesn't show a vendor in the `vendor` facet, the +corpus does not have it yet. Direct the farmer to that vendor's +public catalog or their seed dealer. + +--- + +## checking-your-work + +Before quoting a specific number to a farmer, **always** call +`lookup_variety(source_key=...)` to confirm. The chunk text inside a +search_docs response is a faithful render of the sidecar, but the +sidecar IS the source of truth. Quoting from the canonical sidecar +makes you robust against: + +- Chunk-text formatting bugs (e.g. a rare unicode issue trimming a + value). +- Future chunker changes (a re-index might rewrite the body). +- Cross-vendor scale-direction differences (the sidecar's + `_scale_direction` lets you state the convention explicitly). + +If `lookup_variety` returns "not found" but `search_docs` surfaced the +chunk, that's a bug — please report it. (In normal operation, every +chunk's `source_key` round-trips to a valid sidecar.) diff --git a/docs_mcp/server.py b/docs_mcp/server.py index cce034bb..e33eb7e8 100644 --- a/docs_mcp/server.py +++ b/docs_mcp/server.py @@ -369,6 +369,40 @@ def _structured_ratings_block(sidecar: dict) -> str: return "\n".join(lines).rstrip() + "\n" +# --------------------------------------------------------------------------- +# Curated lessons — docs_mcp/lessons.md is the canonical source. +# --------------------------------------------------------------------------- +LESSONS_FILE = Path(__file__).resolve().parent / "lessons.md" +_lessons_cache: list[tuple[str, str]] | None = None + + +def _load_lessons() -> list[tuple[str, str]]: + """Parse lessons.md into ``[(slug, body), ...]`` sections. + + Sections are delimited by ``## `` headings. The slug is the + `` token (whitespace stripped); the body is everything between + that heading and the next ``## `` heading (or EOF). + """ + global _lessons_cache + if _lessons_cache is not None: + return _lessons_cache + out: list[tuple[str, str]] = [] + if not LESSONS_FILE.exists(): + _lessons_cache = out + return out + text = LESSONS_FILE.read_text(encoding="utf-8") + parts = re.split(r"(?m)^## (.+)$", text) + # parts = [preamble, slug1, body1, slug2, body2, ...] + for i in range(1, len(parts), 2): + slug = parts[i].strip() + body = parts[i + 1] if i + 1 < len(parts) else "" + # Drop trailing horizontal rule that separates sections. + body = re.sub(r"\n---\s*$", "", body).strip() + out.append((slug, body)) + _lessons_cache = out + return out + + # =========================================================================== # Tools # =========================================================================== @@ -711,6 +745,78 @@ def lookup_variety( return "\n".join(out) +@mcp.tool() +def crop_seed_api_lessons( + topic: Annotated[ + str | None, + Field(description=( + "OPTIONAL topic — match against lesson section slugs or body " + "(substring, case-insensitive). Known slugs: pioneer, " + "rating-scales, maturity-semantics, trait-glossary, " + "scn-resistance, regional-listings, sources-not-yet-indexed, " + "checking-your-work. Omit for the full curated index." + )), + ] = None, +) -> str: + """Curated knowledge that does NOT live in the scraped corpus — + vendor scale-direction notes, trait glossary, maturity semantics, + SCN resistance interpretation, the **Pioneer fallback policy**, + and rules for fact-checking your work. + + Call this tool when: + + * The user asks about **Pioneer** or any P-series hybrid — Pioneer + is intentionally NOT scraped (ToS bans it); the lesson tells you + what to say instead. + * You need to compare ratings across vendors — different vendors + publish on different scale directions. + * You're parsing a trait code or disease abbreviation you don't + recognize. + * Before quoting a specific rating value to a farmer — the + ``checking-your-work`` lesson reminds you to call + ``lookup_variety`` to confirm. + + This tool is **the only source of opinionated content** in the + server. Everything else returned by search_docs / get_page / + lookup_variety is verbatim from vendor catalogs. + """ + with TimedCall("crop_seed_api_lessons", {"topic": topic}) as _call: + sections = _load_lessons() + if not sections: + _call.set(sections_returned=0) + return "_(no lessons file present — docs_mcp/lessons.md missing)_" + + if not topic: + _call.set(sections_returned=len(sections)) + return "\n\n---\n\n".join( + f"## {slug}\n\n{body}" for slug, body in sections + ) + + needle = topic.strip().lower() + # Prefer slug matches (most specific). Fall back to body match + # only when no slug matches — keeps a query like "rating" from + # returning every section that happens to mention the word. + slug_matches: list[tuple[str, str]] = [] + body_matches: list[tuple[str, str]] = [] + for slug, body in sections: + if needle in slug.lower(): + slug_matches.append((slug, body)) + elif needle in body.lower(): + body_matches.append((slug, body)) + matched = slug_matches if slug_matches else body_matches + + _call.set(sections_returned=len(matched), topic=topic) + if not matched: + slugs = ", ".join(s for s, _ in sections) + return ( + f"_(no lesson section matched topic '{topic}'. " + f"Available slugs: {slugs}.)_" + ) + return "\n\n---\n\n".join( + f"## {slug}\n\n{body}" for slug, body in matched + ) + + # =========================================================================== # Entry point # =========================================================================== -- 2.52.0