claude a54fac240f
Add university-extension trials: Illinois VT + Iowa ICPT + Ohio OCPT (+123 cross-vendor trial docs) (#19)
Co-authored-by: claude <claude@jpaul.io>
Co-committed-by: claude <claude@jpaul.io>
2026-06-10 08:36:19 -04:00

# crop_seed API lessons
Curated knowledge that **does not live in the scraped corpus** but
that an agent needs to interpret search_docs / get_page results
correctly. This file is the source for `crop_seed_api_lessons(topic)`.
Each section starts with a `## <slug>` heading; the tool returns
sections whose slug matches the topic (substring match) or all
sections when `topic` is None.
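
The slug lookup described above can be sketched as follows. This is a minimal illustration, not the server's actual implementation: the function names `split_sections` and `lessons` are hypothetical, and the case-insensitive comparison is an assumption.

```python
import re
from typing import Optional


def split_sections(markdown: str) -> dict[str, str]:
    """Map each `## <slug>` heading to the body text that follows it."""
    sections: dict[str, str] = {}
    slug = None
    for line in markdown.splitlines():
        match = re.match(r"^##\s+(\S.*?)\s*$", line)
        if match:
            slug = match.group(1)
            sections[slug] = ""
        elif slug is not None and line.strip() != "---":
            # Skip the `---` separators; keep everything else verbatim.
            sections[slug] += line + "\n"
    return sections


def lessons(markdown: str, topic: Optional[str] = None) -> dict[str, str]:
    """Return all sections, or only those whose slug contains `topic`."""
    sections = split_sections(markdown)
    if topic is None:
        return sections
    needle = topic.lower()  # assumption: matching ignores case
    return {slug: body for slug, body in sections.items() if needle in slug.lower()}
```

With substring matching, a topic such as `"rating"` would select the `rating-scales` section, and an unknown topic returns an empty result rather than an error.
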
This file is the *only* place the MCP injects opinionated content.
Everything else returned by the server comes verbatim from a vendor
catalog. Lessons here should be: factual, well-cited, and conservative
about what the corpus actually contains.
---
## pioneer
**Pioneer (Corteva) is NOT in the seed-mcp corpus.** The MCP intentionally
does not scrape pioneer.com. Per Pioneer's published Terms of Service:
> "you shall not use any manual or automated software, devices or
> other processes (including but not limited to spiders, robots,
> scrapers, crawlers, avatars, data mining tools or the like) to
> 'scrape' or download data from the Services"
Pioneer additionally runs Imperva WAF on their catalog API and
login-walls dealer lookup, so a polite scrape isn't technically
feasible either.
**Agent behavior when a farmer asks about Pioneer / P-series hybrids:**
1. Tell the farmer plainly: "I don't have Pioneer's variety data
indexed — Pioneer doesn't allow automated indexing of their
catalog."
2. Recommend pioneer.com → *Find a Sales Rep* (or call 1-800-247-6803)
for hybrid specs and local recommendations.
3. If the farmer wants an independent third-party rating for a
specific Pioneer hybrid, mention that **university extension
performance trials** (Iowa State, Illinois Crop Sciences, Purdue,
Nebraska, Ohio State) publish multi-location yield data on
Pioneer hybrids alongside competitors — useful if they want
apples-to-apples data without going through Pioneer's own
marketing.
4. **Do NOT invent Pioneer hybrid ratings.** If asked "what's the
disease tolerance of P1142AM?", the only correct answer is
"I don't have that data — please consult Pioneer or an
extension service."
This is the canonical anti-hallucination policy for the seed-mcp.
There is no Pioneer data; there is no inference. Direct the farmer
to a primary source.
---
## rating-scales
Different vendors publish ratings on different conventions. The
chunker normalizes the *labels* in the chunk preamble but always
preserves the source's `_scale_direction` field in the sidecar.
**Bayer (DEKALB / Asgrow / WestBred)**: `1-9 (9 = best)`. A
GRAY LEAF SPOT rating of 8 means EXCELLENT tolerance. A rating of 2
means SUSCEPTIBLE.
**Syngenta Golden Harvest**: `9-to-1 (9 = best, 1 = worst)`.
This is the *direction* Golden Harvest publishes, but the *meaning*
of high numbers is the same: high = best. Where the chunker says
"normalize" for Golden Harvest, that just means we've already
re-stated it as `1-9 (9 = best)` in the chunk preamble; the source's
`_scale_direction` field still says `9-to-1` so you can detect the
provenance.
**Syngenta NK and AgriPro**: `1-9 (1 = best, lower = more
resistant)`. **REVERSED from Bayer and Golden Harvest.** NK's
tech-sheet PDFs literally print *"1-9 Scale: 1 = Best, 9 = Worst"*
in the footer; AgriPro's positioning on stripe-rust-resistant
varieties (e.g. AP Iliad with Stripe Rust 1, Eyespot 2) confirms
the same direction. On NK, this applies both to disease tolerance
AND to numeric agronomic ratings (Emergence, Standability, Shatter
Tolerance, Green Stem — all 1 = best). Cross-vendor comparisons
MUST consult the `_scale_direction` field in each side's sidecar
before drawing conclusions.
(Agronomic ratings on AgriPro are qualitative —
"Excellent / Very Good / Good / Fair" — and have no direction
issue. NK's soybean tech sheets ALSO publish soil-type adaptation
as Best/Good/Fair/Poor labels which are qualitative.)
**Beck's**: ratings live behind SeedIQ login; only identity-level
data is publicly available, so most disease/agronomic ratings are
absent from Beck's records in this corpus.
**ProHarvest Seeds**: **mixed scales** on one record. *Disease
Tolerance* is `1-9 numeric, 9 = best / most tolerant` (same direction
as Bayer — no flip; `NA` = not rated). *General Characteristics* and
*Agronomic Features* are qualitative (`Excellent / Very Good / Good /
Average`) with a few raw numerics (GDD pollination/black-layer, kernel
rows). *Soil Adaptability* uses `HR` (highly recommended) / `R`
(recommended). The single `_scale_direction` line on the record states
all three. ProHarvest is an Ebbert's-style independent brand, but its
ratings ARE parsed into structured groups, so they're retrievable.
**Latham Hi-Tech Seeds**: numeric ratings `~1-9 where LOWER = BETTER`
(1 = best / most tolerant / most resistant) — **REVERSED from Bayer,
same direction as NK / AgriPro**. There's no on-page legend; the
direction was derived empirically (top-rated stalks/roots cluster at
1.0-1.5, weak traits at 3.0-3.5). Categorical values pass through
verbatim: SCN source (`PI 88788`), Phytophthora gene (`Rps 1k`),
Anthracnose (`ASR`). `NA`/blank = not rated.
**Stine Seed**: **corn** is `1-9 numeric, 9 = Excellent / best`
(HIGHER = better, same as Bayer — read from the on-page legend:
9 Excellent … 5 Below Average). **Soybeans are QUALITATIVE** (vigor
Excellent/Very Good/Good; disease Resistant/Strong/Good/Susceptible
where Resistant/Strong = best), with SCN source + RPS gene passed
through, not a number. So a Stine corn "8" is strong but a Latham
"8" is weak — never compare the raw numbers across these two.
**Burrus Seed**: numeric ratings `1-10, 10 = best / most tolerant`
(HIGHER = better; observed range 4-10). Herbicide tolerances and Bt
insect-protection are `Yes/No` (verbatim). `NR`/blank/`0`/`-` = not
rated. Covers brands Burrus / Power Plus / DONMARIO.
**1st Choice Seeds**: `0-10, HIGHER = better` (0-4 Below Average,
5 Average, 6 Good, 7 Very Good, 8 Excellent, 9-10 Superior). Many
older corn hybrids publish only partial ratings (source gap); wheat
is identity-only.
**RobSeeCo** (Rob-See-Co + Innotech): `1-9, 9 = Best` (HIGHER =
better, same direction as Bayer / Stine-corn); `-` = not available.
Plant Height 9=Tall, Ear Height 9=High; Planting Rate L/ML/M/MH/H;
**Product Fit Geography A=All, C=Central, E=East, W=West, CW=Central+West**
(a placement code, not a rating). Soy disease uses letter codes
(R/MR/S) + an SCN source (e.g. Peking) + Rps gene. Sourced from the
seed-guide PDF, so it's identity + structured ratings but no live web
page per variety.
**⚠️ Direction is NOT consistent across the independents.** HIGHER =
better: Bayer, Golden Harvest, Stine(corn), ProHarvest(disease),
Burrus(1-10), 1st Choice(0-10), **RobSeeCo(1-9)**. LOWER = better
(1 = best): NK, AgriPro, **Latham**. Qualitative (no direction):
Stine(soy), ProHarvest(general/agronomic), AgriPro(agronomic),
Ebbert's. A raw numeric rating is meaningless without its
`_scale_direction`.
**Always check the chunk's "Rating scale" line or call
`lookup_variety(source_key)` and look at `_scale_direction` if you
are unsure.** Cross-vendor comparisons are valid AFTER you've
confirmed each side uses the same direction.
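
One way to make the direction check mechanical is to re-express every numeric rating on a common "higher is better" score before comparing. The sketch below is illustrative only: the `Scale` class and `to_higher_is_better` are hypothetical names, and the scale bounds and direction must be filled in by hand from each record's `_scale_direction` text. A linear rescale only aligns direction and range; it does not claim that two vendors grade equally strictly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    low: float             # lowest value the vendor publishes
    high: float            # highest value the vendor publishes
    lower_is_better: bool  # True for NK / AgriPro / Latham style scales


def to_higher_is_better(value: float, scale: Scale) -> float:
    """Re-express a rating as a 0..1 score where 1.0 is always best."""
    if not scale.low <= value <= scale.high:
        raise ValueError(f"{value} is outside {scale.low}-{scale.high}")
    fraction = (value - scale.low) / (scale.high - scale.low)
    return 1.0 - fraction if scale.lower_is_better else fraction


# Directions as stated in this section.
STINE_CORN = Scale(1, 9, lower_is_better=False)
LATHAM = Scale(1, 9, lower_is_better=True)

stine_score = to_higher_is_better(8, STINE_CORN)   # strong
latham_score = to_higher_is_better(8, LATHAM)      # weak
```

The same raw "8" lands near the top of the common score for Stine corn and near the bottom for Latham, which is exactly the trap the warning above describes.
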
**Non-numeric values** appear for some characteristics and should be
read literally:
- `R`, `MR`, `S` for soybean disease resistance = Resistant / Moderately
Resistant / Susceptible (not 1-9).
- `Rps1c`, `Rps3a`, `Rps1k`, etc. = specific Phytophthora resistance
gene present.
- `R1`, `R3` (under SOYBEAN CYST NEMATODE) = effective against
SCN race 1 / race 3.
- `A`, `B`, `C` under HERBICIDE sensitivity = grade letters where A
is most tolerant.
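
Because numeric and non-numeric values share the same fields, it can help to classify a raw value before interpreting it. The following is a hedged sketch: `classify_rating` and its category labels are hypothetical, and the not-rated markers are the ones named in this section. Burrus's `0` marker is deliberately left out, because `0` is a valid rating on the 1st Choice `0-10` scale.

```python
import re


def classify_rating(raw: str) -> tuple[str, object]:
    """Sort a raw rating string into a coarse category plus its value."""
    text = raw.strip()
    if text in {"", "NA", "NR", "-"}:
        return ("not_rated", None)
    if re.fullmatch(r"\d+(\.\d+)?", text):
        return ("numeric", float(text))
    if text in {"R", "MR", "S"}:
        return ("resistance_letter", text)
    if re.fullmatch(r"Rps\s?\d[a-z]?", text):
        return ("phytophthora_gene", text)
    # Anything else (SCN source, herbicide grade, etc.) is read literally.
    return ("verbatim", text)
```

Only values classified as `numeric` need a scale direction; everything else should be passed to the farmer as written.
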
---
## maturity-semantics
Maturity is encoded differently per crop. Don't conflate the units.
**Corn — Relative maturity (RM days)**: integer roughly 75-120.
Lower = shorter season, suitable for higher latitudes / shorter
growing windows. 110 RM is a Central Iowa default; 85 RM suits
northern Minnesota or short-season silage; 115+ RM fits southern
Indiana / southern Illinois / Missouri Delta. The number is
**Pioneer-style RM days**, normalized across the industry.
**Soybeans — Maturity group (MG)**: float 00 (zero-zero) to 9.0
expressed with one decimal. A "3.5 MG" soybean is for central
Iowa. Northern North Dakota / Minnesota plant 0.0-1.5 MG. Mid-South
plants 5.0+. Each full MG step ≈ 7-10 days of additional season.
Sidecar field: `maturity_group` (e.g. "3.5", "0.7").
**Wheat — Class + heading**: Winter / spring decision is separate
from "class" (HRW / HRS / SRW / SWW / SWS / durum):
- HRW = Hard Red Winter — Plains states bread wheat
- HRS = Hard Red Spring — Northern Plains, North Dakota, Montana
- SRW = Soft Red Winter — Eastern Corn Belt, Ohio Valley
- SWW = Soft White Winter — Pacific Northwest
- SWS = Soft White Spring — Pacific Northwest
- Durum — North Dakota / Montana, pasta wheat
Maturity is qualitative: Early / Medium-Early / Medium / Medium-Late / Late.
**WestBred's product page JSON does not always expose the wheat class
as a structured field** — sometimes it's only in the marketing
narrative (e.g. "WB1376CLP is a Soft White Winter Clearfield® Plus
Wheat variety"). Read `positioning_statement` carefully when the
sidecar's `wheat_class` is null.
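
The fallback from `wheat_class` to `positioning_statement` could look like the sketch below. It assumes the sidecar is available as a plain dict with those two keys; the helper name `wheat_class_of` and the phrase table are illustrative, built from the class list above.

```python
from typing import Optional

# Class names from the list above, keyed by their spelled-out form.
WHEAT_CLASSES = {
    "hard red winter": "HRW",
    "hard red spring": "HRS",
    "soft red winter": "SRW",
    "soft white winter": "SWW",
    "soft white spring": "SWS",
    "durum": "Durum",
}


def wheat_class_of(sidecar: dict) -> Optional[str]:
    """Prefer the structured field; otherwise scan the marketing narrative."""
    structured = sidecar.get("wheat_class")
    if structured:
        return structured
    narrative = (sidecar.get("positioning_statement") or "").lower()
    for phrase, code in WHEAT_CLASSES.items():
        if phrase in narrative:
            return code
    return None  # unknown: say so rather than guess
```

Returning `None` when neither source names a class keeps the agent from inventing one.
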
---
## trait-glossary
Common trait codes that appear in `trait_stack`:
**Corn:**
- `SSRIB` — SmartStax® RIB Complete® corn blend (above + below-ground
insect protection + Roundup Ready + LibertyLink, with refuge-in-bag)
- `VT2PRIB` — VT Double PRO® RIB Complete® (above-ground insect
protection + Roundup Ready, refuge-in-bag)
- `VT4PRIB` — VT4 PRO® RIB Complete® (newer above-ground protection)
- `Trecepta` — Trecepta® (Trecepta + Roundup Ready + LibertyLink, for
earworm + western bean cutworm pressure)
- `SmartStax PRO` — SmartStax® PRO® (RNAi corn rootworm)
- `PowerCore` — PowerCore® Refuge Advanced (older above-ground stack)
- `Conventional` — no biotech traits (organic / specialty channels)
**Soybeans:**
- `XF` — XtendFlex® (Roundup Ready 2 Xtend + dicamba + glufosinate)
- `Xtend` — Roundup Ready 2 Xtend® (dicamba + glyphosate)
- `RR2Y` — Roundup Ready 2 Yield® (glyphosate only)
- `E3` — Enlist E3® (2,4-D + glyphosate + glufosinate)
- `LL/LL+GT27` — LibertyLink® / LibertyLink + GT27 (glufosinate +
glyphosate + isoxaflutole)
- `Conkesta E3` — Bt-stack for caterpillar pressure (BR/AR markets)
- `SR` — SR® (sulfonylurea-tolerant, Asgrow-specific)
**Wheat:**
- `Clearfield` / `CLP` — Clearfield® / Clearfield® Plus (imazamox
tolerance)
- `CoAXium` — CoAXium® (quizalofop tolerance) — note: AgriPro's
catalog flag, NOT in the WestBred corpus.
Always render the full trait name (`trait_descriptions`) when telling
the farmer "this variety has X trait" — bare trait codes are
ambiguous in print.
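
A small rendering helper illustrates the rule. This is a sketch under assumptions: `describe_traits` is a hypothetical name, `trait_descriptions` is assumed to be a code-to-name mapping taken from the sidecar, and the fallback table holds only a few entries copied from the glossary above.

```python
from typing import Optional

# A few entries from the glossary above, used only when the sidecar
# does not supply its own description for a code.
GLOSSARY = {
    "SSRIB": "SmartStax® RIB Complete® corn blend",
    "VT2PRIB": "VT Double PRO® RIB Complete®",
    "XF": "XtendFlex®",
    "E3": "Enlist E3®",
    "CLP": "Clearfield® Plus",
}


def describe_traits(trait_stack: list[str],
                    trait_descriptions: Optional[dict[str, str]] = None) -> str:
    """Render each trait as 'Full Name (CODE)', never as a bare code."""
    names = {**GLOSSARY, **(trait_descriptions or {})}
    parts = []
    for code in trait_stack:
        name = names.get(code)
        parts.append(f"{name} ({code})" if name else f"{code} (no description on record)")
    return ", ".join(parts)
```

Unknown codes are flagged explicitly instead of being silently passed through, so the farmer can see where the description is missing.
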
---
## scn-resistance
Soybean Cyst Nematode resistance ratings are critical for fields
with SCN pressure (most of the Corn Belt). Read carefully:
- `R3` under SOYBEAN CYST NEMATODE = Resistant to race 3 (the most
common race nationally). Most "SCN-resistant" soybeans on the
market are R3.
- `R1, R3` = Resistant to both race 1 AND race 3. Higher value;
useful in long-rotation SCN fields where race shifts have occurred.
- `MR3` = Moderately Resistant to race 3. Some yield loss expected
under high SCN pressure.
- `S` = Susceptible.
- Some Bayer Asgrow XF lines (e.g. AG29XF4) use **Peking-type SCN
resistance**, which is genetically distinct from the more common
PI 88788 source. Peking is more durable when SCN populations
have eroded PI 88788 effectiveness. Look for "Peking type" in the
positioning statement.
**Recommended workflow when a farmer asks about SCN:** call
`search_docs` with the user's MG range + "SCN-resistant", then
`lookup_variety` on the top 2-3 candidates to verify the exact race
coverage and resistance source.
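
When verifying race coverage, the codes above can be decoded with a small parser. The sketch below is illustrative: `parse_scn` and its return shape are hypothetical, and it only understands the `R<n>`, `MR<n>`, and `S` forms listed in this section.

```python
import re


def parse_scn(raw: str) -> dict:
    """Decode an SCN rating such as 'R3', 'R1, R3', 'MR3', or 'S'."""
    text = raw.strip().upper()
    if text == "S":
        return {"level": "susceptible", "races": []}
    # 'MR3' also contains 'R3', so the race number is found either way.
    races = [int(n) for n in re.findall(r"R(\d+)", text)]
    if not races:
        return {"level": "unknown", "races": []}
    level = "moderately resistant" if text.startswith("MR") else "resistant"
    return {"level": level, "races": races}
```

The resistance source (PI 88788 versus Peking) is not encoded in these codes and still has to be read from the positioning statement.
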
---
## regional-listings
The `regional_recommendations` array in each sidecar is sourced from
Bayer's "local profiles" — varieties get assigned to regional Seed
Guide bundles (e.g. *"2026 Washington, Oregon, SEED GUIDE"*) with a
named regional agronomist contact. This is the closest signal we have
to *"is this variety recommended for the farmer's geography?"* but
note:
- A variety being absent from a regional listing **does not** mean
it's unsuitable — Bayer's local agronomists curate these lists.
- Listings are vendor-side recommendations, not third-party trial
data.
- When the farmer mentions a region, try filtering or scanning for
varieties whose `regional_recommendations[].product_list_name`
mentions that region.
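
The region scan in the last bullet might be sketched like this. It assumes each sidecar is a dict whose `regional_recommendations` entries carry a `product_list_name` string; the function name `recommended_in` is hypothetical.

```python
def recommended_in(sidecars: list[dict], region: str) -> list[dict]:
    """Keep sidecars with a regional listing whose name mentions `region`."""
    needle = region.lower()
    return [
        sidecar
        for sidecar in sidecars
        if any(
            needle in (rec.get("product_list_name") or "").lower()
            for rec in sidecar.get("regional_recommendations") or []
        )
    ]
```

An empty result means only that no listing mentions the region. Per the first bullet above, it does not mean the remaining varieties are unsuitable.
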
Other vendors handle regional placement differently. Golden Harvest
publishes a separate "plot report" system per state/year/site;
NK publishes ratings as PDF tech sheets without regional flags.
---
## sources-not-yet-indexed
If `list_versions()` doesn't show a vendor in the `vendor` facet, the
corpus does not have it yet. Direct the farmer to that vendor's
public catalog or their seed dealer.
**Already indexed**: Bayer (DEKALB / Asgrow / WestBred), Syngenta
(Golden Harvest, NK, AgriPro).
**Not yet indexed**:
- **Beck's PFR (research)** — 2,089 head-to-head trial documents
on the public Sanity GROQ API. Different shape from variety
records — these are studies, not hybrids. Surfacing them would
benefit a separate tool (e.g. `search_pfr_studies`) rather than
share a corpus with variety identity.
- **Beck's products** — ~860 products. Identity-only (SeedIQ login
gates the ratings).
---
## trial-data
The MCP exposes TWO complementary surfaces:
* **`search_docs`** — variety IDENTITY (what a hybrid IS):
disease ratings, trait stack, maturity, vendor positioning.
* **`search_trials`** — variety PERFORMANCE (how it ACTUALLY did):
ranked yield at specific cooperator fields and regions.
**Indexed trial sources**:
- **Golden Harvest plot reports** (~4,600 cross-vendor head-to-head
trials, 2024+2025). Each trial = one cooperator's field at a
specific state/year, comparing products from multiple brands
(NK / DEKALB / Golden Harvest / Enogen / Pioneer / Channel, etc.)
side by side. **This is the closest thing to independent
comparison data the corpus has** — Bayer doesn't publish its own
trial data, but GH publishes plots where DEKALB hybrids appear
alongside their competitors.
- **AgriPro regional trial PDFs** (14 PDFs) — multi-year
multi-location wheat performance for Northern Plains / Pacific
Northwest / Plains regions. Variety + per-location yields
preserved verbatim.
- **LG Seeds + AgriGold plot reports** (AgReliant) — additional
cross-vendor corn/soy plots (same head-to-head structure as the
GH reports).
- **ProHarvest Seeds plot reports** (corn + soy, 2024+2025) —
per-cooperator harvest reports from an independent Corn Belt brand.
Many are cross-vendor (ProHarvest / Apex vs Pioneer / DEKALB /
Becks / Merschman, etc.). Structured rank/yield/%H2O/test-weight
tables where the PDF fits ProHarvest's template; foreign-format
third-party reports are kept verbatim (`raw_text`) so the yields
are still searchable. Image-only PDFs (no text layer) are skipped.
- **University-extension variety trials** (`illinois_vt_trials`,
`iowa_icpt_trials`, `ohio_ocpt_trials`, 2024+2025) — **the
independent third-party gold standard.** Land-grant programs (U of
Illinois VT, Iowa State ICPT, Ohio OCPT) that test every *entered*
brand side-by-side at the same sites with replication + LSD stats.
The publisher is the university; the seed brands are in each row's
`brand`. **This is where Pioneer / DEKALB / Channel / Brevant
performance is legitimately available** (they enter these public
trials even though we can't scrape their own sites). Caveat: a brand
only appears where it *entered* — e.g. Brevant didn't enter Iowa
ICPT, DEKALB/Channel didn't enter Illinois VT; absence in one
program is a true negative, not missing data. Illinois adds wheat;
Iowa/Ohio are corn+soy. (Purdue PCPP + other states deferred.)
**Recommended workflow when a farmer asks about performance**:
1. Call `search_trials(crop, state, year, ...)` to find trials
from the relevant region/season.
2. Identify the top performers in the rankings.
3. Call `lookup_variety(source_key=...)` for each leading hybrid to
verify identity (RM, traits, disease ratings) — confirm the
variety actually fits the farmer's situation, not just that it
won someone else's trial.
4. If the leading variety is from a brand whose trial data isn't
directly published (e.g. DEKALB), the GH plot reports often
show it competing — that's still the agent's best public
third-party signal.
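
Steps 1 to 3 of the workflow can be outlined in code. The sketch below passes the two tools in as callables so it stays self-contained. The response shapes are assumptions, not the documented tool output: trial rows are assumed to carry `rank` and `source_key`, and `lookup_variety` is assumed to return `None` when no identity record exists (as for Pioneer entries).

```python
from typing import Callable, Optional


def shortlist(search_trials: Callable[..., list[dict]],
              lookup_variety: Callable[..., Optional[dict]],
              crop: str, state: str, year: int, top_n: int = 3) -> list[dict]:
    """Pair the top-ranked trial rows with their identity records."""
    rows = search_trials(crop=crop, state=state, year=year)
    leaders = sorted(rows, key=lambda row: row["rank"])[:top_n]
    results = []
    for row in leaders:
        identity = lookup_variety(source_key=row["source_key"])
        # identity may be None: report the yield, but do not invent specs.
        results.append({"trial": row, "identity": identity})
    return results
```

Keeping the trial row and the identity record side by side makes it easy to check fit (RM, traits, disease ratings) before recommending a trial winner.
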
**Trial data NOT in the corpus** (don't fabricate):
- **DEKALB / Asgrow / Channel** per-variety yield trials —
Bayer keeps these in rep tools, not on the public catalog. The
GH plot reports surface DEKALB/Asgrow performance indirectly,
but per-variety dedicated trials aren't indexed.
- **NK yield results** — the data exists at
`syngenta-us.com/nk/yield-results` but the ASMX endpoint is
fiddly; not yet scraped. The variety identity is in the corpus
(`search_docs` finds it), just not the per-region trial yields.
- **Pioneer trials** — ToS bans automation, so we have no Pioneer
*identity* data and don't scrape Pioneer's own results. BUT
Pioneer *performance* IS now available indirectly via the
university-extension trials (and the GH/ProHarvest plots) where
Pioneer entered — search those for Pioneer head-to-head yields;
for Pioneer variety specs, direct the farmer to a dealer.
- **University extension trials** — NOW INDEXED for IL / IA / OH
(`illinois_vt_trials` / `iowa_icpt_trials` / `ohio_ocpt_trials`,
2024+2025). Purdue PCPP and other states (NE / WI / MN / the
Dakotas / Kansas wheat) are not yet indexed — a future enrichment.
**Reading a GH plot report**:
Each plot has a cooperator name (the farmer running the trial), a
state, a year, planting/harvest dates, population, row width, and a
ranked table of products. The columns vary by crop:
- **Corn / Soy**: Rank | Brand | Product | Traits | Yield BU/Ac
| %MST | Test Weight | Gross Revenue
- **Silage**: Rank | Brand | Product | Traits | Ton/Acre
| Milk Per Acre | Milk Per Ton | Beef Per Acre | Beef Per Ton
Rank 1 = top performer at that site/year. Note that a single plot
is one data point — for a robust recommendation, look across
multiple plots from the same region.
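
Looking across plots can be as simple as averaging each product's yield over the plots where it appears. The sketch below assumes each plot is a list of row dicts with `brand`, `product`, and `yield_bu_ac` keys; those key names and the `average_yield` function are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_yield(plots: list[list[dict]], min_plots: int = 2) -> list[tuple[str, float, int]]:
    """Average yield per product across plots, best first.

    Products seen in fewer than `min_plots` plots are dropped, since a
    single plot is only one data point.
    """
    yields = defaultdict(list)
    for plot in plots:
        for row in plot:
            yields[(row["brand"], row["product"])].append(row["yield_bu_ac"])
    summary = [
        (f"{brand} {product}", round(mean(values), 1), len(values))
        for (brand, product), values in yields.items()
        if len(values) >= min_plots
    ]
    return sorted(summary, key=lambda item: item[1], reverse=True)
```

Each result carries the plot count, so a recommendation can state how many sites support it.
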
---
## checking-your-work
Before quoting a specific number to a farmer, **always** call
`lookup_variety(source_key=...)` to confirm. The chunk text inside a
search_docs response is a faithful render of the sidecar, but the
sidecar IS the source of truth. Quoting from the canonical sidecar
makes you robust against:
- Chunk-text formatting bugs (e.g. a rare unicode issue trimming a
value).
- Future chunker changes (a re-index might rewrite the body).
- Cross-vendor scale-direction differences (the sidecar's
`_scale_direction` lets you state the convention explicitly).
If `lookup_variety` returns "not found" but `search_docs` surfaced the
chunk, that's a bug — please report it. (In normal operation, every
chunk's `source_key` round-trips to a valid sidecar.)
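
A quoting helper that reads only from the sidecar shows the idea. This is a sketch under assumptions: the sidecar is treated as a dict, the `ratings` key is a hypothetical stand-in for wherever the record keeps its ratings, and `quote_rating` is an illustrative name.

```python
def quote_rating(sidecar: dict, characteristic: str) -> str:
    """Build a quotable line from the sidecar, with its scale stated."""
    ratings = sidecar.get("ratings") or {}  # hypothetical key name
    if characteristic not in ratings:
        return f"No {characteristic} rating is on record for {sidecar['source_key']}."
    scale = sidecar.get("_scale_direction", "scale direction not recorded")
    return f"{characteristic}: {ratings[characteristic]} [scale: {scale}]"
```

Stating the scale next to the number means the farmer never sees a bare rating, and a missing value produces an explicit "not on record" message instead of a guess.
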