Add ProHarvest Seeds: 119 varieties + 161 cross-vendor plot reports

ProHarvest Seeds (independent Corn Belt brand, proharvestseeds.com)
exposes a public, no-auth WordPress REST API: cleaner ingestion than
the HTML-only independents. Two new sources:

- `proharvest` (variety identity, 119 row-crop varieties: 70 corn /
  47 soy / 2 wheat). Enumerated via /wp/v2/seed (seed-type taxonomy),
  agronomics parsed from each /seed/<slug>/ detail page into structured
  characteristics_groups so the ratings actually embed. Mixed scale:
  disease 1-9 numeric (9=best, no flip), agronomic/general qualitative,
  soil HR/R.
- `proharvest_plots` (trials, data_type=trial, 161 plots, 2024+2025).
  Per-cooperator harvest reports via the custom
  /wp-json/proharvest/v1/plots?y= endpoint + PDF table extraction.
  Many are cross-vendor head-to-head (ProHarvest/Apex vs
  Pioneer/DEKALB/Becks/Channel/Wyffels). Handles ruled tables, unruled
  tables (text fallback; soy drops the Test-Wt column → 4 vs 5
  numerics), and off-template third-party reports (sanity-gated to
  verbatim so junk rows never ship). Image-only PDFs skipped + counted.

- rag/chunk.py: route proharvest_plots through the shared cross-vendor
  plot renderer (structured) / verbatim body (raw_text fallback).
- sources.json + lessons.md (rating-scales, trial-data).
- README/CLAUDE.md corpus inventory brought current (it had drifted:
  bayer 931 not 475; ebberts/lg/agrigold were unlisted).

New totals: 1,645 variety + 6,787 trial records.

robots.txt permissive (only search + /dealer-* disallowed); no ToS
automation clause. CI rebuilds the index from the committed corpus.
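The unruled-table text fallback mentioned in the commit message (soy rows carry 4 numerics, corn rows 5 because of the Test-Wt column) could look roughly like the sketch below. This is not the committed extractor: the function name, the column names, their order, and the sample rows are all assumptions made for illustration. The only detail taken from the message is that soy has one fewer numeric column than corn.

```python
import re
from typing import Dict, Optional

# Matches plain integers and decimals such as "1", "17.5", "241.3".
NUMERIC = re.compile(r"^-?\d+(?:\.\d+)?$")

# ASSUMED column layouts. The commit only says corn rows have 5 numerics
# and soy rows 4 (no Test-Wt); the names and their order are hypothetical.
LAYOUTS = {
    "corn": ("rank", "rel_maturity", "moisture_pct", "test_weight", "yield_bu_ac"),
    "soy": ("rank", "rel_maturity", "moisture_pct", "yield_bu_ac"),
}


def parse_unruled_row(line: str, crop: str) -> Optional[Dict[str, object]]:
    """Split one text line into an entry label plus trailing numeric columns.

    Returns None when the line does not end in the expected number of
    numerics, so header and junk lines are dropped rather than guessed at.
    """
    columns = LAYOUTS[crop]
    tokens = line.split()
    n = len(columns)
    if len(tokens) <= n:  # need at least one token left over for the label
        return None
    tail = tokens[-n:]
    if not all(NUMERIC.match(tok) for tok in tail):
        return None
    row: Dict[str, object] = {"entry": " ".join(tokens[:-n])}
    row.update({name: float(value) for name, value in zip(columns, tail)})
    return row
```

Taking the numerics from the right-hand end, with the count fixed by crop, keeps a numeric-looking variety name (for example a made-up "ProHarvest 6110") in the label instead of shifting every column by one.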
@@ -92,6 +92,16 @@ as Best/Good/Fair/Poor labels which are qualitative.)
data is publicly available, so most disease/agronomic ratings are
absent from Beck's records in this corpus.

**ProHarvest Seeds**: **mixed scales** on one record. *Disease
Tolerance* is `1-9 numeric, 9 = best / most tolerant` (same direction
as Bayer — no flip; `NA` = not rated). *General Characteristics* and
*Agronomic Features* are qualitative (`Excellent / Very Good / Good /
Average`) with a few raw numerics (GDD pollination/black-layer, kernel
rows). *Soil Adaptability* uses `HR` (highly recommended) / `R`
(recommended). The single `_scale_direction` line on the record states
all three. Ebbert's-style independent brand, but ratings ARE parsed
into structured groups so they're retrievable.

**Always check the chunk's "Rating scale" line or call
`lookup_variety(source_key)` and look at `_scale_direction` if you
are unsure.** Cross-vendor comparisons are valid AFTER you've
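
To make the mixed-scale note in this hunk concrete, here is a minimal sketch of how a consumer might interpret one ProHarvest rating by group. It is an illustration only: `interpret_rating` and the shape of its return value are hypothetical and not part of the MCP surface or the committed parser. The group names and scale semantics are the ones stated in the added text.

```python
from typing import Dict

# Qualitative labels in the order the doc lists them (best first).
QUALITATIVE_LABELS = ["Excellent", "Very Good", "Good", "Average"]
SOIL_CODES = {"HR": "highly recommended", "R": "recommended"}


def interpret_rating(group: str, value: str) -> Dict[str, object]:
    """Classify one raw rating string according to its characteristics group.

    Hypothetical helper: the real records carry a `_scale_direction` line,
    which remains the authoritative statement of the scale.
    """
    raw = value.strip()
    if group == "Disease Tolerance":
        if raw.upper() == "NA":
            return {"kind": "not_rated", "raw": raw}
        try:
            score = int(raw)
        except ValueError:
            return {"kind": "unknown", "raw": raw}
        if not 1 <= score <= 9:
            return {"kind": "unknown", "raw": raw}
        # 9 = best / most tolerant, same direction as Bayer, so no flip.
        return {"kind": "numeric_1_9", "score": score,
                "higher_is_better": True, "raw": raw}
    if group == "Soil Adaptability":
        meaning = SOIL_CODES.get(raw.upper())
        if meaning is None:
            return {"kind": "unknown", "raw": raw}
        return {"kind": "soil", "meaning": meaning, "raw": raw}
    # General Characteristics / Agronomic Features: qualitative labels,
    # plus a few raw numerics (GDD values, kernel rows).
    if raw in QUALITATIVE_LABELS:
        return {"kind": "qualitative", "label": raw,
                "steps_below_best": QUALITATIVE_LABELS.index(raw), "raw": raw}
    try:
        return {"kind": "raw_numeric", "value": float(raw), "raw": raw}
    except ValueError:
        return {"kind": "unknown", "raw": raw}
```

For example, `interpret_rating("Disease Tolerance", "8")` is tagged `numeric_1_9` with `higher_is_better` set, while `interpret_rating("Soil Adaptability", "HR")` maps to "highly recommended". Treating the listed label order as a ranking is an assumption of this sketch.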
@@ -275,6 +285,16 @@ The MCP exposes TWO complementary surfaces:
  multi-location wheat performance for Northern Plains / Pacific
  Northwest / Plains regions. Variety + per-location yields
  preserved verbatim.
- **LG Seeds + AgriGold plot reports** (AgReliant) — additional
  cross-vendor corn/soy plots (same head-to-head structure as the
  GH reports).
- **ProHarvest Seeds plot reports** (corn + soy, 2024+2025) —
  per-cooperator harvest reports from an independent Corn Belt brand.
  Many are cross-vendor (ProHarvest / Apex vs Pioneer / DEKALB /
  Becks / Merschman, etc.). Structured rank/yield/%H2O/test-weight
  tables where the PDF fits ProHarvest's template; foreign-format
  third-party reports are kept verbatim (`raw_text`) so the yields
  are still searchable. Image-only PDFs (no text layer) are skipped.

**Recommended workflow when a farmer asks about performance**:

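
The three outcomes described for ProHarvest plot reports (structured table, verbatim `raw_text`, skipped image-only PDF) can be summarised in a short sketch. It assumes the PDF text and candidate rows have already been extracted. The function names, field names, and plausibility thresholds are hypothetical choices for illustration, not the values used by the committed sanity gate.

```python
from collections import Counter
from typing import Dict, List

# ASSUMED plausibility bounds; the commit does not state its thresholds.
MAX_YIELD_BU_AC = 400.0
MAX_MOISTURE_PCT = 50.0


def row_is_plausible(row: Dict[str, object]) -> bool:
    """Accept a row only if yield and moisture are numbers in a sane range."""
    y = row.get("yield_bu_ac")
    m = row.get("moisture_pct")
    if not isinstance(y, (int, float)) or not isinstance(m, (int, float)):
        return False
    return 0 < y <= MAX_YIELD_BU_AC and 0 < m <= MAX_MOISTURE_PCT


def choose_plot_body(pdf_text: str, rows: List[Dict[str, object]]) -> Dict[str, object]:
    """Pick structured rows, verbatim text, or a skip for one plot PDF."""
    if not pdf_text.strip():
        # No text layer: nothing to parse and nothing to keep verbatim.
        return {"status": "skipped", "reason": "image_only"}
    if rows and all(row_is_plausible(r) for r in rows):
        return {"status": "structured", "rows": rows}
    # Off-template report: keep the text so yields stay searchable,
    # but do not ship rows that failed the sanity check.
    return {"status": "raw_text", "raw_text": pdf_text}


def tally(results: List[Dict[str, object]]) -> Counter:
    """Count outcomes so skipped PDFs are reported rather than silently lost."""
    return Counter(str(r["status"]) for r in results)
```

The gate is all-or-nothing per report in this sketch: a single implausible row sends the whole report to `raw_text`. That is one way to read "junk rows never ship", and a per-row filter would be an equally valid design.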