ProHarvest Seeds (independent Corn Belt brand, proharvestseeds.com) exposes
a public, no-auth WordPress REST API — cleaner ingestion than the HTML-only
independents. Two new sources:
- `proharvest` (variety identity, 119 row-crop varieties: 70 corn / 47 soy /
2 wheat). Enumerated via /wp/v2/seed (seed-type taxonomy); agronomics are
parsed from each /seed/<slug>/ detail page into structured
characteristics_groups so the ratings actually embed. Mixed scale: disease
1-9 numeric (9=best, so no flip needed), agronomic/general qualitative,
soil HR/R.
- `proharvest_plots` (trials, data_type=trial, 161 plots, 2024+2025).
Per-cooperator harvest reports via the custom
/wp-json/proharvest/v1/plots?y= endpoint + PDF table extraction. Many are
cross-vendor head-to-head (ProHarvest/Apex vs
Pioneer/DEKALB/Beck's/Channel/Wyffels). Handles ruled tables, unruled
tables (text fallback; soy drops the Test-Wt column → 4 vs 5 numerics),
and off-template third-party reports (a sanity gate falls these back to
the verbatim body so junk rows never ship). Image-only PDFs are skipped
and counted.
Also in this change:
- rag/chunk.py: route proharvest_plots through the shared cross-vendor plot
renderer (structured) / verbatim body (raw_text fallback).
- sources.json + lessons.md (rating-scales, trial-data).
- README/CLAUDE.md corpus inventory brought current (it had drifted: bayer
931 not 475; ebberts/lg/agrigold were unlisted). New totals: 1,645 variety
+ 6,787 trial records.
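
The unruled-table text fallback and its sanity gate can be sketched roughly as below. This is a minimal illustration, not the committed extractor: `parse_row`, `parse_table`, `EXPECTED_NUMERICS`, and the `min_ratio` threshold are all hypothetical names and values. The only detail taken from the notes above is that corn rows carry 5 trailing numerics and soy rows 4 (no Test-Wt).

```python
import re
from typing import Optional

# Hypothetical: trailing numeric columns expected per crop.
# Soy reports omit the Test-Wt column, hence 4 instead of 5.
EXPECTED_NUMERICS = {"corn": 5, "soy": 4}

_NUM = re.compile(r"^-?\d+(?:\.\d+)?$")


def parse_row(line: str, crop: str) -> Optional[dict]:
    """Split one text line into an entry label plus its trailing numerics."""
    want = EXPECTED_NUMERICS[crop]
    tokens = line.split()
    if len(tokens) <= want:
        return None  # no room for a label in front of the numbers
    tail = tokens[-want:]
    if not all(_NUM.match(t) for t in tail):
        return None  # header, footnote, or prose line
    return {
        "entry": " ".join(tokens[:-want]),
        "values": [float(t) for t in tail],
    }


def parse_table(
    lines: list[str], crop: str, min_ratio: float = 0.5
) -> Optional[list[dict]]:
    """Return structured rows, or None to signal 'keep the verbatim text'."""
    candidates = [ln for ln in lines if ln.strip()]
    rows = [r for r in (parse_row(ln, crop) for ln in candidates) if r]
    # Sanity gate: if too few lines look like data rows, the report is
    # probably off-template, so the caller should fall back to raw text.
    if not candidates or len(rows) / len(candidates) < min_ratio:
        return None
    return rows
```

A caller would treat a `None` result as the cue to store the page text verbatim instead of shipping partially parsed rows.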
robots.txt is permissive (only search + /dealer-* are disallowed), and the
ToS has no automation clause. CI rebuilds the index from the committed corpus.
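
For reference, enumerating a WordPress REST collection such as /wp/v2/seed usually means paging until the `X-WP-TotalPages` response header is exhausted (WordPress caps `per_page` at 100). The sketch below assumes the route lives at `https://proharvestseeds.com/wp-json/wp/v2/seed`. `http_fetch`, `iter_posts`, and the User-Agent string are hypothetical and not taken from the committed scraper.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Assumed full URL for the /wp/v2/seed route mentioned above.
BASE = "https://proharvestseeds.com/wp-json/wp/v2/seed"

PageFetcher = Callable[[int], tuple[list[dict], int]]


def http_fetch(page: int, per_page: int = 100) -> tuple[list[dict], int]:
    """Fetch one page; return (items, total_pages) from X-WP-TotalPages."""
    query = urllib.parse.urlencode({"per_page": per_page, "page": page})
    req = urllib.request.Request(
        f"{BASE}?{query}",
        headers={"User-Agent": "corpus-ingest-sketch"},  # hypothetical UA
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        total_pages = int(resp.headers.get("X-WP-TotalPages", "1"))
        return json.load(resp), total_pages


def iter_posts(fetch: PageFetcher = http_fetch) -> Iterator[dict]:
    """Yield every post in the collection, page by page."""
    page = 1
    while True:
        items, total_pages = fetch(page)
        yield from items
        if not items or page >= total_pages:
            break
        page += 1
```

Passing the fetcher in as a parameter keeps the pagination loop testable offline; a test can supply a stub that returns canned pages instead of hitting the site.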