Add ProHarvest Seeds: 119 varieties + 161 cross-vendor plot reports

ProHarvest Seeds (independent Corn Belt brand, proharvestseeds.com) exposes a public, no-auth WordPress REST API, which makes ingestion cleaner than for the HTML-only independents.

Two new sources:

- `proharvest` (variety identity, 119 row-crop varieties: 70 corn / 47 soy / 2 wheat). Enumerated via /wp/v2/seed (seed-type taxonomy); agronomics are parsed from each /seed/<slug>/ detail page into structured characteristics_groups so the ratings actually embed. Mixed scale: disease 1-9 numeric (9=best, no flip), agronomic/general qualitative, soil HR/R.
- `proharvest_plots` (trials, data_type=trial, 161 plots, 2024+2025). Per-cooperator harvest reports via the custom /wp-json/proharvest/v1/plots?y= endpoint + PDF table extraction. Many are cross-vendor head-to-head (ProHarvest/Apex vs Pioneer/DEKALB/Becks/Channel/Wyffels). Handles ruled tables, unruled tables (text fallback; soy drops the Test-Wt column, so rows carry 4 numerics instead of 5), and off-template third-party reports (sanity-gated to verbatim so junk rows never ship). Image-only PDFs are skipped and counted.

- rag/chunk.py: route proharvest_plots through the shared cross-vendor plot renderer (structured) / verbatim body (raw_text fallback).
- sources.json + lessons.md (rating-scales, trial-data).
- README/CLAUDE.md corpus inventory brought current (it had drifted: bayer 931 not 475; ebberts/lg/agrigold were unlisted). New totals: 1,645 variety + 6,787 trial records.

robots.txt is permissive (only search + /dealer-* disallowed); no ToS automation clause. CI rebuilds the index from the committed corpus.
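
For readers unfamiliar with WordPress REST enumeration, here is a minimal sketch of how a custom post type such as `seed` can be paged through. It is not the scraper from this commit. The full URL (`/wp-json` prefix plus the `/wp/v2/seed` route named above) is an assumption, and `iter_seeds`, `http_fetch`, and the injectable `fetch` parameter are hypothetical names. The `per_page` and `page` query parameters and the `X-WP-TotalPages` response header are standard WordPress REST API behaviour.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed full URL: the commit only names the "/wp/v2/seed" route.
BASE = "https://proharvestseeds.com/wp-json/wp/v2/seed"

# A fetcher takes a URL and returns (items_on_page, total_pages).
Fetch = Callable[[str], tuple[list[dict], int]]


def http_fetch(url: str) -> tuple[list[dict], int]:
    """Fetch one page of results over HTTP (no auth needed for public routes)."""
    with urlopen(url, timeout=30) as resp:
        total_pages = int(resp.headers.get("X-WP-TotalPages", "1"))
        return json.load(resp), total_pages


def iter_seeds(fetch: Fetch = http_fetch, per_page: int = 100) -> Iterator[dict]:
    """Yield every seed record, following WordPress page-based pagination."""
    page, total_pages = 1, 1
    while page <= total_pages:
        query = urlencode({"per_page": per_page, "page": page})
        items, total_pages = fetch(f"{BASE}?{query}")
        yield from items
        page += 1
```

Passing the fetcher in as a parameter keeps the pagination loop testable offline. A real run would likely also want a descriptive User-Agent and a delay between requests.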
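
The unruled-table text fallback and the sanity gate described in the commit message can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code from this commit: `parse_row`, `parse_report`, the `min_rows` / `min_ratio` thresholds, and the output dictionary shape are all hypothetical. It relies only on the 4-versus-5 numeric column counts mentioned above and makes no claim about which column is which.

```python
import re

NUMERIC = re.compile(r"-?\d+(?:\.\d+)?")


def parse_row(line: str):
    """Split a text row into (label, numerics) by peeling numbers off the right.

    Returns None unless the row ends in exactly 4 or 5 numeric tokens.
    Limitation: a purely numeric variety name would be counted as a number.
    """
    tokens = line.split()
    numerics = []
    while tokens and NUMERIC.fullmatch(tokens[-1]):
        numerics.append(float(tokens.pop()))
    numerics.reverse()
    if not tokens or len(numerics) not in (4, 5):
        return None
    return " ".join(tokens), numerics


def parse_report(text: str, min_rows: int = 3, min_ratio: float = 0.5) -> dict:
    """Return structured rows, or the verbatim text if the report looks off-template.

    The thresholds are illustrative assumptions, not values from the repo.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rows = [row for row in map(parse_row, lines) if row is not None]
    widths = {len(nums) for _, nums in rows}
    fits_template = (
        len(rows) >= min_rows
        and len(widths) == 1  # every row has the same number of numerics
        and len(rows) / len(lines) >= min_ratio
    )
    if not fits_template:
        return {"mode": "verbatim", "raw_text": text}
    # 5 numerics: Test-Wt column present; 4: the column is dropped (soy).
    return {"mode": "structured", "has_test_wt": widths == {5}, "rows": rows}
```

Gating on a consistent column count and a minimum share of parseable lines means a report that only partly matches the template is kept as raw text rather than emitted as a few misaligned rows.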
@@ -10,22 +10,29 @@ vendors — **variety identity** (what each hybrid IS) plus **yield-trial data**

 ## What's in the corpus

-**5,073 indexed chunks** across two complementary surfaces:
+**~8,400 indexed records** (one chunk each) across two complementary surfaces:

-### Variety identity — 760 records
+### Variety identity — 1,645 records

 | Source | Count | Vendor | Brand |
 |---|---|---|---|
-| `bayer_seeds` | 475 | Bayer | DEKALB (corn) / Asgrow (soy) / WestBred (wheat) |
+| `bayer_seeds` | 931 | Bayer | DEKALB / Channel (corn) / Asgrow (soy) / WestBred (wheat) / Deltapine |
+| `lg_seeds` | 170 | AgReliant | LG Seeds (corn / soy / sorghum) |
 | `golden_harvest` | 139 | Syngenta | Golden Harvest (corn / soy) |
 | `nk` | 122 | Syngenta | NK (corn / soy) |
+| `proharvest` | 119 | ProHarvest Seeds | ProHarvest / Apex (corn / soy / wheat) — **independent Corn Belt brand** |
+| `agrigold` | 111 | AgReliant | AgriGold (corn / soy) |
+| `ebberts_seeds` | 29 | Ebbert's Seeds | Ebbert's (corn / soy / wheat) — independent E. Corn Belt breeder |
 | `agripro` | 24 | Syngenta | AgriPro (wheat — HRW / HRS / HWS / SWW) |

-### Yield-trial data — 4,313 documents
+### Yield-trial data — 6,787 documents

 | Source | Count | Notes |
 |---|---|---|
-| `gh_plot_reports` | 4,299 | Golden Harvest plot reports 2024+2025. **Cross-vendor head-to-head** — DEKALB / NK / GH / Pioneer / Channel all appear in the same trial rankings. The closest thing to independent comparison data the corpus has. |
+| `gh_plot_reports` | 4,299 | Golden Harvest plot reports 2024+2025. **Cross-vendor head-to-head** — DEKALB / NK / GH / Pioneer / Channel all appear in the same trial rankings. |
+| `lg_plot_reports` | 1,307 | LG Seeds (AgReliant) cross-vendor plots, top-5 per site, 2024+2025. |
+| `agrigold_plot_reports` | 1,006 | AgriGold (AgReliant) cross-vendor plots, full ranking + rich plot management, 2024+2025. |
+| `proharvest_plots` | 161 | ProHarvest Seeds per-cooperator harvest reports (corn / soy, 2024+2025). Many are **cross-vendor** (ProHarvest / Apex vs Pioneer / DEKALB / Becks / Channel / Wyffels). Structured rank/yield/%H2O/test-wt where the PDF fits the template; off-template third-party reports kept verbatim. |
 | `agripro_trials` | 14 | Regional wheat trial PDF summaries (PNW, Western Plains, Northern Plains, etc.) |

 ### Not in the corpus (documented in `docs_mcp/lessons.md`)