Add ProHarvest Seeds: 119 varieties + 161 cross-vendor plot reports #16
Adds ProHarvest Seeds (independent Corn Belt brand, proharvestseeds.com) to the corpus. ProHarvest exposes a public, no-auth WordPress REST API — cleaner than the HTML-only independents (Ebbert's). Two new sources.
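A minimal sketch of how a no-auth WordPress REST collection can be enumerated page by page. The `page` and `per_page` query parameters and the `X-WP-TotalPages` response header are standard WordPress REST pagination. That the ProHarvest seed route honors them is an assumption, and `iter_collection` and `http_fetcher` are hypothetical names, not the PR's code.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical fetcher shape: (page, per_page) -> (records, total_pages).
Fetcher = Callable[[int, int], tuple[list[dict], int]]


def http_fetcher(collection_url: str) -> Fetcher:
    """Build a fetcher for one collection URL (assumed to be a /wp-json/... route)."""
    def fetch(page: int, per_page: int) -> tuple[list[dict], int]:
        query = urlencode({"page": page, "per_page": per_page})
        with urlopen(f"{collection_url}?{query}", timeout=30) as resp:
            # WordPress reports the page count in this response header.
            total_pages = int(resp.headers.get("X-WP-TotalPages", "1"))
            return json.load(resp), total_pages
    return fetch


def iter_collection(fetch_page: Fetcher, per_page: int = 100) -> Iterator[dict]:
    """Yield every record, stopping at the last reported page or an empty page."""
    page = 1
    while True:
        records, total_pages = fetch_page(page, per_page)
        yield from records
        if not records or page >= total_pages:
            break
        page += 1
```

Keeping the HTTP call behind a small callable lets the paging loop be exercised offline with a fake fetcher, which suits a scraper that should not hit the vendor site during tests.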
proharvest: variety identity (119 records)

70 corn-hybrid + 47 soybean + 2 wheat (forage/grass/cover-crop/sweet-corn excluded as out of scope). Enumerated via /wp/v2/seed (seed-type taxonomy); acf/content aren't registered to REST, so agronomics are parsed from each /seed/<slug>/ detail page (<h2> spec sections of <strong>label</strong><div>value</div> pairs) into structured characteristics_groups so the ratings embed (unlike ebberts_seeds, which left them body-only). Mixed scale, documented in _scale_direction + lessons: Disease Tolerance 1-9 numeric (9=best, same direction as Bayer/NK, no flip); General/Agronomic qualitative (Good/Very Good); Soil Adaptability HR/R.

proharvest_plots: cross-vendor yield trials (161 docs, data_type=trial)

Per-cooperator harvest reports via the custom GET /wp-json/proharvest/v1/plots?y=<year> endpoint (2024+2025 baseline; older years behind --include-old) + PDF table extraction. Emits the same sidecar shape as the gh/lg/agrigold plot reports → routed through the shared _render_gh_plot_chunk. Many are genuinely cross-vendor (ProHarvest/Apex vs Pioneer / DEKALB / Becks / Channel / Wyffels / NK / AgriGold / LG).

Robust against three PDF realities:

extract_tables() column split; raw_text so junk rows never ship and the cross-vendor yields stay searchable.
Image-only PDFs (no text layer) are skipped and counted (no silent cap). 139 structured + 22 verbatim + 1 image-skip of 162.
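The structured / verbatim / image-skip split above can be pictured with a rough triage sketch. It assumes the extractor's output has already been reduced to a list of tables (each a list of rows of cell strings or `None`) plus the page text. The "at least three cells and one numeric value" heuristic, and the names `triage` and `kept_rows`, are illustrative assumptions rather than the PR's actual rules.

```python
from collections import Counter


def is_numeric(cell: str) -> bool:
    """True for cells like '231.5' or '1,204'."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False


def kept_rows(tables):
    """Keep rows with at least three non-empty cells (assumed heuristic)."""
    rows = []
    for table in tables:
        for row in table:
            cells = [c.strip() for c in row if c and c.strip()]
            if len(cells) >= 3:
                rows.append(cells)
    return rows


def triage(tables, text):
    """Classify one PDF as 'structured', 'verbatim', or 'image_skip'."""
    if not text.strip():
        # No text layer at all: skip, but let the caller count it.
        return "image_skip", None
    rows = kept_rows(tables)
    if any(is_numeric(c) for row in rows for c in row):
        return "structured", rows
    # The column split failed: keep the raw text so yields stay searchable.
    return "verbatim", text


# Toy inputs (made-up brands and yields), tallied so nothing is dropped silently.
samples = [
    ([[["Brand", "Hybrid", "Yield"], ["Brand A", "X100", "231.5"]]], "Brand Hybrid Yield"),
    ([[["Brand A X100 231.5", None, None]]], "Brand A X100 231.5"),
    ([], ""),
]
counts = Counter(triage(tables, text)[0] for tables, text in samples)
```

Counting every outcome, including skips, is what makes a summary such as "139 structured + 22 verbatim + 1 image-skip of 162" checkable against the number of reports fetched.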
Plumbing + docs

rag/chunk.py: proharvest_plots branch (structured → cross-vendor renderer; raw_text → verbatim body).
sources.json: 2 entries (tos_check_date 2026-06-04; robots permissive, no ToS automation clause).
docs_mcp/lessons.md: rating-scales + trial-data entries.

Validation

py_compile clean; sources.json valid.
image-only.yml) rebuilds the Chroma+BM25 indexes from the committed corpus, then builds+pushes the image → Watchtower deploys.
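For the proharvest_plots branch in rag/chunk.py listed under Plumbing + docs, a hedged sketch of the two-way routing follows. `render_structured` is only a stand-in for the shared cross-vendor renderer, whose real signature is not shown in this PR text. The `title`, `entries`, `brand`, `product`, and `yield_bu_ac` field names are hypothetical; only `raw_text` comes from the description.

```python
def render_structured(doc: dict) -> str:
    """Stand-in for the shared cross-vendor renderer (real signature unknown)."""
    lines = [doc["title"]]
    for entry in doc.get("entries", []):
        lines.append(
            f"{entry['brand']} {entry['product']}: {entry['yield_bu_ac']} bu/ac"
        )
    return "\n".join(lines)


def render_plot_chunk(doc: dict) -> str:
    """Route a plot-report sidecar to the verbatim or structured body."""
    raw = doc.get("raw_text")
    if raw:
        # Verbatim path: ship the extracted text unchanged under the title.
        return f"{doc['title']}\n{raw.strip()}"
    # Structured path: hand off to the cross-vendor renderer.
    return render_structured(doc)
```

Branching on the presence of `raw_text` keeps the fallback decision in the sidecar itself, so the chunker does not need to re-inspect the PDF.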