Add RobSeeCo (Rob-See-Co + Innotech): 130 corn/soy varieties from the seed-guide PDF (#18)

Co-authored-by: claude <claude@jpaul.io>
Co-committed-by: claude <claude@jpaul.io>
This commit was merged in pull request #18.
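The `robseeco` entry in the diff below notes that every content page of the seed-guide PDF is printed twice (p5==p6, p9==p10, ...) and that rows are deduplicated by `source_key`. It also records `expected_count: 130` (87 corn + 43 soy). The following is a minimal Python sketch of those two steps. It assumes each parsed row is a dict carrying a `source_key` field and that `expected_count` is used as a post-parse sanity check. Both function names are hypothetical and are not taken from the scraper.

```python
from typing import Iterable


def dedup_by_source_key(records: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each source_key, preserving order.

    Because each content page is printed twice, every variety is parsed twice
    and yields the same source_key both times.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for rec in records:
        key = rec["source_key"]
        if key in seen:
            continue  # duplicate from the mirrored page; drop it
        seen.add(key)
        unique.append(rec)
    return unique


def check_expected_count(records: list[dict], expected: int) -> None:
    """Fail loudly if the deduplicated total drifts from the config's expected_count."""
    if len(records) != expected:
        raise ValueError(f"expected {expected} varieties, parsed {len(records)}")


if __name__ == "__main__":
    # Made-up keys for illustration; real keys would come from the parser.
    rows = [
        {"source_key": "corn:A", "page": 5},
        {"source_key": "corn:A", "page": 6},
        {"source_key": "corn:B", "page": 5},
    ]
    unique = dedup_by_source_key(rows)
    check_expected_count(unique, expected=2)
    print([r["page"] for r in unique])  # [5, 5]
```

Keeping the first occurrence, rather than merging the two copies, is safe here only because the mirrored pages are described as identical.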
2026-06-09 23:29:38 -04:00
committed by Claude (agent)
parent 84ad2b1de6
commit 0bac06b7b6
265 changed files with 23133 additions and 6 deletions
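The `robseeco` schema_notes describe how the ratings tables are parsed. Rotated column headers are clustered by x0, and each data cell is assigned to its column by X-center alignment, because whitespace tokenization is unreliable around sparse cells. The sketch below illustrates that idea in Python on word dicts shaped like pdfplumber's `extract_words()` output (`text`, `x0`, `x1`, `top`). The library choice, the tolerances (`tol`, `max_dist`), the bottom-to-top reading order of rotated labels, and all function names are assumptions, not the scraper's actual code. The sketch also normalizes cells to the documented scale: 1-9 with 9 = best, `-` = not available, and the soy letter codes R/MR/S.

```python
from typing import Optional, Union

Rating = Optional[Union[int, str]]


def cluster_headers(header_words: list[dict], tol: float = 4.0) -> dict[str, float]:
    """Group rotated header words by x0 and return {column label: x-center}."""
    clusters: list[list[dict]] = []
    for w in sorted(header_words, key=lambda w: w["x0"]):
        # A word joins the current cluster if its x0 is within `tol` of the previous word's x0.
        if clusters and w["x0"] - clusters[-1][-1]["x0"] <= tol:
            clusters[-1].append(w)
        else:
            clusters.append([w])
    centers: dict[str, float] = {}
    for group in clusters:
        # Assumption: vertical labels read bottom-to-top, so the word with the larger `top` comes first.
        label = " ".join(w["text"] for w in sorted(group, key=lambda w: -w["top"]))
        centers[label] = sum((w["x0"] + w["x1"]) / 2 for w in group) / len(group)
    return centers


def parse_rating(cell: Optional[str]) -> Rating:
    """1-9 numeric (9 = best), soy letter codes R/MR/S; '-', blank, or missing -> None."""
    if cell is None or cell.strip() in ("", "-"):
        return None
    cell = cell.strip()
    if cell.isdigit() and 1 <= int(cell) <= 9:
        return int(cell)
    if cell in ("R", "MR", "S"):
        return cell
    raise ValueError(f"unexpected rating cell: {cell!r}")


def parse_row(row_words: list[dict], centers: dict[str, float],
              max_dist: float = 10.0) -> dict[str, Rating]:
    """Assign each data word to the column whose header x-center is nearest its own x-center."""
    cells: dict[str, Optional[str]] = {name: None for name in centers}
    for w in row_words:
        xc = (w["x0"] + w["x1"]) / 2
        name = min(centers, key=lambda n: abs(centers[n] - xc))
        if abs(centers[name] - xc) <= max_dist:  # ignore words far from every column
            cells[name] = w["text"]
    return {name: parse_rating(text) for name, text in cells.items()}


if __name__ == "__main__":
    # Made-up coordinates; "Tar" and "Spot" are two stacked words of one rotated header.
    headers = [
        {"text": "Emergence", "x0": 100, "x1": 108, "top": 10},
        {"text": "Drought", "x0": 130, "x1": 138, "top": 10},
        {"text": "Tar", "x0": 160, "x1": 168, "top": 20},
        {"text": "Spot", "x0": 161, "x1": 169, "top": 10},
    ]
    centers = cluster_headers(headers)
    # The Drought cell is empty in this row; position, not token order, decides the column.
    row = [{"text": "8", "x0": 102, "x1": 106}, {"text": "-", "x0": 162, "x1": 166}]
    print(parse_row(row, centers))  # {'Emergence': 8, 'Drought': None, 'Tar Spot': None}
```

Splitting the row on whitespace would have placed the `-` under Drought. Nearest-center assignment keeps it under Tar Spot and leaves the empty Drought cell as `None`.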
+19
@@ -288,6 +288,25 @@
"tos_check_date": "2026-06-04",
"tos_note": "Data via the Seedware JSON API (burrus25.seedware.net). burrusseed.com robots.txt blocks ~33 NAMED AI/scraper bots and carries Content-signal: ai-train=no + Crawl-delay 10; User-agent:* IS allowed and the ToS has NO scraping clause. Operator chose to include this source despite the ai-train=no signal; scraper uses a non-blacklisted UA (seed-mcp-scraper) and honors Crawl-delay 10 (>=10s between requests).",
"schema_notes": "Seedware JSONP API: GET https://burrus25.seedware.net/app/_queries/crop_varieties.php?crop_pkey=101(corn)|102(soy)&callback=cb (requires a callback param + Referer https://burrusseed.com/; strip the JSONP wrapper). ~40 fields/record incl. brand, maturity (RM/MG), released, and many stat_* ratings → mapped into characteristics_groups: DISEASE RATINGS (gray leaf spot, tar spot, BSR, SDS, phytophthora), AGRONOMIC CHARACTERISTICS (drought, greensnap, stalk/root strength, standability, emergence, etc.), HERBICIDE TOLERANCE (glyphosate/glufosinate/2,4-D/dicamba/FOPs, Yes/No) + Bt insect-protection (Yes/No). SCALE: numeric agronomic+disease 1-10, 10 = best/most-tolerant (HIGHER = better; observed 4-10); NR/blank/0/'-' = not rated. Per-variety tech-sheet PDFs exist (getTechSheet/<pkey>) — not ingested this pass."
},
{
"name": "robseeco",
"vendor": "RobSeeCo",
"brands": [
"Rob-See-Co",
"Innotech"
],
"crops": [
"corn",
"soybeans"
],
"verdict": "green",
"expected_count": 130,
"base_url": "https://www.robseeco.com",
"scope_filter": "Independent regional seed co (Elkhorn, NE; rolled up Federal Hybrids/Big Cob/Kiser/Rupp's grain-forage). Row-crop core: corn (Rob-See-Co brand, 87) + soybean (Rob-See-Co RS#### + Innotech IS####, 43). Masters Choice silage corn + sorghum sections EXCLUDED (out of row-crop scope).",
"tos_check_date": "2026-06-09",
"tos_note": "Squarespace site; robots.txt does NOT block AI/content crawling (the AI-bot UAs incl. ClaudeBot/anthropic-ai are grouped with User-agent:* under standard Squarespace exclusions only — no Disallow:/, AI-block toggle off, no expressed AI opt-out). No anti-scraping ToS clause found. Lineup is published as a public seed-guide PDF on a Squarespace CDN URL (not disallowed). UA seed-mcp-scraper.",
"schema_notes": "PDF-extraction source (no structured web catalog — Squarespace visual grid). Download the 2026 Seed Guide PDF (https://www.robseeco.com/s/2026_RobSeeCo-Seed-Guide_FINAL-LR-Single.pdf, follow redirect to static1.squarespace.com; ~18MB, 52 pages; cached under var/, gitignored). EVERY content page is DUPLICATED (p5==p6, p9==p10, ...) → dedup by source_key. Sections: corn ratings table p5-8 + 2-col descriptive cards p9-18; soy ratings table p19-26 + cards. Masters Choice silage (p27-38) + sorghum (p39-42) scoped OUT. Rotated/vertical column headers reconstructed by clustering rotated words by x0; each data cell mapped to its column by X-CENTER alignment (whitespace tokenization is unreliable around sparse cells). Cards joined by code to enrich trait_stack (corn -RR2/-VT2P/-Conv suffixes) + strengths bullets. characteristics_groups: AGRONOMIC (emergence/vigor/root/stalk/greensnap/staygreen/drydown/drought/plant+ear height/test wt) + DISEASE (GLS/Goss/NCLB/Tar Spot/fungicide response; soy SCN source+score/IDC/Phytophthora gene+PRR/BSR/SWM/SDS). SCALE: 1-9, 9=Best (HIGHER=better, same direction as Bayer/Stine-corn); '-'=not available; soy disease letter codes R/MR/S; Product Fit Geography A/C/E/W/CW. Column map verified against descriptive-card bullets; 0 card-only fallbacks (all 130 parsed from the table)."
}
],
"_excluded_sources": [