Three new brand scrapers: LG Seeds + AgriGold + Ebbert's Seeds (+310 varieties) #14
Summary
User flagged that LG Seeds, AgriGold, and Ebbert's (local Ohio/Indiana breeder) are all active in farmer territory. This PR adds three new scrapers for three different page shapes, bringing 310 new varieties, three new brands, and the first-ever alfalfa coverage.
- lg_seeds: `var products = [...]` JSON; per-variety detail has `<span class="bar-N">` ratings on 1-9 (9=best)
- agrigold: /corn/explore-corn-hybrids/<CODE> URLs; ratings rendered as 5-circle scale (1-5, 5=best), distinct from the 1-9 brands
- ebberts_seeds: honors robots.txt Crawl-delay: 5; 1-5 scale (1=best, lower = more resistant)
Corpus state after merge
Smoke tests
- brand="LG Seeds" + crop=corn → returns LG corn varieties (LG65C89, etc.) ✓
- brand="AgriGold" + crop=corn → returns AgriGold varieties (A626-39, A618-34) ✓
- brand="Ebbert's Seeds" + crop=corn → returns Ebbert's regional varieties (7442PC, 7188PC) ✓
- LG5701 → LG5701, A616-30 → A616-30, 7000TR RIB → 7000TR RIB ✓
- Alfalfa varieties (7C300, 5R300, 4R300): first alfalfa data the advisor can answer on ✓
Notable choices
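The three new sources rate on different conventions (LG Seeds 1-9 with 9 best, AgriGold 1-5 with 5 best, Ebbert's 1-5 with 1 best). As a minimal sketch of how a per-source scale declaration could drive a chunk preamble, assuming a hypothetical `Scale` record and `SCALES` table (neither is taken from the PR's code, and the preamble wording is invented for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    low: int
    high: int
    best: str  # "high" if larger numbers are better, "low" otherwise


# Scale conventions as stated in this PR; the structure itself is hypothetical.
SCALES = {
    "lg_seeds": Scale(1, 9, "high"),
    "agrigold": Scale(1, 5, "high"),
    "ebberts_seeds": Scale(1, 5, "low"),
}


def scale_preamble(source: str) -> str:
    """Render a one-line reminder of the scale direction for a chunk."""
    s = SCALES[source]
    best, worst = (s.high, s.low) if s.best == "high" else (s.low, s.high)
    return f"Ratings use a {s.low}-{s.high} scale ({best} = best, {worst} = worst)."


def is_better(source: str, a: float, b: float) -> bool:
    """True if rating a beats rating b under the source's scale direction."""
    return a > b if SCALES[source].best == "high" else a < b


print(scale_preamble("ebberts_seeds"))
```

Keeping the direction next to the numbers means a 1.0 from Ebbert's and a 9 from LG Seeds are both read as top ratings rather than as opposite ends of one scale.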
_scale_direction is set on every chunk so the LLM doesn't conflate the different rating scales.
What's not in this PR (possible future)
LG Seeds /performance/corn and AgriGold /corn/performance/corn-yield-results look like they publish plot reports analogous to Golden Harvest's. These could become two more trial sources (lg_plot_reports, agrigold_plot_reports).

User flagged LG, AgriGold, and Ebbert's (local Ohio breeder) are all active in farmer territory. Built three scrapers; corpus now covers 5,839 chunks across 11 brands.

Net new varieties: 310
- lg_seeds 170: corn 78 + soy 63 + alfalfa 16 + sorghum 13 → adds FIRST alfalfa coverage (FD 3-5 range)
- agrigold 111: corn 60 + soy 51
- ebberts_seeds 29: corn 17 + soy 12 (regional OH/IN breeder)

scrape/sources/lg_seeds.py (embedded-JSON pattern, cleanest):
- /products/<crop> pages have a `var products = [...]` blob with the variety summary (Variety, Maturity, Traits[], Bullets[], CropType).
- Per-variety detail page (/products/<crop>/<Variety>) carries the ratings as `<span class="bar-N">` where N is 1-9 on the canonical scale. Same 9=best direction as Bayer / Golden Harvest.
- Three sections per page: Characteristics / Management / Disease Tolerance, plus a few qualitative bars ("Tar Spot Susceptible", "Fungicide Response High") preserved as text values.

scrape/sources/agrigold.py (5-circle scale):
- Listing page has 60+ /corn/explore-corn-hybrids/<CODE> URLs.
- Detail page renders ratings as <div class="scale"> blocks with 5 child <div class="circle"> elements, of which N have class "circle selected" → rating N on a 1-5 scale.
- 7 sections per page incl. Silage Characteristics (Dairy Silage Rating, NDFd 30 Hr, Crude Protein), Planting Applications, Soil Adaptability, Plant Characteristics, Product Features.
- Distinct rating scale (1-5 vs Bayer's 1-9), declared in _scale_direction so the chunker preamble renders correctly.

scrape/sources/ebberts_seeds.py (small regional breeder, verbatim text approach):
- Single page per crop (corn / soybeans / wheat). Each variety is an <h1> + multi-section CSS-grid block where labels and values are in separate adjacent cells. Reconstructing perfectly-aligned columns for a 29-variety total isn't worth the engineering; the chunk body carries the verbatim text in document order, and the LLM can read the tabular content.
- Scale: 1-5 (1 = best, lower = more resistant), inferred from marketing-vs-rating cross-checks ("Robust tall plants" + STANDABILITY 1.0 → 1 = best).
- Politeness: robots.txt asks for Crawl-delay: 5; honored.

All three new scrapers smoke-tested:
- LG corn LG5701 RM 116 SmartStax → 3 characteristic groups with Disease Tolerance ratings (Northern/Southern Leaf Blight 8-9, etc.)
- AgriGold A616-30 RM 86 VT2RIB → 7 groups incl. silage and soil adaptability ratings
- Ebbert's 7000TR RIB RM 100 → 1098-char verbatim body covering CHARACTERISTICS, DISEASE RATINGS, herbicide tolerance, etc.

Corpus state after this PR:
- 5,839 chunks (was 5,529)
- 11 brands (was 8)
- 8 crops (corn 3047, soy 2209, silage 359, wheat 123, sorghum 49, cotton 30, alfalfa 16, canola 6); alfalfa is brand-new

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
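
The two markup patterns described in the commit message, the LG Seeds `var products = [...]` blob and the AgriGold 5-circle scale, could be parsed roughly as follows. This is a standard-library sketch only, not the code in scrape/sources/: the names `extract_products` and `CircleScaleParser` and the sample HTML are assumptions made for illustration, and real pages may need more defensive handling.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical helpers; names and sample markup are illustrative only.
PRODUCTS_ASSIGN_RE = re.compile(r"var\s+products\s*=\s*")


def extract_products(html: str) -> list:
    """Decode the JSON array assigned to `var products` in a page."""
    m = PRODUCTS_ASSIGN_RE.search(html)
    if m is None:
        return []
    # raw_decode stops at the end of the first JSON value, so trailing
    # JavaScript (";", other statements) after the array is ignored.
    obj, _end = json.JSONDecoder().raw_decode(html[m.end():])
    return obj


class CircleScaleParser(HTMLParser):
    """Count 'circle selected' divs inside each <div class="scale"> block."""

    def __init__(self):
        super().__init__()
        self.ratings = []  # one integer per scale block, in document order
        self._depth = 0    # div nesting depth inside the current scale block

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth == 0:
            if "scale" in classes:
                self.ratings.append(0)
                self._depth = 1
            return
        self._depth += 1
        if "circle" in classes and "selected" in classes:
            self.ratings[-1] += 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1


# Invented sample input for demonstration.
listing = 'var products = [{"Variety": "LG5701", "Traits": ["SmartStax"]}];'
products = extract_products(listing)

detail = (
    '<div class="scale">'
    + '<div class="circle selected"></div>' * 3
    + '<div class="circle"></div>' * 2
    + "</div>"
)
parser = CircleScaleParser()
parser.feed(detail)
circle_ratings = parser.ratings
```

Decoding from the assignment onward avoids guessing where the array ends with a regex, which matters when values such as Traits[] contain nested brackets.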