User flagged that Channel is expanding into their area; re-walked
the cropscience.bayer.us sitemap and found 8 additional brand×crop
paths beyond the original DEKALB/Asgrow/WestBred triple. Patches
the scraper to walk all of them: the Bayer variety total nearly
doubles, from 475 to 931, and the corpus gains first-ever coverage
of sorghum (36), cotton (30), and canola (6), plus silage (136) as
a distinct crop (previously conflated with corn).
Net new varieties: 456
  Channel    corn=181 soy=67 silage=54 sorghum=18   (320)
  DEKALB     silage=82 sorghum=18 canola=6          (106)
  Deltapine  cotton=30                              (30)
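The per-brand numbers reconcile with the 475 -> 931 headline. A quick tally (plain Python, counts copied from the table above) makes the per-crop sums explicit:

```python
# Net-new variety counts per brand and crop, copied from the table above.
NEW_VARIETIES = {
    "Channel": {"corn": 181, "soy": 67, "silage": 54, "sorghum": 18},
    "DEKALB": {"silage": 82, "sorghum": 18, "canola": 6},
    "Deltapine": {"cotton": 30},
}


def brand_totals(table):
    """Sum each brand's crops."""
    return {brand: sum(crops.values()) for brand, crops in table.items()}


def crop_totals(table):
    """Sum each crop across all brands."""
    totals = {}
    for crops in table.values():
        for crop, n in crops.items():
            totals[crop] = totals.get(crop, 0) + n
    return totals


by_brand = brand_totals(NEW_VARIETIES)
by_crop = crop_totals(NEW_VARIETIES)
net_new = sum(by_brand.values())

print(by_brand)  # {'Channel': 320, 'DEKALB': 106, 'Deltapine': 30}
print(by_crop["sorghum"], by_crop["cotton"], by_crop["canola"])  # 36 30 6
print(475 + net_new)  # 931
```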
scrape/sources/bayer_seeds.py
- Replace `BRANDS` (brand → 1 path) and `CROP_SUFFIX` (brand → 1
suffix) with a flatter `BRAND_PATHS` list of (brand, url_path,
crop, is_primary_for_brand) entries. Channel and DEKALB are now
multi-crop brands; the same scraper walks every brand×crop pair.
- source_key derivation: for a brand's PRIMARY crop, strip the
trailing `-<crop>` suffix (this matches the existing deployed source
keys for DEKALB corn / Asgrow soy / WestBred wheat). For
SECONDARY crops, KEEP the suffix so that the same DEKALB SKU sold
as both grain corn and silage gets two distinct source_keys
(collision-safe and unambiguous for `lookup_variety`).
- New `--crop` CLI filter for incremental backfills.
- Log line shows brand + crop alongside source_key for visibility.
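A minimal sketch of the flattened table and the key rule, assuming variety slugs shaped like `dkc64-44rib-corn`. The slug format, the placeholder URL paths, and the names `BrandPath`, `source_key_for`, and `select_paths` are all hypothetical, not the code in scrape/sources/bayer_seeds.py:

```python
from typing import List, NamedTuple, Optional


class BrandPath(NamedTuple):
    brand: str
    url_path: str  # placeholder below; real paths come from the sitemap
    crop: str
    is_primary_for_brand: bool


# Illustrative subset: the three original primary brand/crop pairs plus one
# DEKALB secondary crop. URL paths are made-up placeholders.
BRAND_PATHS: List[BrandPath] = [
    BrandPath("DEKALB", "/example/dekalb/corn", "corn", True),
    BrandPath("Asgrow", "/example/asgrow/soy", "soy", True),
    BrandPath("WestBred", "/example/westbred/wheat", "wheat", True),
    BrandPath("DEKALB", "/example/dekalb/silage", "silage", False),
]


def source_key_for(slug: str, crop: str, is_primary: bool) -> str:
    """Derive a source_key from a variety page slug (hypothetical helper).

    Primary crop: drop a trailing "-<crop>" so already-deployed keys stay stable.
    Secondary crop: keep the suffix so one SKU listed under two crops gets two
    distinct keys. Assumes secondary-crop slugs already end in "-<crop>".
    str.removesuffix needs Python 3.9+.
    """
    if is_primary:
        return slug.removesuffix(f"-{crop}")
    return slug


def select_paths(paths: List[BrandPath], crop: Optional[str] = None) -> List[BrandPath]:
    """Walk every brand x crop pair, or only one crop (like a --crop filter)."""
    return [p for p in paths if crop is None or p.crop == crop]


print(source_key_for("dkc64-44rib-corn", "corn", True))       # dkc64-44rib
print(source_key_for("dkc64-44rib-silage", "silage", False))  # dkc64-44rib-silage
print([p.brand for p in select_paths(BRAND_PATHS, crop="silage")])  # ['DEKALB']
```

Keeping the suffix only on secondary crops is what lets the old primary keys stay unchanged while the new grain/silage pairs remain distinct.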
rag/chunk.py
- Channel + Deltapine pages use slightly different characteristics
group labels (DISEASE instead of DISEASE RATINGS, AGRONOMIC
CHARACTERISTICS instead of GROWTH/HARVEST, plus MATURITY /
ADAPTATION / HERBICIDES / OTHER). Fold them into the DISEASE /
AGRONOMIC / MANAGEMENT label sets so the chunker buckets them
into the standard sections.
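Roughly how the label folding behaves, as a hypothetical sketch. The commit does not say which set each of MATURITY / ADAPTATION / HERBICIDES / OTHER joins, so their placement below is a guess, and `SECTION_LABELS` / `bucket_groups` are illustrative names rather than chunk.py's:

```python
from typing import Dict, List, Set

# Assumed label sets. Placement of MATURITY / ADAPTATION / HERBICIDES / OTHER
# is a guess for illustration only.
SECTION_LABELS: Dict[str, Set[str]] = {
    "DISEASE": {"DISEASE RATINGS", "DISEASE"},
    "AGRONOMIC": {"GROWTH", "HARVEST", "AGRONOMIC CHARACTERISTICS",
                  "MATURITY", "ADAPTATION"},
    "MANAGEMENT": {"MANAGEMENT", "HERBICIDES", "OTHER"},
}


def bucket_groups(groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge raw characteristics groups into the standard sections.

    Labels are matched case-insensitively after trimming whitespace; anything
    unrecognized lands in UNMAPPED so it stays visible instead of being dropped.
    """
    out: Dict[str, List[str]] = {}
    for label, rows in groups.items():
        key = label.strip().upper()
        section = next(
            (name for name, labels in SECTION_LABELS.items() if key in labels),
            "UNMAPPED",
        )
        out.setdefault(section, []).extend(rows)
    return out


page_groups = {"Disease": ["<row>"], "Agronomic Characteristics": ["<row>"],
               "Herbicides": ["<row>"]}
print(sorted(bucket_groups(page_groups)))  # ['AGRONOMIC', 'DISEASE', 'MANAGEMENT']
```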
Smoke-tested cross-brand × cross-crop queries against the rebuilt
index (5,529 chunks total); all 6 sample queries surface the
right brand + crop in the top 3:
Channel corn 110 RM → 210-25TRE BRAND
Channel soy 2.5 MG IA → 2622RXF BRAND
Deltapine cotton XF → DP 1820 B3XF BRAND
Sorghum dryland Kansas → 6B95 BRAND (Channel)
Silage corn WI dairy → DKC64-44RIB BRAND BLEND (silage variant)
Canola Northern Plains → DK401TL BRAND
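The top-3 check can be written down as a tiny harness. Here `search` is a hypothetical callable returning (variety, brand, crop) tuples in rank order, standing in for whatever the index actually exposes; the brand for the silage and canola cases is inferred from the DK-prefixed product names:

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, str, str]  # (variety, brand, crop), hypothetical result shape


def hit_at_k(search: Callable[[str], List[Hit]], query: str,
             brand: str, crop: str, k: int = 3) -> bool:
    """True if any of the top-k hits matches the expected brand and crop."""
    return any(b == brand and c == crop for _, b, c in search(query)[:k])


# The six smoke queries above with their expected brand + crop.
SMOKE_CASES = [
    ("Channel corn 110 RM", "Channel", "corn"),
    ("Channel soy 2.5 MG IA", "Channel", "soy"),
    ("Deltapine cotton XF", "Deltapine", "cotton"),
    ("Sorghum dryland Kansas", "Channel", "sorghum"),
    ("Silage corn WI dairy", "DEKALB", "silage"),
    ("Canola Northern Plains", "DEKALB", "canola"),
]


def run_smoke(search: Callable[[str], List[Hit]], cases=SMOKE_CASES, k: int = 3) -> List[str]:
    """Return the queries that failed the top-k check (empty list means all pass)."""
    return [q for q, brand, crop in cases if not hit_at_k(search, q, brand, crop, k)]
```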
Watchtower will pull the new image on the next push; deploy is
unchanged otherwise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace stub with working scraper for all three Bayer seed brands.
Discovery uses the public sitemap-dynamic.xml (475 varieties:
288 DEKALB corn + 102 Asgrow soy + 85 WestBred wheat — matches recon).
Per-variety detail comes from the page's __NEXT_DATA__ JSON island.
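A stdlib-only sketch of those two steps, assuming the sitemap uses the standard sitemaps.org namespace and that pages carry the usual Next.js `<script id="__NEXT_DATA__">` tag. Where the variety fields sit inside that JSON is not stated here, so the sketch stops at returning the parsed object; `variety_urls` and `next_data` are hypothetical names:

```python
import json
import re
import xml.etree.ElementTree as ET
from typing import Any, Dict, List
from urllib.parse import urlparse

# Standard sitemap namespace; assumed to be what sitemap-dynamic.xml declares.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Next.js pages embed page props in <script id="__NEXT_DATA__" type="application/json">.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)


def variety_urls(sitemap_xml: str, path_prefix: str) -> List[str]:
    """Return <loc> URLs from a <urlset> sitemap whose path starts with path_prefix."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", SITEMAP_NS) if el.text]
    return [u for u in locs if urlparse(u).path.startswith(path_prefix)]


def next_data(html: str) -> Dict[str, Any]:
    """Parse the __NEXT_DATA__ JSON island out of a rendered page."""
    m = NEXT_DATA_RE.search(html)
    if m is None:
        raise ValueError("page has no __NEXT_DATA__ script")
    return json.loads(m.group(1))
```

Reading the JSON island avoids scraping rendered HTML, which tends to change more often than the page's data payload.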
Each variety writes corpus/bayer_seeds/<source_key>.{md,json} with:
- Identity (brand, crop, hybridLabel, productId, releaseYear)
- Maturity routed per crop (RM for corn, MG for soy, qualitative for wheat)
- Trait stack (code + full name)
- Positioning + strengths narrative
- Characteristics groups (DISEASE RATINGS, GROWTH, MANAGEMENT, HARVEST,
etc.) preserved verbatim from the source so the chunker can re-bucket
them into the canonical disease/agronomic flats per the CLAUDE.md schema
- Regional seed-guide listings with agronomist contacts
- _scale_direction tag (Bayer = "1-9 (9 = best)") for the chunker
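Per-crop maturity routing could look like the following. The output field names, the helper name `maturity_field`, and the assumption that raw values arrive as plain numbers for corn and soy (e.g. "110", "2.5") are all hypothetical:

```python
from typing import Optional


def maturity_field(crop: str, raw: Optional[str]) -> Optional[dict]:
    """Route a raw maturity value to a per-crop field (hypothetical names).

    corn  -> relative maturity in days (RM)
    soy   -> maturity group (MG), fractional
    wheat -> kept as qualitative text
    """
    if raw is None or not raw.strip():
        return None  # no maturity listed on the page
    value = raw.strip()
    if crop == "corn":
        return {"relative_maturity_days": int(float(value))}
    if crop == "soy":
        return {"maturity_group": float(value)}
    if crop == "wheat":
        return {"maturity_text": value}
    raise ValueError(f"no maturity rule for crop {crop!r}")
```

Failing loudly on an unknown crop means a new brand×crop path cannot silently write records with no maturity.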
Smoke-tested all three brands (--limit 2 each, plus --product, --force,
and scrape.runner dispatch). Politeness: 1 req/sec, retries on 429/5xx
with Retry-After honored.
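A stdlib-only sketch of that politeness policy. The commit does not say which HTTP client the scraper uses, how many retries it allows, or what it does without a Retry-After header, so `urllib`, `MAX_RETRIES`, the exponential fallback, and the `PoliteFetcher` / `retry_after_seconds` names are assumptions:

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MIN_INTERVAL_S = 1.0  # 1 req/sec, per the commit
MAX_RETRIES = 4       # assumed; the commit does not state a cap


def retry_after_seconds(header: Optional[str]) -> Optional[float]:
    """Parse Retry-After, which may be delta-seconds or an HTTP-date."""
    if not header:
        return None
    header = header.strip()
    if header.isdigit():
        return float(header)
    try:
        when = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return None  # unparseable header: let the caller fall back
    if when.tzinfo is None:  # "-0000" dates parse as naive; treat them as UTC
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


class PoliteFetcher:
    """Throttled GET with retries on 429/5xx (hypothetical, stdlib only)."""

    def __init__(self, min_interval: float = MIN_INTERVAL_S, max_retries: int = MAX_RETRIES):
        self.min_interval = min_interval
        self.max_retries = max_retries
        self._last = float("-inf")  # monotonic time of the previous request

    def _throttle(self) -> None:
        wait = self._last + self.min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        self._last = time.monotonic()

    def get(self, url: str) -> bytes:
        for attempt in range(self.max_retries + 1):
            self._throttle()
            try:
                with urllib.request.urlopen(url, timeout=30) as resp:
                    return resp.read()
            except urllib.error.HTTPError as err:
                retryable = err.code == 429 or 500 <= err.code < 600
                if not retryable or attempt == self.max_retries:
                    raise
                delay = retry_after_seconds(err.headers.get("Retry-After"))
                # Fall back to exponential backoff when Retry-After is absent or unusable.
                time.sleep(delay if delay is not None else 2 ** attempt)
        raise AssertionError("unreachable")
```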
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>