justin eaa7e0789b bayer_seeds: add Channel + DEKALB silage/sorghum/canola + Deltapine cotton

User flagged that Channel is expanding into their area — re-walked
the cropscience.bayer.us sitemap and found 8 additional brand×crop
paths beyond the original DEKALB/Asgrow/WestBred triple. Patches
the scraper to walk all of them; total Bayer varieties roughly
doubles from 475 to 931 and the corpus picks up first-ever
coverage in sorghum (36), cotton (30), canola (6), and silage as a
distinct crop (was conflated with corn before).

Net new varieties: 456
  Channel    corn=181  soy=67   silage=54  sorghum=18    (320)
  DEKALB     silage=82 sorghum=18  canola=6              (106)
  Deltapine  cotton=30                                    (30)

scrape/sources/bayer_seeds.py
- Replace `BRANDS` (brand → 1 path) and `CROP_SUFFIX` (brand → 1
  suffix) with a flatter `BRAND_PATHS` list of (brand, url_path,
  crop, is_primary_for_brand) entries. Channel and DEKALB are now
  multi-crop brands; the same scraper walks every brand×crop pair.
- source_key derivation: for a brand's PRIMARY crop, strip the
  trailing `-<crop>` suffix (matches the existing deployed source
  keys for DEKALB corn / Asgrow soy / WestBred wheat). For
  SECONDARY crops, KEEP the suffix so that the same DEKALB SKU sold
  as both grain corn and silage gets two distinct source_keys
  (collision-safe and unambiguous for `lookup_variety`).
- New `--crop` CLI filter for incremental backfills.
- Log line shows brand + crop alongside source_key for visibility.
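The source_key rule above can be illustrated with a minimal sketch. The `BrandPath` type, the `url_path` values, the `derive_source_key` helper, and the assumed `<brand>-<sku>-<crop>` page slug shape are all hypothetical; only the field names and the primary/secondary rule come from the description above.

```python
from typing import NamedTuple


class BrandPath(NamedTuple):
    brand: str
    url_path: str  # placeholder values below, not the real sitemap paths
    crop: str
    is_primary_for_brand: bool


# Illustrative entries only; the real list covers every brand x crop pair.
BRAND_PATHS = [
    BrandPath("dekalb", "/placeholder/dekalb-corn", "corn", True),
    BrandPath("dekalb", "/placeholder/dekalb-silage", "silage", False),
    BrandPath("asgrow", "/placeholder/asgrow-soy", "soy", True),
]


def derive_source_key(page_slug: str, entry: BrandPath) -> str:
    """Assumes page_slug looks like '<brand>-<sku>-<crop>'."""
    slug = page_slug.lower()
    suffix = f"-{entry.crop}"
    # Primary crop: drop the crop suffix so previously deployed keys stay stable.
    if entry.is_primary_for_brand and slug.endswith(suffix):
        return slug[: -len(suffix)]
    # Secondary crop: keep the suffix so the same SKU gets a distinct key.
    return slug
```

Under these assumptions, the grain-corn page yields `dekalb-dkc62-08rib` while the silage page for the same SKU yields `dekalb-dkc62-08rib-silage`.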

rag/chunk.py
- Channel + Deltapine pages use slightly different characteristics
  group labels (DISEASE not DISEASE RATINGS, AGRONOMIC
  CHARACTERISTICS not GROWTH/HARVEST, plus MATURITY / ADAPTATION /
  HERBICIDES / OTHER). Fold them into the DISEASE / AGRONOMIC /
  MANAGEMENT label sets so the chunker buckets them correctly
  into the standard sections.
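A rough sketch of the label folding described above. The set names, the `section_for` helper, and the section names it returns are hypothetical. The message does not say which set MATURITY, ADAPTATION, HERBICIDES, and OTHER land in, so their placement here is an assumption, as is treating GROWTH/HARVEST as a single label.

```python
from typing import Optional

# Original labels plus the Channel / Deltapine variants.
DISEASE_LABELS = frozenset({"DISEASE RATINGS", "DISEASE"})
# Assumption: MATURITY and ADAPTATION are grouped with the agronomic labels.
AGRONOMIC_LABELS = frozenset(
    {"GROWTH/HARVEST", "AGRONOMIC CHARACTERISTICS", "MATURITY", "ADAPTATION"}
)
# Assumption: HERBICIDES and OTHER are grouped with the management labels.
MANAGEMENT_LABELS = frozenset({"HERBICIDES", "OTHER"})


def section_for(label: str) -> Optional[str]:
    """Map a characteristics group label to a section name, or None if unknown."""
    normalized = " ".join(label.upper().split())  # uppercase, collapse whitespace
    if normalized in DISEASE_LABELS:
        return "disease"
    if normalized in AGRONOMIC_LABELS:
        return "agronomic"
    if normalized in MANAGEMENT_LABELS:
        return "management"
    return None
```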

Smoke-tested cross-brand × cross-crop queries against the rebuilt
index (5,529 chunks total). All 6 sample queries surface the
right brand+crop in the top 3:
  Channel corn 110 RM       → 210-25TRE BRAND
  Channel soy 2.5 MG IA     → 2622RXF BRAND
  Deltapine cotton XF       → DP 1820 B3XF BRAND
  Sorghum dryland Kansas    → 6B95 BRAND (Channel)
  Silage corn WI dairy      → DKC64-44RIB BRAND BLEND (silage variant)
  Canola Northern Plains    → DK401TL BRAND

Watchtower will pull the new image on the next push; deploy is
unchanged otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-26 11:54:30 -04:00

Name	Last commit	Date
`sources`	bayer_seeds: add Channel + DEKALB silage/sorghum/canola + Deltapine cotton	2026-05-26 11:54:30 -04:00
`__init__.py`	seed-mcp scaffold: clone docs-mcp-template, customize for crop_seed PRODUCT_NAME	2026-05-25 12:28:49 -04:00
`changelog.py`	seed-mcp scaffold: clone docs-mcp-template, customize for crop_seed PRODUCT_NAME	2026-05-25 12:28:49 -04:00
`README.md`	seed-mcp scaffold: clone docs-mcp-template, customize for crop_seed PRODUCT_NAME	2026-05-25 12:28:49 -04:00
`runner.py`	seed-mcp scaffold: clone docs-mcp-template, customize for crop_seed PRODUCT_NAME	2026-05-25 12:28:49 -04:00

README.md

scrape/

Per-vendor seed catalog scrapers + the runner that dispatches to them. Each source lives in scrape/sources/<name>.py with a main() entrypoint. The runner is a thin shim:

python -m scrape.runner --source bayer_seeds --force
python -m scrape.runner --source golden_harvest --limit 20
python -m scrape.runner --all                # only GREEN sources
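As a sketch of how such a shim might dispatch, assuming each source module's main() accepts an argv-style list (the README only says main() exists) and that the flags are simply forwarded. The helper names are hypothetical, and GREEN_SOURCES mirrors the green rows of the Sources table.

```python
import argparse
import importlib

# The GREEN rows of the Sources table in this README.
GREEN_SOURCES = ["bayer_seeds", "golden_harvest", "nk", "agripro"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="scrape.runner")
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--source", help="run a single source module")
    target.add_argument("--all", action="store_true", help="run every GREEN source")
    parser.add_argument("--force", action="store_true", help="re-fetch pages already on disk")
    parser.add_argument("--limit", type=int, default=None, help="stop after N pages")
    return parser


def resolve_sources(args: argparse.Namespace) -> list:
    """Return the list of source names selected on the command line."""
    return list(GREEN_SOURCES) if args.all else [args.source]


def forwarded_args(args: argparse.Namespace) -> list:
    """Rebuild the flags that get passed through to each source."""
    out = []
    if args.force:
        out.append("--force")
    if args.limit is not None:
        out += ["--limit", str(args.limit)]
    return out


def run(argv: list) -> None:
    args = build_parser().parse_args(argv)
    for source in resolve_sources(args):
        module = importlib.import_module(f"scrape.sources.{source}")
        # Assumption: main() takes an argv-style list of forwarded flags.
        module.main(forwarded_args(args))
```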

Output layout

Each scraper writes:

corpus/<source>/<source_key>.md — LLM-visible body (chunk_0 preamble + the variety's marketing + agronomic narrative)
corpus/<source>/<source_key>.json — sidecar metadata (per CLAUDE.md's canonical schema)
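A minimal sketch of writing that pair of files. The `write_variety` helper is hypothetical, and the metadata dict is treated as opaque because the canonical schema lives in CLAUDE.md rather than here.

```python
import json
from pathlib import Path


def write_variety(corpus_root, source, source_key, body_md, metadata):
    """Write corpus/<source>/<source_key>.md plus its .json sidecar."""
    out_dir = Path(corpus_root) / source
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path = out_dir / f"{source_key}.md"
    json_path = out_dir / f"{source_key}.json"
    md_path.write_text(body_md, encoding="utf-8")
    # Sorted keys keep re-scrape diffs small when the values are unchanged.
    json_path.write_text(
        json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8"
    )
    return md_path, json_path
```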

source_key is a stable per-vendor slug — typically <brand>-<sku> lowercased, e.g. dekalb-dkc62-08rib. Stability matters: it's the join key the MCP uses for get_page(source, source_key).
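One way to build such a slug, as a sketch. The `make_source_key` name and the rule of collapsing every non-alphanumeric run to a single hyphen are assumptions, and individual scrapers may derive keys differently.

```python
import re


def make_source_key(brand: str, sku: str) -> str:
    """Lowercase '<brand>-<sku>' and collapse other characters to hyphens."""
    raw = f"{brand}-{sku}".lower()
    return re.sub(r"[^a-z0-9]+", "-", raw).strip("-")
```

With this rule, `make_source_key("DEKALB", "DKC62-08RIB")` gives `dekalb-dkc62-08rib`, matching the example above.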

Sources

Source	Module	Verdict	Notes
`bayer_seeds`	`bayer_seeds.py`	🟢	DEKALB + Asgrow + WestBred + Channel + Deltapine, ~931 varieties
`golden_harvest`	`golden_harvest.py`	🟢	~175 varieties, 9-to-1 disease scale (reverse)
`nk`	`nk.py`	🟢	29 varieties, ratings in CDN PDFs
`agripro`	`agripro.py`	🟢	24 wheat varieties
`becks_pfr`	`becks_pfr.py`	🟡	2,089 research docs via public Sanity GROQ
`becks_products`	`becks_products.py`	🟡	860 products, identity-only (SeedIQ-gated)

Pioneer is intentionally absent — see CLAUDE.md and the curated Pioneer fallback in docs_mcp/lessons.md.

Tips

Sniff before you scrape. Most catalogs are SPAs that call a backend API. The recon docs in ~/.claude/projects/-home-justin/memory/reference_seed_vendor_recon.md already capture the endpoints; if you find new ones, update that file.
Idempotent re-scrapes. Without --force, skip pages already on disk. With --force, re-fetch everything — that's the monthly cron mode.
Respect the portals. Backoff on 429s. Set a recognizable user-agent (seed-mcp-scraper/<version>).
Normalize at chunk time, not at scrape time. The chunker (Phase 2) handles the 9-to-1 → 1-9 disease-scale flip for Golden Harvest, NOT this scraper. Sidecar JSON should preserve the vendor's raw values + a _scale_direction field; the chunker reads that and normalizes the markdown body.
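Three of the tips above can be sketched as small helpers. All names are hypothetical, and so are the `"9-to-1"` value for `_scale_direction` (the README names the field but not its values) and the backoff parameters.

```python
from pathlib import Path


def should_fetch(md_path: Path, force: bool) -> bool:
    """Idempotent re-scrape: skip pages already on disk unless --force is set."""
    return force or not md_path.exists()


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after the Nth consecutive 429 (exponential, capped)."""
    return min(cap, base * 2 ** attempt)


def normalize_rating(raw: int, scale_direction: str) -> int:
    """Chunk-time flip of a reversed 9-to-1 rating onto the 1-9 scale.

    Assumption: the sidecar's _scale_direction is "9-to-1" for reversed
    vendors; any other value is passed through unchanged.
    """
    if scale_direction == "9-to-1":
        return 10 - raw
    return raw
```

Keeping the flip out of the scraper means the sidecar JSON still holds the vendor's raw value, so a change to the normalization rule only requires re-chunking, not re-scraping.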

changelog.py

Reusable as-is from the template. Walks git diff --name-status output for the commit summary, and git log for the digest history (Phase 13).
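For illustration, a minimal parser for `git diff --name-status` output. The `parse_name_status` name is hypothetical and this is not the template's actual code. Git emits one tab-separated record per line, with a similarity score appended to rename and copy statuses (for example `R100`) followed by the old and new paths.

```python
def parse_name_status(output: str) -> list:
    """Return (status_letter, path) pairs from `git diff --name-status` text."""
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        status = parts[0][0]  # "R100" -> "R", "M" -> "M"
        path = parts[-1]      # for renames and copies, the new path
        changes.append((status, path))
    return changes
```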