justin 9ce920f622 agripro + nk scrapers — 146 Syngenta varieties added (wheat + corn/soy)
agripro (24 varieties)
- Drupal Views form scrape via /search-agripro-brand-varieties with
  explicit GET params (sidesteps the AJAX-only-on-load default that
  returns an empty form skeleton).
- Per-variety parse: <h1>, .field--node--variety-type--variety,
  .field--node--tag-line--variety, .field--node--body, plus the
  three rated sections (Agronomics / Grain / Disease) with their
  <div class="row"><div class="label">label</div><div>value</div>
  pairs.
- Wheat-class distribution: 12 HRS, 7 SWW, 3 HRW, 1 HWS, 1 Barley
  — provides the Northern Plains HRS coverage WestBred lacks.
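The label/value rows in the three rated sections can be parsed with the standard library alone. This is a minimal sketch, not the scraper's actual code. It assumes each row is a `<div class="row">` whose first-level children are a `label` div and a value div. It also assumes the caller feeds in only one rated section's HTML, because a theme-wide `row` class elsewhere on the page would match too. `RatedRowParser` and `parse_rated_rows` are hypothetical names.

```python
from html.parser import HTMLParser


class RatedRowParser(HTMLParser):
    """Collect (label, value) pairs from rows shaped like
    <div class="row"><div class="label">label</div><div>value</div></div>."""

    def __init__(self):
        super().__init__()
        self.pairs = []     # finished (label, value) tuples, in document order
        self._depth = 0     # div depth inside the current row (0 = not in a row)
        self._field = None  # "label", "value", or None
        self._label = []
        self._value = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth == 0:
            if "row" in classes:
                self._depth = 1
                self._label, self._value = [], []
            return
        self._depth += 1
        if self._depth == 2:
            # First-level child of the row: the label div, otherwise the value div.
            self._field = "label" if "label" in classes else "value"

    def handle_endtag(self, tag):
        if tag != "div" or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 1:
            self._field = None
        elif self._depth == 0:
            # Row closed: collapse whitespace and keep rows that have a label.
            label = " ".join("".join(self._label).split())
            value = " ".join("".join(self._value).split())
            if label:
                self.pairs.append((label, value))

    def handle_data(self, data):
        if self._field == "label":
            self._label.append(data)
        elif self._field == "value":
            self._value.append(data)


def parse_rated_rows(html):
    parser = RatedRowParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

For the AP Iliad disease section quoted further down, this would yield pairs like `("Stripe Rust", "1")` and `("Eyespot", "2")`. The values stay strings so that non-numeric entries survive untouched.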

nk (122 varieties — recon's "29" was outdated; the current NK seed
finder lists 41 corn + 81 soy)
- ASP.NET WebForms endpoint:
  POST /NKSeeds/{Corn,Soy}ProductFinder.aspx/GetProducts returns
  {"d": "<html>"} where the inner HTML is one <div class="sf-result">
  per variety. BeautifulSoup parses the whole unwrapped blob.
- Per-card: product code (NK8005, NK008-P8XF), RM/MG from the
  title <span>, "Brands Available" trait variants, marketing
  positioning + bullet strengths, tech-sheet PDF URL.
- pdfplumber text extraction on the tech-sheet PDFs adds:
  * corn disease ratings (Gray Leaf Spot, NCLB, Goss's Wilt,
    Anthracnose, Tar Spot, Fusarium, etc.) where the PDF prints
    "Label N" lines (text-extractable)
  * soybean Phytophthora source genes (Rps1c, Rps3a, ...)
  * soybean SCN race coverage
  * soybean agronomic ratings (Emergence, Standability, Shatter
    Tolerance, Green Stem) with text-extractable 1-9 values
  * soybean soil-type adaptation (Best/Good/Fair/Poor) for drought-
    prone / high-pH / poorly drained / etc. soils
- Agronomic rating BARS for corn (Emergence, Stalk Strength,
  Drought) are not text-extractable; we record the labels with an
  explicit "rated in PDF chart, see tech sheet" value so the agent
  can direct the farmer at the source for those numbers.
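The NK response handling can be sketched in two steps. First, unwrap the ASP.NET page-method envelope `{"d": ...}`. Then split the inner HTML into one text record per `<div class="sf-result">` card. This is a minimal sketch under stated assumptions. It uses the stdlib `html.parser` in place of the BeautifulSoup parse the scraper uses. It leaves out the POST request itself, whose JSON payload is not documented here. The product-code regex is inferred only from the two examples `NK8005` and `NK008-P8XF`. `SfResultCards` and `parse_cards` are hypothetical names.

```python
import json
import re
from html.parser import HTMLParser

# Inferred from NK8005 / NK008-P8XF only; widen it if other code shapes appear.
PRODUCT_CODE = re.compile(r"\bNK\d{3,4}(?:-[A-Z0-9]+)?\b")


def unwrap_page_method(body: str) -> str:
    """ASP.NET page methods wrap their return value as {"d": ...}; return the inner HTML."""
    return json.loads(body)["d"]


class SfResultCards(HTMLParser):
    """Collect the non-blank text fragments of each <div class="sf-result"> card."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one list of text fragments per card
        self._depth = 0   # div depth inside the current card (0 = outside any card)

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif "sf-result" in classes:
            self._depth = 1
            self.cards.append([])

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and data.strip():
            self.cards[-1].append(data.strip())


def parse_cards(body: str) -> list:
    parser = SfResultCards()
    parser.feed(unwrap_page_method(body))
    parser.close()
    results = []
    for fragments in parser.cards:
        text = " ".join(fragments)
        match = PRODUCT_CODE.search(text)
        results.append({"product_code": match.group(0) if match else None, "text": text})
    return results
```

The card's full text is kept next to the extracted code. The RM/MG, trait-variant, and positioning fields can then be pulled from it, or from the card's own sub-elements, once their exact markup is pinned down.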
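The tech-sheet text pass can be sketched in two parts. A generic "Label N" line matcher runs over the pdfplumber text. Then the corn chart-only labels are back-filled with the explicit placeholder value. This is a sketch, not the scraper's parser. The regex is an assumption based only on the "Label N" description. It would also match stray lines such as a page footer "Page 2", so a label allowlist may be needed in practice. `extract_text` and `parse_corn_ratings` are hypothetical names. pdfplumber is imported lazily so that the text parser works without it.

```python
import re

# "Label N" lines, e.g. "Gray Leaf Spot 4" (NK scale: 1 = best, 9 = worst).
RATING_LINE = re.compile(r"^(?P<label>[A-Za-z][A-Za-z'./ -]*?)\s+(?P<score>[1-9])$")

# Corn agronomic ratings drawn as bars, so they never appear in the extracted text.
CHART_ONLY = ("Emergence", "Stalk Strength", "Drought")
CHART_PLACEHOLDER = "rated in PDF chart, see tech sheet"


def extract_text(pdf_path: str) -> str:
    """Concatenate all page text with pdfplumber (requirements.txt pins pdfplumber>=0.11)."""
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def parse_corn_ratings(text: str) -> dict:
    ratings = {}
    for line in text.splitlines():
        match = RATING_LINE.match(line.strip())
        if match:
            ratings[match.group("label").strip()] = int(match.group("score"))
    # Chart-only labels get the placeholder unless the text already supplied a number.
    for label in CHART_ONLY:
        ratings.setdefault(label, CHART_PLACEHOLDER)
    return ratings
```

Lines that start with a digit, such as the "1-9 Scale: 1 = Best, 9 = Worst" footer, cannot match because the label must start with a letter.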

Scale-direction correction in lessons.md:
- NK and AgriPro both use 1 = best, lower = more resistant — the
  REVERSED convention vs Bayer / Golden Harvest. NK's tech-sheet
  footer literally prints "1-9 Scale: 1 = Best, 9 = Worst".
  AgriPro positioning on stripe-rust-resistant varieties (AP Iliad
  with Stripe Rust 1, Eyespot 2) confirms the same direction.
- sources-not-yet-indexed section trimmed to just Beck's PFR +
  Beck's products — everything else IS now in the corpus.
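Given those directions, normalizing scores onto one scale is a mirror across the midpoint of 1-9 (score → 10 − score) for the sources that run the other way. This is a minimal sketch. It assumes 1 = best as the common target, although the canonical direction is whatever CLAUDE.md and lessons.md specify. The direction table restates this commit's correction. `BEST_IS_ONE` and `to_best_is_one` are hypothetical names.

```python
# Native direction of each source's 1-9 rating scale, per the lessons.md correction:
# NK and AgriPro print 1 = best; Bayer and Golden Harvest use the opposite direction.
BEST_IS_ONE = {
    "nk": True,
    "agripro": True,
    "bayer_seeds": False,
    "golden_harvest": False,
}


def to_best_is_one(source: str, score: int, low: int = 1, high: int = 9) -> int:
    """Map a native rating onto a common 1 = best scale (the target direction is a choice)."""
    if not low <= score <= high:
        raise ValueError(f"{source}: score {score} outside {low}-{high}")
    if BEST_IS_ONE[source]:
        return score
    # Mirror across the midpoint: on 1-9, 9 -> 1, 8 -> 2, ..., 5 -> 5.
    return low + high - score
```

Keeping the direction in one lookup table means a future vendor needs only one new entry, not another special case in the ranking code.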

Cross-vendor coverage after this PR: 760 varieties.
  bayer_seeds     475 (DEKALB 288 / Asgrow 102 / WestBred 85)
  golden_harvest  139
  nk              122  (41 corn / 81 soy)
  agripro          24  (12 HRS / 7 SWW / 3 HRW / 1 HWS / 1 Barley)
Vendors: Bayer, Syngenta. Brands: 6. Crops: corn, soy, wheat (109
wheat/barley varieties now, up from 85).

requirements.txt: pdfplumber>=0.11 for NK tech-sheet parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 14:16:36 -04:00

seed-mcp

MCP server over the public catalogs of major US row-crop seed vendors — corn, soybeans, wheat. Sibling project to crop-chem-docs (pesticide labels), feeding the same Drawbar farm-advisor AI.

The server exposes per-variety records with agronomic ratings, disease tolerance, trait stack, maturity, and regional notes — so the advisor can answer questions like "which corn hybrid for sandy soil, drought-prone, RM ≤105 in northeast Iowa?" without rummaging through individual brand sites.

Vendor coverage

| Vendor | Verdict | Varieties | Notes |
|---|---|---|---|
| Bayer seeds (DEKALB + Asgrow + WestBred) | 🟢 | ~475 | Same cropscience.bayer.us Next.js infra as crop-chem-docs |
| Golden Harvest (Syngenta) | 🟢 | ~175 | Sitemap + server-rendered HTML + Syngenta CDN PDFs |
| NK (Syngenta) | 🟢 | 122 (41 corn / 81 soy) | Shares PDF fetcher with Golden Harvest |
| AgriPro (Syngenta wheat) | 🟢 | 24 | Drupal Views, server-rendered |
| Beck's PFR | 🟡 | 2,089 | Public Sanity GROQ API (no auth) |
| Beck's products | 🟡 | 860 | Identity-only until SeedIQ XHR sniffed |
| Pioneer (Corteva) | 🔴 | n/a | ToS bans automation; curated fallback lesson instead |
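For the not-yet-indexed Beck's PFR source, an unauthenticated GROQ query against a public Sanity dataset is a plain GET. The query goes URL-encoded in the `query` parameter of the `/data/query/<dataset>` endpoint. This is a sketch only. The project ID, dataset, document `_type`, and API version date below are placeholders, because Beck's actual values are not recorded in this README. `groq_query_url` and `run_groq` are hypothetical helper names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def groq_query_url(project_id: str, dataset: str, groq: str,
                   api_version: str = "2021-10-21") -> str:
    """Build a Sanity HTTP query-API URL; public datasets answer without a token."""
    base = f"https://{project_id}.api.sanity.io/v{api_version}/data/query/{quote(dataset)}"
    return f"{base}?{urlencode({'query': groq})}"


def run_groq(project_id: str, dataset: str, groq: str):
    """Fetch the query and return the "result" member of the JSON response."""
    with urlopen(groq_query_url(project_id, dataset, groq), timeout=30) as resp:
        return json.load(resp)["result"]
```

Usage would look like `run_groq("<projectId>", "<dataset>", '*[_type == "<type>"][0...50]')`. The real identifiers are usually visible in the site's own XHR traffic.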

Quick start

```sh
git clone https://git.jpaul.io/justin/seed-mcp.git
cd seed-mcp
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Run one scraper
python -m scrape.runner --source bayer_seeds --force

# Rebuild indexes
python -m rag.index --rebuild

# Local MCP server (stdio for Claude Desktop dev)
python -m docs_mcp.server --transport stdio
```

Tools exposed

| Tool | Purpose |
|---|---|
| search_docs | Hybrid + rerank variety search with crop / RM / trait / region filters |
| get_page | Full variety record by (source, source_key) |
| list_versions | Discover crops, brands, traits, RM/MG ranges, wheat classes |
| corpus_status | Counts + freshness; useful for health probes |
| crop_seed_api_lessons | Curated agronomy lessons: Pioneer fallback, disease-scale normalization, regional placement heuristics |
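The `search_docs` filters narrow a question like "corn, RM ≤105" before, or alongside, the hybrid ranking. This README does not say which. Here is a minimal pre-filter sketch over a hypothetical flattened record. The real sidecar schema is defined in CLAUDE.md. `VarietyRecord`, its fields, and `prefilter` are illustrative names, and region or soil matching is left out.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class VarietyRecord:
    # Hypothetical flattened record; the canonical sidecar schema lives in CLAUDE.md.
    source: str
    source_key: str
    crop: str                      # "corn", "soy", or "wheat"
    rm: Optional[int] = None       # corn relative maturity in days; None if not applicable
    traits: Set[str] = field(default_factory=set)


def prefilter(records: Iterable[VarietyRecord], crop: str,
              rm_max: Optional[int] = None,
              required_traits: Iterable[str] = ()) -> List[VarietyRecord]:
    """Keep records matching the crop, an optional RM ceiling, and every required trait."""
    required = set(required_traits)
    kept = []
    for rec in records:
        if rec.crop != crop:
            continue
        # With an RM ceiling, records lacking an RM are excluded rather than guessed at.
        if rm_max is not None and (rec.rm is None or rec.rm > rm_max):
            continue
        if not required <= rec.traits:
            continue
        kept.append(rec)
    return kept
```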

Build phases

This is a clone of docs-mcp-template. The 13 phases in PLAN.md apply:

| Phase | Description | Status |
|---|---|---|
| 0 | scaffold | done |
| 1 | first scraper (bayer_seeds) | next |
| 2 | chunk + index | pending |
| 3 | baseline MCP tools | template defaults |
| 4-5 | Dockerfile + CI | done (placeholders filled) |
| 6 | reranker | shares llama-rerank sidecar with crop-chem-docs |
| 7 | eval harness | pending (curate ~25 queries) |
| 8 | hybrid search | done (template) |
| 9 | diff_versions, list_cluster | optional |
| 11 | crop_seed_api_lessons curated layer | pending |
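Hybrid search merges a BM25 ranking with a dense-vector ranking before the reranker sees the candidates. This README does not say how the template fuses the two. Reciprocal rank fusion is one common choice, and this sketch assumes it. Each list contributes 1 / (k + rank) per document, with k = 60 as a conventional default. `reciprocal_rank_fusion` is a hypothetical name, not the template's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank), ranks from 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and cosine similarities. That is the usual reason to prefer it over a weighted score sum.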

See CLAUDE.md for the canonical sidecar schema and the disease-scale-normalization gotcha (Golden Harvest is reversed).

Infrastructure

  • Registry: git.jpaul.io/justin/seed-mcp:latest (Watchtower) / :corpus-YYYY.MM.DD (production pin)
  • Embedder: shared Ollama pool with crop-chem-docs (Gitea-host GPUs + Windows Ollama; CI never hits trashpanda's production Ollama)
  • Reranker: shared llama-rerank sidecar on trashpanda's Tesla P4 (one container, both MCPs use it)
  • PRODUCT_NAME: crop_seed (not seed_mcp — used in Chroma collection, BM25 db filename, and crop_seed_api_lessons tool)