Three new brand scrapers: LG Seeds + AgriGold + Ebbert's Seeds (+310 varieties)

User flagged LG, AgriGold, and Ebbert's (local Ohio breeder) as all active
in farmer territory. Built three scrapers; the corpus now covers 5,839
chunks across 11 brands.

Net new varieties: 310
  lg_seeds       170: corn 78 + soy 63 + alfalfa 16 + sorghum 13
                      → adds FIRST alfalfa coverage (FD 3-5 range)
  agrigold       111: corn 60 + soy 51
  ebberts_seeds   29: corn 17 + soy 12 (regional OH/IN breeder)

scrape/sources/lg_seeds.py (embedded-JSON pattern, cleanest):
  - /products/<crop> pages have a `var products = [...]` blob with the
    variety summary (Variety, Maturity, Traits[], Bullets[], CropType).
  - Per-variety detail page (/products/<crop>/<Variety>) carries the
    ratings as `<span class="bar-N">`, where N is 1-9 on the canonical
    scale. Same 9 = best direction as Bayer / Golden Harvest.
  - Three sections per page: Characteristics / Management / Disease
    Tolerance, plus a few qualitative bars ("Tar Spot Susceptible",
    "Fungicide Response High") preserved as text values.

scrape/sources/agrigold.py (5-circle scale):
  - Listing page has 60+ /corn/explore-corn-hybrids/<CODE> URLs.
  - Detail page renders ratings as <div class="scale"> blocks with 5
    child <div class="circle"> elements, of which N have class
    "circle selected" → rating N on a 1-5 scale.
  - 7 sections per page, incl. Silage Characteristics (Dairy Silage
    Rating, NDFd 30 Hr, Crude Protein), Planting Applications, Soil
    Adaptability, Plant Characteristics, Product Features.
  - Different rating range (1-5 vs Bayer's 1-9; higher = better on
    both), declared in _scale_direction so the chunker preamble
    renders correctly.

scrape/sources/ebberts_seeds.py (small regional breeder, verbatim-text
approach):
  - Single page per crop (corn / soybeans / wheat). Each variety is an
    <h1> plus a multi-section CSS-grid block where labels and values sit
    in separate adjacent cells. Reconstructing perfectly aligned columns
    for 29 varieties total isn't worth the engineering, so the chunk
    body carries the verbatim text in document order and the LLM reads
    the tabular content directly.
  - Scale: 1-5 (1 = best, lower = more resistant), inferred from
    marketing-vs-rating cross-checks ("Robust tall plants" +
    STANDABILITY 1.0 → 1 = best).
  - Politeness: robots.txt asks for Crawl-delay: 5; honored.

All three new scrapers smoke-tested:
  - LG corn LG5701 RM 116 SmartStax → 3 characteristic groups with
    Disease Tolerance ratings (Northern/Southern Leaf Blight 8-9, etc.)
  - AgriGold A616-30 RM 86 VT2RIB → 7 groups incl. silage and soil
    adaptability ratings
  - Ebbert's 7000TR RIB RM 100 → 1098-char verbatim body covering
    CHARACTERISTICS, DISEASE RATINGS, herbicide tolerance, etc.

Corpus state after this PR:
  - 5,839 chunks (was 5,529)
  - 11 brands (was 8)
  - 8 crops (corn 3047, soy 2209, silage 359, wheat 123, sorghum 49,
    cotton 30, alfalfa 16, canola 6); alfalfa is brand-new

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
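The LG Seeds scraper itself isn't part of this excerpt, so here is a minimal, stdlib-only sketch of the embedded-JSON pattern the commit message describes: locate the `var products = [...]` assignment, decode the array with `json.JSONDecoder.raw_decode`, and read ratings from `bar-N` class names. The HTML snippets, the helper names `extract_products` / `extract_bar_ratings`, and the assumption that the blob is strict JSON (double-quoted keys, no trailing commas) are all illustrative, not taken from `lg_seeds.py`.

```python
import json
import re

# Hypothetical page excerpts; real LG Seeds markup and fields may differ.
LISTING_HTML = """
<script>
var products = [{"Variety": "LG5701", "Maturity": "116",
                 "Traits": ["SmartStax"], "Bullets": [], "CropType": "Corn"}];
</script>
"""
DETAIL_HTML = """
<li>Northern Leaf Blight <span class="bar-8"></span></li>
<li>Southern Leaf Blight <span class="bar-9"></span></li>
"""

_PRODUCTS_RE = re.compile(r"var\s+products\s*=\s*")
_BAR_RE = re.compile(r'class="bar-([1-9])"')


def extract_products(html: str) -> list[dict]:
    """Decode the array assigned to ``var products`` (assumed strict JSON)."""
    m = _PRODUCTS_RE.search(html)
    if m is None:
        return []
    # raw_decode parses one JSON value from the start of the string and
    # reports where it ended, so the trailing ";</script>" is ignored.
    products, _end = json.JSONDecoder().raw_decode(html[m.end():])
    return products


def extract_bar_ratings(html: str) -> list[int]:
    """Return N from every ``class="bar-N"`` (1-9, 9 = best), in page order."""
    return [int(n) for n in _BAR_RE.findall(html)]


if __name__ == "__main__":
    print(extract_products(LISTING_HTML)[0]["Variety"])  # LG5701
    print(extract_bar_ratings(DETAIL_HTML))              # [8, 9]
```

`raw_decode` is used instead of a non-greedy regex such as `\[.*?\]` because that regex would stop at the first `]`, which closes the nested `Traits` array rather than the outer list.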
2026-05-26 12:42:23 -04:00
parent 06461ade1d
commit 30b182e28a
623 changed files with 75417 additions and 0 deletions
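The commit message says each sidecar declares `_scale_direction` so the chunker preamble states which scale a rating uses, and the AgriGold module below keeps ratings verbatim. If a downstream consumer ever needs to compare ratings across brands, a hypothetical helper could parse those strings and map every rating onto 0.0 (worst) to 1.0 (best). `normalize_rating` is not part of this PR, and the `"1-9 (9 = best)"` string used for Bayer / LG is an assumption; only the two 1-5 strings appear in this diff.

```python
import re

# Matches the leading "<lo>-<hi> (<best> = best" part of a
# _scale_direction string such as "1-5 (5 = best)".
_SCALE_RE = re.compile(r"^(\d+)-(\d+)\s*\((\d+)\s*=\s*best")


def normalize_rating(value: float, scale_direction: str) -> float:
    """Hypothetical: map ``value`` onto 0.0 (worst) .. 1.0 (best)."""
    m = _SCALE_RE.match(scale_direction)
    if m is None:
        raise ValueError(f"unrecognized scale: {scale_direction!r}")
    lo, hi, best = (int(g) for g in m.groups())
    if lo >= hi or best not in (lo, hi):
        raise ValueError(f"ambiguous scale: {scale_direction!r}")
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside {lo}-{hi}")
    frac = (value - lo) / (hi - lo)
    # When the low end is "best" (Ebbert's), flip so 1.0 always means best.
    return frac if best == hi else 1.0 - frac


if __name__ == "__main__":
    print(normalize_rating(4, "1-5 (5 = best)"))                          # 0.75
    print(normalize_rating(2, "1-5 (1 = best, lower = more resistant)"))  # 0.75
```

Under this mapping an AgriGold 4 and an Ebbert's 2 both land at 0.75: the same relative position on their respective scales, which is not necessarily an equivalent agronomic score. An AgriGold `_parse_scale` result of 0 (scale block present, no circle selected) falls outside 1-5 and is rejected rather than silently mapped.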
@@ -0,0 +1,502 @@
+"""AgriGold scraper — AgReliant Genetics brand.
+
+Source: ``www.agrigold.com`` — WordPress site, empty robots.txt
+(no Disallow). Catalog covers corn + soybeans. Sibling of LG Seeds
+under the same parent (AgReliant) but distinct branding /
+positioning, so kept in its own scraper.
+
+Discovery: the listing page ``/corn/explore-corn-hybrids`` (and
+the soybean equivalent) is server-rendered HTML that contains
+``<a href="/corn/explore-corn-hybrids/<CODE>">`` for every variety.
+Codes look like ``A616-30``, ``A623-88``, etc. Parse the listing
+HTML, collect distinct variety URLs.
+
+Per-variety detail (``/corn/explore-corn-hybrids/<CODE>``) renders
+several ``<div class="product-section ...">`` blocks. Each section
+has a ``<div class="title">`` heading + multiple ``.detail-item``
+rows shaped as ``<div class="label">LABEL</div><div class="value">VALUE</div>``.
+
+The ``<div class="value">`` content is one of:
+
+  - **5-circle rating scale** (Agronomic Rating, Disease Tolerance,
+    Silage Characteristics): ``<div class="scale">`` containing 5
+    children, where N have class ``circle selected`` and 5-N have
+    class ``circle``. Count = rating on a **1-5 scale** (5 = best).
+    Distinct from Bayer / LG Seeds' 1-9 convention — documented in
+    the sidecar's ``_scale_direction``.
+
+  - **Numeric value** (GDUs, year, plant population): bare number.
+
+  - **Categorical / qualitative** (Ear Flex Type "KERNEL",
+    Leaf Orientation "SEMI UPRIGHT", Cob Color "Red"): the literal
+    text.
+
+  - **NA**: rated but not yet measured.
+
+Rating scale: ``1-5 (5 = best)`` — distinct from the other brands;
+the chunker reads ``_scale_direction`` to render the correct
+preamble.
+
+Output:
+  corpus/agrigold/<source_key>.md
+  corpus/agrigold/<source_key>.json
+
+source_key: ``agrigold-<code>`` lowercased, e.g.
+``agrigold-a616-30``.
+
+CLI:
+  python -m scrape.sources.agrigold --crop corn --limit 5
+  python -m scrape.sources.agrigold --force
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import logging
+import os
+import random
+import re
+import sys
+import time
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any
+
+import requests
+from bs4 import BeautifulSoup
+
+SCRAPER_VERSION = "0.1.0"
+USER_AGENT = "seed-mcp-scraper/0.1 (+https://drawbar.example/contact)"
+BASE = "https://www.agrigold.com"
+
+LISTING_PATHS = {
+    "corn":     "/corn/explore-corn-hybrids",
+    "soybeans": "/soybeans/explore-soybean-varieties",
+}
+
+# AgriGold publishes ratings on a 1-5 scale (5 = best), counted from
+# the selected circles in the per-rating scale block. The chunker
+# preserves this verbatim — every chunk preamble declares the scale
+# so the LLM doesn't conflate it with Bayer's 1-9.
+RATING_SCALE_DIRECTION = "1-5 (5 = best)"
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+CORPUS_ROOT = Path(os.environ.get("CORPUS_ROOT") or REPO_ROOT / "corpus")
+CORPUS_DIR = CORPUS_ROOT / "agrigold"
+
+REQ_INTERVAL_SEC = 1.0
+
+log = logging.getLogger("scrape.agrigold")
+
+
+# --------------------------------------------------------------------- HTTP
+
+
+class RateLimitedSession:
+    def __init__(self, interval: float = REQ_INTERVAL_SEC) -> None:
+        self.s = requests.Session()
+        self.s.headers["User-Agent"] = USER_AGENT
+        self.interval = interval
+        self._last = 0.0
+
+    def _wait(self) -> None:
+        delta = time.monotonic() - self._last
+        if delta < self.interval:
+            time.sleep(self.interval - delta)
+        self._last = time.monotonic()
+
+    def request(self, method: str, url: str, *, max_retries: int = 4,
+                timeout: float = 30.0, **kw: Any) -> requests.Response:
+        last_exc: Exception | None = None
+        for attempt in range(max_retries):
+            self._wait()
+            try:
+                resp = self.s.request(method, url, timeout=timeout, **kw)
+            except requests.RequestException as exc:
+                last_exc = exc
+                backoff = min(30.0, (2 ** attempt) + random.random())
+                log.warning("network error on %s %s: %s — retry in %.1fs",
+                            method, url, exc, backoff)
+                time.sleep(backoff)
+                continue
+            if resp.status_code == 429 or 500 <= resp.status_code < 600:
+                ra = resp.headers.get("Retry-After")
+                backoff = float(ra) if (ra and ra.isdigit()) else min(30.0, (2 ** attempt) + random.random())
+                log.warning("HTTP %d on %s %s — retry in %.1fs",
+                            resp.status_code, method, url, backoff)
+                time.sleep(backoff)
+                continue
+            return resp
+        if last_exc:
+            raise last_exc
+        return resp  # type: ignore[return-value]
+
+    def get(self, url: str, **kw: Any) -> requests.Response:
+        return self.request("GET", url, **kw)
+
+
+# --------------------------------------------------------------------- model
+
+
+@dataclass
+class AGProduct:
+    source_key: str
+    source_url: str
+    crop: str
+    product_name: str = ""
+    relative_maturity: str | None = None   # corn RM days from .maturity
+    maturity_group: str | None = None      # soy MG
+    trait_descriptions: list[str] = field(default_factory=list)
+    characteristics_groups: list[dict] = field(default_factory=list)
+
+
+# --------------------------------------------------------------------- discovery
+
+
+def discover_varieties(
+    http: RateLimitedSession, *, only_crop: str | None = None,
+) -> list[tuple[str, str, str]]:
+    """Return ``[(url, crop, variety_code), ...]`` for every variety in
+    the listing pages."""
+    out: list[tuple[str, str, str]] = []
+    for crop, path in LISTING_PATHS.items():
+        if only_crop and crop != only_crop:
+            continue
+        log.info("fetching listing %s%s", BASE, path)
+        r = http.get(f"{BASE}{path}")
+        r.raise_for_status()
+        # Collect distinct hrefs that look like /<crop>/explore-X-{hybrids,
+        # varieties}/<CODE>. Codes are alphanumeric with dashes.
+        href_re = re.compile(rf"^{re.escape(path)}/([\w\-]+)$")
+        seen: set[str] = set()
+        soup = BeautifulSoup(r.text, "html.parser")
+        for a in soup.find_all("a", href=True):
+            m = href_re.match(a["href"])
+            if not m:
+                continue
+            code = m.group(1)
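+            # Filter out catalog-tool tails ("filter", "browse", etc.).
+            # The alphanumeric pattern alone would accept those words, so
+            # also require a digit, as variety codes like A616-30 have.
+            if not re.match(r"^(?=.*\d)[A-Z0-9][\w\-]{2,30}$", code, re.I):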
+                continue
+            if code in seen:
+                continue
+            seen.add(code)
+            out.append((f"{BASE}{path}/{code}", crop, code))
+        log.info("  %s: %d varieties", crop, len(seen))
+    log.info("total varieties discovered: %d", len(out))
+    return out
+
+
+# --------------------------------------------------------------------- helpers
+
+
+def source_key_for(code: str) -> str:
+    slug = re.sub(r"[^a-zA-Z0-9-]+", "-", code).strip("-").lower()
+    return f"agrigold-{slug}"
+
+
+# Section class hint -> normalized label for the sidecar.
+SECTION_LABEL_MAP = {
+    "agronomic-rating":        "AGRONOMIC RATING",
+    "disease-tolerance":       "DISEASE TOLERANCE",
+    "plant-characteristics":   "PLANT CHARACTERISTICS",
+    "plant-features":          "PRODUCT FEATURES",
+    "silage-characteristics":  "SILAGE CHARACTERISTICS",
+    "planting-applications":   "PLANTING APPLICATIONS",
+    "planting-population":     "PLANTING POPULATION",
+}
+
+
+def _parse_scale(value_el) -> int | None:
+    """Count selected circles in a ``<div class="scale">`` block.
+    Returns 0-5 (0 when the block is present but no circle is
+    selected) or None if no scale block is present."""
+    if value_el is None:
+        return None
+    scale = value_el.find("div", class_="scale")
+    if scale is None:
+        return None
+    selected = scale.find_all("div", class_=lambda c: c and "selected" in c)
+    return len(selected) if selected else 0
+
+
+def _parse_value(value_el) -> str:
+    """Extract a non-scale value: raw text contents, trimmed."""
+    if value_el is None:
+        return ""
+    # If it has a .scale child we should have caught it above. Otherwise
+    # return the leaf text.
+    text = value_el.get_text(" ", strip=True)
+    return text
+
+
+# --------------------------------------------------------------------- detail
+
+
+def fetch_product_detail(
+    http: RateLimitedSession, url: str, crop: str, code: str,
+) -> AGProduct:
+    r = http.get(url)
+    r.raise_for_status()
+    soup = BeautifulSoup(r.text, "html.parser")
+
+    prod = AGProduct(
+        source_key=source_key_for(code),
+        source_url=url,
+        crop=crop,
+        product_name=code,
+    )
+
+    # Maturity — often rendered as ``<div class="maturity">86 days</div>``.
+    mat_el = soup.find(class_="maturity")
+    if mat_el:
+        text = mat_el.get_text(strip=True)
+        m = re.search(r"(\d+(?:\.\d+)?)", text)
+        if m:
+            if crop == "corn":
+                prod.relative_maturity = m.group(1)
+            elif crop == "soybeans":
+                prod.maturity_group = m.group(1)
+
+    # Trait package — from .product-details / "Trait Package"
+    pd = soup.find(class_="product-details")
+    if pd:
+        # The details block renders pairs of label / value text:
+        # "Genetic Family | Icon-J | Trait Package | VT2RIB | ..."
+        # Parse the labels we recognize.
+        text = pd.get_text(" | ", strip=True)
+        m = re.search(r"Trait Package\s*\|\s*([^|]+?)(?:\s*\||$)", text)
+        if m:
+            tp = m.group(1).strip()
+            if tp and tp.lower() not in ("none", "-"):
+                prod.trait_descriptions = [tp]
+
+    # Iterate all product-section blocks; bucket items per section.
+    for section in soup.find_all("div", class_=re.compile(r"product-section")):
+        section_classes = section.get("class", [])
+        label = ""
+        for cls in section_classes:
+            if cls in SECTION_LABEL_MAP:
+                label = SECTION_LABEL_MAP[cls]
+                break
+        if not label:
+            title_el = section.find(class_="title")
+            label = (title_el.get_text(strip=True).upper()
+                     if title_el else "OTHER")
+
+        items: list[dict] = []
+        for detail in section.find_all("div", class_="detail-item"):
+            label_el = detail.find("div", class_="label")
+            value_el = detail.find("div", class_="value")
+            ch = (label_el.get_text(" ", strip=True) if label_el else "").strip()
+            if not ch:
+                continue
+
+            scale = _parse_scale(value_el)
+            if scale is not None:
+                items.append({"characteristic": ch, "value": str(scale)})
+            else:
+                v = _parse_value(value_el)
+                # Special-case the "Row Type" header row from planting-population
+                # which holds nested headers, not a real rating.
+                if ch.lower() == "row type" and v.lower() in (
+                    "low medium high", "low / medium / high",
+                ):
+                    continue
+                if v:
+                    items.append({"characteristic": ch, "value": v})
+
+        if items:
+            prod.characteristics_groups.append({
+                "label": label, "type": "scale-or-value", "items": items,
+            })
+
+    return prod
+
+
+# --------------------------------------------------------------------- render
+
+
+def render_markdown(p: AGProduct) -> str:
+    title = p.product_name or p.source_key
+    crop_label = "Corn" if p.crop == "corn" else "Soybeans"
+    head: list[str] = [
+        f"# {title}",
+        "",
+        "- **Vendor:** AgReliant Genetics",
+        "- **Brand:** AgriGold",
+        f"- **Crop:** {crop_label}",
+    ]
+    if p.relative_maturity and p.crop == "corn":
+        head.append(f"- **Relative maturity:** {p.relative_maturity}")
+    if p.maturity_group and p.crop == "soybeans":
+        head.append(f"- **Maturity group:** {p.maturity_group}")
+    if p.trait_descriptions:
+        head.append(f"- **Traits:** {', '.join(p.trait_descriptions)}")
+    head.append(f"- **Source:** {p.source_url}")
+    head.append(f"- **Rating scale (AgriGold):** {RATING_SCALE_DIRECTION}")
+    head.append("")
+    head.append("---")
+    head.append("")
+
+    sections: list[str] = []
+    for g in p.characteristics_groups:
+        label = (g.get("label") or "Characteristics").title()
+        items = g.get("items") or []
+        if not items:
+            continue
+        rows = "\n".join(f"| {it['characteristic']} | {it['value']} |" for it in items)
+        sections.append(
+            f"## {label}\n\n"
+            "| Characteristic | Value |\n"
+            "|---|---|\n"
+            f"{rows}\n"
+        )
+    return "\n".join(head) + "\n".join(sections)
+
+
+# --------------------------------------------------------------------- write
+
+
+def write_product(prod: AGProduct, body_md: str) -> None:
+    CORPUS_DIR.mkdir(parents=True, exist_ok=True)
+    md_path = CORPUS_DIR / f"{prod.source_key}.md"
+    json_path = CORPUS_DIR / f"{prod.source_key}.json"
+
+    md_path.write_text(body_md, encoding="utf-8")
+    sidecar = {
+        "source": "agrigold",
+        "source_key": prod.source_key,
+        "vendor": "AgReliant Genetics",
+        "brand": "AgriGold",
+        "product_name": prod.product_name,
+        "product_id": None,
+        "hybrid_prefix": prod.product_name,
+        "hybrid_suffix": None,
+        "crop": prod.crop,
+        "release_year": None,
+        "relative_maturity": prod.relative_maturity,
+        "maturity_group": prod.maturity_group,
+        "wheat_class": None,
+        "trait_stack": prod.trait_descriptions,
+        "trait_descriptions": prod.trait_descriptions,
+        "positioning_statement": None,
+        "strengths": [],
+        "characteristics_groups": prod.characteristics_groups,
+        "_scale_direction": RATING_SCALE_DIRECTION,
+        "regional_recommendations": [],
+        "image_url": None,
+        "source_urls": [prod.source_url],
+        "sitemap_last_modified": None,
+        "fetched_at": datetime.now(timezone.utc).isoformat(),
+        "scraper_version": SCRAPER_VERSION,
+    }
+    json_path.write_text(
+        json.dumps(sidecar, indent=2, ensure_ascii=False) + "\n",
+        encoding="utf-8",
+    )
+
+
+# --------------------------------------------------------------------- pipeline
+
+
+def process_product(
+    http: RateLimitedSession, *, url: str, crop: str, code: str, force: bool,
+) -> tuple[str, AGProduct | None]:
+    source_key = source_key_for(code)
+    md_path = CORPUS_DIR / f"{source_key}.md"
+    if md_path.exists() and not force:
+        return "skipped", None
+    try:
+        prod = fetch_product_detail(http, url, crop, code)
+    except Exception as exc:  # noqa: BLE001
+        log.error("variety %s failed: %s", code, exc)
+        return "failed", None
+    body = render_markdown(prod)
+    write_product(prod, body)
+    return "written", prod
+
+
+def run(*, limit: int | None, force: bool,
+        only_crop: str | None, only_product: str | None) -> int:
+    CORPUS_DIR.mkdir(parents=True, exist_ok=True)
+    http = RateLimitedSession()
+    targets = discover_varieties(http, only_crop=only_crop)
+    if only_product:
+        targets = [
+            (u, c, k) for (u, c, k) in targets
+            if source_key_for(k) == only_product
+            or k.lower() == only_product.lower()
+        ]
+        if not targets:
+            log.error("no variety matched --product=%s", only_product)
+            return 2
+
+    counts = {"written": 0, "skipped": 0, "failed": 0}
+    processed = 0
+    for url, crop, code in targets:
+        if limit is not None and processed >= limit:
+            break
+        processed += 1
+        status, prod = process_product(
+            http, url=url, crop=crop, code=code, force=force,
+        )
+        counts[status] = counts.get(status, 0) + 1
+        if prod is not None:
+            log.info(
+                "[%d/%s] %s %s | crop=%s rm/mg=%s traits=%s groups=%d",
+                processed, str(limit) if limit else "all",
+                prod.source_key, status, prod.crop,
+                prod.relative_maturity or prod.maturity_group or "-",
+                ",".join(prod.trait_descriptions) or "-",
+                len(prod.characteristics_groups),
+            )
+        else:
+            log.info("[%d/%s] %s %s",
+                     processed, str(limit) if limit else "all",
+                     source_key_for(code), status)
+
+    log.info(
+        "done: processed=%d written=%d skipped=%d failed=%d (of %d candidates)",
+        processed, counts["written"], counts["skipped"],
+        counts["failed"], len(targets),
+    )
+    return 0 if counts["failed"] == 0 else 1
+
+
+# --------------------------------------------------------------------- CLI
+
+
+def _build_argparser() -> argparse.ArgumentParser:
+    p = argparse.ArgumentParser(
+        prog="scrape.sources.agrigold",
+        description="Scrape AgriGold (AgReliant Genetics) corn + soybean varieties.",
+    )
+    p.add_argument("--limit", type=int, default=None,
+                   help="Stop after processing N varieties (default: all).")
+    p.add_argument("--force", action="store_true",
+                   help="Re-fetch even if the markdown file already exists.")
+    p.add_argument("--crop", default=None, choices=list(LISTING_PATHS),
+                   help="Limit to one crop.")
+    p.add_argument("--product", default=None,
+                   help="Process a single variety by source_key or variety code.")
+    p.add_argument("--log-level", default=os.environ.get("LOG_LEVEL", "INFO"))
+    return p
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = _build_argparser().parse_args(argv)
+    logging.basicConfig(
+        level=args.log_level.upper(),
+        format="%(asctime)s %(levelname)s %(name)s %(message)s",
+        stream=sys.stderr,
+    )
+    return run(
+        limit=args.limit, force=args.force,
+        only_crop=args.crop, only_product=args.product,
+    )
+
+
+if __name__ == "__main__":
+    sys.exit(main())
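The Ebbert's module that follows hard-codes `REQ_INTERVAL_SEC = 5.0` to honor the site's `Crawl-delay: 5`. Below is a minimal sketch of deriving that interval from robots.txt at run time instead, using the standard library's `urllib.robotparser`. The `request_interval` helper and the sample robots.txt text are hypothetical, and the sketch assumes the Crawl-delay line sits under a `User-agent` group; the parser ignores a Crawl-delay that appears before any User-agent line.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "seed-mcp-scraper/0.1 (+https://drawbar.example/contact)"


def request_interval(robots_txt: str, user_agent: str,
                     default: float = 1.0) -> float:
    """Seconds to wait between requests: the larger of ``default`` and
    the Crawl-delay robots.txt sets for ``user_agent`` (if any)."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # crawl_delay() returns None unless the parser has a fetch timestamp;
    # we parsed text instead of calling read(), so set it explicitly.
    rp.modified()
    delay = rp.crawl_delay(user_agent)
    if delay is None:
        return default
    return max(default, float(delay))


if __name__ == "__main__":
    robots = "User-agent: *\nCrawl-delay: 5\n"
    print(request_interval(robots, USER_AGENT))  # 5.0
```

Taking the maximum means robots.txt can only slow the scraper down, never push it faster than its own default.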
@@ -0,0 +1,412 @@
+"""Ebbert's Seeds scraper — small regional Ohio/Indiana breeder.
+
+Source: ``www.ebbertsseeds.com`` — WordPress site. robots.txt is
+permissive (``Crawl-delay: 5`` only, no Disallow). Covington, OH +
+Decatur, IN — Eastern Corn Belt focus.
+
+Catalog is structured as one scrollable page PER CROP, with each
+variety rendered as a CSS-grid block: an `<h1>NAME TRAIT <RM> RM</h1>`
+heading plus several sub-sections (MANAGEMENT & POSITIONING / CHARACTERISTICS
+/ DISEASE RATINGS) where the labels and numeric values live in
+separate adjacent grid cells. Reconstructing a perfectly-aligned
+{characteristic: value} dict from the multi-column layout is
+fiddly; the small variety count (~17 corn + similar soy/wheat)
+doesn't justify the engineering. We instead **preserve the full
+text body of each variety's container** in the chunk markdown so
+the LLM can read the tabular text as-is.
+
+Pages scraped: `/corn/`, `/soybeans-2/`, `/wheat/`. Grass-seed /
+forage / cover-crop pages are out of scope for the row-crop
+advisor.
+
+Rating scale: ``1-5 (1 = best, lower = more resistant)`` — same
+direction as AgriPro / NK. Confirmed by cross-referencing
+positioning text against published values (a variety described as
+"Robust tall plants" has STANDABILITY 1.0 → 1 = best).
+
+Output:
+  corpus/ebberts_seeds/<source_key>.md
+  corpus/ebberts_seeds/<source_key>.json
+
+source_key: ``ebberts-<slug>`` lowercased, e.g.
+``ebberts-7000tr-rib`` or ``ebberts-1335-conventional``.
+
+CLI:
+  python -m scrape.sources.ebberts_seeds --crop corn --limit 5
+  python -m scrape.sources.ebberts_seeds --force
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import logging
+import os
+import random
+import re
+import sys
+import time
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any
+
+import requests
+from bs4 import BeautifulSoup
+
+SCRAPER_VERSION = "0.1.0"
+USER_AGENT = "seed-mcp-scraper/0.1 (+https://drawbar.example/contact)"
+BASE = "https://www.ebbertsseeds.com"
+
+# Ebbert's per-crop catalog pages. URL paths confirmed via homepage
+# nav links 2026-05-26.
+CROP_PAGES = {
+    "corn":     "/corn/",
+    "soybeans": "/soybeans-2/",
+    "wheat":    "/wheat/",
+}
+
+# Per robots.txt: Crawl-delay: 5 (seconds). We respect that.
+REQ_INTERVAL_SEC = 5.0
+
+RATING_SCALE_DIRECTION = "1-5 (1 = best, lower = more resistant)"
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+CORPUS_ROOT = Path(os.environ.get("CORPUS_ROOT") or REPO_ROOT / "corpus")
+CORPUS_DIR = CORPUS_ROOT / "ebberts_seeds"
+
+log = logging.getLogger("scrape.ebberts_seeds")
+
+
+# --------------------------------------------------------------------- HTTP
+
+
+class RateLimitedSession:
+    """robots.txt asks for a 5-sec Crawl-delay; we honor it. The
+    scraper fetches only one catalog page per crop (three requests
+    for a full run), so even at 5 sec/req it finishes in well under
+    a minute, barring retries."""
+
+    def __init__(self, interval: float = REQ_INTERVAL_SEC) -> None:
+        self.s = requests.Session()
+        self.s.headers["User-Agent"] = USER_AGENT
+        self.interval = interval
+        self._last = 0.0
+
+    def _wait(self) -> None:
+        delta = time.monotonic() - self._last
+        if delta < self.interval:
+            time.sleep(self.interval - delta)
+        self._last = time.monotonic()
+
+    def request(self, method: str, url: str, *, max_retries: int = 4,
+                timeout: float = 30.0, **kw: Any) -> requests.Response:
+        last_exc: Exception | None = None
+        for attempt in range(max_retries):
+            self._wait()
+            try:
+                resp = self.s.request(method, url, timeout=timeout, **kw)
+            except requests.RequestException as exc:
+                last_exc = exc
+                backoff = min(30.0, (2 ** attempt) + random.random())
+                log.warning("network error on %s %s: %s — retry in %.1fs",
+                            method, url, exc, backoff)
+                time.sleep(backoff)
+                continue
+            if resp.status_code == 429 or 500 <= resp.status_code < 600:
+                ra = resp.headers.get("Retry-After")
+                backoff = float(ra) if (ra and ra.isdigit()) else min(30.0, (2 ** attempt) + random.random())
+                log.warning("HTTP %d on %s %s — retry in %.1fs",
+                            resp.status_code, method, url, backoff)
+                time.sleep(backoff)
+                continue
+            return resp
+        if last_exc:
+            raise last_exc
+        return resp  # type: ignore[return-value]
+
+    def get(self, url: str, **kw: Any) -> requests.Response:
+        return self.request("GET", url, **kw)
+
+
+# --------------------------------------------------------------------- model
+
+
+@dataclass
+class EbProduct:
+    source_key: str
+    source_url: str          # the per-crop page URL (Ebbert's doesn't have per-variety pages)
+    crop: str
+    product_name: str = ""   # "7000TR RIB", "1335 CONVENTIONAL"
+    trait_label: str | None = None   # "RIB", "CONVENTIONAL", "PC", "SSX RIB", etc.
+    relative_maturity: str | None = None    # corn
+    maturity_group: str | None = None       # soy
+    body_text: str = ""      # verbatim text of the variety's container
+
+
+# --------------------------------------------------------------------- discovery + parse
+
+
+_VARIETY_HEADING_RE = re.compile(
+    r"^(?P<name>\S+(?:\s+\S+)*?)\s+(?P<rm>\d+(?:\.\d+)?)\s*RM$",
+    re.IGNORECASE,
+)
+
+
+def _variety_text(h1, next_h1) -> str:
+    """Collect the visible text from this variety's <h1> up to (but
+    not including) the next variety's <h1>, walking the DOM in
+    document order.
+
+    Ebbert's grid layout spreads each variety's content across many
+    sibling ``.x-cell`` blocks in the outer container; the h1's
+    immediate parent only holds the title cell. The correct boundary
+    is the next variety h1 in document order. The last variety on a
+    page has no next h1, so its text runs to the end of the document
+    and may include trailing page content such as the footer.
+    """
+    chunks: list[str] = [h1.get_text(strip=True)]
+    for node in h1.find_all_next(string=True):
+        parents = list(node.parents)
+        # Stop once we cross into the next variety's h1.
+        if next_h1 is not None and any(p is next_h1 for p in parents):
+            break
+        # find_all_next() begins with this h1's own strings, which are
+        # already in chunks[0]; skip them so the title isn't repeated.
+        if any(p is h1 for p in parents):
+            continue
+        # Script and style bodies are not visible page text.
+        if node.parent is not None and node.parent.name in ("script", "style"):
+            continue
+        text = str(node).strip()
+        if text:
+            chunks.append(text)
+    body = " | ".join(chunks)
+    body = re.sub(r"\s*\|\s*\|\s*", " | ", body)
+    body = re.sub(r"\s+", " ", body).strip()
+    return body
+
+
+def _slug(text: str) -> str:
+    s = re.sub(r"[^a-zA-Z0-9]+", "-", text).strip("-").lower()
+    return s
+
+
+def discover_and_parse(
+    http: RateLimitedSession, *, only_crop: str | None = None,
+) -> list[EbProduct]:
+    """Fetch one page per crop and extract every variety container."""
+    out: list[EbProduct] = []
+    for crop, path in CROP_PAGES.items():
+        if only_crop and crop != only_crop:
+            continue
+        url = f"{BASE}{path}"
+        log.info("fetching %s", url)
+        r = http.get(url)
+        r.raise_for_status()
+        soup = BeautifulSoup(r.text, "html.parser")
+
+        # Every variety is anchored by an <h1>NAME ... RM RM</h1>.
+        v_h1s = [
+            h for h in soup.find_all("h1")
+            if _VARIETY_HEADING_RE.match(h.get_text(strip=True))
+        ]
+        log.info("  %s: %d varieties", crop, len(v_h1s))
+
+        for i, h1 in enumerate(v_h1s):
+            title = h1.get_text(strip=True)
+            m = _VARIETY_HEADING_RE.match(title)
+            if not m:
+                continue
+            name = m.group("name").strip()
+            maturity = m.group("rm")
+
+            next_h1 = v_h1s[i + 1] if i + 1 < len(v_h1s) else None
+            body = _variety_text(h1, next_h1)
+
+            prod = EbProduct(
+                source_key=f"ebberts-{_slug(name)}",
+                source_url=url,
+                crop=crop,
+                product_name=name,
+                relative_maturity=maturity if crop == "corn" else None,
+                maturity_group=maturity if crop == "soybeans" else None,
+                body_text=body,
+            )
+            # Derive trait_label from everything after the first token of
+            # the name (CONVENTIONAL, RIB, PC, SSX RIB, TR RIB, etc.).
+            # Best-effort: the remainder is used as-is, without validation.
+            parts = name.split(maxsplit=1)
+            if len(parts) == 2:
+                prod.trait_label = parts[1]
+            out.append(prod)
+    log.info("total varieties discovered: %d", len(out))
+    return out
+
+
+# --------------------------------------------------------------------- render
+
+
+def render_markdown(p: EbProduct) -> str:
+    title = p.product_name or p.source_key
+    crop_label = {"corn": "Corn", "soybeans": "Soybeans",
+                  "wheat": "Wheat"}.get(p.crop, p.crop.title())
+    head: list[str] = [
+        f"# {title}",
+        "",
+        "- **Vendor:** Ebbert's Seeds (independent regional breeder)",
+        "- **Brand:** Ebbert's Seeds",
+        f"- **Crop:** {crop_label}",
+    ]
+    if p.relative_maturity and p.crop == "corn":
+        head.append(f"- **Relative maturity:** {p.relative_maturity}")
+    if p.maturity_group and p.crop == "soybeans":
+        head.append(f"- **Maturity group:** {p.maturity_group}")
+    if p.trait_label:
+        head.append(f"- **Trait stack (label):** {p.trait_label}")
+    head.append(f"- **Source:** {p.source_url}")
+    head.append(f"- **Rating scale (Ebbert's):** {RATING_SCALE_DIRECTION}")
+    head.append("- **Service area:** Covington, OH + Decatur, IN — Eastern Corn Belt regional")
+    head.append("")
+    head.append("---")
+    head.append("")
+    head.append("## Variety detail (verbatim from page)")
+    head.append("")
+    head.append(p.body_text)
+    head.append("")
+    return "\n".join(head)
+
+
+# --------------------------------------------------------------------- write
+
+
+def write_product(prod: EbProduct, body_md: str) -> None:
+    CORPUS_DIR.mkdir(parents=True, exist_ok=True)
+    md_path = CORPUS_DIR / f"{prod.source_key}.md"
+    json_path = CORPUS_DIR / f"{prod.source_key}.json"
+
+    md_path.write_text(body_md, encoding="utf-8")
+    sidecar = {
+        "source": "ebberts_seeds",
+        "source_key": prod.source_key,
+        "vendor": "Ebbert's Seeds",
+        "brand": "Ebbert's Seeds",
+        "product_name": prod.product_name,
+        "product_id": None,
+        "hybrid_prefix": prod.product_name,
+        "hybrid_suffix": prod.trait_label,
+        "crop": prod.crop,
+        "release_year": None,
+        "relative_maturity": prod.relative_maturity,
+        "maturity_group": prod.maturity_group,
+        "wheat_class": None,
+        "trait_stack": [prod.trait_label] if prod.trait_label else [],
+        "trait_descriptions": [],
+        "positioning_statement": None,
+        "strengths": [],
+        # No structured groups — the body markdown carries the table
+        # text verbatim. characteristics_groups stays empty so the
+        # chunker doesn't try to bucket non-existent items.
+        "characteristics_groups": [],
+        "page_text_chars": len(prod.body_text),
+        "_scale_direction": RATING_SCALE_DIRECTION,
+        "regional_recommendations": [
+            {"product_list_name": "Ebbert's service area (Eastern Corn Belt — OH/IN/IL)",
+             "agronomist": None, "agronomist_email": None, "variant_id": None},
+        ],
+        "image_url": None,
+        "source_urls": [prod.source_url],
+        "sitemap_last_modified": None,
+        "fetched_at": datetime.now(timezone.utc).isoformat(),
+        "scraper_version": SCRAPER_VERSION,
+    }
+    json_path.write_text(
+        json.dumps(sidecar, indent=2, ensure_ascii=False) + "\n",
+        encoding="utf-8",
+    )
+
+
+# --------------------------------------------------------------------- pipeline
+
+
+def run(*, limit: int | None, force: bool,
+        only_crop: str | None, only_product: str | None) -> int:
+    CORPUS_DIR.mkdir(parents=True, exist_ok=True)
+    http = RateLimitedSession()
+    products = discover_and_parse(http, only_crop=only_crop)
+
+    if only_product:
+        products = [
+            p for p in products
+            if p.source_key == only_product
+            or p.product_name.lower() == only_product.lower()
+        ]
+        if not products:
+            log.error("no variety matched --product=%s", only_product)
+            return 2
+
+    counts = {"written": 0, "skipped": 0}
+    processed = 0
+    for prod in products:
+        if limit is not None and processed >= limit:
+            break
+        processed += 1
+        md_path = CORPUS_DIR / f"{prod.source_key}.md"
+        if md_path.exists() and not force:
+            counts["skipped"] += 1
+            log.info("[%d/%s] %s skipped",
+                     processed, str(limit) if limit else len(products),
+                     prod.source_key)
+            continue
+        body = render_markdown(prod)
+        write_product(prod, body)
+        counts["written"] += 1
+        log.info(
+            "[%d/%s] %s written | crop=%s rm/mg=%s trait=%s chars=%d",
+            processed, str(limit) if limit else len(products),
+            prod.source_key, prod.crop,
+            prod.relative_maturity or prod.maturity_group or "-",
+            prod.trait_label or "-", len(prod.body_text),
+        )
+
+    log.info(
+        "done: processed=%d written=%d skipped=%d (of %d varieties)",
+        processed, counts["written"], counts["skipped"], len(products),
+    )
+    return 0
+
+
+# --------------------------------------------------------------------- CLI
+
+
+def _build_argparser() -> argparse.ArgumentParser:
+    p = argparse.ArgumentParser(
+        prog="scrape.sources.ebberts_seeds",
+        description="Scrape Ebbert's Seeds (regional Eastern Corn Belt breeder) — "
+                    "corn / soybeans / wheat.",
+    )
+    p.add_argument("--limit", type=int, default=None,
+                   help="Stop after processing N varieties (default: all).")
+    p.add_argument("--force", action="store_true",
+                   help="Rewrite a variety's .md/.json even if the markdown file already exists.")
+    p.add_argument("--crop", default=None, choices=list(CROP_PAGES),
+                   help="Limit to one crop (corn / soybeans / wheat).")
+    p.add_argument("--product", default=None,
+                   help="Process a single variety by source_key or product name.")
+    p.add_argument("--log-level", default=os.environ.get("LOG_LEVEL", "INFO"))
+    return p
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = _build_argparser().parse_args(argv)
+    logging.basicConfig(
+        level=args.log_level.upper(),
+        format="%(asctime)s %(levelname)s %(name)s %(message)s",
+        stream=sys.stderr,
+    )
+    return run(
+        limit=args.limit, force=args.force,
+        only_crop=args.crop, only_product=args.product,
+    )
+
+
+if __name__ == "__main__":
+    sys.exit(main())
@@ -0,0 +1,503 @@
+"""LG Seeds scraper — AgReliant Genetics brand.
+
+Source: ``www.lgseeds.com`` — WordPress site. Empty robots.txt
+(no Disallow). Catalog covers 4 crops: corn, soybeans, alfalfa,
+sorghum.
+
+Two-layer fetch:
+
+1. **Listing page** (one per crop): inline JavaScript variable
+   ``products = [{...}, ...]`` carries the full variety summary —
+   Variety code, Maturity, Traits[], Bullets[], CropType. No
+   per-variety HTTP needed for identity.
+
+2. **Detail page** (``/products/<crop>/<Variety>``): rich plant
+   characteristics + disease tolerance + management ratings,
+   rendered as ``<div class="characteristics-bar">`` blocks with
+   ``<span class="bar-N">`` where N ∈ 1-9 is the rating. Same
+   convention as Bayer/Golden Harvest (9 = best).
+
+LG Seeds is a regional brand (Eastern Corn Belt focus) under
+AgReliant Genetics, the same parent as AgriGold. Brand voice is
+distinct so we keep them in separate scrapers.
+
+Rating scale: ``1-9 (9 = best)`` — verified empirically on the
+bar-N markup; matches Bayer / Golden Harvest convention.
+
+Output:
+  corpus/lg_seeds/<source_key>.md
+  corpus/lg_seeds/<source_key>.json
+
+source_key: ``lg-<variety>``, slugified (runs of characters outside
+``[A-Za-z0-9-]`` become ``-``) and lowercased, e.g. ``lg-lg5701``,
+``lg-c3400`` (soybean; codes don't use the LG prefix), ``lg-7c300``
+(alfalfa), ``lg-silo-max-100`` (sorghum).
+
+CLI:
+  python -m scrape.sources.lg_seeds --crop corn --limit 5
+  python -m scrape.sources.lg_seeds --force
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import logging
+import os
+import random
+import re
+import sys
+import time
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any
+
+import requests
+from bs4 import BeautifulSoup
+
+SCRAPER_VERSION = "0.1.0"
+USER_AGENT = "seed-mcp-scraper/0.1 (+https://drawbar.example/contact)"
+BASE = "https://www.lgseeds.com"
+
+# Crops listed in nav. Each has a listing page at /products/<crop>
+# with an inline `var products = [...]` JSON blob.
+LISTING_PATHS = {
+    "corn":     "/products/corn",
+    "soybeans": "/products/soybeans",
+    "alfalfa":  "/products/alfalfa",
+    "sorghum":  "/products/sorghum",
+}
+
+RATING_SCALE_DIRECTION = "1-9 (9 = best)"
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+CORPUS_ROOT = Path(os.environ.get("CORPUS_ROOT") or REPO_ROOT / "corpus")
+CORPUS_DIR = CORPUS_ROOT / "lg_seeds"
+
+REQ_INTERVAL_SEC = 1.0
+
+log = logging.getLogger("scrape.lg_seeds")
+
+
+# --------------------------------------------------------------------- HTTP
+
+
+class RateLimitedSession:
+    def __init__(self, interval: float = REQ_INTERVAL_SEC) -> None:
+        self.s = requests.Session()
+        self.s.headers["User-Agent"] = USER_AGENT
+        self.interval = interval
+        self._last = 0.0
+
+    def _wait(self) -> None:
+        delta = time.monotonic() - self._last
+        if delta < self.interval:
+            time.sleep(self.interval - delta)
+        self._last = time.monotonic()
+
+    def request(self, method: str, url: str, *, max_retries: int = 4,
+                timeout: float = 30.0, **kw: Any) -> requests.Response:
+        last_exc: Exception | None = None
+        resp: requests.Response | None = None
+        for attempt in range(max_retries):
+            self._wait()
+            try:
+                resp = self.s.request(method, url, timeout=timeout, **kw)
+            except requests.RequestException as exc:
+                # Only the most recent outcome matters once retries run out:
+                # forget any response from an earlier attempt.
+                resp, last_exc = None, exc
+                backoff = min(30.0, (2 ** attempt) + random.random())
+                log.warning("network error on %s %s: %s — retry in %.1fs",
+                            method, url, exc, backoff)
+                time.sleep(backoff)
+                continue
+            last_exc = None  # a response arrived; earlier network errors are stale
+            if resp.status_code == 429 or 500 <= resp.status_code < 600:
+                ra = resp.headers.get("Retry-After")
+                backoff = float(ra) if (ra and ra.isdigit()) else min(30.0, (2 ** attempt) + random.random())
+                log.warning("HTTP %d on %s %s — retry in %.1fs",
+                            resp.status_code, method, url, backoff)
+                time.sleep(backoff)
+                continue
+            return resp
+        if last_exc:
+            raise last_exc
+        # Retries exhausted on 429/5xx: hand back the last response so the
+        # caller's raise_for_status() reports it. None only if max_retries <= 0.
+        return resp  # type: ignore[return-value]
+
+    def get(self, url: str, **kw: Any) -> requests.Response:
+        return self.request("GET", url, **kw)
+
+
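`RateLimitedSession.request` chooses each retry delay in one of two ways. If the response carries an all-digit `Retry-After` header, that value is used. Otherwise the delay is `2 ** attempt` seconds plus up to one second of jitter, capped at 30 s. The sketch below isolates that rule as a pure function. `retry_delay` and the injectable `rng` parameter are hypothetical additions so the arithmetic can be checked deterministically. A `Retry-After` given as an HTTP date is not all digits, so it falls back to exponential backoff, just as in the scraper.

```python
from __future__ import annotations

import random
from typing import Callable

MAX_BACKOFF_SEC = 30.0  # same cap as RateLimitedSession.request


def retry_delay(attempt: int, retry_after: str | None,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before the next attempt (0-based ``attempt``).

    An all-digit Retry-After value wins and, as in the scraper, is not
    capped; otherwise 2**attempt plus up to 1 s of jitter, capped at
    MAX_BACKOFF_SEC.
    """
    if retry_after and retry_after.isdigit():
        return float(retry_after)
    return min(MAX_BACKOFF_SEC, (2 ** attempt) + rng())


def no_jitter() -> float:
    """Deterministic stand-in for random.random()."""
    return 0.0


print([retry_delay(a, None, no_jitter) for a in range(4)])  # [1.0, 2.0, 4.0, 8.0]
print(retry_delay(0, "12"))                                 # 12.0
print(retry_delay(6, None, no_jitter))                      # 30.0 (64 s capped)
```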
+# --------------------------------------------------------------------- model
+
+
+@dataclass
+class LGProduct:
+    source_key: str
+    source_url: str
+    crop: str
+    product_name: str = ""
+    product_id: int | None = None
+    maturity_raw: str | None = None             # corn RM days / soy MG / alfalfa FD / sorghum days
+    fall_dormancy: str | None = None            # alfalfa only
+    trait_descriptions: list[str] = field(default_factory=list)
+    bullets: list[str] = field(default_factory=list)
+    characteristics_groups: list[dict] = field(default_factory=list)
+
+
+# --------------------------------------------------------------------- discovery
+
+
+_VAR_RE = re.compile(
+    r'var\s+\w+\s*=\s*(\[\{"Variety":.+?\}\]);', re.S,
+)
+
+
+def discover_varieties(
+    http: RateLimitedSession, *, only_crop: str | None = None,
+) -> list[tuple[str, dict]]:
+    """Return ``[(crop, summary_dict), ...]`` from each listing page's
+    inline JSON. Summary dict has Variety / Id / Maturity / Traits /
+    Bullets / CropType / FallDormancy."""
+    out: list[tuple[str, dict]] = []
+    for crop, path in LISTING_PATHS.items():
+        if only_crop and crop != only_crop:
+            continue
+        log.info("fetching listing %s%s", BASE, path)
+        r = http.get(f"{BASE}{path}")
+        r.raise_for_status()
+        m = _VAR_RE.search(r.text)
+        if not m:
+            log.warning("no products array in %s", path)
+            continue
+        try:
+            items = json.loads(m.group(1))
+        except json.JSONDecodeError as exc:
+            log.error("JSON parse failed for %s: %s", path, exc)
+            continue
+        log.info("  %s: %d varieties", crop, len(items))
+        for it in items:
+            out.append((crop, it))
+    log.info("total varieties discovered: %d", len(out))
+    return out
+
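`discover_varieties` uses a single regex to pull the `products` array out of each listing page's inline script. This stdlib sketch runs the same pattern against an invented HTML fragment. The field names come from the module docstring; the values are placeholders. Like the scraper, it assumes that the first `}];` after `[{"Variety":` closes the array. If a `}];` sequence appears earlier in the blob (for example inside a string value), the match is truncated and `json.loads` fails.

```python
import json
import re

# Same pattern as _VAR_RE in lg_seeds.py.
VAR_RE = re.compile(r'var\s+\w+\s*=\s*(\[\{"Variety":.+?\}\]);', re.S)

# Invented listing-page fragment with placeholder values.
SAMPLE_HTML = """
<script>
var products = [{"Variety":"LG0001","Id":1,"Maturity":105,
  "Traits":["Trait A"],"Bullets":["Bullet one"],"CropType":"Corn"},
 {"Variety":"LG0002","Id":2,"Maturity":110,
  "Traits":[],"Bullets":[],"CropType":"Corn"}];
var unrelated = {"a": 1};
</script>
"""


def extract_products(html: str) -> list[dict]:
    """Return the embedded products array, or [] when the blob is absent."""
    m = VAR_RE.search(html)
    return json.loads(m.group(1)) if m else []


items = extract_products(SAMPLE_HTML)
print([it["Variety"] for it in items])       # ['LG0001', 'LG0002']
print(extract_products("<p>no script</p>"))  # []
```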
+
+# --------------------------------------------------------------------- helpers
+
+
+def source_key_for(variety: str) -> str:
+    """Slugify the variety code into a stable source_key."""
+    slug = re.sub(r"[^a-zA-Z0-9-]+", "-", variety).strip("-").lower()
+    return f"lg-{slug}"
+
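The slug rule can be checked against the example keys in the module docstring. The raw spellings passed in here, such as `"Silo Max 100"`, are assumptions about how LG publishes those varieties. The last two cases show that punctuation collapses, so distinct raw names like `LG/5701` and `LG-5701` would end up sharing one key.

```python
import re


def source_key_for(variety: str) -> str:
    """Copy of lg_seeds.source_key_for: runs outside [A-Za-z0-9-] become '-'."""
    slug = re.sub(r"[^a-zA-Z0-9-]+", "-", variety).strip("-").lower()
    return f"lg-{slug}"


for v in ["LG5701", "C3400", "7C300", "Silo Max 100", "LG/5701", "LG-5701"]:
    print(v, "->", source_key_for(v))
# LG5701 -> lg-lg5701
# C3400 -> lg-c3400
# 7C300 -> lg-7c300
# Silo Max 100 -> lg-silo-max-100
# LG/5701 -> lg-lg-5701
# LG-5701 -> lg-lg-5701
```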
+
+_BAR_CLASS_RE = re.compile(r"^bar-(\d)$")
+
+
+def _parse_bar_value(span_classes: list[str]) -> int | None:
+    """Extract the integer rating from a ``bar-N`` CSS class."""
+    for c in span_classes or []:
+        m = _BAR_CLASS_RE.match(c)
+        if m:
+            return int(m.group(1))
+    return None
+
+
+# --------------------------------------------------------------------- detail
+
+
+def fetch_product_detail(
+    http: RateLimitedSession, summary: dict, crop: str,
+) -> LGProduct:
+    """Fetch the detail page and merge characteristics into an
+    LGProduct seeded by the listing-page summary."""
+    variety = summary.get("Variety") or ""
+    # LG's detail URL is /products/<crop>/<Variety>. The listing JSON
+    # publishes Variety in its canonical case; LG appears to accept any
+    # case, but we request the published form.
+    url = f"{BASE}/products/{crop}/{variety}"
+    prod = LGProduct(
+        source_key=source_key_for(variety),
+        source_url=url,
+        crop=crop,
+        product_name=variety,
+        product_id=summary.get("Id"),
+        maturity_raw=str(summary.get("Maturity")) if summary.get("Maturity") is not None else None,
+        fall_dormancy=str(summary.get("FallDormancy")) if summary.get("FallDormancy") else None,
+        trait_descriptions=list(summary.get("Traits") or []),
+        bullets=list(summary.get("Bullets") or []),
+    )
+
+    try:
+        r = http.get(url)
+        r.raise_for_status()
+    except Exception as exc:  # noqa: BLE001
+        log.warning("detail fetch failed for %s: %s", variety, exc)
+        return prod  # identity-only fallback
+
+    soup = BeautifulSoup(r.text, "html.parser")
+
+    # The detail page has multiple .product-section blocks; each has
+    # a heading plus a collection of .characteristics-bar rows. Rows are
+    # bucketed under the section's heading text. Common LG section labels:
+    # "Characteristics" / "Management" / "Disease Tolerance".
+    sections: list[tuple[str, list[dict]]] = []
+    for section in soup.find_all("div", class_=re.compile(r"product-section")):
+        # Heading is the first text node inside the section, before bars.
+        # The section class often includes a hint like "disease-toler",
+        # "plantCharacteristics", "management-pr".
+        section_classes = " ".join(section.get("class", []))
+        bars = section.find_all("div", class_="characteristics-bar")
+        if not bars:
+            continue
+
+        # Section label: the first non-empty h2/h3/h4; failing that, a
+        # hint taken from the section's CSS classes.
+        label = ""
+        for h in section.find_all(["h2", "h3", "h4"]):
+            t = h.get_text(strip=True)
+            if t:
+                label = t
+                break
+        if not label:
+            # fallback: section_classes hint
+            if "disease" in section_classes.lower():
+                label = "Disease Tolerance"
+            elif "management" in section_classes.lower():
+                label = "Management"
+            elif "plantcharacteristics" in section_classes.lower():
+                label = "Characteristics"
+
+        items: list[dict] = []
+        for bar in bars:
+            name_el = bar.find(class_="product-name")
+            value_span = bar.find("span", class_=_BAR_CLASS_RE)
+            name = (name_el.get_text(" ", strip=True) if name_el else "").strip()
+            rating = _parse_bar_value(value_span.get("class") if value_span else [])
+            if not name:
+                continue
+            # Some "bars" are actually qualitative (e.g. "Tar Spot Susceptible",
+            # "Fungicide Response High"). For those we store the bar's text
+            # (minus the name) as the value instead of an empty rating.
+            if rating is None:
+                # Look inside the bar element for a non-name text snippet
+                inner_text = bar.get_text(" ", strip=True)
+                # Strip the label off the front
+                if inner_text.startswith(name):
+                    inner_text = inner_text[len(name):].strip()
+                items.append({"characteristic": name, "value": inner_text or "-"})
+            else:
+                items.append({"characteristic": name, "value": str(rating)})
+
+        if items:
+            sections.append((label or "Characteristics", items))
+
+    prod.characteristics_groups = [
+        {"label": label.upper(), "type": "bars", "items": items}
+        for label, items in sections
+    ]
+
+    return prod
+
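The row logic in `fetch_product_detail` comes down to three rules. A numeric rating comes from a `bar-N` class. A qualitative bar keeps whatever text follows its name. A section with no heading gets its label from its CSS classes. The sketch below restates those rules without BeautifulSoup, taking class lists and text as plain arguments. The function names and sample rows are hypothetical. Because the pattern is `^bar-(\d)$`, a two-digit class such as `bar-10` yields no rating.

```python
from __future__ import annotations

import re

# Same pattern as _BAR_CLASS_RE in lg_seeds.py: exactly one digit.
BAR_CLASS_RE = re.compile(r"^bar-(\d)$")


def parse_bar_value(classes: list[str]) -> int | None:
    """Rating N from the first CSS class that is exactly 'bar-N'."""
    for c in classes or []:
        m = BAR_CLASS_RE.match(c)
        if m:
            return int(m.group(1))
    return None


def parse_row(name: str, span_classes: list[str], inner_text: str) -> dict:
    """One characteristics-bar row as {"characteristic", "value"}.

    Numeric bars keep the digit as a string; qualitative bars keep the
    text after the name, or "-" when nothing follows it.
    """
    rating = parse_bar_value(span_classes)
    if rating is not None:
        return {"characteristic": name, "value": str(rating)}
    rest = inner_text
    if rest.startswith(name):
        rest = rest[len(name):].strip()
    return {"characteristic": name, "value": rest or "-"}


def fallback_label(section_classes: str) -> str:
    """Label for a section with no h2/h3/h4, from its class names."""
    lowered = section_classes.lower()
    if "disease" in lowered:
        return "Disease Tolerance"
    if "management" in lowered:
        return "Management"
    return "Characteristics"  # also the scraper's final default


print(parse_row("Drought Tolerance", ["bar", "bar-7"], "Drought Tolerance"))
print(parse_row("Tar Spot", ["bar"], "Tar Spot Susceptible"))
print(parse_bar_value(["bar-10"]))                      # None: two digits do not match
print(fallback_label("product-section disease-toler"))  # Disease Tolerance
```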
+
+# --------------------------------------------------------------------- render
+
+
+def render_markdown(p: LGProduct) -> str:
+    title = p.product_name or p.source_key
+    crop_label = {
+        "corn": "Corn", "soybeans": "Soybeans",
+        "alfalfa": "Alfalfa", "sorghum": "Sorghum",
+    }.get(p.crop, p.crop.title())
+
+    head: list[str] = [
+        f"# {title}",
+        "",
+        "- **Vendor:** AgReliant Genetics",
+        "- **Brand:** LG Seeds",
+        f"- **Crop:** {crop_label}",
+    ]
+    if p.maturity_raw:
+        if p.crop == "corn":
+            head.append(f"- **Relative maturity:** {p.maturity_raw}")
+        elif p.crop == "soybeans":
+            head.append(f"- **Maturity group:** {p.maturity_raw}")
+        elif p.crop == "alfalfa":
+            head.append(f"- **Fall dormancy / maturity:** {p.maturity_raw}")
+        elif p.crop == "sorghum":
+            head.append(f"- **Days to maturity:** {p.maturity_raw}")
+    if p.trait_descriptions:
+        head.append(f"- **Traits:** {', '.join(p.trait_descriptions)}")
+    head.append(f"- **Source:** {p.source_url}")
+    head.append(f"- **Rating scale (LG Seeds):** {RATING_SCALE_DIRECTION}")
+    head.append("")
+    head.append("---")
+    head.append("")
+
+    sections: list[str] = []
+    if p.bullets:
+        bullets = "\n".join(f"- {b}" for b in p.bullets)
+        sections.append("## Strengths\n\n" + bullets + "\n")
+
+    for g in p.characteristics_groups:
+        label = (g.get("label") or "Characteristics").title()
+        items = g.get("items") or []
+        if not items:
+            continue
+        rows = "\n".join(f"| {it['characteristic']} | {it['value']} |" for it in items)
+        sections.append(
+            f"## {label}\n\n"
+            "| Characteristic | Value |\n"
+            "|---|---|\n"
+            f"{rows}\n"
+        )
+    return "\n".join(head) + "\n" + "\n".join(sections)
+
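Each characteristics group renders as a two-column markdown table. The sketch below reproduces that layout for a single group. It also adds a hypothetical `_cell` helper, which is not in the scraper, to escape `|`. Without it, a qualitative value containing a pipe would split its row into extra columns. The sample group is invented.

```python
def render_group(group: dict) -> str:
    """Render one characteristics group as a two-column markdown table.

    Layout mirrors render_markdown. _cell() is NOT part of the scraper:
    it is a hypothetical hardening that escapes '|' inside cell text.
    """
    def _cell(text: str) -> str:
        return str(text).replace("|", r"\|")

    label = (group.get("label") or "Characteristics").title()
    items = group.get("items") or []
    if not items:
        return ""
    rows = "\n".join(
        f"| {_cell(it['characteristic'])} | {_cell(it['value'])} |" for it in items
    )
    return (f"## {label}\n\n"
            "| Characteristic | Value |\n"
            "|---|---|\n"
            f"{rows}\n")


# Invented sample group.
group = {"label": "DISEASE TOLERANCE", "type": "bars", "items": [
    {"characteristic": "Gray Leaf Spot", "value": "6"},
    {"characteristic": "Tar Spot", "value": "Susceptible | see notes"},
]}
out = render_group(group)
print(out)
```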
+
+# --------------------------------------------------------------------- write
+
+
+def write_product(prod: LGProduct, body_md: str) -> None:
+    CORPUS_DIR.mkdir(parents=True, exist_ok=True)
+    md_path = CORPUS_DIR / f"{prod.source_key}.md"
+    json_path = CORPUS_DIR / f"{prod.source_key}.json"
+
+    md_path.write_text(body_md, encoding="utf-8")
+    sidecar = {
+        "source": "lg_seeds",
+        "source_key": prod.source_key,
+        "vendor": "AgReliant Genetics",
+        "brand": "LG Seeds",
+        "product_name": prod.product_name,
+        "product_id": prod.product_id,
+        "hybrid_prefix": prod.product_name,
+        "hybrid_suffix": None,
+        "crop": prod.crop,
+        "release_year": None,
+        # Maturity routing: corn = RM days, soy = MG, alfalfa = FD,
+        # sorghum = days-to-maturity. Stored in the canonical fields
+        # so the chunker's crop-aware preamble works.
+        "relative_maturity": prod.maturity_raw if prod.crop in ("corn", "sorghum") else None,
+        "maturity_group": prod.maturity_raw if prod.crop == "soybeans" else None,
+        "fall_dormancy": prod.maturity_raw if prod.crop == "alfalfa" else prod.fall_dormancy,
+        "wheat_class": None,
+        "trait_stack": prod.trait_descriptions,  # LG publishes full names, not codes
+        "trait_descriptions": prod.trait_descriptions,
+        "positioning_statement": None,
+        "strengths": prod.bullets,
+        "characteristics_groups": prod.characteristics_groups,
+        "_scale_direction": RATING_SCALE_DIRECTION,
+        "regional_recommendations": [],
+        "image_url": None,
+        "source_urls": [prod.source_url],
+        "sitemap_last_modified": None,
+        "fetched_at": datetime.now(timezone.utc).isoformat(),
+        "scraper_version": SCRAPER_VERSION,
+    }
+    json_path.write_text(
+        json.dumps(sidecar, indent=2, ensure_ascii=False) + "\n",
+        encoding="utf-8",
+    )
+
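`write_product` routes LG's single `Maturity` field into crop-specific sidecar keys. The sketch below restates that routing as a pure function. `route_maturity` is a hypothetical name, and the sample values are invented. Two consequences are worth noting. For alfalfa, `Maturity` takes precedence over the listing's separate `FallDormancy` value. Sorghum days-to-maturity lands in `relative_maturity`, so any consumer comparing that field across records should check `crop` first.

```python
from __future__ import annotations


def route_maturity(crop: str, maturity_raw: str | None,
                   fall_dormancy: str | None = None) -> dict[str, str | None]:
    """Map LG's Maturity field onto the sidecar's canonical keys.

    Same routing as write_product: corn and sorghum -> relative_maturity,
    soybeans -> maturity_group, alfalfa -> fall_dormancy (Maturity wins
    over FallDormancy); other crops keep the listing's FallDormancy.
    """
    return {
        "relative_maturity": maturity_raw if crop in ("corn", "sorghum") else None,
        "maturity_group": maturity_raw if crop == "soybeans" else None,
        "fall_dormancy": maturity_raw if crop == "alfalfa" else fall_dormancy,
    }


# Invented sample values.
print(route_maturity("corn", "107"))
print(route_maturity("soybeans", "3.4"))
print(route_maturity("alfalfa", "4", fall_dormancy="5"))
```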
+
+# --------------------------------------------------------------------- pipeline
+
+
+def process_product(
+    http: RateLimitedSession, summary: dict, crop: str, *, force: bool,
+) -> tuple[str, LGProduct | None]:
+    variety = summary.get("Variety") or ""
+    source_key = source_key_for(variety)
+    md_path = CORPUS_DIR / f"{source_key}.md"
+    if md_path.exists() and not force:
+        return "skipped", None
+    try:
+        prod = fetch_product_detail(http, summary, crop)
+    except Exception as exc:  # noqa: BLE001
+        log.error("variety %s failed: %s", variety, exc)
+        return "failed", None
+    body = render_markdown(prod)
+    write_product(prod, body)
+    return "written", prod
+
+
+def run(
+    *, limit: int | None, force: bool,
+    only_crop: str | None, only_product: str | None,
+) -> int:
+    CORPUS_DIR.mkdir(parents=True, exist_ok=True)
+    http = RateLimitedSession()
+    targets = discover_varieties(http, only_crop=only_crop)
+    if only_product:
+        targets = [
+            (c, s) for (c, s) in targets
+            if source_key_for(s.get("Variety") or "") == only_product
+            or (s.get("Variety") or "").lower() == only_product.lower()
+        ]
+        if not targets:
+            log.error("no variety matched --product=%s", only_product)
+            return 2
+
+    counts = {"written": 0, "skipped": 0, "failed": 0}
+    processed = 0
+    for crop, summary in targets:
+        if limit is not None and processed >= limit:
+            break
+        processed += 1
+        status, prod = process_product(http, summary, crop, force=force)
+        counts[status] = counts.get(status, 0) + 1
+        if prod is not None:
+            log.info(
+                "[%d/%s] %s %s | crop=%s maturity=%s traits=%d groups=%d",
+                processed, str(limit) if limit else "all",
+                prod.source_key, status, prod.crop,
+                prod.maturity_raw or "-",
+                len(prod.trait_descriptions),
+                len(prod.characteristics_groups),
+            )
+        else:
+            log.info("[%d/%s] %s %s",
+                     processed, str(limit) if limit else "all",
+                     source_key_for(summary.get("Variety") or ""), status)
+
+    log.info(
+        "done: processed=%d written=%d skipped=%d failed=%d (of %d candidates)",
+        processed, counts["written"], counts["skipped"],
+        counts["failed"], len(targets),
+    )
+    return 0 if counts["failed"] == 0 else 1
+
+
+# --------------------------------------------------------------------- CLI
+
+
+def _build_argparser() -> argparse.ArgumentParser:
+    p = argparse.ArgumentParser(
+        prog="scrape.sources.lg_seeds",
+        description="Scrape LG Seeds (AgReliant Genetics) — corn / "
+                    "soybeans / alfalfa / sorghum.",
+    )
+    p.add_argument("--limit", type=int, default=None,
+                   help="Stop after processing N varieties (default: all).")
+    p.add_argument("--force", action="store_true",
+                   help="Re-fetch even if the markdown file already exists.")
+    p.add_argument("--crop", default=None, choices=list(LISTING_PATHS),
+                   help="Limit to one crop.")
+    p.add_argument("--product", default=None,
+                   help="Process a single variety by source_key or Variety code.")
+    p.add_argument("--log-level", default=os.environ.get("LOG_LEVEL", "INFO"))
+    return p
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = _build_argparser().parse_args(argv)
+    logging.basicConfig(
+        level=args.log_level.upper(),
+        format="%(asctime)s %(levelname)s %(name)s %(message)s",
+        stream=sys.stderr,
+    )
+    return run(
+        limit=args.limit, force=args.force,
+        only_crop=args.crop, only_product=args.product,
+    )
+
+
+if __name__ == "__main__":
+    sys.exit(main())