Three new brand scrapers: LG Seeds + AgriGold + Ebbert's Seeds (+310 varieties)

User flagged that LG, AgriGold, and Ebbert's (local Ohio breeder) are
all active in farmer territory. Built three scrapers; the corpus now
covers 5,839 chunks across 11 brands.

Net new varieties: 310
  lg_seeds        170 — corn 78 + soy 63 + alfalfa 16 + sorghum 13
                  → adds FIRST alfalfa coverage (FD 3-5 range)
  agrigold        111 — corn 60 + soy 51
  ebberts_seeds    29 — corn 17 + soy 12 (regional OH/IN breeder)

scrape/sources/lg_seeds.py — embedded-JSON pattern (cleanest):
- /products/<crop> pages have a `var products = [...]` blob with the
  variety summary (Variety, Maturity, Traits[], Bullets[], CropType).
- Per-variety detail page (/products/<crop>/<Variety>) carries the
  ratings as `<span class="bar-N">` where N is 1-9 on the canonical
  scale. Same 9=best direction as Bayer / Golden Harvest.
- Three sections per page: Characteristics / Management / Disease
  Tolerance, plus a few qualitative bars ("Tar Spot Susceptible",
  "Fungicide Response High") preserved as text values.
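
A minimal sketch of the embedded-JSON extraction described above, assuming
the blob is strict JSON (double-quoted keys) assigned with `var products =`.
`extract_products` and `SAMPLE_PAGE` are illustrative names, not the helpers
in lg_seeds.py; the sample record borrows the LG5701 values from the smoke
test in this message, and the field types are guesses.

```python
from __future__ import annotations

import json
import re

# Locates the start of the inline assignment. The array itself is parsed
# by the JSON decoder, so nested arrays (Traits, Bullets) are handled.
_ASSIGN_RE = re.compile(r"var\s+products\s*=\s*")


def extract_products(html: str) -> list[dict]:
    """Return the list assigned to `var products`, or [] if absent."""
    m = _ASSIGN_RE.search(html)
    if m is None:
        return []
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the trailing `;` and the rest of the page).
    products, _end = json.JSONDecoder().raw_decode(html, m.end())
    return products


# Hypothetical page fragment; real pages may differ.
SAMPLE_PAGE = """
<script>
var products = [
  {"Variety": "LG5701", "Maturity": "116", "Traits": ["SmartStax"],
   "Bullets": [], "CropType": "corn"}
];
</script>
"""

for p in extract_products(SAMPLE_PAGE):
    print(p["Variety"], p["Maturity"], ", ".join(p["Traits"]))
```

Letting the decoder find the end of the array avoids writing a regex that has
to cope with `]` characters nested inside the records.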

scrape/sources/agrigold.py — 5-circle scale:
- Listing page has 60+ /corn/explore-corn-hybrids/<CODE> URLs.
- Detail page renders ratings as <div class="scale"> blocks with 5
  child <div class="circle"> elements, of which N have class
  "circle selected" → rating N on a 1-5 scale.
- 7 sections per page incl. Silage Characteristics (Dairy Silage
  Rating, NDFd 30 Hr, Crude Protein), Planting Applications, Soil
  Adaptability, Plant Characteristics, Product Features.
- Distinct rating scale (1-5, 5 = best, vs Bayer's 1-9), declared in
  _scale_direction so the chunker preamble renders correctly.
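
The circle-counting rule above can be sketched as follows. The scraper in
this commit uses BeautifulSoup; this version uses the standard library's
html.parser only to stay dependency-free. `CircleCounter`,
`rating_from_scale`, and the sample markup are hypothetical and only mirror
the class names quoted in this message.

```python
from __future__ import annotations

from html.parser import HTMLParser


class CircleCounter(HTMLParser):
    """Tallies <div> elements carrying the "circle" class."""

    def __init__(self) -> None:
        super().__init__()
        self.total = 0
        self.selected = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "circle" in classes:
            self.total += 1
            if "selected" in classes:
                self.selected += 1


def rating_from_scale(html: str) -> int | None:
    """Return the number of selected circles, or None if no circles exist."""
    counter = CircleCounter()
    counter.feed(html)
    counter.close()
    return counter.selected if counter.total else None


# Assumed markup: four selected circles followed by one unselected circle.
SAMPLE_SCALE = (
    '<div class="scale">'
    + '<div class="circle selected"></div>' * 4
    + '<div class="circle"></div>'
    + "</div>"
)

print(rating_from_scale(SAMPLE_SCALE))
```

Returning None when no circles are present keeps text-valued rows (such as
Ear Flex Type) distinguishable from a scale with zero circles selected.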

scrape/sources/ebberts_seeds.py — small regional breeder, verbatim
text approach:
- Single page per crop (corn / soybeans / wheat). Each variety is an
  <h1> + multi-section CSS-grid block where labels and values are in
  separate adjacent cells. Reconstructing perfectly aligned columns
  for a 29-variety total isn't worth the engineering, so the chunk
  body carries the verbatim text in document order and the LLM reads
  the tabular content as-is.
- Scale: 1-5 (1 = best, lower = more resistant), inferred from
  marketing-vs-rating cross-checks ("Robust tall plants" + STANDABILITY
  1.0 → 1 = best).
- Politeness: robots.txt asks for Crawl-delay: 5; honored.
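
Each variety block is located by its <h1>. A sketch of how such a heading
splits into name and maturity, using the same pattern as
`_VARIETY_HEADING_RE` in ebberts_seeds.py. `split_heading` is an illustrative
helper, and the heading strings (including the maturity numbers) are made up
for the example.

```python
from __future__ import annotations

import re

# Same pattern as _VARIETY_HEADING_RE: a lazily matched name, then a
# number, then the literal "RM" at the end of the heading.
HEADING_RE = re.compile(
    r"^(?P<name>\S+(?:\s+\S+)*?)\s+(?P<rm>\d+(?:\.\d+)?)\s*RM$",
    re.IGNORECASE,
)


def split_heading(title: str) -> tuple[str, str] | None:
    """Return (name, maturity) for a variety heading, else None."""
    m = HEADING_RE.match(title.strip())
    if m is None:
        return None
    return m.group("name"), m.group("rm")


# Assumed heading texts; the last one stands in for a non-variety <h1>.
for title in ("7000TR RIB 100 RM", "1335 CONVENTIONAL 113 RM", "Contact Us"):
    print(repr(title), "->", split_heading(title))
```

Headings that do not end in a number followed by "RM" return None, which is
how non-variety <h1> elements on the page get skipped.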

All three new scrapers smoke-tested:
- LG corn LG5701 RM 116 SmartStax → 3 characteristic groups with
  Disease Tolerance ratings (Northern/Southern Leaf Blight 8-9, etc.)
- AgriGold A616-30 RM 86 VT2RIB → 7 groups incl. silage and soil
  adaptability ratings
- Ebbert's 7000TR RIB RM 100 → 1098-char verbatim body covering
  CHARACTERISTICS, DISEASE RATINGS, herbicide tolerance, etc.

Corpus state after this PR:
- 5,839 chunks (was 5,529)
- 11 brands (was 8)
- 8 crops (corn 3047, soy 2209, silage 359, wheat 123, sorghum 49,
  cotton 30, alfalfa 16, canola 6) — alfalfa is brand-new
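
A quick consistency check on the figures in this message. The last assertion
holds only if each new variety contributes exactly one chunk, which is an
inference from the numbers rather than something stated here.

```python
# Counts copied from this commit message.
new_varieties = {"lg_seeds": 170, "agrigold": 111, "ebberts_seeds": 29}
lg_by_crop = {"corn": 78, "soy": 63, "alfalfa": 16, "sorghum": 13}
chunks_by_crop = {
    "corn": 3047, "soy": 2209, "silage": 359, "wheat": 123,
    "sorghum": 49, "cotton": 30, "alfalfa": 16, "canola": 6,
}

total_new = sum(new_varieties.values())
total_chunks = sum(chunks_by_crop.values())

assert sum(lg_by_crop.values()) == new_varieties["lg_seeds"]
assert total_new == 310
assert total_chunks == 5839
# Assumes one chunk per new variety (see note above).
assert 5529 + total_new == total_chunks
print("counts are consistent")
```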

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 12:42:23 -04:00
parent 06461ade1d
commit 30b182e28a
623 changed files with 75417 additions and 0 deletions
+502
@@ -0,0 +1,502 @@
"""AgriGold scraper — AgReliant Genetics brand.
Source: ``www.agrigold.com`` — WordPress site, empty robots.txt
(no Disallow). Catalog covers corn + soybeans. Sibling of LG Seeds
under the same parent (AgReliant) but distinct branding /
positioning, so kept in its own scraper.
Discovery: the listing page ``/corn/explore-corn-hybrids`` (and
the soybean equivalent) is server-rendered HTML that contains
``<a href="/corn/explore-corn-hybrids/<CODE>">`` for every variety.
Codes look like ``A616-30``, ``A623-88``, etc. Parse the listing
HTML, collect distinct variety URLs.
Per-variety detail (``/corn/explore-corn-hybrids/<CODE>``) renders
several ``<div class="product-section ...">`` blocks. Each section
has a ``<div class="title">`` heading + multiple ``.detail-item``
rows shaped as ``<div class="label">N</div><div class="value">V</div>``.
The ``<div class="value">`` content is one of:
- **5-circle rating scale** (Agronomic Rating, Disease Tolerance,
Silage Characteristics): ``<div class="scale">`` containing 5
children, where N have class ``circle selected`` and 5-N have
class ``circle``. Count = rating on a **1-5 scale** (5 = best).
Distinct from Bayer / LG Seeds' 1-9 convention — documented in
the sidecar's ``_scale_direction``.
- **Numeric value** (GDUs, year, plant population): bare number.
- **Categorical / qualitative** (Ear Flex Type "KERNEL",
Leaf Orientation "SEMI UPRIGHT", Cob Color "Red"): the literal
text.
- **NA**: rated but not yet measured.
Rating scale: ``1-5 (5 = best)`` — distinct from the other brands;
the chunker reads ``_scale_direction`` to render the correct
preamble.
Output:
corpus/agrigold/<source_key>.md
corpus/agrigold/<source_key>.json
source_key: ``agrigold-<code>`` lowercased, e.g.
``agrigold-a616-30``.
CLI:
python -m scrape.sources.agrigold --crop corn --limit 5
python -m scrape.sources.agrigold --force
"""
from __future__ import annotations
import argparse
import json
import logging
import os
import random
import re
import sys
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
import requests
from bs4 import BeautifulSoup
SCRAPER_VERSION = "0.1.0"
USER_AGENT = "seed-mcp-scraper/0.1 (+https://drawbar.example/contact)"
BASE = "https://www.agrigold.com"
LISTING_PATHS = {
"corn": "/corn/explore-corn-hybrids",
"soybeans": "/soybeans/explore-soybean-varieties",
}
# AgriGold publishes ratings on a 1-5 scale (5 = best), counted from
# the selected circles in the per-rating scale block. The chunker
# preserves this verbatim — every chunk preamble declares the scale
# so the LLM doesn't conflate with Bayer's 1-9.
RATING_SCALE_DIRECTION = "1-5 (5 = best)"
REPO_ROOT = Path(__file__).resolve().parents[2]
CORPUS_ROOT = Path(os.environ.get("CORPUS_ROOT") or REPO_ROOT / "corpus")
CORPUS_DIR = CORPUS_ROOT / "agrigold"
REQ_INTERVAL_SEC = 1.0
log = logging.getLogger("scrape.agrigold")
# --------------------------------------------------------------------- HTTP
class RateLimitedSession:
def __init__(self, interval: float = REQ_INTERVAL_SEC) -> None:
self.s = requests.Session()
self.s.headers["User-Agent"] = USER_AGENT
self.interval = interval
self._last = 0.0
def _wait(self) -> None:
delta = time.monotonic() - self._last
if delta < self.interval:
time.sleep(self.interval - delta)
self._last = time.monotonic()
def request(self, method: str, url: str, *, max_retries: int = 4,
timeout: float = 30.0, **kw: Any) -> requests.Response:
last_exc: Exception | None = None
for attempt in range(max_retries):
self._wait()
try:
resp = self.s.request(method, url, timeout=timeout, **kw)
except requests.RequestException as exc:
last_exc = exc
backoff = min(30.0, (2 ** attempt) + random.random())
log.warning("network error on %s %s: %s — retry in %.1fs",
method, url, exc, backoff)
time.sleep(backoff)
continue
if resp.status_code == 429 or 500 <= resp.status_code < 600:
ra = resp.headers.get("Retry-After")
backoff = float(ra) if (ra and ra.isdigit()) else min(30.0, (2 ** attempt) + random.random())
log.warning("HTTP %d on %s %s — retry in %.1fs",
resp.status_code, method, url, backoff)
time.sleep(backoff)
continue
return resp
if last_exc:
raise last_exc
return resp # type: ignore[return-value]
def get(self, url: str, **kw: Any) -> requests.Response:
return self.request("GET", url, **kw)
# --------------------------------------------------------------------- model
@dataclass
class AGProduct:
source_key: str
source_url: str
crop: str
product_name: str = ""
relative_maturity: str | None = None # corn RM days from .maturity
maturity_group: str | None = None # soy MG
trait_descriptions: list[str] = field(default_factory=list)
characteristics_groups: list[dict] = field(default_factory=list)
# --------------------------------------------------------------------- discovery
def discover_varieties(
    http: RateLimitedSession, *, only_crop: str | None = None,
) -> list[tuple[str, str, str]]:
    """Return ``[(url, crop, variety_code), ...]`` for every variety in
    the listing pages."""
    out: list[tuple[str, str, str]] = []
    for crop, path in LISTING_PATHS.items():
        if only_crop and crop != only_crop:
            continue
        log.info("fetching listing %s%s", BASE, path)
        r = http.get(f"{BASE}{path}")
        r.raise_for_status()
        # Collect distinct hrefs that look like /<crop>/explore-X-{hybrids,
        # varieties}/<CODE>. Codes are alphanumeric with dashes.
        href_re = re.compile(rf"^{re.escape(path)}/([\w\-]+)$")
        seen: set[str] = set()
        soup = BeautifulSoup(r.text, "html.parser")
        for a in soup.find_all("a", href=True):
            m = href_re.match(a["href"])
            if not m:
                continue
            code = m.group(1)
            # Sanity-check the code shape: 3-31 letters, digits,
            # underscores, or dashes. Purely alphabetic tails ("filter",
            # "browse", etc.) still pass this check.
            if not re.match(r"^[A-Z0-9][\w\-]{2,30}$", code, re.I):
                continue
            if code in seen:
                continue
            seen.add(code)
            out.append((f"{BASE}{path}/{code}", crop, code))
        log.info(" %s: %d varieties", crop, len(seen))
    log.info("total varieties discovered: %d", len(out))
    return out
# --------------------------------------------------------------------- helpers
def source_key_for(code: str) -> str:
slug = re.sub(r"[^a-zA-Z0-9-]+", "-", code).strip("-").lower()
return f"agrigold-{slug}"
# Section class hint -> normalized label for the sidecar.
SECTION_LABEL_MAP = {
"agronomic-rating": "AGRONOMIC RATING",
"disease-tolerance": "DISEASE TOLERANCE",
"plant-characteristics": "PLANT CHARACTERISTICS",
"plant-features": "PRODUCT FEATURES",
"silage-characteristics": "SILAGE CHARACTERISTICS",
"planting-applications": "PLANTING APPLICATIONS",
"planting-population": "PLANTING POPULATION",
}
def _parse_scale(value_el) -> int | None:
    """Count selected circles in a ``<div class="scale">`` block.
    Returns 0-5 (0 when no circle is selected), or None if no scale
    is present."""
    if value_el is None:
        return None
    scale = value_el.find("div", class_="scale")
    if scale is None:
        return None
    selected = scale.find_all("div", class_=lambda c: c and "selected" in c)
    return len(selected)


def _parse_value(value_el) -> str:
    """Extract a non-scale value: raw text contents, trimmed."""
    if value_el is None:
        return ""
    # Scale blocks are handled by _parse_scale before this is called;
    # anything else is returned as leaf text.
    return value_el.get_text(" ", strip=True)
# --------------------------------------------------------------------- detail
def fetch_product_detail(
http: RateLimitedSession, url: str, crop: str, code: str,
) -> AGProduct:
r = http.get(url)
r.raise_for_status()
soup = BeautifulSoup(r.text, "html.parser")
prod = AGProduct(
source_key=source_key_for(code),
source_url=url,
crop=crop,
product_name=code,
)
# Maturity — often rendered as ``<div class="maturity">86 days</div>``.
mat_el = soup.find(class_="maturity")
if mat_el:
text = mat_el.get_text(strip=True)
m = re.search(r"(\d+(?:\.\d+)?)", text)
if m:
if crop == "corn":
prod.relative_maturity = m.group(1)
elif crop == "soybeans":
prod.maturity_group = m.group(1)
# Trait package — from .product-details / "Trait Package"
pd = soup.find(class_="product-details")
if pd:
# The details block renders pairs of label / value text:
# "Genetic Family | Icon-J | Trait Package | VT2RIB | ..."
# Parse the labels we recognize.
text = pd.get_text(" | ", strip=True)
m = re.search(r"Trait Package\s*\|\s*([^|]+?)(?:\s*\||$)", text)
if m:
tp = m.group(1).strip()
if tp and tp.lower() not in ("none", "-"):
prod.trait_descriptions = [tp]
# Iterate all product-section blocks; bucket items per section.
for section in soup.find_all("div", class_=re.compile(r"product-section")):
section_classes = section.get("class", [])
label = ""
for cls in section_classes:
if cls in SECTION_LABEL_MAP:
label = SECTION_LABEL_MAP[cls]
break
if not label:
title_el = section.find(class_="title")
label = (title_el.get_text(strip=True).upper()
if title_el else "OTHER")
items: list[dict] = []
for detail in section.find_all("div", class_="detail-item"):
label_el = detail.find("div", class_="label")
value_el = detail.find("div", class_="value")
ch = (label_el.get_text(" ", strip=True) if label_el else "").strip()
if not ch:
continue
scale = _parse_scale(value_el)
if scale is not None:
items.append({"characteristic": ch, "value": str(scale)})
else:
v = _parse_value(value_el)
# Special-case the "Row Type" header row from planting-population
# which holds nested headers, not a real rating.
if ch.lower() == "row type" and v.lower() in (
"low medium high", "low / medium / high",
):
continue
if v:
items.append({"characteristic": ch, "value": v})
if items:
prod.characteristics_groups.append({
"label": label, "type": "scale-or-value", "items": items,
})
return prod
# --------------------------------------------------------------------- render
def render_markdown(p: AGProduct) -> str:
    title = p.product_name or p.source_key
    crop_label = "Corn" if p.crop == "corn" else "Soybeans"
    head: list[str] = [
        f"# {title}",
        "",
        "- **Vendor:** AgReliant Genetics",
        "- **Brand:** AgriGold",
        f"- **Crop:** {crop_label}",
    ]
    if p.relative_maturity and p.crop == "corn":
        head.append(f"- **Relative maturity:** {p.relative_maturity}")
    if p.maturity_group and p.crop == "soybeans":
        head.append(f"- **Maturity group:** {p.maturity_group}")
    if p.trait_descriptions:
        head.append(f"- **Traits:** {', '.join(p.trait_descriptions)}")
    head.append(f"- **Source:** {p.source_url}")
    head.append(f"- **Rating scale (AgriGold):** {RATING_SCALE_DIRECTION}")
    head.append("")
    head.append("---")
    head.append("")
    sections: list[str] = []
    for g in p.characteristics_groups:
        label = (g.get("label") or "Characteristics").title()
        items = g.get("items") or []
        if not items:
            continue
        rows = "\n".join(f"| {it['characteristic']} | {it['value']} |" for it in items)
        sections.append(
            f"## {label}\n\n"
            "| Characteristic | Value |\n"
            "|---|---|\n"
            f"{rows}\n"
        )
    return "\n".join(head) + "\n".join(sections)
# --------------------------------------------------------------------- write
def write_product(prod: AGProduct, body_md: str) -> None:
CORPUS_DIR.mkdir(parents=True, exist_ok=True)
md_path = CORPUS_DIR / f"{prod.source_key}.md"
json_path = CORPUS_DIR / f"{prod.source_key}.json"
md_path.write_text(body_md, encoding="utf-8")
sidecar = {
"source": "agrigold",
"source_key": prod.source_key,
"vendor": "AgReliant Genetics",
"brand": "AgriGold",
"product_name": prod.product_name,
"product_id": None,
"hybrid_prefix": prod.product_name,
"hybrid_suffix": None,
"crop": prod.crop,
"release_year": None,
"relative_maturity": prod.relative_maturity,
"maturity_group": prod.maturity_group,
"wheat_class": None,
"trait_stack": prod.trait_descriptions,
"trait_descriptions": prod.trait_descriptions,
"positioning_statement": None,
"strengths": [],
"characteristics_groups": prod.characteristics_groups,
"_scale_direction": RATING_SCALE_DIRECTION,
"regional_recommendations": [],
"image_url": None,
"source_urls": [prod.source_url],
"sitemap_last_modified": None,
"fetched_at": datetime.now(timezone.utc).isoformat(),
"scraper_version": SCRAPER_VERSION,
}
json_path.write_text(
json.dumps(sidecar, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
# --------------------------------------------------------------------- pipeline
def process_product(
http: RateLimitedSession, *, url: str, crop: str, code: str, force: bool,
) -> tuple[str, AGProduct | None]:
source_key = source_key_for(code)
md_path = CORPUS_DIR / f"{source_key}.md"
if md_path.exists() and not force:
return "skipped", None
try:
prod = fetch_product_detail(http, url, crop, code)
except Exception as exc: # noqa: BLE001
log.error("variety %s failed: %s", code, exc)
return "failed", None
body = render_markdown(prod)
write_product(prod, body)
return "written", prod
def run(*, limit: int | None, force: bool,
only_crop: str | None, only_product: str | None) -> int:
CORPUS_DIR.mkdir(parents=True, exist_ok=True)
http = RateLimitedSession()
targets = discover_varieties(http, only_crop=only_crop)
if only_product:
targets = [
(u, c, k) for (u, c, k) in targets
if source_key_for(k) == only_product
or k.lower() == only_product.lower()
]
if not targets:
log.error("no variety matched --product=%s", only_product)
return 2
counts = {"written": 0, "skipped": 0, "failed": 0}
processed = 0
for url, crop, code in targets:
if limit is not None and processed >= limit:
break
processed += 1
status, prod = process_product(
http, url=url, crop=crop, code=code, force=force,
)
counts[status] = counts.get(status, 0) + 1
if prod is not None:
log.info(
"[%d/%s] %s %s | crop=%s rm/mg=%s traits=%s groups=%d",
processed, str(limit) if limit else "all",
prod.source_key, status, prod.crop,
prod.relative_maturity or prod.maturity_group or "-",
",".join(prod.trait_descriptions) or "-",
len(prod.characteristics_groups),
)
else:
log.info("[%d/%s] %s %s",
processed, str(limit) if limit else "all",
source_key_for(code), status)
log.info(
"done: processed=%d written=%d skipped=%d failed=%d (of %d candidates)",
processed, counts["written"], counts["skipped"],
counts["failed"], len(targets),
)
return 0 if counts["failed"] == 0 else 1
# --------------------------------------------------------------------- CLI
def _build_argparser() -> argparse.ArgumentParser:
p = argparse.ArgumentParser(
prog="scrape.sources.agrigold",
description="Scrape AgriGold (AgReliant Genetics) corn + soybean varieties.",
)
p.add_argument("--limit", type=int, default=None,
help="Stop after processing N varieties (default: all).")
p.add_argument("--force", action="store_true",
help="Re-fetch even if the markdown file already exists.")
p.add_argument("--crop", default=None, choices=list(LISTING_PATHS),
help="Limit to one crop.")
p.add_argument("--product", default=None,
help="Process a single variety by source_key or variety code.")
p.add_argument("--log-level", default=os.environ.get("LOG_LEVEL", "INFO"))
return p
def main(argv: list[str] | None = None) -> int:
args = _build_argparser().parse_args(argv)
logging.basicConfig(
level=args.log_level.upper(),
format="%(asctime)s %(levelname)s %(name)s %(message)s",
stream=sys.stderr,
)
return run(
limit=args.limit, force=args.force,
only_crop=args.crop, only_product=args.product,
)
if __name__ == "__main__":
sys.exit(main())
+412
@@ -0,0 +1,412 @@
"""Ebbert's Seeds scraper — small regional Ohio/Indiana breeder.
Source: ``www.ebbertsseeds.com`` — WordPress site. robots.txt is
permissive (``Crawl-delay: 5`` only, no Disallow). Covington, OH +
Decatur, IN — Eastern Corn Belt focus.
Catalog is structured as one scrollable page PER CROP, with each
variety rendered as a CSS-grid block of `<h1>NAME TRAIT <number> RM</h1>`
+ several sub-sections (MANAGEMENT & POSITIONING / CHARACTERISTICS
/ DISEASE RATINGS) where the labels and numeric values live in
separate adjacent grid cells. Reconstructing a perfectly-aligned
{characteristic: value} dict from the multi-column layout is
fiddly; the small variety count (~17 corn + similar soy/wheat)
doesn't justify the engineering. We instead **preserve the full
text body of each variety's container** in the chunk markdown so
the LLM can read the tabular text as-is.
Pages scraped: `/corn/`, `/soybeans-2/`, `/wheat/`. Grass-seed /
forage / cover-crop pages are out of scope for the row-crop
advisor.
Rating scale: ``1-5 (1 = best, lower = more resistant)`` — same
direction as AgriPro / NK. Confirmed by cross-referencing
positioning text against published values (a variety described as
"Robust tall plants" has STANDABILITY 1.0 → 1 = best).
Output:
corpus/ebberts_seeds/<source_key>.md
corpus/ebberts_seeds/<source_key>.json
source_key: ``ebberts-<slug>`` lowercased, e.g.
``ebberts-7000tr-rib`` or ``ebberts-1335-conventional``.
CLI:
python -m scrape.sources.ebberts_seeds --crop corn --limit 5
python -m scrape.sources.ebberts_seeds --force
"""
from __future__ import annotations
import argparse
import json
import logging
import os
import random
import re
import sys
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
import requests
from bs4 import BeautifulSoup
SCRAPER_VERSION = "0.1.0"
USER_AGENT = "seed-mcp-scraper/0.1 (+https://drawbar.example/contact)"
BASE = "https://www.ebbertsseeds.com"
# Ebbert's per-crop catalog pages. URL paths confirmed via homepage
# nav links 2026-05-26.
CROP_PAGES = {
"corn": "/corn/",
"soybeans": "/soybeans-2/",
"wheat": "/wheat/",
}
# Per robots.txt: Crawl-delay: 5 (seconds). We respect that.
REQ_INTERVAL_SEC = 5.0
RATING_SCALE_DIRECTION = "1-5 (1 = best, lower = more resistant)"
REPO_ROOT = Path(__file__).resolve().parents[2]
CORPUS_ROOT = Path(os.environ.get("CORPUS_ROOT") or REPO_ROOT / "corpus")
CORPUS_DIR = CORPUS_ROOT / "ebberts_seeds"
log = logging.getLogger("scrape.ebberts_seeds")
# --------------------------------------------------------------------- HTTP
class RateLimitedSession:
"""robots.txt asks for a 5-second Crawl-delay; we honor it. The
scraper fetches only one page per crop (three in total), so the
delay adds just a few seconds to a full run."""
def __init__(self, interval: float = REQ_INTERVAL_SEC) -> None:
self.s = requests.Session()
self.s.headers["User-Agent"] = USER_AGENT
self.interval = interval
self._last = 0.0
def _wait(self) -> None:
delta = time.monotonic() - self._last
if delta < self.interval:
time.sleep(self.interval - delta)
self._last = time.monotonic()
def request(self, method: str, url: str, *, max_retries: int = 4,
timeout: float = 30.0, **kw: Any) -> requests.Response:
last_exc: Exception | None = None
for attempt in range(max_retries):
self._wait()
try:
resp = self.s.request(method, url, timeout=timeout, **kw)
except requests.RequestException as exc:
last_exc = exc
backoff = min(30.0, (2 ** attempt) + random.random())
log.warning("network error on %s %s: %s — retry in %.1fs",
method, url, exc, backoff)
time.sleep(backoff)
continue
if resp.status_code == 429 or 500 <= resp.status_code < 600:
ra = resp.headers.get("Retry-After")
backoff = float(ra) if (ra and ra.isdigit()) else min(30.0, (2 ** attempt) + random.random())
log.warning("HTTP %d on %s %s — retry in %.1fs",
resp.status_code, method, url, backoff)
time.sleep(backoff)
continue
return resp
if last_exc:
raise last_exc
return resp # type: ignore[return-value]
def get(self, url: str, **kw: Any) -> requests.Response:
return self.request("GET", url, **kw)
# --------------------------------------------------------------------- model
@dataclass
class EbProduct:
source_key: str
source_url: str # the per-crop page URL (Ebbert's doesn't have per-variety pages)
crop: str
product_name: str = "" # "7000TR RIB", "1335 CONVENTIONAL"
trait_label: str | None = None # "RIB", "CONVENTIONAL", "PC", "SSX RIB", etc.
relative_maturity: str | None = None # corn
maturity_group: str | None = None # soy
body_text: str = "" # verbatim text of the variety's container
# --------------------------------------------------------------------- discovery + parse
_VARIETY_HEADING_RE = re.compile(
r"^(?P<name>\S+(?:\s+\S+)*?)\s+(?P<rm>\d+(?:\.\d+)?)\s*RM$",
re.IGNORECASE,
)
def _variety_text(h1, next_h1) -> str:
    """Collect the visible text from this variety's <h1> up to (but
    not including) the next variety's <h1>, walking the DOM in
    document order.
    Ebbert's grid layout spreads each variety's content across many
    sibling ``.x-cell`` blocks in the outer container; the h1's
    immediate parent only holds the title cell. The correct boundary
    is the next variety h1 in document order.
    """
    chunks: list[str] = [h1.get_text(strip=True)]
    for node in h1.find_all_next(string=True):
        ancestors = list(node.parents)
        # find_all_next() also yields the strings inside h1 itself.
        # They are already in chunks[0], so skip them to avoid
        # repeating the title.
        if any(anc is h1 for anc in ancestors):
            continue
        # Stop once we cross into the next variety's h1.
        if next_h1 is not None and any(anc is next_h1 for anc in ancestors):
            break
        text = str(node).strip()
        if text:
            chunks.append(text)
    body = " | ".join(chunks)
    body = re.sub(r"\s*\|\s*\|\s*", " | ", body)
    body = re.sub(r"\s+", " ", body).strip()
    return body
def _slug(text: str) -> str:
s = re.sub(r"[^a-zA-Z0-9]+", "-", text).strip("-").lower()
return s
def discover_and_parse(
    http: RateLimitedSession, *, only_crop: str | None = None,
) -> list[EbProduct]:
    """Fetch one page per crop and extract every variety container."""
    out: list[EbProduct] = []
    for crop, path in CROP_PAGES.items():
        if only_crop and crop != only_crop:
            continue
        url = f"{BASE}{path}"
        log.info("fetching %s", url)
        r = http.get(url)
        r.raise_for_status()
        soup = BeautifulSoup(r.text, "html.parser")
        # Every variety is anchored by an <h1> of the form
        # "NAME TRAIT <number> RM".
        v_h1s = [
            h for h in soup.find_all("h1")
            if _VARIETY_HEADING_RE.match(h.get_text(strip=True))
        ]
        log.info(" %s: %d varieties", crop, len(v_h1s))
        for i, h1 in enumerate(v_h1s):
            title = h1.get_text(strip=True)
            m = _VARIETY_HEADING_RE.match(title)
            if not m:
                continue
            name = m.group("name").strip()
            maturity = m.group("rm")
            next_h1 = v_h1s[i + 1] if i + 1 < len(v_h1s) else None
            body = _variety_text(h1, next_h1)
            prod = EbProduct(
                source_key=f"ebberts-{_slug(name)}",
                source_url=url,
                crop=crop,
                product_name=name,
                relative_maturity=maturity if crop == "corn" else None,
                maturity_group=maturity if crop == "soybeans" else None,
                body_text=body,
            )
            # Derive trait_label from everything after the first token
            # of the name (CONVENTIONAL, RIB, PC, SSX RIB, etc.).
            # Best-effort, doesn't have to be perfect.
            parts = name.split(maxsplit=1)
            if len(parts) == 2:
                prod.trait_label = parts[1]
            out.append(prod)
    log.info("total varieties discovered: %d", len(out))
    return out
# --------------------------------------------------------------------- render
def render_markdown(p: EbProduct) -> str:
title = p.product_name or p.source_key
crop_label = {"corn": "Corn", "soybeans": "Soybeans",
"wheat": "Wheat"}.get(p.crop, p.crop.title())
head: list[str] = [
f"# {title}",
"",
"- **Vendor:** Ebbert's Seeds (independent regional breeder)",
"- **Brand:** Ebbert's Seeds",
f"- **Crop:** {crop_label}",
]
if p.relative_maturity and p.crop == "corn":
head.append(f"- **Relative maturity:** {p.relative_maturity}")
if p.maturity_group and p.crop == "soybeans":
head.append(f"- **Maturity group:** {p.maturity_group}")
if p.trait_label:
head.append(f"- **Trait stack (label):** {p.trait_label}")
head.append(f"- **Source:** {p.source_url}")
head.append(f"- **Rating scale (Ebbert's):** {RATING_SCALE_DIRECTION}")
head.append("- **Service area:** Covington, OH + Decatur, IN — Eastern Corn Belt regional")
head.append("")
head.append("---")
head.append("")
head.append("## Variety detail (verbatim from page)")
head.append("")
head.append(p.body_text)
head.append("")
return "\n".join(head)
# --------------------------------------------------------------------- write
def write_product(prod: EbProduct, body_md: str) -> None:
CORPUS_DIR.mkdir(parents=True, exist_ok=True)
md_path = CORPUS_DIR / f"{prod.source_key}.md"
json_path = CORPUS_DIR / f"{prod.source_key}.json"
md_path.write_text(body_md, encoding="utf-8")
sidecar = {
"source": "ebberts_seeds",
"source_key": prod.source_key,
"vendor": "Ebbert's Seeds",
"brand": "Ebbert's Seeds",
"product_name": prod.product_name,
"product_id": None,
"hybrid_prefix": prod.product_name,
"hybrid_suffix": prod.trait_label,
"crop": prod.crop,
"release_year": None,
"relative_maturity": prod.relative_maturity,
"maturity_group": prod.maturity_group,
"wheat_class": None,
"trait_stack": [prod.trait_label] if prod.trait_label else [],
"trait_descriptions": [],
"positioning_statement": None,
"strengths": [],
# No structured groups — the body markdown carries the table
# text verbatim. characteristics_groups stays empty so the
# chunker doesn't try to bucket non-existent items.
"characteristics_groups": [],
"page_text_chars": len(prod.body_text),
"_scale_direction": RATING_SCALE_DIRECTION,
"regional_recommendations": [
{"product_list_name": "Ebbert's service area (Eastern Corn Belt — OH/IN/IL)",
"agronomist": None, "agronomist_email": None, "variant_id": None},
],
"image_url": None,
"source_urls": [prod.source_url],
"sitemap_last_modified": None,
"fetched_at": datetime.now(timezone.utc).isoformat(),
"scraper_version": SCRAPER_VERSION,
}
json_path.write_text(
json.dumps(sidecar, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
# --------------------------------------------------------------------- pipeline
def run(*, limit: int | None, force: bool,
only_crop: str | None, only_product: str | None) -> int:
CORPUS_DIR.mkdir(parents=True, exist_ok=True)
http = RateLimitedSession()
products = discover_and_parse(http, only_crop=only_crop)
if only_product:
products = [
p for p in products
if p.source_key == only_product
or p.product_name.lower() == only_product.lower()
]
if not products:
log.error("no variety matched --product=%s", only_product)
return 2
counts = {"written": 0, "skipped": 0}
processed = 0
for prod in products:
if limit is not None and processed >= limit:
break
processed += 1
md_path = CORPUS_DIR / f"{prod.source_key}.md"
if md_path.exists() and not force:
counts["skipped"] += 1
log.info("[%d/%s] %s skipped",
processed, str(limit) if limit else len(products),
prod.source_key)
continue
body = render_markdown(prod)
write_product(prod, body)
counts["written"] += 1
log.info(
"[%d/%s] %s written | crop=%s rm/mg=%s trait=%s chars=%d",
processed, str(limit) if limit else len(products),
prod.source_key, prod.crop,
prod.relative_maturity or prod.maturity_group or "-",
prod.trait_label or "-", len(prod.body_text),
)
log.info(
"done: processed=%d written=%d skipped=%d (of %d varieties)",
processed, counts["written"], counts["skipped"], len(products),
)
return 0
# --------------------------------------------------------------------- CLI
def _build_argparser() -> argparse.ArgumentParser:
p = argparse.ArgumentParser(
prog="scrape.sources.ebberts_seeds",
description="Scrape Ebbert's Seeds (regional Eastern Corn Belt breeder) — "
"corn / soybeans / wheat.",
)
p.add_argument("--limit", type=int, default=None,
help="Stop after processing N varieties (default: all).")
p.add_argument("--force", action="store_true",
help="Rewrite output files even if the markdown file already exists.")
p.add_argument("--crop", default=None, choices=list(CROP_PAGES),
help="Limit to one crop (corn / soybeans / wheat).")
p.add_argument("--product", default=None,
help="Process a single variety by source_key or product name.")
p.add_argument("--log-level", default=os.environ.get("LOG_LEVEL", "INFO"))
return p
def main(argv: list[str] | None = None) -> int:
args = _build_argparser().parse_args(argv)
logging.basicConfig(
level=args.log_level.upper(),
format="%(asctime)s %(levelname)s %(name)s %(message)s",
stream=sys.stderr,
)
return run(
limit=args.limit, force=args.force,
only_crop=args.crop, only_product=args.product,
)
if __name__ == "__main__":
sys.exit(main())
+503
@@ -0,0 +1,503 @@
"""LG Seeds scraper — AgReliant Genetics brand.

Source: ``www.lgseeds.com``, a WordPress site. Empty robots.txt
(no Disallow). Catalog covers 4 crops: corn, soybeans, alfalfa,
sorghum.

Two-layer fetch:

1. **Listing page** (one per crop): inline JavaScript variable
   ``products = [{...}, ...]`` carries the full variety summary:
   Variety code, Maturity, Traits[], Bullets[], CropType. No
   per-variety HTTP needed for identity.
2. **Detail page** (``/products/<crop>/<Variety>``): rich plant
   characteristics + disease tolerance + management ratings,
   rendered as ``<div class="characteristics-bar">`` blocks with
   ``<span class="bar-N">`` where N ∈ 1-9 is the rating. Same
   convention as Bayer/Golden Harvest (9 = best).

LG Seeds is a regional brand (Eastern Corn Belt focus) under
AgReliant Genetics, the same parent as AgriGold. Brand voice is
distinct so we keep them in separate scrapers.

Rating scale: ``1-9 (9 = best)``, verified empirically on the
bar-N markup; matches Bayer / Golden Harvest convention.

Output:
    corpus/lg_seeds/<source_key>.md
    corpus/lg_seeds/<source_key>.json

source_key: ``lg-<variety>`` lowercased, e.g. ``lg-lg5701``,
``lg-c3400`` (soybean; codes don't use LG prefix), ``lg-7c300``
(alfalfa), ``lg-silo-max-100`` (sorghum).

CLI:
    python -m scrape.sources.lg_seeds --crop corn --limit 5
    python -m scrape.sources.lg_seeds --force
"""
from __future__ import annotations
import argparse
import json
import logging
import os
import random
import re
import sys
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
import requests
from bs4 import BeautifulSoup
SCRAPER_VERSION = "0.1.0"
USER_AGENT = "seed-mcp-scraper/0.1 (+https://drawbar.example/contact)"
BASE = "https://www.lgseeds.com"
# Crops listed in nav. Each has a listing page at /products/<crop>
# with an inline `var products = [...]` JSON blob.
LISTING_PATHS = {
    "corn": "/products/corn",
    "soybeans": "/products/soybeans",
    "alfalfa": "/products/alfalfa",
    "sorghum": "/products/sorghum",
}
RATING_SCALE_DIRECTION = "1-9 (9 = best)"
REPO_ROOT = Path(__file__).resolve().parents[2]
CORPUS_ROOT = Path(os.environ.get("CORPUS_ROOT") or REPO_ROOT / "corpus")
CORPUS_DIR = CORPUS_ROOT / "lg_seeds"
REQ_INTERVAL_SEC = 1.0
log = logging.getLogger("scrape.lg_seeds")
# --------------------------------------------------------------------- HTTP
class RateLimitedSession:
    def __init__(self, interval: float = REQ_INTERVAL_SEC) -> None:
        self.s = requests.Session()
        self.s.headers["User-Agent"] = USER_AGENT
        self.interval = interval
        self._last = 0.0

    def _wait(self) -> None:
        delta = time.monotonic() - self._last
        if delta < self.interval:
            time.sleep(self.interval - delta)
        self._last = time.monotonic()

    def request(self, method: str, url: str, *, max_retries: int = 4,
                timeout: float = 30.0, **kw: Any) -> requests.Response:
        last_exc: Exception | None = None
        for attempt in range(max_retries):
            self._wait()
            try:
                resp = self.s.request(method, url, timeout=timeout, **kw)
            except requests.RequestException as exc:
                last_exc = exc
                backoff = min(30.0, (2 ** attempt) + random.random())
                log.warning("network error on %s %s: %s; retry in %.1fs",
                            method, url, exc, backoff)
                time.sleep(backoff)
                continue
            if resp.status_code == 429 or 500 <= resp.status_code < 600:
                ra = resp.headers.get("Retry-After")
                if ra and ra.isdigit():
                    backoff = float(ra)
                else:
                    backoff = min(30.0, (2 ** attempt) + random.random())
                log.warning("HTTP %d on %s %s; retry in %.1fs",
                            resp.status_code, method, url, backoff)
                time.sleep(backoff)
                continue
            return resp
        if last_exc:
            raise last_exc
        return resp  # type: ignore[return-value]

    def get(self, url: str, **kw: Any) -> requests.Response:
        return self.request("GET", url, **kw)
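
The retry delay computed inside `request` above can be isolated as a pure function. This is a minimal sketch, assuming a hypothetical helper `backoff_seconds` that is not part of the scraper, with the jitter passed in so the result is deterministic:

```python
from typing import Optional


def backoff_seconds(attempt: int, retry_after: Optional[str],
                    jitter: float) -> float:
    """Delay before the next retry (hypothetical helper, illustrative only).

    A numeric Retry-After header wins; otherwise the delay is exponential
    in the attempt index plus jitter, capped at 30 seconds.
    """
    if retry_after and retry_after.isdigit():
        return float(retry_after)
    return min(30.0, (2 ** attempt) + jitter)


# In the scraper the jitter would come from random.random().
print(backoff_seconds(0, None, jitter=0.0))
print(backoff_seconds(3, None, jitter=0.5))
print(backoff_seconds(1, "7", jitter=0.0))
```

Because the header is checked with `str.isdigit()`, a Retry-After value given as an HTTP date is ignored and the exponential delay applies instead.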
# --------------------------------------------------------------------- model
@dataclass
class LGProduct:
    source_key: str
    source_url: str
    crop: str
    product_name: str = ""
    product_id: int | None = None
    maturity_raw: str | None = None  # corn RM days / soy MG / alfalfa FD / sorghum days
    fall_dormancy: str | None = None  # alfalfa only
    trait_descriptions: list[str] = field(default_factory=list)
    bullets: list[str] = field(default_factory=list)
    characteristics_groups: list[dict] = field(default_factory=list)
# --------------------------------------------------------------------- discovery
_VAR_RE = re.compile(
    r'var\s+\w+\s*=\s*(\[\{"Variety":.+?\}\]);', re.S,
)


def discover_varieties(
    http: RateLimitedSession, *, only_crop: str | None = None,
) -> list[tuple[str, dict]]:
    """Return ``[(crop, summary_dict), ...]`` from each listing page's
    inline JSON. Summary dict has Variety / Id / Maturity / Traits /
    Bullets / CropType / FallDormancy."""
    out: list[tuple[str, dict]] = []
    for crop, path in LISTING_PATHS.items():
        if only_crop and crop != only_crop:
            continue
        log.info("fetching listing %s%s", BASE, path)
        r = http.get(f"{BASE}{path}")
        r.raise_for_status()
        m = _VAR_RE.search(r.text)
        if not m:
            log.warning("no products array in %s", path)
            continue
        try:
            items = json.loads(m.group(1))
        except json.JSONDecodeError as exc:
            log.error("JSON parse failed for %s: %s", path, exc)
            continue
        log.info("  %s: %d varieties", crop, len(items))
        for it in items:
            out.append((crop, it))
    log.info("total varieties discovered: %d", len(out))
    return out
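
The inline-JSON extraction above can be exercised offline. A minimal sketch, assuming a hypothetical listing-page fragment (`SAMPLE_HTML`, with made-up field values) and a standalone helper `extract_products` that is not in the scraper; the regex is the same pattern as `_VAR_RE`:

```python
import json
import re
from typing import Dict, List

# Same pattern as _VAR_RE: non-greedy from `[{"Variety":` to the first `}];`.
VAR_RE = re.compile(r'var\s+\w+\s*=\s*(\[\{"Variety":.+?\}\]);', re.S)

# Hypothetical markup; the real page may lay the blob out differently.
SAMPLE_HTML = """
<script>
var products = [{"Variety":"LG5701","Id":1,"Maturity":107,
"Traits":["Example Trait"],"Bullets":["Example bullet"],"CropType":"Corn"},
{"Variety":"C3400","Id":2,"Maturity":3.4,
"Traits":[],"Bullets":[],"CropType":"Soybeans"}];
var unrelated = 1;
</script>
"""


def extract_products(html: str) -> List[Dict]:
    """Return the parsed product summaries, or [] when no blob is found."""
    m = VAR_RE.search(html)
    if not m:
        return []
    return json.loads(m.group(1))


products = extract_products(SAMPLE_HTML)
print([p["Variety"] for p in products])
```

The pattern requires `"Variety"` to be the first key of the first object and stops at the first `}];`, so a change in key order on the site would make the search return no match rather than bad data.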
# --------------------------------------------------------------------- helpers
def source_key_for(variety: str) -> str:
    """Slugify the variety code into a stable source_key."""
    slug = re.sub(r"[^a-zA-Z0-9-]+", "-", variety).strip("-").lower()
    return f"lg-{slug}"


_BAR_CLASS_RE = re.compile(r"^bar-(\d)$")


def _parse_bar_value(span_classes: list[str]) -> int | None:
    """Extract the integer rating from a ``bar-N`` CSS class."""
    for c in span_classes or []:
        m = _BAR_CLASS_RE.match(c)
        if m:
            return int(m.group(1))
    return None
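
A short standalone check of the two helpers above. The function bodies are reproduced so the block runs on its own; the input `"Silo Max 100"` is an assumption about how the sorghum variety is spelled in the listing JSON, chosen to match the `lg-silo-max-100` key named in the module docstring:

```python
import re
from typing import List, Optional

_BAR_CLASS_RE = re.compile(r"^bar-(\d)$")


def source_key_for(variety: str) -> str:
    # Collapse every run of non-alphanumeric, non-hyphen characters to "-".
    slug = re.sub(r"[^a-zA-Z0-9-]+", "-", variety).strip("-").lower()
    return f"lg-{slug}"


def parse_bar_value(span_classes: List[str]) -> Optional[int]:
    # Return the first single-digit rating found among the CSS classes.
    for c in span_classes or []:
        m = _BAR_CLASS_RE.match(c)
        if m:
            return int(m.group(1))
    return None


print(source_key_for("LG5701"))
print(source_key_for("Silo Max 100"))
print(parse_bar_value(["bar", "bar-7"]))
print(parse_bar_value(["bar-10"]))
```

The pattern accepts exactly one digit, so a class such as `bar-10` yields `None` and the caller falls back to treating the bar as qualitative.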
# --------------------------------------------------------------------- detail
def fetch_product_detail(
    http: RateLimitedSession, summary: dict, crop: str,
) -> LGProduct:
    """Fetch the detail page and merge characteristics into an
    LGProduct seeded by the listing-page summary."""
    variety = summary.get("Variety") or ""
    # LG's detail URL is /products/<crop>/<Variety>. The Variety in the
    # listing JSON appears in correct case; LG seems to accept any case
    # but we use what's published.
    url = f"{BASE}/products/{crop}/{variety}"
    prod = LGProduct(
        source_key=source_key_for(variety),
        source_url=url,
        crop=crop,
        product_name=variety,
        product_id=summary.get("Id"),
        maturity_raw=str(summary.get("Maturity")) if summary.get("Maturity") is not None else None,
        fall_dormancy=str(summary.get("FallDormancy")) if summary.get("FallDormancy") else None,
        trait_descriptions=list(summary.get("Traits") or []),
        bullets=list(summary.get("Bullets") or []),
    )
    try:
        r = http.get(url)
        r.raise_for_status()
    except Exception as exc:  # noqa: BLE001
        log.warning("detail fetch failed for %s: %s", variety, exc)
        return prod  # identity-only fallback
    soup = BeautifulSoup(r.text, "html.parser")
    # The detail page has multiple .product-section blocks; each has
    # a heading + a collection of .characteristics-bar rows. We bucket
    # the bars by section. Common LG section labels:
    # "Characteristics" / "Management" / "Disease Tolerance".
    sections: list[tuple[str, list[dict]]] = []
    for section in soup.find_all("div", class_=re.compile(r"product-section")):
        # The section class often includes a hint like "disease-toler",
        # "plantCharacteristics", "management-pr". Sections without
        # bars are skipped.
        section_classes = " ".join(section.get("class", []))
        bars = section.find_all("div", class_="characteristics-bar")
        if not bars:
            continue
        # Section label: use the first non-empty h2/h3/h4, else fall
        # back to a hint in the section's class names.
        label = ""
        for h in section.find_all(["h2", "h3", "h4"]):
            t = h.get_text(strip=True)
            if t:
                label = t
                break
        if not label:
            # fallback: section_classes hint
            if "disease" in section_classes.lower():
                label = "Disease Tolerance"
            elif "management" in section_classes.lower():
                label = "Management"
            elif "plantcharacteristics" in section_classes.lower():
                label = "Characteristics"
        items: list[dict] = []
        for bar in bars:
            name_el = bar.find(class_="product-name")
            value_span = bar.find("span", class_=_BAR_CLASS_RE)
            name = (name_el.get_text(" ", strip=True) if name_el else "").strip()
            rating = _parse_bar_value(value_span.get("class") if value_span else [])
            if not name:
                continue
            # Some "bars" are actually qualitative (e.g. "Tar Spot Susceptible",
            # "Fungicide Response High"). For those we keep the label as the
            # value text rather than a missing rating.
            if rating is None:
                # Look inside the bar element for a non-name text snippet
                inner_text = bar.get_text(" ", strip=True)
                # Strip the label off the front
                if inner_text.startswith(name):
                    inner_text = inner_text[len(name):].strip()
                items.append({"characteristic": name, "value": inner_text or "-"})
            else:
                items.append({"characteristic": name, "value": str(rating)})
        if items:
            sections.append((label or "Characteristics", items))
    prod.characteristics_groups = [
        {"label": label.upper(), "type": "bars", "items": items}
        for label, items in sections
    ]
    return prod
# --------------------------------------------------------------------- render
def render_markdown(p: LGProduct) -> str:
    title = p.product_name or p.source_key
    crop_label = {
        "corn": "Corn", "soybeans": "Soybeans",
        "alfalfa": "Alfalfa", "sorghum": "Sorghum",
    }.get(p.crop, p.crop.title())
    head: list[str] = [
        f"# {title}",
        "",
        "- **Vendor:** AgReliant Genetics",
        "- **Brand:** LG Seeds",
        f"- **Crop:** {crop_label}",
    ]
    if p.maturity_raw:
        if p.crop == "corn":
            head.append(f"- **Relative maturity:** {p.maturity_raw}")
        elif p.crop == "soybeans":
            head.append(f"- **Maturity group:** {p.maturity_raw}")
        elif p.crop == "alfalfa":
            head.append(f"- **Fall dormancy / maturity:** {p.maturity_raw}")
        elif p.crop == "sorghum":
            head.append(f"- **Days to maturity:** {p.maturity_raw}")
    if p.trait_descriptions:
        head.append(f"- **Traits:** {', '.join(p.trait_descriptions)}")
    head.append(f"- **Source:** {p.source_url}")
    head.append(f"- **Rating scale (LG Seeds):** {RATING_SCALE_DIRECTION}")
    head.append("")
    head.append("---")
    head.append("")
    sections: list[str] = []
    if p.bullets:
        bullets = "\n".join(f"- {b}" for b in p.bullets)
        sections.append("## Strengths\n\n" + bullets + "\n")
    for g in p.characteristics_groups:
        label = (g.get("label") or "Characteristics").title()
        items = g.get("items") or []
        if not items:
            continue
        rows = "\n".join(f"| {it['characteristic']} | {it['value']} |" for it in items)
        sections.append(
            f"## {label}\n\n"
            "| Characteristic | Value |\n"
            "|---|---|\n"
            f"{rows}\n"
        )
    return "\n".join(head) + "\n".join(sections)
# --------------------------------------------------------------------- write
def write_product(prod: LGProduct, body_md: str) -> None:
    CORPUS_DIR.mkdir(parents=True, exist_ok=True)
    md_path = CORPUS_DIR / f"{prod.source_key}.md"
    json_path = CORPUS_DIR / f"{prod.source_key}.json"
    md_path.write_text(body_md, encoding="utf-8")
    sidecar = {
        "source": "lg_seeds",
        "source_key": prod.source_key,
        "vendor": "AgReliant Genetics",
        "brand": "LG Seeds",
        "product_name": prod.product_name,
        "product_id": prod.product_id,
        "hybrid_prefix": prod.product_name,
        "hybrid_suffix": None,
        "crop": prod.crop,
        "release_year": None,
        # Maturity routing: corn = RM days, soy = MG, alfalfa = FD,
        # sorghum = days-to-maturity. Stored in the canonical fields
        # so the chunker's crop-aware preamble works.
        "relative_maturity": prod.maturity_raw if prod.crop in ("corn", "sorghum") else None,
        "maturity_group": prod.maturity_raw if prod.crop == "soybeans" else None,
        "fall_dormancy": prod.maturity_raw if prod.crop == "alfalfa" else prod.fall_dormancy,
        "wheat_class": None,
        "trait_stack": prod.trait_descriptions,  # LG publishes full names, not codes
        "trait_descriptions": prod.trait_descriptions,
        "positioning_statement": None,
        "strengths": prod.bullets,
        "characteristics_groups": prod.characteristics_groups,
        "_scale_direction": RATING_SCALE_DIRECTION,
        "regional_recommendations": [],
        "image_url": None,
        "source_urls": [prod.source_url],
        "sitemap_last_modified": None,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "scraper_version": SCRAPER_VERSION,
    }
    json_path.write_text(
        json.dumps(sidecar, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
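
The maturity routing in the sidecar can be read as a mapping from crop to one canonical field. A minimal sketch, assuming a hypothetical helper `route_maturity` that does not exist in the scraper and only mirrors the three conditional expressions above:

```python
from typing import Dict, Optional


def route_maturity(
    crop: str, maturity_raw: Optional[str],
    fall_dormancy: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Place the raw maturity value in the field that fits the crop."""
    return {
        # Corn RM days and sorghum days-to-maturity share one field.
        "relative_maturity": maturity_raw if crop in ("corn", "sorghum") else None,
        # Soybean maturity group.
        "maturity_group": maturity_raw if crop == "soybeans" else None,
        # For alfalfa the listing's Maturity value is treated as FD.
        "fall_dormancy": maturity_raw if crop == "alfalfa" else fall_dormancy,
    }


print(route_maturity("corn", "107"))
print(route_maturity("alfalfa", "4"))
```

Exactly one of the three fields is populated for each of the four supported crops, provided the separate `fall_dormancy` value is empty for non-alfalfa products.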
# --------------------------------------------------------------------- pipeline
def process_product(
    http: RateLimitedSession, summary: dict, crop: str, *, force: bool,
) -> tuple[str, LGProduct | None]:
    variety = summary.get("Variety") or ""
    source_key = source_key_for(variety)
    md_path = CORPUS_DIR / f"{source_key}.md"
    if md_path.exists() and not force:
        return "skipped", None
    try:
        prod = fetch_product_detail(http, summary, crop)
    except Exception as exc:  # noqa: BLE001
        log.error("variety %s failed: %s", variety, exc)
        return "failed", None
    body = render_markdown(prod)
    write_product(prod, body)
    return "written", prod
def run(
    *, limit: int | None, force: bool,
    only_crop: str | None, only_product: str | None,
) -> int:
    CORPUS_DIR.mkdir(parents=True, exist_ok=True)
    http = RateLimitedSession()
    targets = discover_varieties(http, only_crop=only_crop)
    if only_product:
        # `or ""` guards against a null Variety in the listing JSON,
        # matching the handling in process_product.
        targets = [
            (c, s) for (c, s) in targets
            if source_key_for(s.get("Variety") or "") == only_product
            or (s.get("Variety") or "").lower() == only_product.lower()
        ]
        if not targets:
            log.error("no variety matched --product=%s", only_product)
            return 2
    counts = {"written": 0, "skipped": 0, "failed": 0}
    processed = 0
    for crop, summary in targets:
        if limit is not None and processed >= limit:
            break
        processed += 1
        status, prod = process_product(http, summary, crop, force=force)
        counts[status] = counts.get(status, 0) + 1
        if prod is not None:
            log.info(
                "[%d/%s] %s %s | crop=%s maturity=%s traits=%d groups=%d",
                processed, str(limit) if limit else "all",
                prod.source_key, status, prod.crop,
                prod.maturity_raw or "-",
                len(prod.trait_descriptions),
                len(prod.characteristics_groups),
            )
        else:
            log.info("[%d/%s] %s %s",
                     processed, str(limit) if limit else "all",
                     source_key_for(summary.get("Variety") or ""), status)
    log.info(
        "done: processed=%d written=%d skipped=%d failed=%d (of %d candidates)",
        processed, counts["written"], counts["skipped"],
        counts["failed"], len(targets),
    )
    return 0 if counts["failed"] == 0 else 1
# --------------------------------------------------------------------- CLI
def _build_argparser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(
        prog="scrape.sources.lg_seeds",
        description="Scrape LG Seeds (AgReliant Genetics): corn / "
                    "soybeans / alfalfa / sorghum.",
    )
    p.add_argument("--limit", type=int, default=None,
                   help="Stop after processing N varieties (default: all).")
    p.add_argument("--force", action="store_true",
                   help="Re-fetch even if the markdown file already exists.")
    p.add_argument("--crop", default=None, choices=list(LISTING_PATHS),
                   help="Limit to one crop.")
    p.add_argument("--product", default=None,
                   help="Process a single variety by source_key or Variety code.")
    p.add_argument("--log-level", default=os.environ.get("LOG_LEVEL", "INFO"))
    return p


def main(argv: list[str] | None = None) -> int:
    args = _build_argparser().parse_args(argv)
    logging.basicConfig(
        level=args.log_level.upper(),
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        stream=sys.stderr,
    )
    return run(
        limit=args.limit, force=args.force,
        only_crop=args.crop, only_product=args.product,
    )


if __name__ == "__main__":
    sys.exit(main())