# seed-mcp/scrape/sources/becks_products.py

"""Beck's product catalog scraper (identity-only until SeedIQ XHR sniff lands).

Source: Same public Sanity GROQ API as ``becks_pfr`` (no auth).
Expected count: ~860 products (corn + soy + wheat).

Current limitation: Beck's exposes IDENTITY fields publicly (product
name, RM/MG, basic trait stack) but routes the AGRONOMIC + DISEASE
ratings through their SeedIQ application, which is gated behind a
browser session cookie. The public Sanity records do not include
ratings.

What we CAN ship without SeedIQ:
- Product identity for confirmation ("yes Beck's has hybrid X at RM 112")
- RM (relative maturity, corn) / MG (maturity group, soy) / class (wheat)
- Trait stack
- Basic descriptive text

What needs the SeedIQ XHR endpoint (BLOCKED on user sniff):
- Disease ratings (GLS, NCLB, Goss's, etc.)
- Agronomic ratings (standability, drought, etc.)
- Regional recommendations

For now this scraper is DEFERRED. Run when:
- User captures the SeedIQ XHR URL + cookie/header pattern from
  browser dev tools, OR
- We decide to ship Beck's as identity-only and let the LLM say
  "Beck's has this hybrid; ask your Beck's rep for full agronomic
  ratings" (less useful but avoids the empty-data UX).

The yellow verdict in sources.json reflects this; ``--all`` skips this scraper.

TODO: implement (deferred).
"""
from __future__ import annotations

import sys
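
# Hedged sketch, not wired into main(): one way an identity-only fetch could
# build its request URL if this scraper is ever un-deferred. The URL shape
# follows Sanity's public HTTP query endpoint. Everything Beck's-specific is
# an ASSUMPTION: the project ID, dataset, API version, document type, and
# field names are placeholders, not values taken from ``becks_pfr``.
import json
from urllib.parse import urlencode


def build_identity_query_url(
    project_id: str,
    dataset: str = "production",      # assumed dataset name
    api_version: str = "v2021-10-21",  # assumed API version
    doc_type: str = "product",        # hypothetical Sanity document type
) -> str:
    """Return a Sanity query URL that asks for identity fields only."""
    # Hypothetical projection: identity fields, no agronomic/disease ratings.
    groq = (
        f"*[_type == {json.dumps(doc_type)}]"
        "{name, crop, maturity, traits, description}"
    )
    return (
        f"https://{project_id}.api.sanity.io/{api_version}"
        f"/data/query/{dataset}?{urlencode({'query': groq})}"
    )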


def main(argv: list[str] | None = None) -> int:
    """Report that this scraper is deferred; ``argv`` is unused. Returns 2."""
    print(
        "becks_products: deferred (SeedIQ XHR sniff required for ratings); "
        "run only if the user has captured the endpoint",
        file=sys.stderr,
    )
    return 2
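

# Hedged sketch of the identity-only fallback described in the module
# docstring: confirm that the product exists and point the grower to a
# Beck's rep for ratings. ``identity_only_answer`` and its parameters are
# hypothetical names, not part of any existing interface in this repo.
def identity_only_answer(
    name: str, crop: str, maturity: str, trait_stack: str = ""
) -> str:
    """Render an identity confirmation plus the ask-your-rep caveat."""
    # RM applies to corn, MG to soy, class to wheat; fall back to a generic label.
    label = {"corn": "RM", "soy": "MG", "wheat": "class"}.get(crop, "maturity")
    traits = f", traits: {trait_stack}" if trait_stack else ""
    return (
        f"Beck's has {name} ({label} {maturity}{traits}); "
        "ask your Beck's rep for full agronomic ratings."
    )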


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))