agripro + nk scrapers — 146 Syngenta varieties added (wheat + corn/soy)
agripro (24 varieties)
- Drupal Views form scrape via /search-agripro-brand-varieties with
explicit GET params (sidesteps the AJAX-only-on-load default that
returns an empty form skeleton).
- Per-variety parse: <h1>, .field--node--variety-type--variety,
.field--node--tag-line--variety, .field--node--body, plus the
three rated sections (Agronomics / Grain / Disease) with their
<div class="row"><div class="label">label</div><div>value</div></div>
pairs.
- Wheat-class distribution: 12 HRS, 7 SWW, 3 HRW, 1 HWS, 1 Barley
— provides the Northern Plains HRS coverage WestBred lacks.
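A minimal sketch of the label/value row parse described above. It uses the standard library's `html.parser` in place of BeautifulSoup so it runs without extra dependencies; the class and function names (`RowPairParser`, `parse_rated_rows`) are hypothetical, and the sample markup only mirrors the row shape quoted in this message, not the full AgriPro page.

```python
from html.parser import HTMLParser


class RowPairParser(HTMLParser):
    """Collect (label, value) pairs from <div class="row"> blocks."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._depth = 0    # div nesting depth inside the current row (0 = outside)
        self._cells = []   # text of each direct child <div> of the row
        self._buf = None   # text buffer for the child <div> being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth == 0:
            if "row" in classes:
                self._depth = 1
                self._cells = []
            return
        self._depth += 1
        if self._depth == 2:
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != "div" or self._depth == 0:
            return
        if self._depth == 2:
            self._cells.append("".join(self._buf).strip())
            self._buf = None
        self._depth -= 1
        # Row closed: the first child is the label, the second the value.
        if self._depth == 0 and len(self._cells) >= 2:
            self.pairs.append((self._cells[0], self._cells[1]))


def parse_rated_rows(html):
    """Return {label: value} for every row in one rated section."""
    parser = RowPairParser()
    parser.feed(html)
    parser.close()
    return dict(parser.pairs)


sample = (
    '<div class="row"><div class="label">Stripe Rust</div><div>1</div></div>'
    '<div class="row"><div class="label">Eyespot</div><div>2</div></div>'
)
ratings = parse_rated_rows(sample)
```

Values stay as strings here because AgriPro's agronomic ratings are qualitative words rather than numbers.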
nk (122 varieties — recon's "29" was outdated; the current NK seed
finder lists 41 corn + 81 soy)
- ASP.NET WebForms endpoint:
POST /NKSeeds/{Corn,Soy}ProductFinder.aspx/GetProducts returns
{"d": "<html>"} where the inner HTML is one <div class="sf-result">
per variety. BeautifulSoup parses the whole blob.
- Per-card: product code (NK8005, NK008-P8XF), RM/MG from the
title <span>, "Brands Available" trait variants, marketing
positioning + bullet strengths, tech-sheet PDF URL.
- pdfplumber text extraction on the tech-sheet PDFs adds:
* corn disease ratings (Gray Leaf Spot, NCLB, Goss's Wilt,
Anthracnose, Tar Spot, Fusarium, etc.) where the PDF prints
"Label N" lines (text-extractable)
* soybean Phytophthora source genes (Rps1c, Rps3a, ...)
* soybean SCN race coverage
* soybean agronomic ratings (Emergence, Standability, Shatter
Tolerance, Green Stem) with text-extractable 1-9 values
* soybean soil-type adaptation (Best/Good/Fair/Poor) for drought
prone / high pH / poorly drained / etc.
- Agronomic rating BARS for corn (Emergence, Stalk Strength,
Drought) are not text-extractable; we record the labels with an
explicit "rated in PDF chart, see tech sheet" value so the agent
can direct the farmer to the tech sheet for those numbers.
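The two NK steps above can be sketched with the standard library alone: unwrapping the `{"d": "<html>"}` envelope, and pulling "Label N" lines out of already-extracted tech-sheet text. The function names, the regular expression, and the sample ratings are illustrative assumptions, not the scraper's actual code; real PDF text may need a looser pattern (for example, curly apostrophes).

```python
import json
import re

# Placeholder recorded for ratings that only exist as chart bars in the PDF.
CHART_ONLY = "rated in PDF chart, see tech sheet"

# A line such as "Gray Leaf Spot 3": a text label followed by one digit 1-9.
RATING_LINE = re.compile(
    r"^\s*(?P<label>[A-Za-z][A-Za-z' ]*?)\s+(?P<value>[1-9])\s*$"
)


def unwrap_d_envelope(body):
    """Return the inner HTML string from a {"d": "<html>"} response body."""
    inner = json.loads(body)["d"]
    if not isinstance(inner, str):
        raise ValueError('expected a string under "d"')
    return inner


def parse_label_n_lines(text, chart_only_labels=()):
    """Map rating labels to ints, adding placeholders for chart-only labels."""
    ratings = {}
    for line in text.splitlines():
        match = RATING_LINE.match(line)
        if match:
            ratings[match.group("label")] = int(match.group("value"))
    for label in chart_only_labels:
        ratings.setdefault(label, CHART_ONLY)
    return ratings


# Hypothetical response body and extracted text, for illustration only.
body = r'{"d": "<div class=\"sf-result\">NK8005</div>"}'
inner_html = unwrap_d_envelope(body)

pdf_text = "Gray Leaf Spot 3\nGoss's Wilt 2\nTar Spot 4\nSome marketing sentence.\n"
ratings = parse_label_n_lines(pdf_text, chart_only_labels=("Stalk Strength",))
```

Keeping the chart-only placeholder as an explicit string, rather than omitting the label, lets a downstream reader tell "not rated" apart from "rated, but only in the chart".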
Changes to lessons.md (scale-direction correction, sources list):
- NK and AgriPro both use 1 = best, lower = more resistant — the
REVERSED convention vs Bayer / Golden Harvest. NK's tech-sheet
footer literally prints "1-9 Scale: 1 = Best, 9 = Worst".
AgriPro positioning on stripe-rust-resistant varieties (AP Iliad
with Stripe Rust 1, Eyespot 2) confirms the same direction.
- sources-not-yet-indexed section trimmed to just Beck's PFR +
Beck's products — everything else IS now in the corpus.
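To illustrate why the scale direction matters for cross-vendor comparison, here is a small sketch that maps a 1-9 rating onto a common "9 = best" scale. The helper name and its `best_end` parameter are hypothetical; how a sidecar's `_scale_direction` string maps to `best_end` is left to the caller, since the field's full set of values is not listed here.

```python
def to_higher_is_better(value, best_end):
    """Map a 1-9 rating onto a common scale where 9 is best.

    best_end is 1 for vendors that print "1 = best" (NK, AgriPro)
    and 9 for vendors that print "9 = best".
    """
    if not 1 <= value <= 9:
        raise ValueError(f"rating out of range: {value}")
    if best_end == 9:
        return value
    if best_end == 1:
        return 10 - value  # 1 -> 9, 5 -> 5, 9 -> 1
    raise ValueError(f"best_end must be 1 or 9, got {best_end}")


# A 2 on a "1 = best" sheet and an 8 on a "9 = best" sheet land on the same value.
nk_normalized = to_higher_is_better(2, best_end=1)
other_normalized = to_higher_is_better(8, best_end=9)
```

Comparing the raw numbers (2 versus 8) would rank these two ratings as far apart; after normalization they are equal.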
Cross-vendor coverage after this PR: 760 varieties.
bayer_seeds 475 (DEKALB 288 / Asgrow 102 / WestBred 85)
golden_harvest 139
nk 122 (41 corn / 81 soy)
agripro 24 (12 HRS / 7 SWW / 3 HRW / 1 HWS / 1 Barley)
Vendors: Bayer, Syngenta. Brands: 6. Crops: corn, soy, wheat (109
in the wheat portfolio now, up from 85; that count includes
AgriPro's 1 barley).
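The coverage figures above can be cross-checked with a few lines of arithmetic. The dictionary layout is only an illustration; the numbers are the ones listed in this message.

```python
coverage = {
    "bayer_seeds": {"DEKALB": 288, "Asgrow": 102, "WestBred": 85},
    "golden_harvest": {"Golden Harvest": 139},
    "nk": {"corn": 41, "soy": 81},
    "agripro": {"HRS": 12, "SWW": 7, "HRW": 3, "HWS": 1, "Barley": 1},
}

# Per-source totals, then the cross-vendor total.
totals = {source: sum(parts.values()) for source, parts in coverage.items()}
grand_total = sum(totals.values())

# Varieties added by this change: the two new scrapers.
added = totals["nk"] + totals["agripro"]
```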
requirements.txt: pdfplumber>=0.11 for NK tech-sheet parsing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+28
-16
@@ -72,7 +72,21 @@ re-stated it as `1-9 (9 = best)` in the chunk preamble; the source's
 `_scale_direction` field still says `9-to-1` so you can detect the
 provenance.
 
-**Syngenta NK / AgriPro**: `1-9 (9 = best)`. Same as Bayer.
+**Syngenta NK and AgriPro**: `1-9 (1 = best, lower = more
+resistant)`. **REVERSED from Bayer and Golden Harvest.** NK's
+tech-sheet PDFs literally print *"1-9 Scale: 1 = Best, 9 = Worst"*
+in the footer; AgriPro's positioning on stripe-rust-resistant
+varieties (e.g. AP Iliad with Stripe Rust 1, Eyespot 2) confirms
+the same direction. On NK, this applies both to disease tolerance
+AND to numeric agronomic ratings (Emergence, Standability, Shatter
+Tolerance, Green Stem — all 1 = best). Cross-vendor comparisons
+MUST consult the `_scale_direction` field in each side's sidecar
+before drawing conclusions.
+
+(Agronomic ratings on AgriPro are qualitative —
+"Excellent / Very Good / Good / Fair" — and have no direction
+issue. NK's soybean tech sheets ALSO publish soil-type adaptation
+as Best/Good/Fair/Poor labels which are qualitative.)
 
 **Beck's**: ratings live behind SeedIQ login; only identity-level
 data is publicly available, so most disease/agronomic ratings are
@@ -219,25 +233,23 @@ NK publishes ratings as PDF tech sheets without regional flags.
 
 ## sources-not-yet-indexed
 
-These vendors are planned but not yet in the corpus. Don't assume
-their data is present:
-
-- **Golden Harvest (Syngenta)** — ~175 varieties, sitemap-driven
-scrape pending.
-- **NK (Syngenta)** — 29 varieties.
-- **AgriPro (Syngenta wheat)** — 24 wheat varieties (HRW, HRS, HWS,
-SWW, SWS). The only wheat coverage we expect to have outside
-WestBred.
-- **Beck's PFR (research)** — 2,089 head-to-head trial documents.
-Different shape from variety records — these are studies, not
-hybrids.
-- **Beck's products** — 860 products. Identity-only (SeedIQ login
-gates the ratings).
-
 If `list_versions()` doesn't show a vendor in the `vendor` facet, the
 corpus does not have it yet. Direct the farmer to that vendor's
 public catalog or their seed dealer.
 
+**Already indexed**: Bayer (DEKALB / Asgrow / WestBred), Syngenta
+(Golden Harvest, NK, AgriPro).
+
+**Not yet indexed**:
+
+- **Beck's PFR (research)** — 2,089 head-to-head trial documents
+on the public Sanity GROQ API. Different shape from variety
+records — these are studies, not hybrids. Surfacing them would
+benefit a separate tool (e.g. `search_pfr_studies`) rather than
+share a corpus with variety identity.
+- **Beck's products** — ~860 products. Identity-only (SeedIQ login
+gates the ratings).
+
 ---
 
 ## checking-your-work