agripro + nk scrapers — 146 Syngenta varieties added (760 total in corpus) #6
Summary
- agripro (24 varieties): scraped via /search-agripro-brand-varieties?title=&variety_type_value=All. Three rated sections per variety (Agronomics / Grain / Disease) with <div class="row"> label/value pairs. Wheat class distribution: 12 HRS, 7 SWW, 3 HRW, 1 HWS, 1 Barley. Closes the Northern Plains HRS gap WestBred didn't cover.
- nk (122 varieties; 41 corn / 81 soy): POST /NKSeeds/{Corn,Soy}ProductFinder.aspx/GetProducts returning {"d": "<html>"} with one <div class="sf-result"> per variety. pdfplumber text extraction on tech-sheet PDFs pulls disease ratings (corn), Phytophthora source genes, SCN race coverage, agronomic numeric ratings (soy), and soil-type adaptation labels (soy).
- Scale direction: _scale_direction is read from each variety's sidecar, so the chunk preamble correctly tells the LLM how to interpret the numbers; the anti-hallucination guarantee holds across the reversed-scale vendors.
- requirements.txt: pdfplumber>=0.11 added for NK tech-sheet text extraction.

Coverage after merge
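Source          Varieties
bayer_seeds     475 (DEKALB 288 / Asgrow 102 / WestBred 85)
golden_harvest  139
nk              122 (41 corn / 81 soy)
agripro         24 (12 HRS / 7 SWW / 3 HRW / 1 HWS / 1 Barley)

109 wheat varieties (up from 85), now including HRS/SWW/HRW/HWS classes across both AgriPro and WestBred.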
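As a quick sanity check, the headline figures can be reconciled against the per-source counts. The sketch below only re-adds numbers quoted in this PR's description and commit message; the dictionary and variable names are illustrative and are not taken from the codebase.

```python
# Per-source variety counts as quoted in this PR's commit message.
per_source = {
    "bayer_seeds": 475,
    "golden_harvest": 139,
    "nk": 122,
    "agripro": 24,
}

# Brand breakdown inside bayer_seeds, also quoted in the commit message.
bayer_brands = {"DEKALB": 288, "Asgrow": 102, "WestBred": 85}

total = sum(per_source.values())
syngenta_added = per_source["nk"] + per_source["agripro"]
# The wheat figure counts every WestBred and AgriPro entry
# (the AgriPro 24 includes the single Barley entry).
wheat = bayer_brands["WestBred"] + per_source["agripro"]

print(total)           # 760 varieties in the corpus
print(syngenta_added)  # 146 varieties added by this PR
print(wheat)           # 109 wheat varieties
```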
Known v1 limitations (documented, not blockers)
- lookup_variety('agripro-ap-iliad') for exact lookup; brand='AgriPro' filter for branded search.

Test plan
- agripro smoke: all 24 wheat varieties parsed; AP Iliad has Stripe Rust 1, Eyespot 2 (1 = best for AgriPro).
- nk smoke: 122 cards parsed (41 corn / 81 soy); 77 of 81 soy varieties have Phytophthora gene extracted; corn disease ratings extract correctly (e.g. NK8005 Tar Spot 2 = excellent).
- list_versions reports 4 sources / 2 vendors / 6 brands / 3 crops with correct counts.
- crop_seed_api_lessons(topic='rating-scales') returns the corrected guidance with NK + AgriPro in the reversed-scale bucket.

agripro (24 varieties)
- Drupal Views form scrape via /search-agripro-brand-varieties with explicit GET params (sidesteps the AJAX-only-on-load default that returns an empty form skeleton).
- Per-variety parse: <h1>, .field--node--variety-type--variety, .field--node--tag-line--variety, .field--node--body, plus the three rated sections (Agronomics / Grain / Disease) with their <div class="row"><div class="label">label</div><div>value</div></div> pairs.
- Wheat-class distribution: 12 HRS, 7 SWW, 3 HRW, 1 HWS, 1 Barley; provides the Northern Plains HRS coverage WestBred lacks.

nk (122 varieties; recon's "29" was outdated, the current NK seed finder lists 41 corn + 81 soy)
- ASP.NET WebForms endpoint: POST /NKSeeds/{Corn,Soy}ProductFinder.aspx/GetProducts returns {"d": "<html>"} where the inner HTML is one <div class="sf-result"> per variety. BeautifulSoup parses the whole blob.
- Per-card: product code (NK8005, NK008-P8XF), RM/MG from the title <span>, "Brands Available" trait variants, marketing positioning + bullet strengths, tech-sheet PDF URL.
- pdfplumber text extraction on the tech-sheet PDFs adds:
  * corn disease ratings (Gray Leaf Spot, NCLB, Goss's Wilt, Anthracnose, Tar Spot, Fusarium, etc.) where the PDF prints "Label N" lines (text-extractable)
  * soybean Phytophthora source genes (Rps1c, Rps3a, ...)
  * soybean SCN race coverage
  * soybean agronomic ratings (Emergence, Standability, Shatter Tolerance, Green Stem) with text-extractable 1-9 values
  * soybean soil-type adaptation (Best/Good/Fair/Poor) for drought prone / high pH / poorly drained / etc.
- Agronomic rating BARS for corn (Emergence, Stalk Strength, Drought) are not text-extractable; we record the labels with an explicit "rated in PDF chart, see tech sheet" value so the agent can direct the farmer to the source for those numbers.

Scale-direction correction in lessons.md:
- NK and AgriPro both use 1 = best, lower = more resistant. This is the REVERSED convention vs Bayer / Golden Harvest. NK's tech-sheet footer literally prints "1-9 Scale: 1 = Best, 9 = Worst". AgriPro positioning on stripe-rust-resistant varieties (AP Iliad with Stripe Rust 1, Eyespot 2) confirms the same direction.
- sources-not-yet-indexed section trimmed to just Beck's PFR + Beck's products; everything else IS now in the corpus.

Cross-vendor coverage after this PR: 760 varieties.
  bayer_seeds     475 (DEKALB 288 / Asgrow 102 / WestBred 85)
  golden_harvest  139
  nk              122 (41 corn / 81 soy)
  agripro         24 (12 HRS / 7 SWW / 3 HRW / 1 HWS / 1 Barley)

Vendors: Bayer, Syngenta. Brands: 6. Crops: corn, soy, wheat (109 wheat now, up from 85).

requirements.txt: pdfplumber>=0.11 for NK tech-sheet parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
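
The "Label N" extraction and the scale-direction preamble described in the commit message could be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the sample text, the function names, the regex, and the "lower_is_better" / "higher_is_better" values for _scale_direction are all assumptions. In the real scraper the input text would come from pdfplumber (for example page.extract_text() on each tech-sheet page); the sketch starts from already-extracted text so it runs with the standard library only.

```python
import re

# Hypothetical text, shaped like what a tech-sheet page might yield after
# text extraction. Real tech sheets will differ.
SAMPLE_TEXT = """\
Gray Leaf Spot 3
Goss's Wilt 4
Tar Spot 2
Emergence
1-9 Scale: 1 = Best, 9 = Worst
"""

# Corn agronomic labels that the commit message says are drawn as bars,
# so no number is available in the extracted text.
CHART_ONLY_LABELS = ("Emergence", "Stalk Strength", "Drought")
CHART_PLACEHOLDER = "rated in PDF chart, see tech sheet"

# A label made of letters, spaces and apostrophes, then a single 1-9 digit.
RATING_LINE = re.compile(r"^(?P<label>[A-Za-z][A-Za-z' ]*?)\s+(?P<value>[1-9])$")


def parse_ratings(text):
    """Collect 'Label N' lines; chart-only labels get an explicit placeholder."""
    ratings = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = RATING_LINE.match(line)
        if match:
            ratings[match.group("label")] = int(match.group("value"))
        elif line in CHART_ONLY_LABELS:
            ratings[line] = CHART_PLACEHOLDER
    return ratings


def scale_preamble(scale_direction):
    """Turn a sidecar's _scale_direction (assumed values) into a sentence."""
    if scale_direction == "lower_is_better":
        return "Ratings use a 1-9 scale where 1 = best and 9 = worst."
    if scale_direction == "higher_is_better":
        return "Ratings use a scale where higher numbers are better."
    raise ValueError(f"unknown scale direction: {scale_direction!r}")


def render_chunk(sidecar, ratings):
    """Prefix the ratings with the preamble so the numbers are never shown bare."""
    lines = [scale_preamble(sidecar["_scale_direction"])]
    lines += [f"{label}: {value}" for label, value in ratings.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    sidecar = {"_scale_direction": "lower_is_better"}  # hypothetical sidecar
    print(render_chunk(sidecar, parse_ratings(SAMPLE_TEXT)))
```

Raising on an unknown direction rather than guessing a default keeps a missing or misspelled sidecar value from silently producing a preamble that points the wrong way, which is the failure the scale-direction correction is meant to prevent.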