seed-mcp

Author	SHA1	Message	Date
claude	54094a0d43	Add university-extension variety trials: Illinois VT + Iowa ICPT + Ohio OCPT (+123 trial docs)
Independent third-party performance data: land-grant programs that test every entered brand side-by-side with replication + LSD stats. This is the legitimate way to get Pioneer / DEKALB / Brevant / Channel performance the corpus can't scrape directly (data_type=trial, results[] shape; falls through the trial chunker).
- illinois_vt_trials (30 docs, 1,392 rows): U of Illinois VT. Per-region XLSX (openpyxl), corn + soy + WHEAT, 2024+2025. Rich per-site agronomic metadata; corn-following-corn vs -soybean kept distinct.
- iowa_icpt_trials (24 docs, 674 rows): Iowa State ICPT. ASP.NET GridView (viewstate postback for year/district), corn + soy by district x season.
- ohio_ocpt_trials (69 docs, 4,647 rows): OSU/CFAES OCPT. Report PDF (pdfplumber; per-site column groups split by header Yield-token count + x-coord footnote bucketing), corn + soy per site, 2024+2025.
91 distinct seed brands across the three; majors confirmed present in the independent rankings: DEKALB 395, Golden Harvest 249, Channel 241, NK 212, Xitavo 135, LG 103, Pioneer 88, Asgrow 59. (A brand only appears where it ENTERED a given program, e.g. Brevant not in Iowa, DEKALB/Channel not in Illinois: true negatives, not parse gaps.)
- rag/chunk.py: gated `include_region` on _render_gh_plot_chunk; the 3 university sources route through it so the region/district is in the embedded chunk + labeled "variety trial (cross-vendor, independent third-party)". Existing plot sources (gh/lg/agrigold/proharvest) unchanged.
- requirements.txt: openpyxl (Illinois XLSX; scrape-time only).
- sources.json + README/CLAUDE/lessons: registered + attributed; lessons trial-data + Pioneer entries updated (Pioneer/DEKALB performance now available indirectly via these trials).
Validation: all 123 chunk via rag.chunk.chunks_from_trial (0 errors), 0 out-of-range yields, 0 dup keys.
Public land-grant data; attribution recorded in each tos_note. CI rebuilds the index from the committed corpus.	2026-06-10 08:35:50 -04:00
claude	0bac06b7b6	Add RobSeeCo (Rob-See-Co + Innotech): 130 corn/soy varieties from the seed-guide PDF (#18) Co-authored-by: claude <claude@jpaul.io> Co-committed-by: claude <claude@jpaul.io>	2026-06-09 23:29:38 -04:00
claude	84ad2b1de6	Add 4 independent seed brands: Latham + Stine + 1st Choice + Burrus (+623 varieties) (#17) Co-authored-by: claude <claude@jpaul.io> Co-committed-by: claude <claude@jpaul.io>	2026-06-04 21:58:07 -04:00
claude	22e8092faf	Add ProHarvest Seeds: 119 varieties + 161 cross-vendor plot reports (#16) Co-authored-by: claude <claude@jpaul.io> Co-committed-by: claude <claude@jpaul.io>	2026-06-04 21:05:30 -04:00
justin	84b49d8360	trial data: workflow scrape steps + lessons.md trial-data guide
.gitea/workflows/refresh.yml: add scrape steps for the new trial sources (agripro_trials, gh_plot_reports) so the monthly cron refreshes them alongside the variety sources. gh_plot_reports is the heaviest single source (~4,600 docs @ 1 req/sec ≈ 70 min); it runs late so that a failure in an earlier step surfaces before that time is spent. Commit-message variable count expanded to surface the trial counts.
docs_mcp/lessons.md: new "trial-data" section telling the agent:
- The two surfaces (search_docs = identity, search_trials = perf) are complementary; how to route a farmer question to each.
- What's indexed (GH plot reports cross-vendor, AgriPro regional PDFs) vs what's not (Bayer per-variety trials, NK yield results, Pioneer, university extension trials).
- Recommended workflow: search_trials → identify top performers → lookup_variety on each to verify identity → don't fabricate.
- How to read a GH plot report (per-column headers vary by crop: corn/soy use Yield/MST/Test Weight, silage uses Ton/Acre + Milk + Beef columns).
- Single-data-point caveat: one plot is one cooperator's field; look across multiple plots for a robust recommendation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 15:22:08 -04:00
justin	9ce920f622	agripro + nk scrapers: 146 Syngenta varieties added (wheat + corn/soy)
agripro (24 varieties)
- Drupal Views form scrape via /search-agripro-brand-varieties with explicit GET params (sidesteps the AJAX-only-on-load default that returns an empty form skeleton).
- Per-variety parse: <h1>, .field--node--variety-type--variety, .field--node--tag-line--variety, .field--node--body, plus the three rated sections (Agronomics / Grain / Disease) with their <div class="row"><div class="label">label</div><div>value</div> pairs.
- Wheat-class distribution: 12 HRS, 7 SWW, 3 HRW, 1 HWS, 1 Barley. Provides the Northern Plains HRS coverage WestBred lacks.
nk (122 varieties; recon's "29" was outdated: the current NK seed finder lists 41 corn + 81 soy)
- ASP.NET WebForms endpoint: POST /NKSeeds/{Corn,Soy}ProductFinder.aspx/GetProducts returns {"d": "<html>"} where the inner HTML is one <div class="sf-result"> per variety. BeautifulSoup tokenizes the whole blob.
- Per-card: product code (NK8005, NK008-P8XF), RM/MG from the title <span>, "Brands Available" trait variants, marketing positioning + bullet strengths, tech-sheet PDF URL.
- pdfplumber text extraction on the tech-sheet PDFs adds:
  * corn disease ratings (Gray Leaf Spot, NCLB, Goss's Wilt, Anthracnose, Tar Spot, Fusarium, etc.) where the PDF prints "Label N" lines (text-extractable)
  * soybean Phytophthora source genes (Rps1c, Rps3a, ...)
  * soybean SCN race coverage
  * soybean agronomic ratings (Emergence, Standability, Shatter Tolerance, Green Stem) with text-extractable 1-9 values
  * soybean soil-type adaptation (Best/Good/Fair/Poor) for drought prone / high pH / poorly drained / etc.
- Agronomic rating BARS for corn (Emergence, Stalk Strength, Drought) are not text-extractable; we record the labels with an explicit "rated in PDF chart, see tech sheet" value so the agent can direct the farmer at the source for those numbers.
Scale-direction correction in lessons.md:
- NK and AgriPro both use 1 = best, lower = more resistant, the REVERSED convention vs Bayer / Golden Harvest. NK's tech-sheet footer literally prints "1-9 Scale: 1 = Best, 9 = Worst". AgriPro positioning on stripe-rust-resistant varieties (AP Iliad with Stripe Rust 1, Eyespot 2) confirms the same direction.
- sources-not-yet-indexed section trimmed to just Beck's PFR + Beck's products; everything else IS now in the corpus.
Cross-vendor coverage after this PR: 760 varieties.
  bayer_seeds 475 (DEKALB 288 / Asgrow 102 / WestBred 85)
  golden_harvest 139
  nk 122 (41 corn / 81 soy)
  agripro 24 (12 HRS / 7 SWW / 3 HRW / 1 HWS / 1 Barley)
Vendors: Bayer, Syngenta. Brands: 6. Crops: corn, soy, wheat (109 wheat now, up from 85).
requirements.txt: pdfplumber>=0.11 for NK tech-sheet parsing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 14:16:36 -04:00
justin	4009dc0b78	Phase 11: crop_seed_api_lessons tool + Pioneer fallback
Add the fifth MCP tool, crop_seed_api_lessons(topic?), backed by docs_mcp/lessons.md, the ONLY source of opinionated content in the server. Everything else (search_docs, get_page, lookup_variety) returns verbatim from vendor catalogs; lessons.md fills the gaps the corpus can't cover.
The Pioneer fallback is the critical anti-hallucination piece: Pioneer's ToS bans automation, so the corpus has no Pioneer data. Without this tool, an agent might surface Bayer/Asgrow chunks as mediocre matches for a Pioneer query. The tool's docstring tells the agent to call it on any Pioneer / P-series question; the 'pioneer' section says clearly: "I don't have Pioneer's variety data indexed... please consult Pioneer or an extension service." "Do NOT invent Pioneer hybrid ratings."
Other lesson sections cover knowledge the agent needs to interpret search_docs / get_page output correctly:
- rating-scales: Bayer 1-9, Golden Harvest 9-to-1, what R/MR/S/Rps1c/R3 mean in soybean disease columns
- maturity-semantics: corn RM days vs soybean MG vs wheat class + qualitative early/medium/late
- trait-glossary: SSRIB, VT2PRIB, XF, E3, Conkesta, Clearfield, etc.
- scn-resistance: race coverage + Peking vs PI 88788 source
- regional-listings: how to interpret Bayer's "local profiles"
- sources-not-yet-indexed: which vendors aren't in the corpus yet
- checking-your-work: always call lookup_variety before quoting
Lesson lookup prefers slug-match (returns just `rating-scales` for topic="rating", not every section that mentions ratings); falls back to body-match only when no slug matches. Smoke-tested with topic=pioneer, topic=rating, topic=trait, topic=zzzzzz (no match), and topic=None (full index = 10K chars, 8 sections).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 13:18:57 -04:00

7 Commits