54094a0d43
Independent third-party performance data: land-grant programs that test every entered brand side-by-side with replication + LSD stats. This is the legitimate way to get Pioneer / DEKALB / Brevant / Channel performance the corpus can't scrape directly (data_type=trial, results[] shape; falls through the trial chunker).

- illinois_vt_trials (30 docs, 1,392 rows): U of Illinois VT. Per-region XLSX (openpyxl), corn + soy + WHEAT, 2024+2025. Rich per-site agronomic metadata; corn-following-corn vs -soybean kept distinct.
- iowa_icpt_trials (24 docs, 674 rows): Iowa State ICPT. ASP.NET GridView (viewstate postback for year/district), corn + soy by district x season.
- ohio_ocpt_trials (69 docs, 4,647 rows): OSU/CFAES OCPT. Report PDF (pdfplumber; per-site column groups split by header Yield-token count + x-coord footnote bucketing), corn + soy per site, 2024+2025.

91 distinct seed brands across the three; majors confirmed present in the independent rankings: DEKALB 395, Golden Harvest 249, Channel 241, NK 212, Xitavo 135, LG 103, Pioneer 88, Asgrow 59. (A brand only appears where it ENTERED a given program, e.g. Brevant not in Iowa, DEKALB/Channel not in Illinois. These are true negatives, not parse gaps.)

- rag/chunk.py: gated `include_region` on _render_gh_plot_chunk; the 3 university sources route through it so the region/district is in the embedded chunk + labeled "variety trial (cross-vendor, independent third-party)". Existing plot sources (gh/lg/agrigold/proharvest) unchanged.
- requirements.txt: openpyxl (Illinois XLSX; scrape-time only).
- sources.json + README/CLAUDE/lessons: registered + attributed; lessons trial-data + Pioneer entries updated (Pioneer/DEKALB performance now available indirectly via these trials).

Validation: all 123 docs chunk via rag.chunk.chunks_from_trial (0 errors), 0 out-of-range yields, 0 dup keys. Public land-grant data; attribution recorded in each tos_note. CI rebuilds the index from the committed corpus.
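The `include_region` gate on the plot-chunk renderer might look roughly like the sketch below. This is an illustration only, not the code in rag/chunk.py: the function name `render_plot_chunk`, the document fields (`source`, `crop`, `year`, `region`, `results`, `brand`, `hybrid`, `yield_bu_ac`), and the sample values are all hypothetical. Only the flag name and the label string come from the commit message.

```python
from typing import Any

# Label text quoted from the commit message.
THIRD_PARTY_LABEL = "variety trial (cross-vendor, independent third-party)"


def render_plot_chunk(doc: dict[str, Any], include_region: bool = False) -> str:
    """Render one trial document as embeddable text.

    Hypothetical stand-in for _render_gh_plot_chunk; field names are assumed.
    """
    header = [
        f"Source: {doc['source']}",
        f"Crop: {doc['crop']} ({doc['year']})",
    ]
    if include_region:
        # Only sources that opt in get the label and the region/district line,
        # so output for existing plot sources stays byte-for-byte the same.
        header.append(f"Type: {THIRD_PARTY_LABEL}")
        header.append(f"Region: {doc.get('region', 'unknown')}")
    rows = [
        f"{r['brand']} {r['hybrid']}: {r['yield_bu_ac']} bu/ac"
        for r in doc["results"]
    ]
    return "\n".join(header + rows)


# Made-up sample document; values are placeholders, not trial data.
sample_doc = {
    "source": "example_university_trials",
    "crop": "corn",
    "year": 2025,
    "region": "Example District 1",
    "results": [
        {"brand": "ExampleBrand", "hybrid": "X100", "yield_bu_ac": 200.0},
    ],
}

gated_text = render_plot_chunk(sample_doc, include_region=True)
plain_text = render_plot_chunk(sample_doc)
```

Defaulting the flag to off is what keeps the existing plot sources unchanged: only callers that pass `include_region=True` see the extra header lines.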
22 lines
470 B
Plaintext
# MCP server
mcp[fastmcp]>=1.0.0
pydantic>=2.0
httpx>=0.27

# Vector store + embeddings
chromadb>=0.5.0
ollama>=0.4.0 # if using Ollama-hosted embedder; swap if not

# Scraping (Phase 1; adjust per product)
beautifulsoup4>=4.12
requests>=2.31
# playwright>=1.40 # uncomment if you need headless browser fallback

# Evaluation
numpy>=1.26

# Dev / utility
python-dateutil>=2.8
pdfplumber>=0.11
openpyxl>=3.1 # Illinois VT trial XLSX parsing (scrape-time only)