Merge pull request 'trial data: workflow scrape steps + lessons.md trial-data guide' (#8) from workflow-and-lessons-for-trials into main
Image rebuild (skip scrape) / build (push) Failing after 10s
Image rebuild (skip scrape) / build (push) Failing after 10s
This commit was merged in pull request #8.
This commit is contained in:
@@ -83,10 +83,21 @@ jobs:
|
||||
if: ${{ inputs.sources == '' || contains(inputs.sources, 'agripro') }}
|
||||
run: python -m scrape.runner --source agripro --force
|
||||
|
||||
- name: Scrape AgriPro regional trial PDFs
|
||||
if: ${{ inputs.sources == '' || contains(inputs.sources, 'agripro_trials') }}
|
||||
run: python -m scrape.runner --source agripro_trials --force
|
||||
|
||||
- name: Scrape Golden Harvest plot reports (cross-vendor yield trials)
|
||||
if: ${{ inputs.sources == '' || contains(inputs.sources, 'gh_plot_reports') }}
|
||||
# Heaviest single source — ~4,600 docs at 1 req/sec ≈ 70 min.
|
||||
# Wraps the bulk of CI time; runs late so an earlier failure
|
||||
# doesn't waste 70 min before failing.
|
||||
run: python -m scrape.runner --source gh_plot_reports --force
|
||||
|
||||
- name: Scrape Beck's PFR research corpus
|
||||
if: ${{ inputs.sources == '' || contains(inputs.sources, 'becks_pfr') }}
|
||||
# Heaviest source — ~2,089 docs via public Sanity GROQ.
|
||||
# No auth, but rate-limit ourselves to be polite.
|
||||
# Deferred (returns 0 cleanly from a stub) — implementation
|
||||
# pending. Public Sanity GROQ at mc8v24rf.api.sanity.io.
|
||||
run: python -m scrape.runner --source becks_pfr --force
|
||||
|
||||
# ---- Commit corpus changes + retry-on-race -----------------
|
||||
@@ -107,8 +118,10 @@ jobs:
|
||||
n_gh=$(find corpus/golden_harvest -name '*.json' 2>/dev/null | wc -l)
|
||||
n_nk=$(find corpus/nk -name '*.json' 2>/dev/null | wc -l)
|
||||
n_ag=$(find corpus/agripro -name '*.json' 2>/dev/null | wc -l)
|
||||
n_agt=$(find corpus/agripro_trials -name '*.json' 2>/dev/null | wc -l)
|
||||
n_ghpr=$(find corpus/gh_plot_reports -name '*.json' 2>/dev/null | wc -l)
|
||||
n_pfr=$(find corpus/becks_pfr -name '*.json' 2>/dev/null | wc -l)
|
||||
git commit -m "monthly refresh: ${ts} — bayer=${n_bayer} gh=${n_gh} nk=${n_nk} agripro=${n_ag} pfr=${n_pfr}"
|
||||
git commit -m "monthly refresh: ${ts} — bayer=${n_bayer} gh=${n_gh} nk=${n_nk} agripro=${n_ag} ag_trials=${n_agt} gh_plot_reports=${n_ghpr} pfr=${n_pfr}"
|
||||
attempt=1
|
||||
while [ $attempt -le 3 ]; do
|
||||
if git push; then
|
||||
|
||||
@@ -252,6 +252,79 @@ public catalog or their seed dealer.
|
||||
|
||||
---
|
||||
|
||||
## trial-data
|
||||
|
||||
The MCP exposes TWO complementary surfaces:
|
||||
|
||||
* **`search_docs`** — variety IDENTITY (what a hybrid IS):
|
||||
disease ratings, trait stack, maturity, vendor positioning.
|
||||
* **`search_trials`** — variety PERFORMANCE (how it ACTUALLY did):
|
||||
ranked yield at specific cooperator fields and regions.
|
||||
|
||||
**Indexed trial sources**:
|
||||
|
||||
- **Golden Harvest plot reports** (~4,600 cross-vendor head-to-head
|
||||
trials, 2024+2025). Each trial = one cooperator's field at a
|
||||
specific state/year, comparing products from multiple brands
|
||||
(NK / DEKALB / Golden Harvest / Enogen / Pioneer / Channel, etc.)
|
||||
side by side. **This is the closest thing to independent
|
||||
comparison data the corpus has** — Bayer doesn't publish its own
|
||||
trial data, but GH publishes plots where DEKALB hybrids appear
|
||||
alongside their competitors.
|
||||
- **AgriPro regional trial PDFs** (14 PDFs) — multi-year
|
||||
multi-location wheat performance for Northern Plains / Pacific
|
||||
Northwest / Plains regions. Variety + per-location yields
|
||||
preserved verbatim.
|
||||
|
||||
**Recommended workflow when a farmer asks about performance**:
|
||||
|
||||
1. Call `search_trials(crop, state, year, ...)` to find trials
|
||||
from the relevant region/season.
|
||||
2. Identify the top performers in the rankings.
|
||||
3. Call `lookup_variety(source_key=...)` for each leading hybrid to
|
||||
verify identity (RM, traits, disease ratings) — confirm the
|
||||
variety actually fits the farmer's situation, not just that it
|
||||
won someone else's trial.
|
||||
4. If the leading variety is from a brand whose trial data isn't
|
||||
directly published (e.g. DEKALB), the GH plot reports often
|
||||
show it competing — that's still the agent's best public
|
||||
third-party signal.
|
||||
|
||||
**Trial data NOT in the corpus** (don't fabricate):
|
||||
|
||||
- **DEKALB / Asgrow / Channel** per-variety yield trials —
|
||||
Bayer keeps these in rep tools, not on the public catalog. The
|
||||
GH plot reports surface DEKALB/Asgrow performance indirectly,
|
||||
but per-variety dedicated trials aren't indexed.
|
||||
- **NK yield results** — the data exists at
|
||||
`syngenta-us.com/nk/yield-results` but the ASMX endpoint is
|
||||
fiddly; not yet scraped. The variety identity is in the corpus
|
||||
(`search_docs` finds it), just not the per-region trial yields.
|
||||
- **Pioneer trials** — ToS bans automation, so we have neither
|
||||
variety identity nor trial data. Direct the farmer to a
|
||||
Pioneer dealer.
|
||||
- **University extension trials** (Iowa State, Illinois,
|
||||
Purdue, etc.) — third-party trial data that publishes Pioneer
|
||||
+ competitors. Not in the corpus today; could be added in a
|
||||
future enrichment.
|
||||
|
||||
**Reading a GH plot report**:
|
||||
|
||||
Each plot has a cooperator name (the farmer running the trial), a
|
||||
state, a year, planting/harvest dates, population, row width, and a
|
||||
ranked table of products. The columns vary by crop:
|
||||
|
||||
- **Corn / Soy**: Rank | Brand | Product | Traits | Yield BU/Ac
|
||||
| %MST | Test Weight | Gross Revenue
|
||||
- **Silage**: Rank | Brand | Product | Traits | Ton/Acre
|
||||
| Milk Per Acre | Milk Per Ton | Beef Per Acre | Beef Per Ton
|
||||
|
||||
Rank 1 = top performer at that site/year. Note that a single plot
|
||||
is one data point — for a robust recommendation, look across
|
||||
multiple plots from the same region.
|
||||
|
||||
---
|
||||
|
||||
## checking-your-work
|
||||
|
||||
Before quoting a specific number to a farmer, **always** call
|
||||
|
||||
Reference in New Issue
Block a user