trial data: workflow scrape steps + lessons.md trial-data guide #8

Merged
justin merged 1 commit from workflow-and-lessons-for-trials into main 2026-05-25 15:22:26 -04:00

Add scrape steps for the new trial sources (agripro_trials, gh_plot_reports) in the monthly refresh workflow, and a 'trial-data' lesson in lessons.md telling the agent how to route between search_docs (identity) and search_trials (performance), what's indexed vs deferred, and how to read a GH plot report.

Companion to PR #7. Light follow-up — no scraper changes.

justin added 1 commit 2026-05-25 15:22:25 -04:00
.gitea/workflows/refresh.yml — add scrape steps for the new trial
sources (agripro_trials, gh_plot_reports) so the monthly cron
refreshes them alongside the variety sources. gh_plot_reports
is the heaviest single source (~4,600 docs @ 1 req/sec ≈ 77 min),
so it runs late: a failure in an earlier step then aborts the job
before that time is spent. The commit message's variables were
expanded to include the trial counts.
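
For reference, the runtime figure and the "heaviest runs late" ordering can be sketched as below. This is only an illustration: the function names are hypothetical and not part of refresh.yml, and the estimate is a lower bound that ignores per-request latency and retries.

```python
def estimated_minutes(doc_count: int, requests_per_second: float = 1.0) -> float:
    """Lower-bound wall-clock minutes for a rate-limited scrape (one request per doc)."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return doc_count / requests_per_second / 60.0


def order_steps(doc_counts: dict[str, int]) -> list[str]:
    """Order scrape steps cheapest first, so the heaviest source runs last."""
    return sorted(doc_counts, key=lambda name: doc_counts[name])


# ~4,600 docs at 1 req/sec, the gh_plot_reports figure quoted above.
print(round(estimated_minutes(4600), 1))  # 76.7
```

Running cheap steps first means a broken scraper or expired credential fails the job in seconds rather than after the long gh_plot_reports pass.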

docs_mcp/lessons.md — new "trial-data" section telling the agent:

- The two surfaces (search_docs = identity, search_trials = perf)
  are complementary; how to route a farmer question to each.
- What's indexed (GH plot reports cross-vendor, AgriPro regional
  PDFs) vs what's not (Bayer per-variety trials, NK yield results,
  Pioneer, university extension trials).
- Recommended workflow: search_trials → identify top performers →
  lookup_variety on each to verify identity → don't fabricate.
- How to read a GH plot report (per-column headers vary by crop:
  corn/soy use Yield/MST/Test Weight, silage uses Ton/Acre +
  Milk + Beef columns).
- Single-data-point caveat: one plot is one cooperator's field;
  look across multiple plots for a robust recommendation.
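
A rough sketch of the last two points, assuming plot-report rows have already been parsed into dicts. The row keys (`variety`, `plot_id`), the mapping names, and the `min_plots` threshold are all hypothetical; only the column labels come from the lesson, and the exact header spellings in real reports may differ.

```python
from statistics import mean

# Column labels per crop as described in the lesson (exact headers are an assumption).
METRIC_COLUMNS = {
    "corn": ["Yield", "MST", "Test Weight"],
    "soy": ["Yield", "MST", "Test Weight"],
    "silage": ["Ton/Acre", "Milk", "Beef"],
}

# Hypothetical choice of the headline metric used to rank varieties for each crop.
PRIMARY_METRIC = {crop: columns[0] for crop, columns in METRIC_COLUMNS.items()}


def summarize(rows: list[dict], crop: str, min_plots: int = 2) -> dict[str, dict]:
    """Average the crop's primary metric per variety across distinct plots.

    A variety seen in fewer than `min_plots` plots is flagged as not robust,
    reflecting the single-data-point caveat.
    """
    metric = PRIMARY_METRIC[crop]
    by_variety: dict[str, dict[str, float]] = {}
    for row in rows:
        if metric not in row:
            continue  # row comes from a report for a different crop layout
        # Keyed by plot_id, so a repeated entry for the same plot overwrites.
        by_variety.setdefault(row["variety"], {})[row["plot_id"]] = float(row[metric])
    return {
        variety: {
            "mean": mean(plots.values()),
            "plots": len(plots),
            "robust": len(plots) >= min_plots,
        }
        for variety, plots in by_variety.items()
    }
```

A variety that tops a single plot would come back with `robust` set to `False`, which is the cue to look for more plots before recommending it.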

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
justin merged commit cfa27d0bca into main 2026-05-25 15:22:26 -04:00
Reference: justin/seed-mcp#8