d65c7d0d67
The scaffold-era README was out of sync with the shipped product:
- Vendor counts stale (recon estimates, not actual deployed counts)
- Trial data sources (gh_plot_reports + agripro_trials) entirely unmentioned
- Tool list listed `corpus_status` (doesn't exist) and missed both `lookup_variety` and `search_trials`
- Build-phase table showed everything as "pending" / "next" but Phases 1-8 + 11 all shipped

Rewrite to reflect the deployed state:
- Corpus inventory: 760 variety records + 4,313 trial documents = 5,073 chunks across 6 sources
- All 6 MCP tools documented with their purpose
- Eval baseline table (hybrid+rerank wins 100%, P@1 90%, MRR 0.905) with the surprising findings (dense alone is noise; hybrid w/o rerank is WORSE than BM25 alone)
- Deploy mechanics: Watchtower chain, 4-GPU embedder pool, shared llama-rerank sidecar with the network-attach gotcha
- Status table: ✅ on the phases that shipped, deferred work list (becks_pfr, 2023 plot backfill, NK trials, Channel Seed brand)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# seed-mcp

MCP server over the public catalogs of major US row-crop seed vendors: **variety identity** (what each hybrid IS) plus **yield-trial data** (how they actually perform in real cooperator fields). Sibling project to [`crop-chem-docs`](https://git.jpaul.io/justin/crop-chem-docs) (pesticide labels), feeding the same Drawbar farm-advisor AI.

**Deployed 2026-05-25** on trashpanda as a sibling sidecar to `chem-mcp`; the Drawbar advisor calls it via the `seed:` prefix.

## What's in the corpus

**5,073 indexed chunks** across two complementary surfaces:

### Variety identity: 760 records

| Source | Count | Vendor | Brand |
|---|---|---|---|
| `bayer_seeds` | 475 | Bayer | DEKALB (corn) / Asgrow (soy) / WestBred (wheat) |
| `golden_harvest` | 139 | Syngenta | Golden Harvest (corn / soy) |
| `nk` | 122 | Syngenta | NK (corn / soy) |
| `agripro` | 24 | Syngenta | AgriPro (wheat: HRW / HRS / HWS / SWW) |

### Yield-trial data: 4,313 documents

| Source | Count | Notes |
|---|---|---|
| `gh_plot_reports` | 4,299 | Golden Harvest plot reports for 2024 and 2025. **Cross-vendor head-to-head**: DEKALB / NK / GH / Pioneer / Channel all appear in the same trial rankings. The closest thing to independent comparison data the corpus has. |
| `agripro_trials` | 14 | Regional wheat trial PDF summaries (PNW, Western Plains, Northern Plains, etc.) |

### Not in the corpus (documented in `docs_mcp/lessons.md`)

- **Pioneer / Corteva**: ToS bans automation. A curated fallback lesson points the farmer at pioneer.com or a local dealer.
- **NK yield-results**: served from a fiddly ASMX/SOAP endpoint that needs a dedicated reverse-engineering session.
- **Bayer per-variety trial data**: not publicly indexed (DEKALB / Asgrow trial data flows through Channel reps). Partially covered by the cross-vendor results in the GH plot reports.

## MCP tools (6)

| Tool | Purpose |
|---|---|
| `search_docs` | Variety IDENTITY: what a hybrid IS (disease ratings, traits, maturity). Hybrid dense+BM25 + cross-encoder rerank + variety-code prefilter. |
| `search_trials` | Variety PERFORMANCE: head-to-head yield trial results. Filterable by crop, state, year, product. |
| `get_page` | Full canonical record for one variety + structured ratings header sourced from the sidecar JSON. |
| `lookup_variety` | Raw sidecar JSON for one variety. This is the **fact-check tool**; call it before quoting any specific rating value. |
| `list_versions` | Discover facets (sources, vendors, brands, crops) currently indexed. |
| `crop_seed_api_lessons` | Curated knowledge: Pioneer fallback policy, scale-direction differences across vendors, trait glossary, SCN race coverage notes. |

`search_docs` defaults to `data_type="variety"`; `search_trials` uses `data_type="trial"`. Both query a single Chroma collection and are separated only by metadata filter.
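
As a rough illustration of the single-collection design, the two search tools could differ only in the metadata filter they hand to Chroma. The helper name `build_where` and every metadata key other than `data_type` (here `crop`, `state`, `year`) are assumptions made for this sketch, not the server's actual code.

```python
from typing import Any, Optional


def build_where(data_type: str, **facets: Optional[Any]) -> dict:
    """Build a Chroma-style `where` filter (hypothetical helper).

    `data_type` is the only key this README confirms; the other facet
    names are assumed for illustration.
    """
    clauses = [{"data_type": data_type}]
    # Skip facets the caller left unset.
    clauses += [{key: value} for key, value in facets.items() if value is not None]
    # Chroma takes a bare clause for one condition and `$and` for several.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


variety_filter = build_where("variety")
trial_filter = build_where("trial", crop="corn", state="IA", year=2025)
print(variety_filter)
print(trial_filter)
```

Keeping both surfaces in one collection means one index build and one embedding pass; the cost is that every query must carry the `data_type` clause.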

## Retrieval: eval-validated

From `eval/results/baseline.md` (21 golden queries, k=5):

| Retriever | Pass | Recall | P@1 | MRR | Avg latency (ms) |
|---|---|---|---|---|---|
| **hybrid+rerank** | **21/21** | **100%** | **90%** | **0.905** | 2064 |
| bm25 | 20/21 | 95% | 81% | 0.833 | 5 |
| hybrid (no rerank) | 15/21 | 71% | 62% | 0.619 | 73 |
| dense | 14/21 | 67% | 38% | 0.440 | 79 |
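
For readers unfamiliar with the column headings, the sketch below computes Recall@k, P@1, and MRR from ranked results. It assumes exactly one relevant document id per query and is not the code in `eval/run_eval.py`; the function name, variable names, and toy data are illustrative.

```python
from typing import Dict, List


def score(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int = 5) -> Dict[str, float]:
    """Recall@k, P@1 and MRR, assuming exactly one gold id per query."""
    hits = 0
    first = 0
    rr_sum = 0.0
    for qid, gold_id in gold.items():
        top_k = ranked.get(qid, [])[:k]
        if gold_id in top_k:
            rank = top_k.index(gold_id) + 1  # 1-based rank of the gold doc
            hits += 1
            first += int(rank == 1)
            rr_sum += 1.0 / rank
    n = len(gold)
    return {"recall": hits / n, "p_at_1": first / n, "mrr": rr_sum / n}


# Toy data (made up): q1 hits at rank 1, q2 at rank 2, q3 misses.
ranked = {"q1": ["a", "b"], "q2": ["x", "b"], "q3": ["x", "y"]}
gold = {"q1": "a", "q2": "b", "q3": "c"}
print(score(ranked, gold))
```

In this toy run, recall is 2/3, P@1 is 1/3, and MRR is (1 + 1/2 + 0) / 3 = 0.5.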

**Deploy config**: `HYBRID_SEARCH=true` + `RERANK_URL=http://llama-rerank:8080`.
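
One way to picture how these two settings might select a retrieval path is shown below. The function `pick_retriever`, the mode strings, and the fallback choice are assumptions; only the two environment variable names come from this README.

```python
from typing import Mapping


def pick_retriever(env: Mapping[str, str]) -> str:
    """Map environment settings to a retriever mode (hypothetical logic)."""
    hybrid = env.get("HYBRID_SEARCH", "").strip().lower() == "true"
    rerank_url = env.get("RERANK_URL", "").strip()
    if hybrid and rerank_url:
        return "hybrid+rerank"
    # In the eval table, hybrid without rerank scored below BM25, so this
    # sketch falls back to BM25 rather than to un-reranked hybrid.
    return "bm25"


print(pick_retriever({"HYBRID_SEARCH": "true", "RERANK_URL": "http://llama-rerank:8080"}))
print(pick_retriever({"HYBRID_SEARCH": "true"}))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the decision easy to test.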

Some surprises worth knowing:

1. **Dense embedding alone is the weakest config.** Variety codes (DKC62-08RIB), gene names (Rps3a), and trait codes (XF) have no semantic neighbors, so nomic-embed-text returns noise on them.
2. **Hybrid alone is WORSE than BM25 alone.** RRF dilutes BM25's strong ranking with dense noise. Don't ship without rerank.
3. **BM25 alone (95% recall, 5 ms) is an excellent fallback** when the rerank sidecar is unavailable. The variety-code prefilter in `search_docs` does the heavy lifting.
4. **Anti-hallucination queries pass on every retriever.** The Pioneer fallback and the not-in-corpus product checks hold across all configs.
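
To see why fusion without rerank can dilute a strong lexical ranking (item 2), here is a minimal Reciprocal Rank Fusion sketch. The constant `k=60` is the conventional default from the RRF literature, not a value confirmed by this repo, and the document ids are made up.

```python
from collections import defaultdict
from typing import Dict, List


def rrf(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over the lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; ties are broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25 = ["target", "near_miss", "other"]  # BM25 ranks the right doc first
dense = ["noise", "near_miss", "other"]  # dense misses the target entirely
print(rrf([bm25, dense]))  # ['near_miss', 'other', 'noise', 'target']
```

Documents that appear in both lists collect two contributions, so a correct BM25 top hit that the dense retriever never returns drops from rank 1 to rank 4 here. A cross-encoder rerank over the fused candidates is what restores it.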

## Quick start

```bash
git clone https://git.jpaul.io/justin/seed-mcp.git
cd seed-mcp
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Sample-scrape just to verify wiring:
python -m scrape.runner --source bayer_seeds --limit 3

# Full refresh (all 6 sources; expect ~25 min for gh_plot_reports
# with 4 concurrent workers):
python -m scrape.runner --all --force

# Rebuild Chroma + BM25 from the corpus:
OLLAMA_URL=http://192.168.0.125:11434 PRODUCT_NAME=crop_seed \
  python -m rag.index --rebuild

# Run the eval harness:
RERANK_URL=http://localhost:18080 python -m eval.run_eval \
  --queries eval/queries.jsonl --k 5 \
  --output eval/results/baseline.md

# Local MCP server (stdio for Claude Desktop dev):
PRODUCT_NAME=crop_seed python -m docs_mcp.server --transport stdio

# Local HTTP server (matches production transport):
PRODUCT_NAME=crop_seed python -m docs_mcp.server \
  --transport streamable-http --port 8000
```

## Repo layout

```
.
├── CLAUDE.md                      # Canonical agent guide. Read first.
├── PLAN.md                        # Template's 13-phase build guide.
├── README.md
├── requirements.txt
├── Dockerfile
├── sources.json                   # Source catalog (one entry per scraper)
├── deploy/docker-compose.yml      # Drop-in compose snippet for Drawbar
├── .gitea/workflows/
│   ├── refresh.yml                # Monthly cron: scrape + index + image push
│   └── image-only.yml             # On-demand code-only ship cycle
├── scrape/
│   ├── runner.py                  # `python -m scrape.runner --source <id>`
│   ├── changelog.py               # Reused from template
│   └── sources/
│       ├── bayer_seeds.py         # ~475 varieties across 3 brands
│       ├── golden_harvest.py      # ~139 varieties (post-discontinued filter)
│       ├── nk.py                  # 122 varieties (corn + soy)
│       ├── agripro.py             # 24 wheat varieties
│       ├── gh_plot_reports.py     # 4,299 cross-vendor yield trials
│       ├── agripro_trials.py      # 14 regional trial PDFs
│       └── becks_pfr.py           # stub: Sanity GROQ research corpus
├── rag/
│   ├── embeddings.py              # nomic-embed-text via Ollama
│   ├── chunk.py                   # one-chunk-per-variety + trial chunker
│   ├── index.py                   # Chroma + BM25 builder
│   └── bm25.py                    # FTS5 lexical index w/ seed-domain facets
├── docs_mcp/
│   ├── server.py                  # FastMCP: 6 tools, hybrid+rerank
│   ├── lessons.md                 # Curated knowledge layer (Pioneer fallback)
│   └── usage.py                   # TimedCall + JSONL telemetry
├── eval/
│   ├── queries.jsonl              # 21 golden queries
│   ├── retrievers.py              # dense / bm25 / hybrid / hybrid+rerank
│   ├── run_eval.py                # MRR / Recall@k / Precision@1
│   └── results/baseline.md        # Current deploy-config eval numbers
└── corpus/                        # Committed scrape output (CI-refreshed)
    ├── bayer_seeds/
    ├── golden_harvest/
    ├── nk/
    ├── agripro/
    ├── gh_plot_reports/
    └── agripro_trials/
```

## Infrastructure

- **Registry**: pushes to `192.168.0.2:1234` (LAN, no CF body cap); deploys pull `git.jpaul.io/justin/seed-mcp:latest` (public, CF tunnel). Also tagged `:<sha12>` for rollback pinning and `:corpus-YYYY.MM.DD` for snapshot pinning.
- **Embedder pool (CI)**: 3 GPU-pinned Ollama endpoints, weighted toward `.0.125` (RTX 40-series, 242 embeds/sec):
  - `.0.125:11434` ×4 (4090)
  - `.0.2:11436` ×2 (GPU-pinned)
  - `.0.2:11435` ×1 (GPU-pinned)
  - Do NOT use `.0.2:11434` (not GPU-pinned) or `localhost:11434` (works in dev, breaks in CI because the runner container has no Ollama on its loopback).
- **Reranker**: shared `llama-rerank` sidecar on trashpanda's Tesla P4 (jina-reranker-v2-base via llama.cpp). One container serves both seed-mcp and crop-chem-docs. **Must be on the `drawbar-backend_default` Docker network**; see `deploy/docker-compose.yml` for the network-attach gotcha that caused silent rerank degradation on chem-mcp prior to 2026-05-25.
- **PRODUCT_NAME**: `crop_seed`, used in the Chroma collection name (`crop_seed_docs`), the BM25 db filename (`bm25/crop_seed_docs.db`), and the `crop_seed_api_lessons` tool name. Not `seed_mcp`, which would conflict with the container/service name.
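
The ×4 / ×2 / ×1 weighting of the embedder pool can be sketched as a weighted round-robin over the three endpoints. The host addresses are left abbreviated exactly as in the list above, and `build_rotation` is a hypothetical helper, not the CI's actual scheduler.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def build_rotation(weights: Dict[str, int]) -> List[str]:
    """Expand {endpoint: weight} into one rotation, taking one slot per
    endpoint on each pass until every weight is used up."""
    remaining = dict(weights)
    rotation: List[str] = []
    while any(remaining.values()):
        for endpoint in weights:
            if remaining[endpoint] > 0:
                rotation.append(endpoint)
                remaining[endpoint] -= 1
    return rotation


# Weights as listed in this README; hosts abbreviated as in the source.
POOL = {".0.125:11434": 4, ".0.2:11436": 2, ".0.2:11435": 1}
rotation = build_rotation(POOL)
endpoints: Iterator[str] = cycle(rotation)  # call next(endpoints) per batch
print(rotation)
```

Out of every seven batches, four go to the fastest endpoint, two to the second, and one to the third.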

## Deploy mechanics

Watchtower handles auto-deploy. Every push to `seed-mcp/main` that touches `docs_mcp/`, `rag/`, `scrape/`, `requirements.txt`, `Dockerfile`, or `sources.json` triggers `image-only.yml`:

1. Checks out main with full corpus
2. Rebuilds Chroma + BM25 (~3 min on the GPU pool)
3. `docker build` + push three tags to the LAN registry
4. Links the package to the repo via Gitea API
5. Watchtower on trashpanda polls `:latest` every 5 min, then pulls and recreates `drawbar-backend-seed-mcp-1`
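
The path filter described before the numbered steps can be expressed as a small predicate. This only illustrates the rule; the real filter lives in the workflow file, and `triggers_image_build` is a hypothetical name.

```python
from typing import Iterable

# Paths named in this README as triggering image-only.yml.
WATCHED_DIRS = ("docs_mcp/", "rag/", "scrape/")
WATCHED_FILES = ("requirements.txt", "Dockerfile", "sources.json")


def triggers_image_build(changed_paths: Iterable[str]) -> bool:
    """True if any changed path is a watched file or sits under a watched dir."""
    return any(
        path in WATCHED_FILES or path.startswith(WATCHED_DIRS)
        for path in changed_paths
    )


print(triggers_image_build(["rag/chunk.py", "README.md"]))        # True
print(triggers_image_build(["README.md", "eval/queries.jsonl"]))  # False
```

Under this rule, a README-only or eval-only commit does not ship a new image.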

Corpus refresh runs monthly via `refresh.yml` (1st of each month, 06:00 UTC): it re-scrapes all GREEN sources, commits any corpus diff, rebuilds indexes, and ships a new image tagged `:corpus-YYYY.MM.DD`.
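
The three image tags mentioned in this README (`:latest`, `:<sha12>`, `:corpus-YYYY.MM.DD`) could be derived as in the sketch below. The function name is an assumption, the commit SHA and date are made up, and the image name shown is the public pull name rather than the LAN registry address used for pushes.

```python
from datetime import date
from typing import List

IMAGE = "git.jpaul.io/justin/seed-mcp"


def image_tags(commit_sha: str, corpus_date: date) -> List[str]:
    """Return the three tags for one build (hypothetical helper)."""
    return [
        f"{IMAGE}:latest",
        f"{IMAGE}:{commit_sha[:12]}",              # rollback pin
        f"{IMAGE}:corpus-{corpus_date:%Y.%m.%d}",  # snapshot pin
    ]


# Example inputs are invented for illustration.
for tag in image_tags("0123456789abcdef0123456789abcdef01234567", date(2026, 6, 1)):
    print(tag)
```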

See `CLAUDE.md` for canonical sidecar schemas, the reversed disease-scale gotcha (NK + AgriPro publish 1=best, vs Bayer/GH 9=best), and the scraper conventions.
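
As an illustration of the reversed-scale gotcha, ratings could be normalized to one direction before any cross-vendor comparison. The sketch assumes both scales run from 1 to 9, which this README does not state explicitly; `CLAUDE.md` remains the authority on the actual schemas. The source ids match the corpus tables, and `to_nine_is_best` is a hypothetical helper.

```python
# Sources that publish 1 = best, per this README; the others publish 9 = best.
ONE_IS_BEST = {"nk", "agripro"}


def to_nine_is_best(source: str, rating: int) -> int:
    """Normalize a rating so that 9 always means best (assumed 1-9 range)."""
    if not 1 <= rating <= 9:
        raise ValueError(f"rating out of assumed 1-9 range: {rating}")
    return 10 - rating if source in ONE_IS_BEST else rating


print(to_nine_is_best("nk", 2))           # 8
print(to_nine_is_best("bayer_seeds", 8))  # 8
```

Without a step like this, an NK rating of 2 and a DEKALB rating of 8 look far apart even though both describe a strong variety.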

## Status

| Phase | Status |
|---|---|
| 0: scaffold | ✅ |
| 1: scrapers (bayer_seeds / golden_harvest / nk / agripro / gh_plot_reports / agripro_trials) | ✅ |
| 2: chunk + index | ✅ |
| 3: MCP tools (6) | ✅ |
| 4-5: Dockerfile + Gitea CI | ✅ |
| 6: reranker integration | ✅ (eval-validated; deploy uses hybrid+rerank) |
| 7: eval harness | ✅ (21 golden queries, baseline committed) |
| 8: hybrid search | ✅ (default ON) |
| 11: `crop_seed_api_lessons` curated layer | ✅ (Pioneer fallback + 7 other lessons) |
| 13: weekly_digest | not planned for seed-mcp |

Remaining work (deferred, not blocking):

- `becks_pfr` scraper (2,089 research docs via public Sanity GROQ)
- 2023 GH plot reports backfill (~3,619 more docs)
- NK yield-results endpoint reverse-engineer
- Channel Seed brand (~320 more Bayer varieties; a separate brand under the same sitemap)