Phase 11 + Phase 6 GPU move

## Phase 11 — Curated agronomy / label-handling knowledge layer

docs_mcp/lessons.md: 13 topic-anchored markdown sections covering
the LLM-side context a farmer-advisor needs alongside the raw
label corpus —
  - how-to-use-this-corpus
  - epa-signal-words
  - rei-phi-fundamentals
  - rup-handling
  - supplemental-labels-24c-2ee
  - tank-mix-fundamentals
  - resistance-management-hrac-frac-irac
  - glufosinate-application-rules
  - dicamba-application-rules
  - lake-erie-watershed-ohio
  - scn-and-other-seed-treatment-context
  - drift-management-essentials
  - how-to-format-recommendations

Each Topic block is independently retrievable via the new MCP tool:

  ppls_api_lessons(topic="rup-handling")

Or with no topic to get the full TOC, or with a substring to
match-and-return matching sections ("dicamba" → dicamba-application-rules).

Tool docstring instructs the LLM to call this proactively before any
pesticide recommendation so the recommendation lands with regulatory
framing, resistance-group callouts, RUP applicator language, and the
canonical recommendation format — not just a rate from a label.

## Phase 6 — Reranker moved to GPU on trashpanda

Stopped the local CPU container and started on trashpanda's Tesla P4
(8 GB VRAM) via:

  docker run -d --name llama-rerank --restart unless-stopped --gpus all \
    -p 8082:8080 \
    ghcr.io/ggml-org/llama.cpp:server-cuda \
    -hf gpustack/jina-reranker-v2-base-multilingual-GGUF:Q8_0 \
    --reranking --host 0.0.0.0 --port 8080 -ngl 99

The :server-cuda image variant (not :server) is required for CUDA
backend; -ngl 99 offloads all layers to GPU.

Latency: 50-doc rerank dropped from ~23 s on CPU to ~0.7-1.5 s on
the Tesla P4 — production-grade interactive speeds.

deploy/rerank-docker.md updated with the trashpanda deploy recipe,
troubleshooting (mostly "did you use server-cuda?"), and a perf
reference table. The MCP server's RERANK_URL just points at
http://10.10.1.65:8082 now.

GPU eval still completing in background; results land in
eval/results/with_rerank_gpu.md as a follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-24 12:10:09 -04:00
parent 278fe5f456
commit af44d7a102
3 changed files with 410 additions and 27 deletions
+52 -22
View File
@@ -4,29 +4,48 @@ Phase 6 setup. The MCP server reads `RERANK_URL` and, when set, pipes
the top-50 dense (or hybrid) chunks through this sidecar before the top-50 dense (or hybrid) chunks through this sidecar before
returning to the LLM. See `docs_mcp/server.py:_rerank_pool`. returning to the LLM. See `docs_mcp/server.py:_rerank_pool`.
## Run ## Production deploy — trashpanda (Tesla P4, 8 GB VRAM)
This is where the reranker lives. Same box that runs the Drawbar
backend + Cloudflare Tunnel, so the MCP server can reach it on the
internal LAN.
```bash ```bash
docker run -d --name llama-rerank -p 8082:8080 \ ssh justin@10.10.1.65 \
ghcr.io/ggml-org/llama.cpp:server \ 'docker run -d --name llama-rerank --restart unless-stopped --gpus all \
-hf gpustack/jina-reranker-v2-base-multilingual-GGUF:Q8_0 \ -p 8082:8080 \
--reranking --host 0.0.0.0 --port 8080 ghcr.io/ggml-org/llama.cpp:server-cuda \
-hf gpustack/jina-reranker-v2-base-multilingual-GGUF:Q8_0 \
--reranking --host 0.0.0.0 --port 8080 -ngl 99'
``` ```
The image auto-downloads the GGUF on first start (~280 MB, one-time). Key flags:
First request loads the model into memory (~1s on CPU). - `--gpus all` — pass through the Tesla P4
- `server-cuda` image — CUDA-built llama.cpp (not the CPU-only `:server`)
- `-ngl 99` — offload all layers to GPU
- `-hf <repo>` — auto-download from HuggingFace on first start (~280 MB,
cached in the container volume)
- `--reranking` — enables `/v1/rerank` endpoint
- `--restart unless-stopped` — survives reboot
VRAM usage: ~280 MB model + CUDA context. Well under the 8 GB the
Tesla P4 has, leaves room for nomic-embed-text (~560 MB) if you
later co-host it.
## Configure the MCP server ## Configure the MCP server
```bash ```bash
export RERANK_URL=http://localhost:8082 export RERANK_URL=http://10.10.1.65:8082
# search_docs will now rerank automatically # search_docs now reranks the hybrid pool through the GPU before returning
``` ```
In production (the MetaMCP-fronted Drawbar deploy), this is baked
into the MCP server's container env.
## Verify ## Verify
```bash ```bash
curl http://localhost:8082/v1/rerank -H 'Content-Type: application/json' -d '{ curl http://10.10.1.65:8082/v1/rerank -H 'Content-Type: application/json' -d '{
"query": "soybean herbicide for waterhemp", "query": "soybean herbicide for waterhemp",
"documents": [ "documents": [
"Roundup Custom for fallow burndown", "Roundup Custom for fallow burndown",
@@ -36,17 +55,28 @@ curl http://localhost:8082/v1/rerank -H 'Content-Type: application/json' -d '{
``` ```
Expect index=1 (the Sencor doc) at score ~0.8, index=0 at a strongly Expect index=1 (the Sencor doc) at score ~0.8, index=0 at a strongly
negative score. negative score, in under 1 s.
## Performance notes ## Performance reference
- **CPU-only is slow.** ~0.5s per (query, doc) pair → ~23s for a | Mode | Pool | Wall time |
50-doc pool. Fine for batch eval; painful for interactive queries. |---|---|---|
- For production, run on GPU: add `--gpus all` to docker, llama.cpp | CPU (local 28-thread Xeon) | 50 docs | ~23 s |
uses the CUDA backend automatically. Expect ~10-20× speedup. | GPU (Tesla P4 on trashpanda) | 50 docs | ~0.7-1.5 s |
- Alternative: drop `RERANK_POOL` from 50 to ~20 in the server env. | GPU (Tesla P4) | 20 docs | ~0.4 s |
Cuts latency 2.5× at the cost of some quality (rerank gets fewer
candidates to choose from). The Tesla P4 is Pascal-era (8.1 TFLOPs FP32) so a modern Ampere or
- For very small batches the reranker can also run alongside Ada Lovelace GPU would be ~3-5× faster, but for the row-crop label
Ollama on the same GPU box — `jina-reranker-v2-base` is ~280 MB corpus query rate the P4 is plenty.
and won't conflict with `nomic-embed-text` (~560 MB VRAM each).
## Troubleshooting
- **Model not on GPU?** Check `docker logs llama-rerank | grep CUDA`
you should see `CUDA0 : Tesla P4 (8109 MiB, ... free)` and tensor
load lines. If you see CPU-only init, you forgot `--gpus all` or
used `:server` instead of `:server-cuda`.
- **Conflict with Ollama on the same GPU?** No — both processes can
share the GPU, CUDA handles VRAM partitioning. nomic-embed-text +
jina-reranker-v2-base together use ~840 MB on the 8 GB card.
- **First rerank call is slow (~4 s)?** Warm-up. Subsequent calls are
~0.7 s for 50 docs.
+262
View File
@@ -0,0 +1,262 @@
# PPLS API Lessons
Curated agronomy + label-handling knowledge that an LLM should know
*before* giving recommendations from the labels corpus. Surfaced by
the `ppls_api_lessons` MCP tool.
Each top-level `## Topic: <slug>` block is independently retrievable.
The tool docstring tells the LLM to call this proactively before
answering any pesticide recommendation question.
---
## Topic: how-to-use-this-corpus
The PPLS docs corpus is the source of truth for *what's on the label*.
You should:
1. **Run `search_docs` first** with the user's natural-language
question. Hybrid+rerank mode (default in production) returns the
most relevant label chunks across Bayer + every major US ag-chem
registrant via EPA PPLS.
2. **Cite the EPA Reg No** next to any product recommendation. Format:
`PRODUCT NAME (EPA Reg X-Y)`. Drop this and the recommendation is
ungrounded.
3. **Link the label PDF URL** so the user can verify and the spray
operator can have the actual label on hand. The sidecar's
`label.url` is in the search result metadata.
4. **Quote — don't paraphrase — rate ranges**. Labels say "16 to 32
fl oz/A"; *do not* tighten that to "use 24 fl oz/A" unless the
label gives a specific use case at that rate.
5. **If you can't find a label-grounded answer**, say so. Better to
return "no label in corpus matches this; consult the manufacturer
or your CCA" than to hallucinate a rate.
The corpus is **scoped to US row crops: corn, soybeans, wheat**.
Outside that scope, results are sparse or empty.
## Topic: epa-signal-words
Every EPA-registered pesticide label has a signal word in the upper
front panel. It maps to acute toxicity:
| Signal word | Toxicity | Typical examples |
|---|---|---|
| **DANGER** + "POISON" + skull-and-crossbones | Cat I, highly toxic | Paraquat (Gramoxone), some methyl bromide |
| **DANGER** (no POISON) | Cat I (skin/eye irritation only) | Some restricted-use ester formulations |
| **WARNING** | Cat II | Many fomesafen formulations, some 2,4-D esters |
| **CAUTION** | Cat III/IV, least toxic | Most modern soybean/corn herbicides — glyphosate, mesotrione, fomesafen amine salts |
| **(none)** | Cat IV | A few biopesticides + some adjuvants |
When recommending a DANGER-labeled product, *always* call out PPE
requirements (typically chemical-resistant gloves, footwear, eyewear,
respirator depending on activity).
## Topic: rei-phi-fundamentals
Two distinct intervals — don't confuse them:
- **REI** (Restricted Entry Interval): minimum time AFTER application
before a worker may enter the treated area *without* the label's
full PPE. Typical values: 4, 12, 24, 48 hours.
- **PHI** (Pre-Harvest Interval): minimum time BETWEEN last application
and crop harvest. Typical values: 7, 14, 21, 30, 60, 90 days
depending on chemistry + crop.
Always state both when relevant to the workflow. For burndown
applications, PHI rarely matters; for in-crop foliar, it's critical.
## Topic: rup-handling
Restricted Use Pesticide (RUP) is a *federal* designation that means:
**the product can only be purchased, possessed, and applied by (or
under direct supervision of) a certified pesticide applicator.**
Row-crop products you'll commonly see in RUP class:
- **Paraquat-based** (Gramoxone Inteon, Helmquat, Firestorm) — RUP + special closed-system training required since 2019
- **Dicamba formulations approved for in-crop soybean/cotton** (XtendiMax, Engenia, Tavium) — RUP + applicator training every year
- **Some pyrethroids** (Warrior II, Mustang Maxx) — RUP in some states
When recommending an RUP, *always* say:
> "This is a Restricted Use Pesticide — application requires a
> certified applicator and proper recordkeeping per state regs."
Never give a "casual" application recommendation for an RUP. The
recommendation must include the applicator-certification framing.
## Topic: supplemental-labels-24c-2ee
Beyond the main federal (§3) label, products often have:
- **2(ee) recommendations**: manufacturer-issued, label-compliant
*additional uses* that don't require formal re-registration.
These add new tank-mixes, crops, or pests within the existing
label's authority. You can recommend a 2(ee) — but tell the user
the 2(ee) document itself must be in their possession at spray time.
- **24(c) Special Local Need (SLN)**: state-specific labels approved
by the state lead agency for a problem peculiar to that state.
Same possession-at-spray rule. SLNs are common for cotton in TX
and rice in southern states; less common in OH row crops.
The Bayer scraper captures these as `supplemental_documents` in each
label's sidecar (`kind: "2EE"` or `"24C"`). For EPA PPLS labels, the
main label is what's in the corpus.
## Topic: tank-mix-fundamentals
When recommending tank mixes:
1. **The more restrictive label wins.** If product A allows 2 qt/A
max in-crop and product B caps tank-mix partners at 1 qt/A for
that crop, the cap is 1 qt/A.
2. **Antagonism is real.** A few well-known antagonisms:
- Glufosinate + grass herbicides (clethodim, sethoxydim) → reduced
grass control. Apply grass herbicides separately, 7 days apart.
- Atrazine + dicamba + Group 15 (e.g., S-metolachlor) all-at-once
can hammer corn under cold/wet conditions.
- 2,4-D ester + glufosinate → can reduce glufosinate activity.
3. **Adjuvant compatibility:**
- Glufosinate (Liberty) REQUIRES AMS @ 1.5-3 lb/A. No exceptions.
- Glyphosate works best with NIS in soft water, or with AMS
conditioner in hard water (Mg/Ca > 200 ppm).
- PPO herbicides (lactofen, fomesafen) often want COC, not NIS.
4. **Always check both labels' "Tank-Mix Compatibility" or
"Restrictions" sections** before recommending — the corpus has
these sections; pull them with `search_docs`.
## Topic: resistance-management-hrac-frac-irac
Herbicide resistance is the single biggest threat to row-crop weed
control in the US Midwest. Always communicate resistance group when
recommending:
- **HRAC** (Herbicide Resistance Action Committee) groups (formerly
WSSA numbers). Use the *number* not just the name — farmers
recognize "Group 14" faster than "PPO inhibitor".
- **FRAC** for fungicides.
- **IRAC** for insecticides.
Key Midwest resistance hotspots:
- **Waterhemp + Palmer amaranth**: resistant to Groups 2, 5, 9, 14,
15, 27 in places. Means glyphosate, ALS, atrazine, fomesafen,
metolachlor, and HPPDs (mesotrione) all have spotty efficacy.
→ Always mix MOAs; never spray a single Group twice in a season.
- **Marestail/horseweed**: glyphosate-resistant nationwide; 2,4-D
remains the burndown anchor + Sharpen (saflufenacil, Group 14).
- **Giant ragweed**: glyphosate + ALS resistant in many areas.
When the user asks for a recommendation, *say* the group number
(e.g., "Sencor (metribuzin, Group 5)") so they can rotate.
## Topic: glufosinate-application-rules
Glufosinate (Liberty 280 SL, Cheetah Max generic, etc.) is unique:
- **Photosynthesis-dependent**: needs bright sun within ~4 hours of
application. Cloudy days = poor control.
- **Needs warmth**: ideally daytime temp > 60°F at application.
- **AMS is mandatory** at 1.5-3 lb/A.
- **Coverage trumps droplet size**: use flat-fan or AIXR nozzles, 15-20
GPA carrier, medium droplets. Don't go ultra-coarse to reduce drift.
- **Two-pass strategy** for heavy weed pressure (V2 + V4-V5 in
soybean) outperforms a single higher-rate pass.
- **Weed-size critical**: best on weeds ≤ 4". After 6" efficacy drops.
## Topic: dicamba-application-rules
Dicamba in-crop in soybean/cotton (XtendiMax, Engenia, Tavium) is
under intense EPA scrutiny. Current label rules (verify against the
specific label in corpus before recommending):
- **RUP + annual applicator training** required.
- **State and date cutoffs**: most states have application date
cutoffs (e.g., June 30 in OH for soybean; varies by state). Check
the state-specific 24(c) label.
- **Wind**: 3-10 mph at boom height. No spraying during temperature
inversions (typically pre-sunrise + late evening).
- **Buffers**: downwind buffer to sensitive areas (typically 110-220
ft; depends on state + downwind sensitivity).
- **Approved nozzles only**: TTI or AIXR with very-coarse-to-ultra-
coarse droplets. Manufacturer publishes approved nozzle lists.
- **Tank cleanout**: triple-rinse with ammonia-based cleaner after
every load. Dicamba contamination of subsequent loads is the #1
off-target damage cause.
If the label in the corpus is older than the current EPA decision,
*say so* and direct the user to the latest manufacturer label —
EPA has revised dicamba registrations multiple times.
## Topic: lake-erie-watershed-ohio
Ohio's H2Ohio program + the Western Lake Erie Basin (WLEB) impose
additional considerations for nutrient/pesticide runoff:
- **Atrazine**: WLEB subwatersheds have voluntary reduction targets;
formal label restrictions in some HUC-12 watersheds. Atrazine over
0.75 lb/A on highly-erodible land may require soil conservation
practices (cover crops, buffer strips).
- **Dicamba**: see Topic: dicamba-application-rules. OH cutoff has
historically been June 30 for in-crop soybean.
- **2,4-D + 2,4-DB**: drift sensitivity in OH given the high mix of
row-crop, specialty-crop (tomato, grape), and homeowner areas.
When recommending to OH farmers, surface H2Ohio cost-share options
if relevant (no-till + cover crops + variable-rate nutrient
management can offset chemistry needs).
## Topic: scn-and-other-seed-treatment-context
Soybean cyst nematode (SCN) is universal in OH/IN/IL/IA. When
recommending a soybean program, *always* check whether the seed
treatment includes nematicide/SCN protection:
- **Abamectin** (Avicta) — original SCN nematicide seed treatment
- **Fluopyram** (ILeVO) — broader nematode + SDS suppression
- **Pydiflumetofen** (Saltro) — newer; nematode + SDS protection
without ILeVO's halo effect on seedling
- **Pasteuria nishizawae** (Clariva) — biological nematicide
This isn't strictly a "pesticide label" topic but it's the right
context for ANY soybean herbicide recommendation: a great herbicide
program on SCN-infested fields without nematicide seed treatment is
leaving yield on the floor.
## Topic: drift-management-essentials
Drift mitigation is increasingly enforced and increasingly important
for off-target damage liability:
- **Wind**: most labels specify 3-10 mph at boom height. Below 3 mph
risks temperature inversion (worst case: cool morning over warm
ground, fine spray hangs and drifts miles).
- **Temperature inversion detection**: smoke test. Smoke that rises
and dissipates = no inversion. Smoke that hangs flat = inversion.
- **Nozzle selection**: AIXR / TTI / TT — air-induction nozzles
produce larger droplets that drift less. Required for dicamba/2,4-D.
- **Boom height**: lower is better for drift. 20 inches over canopy
for AIXR; manufacturer specs for TTI.
- **Buffer to sensitive crops**: tomatoes (esp. for 2,4-D + dicamba),
grapes, organic fields, residential lawns. Always check downwind.
- **Adjuvant choice affects drift**: NIS reduces droplet size; deposition
aids (e.g., InterLock, Strike Zone) increase droplet weight and reduce
drift.
## Topic: how-to-format-recommendations
When the LLM produces a pesticide recommendation, the canonical shape
that makes it actionable for a farmer:
```
**[Product name]** (EPA Reg [X-Y]) — [active ingredient(s)], [Group N]
- **Rate:** [from label, with range]
- **Timing:** [growth stage / DAT]
- **Carrier + adjuvant:** [GPA + adjuvant requirements]
- **REI/PHI:** [from label]
- **Label PDF:** [URL from search result]
- **Notes:** [resistance group, drift considerations, RUP framing if
applicable, tank-mix antagonism warnings]
```
Skip the canonical shape and the recommendation is hard to apply
without the farmer doing their own label hunting. The corpus has
everything needed — surface it cleanly.
+96 -5
View File
@@ -599,16 +599,107 @@ def corpus_status() -> str:
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Stubs for later phases — keep the signatures in this file so refactors # Phase 11 — Curated agronomy / label-handling knowledge
# don't lose the contracts. Implementations come per phase. # ---------------------------------------------------------------------------
LESSONS_PATH = Path(__file__).resolve().parent / "lessons.md"
_lessons_cache: tuple[str, list[tuple[str, str]]] | None = None # (full, sections)
def _load_lessons() -> tuple[str, list[tuple[str, str]]]:
"""Read lessons.md once, split into (topic_slug, body) sections."""
global _lessons_cache
if _lessons_cache is not None:
return _lessons_cache
if not LESSONS_PATH.exists():
_lessons_cache = ("", [])
return _lessons_cache
full = LESSONS_PATH.read_text(encoding="utf-8")
sections: list[tuple[str, str]] = []
# Split on lines like "## Topic: <slug>" (case-sensitive marker)
parts = re.split(r"^## Topic:\s+(\S+)\s*$", full, flags=re.MULTILINE)
# parts = [preamble, slug1, body1, slug2, body2, ...]
for i in range(1, len(parts), 2):
slug = parts[i].strip()
body = parts[i + 1].strip() if i + 1 < len(parts) else ""
sections.append((slug, body))
_lessons_cache = (full, sections)
return _lessons_cache
@mcp.tool()
def ppls_api_lessons(
topic: Annotated[
str | None,
Field(description="OPTIONAL: topic slug or substring (e.g., "
"'rup-handling', 'dicamba', 'rei'). Omit to get "
"the full table of contents."),
] = None,
) -> str:
"""Surface curated agronomy + label-handling knowledge that supplements
the raw label corpus.
**Call this proactively whenever you're about to give a pesticide
recommendation from search_docs results.** The lessons cover:
EPA signal words, REI/PHI fundamentals, RUP handling, 2(ee)/24(c)
supplemental labels, tank-mix and resistance-management
fundamentals (HRAC/FRAC/IRAC groups), product-specific application
rules (glufosinate, dicamba), Lake Erie watershed considerations
for Ohio, SCN context for soybean, drift management, and the
canonical recommendation format the farmer expects.
Without these lessons, your recommendations risk being technically
correct but missing the regulatory framing, resistance group
callouts, RUP applicator requirements, or off-target-damage
warnings that make them actionable. Call this first; cite specific
lessons in your response.
"""
with TimedCall("ppls_api_lessons", {"topic": topic}) as _call:
full, sections = _load_lessons()
if not sections:
_call.set(sections=0)
return "_(lessons.md not found — Phase 11 knowledge layer not populated)_"
if not topic:
_call.set(sections=len(sections), returned="toc")
toc_lines = [
"# PPLS API lessons — table of contents",
"",
f"Call `ppls_api_lessons(topic='<slug>')` to fetch a specific section.",
"",
]
for slug, body in sections:
# First non-blank, non-list line as the headline summary
summary = ""
for line in body.splitlines():
s = line.strip()
if s and not s.startswith(("-", "*", "|", "```")):
summary = s[:140]
break
toc_lines.append(f"- **`{slug}`** — {summary}")
return "\n".join(toc_lines)
topic_lc = topic.lower()
matched = [(slug, body) for slug, body in sections if topic_lc in slug.lower()]
if not matched:
_call.set(sections=0, returned="no-match")
available = ", ".join(f"`{s}`" for s, _ in sections)
return (f"_(no lesson matched topic `{topic}`. Available: {available})_")
_call.set(sections=len(matched), returned="match")
out: list[str] = []
for slug, body in matched:
out.append(f"## Topic: {slug}\n\n{body}")
return "\n\n---\n\n".join(out)
# ---------------------------------------------------------------------------
# Stubs for later phases
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# @mcp.tool() # Phase 12 # @mcp.tool() # Phase 12
# def find_doc_inconsistencies(scope_query: str, ...) -> str: ... # def find_doc_inconsistencies(scope_query: str, ...) -> str: ...
# @mcp.tool() # Phase 11
# def ppls_label_lessons(topic: str | None = None) -> str: ...
# =========================================================================== # ===========================================================================
# Entry point # Entry point