deploy: Drawbar compose snippet — first image is published

Image pushed to git.jpaul.io/justin/crop-chem-docs with three tags:
  :latest             — Watchtower auto-pull target
  :a97107de4636       — commit-sha rollback pin
  :corpus-2026.05.24  — corpus-snapshot pin (prod-recommended)

Drawbar compose snippet at deploy/drawbar-compose-snippet.md.
Wires the container against the existing infra:
  - Ollama pool: 192.168.0.2:11434, 192.168.0.2:11435,
                 192.168.0.125:11434, 10.10.1.65:11434
  - Reranker:    http://10.10.1.65:8082
  - HYBRID_SEARCH=true (production retrieval — BM25 + dense + rerank)
  - Exposes streamable-HTTP MCP on port 8000

Pull path uses git.jpaul.io (public hostname, CF-fronted; pull
response bodies aren't capped). Push path uses 192.168.0.2:1234
(LAN endpoint, bypasses CF 100MB body cap). Same registry,
different URLs — per the template gotcha doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 12:48:24 -04:00
parent 420b4fa2d8
commit 8766d73327
+102
@@ -0,0 +1,102 @@
# Drawbar deploy — `crop-chem-docs` MCP server snippet
Drop this service into Drawbar's `docker-compose.yml`. It targets the
existing trashpanda infra: the Ollama pool on the LAN, the `llama-rerank`
container on the Tesla P4, and a Cloudflare Tunnel out front.
## Pre-reqs (one-time on the deploy host)
1. **Log in to the Gitea registry** so the host can pull:
   ```bash
   docker login git.jpaul.io -u justin  # use a PAT as the password
   ```
2. **Ollama embed pool** reachable from this host (already up):
   - `192.168.0.2:11434`, `192.168.0.2:11435` (Gitea-host GPUs)
   - `192.168.0.125:11434` (Windows GPU)
   - `10.10.1.65:11434` (trashpanda)
3. **Reranker** reachable (already up on trashpanda):
   - `http://10.10.1.65:8082`
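
The reachability pre-reqs above can be checked from the deploy host with a short script. This is a minimal sketch, not part of the repository: `parse_endpoint`, `is_reachable`, and `report` are hypothetical helper names, and the endpoint list is copied from this document. It only confirms that a TCP connection opens, not that Ollama or the reranker respond correctly.

```python
import socket
from urllib.parse import urlsplit

# Endpoints copied from this document; edit to match your network.
ENDPOINTS = [
    "192.168.0.2:11434",
    "192.168.0.2:11435",
    "192.168.0.125:11434",
    "10.10.1.65:11434",
    "http://10.10.1.65:8082",
]


def parse_endpoint(endpoint: str) -> tuple[str, int]:
    """Split 'host:port' or 'http://host:port' into (host, port)."""
    # urlsplit only fills in hostname/port when a scheme separator is present.
    if "://" not in endpoint:
        endpoint = "http://" + endpoint
    parts = urlsplit(endpoint)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return parts.hostname, parts.port


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(endpoints: list[str]) -> dict[str, bool]:
    """Map each endpoint string to its reachability."""
    return {e: is_reachable(*parse_endpoint(e)) for e in endpoints}
```

Calling `report(ENDPOINTS)` on the deploy host returns one boolean per endpoint; any `False` entry is worth fixing before starting the container.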
## Compose service
```yaml
services:
  crop-chem-docs:
    image: git.jpaul.io/justin/crop-chem-docs:latest
    # Or pin to an immutable tag for prod:
    # image: git.jpaul.io/justin/crop-chem-docs:corpus-2026.05.24
    container_name: crop-chem-docs
    restart: unless-stopped
    ports:
      - "8001:8000"  # MCP server (streamable-http). Adjust host port.
    environment:
      # Embedder pool. Round-robined for parallel search.
      OLLAMA_URL: "http://192.168.0.2:11434,http://192.168.0.2:11435,http://192.168.0.125:11434,http://10.10.1.65:11434"
      # Reranker on trashpanda's Tesla P4.
      RERANK_URL: "http://10.10.1.65:8082"
      # Production retrieval: BM25 + dense fused, then reranked.
      HYBRID_SEARCH: "true"
      # Override docs URL shown to the LLM if needed (default is EPA PPLS portal).
      # PRODUCT_DOCS_URL: "https://..."
    labels:
      # Watchtower auto-pulls :latest on update.
      com.centurylinklabs.watchtower.enable: "true"
      # Watchtower already runs elsewhere, so no extra service is needed
      # here; the label above opts this container in to auto-updates.
```
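
The compose comment says the comma-separated `OLLAMA_URL` pool is round-robined. How the server does that is not shown in this commit, so the following is only a sketch of one plausible approach. `parse_pool` and `round_robin` are hypothetical names, and the real implementation may differ (for example, by adding locking or health checks).

```python
import itertools
from typing import Iterator


def parse_pool(raw: str) -> list[str]:
    """Split a comma-separated URL list, dropping blanks and trailing slashes."""
    return [url.strip().rstrip("/") for url in raw.split(",") if url.strip()]


def round_robin(pool: list[str]) -> Iterator[str]:
    """Return an iterator that yields pool members in order, forever."""
    if not pool:
        raise ValueError("OLLAMA_URL must list at least one endpoint")
    return itertools.cycle(pool)


# Example with two of the pool members from the compose file.
pool = parse_pool("http://192.168.0.2:11434,http://192.168.0.2:11435")
picker = round_robin(pool)
first_three = [next(picker) for _ in range(3)]
```

With two endpoints, the third pick wraps back to the first, so `first_three` starts and ends with the same URL.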
## Test from the host
```bash
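# Handshake probe over MCP's streamable-HTTP transport. The /mcp path is
# an assumption (the MCP Python SDK's default; /sse belongs to the older
# SSE transport), so adjust it to whatever this server actually serves:
curl -s -X POST http://localhost:8001/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl-probe","version":"0"}}}'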
# Or exec into the container and smoke-test the stdio transport. Use -i
# without -t: stdin is redirected, so docker cannot allocate a TTY.
docker exec -i crop-chem-docs \
  python -m docs_mcp.server --transport stdio < /dev/null
```
## What the container exposes
| Tool | What it does |
|---|---|
| `search_docs` | Hybrid+rerank pesticide-label search with optional filters |
| `get_page` | Full label markdown + metadata by `(source, source_key)` |
| `list_versions` | Discover sources, product classes, signal words, registrants |
| `corpus_status` | Counts + freshness; useful for health probes |
| `crop_chem_api_lessons` | Curated agronomy/label-handling knowledge — call before recommending |
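
Since `corpus_status` is suggested for health probes, it may help to see what such a call looks like on the wire. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below only builds the request body. `tools_call_body` is a hypothetical helper, and it assumes `corpus_status` accepts an empty argument object, which this document does not confirm. A real client must also complete the `initialize` handshake before sending it.

```python
import json
from typing import Any, Optional


def tools_call_body(
    tool: str,
    arguments: Optional[dict[str, Any]] = None,
    request_id: int = 1,
) -> str:
    """Serialize an MCP `tools/call` JSON-RPC 2.0 request."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments or {}},
    }
    return json.dumps(payload)


# Assumption: corpus_status needs no arguments.
body = tools_call_body("corpus_status")
```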
## Versioning
Tags published by the Gitea Actions workflows:
| Tag | When | Use for |
|---|---|---|
| `:latest` | Every monthly refresh + every code push | Dev / Watchtower auto-pull |
| `:<sha12>` | Every build | Rollback pin |
| `:corpus-YYYY.MM.DD` | Every build | Pin to a specific corpus snapshot in prod |

The `:corpus-YYYY.MM.DD` tag is the recommended one for production: it
guarantees that the running container has a known, frozen corpus that
matches the labels you've validated against. A container pinned to a
corpus tag does not pick up later corpus snapshots until you change the
tag in compose.
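
The three tag shapes in the table can be derived from a commit SHA and a corpus date. The workflow's actual tagging step is not shown in this commit, so this is only a sketch. `image_tags` is a hypothetical helper, and the SHA in the example is a placeholder rather than a real commit.

```python
from datetime import date


def image_tags(commit_sha: str, corpus_date: date) -> list[str]:
    """Return the three tags described in the table for one build."""
    if len(commit_sha) < 12:
        raise ValueError("need at least 12 hex characters of the commit SHA")
    return [
        "latest",
        commit_sha[:12].lower(),           # the :<sha12> rollback pin
        f"corpus-{corpus_date:%Y.%m.%d}",  # the corpus-snapshot pin
    ]


# Placeholder SHA, not a real commit.
tags = image_tags("0123456789abcdef0123456789abcdef01234567", date(2026, 5, 24))
```

For the date in this commit, the last entry comes out as `corpus-2026.05.24`, matching the tag listed in the commit message.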
## Updating the corpus
Two paths:
1. **Wait for the monthly cron.** It runs on the 1st of each month at
06:00 UTC: a full re-scrape of Bayer + EPA PPLS, then a reindex, then
an image push. Watchtower pulls the new `:latest` automatically.
2. **Trigger manually** in the Gitea Actions UI → `Monthly corpus
refresh` → `Run workflow`. The optional `sources` input limits the
run to a single source (e.g., `bayer` only).
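
To see when the next scheduled refresh will land, the "1st at 06:00 UTC" rule can be computed directly. This is a sketch under that stated schedule only. `next_monthly_run` is a hypothetical helper, and the workflow file itself (which would typically express this as the cron line `0 6 1 * *`) is not shown here, so treat that expression as an assumption.

```python
from datetime import datetime, timezone


def next_monthly_run(now: datetime) -> datetime:
    """Next 1st-of-month 06:00 UTC strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(day=1, hour=6, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # This month's run has passed; roll over to the 1st of next month.
        if now.month == 12:
            year, month = now.year + 1, 1
        else:
            year, month = now.year, now.month + 1
        candidate = candidate.replace(year=year, month=month)
    return candidate
```

Pass a timezone-aware value such as `datetime.now(timezone.utc)`. A naive datetime would be interpreted as local time by `astimezone`.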
## Switching corpus scope
The row-crop filter (corn/soybeans/wheat) lives in
`scrape/sources/epa_ppls.py` as `ROW_CROP_KEYWORDS`. Edit it, push, and
let the next workflow run pick up the change. The same applies to the
registrant allowlist at `scrape/sources/epa_registrant_allowlist.json`.
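
For orientation, a keyword filter of this kind might look like the sketch below. Everything in it is an assumption for illustration: the real `ROW_CROP_KEYWORDS` values and matching logic live in `scrape/sources/epa_ppls.py` and are not shown in this commit, and `mentions_row_crop` is a hypothetical name.

```python
# Hypothetical values; the real ROW_CROP_KEYWORDS in
# scrape/sources/epa_ppls.py may differ.
ROW_CROP_KEYWORDS = ("corn", "soybean", "wheat")


def mentions_row_crop(
    label_text: str,
    keywords: tuple[str, ...] = ROW_CROP_KEYWORDS,
) -> bool:
    """Case-insensitive substring match against any keyword."""
    text = label_text.lower()
    return any(keyword in text for keyword in keywords)
```

Widening the scope would then amount to adding a keyword (say, `"cotton"`) to the tuple. Plain substring matching also fires on words like "acorn", so a real filter may want word-boundary checks.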