From 8766d73327f0c1492e02fc710db1722b96cb3485 Mon Sep 17 00:00:00 2001
From: Justin Paul
Date: Sun, 24 May 2026 12:48:24 -0400
Subject: [PATCH] =?UTF-8?q?deploy:=20Drawbar=20compose=20snippet=20?=
 =?UTF-8?q?=E2=80=94=20first=20image=20is=20published?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Image pushed to git.jpaul.io/justin/crop-chem-docs with three tags:
  :latest             — Watchtower auto-pull target
  :a97107de4636       — commit-sha rollback pin
  :corpus-2026.05.24  — corpus-snapshot pin (prod-recommended)

Drawbar compose snippet at deploy/drawbar-compose-snippet.md. Wires
the container against the existing infra:
  - Ollama pool: 192.168.0.2:11434, 192.168.0.2:11435,
    192.168.0.125:11434, 10.10.1.65:11434
  - Reranker: http://10.10.1.65:8082
  - HYBRID_SEARCH=true (production retrieval — BM25 + dense + rerank)
  - Exposes streamable-HTTP MCP on port 8000

Pull path uses git.jpaul.io (public hostname, CF-fronted; pull
response bodies aren't capped). Push path uses 192.168.0.2:1234 (LAN
endpoint, bypasses CF 100MB body cap). Same registry, different URLs —
per the template gotcha doc.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 deploy/drawbar-compose-snippet.md | 102 ++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)
 create mode 100644 deploy/drawbar-compose-snippet.md

diff --git a/deploy/drawbar-compose-snippet.md b/deploy/drawbar-compose-snippet.md
new file mode 100644
index 0000000..ace4f9c
--- /dev/null
+++ b/deploy/drawbar-compose-snippet.md
@@ -0,0 +1,102 @@
+# Drawbar deploy — `crop-chem-docs` MCP server snippet
+
+Drop this into Drawbar's `docker-compose.yml`. Targets the existing
+trashpanda infra: Ollama pool on the LAN, `llama-rerank` container
+on Tesla P4, Cloudflare Tunnel out front.
+
+## Pre-reqs (one-time on the deploy host)
+
+1. **Log in to the Gitea registry** so the host can pull:
+   ```bash
+   docker login git.jpaul.io -u justin     # use a PAT as the password
+   ```
+2. **Ollama embed pool** reachable from this host (already up):
+   - `192.168.0.2:11434`, `192.168.0.2:11435` (Gitea-host GPUs)
+   - `192.168.0.125:11434` (Windows GPU), `10.10.1.65:11434` (trashpanda)
+3. **Reranker** reachable (already up on trashpanda):
+   - `http://10.10.1.65:8082`
+
+## Compose service
+
+```yaml
+services:
+  crop-chem-docs:
+    image: git.jpaul.io/justin/crop-chem-docs:latest
+    # Or pin to an immutable tag for prod:
+    # image: git.jpaul.io/justin/crop-chem-docs:corpus-2026.05.24
+    container_name: crop-chem-docs
+    restart: unless-stopped
+    ports:
+      - "8001:8000"   # MCP server (streamable-http). Adjust host port.
+    environment:
+      # Embedder pool. Round-robined for parallel search.
+      OLLAMA_URL: "http://192.168.0.2:11434,http://192.168.0.2:11435,http://192.168.0.125:11434,http://10.10.1.65:11434"
+      # Reranker on trashpanda's Tesla P4.
+      RERANK_URL: "http://10.10.1.65:8082"
+      # Production retrieval: BM25 + dense fused, then reranked.
+      HYBRID_SEARCH: "true"
+      # Override docs URL shown to the LLM if needed (default is EPA PPLS portal).
+      # PRODUCT_DOCS_URL: "https://..."
+    labels:
+      # Watchtower auto-pulls :latest on update.
+      com.centurylinklabs.watchtower.enable: "true"
+
+    # Optional: you already run Watchtower elsewhere, so to have it
+    # auto-update this container too, just keep the label above set
+    # to "true".
+```
+
+## Test from the host
+
+```bash
+# Tool inventory (uses MCP's HTTP transport; adjust if you have a
+# different MCP client probe handy):
+curl -s http://localhost:8001/sse        # or whichever endpoint your
+                                         # client expects from streamable-http
+
+# Or exec in and run the stdio transport (-i, not -it: stdin is redirected):
+docker exec -i crop-chem-docs \
+  python -m docs_mcp.server --transport stdio < /dev/null
+```
+
+## What the container exposes
+
+| Tool | What it does |
+|---|---|
+| `search_docs` | Hybrid+rerank pesticide-label search with optional filters |
+| `get_page` | Full label markdown + metadata by `(source, source_key)` |
+| `list_versions` | Discover sources, product classes, signal words, registrants |
+| `corpus_status` | Counts + freshness; useful for health probes |
+| `crop_chem_api_lessons` | Curated agronomy/label-handling knowledge — call before recommending |
+
+## Versioning
+
+Tags published by the Gitea Actions workflows:
+
+| Tag | When | Use for |
+|---|---|---|
+| `:latest` | Every monthly refresh and every code push | Dev / Watchtower auto-pull |
+| `:<commit-sha>` | Every build | Rollback pin |
+| `:corpus-YYYY.MM.DD` | Every build | Pin to a specific corpus snapshot in prod |
+
+The `:corpus-YYYY.MM.DD` tag is the right one for production: it
+guarantees the running container has a known, frozen corpus that
+matches the labels you've validated against.
+
+## Updating the corpus
+
+Two paths:
+
+1. **Wait for the monthly cron** (the 1st at 06:00 UTC): full re-scrape
+   of Bayer + EPA PPLS, then reindex, then image push. Watchtower
+   pulls the new `:latest` automatically.
+2. **Trigger manually** in Gitea Actions UI → `Monthly corpus
+   refresh` → `Run workflow`. Optional `sources` input for
+   single-source refresh (e.g., `bayer` only).
+
+## Switching corpus scope
+
+The row-crop filter (corn/soybeans/wheat) lives in
+`scrape/sources/epa_ppls.py` as `ROW_CROP_KEYWORDS`. Edit it, push, and
+let the next workflow run pick it up. The same applies to the registrant
+allowlist at `scrape/sources/epa_registrant_allowlist.json`.
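
The snippet's pre-reqs ask that the Ollama pool and the reranker be reachable from the deploy host. A minimal pre-flight sketch of that check follows, assuming the endpoint values from the compose `environment:` block. The helper names `parse_endpoints`, `is_reachable`, and `report` are hypothetical and are not part of `crop-chem-docs`. A successful TCP connect only shows that the port is open, not that the service behind it is healthy.

```python
import socket
from urllib.parse import urlsplit

# Values copied from the compose `environment:` block; adjust if yours differ.
OLLAMA_URL = (
    "http://192.168.0.2:11434,http://192.168.0.2:11435,"
    "http://192.168.0.125:11434,http://10.10.1.65:11434"
)
RERANK_URL = "http://10.10.1.65:8082"


def parse_endpoints(csv_urls: str) -> list[tuple[str, int]]:
    """Split a comma-separated list of URLs (with scheme) into (host, port) pairs."""
    endpoints = []
    for raw in csv_urls.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate stray commas and whitespace
        parts = urlsplit(raw)
        # Fall back to the scheme's default port when the URL omits one.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        endpoints.append((parts.hostname, port))
    return endpoints


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(*csv_url_lists: str, timeout: float = 2.0) -> bool:
    """Print one status line per endpoint; return True only if all are reachable."""
    all_ok = True
    for csv_urls in csv_url_lists:
        for host, port in parse_endpoints(csv_urls):
            ok = is_reachable(host, port, timeout)
            all_ok = all_ok and ok
            print(f"{host}:{port} {'ok' if ok else 'UNREACHABLE'}")
    return all_ok
```

Calling `report(OLLAMA_URL, RERANK_URL)` on the deploy host before `docker compose up` prints one line per endpoint, which makes it easy to spot a pool member that is down.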