diff --git a/deploy/drawbar-compose-snippet.md b/deploy/drawbar-compose-snippet.md
new file mode 100644
index 0000000..ace4f9c
--- /dev/null
+++ b/deploy/drawbar-compose-snippet.md
@@ -0,0 +1,102 @@
+# Drawbar deploy: `crop-chem-docs` MCP server snippet
+
+Drop this into Drawbar's `docker-compose.yml`. Targets the existing
+trashpanda infra: Ollama pool on the LAN, `llama-rerank` container
+on Tesla P4, Cloudflare Tunnel out front.
+
+## Pre-reqs (one-time on the deploy host)
+
+1. **Log in to the Gitea registry** so the host can pull:
+   ```bash
+   docker login git.jpaul.io -u justin   # use a PAT as the password
+   ```
+2. **Ollama embed pool** reachable from this host (already up):
+   - `192.168.0.2:11434`, `192.168.0.2:11435` (Gitea-host GPUs)
+   - `192.168.0.125:11434` (Windows GPU)
+3. **Reranker** reachable (already up on trashpanda):
+   - `http://10.10.1.65:8082`
+
+## Compose service
+
+```yaml
+services:
+  crop-chem-docs:
+    image: git.jpaul.io/justin/crop-chem-docs:latest
+    # Or pin to an immutable tag for prod:
+    # image: git.jpaul.io/justin/crop-chem-docs:corpus-2026.05.24
+    container_name: crop-chem-docs
+    restart: unless-stopped
+    ports:
+      - "8001:8000"   # MCP server (streamable-http). Adjust host port.
+    environment:
+      # Embedder pool. Round-robined for parallel search.
+      OLLAMA_URL: "http://192.168.0.2:11434,http://192.168.0.2:11435,http://192.168.0.125:11434,http://10.10.1.65:11434"
+      # Reranker on trashpanda's Tesla P4.
+      RERANK_URL: "http://10.10.1.65:8082"
+      # Production retrieval: BM25 + dense fused, then reranked.
+      HYBRID_SEARCH: "true"
+      # Override docs URL shown to the LLM if needed (default is EPA PPLS portal).
+      # PRODUCT_DOCS_URL: "https://..."
+    labels:
+      # Watchtower auto-pulls :latest on update.
+      com.centurylinklabs.watchtower.enable: "true"
+
+  # Watchtower already runs elsewhere, so no extra service is needed
+  # here: the label above is all it takes to opt this container in to
+  # auto-updates.
+```
+
+## Test from the host
+
+```bash
+# Tool inventory (uses MCP's HTTP transport; adjust if you have a
+# different MCP client probe handy):
+curl -s http://localhost:8001/sse   # or whichever endpoint your
+                                    # client expects from streamable-http
+
+# Or exec into the container and run the stdio transport (-i only: stdin is redirected, so no TTY):
+docker exec -i crop-chem-docs \
+  python -m docs_mcp.server --transport stdio < /dev/null
+```
+
+## What the container exposes
+
+| Tool | What it does |
+|---|---|
+| `search_docs` | Hybrid+rerank pesticide-label search with optional filters |
+| `get_page` | Full label markdown + metadata by `(source, source_key)` |
+| `list_versions` | Discover sources, product classes, signal words, registrants |
+| `corpus_status` | Counts + freshness; useful for health probes |
+| `crop_chem_api_lessons` | Curated agronomy/label-handling knowledge; call before recommending |
+
+## Versioning
+
+Tags published by the Gitea Actions workflows:
+
+| Tag | When | Use for |
+|---|---|---|
+| `:latest` | Every monthly refresh + every code push | Dev / Watchtower auto-pull |
+| `:` | Every build | Rollback pin |
+| `:corpus-YYYY.MM.DD` | Every build | Pin to a specific corpus snapshot in prod |
+
+The `:corpus-YYYY.MM.DD` tag is the right one for production: it
+guarantees the running container has a known, frozen corpus that
+matches the labels you've validated against.
+
+## Updating the corpus
+
+Two paths:
+
+1. **Wait for the monthly cron** (1st of the month at 06:00 UTC): full
+   re-scrape of Bayer + EPA PPLS, then reindex, then image push.
+   Watchtower pulls the new `:latest` automatically.
+2. **Trigger manually** in the Gitea Actions UI →
+   `Monthly corpus refresh` → `Run workflow`. The optional `sources`
+   input limits the run to a single source (e.g., `bayer` only).
+
+## Switching corpus scope
+
+The row-crop filter (corn/soybeans/wheat) is in
+`scrape/sources/epa_ppls.py` as `ROW_CROP_KEYWORDS`. Edit it, push, and
+let the next workflow run pick it up. The same goes for the registrant
+allowlist at `scrape/sources/epa_registrant_allowlist.json`.
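
For orientation, here is a minimal sketch of how a keyword filter and a registrant allowlist of this kind might behave. Everything in it is an assumption for illustration: the keyword values, the function names `is_row_crop_label`, `load_allowlist`, and `is_allowed_registrant`, and the idea that the allowlist file is a flat JSON array of names. The real logic lives in `scrape/sources/epa_ppls.py` and may differ.

```python
import json

# Assumed values for illustration only; the real list is ROW_CROP_KEYWORDS
# in scrape/sources/epa_ppls.py and may contain different entries.
ROW_CROP_KEYWORDS = ("corn", "soybean", "wheat")


def is_row_crop_label(label_text, keywords=ROW_CROP_KEYWORDS):
    """Return True if any keyword occurs in the label text, ignoring case.

    Plain substring matching is used here, so "corn" would also match
    "popcorn"; a real filter may need word-boundary handling.
    """
    text = label_text.lower()
    return any(keyword in text for keyword in keywords)


def load_allowlist(raw_json):
    """Parse a JSON array of registrant names into a normalized set.

    Assumes the allowlist is a flat JSON array of strings (hypothetical).
    """
    return {name.strip().lower() for name in json.loads(raw_json)}


def is_allowed_registrant(registrant, allowlist):
    """Check a registrant name against the normalized allowlist."""
    return registrant.strip().lower() in allowlist
```

Under these assumptions, widening the corpus scope amounts to adding a keyword to the tuple or a name to the JSON array; the next scrape would then keep labels that previously fell outside the filter.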