# Drawbar deploy: `crop-chem-docs` MCP server snippet

Drop this into Drawbar's `docker-compose.yml`. It targets the existing trashpanda
infra: Ollama pool on the LAN, `llama-rerank` container on a Tesla P4, and
Cloudflare Tunnel out front.

## Pre-reqs (one-time on the deploy host)

1. **Log in to the Gitea registry** so the host can pull:
   ```bash
   docker login git.jpaul.io -u justin   # PAT for password
   ```
2. **Ollama embed pool** reachable from this host (already up):
   - `192.168.0.2:11434`, `192.168.0.2:11435` (Gitea-host GPUs)
   - `192.168.0.125:11434` (Windows GPU)
3. **Reranker** reachable (already up on trashpanda):
   - `http://10.10.1.65:8082`

## Compose service

```yaml
services:
  crop-chem-docs:
    image: git.jpaul.io/justin/crop-chem-docs:latest
    # Or pin to an immutable tag for prod:
    # image: git.jpaul.io/justin/crop-chem-docs:corpus-2026.05.24
    container_name: crop-chem-docs
    restart: unless-stopped
    ports:
      - "8001:8000"   # MCP server (streamable-http). Adjust host port.
    environment:
      # Embedder pool. Round-robined for parallel search.
      OLLAMA_URL: "http://192.168.0.2:11434,http://192.168.0.2:11435,http://192.168.0.125:11434,http://10.10.1.65:11434"
      # Reranker on trashpanda's Tesla P4.
      RERANK_URL: "http://10.10.1.65:8082"
      # Production retrieval: BM25 + dense fused, then reranked.
      HYBRID_SEARCH: "true"
      # Override docs URL shown to the LLM if needed (default is EPA PPLS portal).
      # PRODUCT_DOCS_URL: "https://..."
    labels:
      # Watchtower auto-pulls :latest on update.
      com.centurylinklabs.watchtower.enable: "true"
      # Watchtower already runs elsewhere; to have it auto-update this
      # container too, keep the label above set to "true".
```

## Test from the host

```bash
# Reachability probe over MCP's HTTP transport (listing tools needs a real
# MCP client; adjust if you have a different probe handy):
curl -s http://localhost:8001/sse     # or whichever endpoint your
                                      # client expects from streamable-http

# Or exec into the container and run the stdio transport.
# Use -i without -t: stdin is redirected, so allocating a TTY would fail.
docker exec -i crop-chem-docs \
  python -m docs_mcp.server --transport stdio < /dev/null
```

## What the container exposes

| Tool | What it does |
|---|---|
| `search_docs` | Hybrid+rerank pesticide-label search with optional filters |
| `get_page` | Full label markdown + metadata by `(source, source_key)` |
| `list_versions` | Discover sources, product classes, signal words, registrants |
| `corpus_status` | Counts + freshness; useful for health probes |
| `crop_chem_api_lessons` | Curated agronomy/label-handling knowledge; call before recommending |

## Versioning

Tags published by the Gitea Actions workflows:

| Tag | When | Use for |
|---|---|---|
| `:latest` | Every monthly refresh + every code push | Dev / Watchtower auto-pull |
| `:` | Every build | Rollback pin |
| `:corpus-YYYY.MM.DD` | Every build | Pin to a specific corpus snapshot in prod |

The `:corpus-YYYY.MM.DD` tag is the right one for production: it guarantees the
running container has a known, frozen corpus that matches the labels you've
validated against.

## Updating the corpus

Two paths:

1. **Wait for the monthly cron**: 1st of the month at 06:00 UTC, full re-scrape
   of Bayer + EPA PPLS, then reindex, then image push. Watchtower pulls the new
   `:latest` automatically.
2. **Trigger manually** in the Gitea Actions UI → `Monthly corpus refresh` →
   `Run workflow`. Optional `sources` input for a single-source refresh
   (e.g., `bayer` only).

## Switching corpus scope

The row-crop filter (corn/soybeans/wheat) is in `scrape/sources/epa_ppls.py` as
`ROW_CROP_KEYWORDS`. Edit it, push, and let the next workflow run pick it up.
The same goes for the registrant allowlist at
`scrape/sources/epa_registrant_allowlist.json`.
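
## Pre-flight check (optional)

The pre-reqs assume the embed pool and the reranker are already reachable from the deploy host. Below is a minimal sketch of one way to confirm that before `docker compose up`. The function names (`split_pool`, `probe`, `check_pool`) are hypothetical helpers, not part of the image. Any HTTP response is treated as "reachable", because the services' health endpoints are not documented here.

```bash
#!/usr/bin/env bash
# Pre-flight sketch: confirm the backends named in the compose environment
# answer over HTTP. Helper names are illustrative, not part of the image.

# Split a comma-separated URL list (the OLLAMA_URL format) into one URL per line.
split_pool() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Probe one URL. Without -f, curl exits 0 on any HTTP response, so a 404
# still counts as reachable; only connection errors and timeouts fail.
probe() {
  if curl -s -o /dev/null --max-time 3 "$1"; then
    printf 'ok   %s\n' "$1"
  else
    printf 'FAIL %s\n' "$1"
    return 1
  fi
}

# Probe every URL in a comma-separated list; non-zero if any one fails.
check_pool() {
  local rc=0 url
  for url in $(split_pool "$1"); do
    probe "$url" || rc=1
  done
  return "$rc"
}

# Example (values copied from the compose service above):
#   check_pool "http://192.168.0.2:11434,http://192.168.0.2:11435,http://192.168.0.125:11434" \
#     && probe "http://10.10.1.65:8082"
```

Source the file and run the commented example. A `FAIL` line points at the backend to fix before starting the container. The sketch only shows that each host and port answers; it does not verify that the embedding or rerank models are loaded.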