justin 8766d73327 deploy: Drawbar compose snippet — first image is published
Image pushed to git.jpaul.io/justin/crop-chem-docs with three tags:
  :latest             — Watchtower auto-pull target
  :a97107de4636       — commit-sha rollback pin
  :corpus-2026.05.24  — corpus-snapshot pin (prod-recommended)

Drawbar compose snippet at deploy/drawbar-compose-snippet.md.
Wires the container against the existing infra:
  - Ollama pool: 192.168.0.2:11434, 192.168.0.2:11435,
                 192.168.0.125:11434, 10.10.1.65:11434
  - Reranker:    http://10.10.1.65:8082
  - HYBRID_SEARCH=true (production retrieval — BM25 + dense + rerank)
  - Exposes streamable-HTTP MCP on port 8000

Pull path uses git.jpaul.io (public hostname, CF-fronted; pull
response bodies aren't capped). Push path uses 192.168.0.2:1234
(LAN endpoint, bypasses CF 100MB body cap). Same registry,
different URLs — per the template gotcha doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 12:48:24 -04:00


Drawbar deploy — crop-chem-docs MCP server snippet

Drop this into Drawbar's docker-compose.yml. Targets the existing trashpanda infra: Ollama pool on the LAN, llama-rerank container on Tesla P4, Cloudflare Tunnel out front.

Pre-reqs (one-time on the deploy host)

  1. Log in to the Gitea registry so the host can pull:
    docker login git.jpaul.io -u justin   # enter a personal access token (PAT) as the password
    
  2. Ollama embed pool reachable from this host (already up):
    • 192.168.0.2:11434, 192.168.0.2:11435 (Gitea-host GPUs)
    • 192.168.0.125:11434 (Windows GPU)
    • 10.10.1.65:11434 (same host as the reranker; listed in OLLAMA_URL below)
  3. Reranker reachable (already up on trashpanda):
    • http://10.10.1.65:8082
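Before starting the container, it can help to confirm that every endpoint answers on its port. The following is a minimal sketch, not part of the repo: it only opens a TCP connection to each address taken from the compose service's OLLAMA_URL and RERANK_URL values, so it shows that a port is open, not that Ollama or the reranker is healthy. The function names are illustrative.

```python
import socket
from urllib.parse import urlsplit

# Endpoints taken from OLLAMA_URL and RERANK_URL in the compose service.
ENDPOINTS = [
    "http://192.168.0.2:11434",
    "http://192.168.0.2:11435",
    "http://192.168.0.125:11434",
    "http://10.10.1.65:11434",
    "http://10.10.1.65:8082",
]


def host_port(url: str) -> tuple:
    """Split an http(s) URL into (host, port), defaulting the port by scheme."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host:port succeeds."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(urls=ENDPOINTS, timeout: float = 2.0) -> dict:
    """Map each URL to its reachability result."""
    return {url: is_reachable(url, timeout) for url in urls}
```

Calling `check_all()` on the deploy host and printing the result gives a quick pass/fail per endpoint.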

Compose service

services:
  crop-chem-docs:
    image: git.jpaul.io/justin/crop-chem-docs:latest
    # Or pin to an immutable tag for prod:
    # image: git.jpaul.io/justin/crop-chem-docs:corpus-2026.05.24
    container_name: crop-chem-docs
    restart: unless-stopped
    ports:
      - "8001:8000"   # MCP server (streamable-http). Adjust host port.
    environment:
      # Embedder pool. Round-robined for parallel search.
      OLLAMA_URL: "http://192.168.0.2:11434,http://192.168.0.2:11435,http://192.168.0.125:11434,http://10.10.1.65:11434"
      # Reranker on trashpanda's Tesla P4.
      RERANK_URL: "http://10.10.1.65:8082"
      # Production retrieval: BM25 + dense fused, then reranked.
      HYBRID_SEARCH: "true"
      # Override docs URL shown to the LLM if needed (default is EPA PPLS portal).
      # PRODUCT_DOCS_URL: "https://..."
    labels:
      # Watchtower auto-pulls :latest on update.
      com.centurylinklabs.watchtower.enable: "true"

  # Optional: Watchtower already runs elsewhere, so no extra service is
  # needed here. To have it auto-update this container too, keep the
  # label above set to "true".
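The OLLAMA_URL value is a comma-separated list that the compose comment describes as round-robined. How the server actually does this is not shown here, so the following is only a sketch of one plausible approach: split the string, then hand out URLs in rotation. `parse_pool`, `RoundRobinPool`, and `pool_from_env` are hypothetical names, not the project's API.

```python
import itertools
import os


def parse_pool(raw: str) -> list:
    """Split a comma-separated URL list, dropping blanks and trailing slashes."""
    return [u.strip().rstrip("/") for u in raw.split(",") if u.strip()]


class RoundRobinPool:
    """Hand out URLs in a fixed rotation (hypothetical helper)."""

    def __init__(self, urls):
        self._urls = list(urls)
        if not self._urls:
            raise ValueError("pool needs at least one URL")
        self._cycle = itertools.cycle(self._urls)

    def next_url(self) -> str:
        """Return the next URL in the rotation."""
        return next(self._cycle)


def pool_from_env(var: str = "OLLAMA_URL") -> RoundRobinPool:
    """Build a pool from an environment variable such as OLLAMA_URL."""
    return RoundRobinPool(parse_pool(os.environ.get(var, "")))
```

With the four URLs from the compose file, the fifth call to `next_url()` wraps back to the first endpoint. A real implementation would probably also skip endpoints that are down, which this sketch does not attempt.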

Test from the host

# Quick probe of the HTTP transport. The path depends on how the server is
# mounted: the Python MCP SDK serves streamable-http at /mcp by default,
# while /sse belongs to the older SSE transport. Use whichever endpoint
# your MCP client expects.
curl -s http://localhost:8001/sse

# Or exec into the container and run the stdio transport. Use -i without -t:
# docker refuses to allocate a TTY when stdin is redirected from /dev/null.
docker exec -i crop-chem-docs \
  python -m docs_mcp.server --transport stdio < /dev/null
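A plain GET tells you little about a streamable-HTTP MCP server, because the protocol starts with a JSON-RPC `initialize` request sent as a POST. The sketch below builds that request with the standard library. The `/mcp` path, the client name, and the protocol version string are assumptions; check them against the server's actual configuration before relying on the result.

```python
import json
import urllib.request


def build_initialize_request(base_url: str, path: str = "/mcp") -> urllib.request.Request:
    """Build a JSON-RPC `initialize` POST for a streamable-HTTP MCP endpoint."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; use what the server supports
            "capabilities": {},
            "clientInfo": {"name": "drawbar-smoke-test", "version": "0.0.1"},
        },
    }
    return urllib.request.Request(
        base_url.rstrip("/") + path,  # path is an assumption, see lead-in
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients must accept both response styles.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> tuple:
    """Send the request and return (HTTP status, Content-Type header)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.headers.get("Content-Type")
```

On the deploy host, `send(build_initialize_request("http://localhost:8001"))` should come back with a 200 status if the server is up and the path is right. A 404 suggests the endpoint path differs from the assumed one.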

What the container exposes

| Tool | What it does |
| --- | --- |
| `search_docs` | Hybrid+rerank pesticide-label search with optional filters |
| `get_page` | Full label markdown + metadata by (source, source_key) |
| `list_versions` | Discover sources, product classes, signal words, registrants |
| `corpus_status` | Counts + freshness; useful for health probes |
| `crop_chem_api_lessons` | Curated agronomy/label-handling knowledge; call before recommending |

Versioning

Tags published by the Gitea Actions workflows:

| Tag | When | Use for |
| --- | --- | --- |
| `:latest` | Every monthly refresh + every code push | Dev / Watchtower auto-pull |
| `:<sha12>` | Every build | Rollback pin |
| `:corpus-YYYY.MM.DD` | Every build | Pin to a specific corpus snapshot in prod |

The :corpus-YYYY.MM.DD tag is the right one for production: it guarantees the running container has a known, frozen corpus that matches the labels you've validated against.
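To make the three tag shapes concrete, here is a small sketch that derives them from a commit SHA and a corpus date. It illustrates the naming scheme only; the real tags are produced by the Gitea Actions workflows, whose implementation is not shown here. `image_tags` is a hypothetical helper.

```python
from datetime import date

REPO = "git.jpaul.io/justin/crop-chem-docs"


def image_tags(commit_sha: str, corpus_date: date) -> list:
    """Return the three image references for one build."""
    sha12 = commit_sha[:12]  # first 12 hex characters of the commit SHA
    return [
        f"{REPO}:latest",
        f"{REPO}:{sha12}",
        f"{REPO}:corpus-{corpus_date:%Y.%m.%d}",
    ]
```

For a commit whose SHA starts with `a97107de4636` and a corpus dated 2026-05-24, this yields `:latest`, `:a97107de4636`, and `:corpus-2026.05.24`, matching the tags of the first published image.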

Updating the corpus

Two paths:

  1. Wait for the monthly cron (1st of the month at 06:00 UTC): a full re-scrape of Bayer + EPA PPLS, then a reindex, then an image push. Watchtower pulls the new :latest automatically.
  2. Trigger manually in the Gitea Actions UI → Monthly corpus refresh → Run workflow. The optional sources input allows a single-source refresh (e.g., bayer only).
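To see when the next automatic refresh is due, the schedule can be computed directly. This sketch assumes the workflow's cron is equivalent to `0 6 1 * *` in UTC, which matches the description above but has not been checked against the workflow file. `next_refresh` is a hypothetical helper.

```python
from datetime import datetime, timezone


def next_refresh(now: datetime) -> datetime:
    """Return the next 1st-of-month 06:00 UTC strictly after `now`.

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(day=1, hour=6, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # This month's run has already happened; roll over to next month.
        if now.month == 12:
            year, month = now.year + 1, 1
        else:
            year, month = now.year, now.month + 1
        candidate = candidate.replace(year=year, month=month)
    return candidate
```

For example, `next_refresh(datetime.now(timezone.utc))` gives the upcoming run time. If that is too far off, the manual trigger in option 2 is the faster path.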

Switching corpus scope

The row-crop filter (corn/soybeans/wheat) is in scrape/sources/epa_ppls.py as ROW_CROP_KEYWORDS. Edit it, push, and let the next workflow run pick up the change. The same applies to the registrant allowlist at scrape/sources/epa_registrant_allowlist.json.
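For orientation, a keyword filter of this kind might look like the sketch below. The contents of `ROW_CROP_KEYWORDS` and the matching logic are assumptions for illustration only; the real definition in scrape/sources/epa_ppls.py may be structured differently. `matches_row_crop` is a hypothetical name.

```python
import re

# Hypothetical contents; the real list lives in scrape/sources/epa_ppls.py.
ROW_CROP_KEYWORDS = ("corn", "soybean", "wheat")


def matches_row_crop(label_text: str, keywords=ROW_CROP_KEYWORDS) -> bool:
    """Return True if the text mentions any keyword as a whole word.

    An optional trailing "s" is allowed so plurals such as "soybeans" match.
    """
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = r"\b(?:" + alternatives + r")s?\b"
    return re.search(pattern, label_text, flags=re.IGNORECASE) is not None
```

Under these assumptions, widening the scope to another crop would mean adding one keyword to the tuple, then pushing so the next workflow run re-scrapes with the new filter.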