crop-chem-docs/deploy/drawbar-compose-snippet.md
justin c5ed5560fc
Image rebuild (skip scrape) / build (push) Failing after 1h41m9s
deploy: sensible Dockerfile defaults + simplified compose snippet
Dockerfile now sets OLLAMA_URL=http://ollama:11434 and
RERANK_URL=http://llama-rerank:8080 as image defaults, assuming the
MCP container shares a Docker network with services named `ollama`
and `llama-rerank` (typical compose pattern). Drawbar's stack
already runs both — no cross-host IPs to maintain, no off-stack
GPU dependencies. Stays inside the trashpanda compose.

deploy/drawbar-compose-snippet.md simplified: no environment
overrides needed for the common case. Override block shown only
for stacks with non-default service names. Pull tag updated to
:corpus-2026.05.24.

Per the new architecture call:
- MCP doesn't reach out to cross-host Ollama instances (192.168.0.2,
  192.168.0.125 etc.) at serve time — only at index-build time in CI.
- All serve-time dependencies are in the same Docker network as
  the consumer apps.

Code push touches Dockerfile → image-only.yml will rebuild + push.
Future-me note: the image-only.yml needs Ollama reachable from the
Gitea Actions runner for the reindex step; that still uses the LAN
endpoints (workflow env), which is correct since indexing is CI-side
not serve-side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 13:09:38 -04:00


Drawbar deploy — crop-chem-docs MCP server snippet

Drop this into Drawbar's docker-compose.yml. Targets the existing trashpanda stack: shared Docker network with ollama + llama-rerank service containers, Cloudflare Tunnel out front.

Pre-reqs (one-time on the deploy host)

  1. Log in to the Gitea registry so the host can pull:
    docker login git.jpaul.io -u justin   # enter a PAT as the password
    
  2. ollama and llama-rerank services are already running in the same compose stack on the same Docker network. The MCP container resolves them by service name via Docker's embedded DNS — no IPs to maintain.
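To confirm the second prerequisite, a short reachability check can be run from any container on the shared network. The sketch below is only an illustration: the helper names `can_connect` and `check_dependencies` are hypothetical, and the service names and ports are taken from the image defaults described in this snippet. It tests plain TCP connectivity, not whether the services answer API requests correctly.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Service names and ports assumed from the image defaults in this document.
DEPENDENCIES = {"ollama": 11434, "llama-rerank": 8080}


def check_dependencies(deps=DEPENDENCIES):
    """Map each service name to whether it is reachable by that name."""
    return {name: can_connect(name, port) for name, port in deps.items()}
```

Calling `check_dependencies()` inside a container on the same Docker network should report `True` for both names if the embedded DNS resolves them; run on the host itself, the names will normally not resolve.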

Compose service

services:
  crop-chem-docs:
    image: git.jpaul.io/justin/crop-chem-docs:corpus-2026.05.24
    # :latest for dev / Watchtower auto-pull
    container_name: crop-chem-docs
    restart: unless-stopped
    ports:
      - "8001:8000"   # MCP server (streamable-http). Adjust host port.
    # No environment block needed — the image's defaults handle it:
    #   OLLAMA_URL=http://ollama:11434
    #   RERANK_URL=http://llama-rerank:8080
    #   HYBRID_SEARCH=true
    #   PRODUCT_NAME=crop_chem
    # Override here only if your services have different names.
    networks:
      - default  # or whichever shared network ollama/llama-rerank are on
    labels:
      com.centurylinklabs.watchtower.enable: "true"

If your stack uses non-default service names:

    environment:
      OLLAMA_URL: "http://<your-ollama-service>:11434"
      RERANK_URL: "http://<your-rerank-service>:8080"
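The relationship between image defaults and compose overrides can be sketched as a simple environment lookup. How the server actually reads these variables is not shown here, so treat the code below as an assumption; `effective_urls` and `host_port` are hypothetical helper names.

```python
import os
from urllib.parse import urlsplit

# Defaults as stated for the image; the lookup logic itself is assumed.
DEFAULTS = {
    "OLLAMA_URL": "http://ollama:11434",
    "RERANK_URL": "http://llama-rerank:8080",
}


def effective_urls(env=None):
    """Return each URL, preferring an environment override over the default."""
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}


def host_port(url):
    """Split a service URL into the (hostname, port) Docker DNS must resolve."""
    parts = urlsplit(url)
    return parts.hostname, parts.port
```

With no overrides set, `effective_urls({})` returns the defaults unchanged; setting only `OLLAMA_URL` in the compose file leaves `RERANK_URL` at its default.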

Test from the host

# Verify counts + indexes from inside the container:
docker exec crop-chem-docs python -c \
  "from docs_mcp.server import corpus_status; print(corpus_status())"

What the container exposes

| Tool | What it does |
| --- | --- |
| search_docs | Hybrid+rerank pesticide-label search with optional filters |
| get_page | Full label markdown + metadata by (source, source_key) |
| list_versions | Discover sources, product classes, signal words, registrants |
| corpus_status | Counts + freshness; useful for health probes |
| crop_chem_api_lessons | Curated agronomy / label-handling knowledge; call before recommending |
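Since corpus_status is described as useful for health probes, one way to wrap it is shown below. The return shape of corpus_status is not documented here, so this sketch only distinguishes "the call succeeded" from "the call raised"; `probe` is a hypothetical helper, not part of the server.

```python
def probe(status_fn):
    """Call status_fn and map the outcome to an (exit_code, detail) pair."""
    try:
        status = status_fn()
    except Exception as exc:
        # Any failure to produce a status is treated as unhealthy.
        return 1, f"unhealthy: {exc!r}"
    return 0, f"healthy: {status!r}"
```

Inside the container this could be called as `probe(corpus_status)` after `from docs_mcp.server import corpus_status`, with the first element used as the process exit code.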

Tag scheme

| Tag | When | Use for |
| --- | --- | --- |
| :latest | Every monthly refresh + every code push | Dev / Watchtower auto-pull |
| :<sha12> | Every build | Rollback pin |
| :corpus-YYYY.MM.DD | Every build | Production pin (frozen corpus version) |
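The three tag formats can be derived from a commit SHA and a build date. This is a minimal sketch of the naming scheme only, assuming every build receives all three tags; `image_tags` is a hypothetical helper, the SHA is a placeholder, and the real workflow may compute its tags differently.

```python
from datetime import date


def image_tags(commit_sha, build_date):
    """Return the tags for one build: latest, 12-char SHA, dated corpus pin."""
    return [
        "latest",
        commit_sha[:12],
        build_date.strftime("corpus-%Y.%m.%d"),
    ]


# Placeholder 40-character SHA, not a real commit.
print(image_tags("0123456789abcdef0123456789abcdef01234567", date(2026, 5, 24)))
# prints ['latest', '0123456789ab', 'corpus-2026.05.24']
```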

Updating the corpus

  • Monthly cron: runs on the 1st of each month at 06:00 UTC. Full re-scrape of Bayer + EPA PPLS, reindex, image push. Watchtower pulls the new :latest automatically for containers running that tag.
  • Manual: Gitea Actions UI → Monthly corpus refresh → Run workflow. The optional sources input limits the run to a single source (e.g., bayer only).
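To see when the next scheduled refresh falls, the "1st @ 06:00 UTC" rule can be computed directly. This is an illustrative sketch; `next_refresh` is a hypothetical helper and expects a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def next_refresh(now):
    """Return the next 1st-of-month 06:00 UTC strictly after `now`."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(day=1, hour=6, minute=0, second=0, microsecond=0)
    if candidate > now:
        return candidate
    # This month's slot has passed, so roll over to the following month.
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return candidate.replace(year=year, month=month)


# Example input: 2026-05-24 13:09:38 at UTC-04:00.
example = datetime(2026, 5, 24, 13, 9, 38, tzinfo=timezone(timedelta(hours=-4)))
print(next_refresh(example))
# prints 2026-06-01 06:00:00+00:00
```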

Switching corpus scope

The row-crop filter (corn/soybeans/wheat) is defined in scrape/sources/epa_ppls.py as ROW_CROP_KEYWORDS. To change the scope, edit the list and push; the next workflow run picks up the change. The registrant allowlist at scrape/sources/epa_registrant_allowlist.json is changed the same way.
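The two scope controls can be pictured as a keyword match plus an allowlist lookup. The sketch below is hypothetical throughout: the keyword values, the flat-list JSON layout, and both function names are assumptions for illustration, and the real code in scrape/sources/epa_ppls.py may differ.

```python
import json

# Hypothetical values; the real ROW_CROP_KEYWORDS may differ.
ROW_CROP_KEYWORDS = ("corn", "soybean", "wheat")


def mentions_row_crop(label_text, keywords=ROW_CROP_KEYWORDS):
    """Naive case-insensitive substring match against the keyword list."""
    text = label_text.lower()
    return any(keyword in text for keyword in keywords)


def registrant_allowed(registrant, allowlist_json):
    """Check a registrant against an allowlist, assumed to be a JSON list of names."""
    allowed = {name.lower() for name in json.loads(allowlist_json)}
    return registrant.lower() in allowed
```

A substring match is deliberately simple and would also match words such as "corner", so a real filter may need word boundaries or a crop-site field instead.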