Drawbar's compose doesn't have a rerank service today: the llama-rerank container I spun up earlier was a standalone docker run, not a compose service. For Docker DNS resolution (http://llama-rerank:8080) to work between MCP + reranker, both need to be siblings in the same compose stack.

Added the llama-rerank service entry with:
- :server-cuda image (CUDA-built llama.cpp; the plain :server is CPU-only and 25× slower for our 50-doc rerank pool)
- -ngl 99 to offload all layers to GPU
- deploy.resources.reservations.devices block for compose v3 GPU passthrough (preferred over the older `runtime: nvidia` syntax)
- volume for the HuggingFace model cache so first-start GGUF download survives container recreates
- no host port mapping (internal-network-only)

Tesla P4 compatibility notes inline: Pascal (CC 6.1) is in the :server-cuda image's compute-arch list (500-1200) so no special handling beyond the standard compose entry.

Also: cleanup instruction to docker rm -f the standalone llama-rerank from the earlier setup before bringing up compose (name collision).

And: noted that if trashpanda's existing Ollama is a host-mode process rather than a compose service, the MCP needs host.docker.internal override (snippet included).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drawbar deploy — crop-chem-docs MCP server snippet
Drop these two services into Drawbar's docker-compose.yml. Targets
the trashpanda stack: shared Docker network with the existing
Drawbar services + the Cloudflare Tunnel.
Pre-reqs (one-time on the deploy host)
- Docker login to the Gitea registry:
      docker login git.jpaul.io -u justin   # PAT for password
- NVIDIA Container Toolkit: already installed on trashpanda (the existing standalone `llama-rerank` container ran with `--gpus all` fine).
- If a standalone `llama-rerank` container is already running (left over from earlier setup), remove it so the compose service can bind the same name: `docker rm -f llama-rerank`
Compose services
services:
  # ---- Reranker sidecar -----------------------------------------
  # jina-reranker-v2-base-multilingual via llama.cpp on the Tesla P4.
  # Internal port only (no host port mapping needed; the MCP reaches
  # it via Docker DNS). ~280 MB GPU VRAM at idle, ~500 MB during a
  # 50-doc rerank. Co-exists fine with any other GPU users on the P4.
  llama-rerank:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    container_name: llama-rerank
    restart: unless-stopped
    command:
      - "-hf"
      - "gpustack/jina-reranker-v2-base-multilingual-GGUF:Q8_0"
      - "--reranking"
      - "--host"
      - "0.0.0.0"
      - "--port"
      - "8080"
      - "-ngl"
      - "99"  # offload all layers to GPU
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    # Model cache survives container recreates; first start downloads
    # the GGUF (~280 MB) from HuggingFace.
    volumes:
      - llama-rerank-cache:/root/.cache/huggingface
    networks:
      - default

  # ---- MCP server ------------------------------------------------
  crop-chem-docs:
    image: git.jpaul.io/justin/crop-chem-docs:corpus-2026.05.24
    # :latest for dev / Watchtower auto-pull
    container_name: crop-chem-docs
    restart: unless-stopped
    ports:
      - "8001:8000"  # MCP server (streamable-http). Adjust host port.
    # No environment block needed; the image's defaults handle it:
    #   OLLAMA_URL=http://ollama:11434
    #   RERANK_URL=http://llama-rerank:8080
    #   HYBRID_SEARCH=true
    #   PRODUCT_NAME=crop_chem
    # Override here only if your services have different names.
    depends_on:
      - llama-rerank
    networks:
      - default
    labels:
      com.centurylinklabs.watchtower.enable: "true"

volumes:
  llama-rerank-cache:
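To check the reranker on its own before involving the MCP server, a minimal Python sketch follows. It assumes the llama.cpp server started with `--reranking` serves a Jina-style `POST /v1/rerank` endpoint that takes `query` and `documents` and returns `results` entries carrying `index` and `relevance_score`; verify this against the server version in the image. The helper names (`build_rerank_request`, `order_by_relevance`, `rerank`) are hypothetical and not part of the crop-chem-docs code. Because no host port is mapped, it has to run from a container on the same compose network.

```python
import json
import urllib.request

# Image default quoted in this doc; only resolvable inside the compose network.
RERANK_URL = "http://llama-rerank:8080"


def build_rerank_request(query, documents, top_n=None):
    """Build the JSON body for a rerank call (field names are an assumption)."""
    body = {"query": query, "documents": list(documents)}
    if top_n is not None:
        body["top_n"] = top_n
    return body


def order_by_relevance(documents, results):
    """Pair each document with its score and sort best-first."""
    scored = [(documents[r["index"]], r["relevance_score"]) for r in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def rerank(query, documents, base_url=RERANK_URL, timeout=30):
    """POST to the reranker and return (document, score) pairs, best first."""
    payload = json.dumps(build_rerank_request(query, documents)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/rerank",  # assumed endpoint path
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        results = json.load(resp)["results"]
    return order_by_relevance(documents, results)
```

Calling `rerank("some query", ["doc one", "doc two"])` from inside the `crop-chem-docs` container should return both documents with scores if the sidecar is healthy; a connection error usually means the two services are not on the same network.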
Note on the existing ollama service
The Dockerfile default is `OLLAMA_URL=http://ollama:11434`, which
assumes there's an `ollama` service in the same compose stack. If
trashpanda's Ollama runs directly on the host (not as a compose service),
override the env in the crop-chem-docs block:
    environment:
      OLLAMA_URL: "http://host.docker.internal:11434"
    extra_hosts:
      - "host.docker.internal:host-gateway"
Or just add Ollama itself to the compose stack as a sibling service.
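To confirm which Ollama endpoint the container will use and whether it answers, a small sketch is shown here. It assumes the server reads `OLLAMA_URL` from the environment and falls back to the Dockerfile default; how the image actually resolves the value is not shown in this doc. It also relies on Ollama's `GET /api/tags` model-listing endpoint as a cheap liveness probe. `resolve_ollama_url` and `ollama_reachable` are hypothetical helper names.

```python
import os
import urllib.error
import urllib.request

DEFAULT_OLLAMA_URL = "http://ollama:11434"  # Dockerfile default per this doc


def resolve_ollama_url(env=None):
    """Return OLLAMA_URL from the environment, else the image default."""
    env = os.environ if env is None else env
    return (env.get("OLLAMA_URL") or DEFAULT_OLLAMA_URL).rstrip("/")


def ollama_reachable(base_url, timeout=3.0):
    """True if GET /api/tags answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Running `ollama_reachable(resolve_ollama_url())` inside the `crop-chem-docs` container returning `False` with the default URL is a hint that Ollama lives on the host and the `host.docker.internal` override is needed.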
Test once both are up
docker compose up -d llama-rerank crop-chem-docs
# Wait ~10s for both to come up, then:
docker exec crop-chem-docs python -c \
"from docs_mcp.server import corpus_status; print(corpus_status())"
Expect: # crop-chem-docs corpus status, 4,159 labels, 216,467
chunks, BM25 db present, RERANK_URL=http://llama-rerank:8080,
HYBRID_SEARCH=on.
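The fixed ~10s wait can be fragile on a first start, when the reranker still has to download the GGUF. One alternative is a polling loop, sketched below under the assumption that you supply your own readiness probe; `wait_until` is a hypothetical helper, not something shipped in the image.

```python
import time


def wait_until(probe, timeout=60.0, interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns truthy or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # treat errors (connection refused, etc.) as "not ready yet"
        if clock() >= deadline:
            return False
        sleep(interval)
```

A probe could be any zero-argument callable, for example one that opens `http://llama-rerank:8080/health` and checks for a 200 response (that URL assumes the llama.cpp server's health endpoint and the service name used above).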
Then a live search to verify hybrid+rerank:
docker exec crop-chem-docs python -c \
"from docs_mcp.server import search_docs; print(search_docs('soybean herbicide for waterhemp', k=2))"
Expect: 2 hits with Sencor/Tackle/Warrant in top-2, mode=hybrid-rrf+rerank in the header.
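The `hybrid-rrf` part of the mode string suggests that the BM25 and embedding rankings are merged with reciprocal rank fusion before reranking. The server's own implementation is not shown here, so the following is only a generic sketch of the technique; the function name `rrf_fuse` and the constant `k=60` (a commonly used value) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids; score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Placeholder doc ids, not real corpus entries.
bm25_ranking = ["a", "b", "c"]
dense_ranking = ["b", "d", "a"]
fused = rrf_fuse([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'd', 'c']
```

Documents that appear high in both lists (`b`, `a`) outrank those found by only one retriever, which is why a hybrid hit list can differ from either ranking alone.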
What the MCP container exposes
| Tool | What it does |
|---|---|
| `search_docs` | Hybrid+rerank pesticide-label search with optional filters |
| `get_page` | Full label markdown + metadata by (source, source_key) |
| `list_versions` | Discover sources, product classes, signal words, registrants |
| `corpus_status` | Counts + freshness; useful for health probes |
| `crop_chem_api_lessons` | Curated agronomy / label-handling knowledge; call before recommending |
Tag scheme
| Tag | When | Use for |
|---|---|---|
| `:latest` | Every monthly refresh + every code push | Dev / Watchtower auto-pull |
| `:<sha12>` | Every build | Rollback pin |
| `:corpus-YYYY.MM.DD` | Every build | Production pin (frozen corpus version) |
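As an illustration of the scheme in the table, the tags for one build could be derived as sketched below. This assumes `<sha12>` means the first 12 characters of the commit SHA and that the corpus date is the build date; the real CI workflow is not shown in this doc, and `image_tags` is a hypothetical helper.

```python
from datetime import date


def image_tags(commit_sha, build_date):
    """Return the three tags from the table for one build."""
    if len(commit_sha) < 12:
        raise ValueError("need at least 12 characters of the commit SHA")
    return [
        "latest",
        commit_sha[:12],                   # assumed meaning of <sha12>
        f"corpus-{build_date:%Y.%m.%d}",   # e.g. corpus-2026.05.24
    ]


# Placeholder SHA for illustration only.
print(image_tags("0123456789abcdef0123456789abcdef01234567", date(2026, 5, 24)))
# ['latest', '0123456789ab', 'corpus-2026.05.24']
```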
Updating the corpus
- Monthly cron: 1st @ 06:00 UTC, full re-scrape of Bayer + EPA PPLS, reindex, image push. Watchtower pulls the new `:latest` automatically.
- Manual: Gitea Actions UI → `Monthly corpus refresh` → `Run workflow`. Optional `sources` input for single-source refresh (e.g., `bayer` only).
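To see when the next scheduled refresh will land, for instance when deciding whether a manual run is worth it, the schedule can be computed as sketched here. This assumes "1st @ 06:00 UTC" corresponds to a cron expression like `0 6 1 * *`; the actual workflow file is not shown in this doc, and `next_monthly_refresh` is a hypothetical helper.

```python
from datetime import datetime, timezone


def next_monthly_refresh(now):
    """Next 1st-of-month 06:00 UTC strictly after `now` (pass a timezone-aware datetime)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(day=1, hour=6, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # This month's slot has passed; roll over to the 1st of next month.
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        candidate = candidate.replace(year=year, month=month)
    return candidate


print(next_monthly_refresh(datetime(2026, 5, 24, 12, 0, tzinfo=timezone.utc)))
# 2026-06-01 06:00:00+00:00
```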