justin e5da4b21b0 deploy: add llama-rerank service to compose snippet

Drawbar's compose doesn't have a rerank service today — the
llama-rerank container I spun up earlier was a standalone
docker run, not a compose service. For Docker DNS resolution
(http://llama-rerank:8080) to work between MCP + reranker, both
need to be siblings in the same compose stack.

Added the llama-rerank service entry with:
- :server-cuda image (CUDA-built llama.cpp; the plain :server is
  CPU-only and 25× slower for our 50-doc rerank pool)
- -ngl 99 to offload all layers to GPU
- deploy.resources.reservations.devices block for compose v3 GPU
  passthrough (preferred over the older `runtime: nvidia` syntax)
- volume for the HuggingFace model cache so first-start GGUF
  download survives container recreates
- no host port mapping — internal-network-only

Tesla P4 compatibility notes inline: Pascal (CC 6.1) is in the
:server-cuda image's compute-arch list (500-1200) so no special
handling beyond the standard compose entry.

Also: cleanup instruction to docker rm -f the standalone
llama-rerank from the earlier setup before bringing up compose
(name collision).

And: noted that if trashpanda's existing Ollama is a host-mode
process rather than a compose service, the MCP needs
host.docker.internal override (snippet included).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-24 13:25:34 -04:00

Drawbar deploy — `crop-chem-docs` MCP server snippet

Drop these two services into Drawbar's docker-compose.yml. The snippet targets the trashpanda stack, where both services share a Docker network with the existing Drawbar services and the Cloudflare Tunnel.

Pre-reqs (one-time on the deploy host)

1. Docker login to the Gitea registry:
   ```
   docker login git.jpaul.io -u justin   # use a PAT as the password
   ```
2. NVIDIA Container Toolkit: already installed on trashpanda (the existing standalone llama-rerank container ran fine with --gpus all).
3. If a standalone llama-rerank container is still running (left over from the earlier setup), remove it so the compose service can claim the same container name:
   ```
   docker rm -f llama-rerank
   ```

Compose services

services:

  # ---- Reranker sidecar -----------------------------------------
  # jina-reranker-v2-base-multilingual via llama.cpp on the Tesla P4.
  # Internal port only (no host port mapping needed — the MCP reaches
  # it via Docker DNS). ~280 MB GPU VRAM at idle, ~500 MB during a
  # 50-doc rerank. Co-exists fine with any other GPU users on the P4.
  llama-rerank:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    container_name: llama-rerank
    restart: unless-stopped
    command:
      - "-hf"
      - "gpustack/jina-reranker-v2-base-multilingual-GGUF:Q8_0"
      - "--reranking"
      - "--host"
      - "0.0.0.0"
      - "--port"
      - "8080"
      - "-ngl"
      - "99"            # offload all layers to GPU
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    # Model cache survives container recreates; first start downloads
    # the GGUF (~280 MB) from HuggingFace.
    volumes:
      - llama-rerank-cache:/root/.cache/huggingface
    networks:
      - default

  # ---- MCP server ------------------------------------------------
  crop-chem-docs:
    image: git.jpaul.io/justin/crop-chem-docs:corpus-2026.05.24
    # :latest for dev / Watchtower auto-pull
    container_name: crop-chem-docs
    restart: unless-stopped
    ports:
      - "8001:8000"   # MCP server (streamable-http). Adjust host port.
    # No environment block needed — the image's defaults handle it:
    #   OLLAMA_URL=http://ollama:11434
    #   RERANK_URL=http://llama-rerank:8080
    #   HYBRID_SEARCH=true
    #   PRODUCT_NAME=crop_chem
    # Override here only if your services have different names.
    depends_on:
      - llama-rerank
    networks:
      - default
    labels:
      com.centurylinklabs.watchtower.enable: "true"

volumes:
  llama-rerank-cache:
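
For orientation, here is a minimal Python sketch of how a client inside the compose network might call the reranker sidecar. It assumes llama.cpp's server accepts rerank requests at `/v1/rerank` and returns a `results` list whose entries carry `index` and `relevance_score`. The helper names (`build_rerank_payload`, `order_by_relevance`, `rerank`) are hypothetical and are not taken from the crop-chem-docs image.

```python
import json
import urllib.request

# Assumed default, matching RERANK_URL in the compose comments above.
RERANK_URL = "http://llama-rerank:8080"


def build_rerank_payload(query, documents):
    """Build the JSON body for a rerank request (assumed /v1/rerank schema)."""
    return {"query": query, "documents": list(documents)}


def order_by_relevance(documents, results):
    """Return (document, score) pairs sorted by descending relevance score."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


def rerank(query, documents, base_url=RERANK_URL, timeout=30):
    """POST to the reranker; the hostname only resolves inside the compose network."""
    body = json.dumps(build_rerank_payload(query, documents)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        results = json.load(resp)["results"]
    return order_by_relevance(documents, results)
```

Because there is no host port mapping, `rerank(...)` would have to run from a container on the same network, for example through `docker exec crop-chem-docs python`.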

Note on the existing `ollama` service

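The Dockerfile default is OLLAMA_URL=http://ollama:11434, which assumes there is an ollama service in the same compose stack. If trashpanda's Ollama is a host-mode process (not a compose service), override the env in the crop-chem-docs block. Ollama listens on 127.0.0.1 by default, so the host process must also be bound to an address the container can reach (for example via OLLAMA_HOST=0.0.0.0):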

    environment:
      OLLAMA_URL: "http://host.docker.internal:11434"
    extra_hosts:
      - "host.docker.internal:host-gateway"

Or just add Ollama itself to the compose stack as a sibling service.
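
A minimal sketch of the override mechanism, assuming the server reads its endpoints from environment variables and falls back to the defaults listed in the compose comments. The function name `resolve_endpoints` is hypothetical; the image's actual configuration code is not shown in this document.

```python
import os

# Defaults as listed in the compose comments; treated here as assumptions
# about how the image resolves its endpoints.
DEFAULTS = {
    "OLLAMA_URL": "http://ollama:11434",
    "RERANK_URL": "http://llama-rerank:8080",
}


def resolve_endpoints(env=None):
    """Return endpoint URLs, preferring values set in the environment."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```

With the override above, `resolve_endpoints({"OLLAMA_URL": "http://host.docker.internal:11434"})` would point Ollama traffic at the host while leaving the reranker URL at its default.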

Test once both are up

docker compose up -d llama-rerank crop-chem-docs

# Wait ~10s for both to come up, then:
docker exec crop-chem-docs python -c \
  "from docs_mcp.server import corpus_status; print(corpus_status())"

Expect: # crop-chem-docs corpus status, 4,159 labels, 216,467 chunks, BM25 db present, RERANK_URL=http://llama-rerank:8080, HYBRID_SEARCH=on.
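
The fixed ~10s wait in the commands above can be replaced by polling for readiness, which matters on a first start while the GGUF is still downloading. The sketch below assumes llama.cpp's server answers `GET /health` with HTTP 200 once the model is loaded. `http_ok` and `wait_until_ready` are hypothetical helper names, and the URL only resolves from inside the compose network.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or a non-2xx status such as 503.
        return False


def wait_until_ready(probe, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Call `probe()` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Usage, from a container on the same network: `wait_until_ready(lambda: http_ok("http://llama-rerank:8080/health"))`.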

Then a live search to verify hybrid+rerank:

docker exec crop-chem-docs python -c \
  "from docs_mcp.server import search_docs; print(search_docs('soybean herbicide for waterhemp', k=2))"

Expect: 2 hits with Sencor/Tackle/Warrant in top-2, mode=hybrid-rrf+rerank in the header.
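
The `hybrid-rrf` part of the mode string suggests reciprocal rank fusion (RRF) of the lexical (BM25) and vector rankings ahead of the rerank step. The sketch below is a generic RRF implementation, not the image's code. The constant `rrf_k=60` is the value commonly used in the RRF literature and is an assumption here, and the label IDs are made up.

```python
from collections import defaultdict


def rrf_fuse(rankings, rrf_k=60):
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (rrf_k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["label-a", "label-b", "label-c"]    # hypothetical BM25 order
vector_hits = ["label-b", "label-d", "label-a"]  # hypothetical vector order
fused = rrf_fuse([bm25_hits, vector_hits])
```

Documents that appear near the top of both lists accumulate the largest fused score, so in a pipeline like this the fused list would be the candidate pool handed to the reranker.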

What the MCP container exposes

| Tool | What it does |
|---|---|
| `search_docs` | Hybrid+rerank pesticide-label search with optional filters |
| `get_page` | Full label markdown + metadata by `(source, source_key)` |
| `list_versions` | Discover sources, product classes, signal words, registrants |
| `corpus_status` | Counts + freshness; useful for health probes |
| `crop_chem_api_lessons` | Curated agronomy / label-handling knowledge; call before recommending |
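
A client reaches these tools through MCP's JSON-RPC `tools/call` method. The sketch below only builds the request body. The argument names `query` and `k` are assumptions inferred from the `search_docs(...)` call in the test section, and `build_tool_call` is a hypothetical helper. A real client would normally use an MCP SDK, which also handles session initialization over streamable HTTP.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids within a session


def build_tool_call(tool_name, arguments):
    """Return a JSON-RPC 2.0 request body for an MCP `tools/call`."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    "search_docs",
    {"query": "soybean herbicide for waterhemp", "k": 2},  # assumed arg names
)
body = json.dumps(request)
```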

Tag scheme

| Tag | When | Use for |
|---|---|---|
| `:latest` | Every monthly refresh + every code push | Dev / Watchtower auto-pull |
| `:<sha12>` | Every build | Rollback pin |
| `:corpus-YYYY.MM.DD` | Every build | Production pin (frozen corpus version) |
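
As an illustration of the scheme, the following sketch derives the three tags for one build. It assumes `<sha12>` means the first 12 characters of the commit SHA and that the corpus date is the build date. `image_tags` and the example SHA are hypothetical; the actual CI workflow is not shown in this document.

```python
from datetime import date

IMAGE = "git.jpaul.io/justin/crop-chem-docs"


def image_tags(commit_sha, build_date):
    """Return the full image references a single build would push."""
    return [
        f"{IMAGE}:latest",
        f"{IMAGE}:{commit_sha[:12]}",              # assumed meaning of <sha12>
        f"{IMAGE}:corpus-{build_date:%Y.%m.%d}",   # frozen corpus version
    ]


tags = image_tags("0123456789abcdef0123456789abcdef01234567", date(2026, 5, 24))
```

Pinning production to the `corpus-` tag, as the compose entry above does, keeps the corpus frozen until the tag is changed by hand.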

Updating the corpus

- Monthly cron: runs on the 1st of each month at 06:00 UTC. It does a full re-scrape of Bayer + EPA PPLS, reindexes, and pushes the image. Watchtower pulls the new :latest automatically.
- Manual: Gitea Actions UI → Monthly corpus refresh → Run workflow. The optional sources input allows a single-source refresh (e.g., bayer only).
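
The monthly schedule corresponds to the cron expression `0 6 1 * *` evaluated in UTC. Below is a small sketch for computing the next scheduled refresh, which can help when judging how stale `:latest` might be. `next_refresh` is a hypothetical helper and not part of the workflow.

```python
from datetime import datetime, timezone


def next_refresh(now):
    """Return the next 1st-of-month 06:00 UTC strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(day=1, hour=6, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # This month's slot has passed; roll over to the next month.
        if now.month == 12:
            candidate = candidate.replace(year=now.year + 1, month=1)
        else:
            candidate = candidate.replace(month=now.month + 1)
    return candidate
```

Pass a timezone-aware datetime, for example `next_refresh(datetime.now(timezone.utc))`.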
