deploy: add llama-rerank service to compose snippet

Drawbar's compose doesn't have a rerank service today; the llama-rerank
container I spun up earlier was a standalone docker run, not a compose
service. For Docker DNS resolution (http://llama-rerank:8080) to work
between MCP + reranker, both need to be siblings in the same compose stack.

Added the llama-rerank service entry with:
- :server-cuda image (CUDA-built llama.cpp; the plain :server is CPU-only
  and 25× slower for our 50-doc rerank pool)
- -ngl 99 to offload all layers to GPU
- deploy.resources.reservations.devices block for compose v3 GPU
  passthrough (preferred over the older `runtime: nvidia` syntax)
- volume for the HuggingFace model cache so first-start GGUF download
  survives container recreates
- no host port mapping (internal-network-only)

Tesla P4 compatibility notes inline: Pascal (CC 6.1) is in the
:server-cuda image's compute-arch list (500-1200) so no special handling
beyond the standard compose entry.

Also: cleanup instruction to docker rm -f the standalone llama-rerank from
the earlier setup before bringing up compose (name collision).

And: noted that if trashpanda's existing Ollama is a host-mode process
rather than a compose service, the MCP needs host.docker.internal override
(snippet included).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
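
For orientation, the Docker DNS path described in this message can be exercised with a request like the one below. This is a minimal sketch, assuming the reranker is reached through llama.cpp server's `/v1/rerank` endpoint with a `results` list of `index` / `relevance_score` pairs in the response. How the MCP itself calls the reranker is not shown in this commit, and the helper names `build_rerank_payload`, `top_hits`, and `rerank` are hypothetical.

```python
import json
import urllib.request

# Service-name URL from this commit; it only resolves inside the compose network.
RERANK_URL = "http://llama-rerank:8080"


def build_rerank_payload(query, documents, top_n=None):
    """Build the JSON body for a rerank request (hypothetical helper)."""
    payload = {"query": query, "documents": list(documents)}
    if top_n is not None:
        payload["top_n"] = top_n
    return payload


def top_hits(response, documents):
    """Pair each document with its score, highest score first.

    Assumes the response shape
    {"results": [{"index": i, "relevance_score": s}, ...]}.
    """
    ranked = sorted(
        response["results"],
        key=lambda r: r["relevance_score"],
        reverse=True,
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


def rerank(query, documents, base_url=RERANK_URL, timeout=30):
    """POST the query and documents to the reranker, return (document, score) pairs."""
    body = json.dumps(build_rerank_payload(query, documents)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/v1/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return top_hits(json.load(resp), documents)
```

Run from inside the `crop-chem-docs` container, `rerank(...)` should reach the sidecar by service name. From the host it is expected to fail, since the service has no host port mapping.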
@@ -1,24 +1,64 @@
 # Drawbar deploy — `crop-chem-docs` MCP server snippet
 
-Drop this into Drawbar's `docker-compose.yml`. Targets the existing
-trashpanda stack: shared Docker network with `ollama` + `llama-rerank`
-service containers, Cloudflare Tunnel out front.
+Drop these two services into Drawbar's `docker-compose.yml`. Targets
+the trashpanda stack: shared Docker network with the existing
+Drawbar services + the Cloudflare Tunnel.
 
 ## Pre-reqs (one-time on the deploy host)
 
-1. **Login to the Gitea registry** so the host can pull:
+1. **Docker login to the Gitea registry:**
    ```bash
    docker login git.jpaul.io -u justin # PAT for password
    ```
-2. **`ollama` and `llama-rerank` services** are already running in
-   the same compose stack on the same Docker network. The MCP
-   container resolves them by service name via Docker's embedded
-   DNS — no IPs to maintain.
+2. **NVIDIA Container Toolkit** — already installed on trashpanda
+   (the existing standalone `llama-rerank` container ran with
+   `--gpus all` fine).
+3. **If a standalone `llama-rerank` container is already running**
+   (left over from earlier setup), remove it so the compose service
+   can bind the same name:
+   ```bash
+   docker rm -f llama-rerank
+   ```
 
-## Compose service
+## Compose services
 
 ```yaml
 services:
+
+  # ---- Reranker sidecar -----------------------------------------
+  # jina-reranker-v2-base-multilingual via llama.cpp on the Tesla P4.
+  # Internal port only (no host port mapping needed — the MCP reaches
+  # it via Docker DNS). ~280 MB GPU VRAM at idle, ~500 MB during a
+  # 50-doc rerank. Co-exists fine with any other GPU users on the P4.
+  llama-rerank:
+    image: ghcr.io/ggml-org/llama.cpp:server-cuda
+    container_name: llama-rerank
+    restart: unless-stopped
+    command:
+      - "-hf"
+      - "gpustack/jina-reranker-v2-base-multilingual-GGUF:Q8_0"
+      - "--reranking"
+      - "--host"
+      - "0.0.0.0"
+      - "--port"
+      - "8080"
+      - "-ngl"
+      - "99" # offload all layers to GPU
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    # Model cache survives container recreates; first start downloads
+    # the GGUF (~280 MB) from HuggingFace.
+    volumes:
+      - llama-rerank-cache:/root/.cache/huggingface
+    networks:
+      - default
+
+  # ---- MCP server ------------------------------------------------
   crop-chem-docs:
     image: git.jpaul.io/justin/crop-chem-docs:corpus-2026.05.24
     # :latest for dev / Watchtower auto-pull
@@ -32,29 +72,57 @@ services:
     # HYBRID_SEARCH=true
     # PRODUCT_NAME=crop_chem
     # Override here only if your services have different names.
+    depends_on:
+      - llama-rerank
     networks:
-      - default # or whichever shared network ollama/llama-rerank are on
+      - default
     labels:
       com.centurylinklabs.watchtower.enable: "true"
+
+volumes:
+  llama-rerank-cache:
 ```
 
-If your stack uses non-default service names:
+## Note on the existing `ollama` service
+
+The Dockerfile default is `OLLAMA_URL=http://ollama:11434` — that
+assumes there's an `ollama` service in the same compose stack. If
+trashpanda's Ollama is a host-mode process (not a compose service),
+override the env in the `crop-chem-docs` block:
 
 ```yaml
 environment:
-  OLLAMA_URL: "http://<your-ollama-service>:11434"
-  RERANK_URL: "http://<your-rerank-service>:8080"
+  OLLAMA_URL: "http://host.docker.internal:11434"
+extra_hosts:
+  - "host.docker.internal:host-gateway"
 ```
 
-## Test from the host
+Or just add Ollama itself to the compose stack as a sibling service.
+
+## Test once both are up
 
 ```bash
-# Verify counts + indexes from inside the container:
+docker compose up -d llama-rerank crop-chem-docs
+
+# Wait ~10s for both to come up, then:
 docker exec crop-chem-docs python -c \
   "from docs_mcp.server import corpus_status; print(corpus_status())"
 ```
 
-## What the container exposes
+Expect: `# crop-chem-docs corpus status`, 4,159 labels, 216,467
+chunks, BM25 db present, `RERANK_URL=http://llama-rerank:8080`,
+`HYBRID_SEARCH=on`.
+
+Then a live search to verify hybrid+rerank:
+
+```bash
+docker exec crop-chem-docs python -c \
+  "from docs_mcp.server import search_docs; print(search_docs('soybean herbicide for waterhemp', k=2))"
+```
+
+Expect: 2 hits with Sencor/Tackle/Warrant in top-2, `mode=hybrid-rrf+rerank` in the header.
+
+## What the MCP container exposes
 
 | Tool | What it does |
 |---|---|
@@ -78,10 +146,3 @@ docker exec crop-chem-docs python -c \
   reindex, image push. Watchtower pulls the new `:latest` automatically.
 - **Manual** — Gitea Actions UI → `Monthly corpus refresh` → `Run workflow`.
   Optional `sources` input for single-source refresh (e.g., `bayer` only).
-
-## Switching corpus scope
-
-The row-crop filter (corn/soybeans/wheat) is in
-`scrape/sources/epa_ppls.py` as `ROW_CROP_KEYWORDS`. Edit + push +
-let the next workflow run pick it up. Same for the registrant
-allowlist at `scrape/sources/epa_registrant_allowlist.json`.
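
One way to check the service-name assumptions in this change before running the tests is to resolve the names from inside the MCP container. The following is a minimal sketch using only the standard library. The helper names `resolvable` and `report` are hypothetical, and the target list simply mirrors the names that appear in the snippet (`llama-rerank`, `ollama`, `host.docker.internal`).

```python
import socket


def resolvable(host, port):
    """Return True if `host` resolves to at least one address (hypothetical helper)."""
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        return False


def report(targets):
    """Return one status line per (host, port) pair."""
    lines = []
    for host, port in targets:
        status = "ok" if resolvable(host, port) else "unresolved"
        lines.append(f"{host}:{port} -> {status}")
    return lines


# Names taken from the compose snippet; meaningful only inside the MCP container.
TARGETS = [
    ("llama-rerank", 8080),
    ("ollama", 11434),
    ("host.docker.internal", 11434),
]
```

Calling `print("\n".join(report(TARGETS)))` inside the `crop-chem-docs` container lists which names resolve. An unresolved `ollama` alongside a resolved `host.docker.internal` would suggest the host-mode override is the one to use.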