Drawbar's compose has no rerank service today: the llama-rerank
container I spun up earlier was a standalone `docker run`, not a
compose service. For Docker DNS resolution
(http://llama-rerank:8080) to work between the MCP and the reranker,
both need to be siblings in the same compose stack.

Added the llama-rerank service entry with:
- :server-cuda image (CUDA-built llama.cpp; the plain :server is
  CPU-only and 25× slower for our 50-doc rerank pool)
- -ngl 99 to offload all layers to GPU
- deploy.resources.reservations.devices block for compose v3 GPU
  passthrough (preferred over the older `runtime: nvidia` syntax)
- volume for the HuggingFace model cache so first-start GGUF
  download survives container recreates
- no host port mapping (internal-network-only)

Tesla P4 compatibility notes are inline: Pascal (CC 6.1) is in the
:server-cuda image's compute-arch list (500-1200), so it needs no
special handling beyond the standard compose entry.
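
For reference, a rough sketch of what such a service entry could look
like. Only the :server-cuda tag, -ngl 99, the devices reservation, the
cache volume, and the missing port mapping come from this change. The
registry path, the --reranking and -hf arguments, the model repo, and
the cache mount point are assumptions for illustration.

```yaml
# Sketch only, not the actual compose entry.
services:
  llama-rerank:
    image: ghcr.io/ggml-org/llama.cpp:server-cuda  # assumed registry path
    command:
      - --host
      - 0.0.0.0
      - --port
      - "8080"
      - -ngl
      - "99"          # offload all layers to the GPU
      - --reranking   # assumed: enables the rerank endpoint
      - -hf           # assumed: pull the GGUF from HuggingFace on first start
      - example-org/example-reranker-GGUF  # placeholder, not the real model
    volumes:
      # assumed cache location inside the image; keeps the downloaded
      # GGUF across container recreates
      - rerank-cache:/root/.cache/llama.cpp
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    # no `ports:` entry, so the service is reachable only from other
    # containers on the compose network

volumes:
  rerank-cache:
```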

Also: added a cleanup instruction to `docker rm -f` the standalone
llama-rerank container from the earlier setup before bringing up
compose (name collision).

And: noted that if trashpanda's existing Ollama is a host-mode
process rather than a compose service, the MCP needs a
host.docker.internal override (snippet included).
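
A minimal sketch of that kind of override, assuming the MCP service is
called `mcp` (hypothetical name) and reads its Ollama endpoint from
OLLAMA_URL. The snippet shipped with the change may differ.

```yaml
# Sketch only: point the MCP at an Ollama process running on the host.
services:
  mcp:  # hypothetical service name for the MCP container
    environment:
      OLLAMA_URL: http://host.docker.internal:11434
    extra_hosts:
      # host-gateway resolves to the Docker host on Linux (Docker 20.10+)
      - "host.docker.internal:host-gateway"
```

The host-side Ollama would also have to listen on an address reachable
from the Docker bridge, not only on 127.0.0.1.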

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Dockerfile now sets OLLAMA_URL=http://ollama:11434 and
RERANK_URL=http://llama-rerank:8080 as image defaults, assuming the
MCP container shares a Docker network with services named `ollama`
and `llama-rerank` (the typical compose pattern). Drawbar's stack
already runs both, so there are no cross-host IPs to maintain and no
off-stack GPU dependencies. Everything stays inside the trashpanda
compose.

deploy/drawbar-compose-snippet.md simplified: no environment
overrides are needed for the common case. The override block is shown
only for stacks with non-default service names. Pull tag updated to
:corpus-2026.05.24.
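
As an illustration of the non-default case, an override could look
roughly like this. The service names `mcp`, `my-ollama`, and
`my-reranker` are hypothetical. Only the two variable names and the
ports come from the Dockerfile defaults.

```yaml
# Sketch only: needed when the stack's service names differ from the
# image defaults (`ollama`, `llama-rerank`).
services:
  mcp:  # hypothetical service name for the MCP container
    environment:
      OLLAMA_URL: http://my-ollama:11434    # hypothetical service name
      RERANK_URL: http://my-reranker:8080   # hypothetical service name
```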

Per the new architecture call:
- MCP doesn't reach out to cross-host Ollama instances (192.168.0.2,
  192.168.0.125, etc.) at serve time. Those are used only at
  index-build time in CI.
- All serve-time dependencies are in the same Docker network as
  the consumer apps.

Code push touches the Dockerfile → image-only.yml will rebuild + push.

Future-me note: image-only.yml needs Ollama reachable from the
Gitea Actions runner for the reindex step. That step still uses the
LAN endpoints (workflow env), which is correct since indexing is
CI-side, not serve-side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>