3 Commits

Author SHA1 Message Date
justin e5da4b21b0 deploy: add llama-rerank service to compose snippet
Drawbar's compose doesn't have a rerank service today — the
llama-rerank container I spun up earlier was a standalone
docker run, not a compose service. For Docker DNS resolution
(http://llama-rerank:8080) to work between MCP + reranker, both
need to be siblings in the same compose stack.

Added the llama-rerank service entry with:
- :server-cuda image (CUDA-built llama.cpp; the plain :server is
  CPU-only and 25× slower for our 50-doc rerank pool)
- -ngl 99 to offload all layers to GPU
- deploy.resources.reservations.devices block for compose v3 GPU
  passthrough (preferred over the older `runtime: nvidia` syntax)
- volume for the HuggingFace model cache so first-start GGUF
  download survives container recreates
- no host port mapping — internal-network-only
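
For orientation, the service entry described above might look roughly like
the following. This is a sketch under stated assumptions, not the snippet
from the repository: the registry path, the model placeholder, the volume
name and the cache mount path are all guesses. Only the :server-cuda tag,
-ngl 99, the deploy.resources.reservations.devices block, port 8080 and the
missing host port mapping come from the list above.

```yaml
services:
  llama-rerank:
    # Assumed registry path; the commit only names the :server-cuda tag.
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    command:
      - "-hf"
      - "<reranker-gguf-repo>"   # placeholder: the model is not named here
      - "--reranking"            # enable the rerank endpoint
      - "--host"
      - "0.0.0.0"                # listen on the compose network, not just loopback
      - "--port"
      - "8080"
      - "-ngl"
      - "99"                     # offload all layers to the GPU
    volumes:
      # Assumed cache location inside the container; check the image docs.
      - rerank-models:/root/.cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    # No `ports:` key on purpose: reachable only from sibling services.

volumes:
  rerank-models:
```

With an entry like this in the same stack, the MCP container can reach the
reranker at http://llama-rerank:8080 through Docker DNS.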

Tesla P4 compatibility notes inline: Pascal (CC 6.1) is in the
:server-cuda image's compute-arch list (500-1200), so no special
handling is needed beyond the standard compose entry.

Also added a cleanup instruction: docker rm -f the standalone
llama-rerank container from the earlier setup before bringing up
compose (otherwise the names collide).

Also noted that if trashpanda's existing Ollama is a host-mode
process rather than a compose service, the MCP needs a
host.docker.internal override (snippet included).
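
A host-mode override of that kind could be sketched as below. The service
name `mcp` is hypothetical, and the port assumes Ollama's default of 11434;
the actual snippet in the repository may differ.

```yaml
services:
  mcp:   # hypothetical service name for the MCP container
    extra_hosts:
      # On Linux, map host.docker.internal to the host's gateway address.
      - "host.docker.internal:host-gateway"
    environment:
      # Point the MCP at the host process instead of a compose sibling.
      OLLAMA_URL: http://host.docker.internal:11434
```

For this to work, the host Ollama also has to listen on an address the
container can reach, since by default it binds to loopback only.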

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 13:25:34 -04:00
justin c5ed5560fc deploy: sensible Dockerfile defaults + simplified compose snippet
Image rebuild (skip scrape) / build (push) Failing after 1h41m9s
Dockerfile now sets OLLAMA_URL=http://ollama:11434 and
RERANK_URL=http://llama-rerank:8080 as image defaults, assuming the
MCP container shares a Docker network with services named `ollama`
and `llama-rerank` (typical compose pattern). Drawbar's stack
already runs both — no cross-host IPs to maintain, no off-stack
GPU dependencies. Stays inside the trashpanda compose.

deploy/drawbar-compose-snippet.md simplified: no environment
overrides needed for the common case. Override block shown only
for stacks with non-default service names. Pull tag updated to
:corpus-2026.05.24.
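
As an illustration of the override case, a stack whose services are not
named `ollama` and `llama-rerank` might set the two variables explicitly.
The service names `crop-chem-docs`, `my-ollama` and `my-reranker` below are
hypothetical; the image path, tag and variable names come from these
commits.

```yaml
services:
  crop-chem-docs:   # hypothetical service name
    image: git.jpaul.io/justin/crop-chem-docs:corpus-2026.05.24
    environment:
      # Only needed when the sibling services use non-default names.
      OLLAMA_URL: http://my-ollama:11434
      RERANK_URL: http://my-reranker:8080
```

In the common case the `environment:` block can be dropped entirely and the
image defaults apply.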

Per the new architecture call:
- MCP doesn't reach out to cross-host Ollama instances (192.168.0.2,
  192.168.0.125 etc.) at serve time — only at index-build time in CI.
- All serve-time dependencies are in the same Docker network as
  the consumer apps.

Code push touches Dockerfile → image-only.yml will rebuild + push.
Future-me note: image-only.yml needs Ollama reachable from the
Gitea Actions runner for the reindex step; that step still uses the
LAN endpoints (workflow env), which is correct since indexing happens
CI-side, not serve-side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 13:09:38 -04:00
justin 8766d73327 deploy: Drawbar compose snippet — first image is published
Image pushed to git.jpaul.io/justin/crop-chem-docs with three tags:
  :latest             — Watchtower auto-pull target
  :a97107de4636       — commit-sha rollback pin
  :corpus-2026.05.24  — corpus-snapshot pin (prod-recommended)

Drawbar compose snippet at deploy/drawbar-compose-snippet.md.
Wires the container against the existing infra:
  - Ollama pool: 192.168.0.2:11434, 192.168.0.2:11435,
                 192.168.0.125:11434, 10.10.1.65:11434
  - Reranker:    http://10.10.1.65:8082
  - HYBRID_SEARCH=true (production retrieval — BM25 + dense + rerank)
  - Exposes streamable-HTTP MCP on port 8000

Pull path uses git.jpaul.io (public hostname, CF-fronted; pull
response bodies aren't capped). Push path uses 192.168.0.2:1234
(LAN endpoint, bypasses CF 100MB body cap). Same registry,
different URLs — per the template gotcha doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 12:48:24 -04:00