75f714b454
deploy/docker-compose.yml: replace <product>/<registry> placeholders with
concrete values for Drawbar's stack:
  - image: git.jpaul.io/justin/seed-mcp:latest (CF tunnel for pulls; CI
    pushes via LAN 192.168.0.2:1234 to avoid the 100 MB body cap)
  - container_name: seed-mcp
  - port 8001:8000 (8001 host-side to avoid colliding with crop-chem-docs
    on 8000)
  - PRODUCT_NAME=crop_seed, hybrid search enabled, stateless HTTP
  - llama-rerank shared with crop-chem-docs (NOT redefined here; expected
    to already be in Drawbar's parent compose network)
  - networks.drawbar-mcp external: true so seed-mcp joins the existing
    cross-MCP shared network

.gitignore: corpus/ is now COMMITTED, not ignored. The monthly refresh
workflow scrapes and commits corpus changes; the image-only workflow
rebuilds indexes from the committed corpus. Letting the corpus flow
through git means the :corpus-YYYY.MM.DD image tag pins to a specific
seed-catalog snapshot. chroma/ and bm25/ remain ignored because they are
deterministically derived from the corpus.

Initial committed snapshot: 614 varieties.
  - bayer_seeds: 475 (DEKALB 288 + Asgrow 102 + WestBred 85)
  - golden_harvest: 139 (Syngenta corn + soy; 36 sitemap URLs
    302-redirected = discontinued)

rag/chunk.py: normalize brand to uppercase and crop to lowercase in Chroma
metadata so cross-vendor brand-filter lookups don't break on inconsistent
casing. Bayer stores "DEKALB" and Golden Harvest stores "Golden Harvest";
_build_where uppercases the user-supplied brand, which matched the former
but not the latter before this fix. Sidecar JSON keeps the original casing
for display.

Stub scrapers (nk, agripro, becks_pfr, becks_products): change return code
from 2 to 0 so the monthly-refresh CI workflow doesn't fail on deferred
sources. Real implementations will return 0 on success / 1 on failure when
they ship.

Smoke-tested cross-vendor retrieval against the 614-chunk index:
  - list_versions shows both vendors with correct facet counts
  - broad "corn hybrid 100 RM" query returns both DEKALB and Golden
    Harvest hits in top 5
  - brand='Golden Harvest' filter returns 3 GH-only varieties
  - variety-code prefilter still works (E085Z5 → top hit on GH)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
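The casing fix described for rag/chunk.py can be illustrated with a minimal sketch. It assumes brand is stored uppercased and crop lowercased, and that the query-side filter applies the same rules. The function names `normalize_metadata` and `build_where` are hypothetical stand-ins, not the repository's actual code (only `_build_where` is named in the commit message), and the `where` dict uses Chroma's equality / `$and` filter shape.

```python
from typing import Optional


def normalize_metadata(meta: dict) -> dict:
    """Return a copy of chunk metadata with brand uppercased and crop lowercased.

    Hypothetical helper: the real rag/chunk.py may differ in detail.
    """
    out = dict(meta)
    if isinstance(out.get("brand"), str):
        out["brand"] = out["brand"].strip().upper()
    if isinstance(out.get("crop"), str):
        out["crop"] = out["crop"].strip().lower()
    return out


def build_where(brand: Optional[str] = None, crop: Optional[str] = None) -> dict:
    """Build a metadata filter using the same casing rules as the index side."""
    clauses = []
    if brand:
        clauses.append({"brand": brand.strip().upper()})
    if crop:
        clauses.append({"crop": crop.strip().lower()})
    if not clauses:
        return {}
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


# Index side: vendors disagree on casing, the stored value does not.
stored_gh = normalize_metadata({"brand": "Golden Harvest", "crop": "Corn"})
stored_dk = normalize_metadata({"brand": "DEKALB", "crop": "corn"})

# Query side: any user casing maps onto the stored value.
where = build_where(brand="golden harvest")
```

Because both sides go through the same normalization, a filter for `golden harvest`, `Golden Harvest`, or `GOLDEN HARVEST` compares equal to the stored value, while the sidecar JSON can keep the vendor's original spelling for display.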
84 lines
3.4 KiB
YAML
# Hosting stack for the seed-mcp MCP server.
#
# This compose file is meant to live in Drawbar's deploy stack and is
# included here as the canonical reference. The seed-mcp image is
# self-contained (corpus + Chroma + BM25 are baked in by CI at build
# time), so the only host-side concerns are usage-log persistence and
# the shared reranker / Ollama sidecars.
#
# The reranker container (llama-rerank) is SHARED with crop-chem-docs.
# Drawbar's compose already has it from the crop-chem-docs deploy;
# don't duplicate it here when stacking the two MCPs together.
#
# Watchtower auto-pulls on :latest changes, but ONLY for containers
# labeled `com.centurylinklabs.watchtower.enable=true`.

services:

  # The seed-mcp server. Image is rebuilt nightly by .gitea/workflows/
  # refresh.yml; pulled via the public git.jpaul.io endpoint (CF
  # tunnels in front, so the 100 MB body cap doesn't matter on pulls).
  seed-mcp:
    image: git.jpaul.io/justin/seed-mcp:latest
    container_name: seed-mcp
    restart: unless-stopped
    ports:
      - "8001:8000"
    environment:
      PRODUCT_NAME: "crop_seed"
      PRODUCT_DOCS_URL: "https://git.jpaul.io/justin/seed-mcp"

      # Streamable-HTTP transport, stateless mode (every request gets
      # a fresh ephemeral session). Required for production: avoids
      # 404 storms when Watchtower recreates the container while
      # clients hold session IDs from the previous instance.
      MCP_TRANSPORT: streamable-http
      MCP_HOST: 0.0.0.0
      MCP_PORT: "8000"
      MCP_DISABLE_DNS_REBINDING_PROTECTION: "1"

      # Embedding pool. Drawbar's compose puts seed-mcp on the
      # same docker network as Ollama; comma-separate multiple
      # endpoints (one per GPU) for indexing throughput. At runtime
      # only search_docs hits this (one embed per query, ~5ms).
      OLLAMA_URL: "http://ollama:11434"

      # Reranker. The llama.cpp sidecar serving jina-reranker-v2-base
      # is SHARED with crop-chem-docs. Drawbar's compose already
      # defines llama-rerank from the crop-chem-docs deploy; we just
      # point at the same DNS name. Falls back to dense-only on any
      # rerank error so MCP requests never block on the sidecar.
      RERANK_URL: "http://llama-rerank:8080"
      RERANK_POOL: "200"
      RERANK_TIMEOUT: "30"

      # Hybrid retrieval (BM25 + dense + RRF + exact-code prefilter).
      # Worth it for seed-mcp because farmer queries often contain
      # rare technical tokens: variety codes (DKC62-08RIB), trait
      # codes (XF/VT2PRIB), Rps gene names, disease abbreviations.
      HYBRID_SEARCH: "true"
      RRF_K: "60"

      # Usage telemetry. JSONL with daily rotation; 90-day retention.
      USAGE_LOG_DIR: /app/var/logs
      USAGE_LOG_KEEP_DAYS: "90"
    volumes:
      # Usage logs persist across container recreates. The bind mount
      # creates the host directory `./seed-mcp-logs/` on first run.
      - ./seed-mcp-logs:/app/var/logs
    labels:
      # Watchtower polls only containers with this label = true.
      com.centurylinklabs.watchtower.enable: "true"
    networks:
      - drawbar-mcp

  # NOTE: do NOT include llama-rerank or ollama here if you're stacking
  # this compose alongside crop-chem-docs. They're already defined in
  # the parent stack. The networks: external: true block below assumes
  # those services live on the drawbar-mcp shared network.

networks:
  drawbar-mcp:
    external: true
    name: drawbar-mcp
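
The hybrid-retrieval settings above (HYBRID_SEARCH, RRF_K) refer to reciprocal rank fusion of the BM25 and dense rankings. As a rough sketch of the standard RRF formula, score(d) = sum over rankings of 1 / (k + rank), the following may help when tuning RRF_K. It assumes RRF_K maps to the constant k; the function name `rrf_fuse` and the document IDs are hypothetical, and the server's real fusion code is not shown in this file.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document, with rank starting
    at 1. Ties are broken by document ID so the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from the two retrievers.
bm25_ranking = ["doc-a", "doc-b", "doc-c"]
dense_ranking = ["doc-b", "doc-c", "doc-a"]

fused = rrf_fuse([bm25_ranking, dense_ranking], k=60)
```

With k = 60 the contribution of any single ranking is nearly flat across the top positions, so a document that ranks well in both lists (doc-b here) tends to beat one that is first in only one list. Smaller k values weight top ranks more heavily.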