278fe5f456
Wires the docs_mcp/server.py reranker hook into a real backend:
ghcr.io/ggml-org/llama.cpp:server \
-hf gpustack/jina-reranker-v2-base-multilingual-GGUF:Q8_0 \
--reranking --host 0.0.0.0 --port 8080
Setup recipe at deploy/rerank-docker.md. The MCP server already
honors RERANK_URL (added in Phase 7+8 commit); setting it to
http://<host>:8082 turns on rerank automatically.
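For reference, a minimal sketch of how a client might call the backend once RERANK_URL is set. It assumes the llama.cpp server exposes `/v1/rerank` and returns `results[].index` and `results[].relevance_score`. The helper names (`build_rerank_request`, `order_by_score`, `rerank`) are hypothetical and are not the functions in docs_mcp/server.py.

```python
import json
import os
import urllib.request


def build_rerank_request(base_url, query, documents):
    """Build a POST request for the rerank endpoint (assumed path: /v1/rerank)."""
    payload = {"query": query, "documents": documents}
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def order_by_score(response_body):
    """Return candidate indices sorted by descending relevance score."""
    results = response_body["results"]
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [r["index"] for r in ranked]


def rerank(query, documents, timeout=60.0):
    """Rerank if RERANK_URL is set; otherwise keep the input order."""
    base_url = os.environ.get("RERANK_URL")
    if not base_url:
        # Rerank is opt-in: with no URL configured, leave the order untouched.
        return list(range(len(documents)))
    req = build_rerank_request(base_url, query, documents)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return order_by_score(json.load(resp))
```

The request building and response parsing are kept separate from the network call so they can be checked without a running server.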
## Eval results (35 queries, k=5, pool=50)
| Retriever | MRR | Recall@5 | nDCG@5 |
|----------------|-------|----------|--------|
| dense | 0.027 | 0.086 | 0.041 |
| bm25 | 0.544 | 0.586 | 0.524 |
| hybrid-rrf | 0.114 | 0.114 | 0.108 |
| dense+rerank | 0.171 | 0.143 | 0.149 |
| hybrid+rerank | 0.672 | 0.638 | 0.621 | ← winner
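
As a reading aid for the columns above, a sketch of the three metrics for a single query, assuming binary relevance and a cutoff of k for all three. The eval harness may define them differently (for example, MRR without a cutoff); the function names here are illustrative only. The table values would then be means over the 35 queries.

```python
import math


def reciprocal_rank(ranked, relevant, k):
    """1/position of the first relevant doc within the top k, else 0."""
    for pos, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / pos
    return 0.0


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant docs that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-gain nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```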
The reranker fixes hybrid's failure mode (dense noise polluting
the fused pool) by scoring each (query, chunk) pair independently.
Net: hybrid+rerank gives +24% relative MRR over BM25-only (0.672 vs 0.544).
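
A toy sketch of the failure mode and the fix, assuming standard reciprocal rank fusion with the conventional constant k=60 (the constant used by the eval is not stated here). `rrf_fuse` and `rescore_pool` are hypothetical names, and the scores in the usage note are made up for illustration.

```python
def rrf_fuse(rankings, k=60, pool_size=50):
    """Fuse ranked lists with RRF: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score; break ties by doc id for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:pool_size]


def rescore_pool(pool, score_fn, top_k=5):
    """Score each pooled doc on its own and keep the top_k."""
    return sorted(pool, key=score_fn, reverse=True)[:top_k]
```

With `bm25 = ["good", "ok"]` and `dense = ["noise1", "noise2"]`, RRF gives the dense noise the same rank credit as the BM25 hits, so the noise lands in the fused pool alongside them. Rescoring each doc independently (for example with a reranker score) no longer depends on which list a doc came from.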
Smoke test for the reranker itself (query: "soybean herbicide for
waterhemp", 4 candidates):
index=1 SENCOR metribuzin waterhemp soybean → score=0.84 ← right
index=3 Headline wheat fungicide → score=-2.80
index=2 Lorsban corn rootworm → score=-2.91
index=0 Roundup fallow burndown → score=-3.44
Strong separation between the right doc and the rest.
## Production gotchas
- CPU-only reranker is slow (~23s for a 50-doc pool). For
interactive use put it on GPU (`--gpus all`); ~10-20× faster.
- jina-reranker rejects the ENTIRE batch if any pair exceeds
n_ctx_train=1024 tokens, so the MCP server truncates each doc to
2000 chars before sending. Already handled in _rerank_pool.
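
The truncation guard amounts to something like the sketch below. `truncate_docs` is a hypothetical stand-in, not the actual code in _rerank_pool, and the 2000-char cap is a character-based proxy for the token limit rather than an exact token count.

```python
MAX_DOC_CHARS = 2000  # per-doc character cap quoted above


def truncate_docs(documents, max_chars=MAX_DOC_CHARS):
    """Clip every doc so one long chunk cannot get the whole batch rejected."""
    return [doc[:max_chars] for doc in documents]
```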
Per-query rerank report at eval/results/with_rerank.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>