Phase 6: reranker sidecar (jina-reranker-v2-base via llama.cpp)

Wires the docs_mcp/server.py reranker hook into a real backend:
  ghcr.io/ggml-org/llama.cpp:server \
    -hf gpustack/jina-reranker-v2-base-multilingual-GGUF:Q8_0 \
    --reranking --host 0.0.0.0 --port 8080

Setup recipe at deploy/rerank-docker.md. The MCP server already
honors RERANK_URL (added in Phase 7+8 commit); setting it to
http://<host>:8082 turns on rerank automatically.

## Eval results (35 queries, k=5, pool=50)

  | Retriever      | MRR   | Recall@5 | nDCG@5 |
  |----------------|-------|----------|--------|
  | dense          | 0.027 | 0.086    | 0.041  |
  | bm25           | 0.544 | 0.586    | 0.524  |
  | hybrid-rrf     | 0.114 | 0.114    | 0.108  |
  | dense+rerank   | 0.171 | 0.143    | 0.149  |
  | hybrid+rerank  | 0.672 | 0.638    | 0.621  |  ← winner

The reranker fixes hybrid's failure mode (dense noise polluting
the fused pool) by scoring each (query, chunk) pair independently.
Net: hybrid+rerank gives +24% MRR over BM25-only.
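
A toy sketch of the failure mode, assuming plain reciprocal rank fusion
with the conventional constant k=60 and equal weights (the fusion code in
docs_mcp/server.py is not shown here and may differ). The doc ids and the
`rrf_fuse` name are made up for illustration:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankers of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

bm25 = ["right", "b2", "b3"]   # lexical ranker found the right doc
dense = ["n1", "n2", "n3"]     # dense ranker returned unrelated docs
fused = rrf_fuse([bm25, dense])
```

With disjoint lists, every dense pick ties with the BM25 hit at the same
rank, so noise takes roughly every other slot in the fused top-k. A
reranker ignores fused position and only needs the right doc to be
somewhere in the pool.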

Smoke test for the reranker itself (query: "soybean herbicide for
waterhemp", 4 candidates):
  index=1 SENCOR metribuzin waterhemp soybean → score=0.84  ← right
  index=3 Headline wheat fungicide           → score=-2.80
  index=2 Lorsban corn rootworm              → score=-2.91
  index=0 Roundup fallow burndown            → score=-3.44
Strong separation between the right doc and the rest.

## Production gotchas

- CPU-only reranker is slow (~23s for a 50-doc pool). For
  interactive use put it on GPU (`--gpus all`); ~10-20× faster.
- jina-reranker rejects the ENTIRE batch if any pair exceeds
  n_ctx_train=1024 tokens, so the MCP server truncates each doc to
  2000 chars before sending. Already handled in _rerank_pool.
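
The truncation guard can be sketched as below. This is an illustration
only: `truncate_docs` and `MAX_DOC_CHARS` are hypothetical names, and the
real logic lives in `_rerank_pool`. A character cap is a cheap proxy for
the token limit, not an exact bound.

```python
MAX_DOC_CHARS = 2000  # character budget quoted above; assumed, not read from the server

def truncate_docs(docs, max_chars=MAX_DOC_CHARS):
    """Clip every candidate so one long chunk cannot get the whole batch rejected."""
    return [doc[:max_chars] for doc in docs]

clipped = truncate_docs(["x" * 5000, "short chunk"])
```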

Per-query rerank report at eval/results/with_rerank.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 10:50:03 -04:00
parent 335c33465b
commit 278fe5f456
2 changed files with 108 additions and 0 deletions
+52
@@ -0,0 +1,52 @@
# Reranker sidecar — llama.cpp + jina-reranker-v2-base
Phase 6 setup. The MCP server reads `RERANK_URL` and, when set, pipes
the top-50 dense (or hybrid) chunks through this sidecar before
returning to the LLM. See `docs_mcp/server.py:_rerank_pool`.
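
A minimal sketch of that flow, under stated assumptions: the function names are hypothetical (the real code is `_rerank_pool`), and the response is assumed to carry a `results` list of `{"index", "relevance_score"}` objects, as the `index=`/`score=` values in the Verify section suggest.

```python
import json
import os
import urllib.request

def order_by_rerank(chunks, results, k=5):
    """Reorder chunks by the sidecar's scores and keep the top k."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [chunks[r["index"]] for r in ranked[:k]]

def rerank_pool(query, chunks, k=5, timeout=120):
    """Rerank via the sidecar; keep the original order if RERANK_URL is unset."""
    base = os.environ.get("RERANK_URL")
    if not base:
        return chunks[:k]
    payload = {"query": query, "documents": [c[:2000] for c in chunks]}
    req = urllib.request.Request(
        base.rstrip("/") + "/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        results = json.load(resp)["results"]
    return order_by_rerank(chunks, results, k)
```

Keeping the ordering step separate from the HTTP call makes it testable without a running sidecar.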
## Run
```bash
docker run -d --name llama-rerank -p 8082:8080 \
ghcr.io/ggml-org/llama.cpp:server \
-hf gpustack/jina-reranker-v2-base-multilingual-GGUF:Q8_0 \
--reranking --host 0.0.0.0 --port 8080
```
The image auto-downloads the GGUF on first start (~280 MB, one-time).
First request loads the model into memory (~1s on CPU).
## Configure the MCP server
```bash
export RERANK_URL=http://localhost:8082
# search_docs will now rerank automatically
```
## Verify
```bash
curl http://localhost:8082/v1/rerank -H 'Content-Type: application/json' -d '{
"query": "soybean herbicide for waterhemp",
"documents": [
"Roundup Custom for fallow burndown",
"Sencor metribuzin controls waterhemp in soybean pre-emergence"
]
}'
```
Expect index=1 (the Sencor doc) at score ~0.8, index=0 at a strongly
negative score.
## Performance notes
- **CPU-only is slow.** ~0.46s per (query, doc) pair → ~23s for a
50-doc pool (the eval run took 804.8s for 35 queries). Fine for batch
eval; painful for interactive queries.
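- For production, run on GPU: add `--gpus all` to `docker run` and use a
CUDA build of the image. The plain `:server` tag is a CPU build; at the
time of writing the CUDA variant is published as `:server-cuda`, and
`-ngl 99` offloads the layers to the GPU. Expect ~10-20× speedup.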
- Alternative: drop `RERANK_POOL` from 50 to ~20 in the server env.
Cuts latency 2.5× at the cost of some quality (rerank gets fewer
candidates to choose from).
- For very small batches the reranker can also run alongside
Ollama on the same GPU box — `jina-reranker-v2-base` is ~280 MB
and won't conflict with `nomic-embed-text` (~560 MB VRAM each).
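
A back-of-envelope model of these trade-offs, assuming latency scales linearly with pool size (it ignores batching effects and fixed per-request overhead). The function name is hypothetical, and the per-pair cost is derived from the ~23s / 50-doc figure above.

```python
def estimate_rerank_seconds(pool_size, per_pair_s=23.0 / 50, speedup=1.0):
    """Linear latency model: each (query, doc) pair costs a fixed amount of time."""
    return pool_size * per_pair_s / speedup

cpu_50 = estimate_rerank_seconds(50)              # CPU, current pool: about 23 s
cpu_20 = estimate_rerank_seconds(20)              # CPU, smaller pool: about 9 s
gpu_50 = estimate_rerank_seconds(50, speedup=10)  # GPU at the low end of 10-20x: about 2 s
```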
+56
@@ -0,0 +1,56 @@
# Eval results — queries.jsonl
- queries: 35
- k: 5
- pool: 50
- retrievers: dense, bm25, hybrid-rrf, dense+rerank, hybrid+rerank
## Summary
| Retriever | MRR | Recall@5 | nDCG@5 | Errors | Total time (s) |
|---|---|---|---|---|---|
| dense | 0.027 | 0.086 | 0.041 | 0 | 5.2 |
| bm25 | 0.544 | 0.586 | 0.524 | 0 | 4.8 |
| hybrid-rrf | 0.114 | 0.114 | 0.108 | 0 | 8.5 |
| dense+rerank | 0.171 | 0.143 | 0.149 | 0 | 804.8 |
| hybrid+rerank | 0.672 | 0.638 | 0.621 | 0 | 823.3 |
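
For reference, a sketch of how these three metrics are commonly computed for one query, assuming binary relevance (a doc is relevant if and only if it is in the expected set). The eval script is not part of this commit, so its exact definitions may differ; the function names here are illustrative.

```python
import math

def reciprocal_rank(retrieved, expected, k=5):
    """1 / rank of the first relevant doc within the top k, else 0."""
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in expected:
            return 1.0 / rank
    return 0.0

def recall_at_k(retrieved, expected, k=5):
    """Fraction of the expected docs that appear in the top k."""
    return len(set(retrieved[:k]) & set(expected)) / len(expected)

def ndcg_at_k(retrieved, expected, k=5):
    """DCG with binary gains, normalised by the best achievable DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(retrieved[:k], start=1)
              if doc_id in expected)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(expected), k) + 1))
    return dcg / ideal if ideal else 0.0

# The Poncho 600 row from the dense per-query table.
retrieved = ["epa_ppls/7969-459", "epa_ppls/7969-458", "bayer/poncho-beta"]
expected = {"epa_ppls/7969-458"}
rr = reciprocal_rank(retrieved, expected)
rec = recall_at_k(retrieved, expected)
```

The summary values are then the means of the per-query numbers over all 35 queries.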
## Per-query — dense
| Query | Expected | Top retrieved (first 3 of 5) | MRR | Recall@5 |
|---|---|---|---|---|
| Warrant herbicide rate for soybean | bayer/warrant, epa_ppls/524-591 | epa_ppls/524-508, epa_ppls/524-521, epa_ppls/42750-176 | 0.00 | 0.00 |
| Huskie wheat herbicide tank mix | bayer/huskie, bayer/huskie-complete | epa_ppls/71368-64, epa_ppls/279-9610, epa_ppls/10182-134 | 0.00 | 0.00 |
| Harness 20G granular corn herbicide | bayer/harness, epa_ppls/524-487 | epa_ppls/352-612, epa_ppls/352-608, epa_ppls/352-817 | 0.00 | 0.00 |
| Laudis tembotrione post-emergence corn | bayer/laudis, epa_ppls/264-860 | bayer/diflexx, epa_ppls/70506-331, epa_ppls/84229-48 | 0.00 | 0.00 |
| Roundup Custom glyphosate burndown application rate | epa_ppls/524-677, epa_ppls/524-475 | epa_ppls/42750-122, epa_ppls/5905-656, epa_ppls/228-666 | 0.00 | 0.00 |
| Liberty 280 SL glufosinate ammonium soybean | epa_ppls/7969-448 | epa_ppls/71368-111, epa_ppls/84229-45, epa_ppls/7969-500 | 0.00 | 0.00 |
| Atrazine 4L corn pre-emergence rate per acre | epa_ppls/5905-7877 | epa_ppls/5905-624, epa_ppls/89167-75, epa_ppls/7969-140 | 0.00 | 0.00 |
| Albaugh dicamba DMA salt application restrictions | epa_ppls/42750-40 | epa_ppls/5905-638, epa_ppls/34704-861, epa_ppls/5905-624 | 0.20 | 1.00 |
| Authority 4F sulfentrazone soybean residual | epa_ppls/279-3146 | epa_ppls/279-9663, epa_ppls/87290-70, epa_ppls/66222-248 | 0.00 | 0.00 |
| Prowl 10-G pendimethalin granular pre-plant | epa_ppls/241-254 | epa_ppls/70506-333, epa_ppls/42750-340, epa_ppls/91234-231 | 0.00 | 0.00 |
| Callisto GT mesotrione corn postemergence broadleaf control | epa_ppls/100-1470 | epa_ppls/100-1131, epa_ppls/89167-51, epa_ppls/100-1349 | 0.00 | 0.00 |
| Acuron Flexi corn pre-emergence S-metolachlor | epa_ppls/100-1568 | epa_ppls/62719-312, epa_ppls/42750-122, epa_ppls/5905-638 | 0.00 | 0.00 |
| Sencor 4 flowable metribuzin soybean waterhemp | epa_ppls/264-735 | epa_ppls/1381-259, epa_ppls/279-9624, epa_ppls/89167-101 | 0.00 | 0.00 |
| Broadstrike trifluralin pre-plant incorporated | epa_ppls/62719-222 | epa_ppls/87290-81, epa_ppls/70506-333, epa_ppls/91234-73 | 0.00 | 0.00 |
| Headline azoxystrobin pyraclostrobin wheat foliar fungicide | epa_ppls/7969-186 | epa_ppls/100-1222, epa_ppls/100-1164, epa_ppls/87290-63 | 0.00 | 0.00 |
| Trivapro pydiflumetofen corn fungicide tar spot | epa_ppls/100-1613 | epa_ppls/66222-250, epa_ppls/264-1209, epa_ppls/62719-346 | 0.00 | 0.00 |
| Poncho 600 clothianidin seed treatment corn | epa_ppls/7969-458 | epa_ppls/7969-459, epa_ppls/7969-458, bayer/poncho-beta | 0.50 | 1.00 |
| Gustafson Lorsban 30 chlorpyrifos granular corn rootworm | epa_ppls/264-932 | epa_ppls/89167-78, epa_ppls/5481-525, epa_ppls/1381-193 | 0.00 | 0.00 |
| RT-3 glyphosate potassium salt herbicide | bayer/rt-3 | bayer/roundup-powermax-3, epa_ppls/19713-597, epa_ppls/19713-606 | 0.25 | 1.00 |
| Roundup PowerMAX 3 glyphosate K-salt rate | bayer/roundup-powermax-3, epa_ppls/524-659 | epa_ppls/19713-597, epa_ppls/19713-606, epa_ppls/51036-333 | 0.00 | 0.00 |
| Nortron SC ethofumesate sugar beet | bayer/nortron-sc | epa_ppls/71368-25, epa_ppls/42750-122, epa_ppls/524-715 | 0.00 | 0.00 |
| DiFlexx Duo tembotrione dicamba corn | bayer/diflexx-duo | epa_ppls/71368-65, epa_ppls/1812-434, epa_ppls/1381-191 | 0.00 | 0.00 |
| Corvus thiencarbazone-methyl isoxaflutole corn pre-emergence | bayer/corvus, epa_ppls/264-1066 | epa_ppls/42750-122, bayer/scoparia, epa_ppls/70506-331 | 0.00 | 0.00 |
| Capreno tembotrione thiencarbazone corn herbicide | bayer/capreno, epa_ppls/264-1063 | epa_ppls/91234-314, epa_ppls/352-894, epa_ppls/42750-32 | 0.00 | 0.00 |
| Tilt propiconazole wheat fungicide rust | epa_ppls/100-617 | epa_ppls/19713-692, epa_ppls/34704-1113, epa_ppls/228-670 | 0.00 | 0.00 |
| what controls horseweed marestail before planting soybean | epa_ppls/524-475, epa_ppls/524-677 | epa_ppls/524-716, epa_ppls/524-717, epa_ppls/524-722 | 0.00 | 0.00 |
| what can I tank mix with 2,4-D for burndown in spring | epa_ppls/5905-7877, epa_ppls/228-666 | epa_ppls/34704-1158, epa_ppls/264-738, epa_ppls/228-364 | 0.00 | 0.00 |
| best fungicide for corn tar spot foliar application | epa_ppls/100-1613, epa_ppls/100-1547 | epa_ppls/100-1178, epa_ppls/87290-63, epa_ppls/100-1262 | 0.00 | 0.00 |
| seed treatment to control wireworm in corn | epa_ppls/7969-458, epa_ppls/7969-459 | epa_ppls/10182-212, epa_ppls/1381-231, epa_ppls/42750-300 | 0.00 | 0.00 |
| pre-emergence residual herbicide for soybean for waterhemp | epa_ppls/279-3146, epa_ppls/264-735 | epa_ppls/352-675, epa_ppls/279-3564, epa_ppls/279-3589 | 0.00 | 0.00 |
| what insecticide for soybean aphid foliar | epa_ppls/279-3206, epa_ppls/264-840 | epa_ppls/264-1157, epa_ppls/264-1159, epa_ppls/279-9615 | 0.00 | 0.00 |
| what is the rainfast interval for glyphosate | epa_ppls/524-475, epa_ppls/524-677 | epa_ppls/89167-56, epa_ppls/524-523, epa_ppls/524-707 | 0.00 | 0.00 |
| wheat fungicide for fusarium head blight | epa_ppls/7969-186, epa_ppls/100-1547 | bayer/stratego, epa_ppls/7969-246, epa_ppls/66222-250 | 0.00 | 0.00 |
| endangered species act precautions for pesticide application | epa_ppls/524-475, epa_ppls/524-591 | epa_ppls/70506-318, epa_ppls/70506-324, epa_ppls/34704-1044 | 0.00 | 0.00 |
| what herbicide do I use for postemergence broadleaf in corn | bayer/laudis, bayer/capreno, bayer/diflexx-duo | epa_ppls/352-842, epa_ppls/100-1349, epa_ppls/89167-51 | 0.00 | 0.00 |