diff --git a/deploy/rerank-docker.md b/deploy/rerank-docker.md
new file mode 100644
index 0000000..78910c1
--- /dev/null
+++ b/deploy/rerank-docker.md
@@ -0,0 +1,52 @@
+# Reranker sidecar — llama.cpp + jina-reranker-v2-base
+
+Phase 6 setup. The MCP server reads `RERANK_URL` and, when set, pipes
+the top-50 dense (or hybrid) chunks through this sidecar before
+returning to the LLM. See `docs_mcp/server.py:_rerank_pool`.
+
+## Run
+
+```bash
+docker run -d --name llama-rerank -p 8082:8080 \
+  ghcr.io/ggml-org/llama.cpp:server \
+  -hf gpustack/jina-reranker-v2-base-multilingual-GGUF:Q8_0 \
+  --reranking --host 0.0.0.0 --port 8080
+```
+
+The image auto-downloads the GGUF on first start (~280 MB, one-time).
+The first request loads the model into memory (~1s on CPU).
+
+## Configure the MCP server
+
+```bash
+export RERANK_URL=http://localhost:8082
+# search_docs will now rerank automatically
+```
+
+## Verify
+
+```bash
+curl http://localhost:8082/v1/rerank -H 'Content-Type: application/json' -d '{
+  "query": "soybean herbicide for waterhemp",
+  "documents": [
+    "Roundup Custom for fallow burndown",
+    "Sencor metribuzin controls waterhemp in soybean pre-emergence"
+  ]
+}'
+```
+
+Expect index=1 (the Sencor doc) at score ~0.8, index=0 at a strongly
+negative score.
+
+## Performance notes
+
+- **CPU-only is slow.** ~0.5s per (query, doc) pair → ~23s for a
+  50-doc pool. Fine for batch eval; painful for interactive queries.
+- For production, run on GPU: add `--gpus all` to docker and switch to
+  the `:server-cuda` image (plain `:server` is CPU-only). Expect ~10-20× speedup.
+- Alternative: drop `RERANK_POOL` from 50 to ~20 in the server env.
+  Cuts latency 2.5× at the cost of some quality (rerank gets fewer
+  candidates to choose from).
+- For very small batches the reranker can also run alongside
+  Ollama on the same GPU box — `jina-reranker-v2-base` is ~280 MB
+  and won't conflict with `nomic-embed-text` (~560 MB VRAM each).
diff --git a/eval/results/with_rerank.md b/eval/results/with_rerank.md
new file mode 100644
index 0000000..11a6019
--- /dev/null
+++ b/eval/results/with_rerank.md
@@ -0,0 +1,56 @@
+# Eval results — queries.jsonl
+
+- queries: 35
+- k: 5
+- pool: 50
+- retrievers: dense, bm25, hybrid-rrf, dense+rerank, hybrid+rerank
+
+## Summary
+
+| Retriever | MRR | Recall@5 | nDCG@5 | Errors | Time (s) |
+|---|---|---|---|---|---|
+| dense | 0.027 | 0.086 | 0.041 | 0 | 5.2 |
+| bm25 | 0.544 | 0.586 | 0.524 | 0 | 4.8 |
+| hybrid-rrf | 0.114 | 0.114 | 0.108 | 0 | 8.5 |
+| dense+rerank | 0.171 | 0.143 | 0.149 | 0 | 804.8 |
+| hybrid+rerank | 0.672 | 0.638 | 0.621 | 0 | 823.3 |
+
+## Per-query — dense
+
+| Query | Expected | Top retrieved | MRR | Recall |
+|---|---|---|---|---|
+| Warrant herbicide rate for soybean | bayer/warrant, epa_ppls/524-591 | epa_ppls/524-508, epa_ppls/524-521, epa_ppls/42750-176 | 0.00 | 0.00 |
+| Huskie wheat herbicide tank mix | bayer/huskie, bayer/huskie-complete | epa_ppls/71368-64, epa_ppls/279-9610, epa_ppls/10182-134 | 0.00 | 0.00 |
+| Harness 20G granular corn herbicide | bayer/harness, epa_ppls/524-487 | epa_ppls/352-612, epa_ppls/352-608, epa_ppls/352-817 | 0.00 | 0.00 |
+| Laudis tembotrione post-emergence corn | bayer/laudis, epa_ppls/264-860 | bayer/diflexx, epa_ppls/70506-331, epa_ppls/84229-48 | 0.00 | 0.00 |
+| Roundup Custom glyphosate burndown application rate | epa_ppls/524-677, epa_ppls/524-475 | epa_ppls/42750-122, epa_ppls/5905-656, epa_ppls/228-666 | 0.00 | 0.00 |
+| Liberty 280 SL glufosinate ammonium soybean | epa_ppls/7969-448 | epa_ppls/71368-111, epa_ppls/84229-45, epa_ppls/7969-500 | 0.00 | 0.00 |
+| Atrazine 4L corn pre-emergence rate per acre | epa_ppls/5905-7877 | epa_ppls/5905-624, epa_ppls/89167-75, epa_ppls/7969-140 | 0.00 | 0.00 |
+| Albaugh dicamba DMA salt application restrictions | epa_ppls/42750-40 | epa_ppls/5905-638, epa_ppls/34704-861, epa_ppls/5905-624 | 0.20 | 1.00 |
+| Authority 4F sulfentrazone soybean residual | epa_ppls/279-3146 | epa_ppls/279-9663, epa_ppls/87290-70, epa_ppls/66222-248 | 0.00 | 0.00 |
+| Prowl 10-G pendimethalin granular pre-plant | epa_ppls/241-254 | epa_ppls/70506-333, epa_ppls/42750-340, epa_ppls/91234-231 | 0.00 | 0.00 |
+| Callisto GT mesotrione corn postemergence broadleaf control | epa_ppls/100-1470 | epa_ppls/100-1131, epa_ppls/89167-51, epa_ppls/100-1349 | 0.00 | 0.00 |
+| Acuron Flexi corn pre-emergence S-metolachlor | epa_ppls/100-1568 | epa_ppls/62719-312, epa_ppls/42750-122, epa_ppls/5905-638 | 0.00 | 0.00 |
+| Sencor 4 flowable metribuzin soybean waterhemp | epa_ppls/264-735 | epa_ppls/1381-259, epa_ppls/279-9624, epa_ppls/89167-101 | 0.00 | 0.00 |
+| Broadstrike trifluralin pre-plant incorporated | epa_ppls/62719-222 | epa_ppls/87290-81, epa_ppls/70506-333, epa_ppls/91234-73 | 0.00 | 0.00 |
+| Headline azoxystrobin pyraclostrobin wheat foliar fungicide | epa_ppls/7969-186 | epa_ppls/100-1222, epa_ppls/100-1164, epa_ppls/87290-63 | 0.00 | 0.00 |
+| Trivapro pydiflumetofen corn fungicide tar spot | epa_ppls/100-1613 | epa_ppls/66222-250, epa_ppls/264-1209, epa_ppls/62719-346 | 0.00 | 0.00 |
+| Poncho 600 clothianidin seed treatment corn | epa_ppls/7969-458 | epa_ppls/7969-459, epa_ppls/7969-458, bayer/poncho-beta | 0.50 | 1.00 |
+| Gustafson Lorsban 30 chlorpyrifos granular corn rootworm | epa_ppls/264-932 | epa_ppls/89167-78, epa_ppls/5481-525, epa_ppls/1381-193 | 0.00 | 0.00 |
+| RT-3 glyphosate potassium salt herbicide | bayer/rt-3 | bayer/roundup-powermax-3, epa_ppls/19713-597, epa_ppls/19713-606 | 0.25 | 1.00 |
+| Roundup PowerMAX 3 glyphosate K-salt rate | bayer/roundup-powermax-3, epa_ppls/524-659 | epa_ppls/19713-597, epa_ppls/19713-606, epa_ppls/51036-333 | 0.00 | 0.00 |
+| Nortron SC ethofumesate sugar beet | bayer/nortron-sc | epa_ppls/71368-25, epa_ppls/42750-122, epa_ppls/524-715 | 0.00 | 0.00 |
+| DiFlexx Duo tembotrione dicamba corn | bayer/diflexx-duo | epa_ppls/71368-65, epa_ppls/1812-434, epa_ppls/1381-191 | 0.00 | 0.00 |
+| Corvus thiencarbazone-methyl isoxaflutole corn pre-emergence | bayer/corvus, epa_ppls/264-1066 | epa_ppls/42750-122, bayer/scoparia, epa_ppls/70506-331 | 0.00 | 0.00 |
+| Capreno tembotrione thiencarbazone corn herbicide | bayer/capreno, epa_ppls/264-1063 | epa_ppls/91234-314, epa_ppls/352-894, epa_ppls/42750-32 | 0.00 | 0.00 |
+| Tilt propiconazole wheat fungicide rust | epa_ppls/100-617 | epa_ppls/19713-692, epa_ppls/34704-1113, epa_ppls/228-670 | 0.00 | 0.00 |
+| what controls horseweed marestail before planting soybean | epa_ppls/524-475, epa_ppls/524-677 | epa_ppls/524-716, epa_ppls/524-717, epa_ppls/524-722 | 0.00 | 0.00 |
+| what can I tank mix with 2,4-D for burndown in spring | epa_ppls/5905-7877, epa_ppls/228-666 | epa_ppls/34704-1158, epa_ppls/264-738, epa_ppls/228-364 | 0.00 | 0.00 |
+| best fungicide for corn tar spot foliar application | epa_ppls/100-1613, epa_ppls/100-1547 | epa_ppls/100-1178, epa_ppls/87290-63, epa_ppls/100-1262 | 0.00 | 0.00 |
+| seed treatment to control wireworm in corn | epa_ppls/7969-458, epa_ppls/7969-459 | epa_ppls/10182-212, epa_ppls/1381-231, epa_ppls/42750-300 | 0.00 | 0.00 |
+| pre-emergence residual herbicide for soybean for waterhemp | epa_ppls/279-3146, epa_ppls/264-735 | epa_ppls/352-675, epa_ppls/279-3564, epa_ppls/279-3589 | 0.00 | 0.00 |
+| what insecticide for soybean aphid foliar | epa_ppls/279-3206, epa_ppls/264-840 | epa_ppls/264-1157, epa_ppls/264-1159, epa_ppls/279-9615 | 0.00 | 0.00 |
+| what is the rainfast interval for glyphosate | epa_ppls/524-475, epa_ppls/524-677 | epa_ppls/89167-56, epa_ppls/524-523, epa_ppls/524-707 | 0.00 | 0.00 |
+| wheat fungicide for fusarium head blight | epa_ppls/7969-186, epa_ppls/100-1547 | bayer/stratego, epa_ppls/7969-246, epa_ppls/66222-250 | 0.00 | 0.00 |
+| endangered species act precautions for pesticide application | epa_ppls/524-475, epa_ppls/524-591 | epa_ppls/70506-318, epa_ppls/70506-324, epa_ppls/34704-1044 | 0.00 | 0.00 |
+| what herbicide do I use for postemergence broadleaf in corn | bayer/laudis, bayer/capreno, bayer/diflexx-duo | epa_ppls/352-842, epa_ppls/100-1349, epa_ppls/89167-51 | 0.00 | 0.00 |
\ No newline at end of file
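
Review note: the Summary figures for `dense` can be cross-checked against the per-query rows. The sketch below assumes the eval harness uses the standard definitions: reciprocal rank of the first expected doc within the top k, and recall as the fraction of expected docs found in the top k. The helper names `reciprocal_rank` and `recall_at_k` are hypothetical and are not taken from the eval code in this PR.

```python
from statistics import mean


def reciprocal_rank(expected: set[str], retrieved: list[str], k: int = 5) -> float:
    """Return 1/rank of the first expected doc within the top k, else 0.0."""
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in expected:
            return 1.0 / rank
    return 0.0


def recall_at_k(expected: set[str], retrieved: list[str], k: int = 5) -> float:
    """Return the fraction of expected docs that appear within the top k."""
    if not expected:
        return 0.0
    hits = expected & set(retrieved[:k])
    return len(hits) / len(expected)


# "Poncho 600" row from the dense table (the table lists only the top 3 results).
expected = {"epa_ppls/7969-458"}
retrieved = ["epa_ppls/7969-459", "epa_ppls/7969-458", "bayer/poncho-beta"]
rr = reciprocal_rank(expected, retrieved)   # hit at rank 2 -> 0.5
rec = recall_at_k(expected, retrieved)      # 1 of 1 expected docs found -> 1.0

# Summary check: the dense table has three non-zero MRR rows and 32 zeros.
per_query_mrr = [0.20, 0.50, 0.25] + [0.0] * 32
dense_mrr = round(mean(per_query_mrr), 3)   # 0.95 / 35 -> 0.027

print(rr, rec, dense_mrr)
```

Under these assumptions the three non-zero dense rows average to 0.027 over 35 queries, and 3 of 35 queries with recall 1.00 give 0.086, both matching the `dense` row in the Summary table.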