eval: GPU rerank baseline + CLI fix
GPU eval (hybrid+rerank, RERANK_URL=http://10.10.1.65:8082):
  MRR=0.672  Recall@5=0.638  nDCG@5=0.621
  (35 queries, 1 transient 500, otherwise clean)

Quality is identical to the CPU rerank run, as expected; only latency changed (a single rerank call dropped from ~23s to ~0.7-1.5s on the Tesla P4). Per-query report at eval/results/with_rerank_gpu.md.

CLI parser fix: `--retrievers dense+rerank,hybrid+rerank` now correctly wires the dense+rerank variant. Previously only the literal "rerank" (without prefix) matched the dense+rerank branch, so combined-retriever runs silently dropped dense+rerank.

(Note: the eval's RerankedRetriever does 50 individual Chroma `get` calls per query to fetch chunk text by (source, source_key); this adds ~15s per query of pure SQLite lookup overhead. Not a production concern: docs_mcp/server.py's _rerank_pool reranks docs already in the dense pool, with no extra Chroma round-trips. Worth tightening the eval-side impl on a later pass.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
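The eval-side overhead described in the note above comes from issuing one `get` per (source, source_key) pair. One possible way to tighten it on a later pass is to build a single metadata filter and fetch the whole pool in one call. The sketch below only constructs that filter. It assumes the chunk metadata stores the fields under the exact names `source` and `source_key` (inferred from the note, not confirmed by the code), and `build_batch_where` is a hypothetical helper that does not exist in the repository.

```python
from typing import Any, Dict, Iterable, Tuple


def build_batch_where(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Any]:
    """Build one Chroma-style `where` filter matching any (source, source_key) pair.

    The metadata field names "source" and "source_key" are assumptions
    about the schema; adjust them to whatever the collection really uses.
    """
    # De-duplicate while preserving first-seen order.
    unique = list(dict.fromkeys(pairs))
    if not unique:
        raise ValueError("at least one (source, source_key) pair is required")

    # Each pair becomes an $and of two equality checks.
    clauses = [
        {"$and": [{"source": src}, {"source_key": key}]}
        for src, key in unique
    ]

    # $or needs at least two operands, so a single pair is returned as a bare $and.
    return clauses[0] if len(clauses) == 1 else {"$or": clauses}
```

The resulting dict could then be passed as `collection.get(where=..., include=["documents", "metadatas"])` in place of the 50 separate lookups. Whether this actually recovers the ~15s depends on how Chroma evaluates a large `$or` filter, which has not been measured here.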
+3
-2
@@ -102,7 +102,7 @@ def main() -> int:
         DenseRetriever, BM25Retriever, HybridRetriever, RerankedRetriever
     )
 
-    wanted = [x.strip() for x in args.retrievers.split(",") if x.strip()]
+    wanted = {x.strip() for x in args.retrievers.split(",") if x.strip()}
     dense = DenseRetriever()
     bm25 = BM25Retriever()
 
@@ -113,7 +113,8 @@ def main() -> int:
         retrievers.append(("bm25", bm25))
     if "hybrid" in wanted:
         retrievers.append(("hybrid-rrf", HybridRetriever(dense=dense, bm25=bm25, pool=args.pool)))
-    if "rerank" in wanted:
+    # Accept either "rerank" or "dense+rerank" for the dense-base reranker.
+    if "rerank" in wanted or "dense+rerank" in wanted:
         retrievers.append(("dense+rerank",
                            RerankedRetriever(base=dense, pool=args.pool)))
     if "hybrid+rerank" in wanted:
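The diff above fixes the mismatch by checking for both spellings inline. An alternative way to express the same rule, shown here only as a sketch, is to normalize aliases once at parse time so every later check compares against a single canonical name. `ALIASES` and `parse_retrievers` are hypothetical names introduced for illustration; the commit itself does not add them.

```python
from typing import Dict, Set

# Hypothetical alias table: bare "rerank" means the dense-base reranker,
# mirroring the `"rerank" in wanted or "dense+rerank" in wanted` check.
ALIASES: Dict[str, str] = {"rerank": "dense+rerank"}


def parse_retrievers(spec: str) -> Set[str]:
    """Split a comma-separated --retrievers value into canonical names."""
    # Same tokenization as the diff: strip whitespace, drop empty entries.
    tokens = {x.strip() for x in spec.split(",") if x.strip()}
    # Map each alias to its canonical name; unknown tokens pass through unchanged.
    return {ALIASES.get(t, t) for t in tokens}


if __name__ == "__main__":
    print(sorted(parse_retrievers("dense+rerank,hybrid+rerank")))
    print(sorted(parse_retrievers("rerank, ,hybrid")))
```

With this shape, the branch in `main()` would only need to test `"dense+rerank" in wanted`. The trade-off is an extra lookup table to maintain, which may not be worth it while there is only one alias.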