rag: resilient embedder — rotate/split on endpoint errors; 4-GPU embed pool
Port of zerto-docs PR #45.

OllamaEmbeddings previously made a single attempt per batch, so any transient connection drop or HTTP error from one endpoint failed the entire index rebuild.

- _embed() now rotates to the next endpoint and retries with backoff (5 attempts) on transport errors, and additionally halves the input (floor 16) on HTTP status errors: the .0.125 Windows Ollama (4090) 400s when its model runner dies on an oversized input array. Error response bodies are logged instead of swallowed.
- CI workflows: OLLAMA_URLS extended from the two ripper instances to the full 4-endpoint GPU pool (+ .0.125 4090, + .0.126). At the 64-chunk batches this indexer already uses, .0.125 is the fastest embedder in the fleet (242 embeds/s measured on seed-mcp).

Verified against the live pool: 64-text happy path, dead-endpoint rotation, and a forced 512-text 400 on .0.125 that split and completed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
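
The retry behaviour described above could be sketched roughly as follows. This is not the actual _embed() from the PR, only an illustration under stated assumptions: `embed_resilient`, `post`, `TransportError` and `HTTPStatusError` are hypothetical names, the backoff schedule is invented, and "floor 16" is read here as "never split into halves smaller than 16 texts". Only the 5 attempts and the 16 floor come from the commit message.

```python
import logging
import time
from typing import Callable, List, Sequence

log = logging.getLogger("embedder")

MAX_ATTEMPTS = 5  # from the commit message
MIN_SPLIT = 16    # "floor 16"; interpretation is an assumption


class TransportError(Exception):
    """Hypothetical stand-in: connection dropped or endpoint unreachable."""


class HTTPStatusError(Exception):
    """Hypothetical stand-in: endpoint answered with a non-2xx status."""

    def __init__(self, status: int, body: str = "") -> None:
        super().__init__(f"HTTP {status}: {body}")
        self.status = status
        self.body = body


Vector = List[float]
PostFn = Callable[[str, Sequence[str]], List[Vector]]


def embed_resilient(
    texts: Sequence[str],
    urls: Sequence[str],
    post: PostFn,
    start: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Vector]:
    """Embed texts, rotating endpoints on failure and halving on HTTP errors."""
    last_error: Exception = RuntimeError("no attempt made")
    for attempt in range(MAX_ATTEMPTS):
        url = urls[(start + attempt) % len(urls)]  # rotate to the next endpoint
        try:
            return post(url, texts)
        except HTTPStatusError as exc:
            last_error = exc
            # Log the response body instead of swallowing it.
            log.warning("embed failed on %s: %s", url, exc)
            mid = len(texts) // 2
            if mid >= MIN_SPLIT:
                # Halve the input; each half gets its own retry budget.
                nxt = start + attempt + 1
                left = embed_resilient(texts[:mid], urls, post, nxt, sleep)
                right = embed_resilient(texts[mid:], urls, post, nxt + 1, sleep)
                return left + right
        except TransportError as exc:
            last_error = exc
            log.warning("transport error on %s: %s", url, exc)
        if attempt < MAX_ATTEMPTS - 1:
            sleep(0.5 * 2 ** attempt)  # assumed exponential backoff schedule
    raise RuntimeError(f"embedding failed after {MAX_ATTEMPTS} attempts") from last_error
```

In this sketch the `post` callable is injected, so the rotation and splitting logic can be exercised without a live Ollama endpoint. Result order is preserved because the left half is always concatenated before the right half.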
@@ -22,7 +22,7 @@ env:
 # Two GPU-pinned Ollama containers on the Gitea host — same infra
 # zerto-docs uses. :11435 = Titan X, :11436 = 1080 Ti. Indexer
 # round-robins per batch.
-OLLAMA_URLS: http://192.168.0.2:11435,http://192.168.0.2:11436
+OLLAMA_URLS: http://192.168.0.2:11435,http://192.168.0.2:11436,http://192.168.0.125:11434,http://192.168.0.126:11434
 EMBED_MODEL: nomic-embed-text
 PRODUCT_NAME: morpheus
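
The workflow comment says the indexer round-robins per batch across the endpoints in OLLAMA_URLS. A minimal sketch of what that might look like, assuming the variable is a plain comma-separated list; `parse_urls` and `assign_batches` are hypothetical helper names, not functions from the indexer, and the batch size of 64 is the one quoted in the commit message:

```python
from itertools import cycle
from typing import Iterator, List, Sequence, Tuple

BATCH_SIZE = 64  # batch size quoted in the commit message


def parse_urls(raw: str) -> List[str]:
    """Split a comma-separated OLLAMA_URLS value, dropping empty entries."""
    return [u.strip().rstrip("/") for u in raw.split(",") if u.strip()]


def assign_batches(
    chunks: Sequence[str],
    urls: Sequence[str],
    batch_size: int = BATCH_SIZE,
) -> Iterator[Tuple[str, Sequence[str]]]:
    """Yield (endpoint, batch) pairs, moving to the next endpoint per batch."""
    if not urls:
        raise ValueError("OLLAMA_URLS is empty")
    endpoints = cycle(urls)  # wraps around after the last endpoint
    for i in range(0, len(chunks), batch_size):
        yield next(endpoints), chunks[i:i + batch_size]


if __name__ == "__main__":
    raw = (
        "http://192.168.0.2:11435,http://192.168.0.2:11436,"
        "http://192.168.0.125:11434,http://192.168.0.126:11434"
    )
    for url, batch in assign_batches([f"chunk {n}" for n in range(300)], parse_urls(raw)):
        print(url, len(batch))
```

With four endpoints, the fifth batch wraps back to the first URL. A plain rotation like this does not weight faster GPUs more heavily; whether the real indexer does so is not stated in the commit.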