rag: resilient embedder — rotate/split on endpoint errors; 4-GPU embed pool #8

Merged
justin merged 1 commit from embed-pool-resilience into main 2026-06-10 15:47:02 -04:00
Owner

Port of zerto-docs PR #45 (see that PR for the run #36 post-mortem). The embedder now rotates to the next endpoint and retries with backoff (5 attempts) on transport errors, additionally halves the input (floor 16) on HTTP status errors (the Windows 4090 Ollama at .0.125 returns a 400 when its model runner dies on an oversized input array), and logs error response bodies instead of swallowing them. In the CI workflows, OLLAMA_URLS now covers the full 4-endpoint GPU pool. Verified against the live pool, including a forced 512-text 400 on .0.125 that split and completed.

🤖 Generated with Claude Code

justin added 1 commit 2026-06-10 15:47:01 -04:00
Port of zerto-docs PR #45. OllamaEmbeddings previously made a single
attempt per batch — any transient connection drop or HTTP error from
one endpoint failed the entire index rebuild.

- _embed() now rotates to the next endpoint and retries with backoff
  (5 attempts) on transport errors, and additionally halves the input
  (floor 16) on HTTP status errors: the .0.125 Windows Ollama (4090)
  400s when its model runner dies on an oversized input array. Error
  response bodies are logged instead of swallowed.
- CI workflows: OLLAMA_URLS extended from the two ripper instances to
  the full 4-endpoint GPU pool (+ .0.125 4090, + .0.126). At the
  64-chunk batches this indexer already uses, .0.125 is the fastest
  embedder in the fleet (242 embeds/s measured on seed-mcp).
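
The retry behaviour described in the first bullet can be sketched roughly as follows. This is an illustration only, not the actual `_embed()` code: `embed`, `TransportError`, `HTTPStatusError`, and the injected `post` callable are hypothetical names, the exponential backoff schedule is an assumption, and reading "floor 16" as a minimum batch size is an interpretation of the commit message.

```python
import logging
import time
from typing import Callable, List, Sequence

log = logging.getLogger("embedder")


class TransportError(Exception):
    """Hypothetical stand-in for a connection-level failure."""


class HTTPStatusError(Exception):
    """Hypothetical stand-in for a non-2xx response."""

    def __init__(self, status: int, body: str):
        super().__init__(f"HTTP {status}: {body}")
        self.status = status
        self.body = body


# post(url, texts) returns one vector per text. It is injected so the
# sketch runs without a network or a real Ollama endpoint.
PostFn = Callable[[str, Sequence[str]], List[List[float]]]


def embed(
    texts: Sequence[str],
    urls: Sequence[str],
    post: PostFn,
    max_attempts: int = 5,   # "5 attempts" from the commit message
    min_batch: int = 16,     # "floor 16" from the commit message
    base_delay: float = 0.5,  # assumed backoff base, not from the source
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[float]]:
    vectors: List[List[float]] = []
    size = len(texts)  # start by sending everything in one request
    start = 0          # index of the first text not yet embedded
    endpoint = 0       # index into urls, advanced on every failure
    attempt = 0        # consecutive failures for the current chunk
    while start < len(texts):
        chunk = texts[start:start + size]
        url = urls[endpoint % len(urls)]
        try:
            vectors.extend(post(url, chunk))
            start += len(chunk)
            attempt = 0
            continue
        except HTTPStatusError as exc:
            # Log the response body, then halve the batch down to the floor.
            log.warning("%s returned %s: %s", url, exc.status, exc.body)
            size = max(min_batch, size // 2)
        except TransportError as exc:
            log.warning("%s unreachable: %s", url, exc)
        attempt += 1
        if attempt >= max_attempts:
            raise RuntimeError(f"embedding failed after {attempt} attempts")
        endpoint += 1  # rotate to the next endpoint before retrying
        sleep(base_delay * 2 ** (attempt - 1))
    return vectors
```

In this sketch a transport error only rotates and retries the same chunk, while an HTTP status error also shrinks every later request, so a 512-text input rejected above 128 texts would go out as 512, 256, then four requests of 128.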

Verified against the live pool: 64-text happy path, dead-endpoint
rotation, and a forced 512-text 400 on .0.125 that split and completed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
justin merged commit 53b9d348d8 into main 2026-06-10 15:47:02 -04:00

Reference: justin/hvm-docs#8