Make evals fast and hang-proof (timeout, retry, concurrency) (#44)

The "Run evals" step ran 24 API calls sequentially with no request timeout, so
it was slow and could stall indefinitely if one call hung.

- bin/lib/anthropic.mjs: per-request timeout (120s) via AbortController, plus
  up to 2 retries with backoff on 429, 5xx, or timeout. Fails fast on other
  4xx errors (bad key/model).
- evals/run-evals.mjs: run (case × model) tasks through a concurrency pool
  (default 4, --concurrency to tune); preserves result order.
- eval-leaderboard.yml: job timeout-minutes: 20 as a safety net.
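The concurrency pool described for `evals/run-evals.mjs` can be sketched as a fixed number of workers pulling from a shared task index. Each result is written back to its input slot, which keeps the output order stable. This is a minimal sketch under assumptions: `runPool` and its signature are hypothetical and are not the function in the repo.

```javascript
// Hypothetical helper (not the repo's actual function): runs async task
// factories with at most `limit` in flight and returns results in input order.
async function runPool(tasks, limit = 4) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started

  // Each worker claims the next unstarted index and awaits that task. The
  // check-and-increment has no await in between, so no index is claimed twice.
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i](); // write to the task's own slot: order preserved
    }
  }

  // At least one worker, at most one per task.
  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Each (case × model) pair would be wrapped as a zero-argument async function, for example `() => runCase(c, m)`, where `runCase` is a hypothetical name. If one task rejects, `Promise.all` rejects immediately while the other workers keep running. Catching errors inside each task is therefore safer when a single failed eval should not discard the rest of the results.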

Applies to the next run. The hardening also benefits the Action runner and
`generate`, which share the client.
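The following is a minimal sketch of the client hardening described above: a per-attempt timeout, retries on 429/5xx/timeout, and fail-fast on other 4xx responses. It assumes the shared client wraps the global `fetch` (Node 18+). The names `fetchWithRetry`, `fetchImpl`, and `baseDelayMs`, along with the 1s exponential backoff base, are hypothetical. The real code in `bin/lib/anthropic.mjs` may be structured differently.

```javascript
// Hypothetical sketch, not the code in bin/lib/anthropic.mjs: wraps fetch with
// a per-attempt timeout and retries 429/5xx/timeouts with exponential backoff.
// Assumes a runtime with global fetch and AbortController (e.g. Node 18+).
const isRetryableStatus = (status) => status === 429 || status >= 500;
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, init = {}, {
  timeoutMs = 120_000, // per-attempt timeout (the 120s from the commit)
  retries = 2,         // retries after the first attempt
  baseDelayMs = 1_000, // assumed backoff base: waits 1s, then 2s
  fetchImpl = fetch,   // injectable for testing
} = {}) {
  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    let retryable;
    let error;
    try {
      const res = await fetchImpl(url, { ...init, signal: controller.signal });
      if (res.ok) return res; // the timer only covers the wait for response headers
      retryable = isRetryableStatus(res.status);
      error = new Error(`HTTP ${res.status}`);
    } catch (err) {
      // Only our own abort (a timeout) is retried; other errors propagate.
      if (err?.name !== "AbortError") throw err;
      retryable = true;
      error = new Error(`timed out after ${timeoutMs} ms`);
    } finally {
      clearTimeout(timer);
    }
    // Other 4xx (bad key, unknown model) fail fast; retries are capped.
    if (!retryable || attempt >= retries) throw error;
    await sleep(baseDelayMs * 2 ** attempt);
  }
}
```

With these defaults, a hung request is given at most three attempts, so the worst case is bounded at roughly 3 × 120s plus 3s of backoff rather than waiting indefinitely. A production client would probably also read the error body before throwing, so the API's error message is surfaced.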


Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px

Co-authored-by: Claude <noreply@anthropic.com>

Author: mohitagw15856
Date: 2026-06-18 13:30:06 +01:00
Committed by: GitHub
Parent: edb663ad72
Commit: 827d7f62ec

4 changed files with 83 additions and 30 deletions

.github/workflows/eval-leaderboard.yml (+1)
@@ -29,6 +29,7 @@ concurrency:
 jobs:
   evaluate:
     runs-on: ubuntu-latest
+    timeout-minutes: 20
     steps:
       - name: Checkout
         uses: actions/checkout@v4