Make evals fast and hang-proof (timeout, retry, concurrency) (#44)

The "Run evals" step ran 24 API calls sequentially with no request timeout, so it was slow and could stall indefinitely if one call hung.

- bin/lib/anthropic.mjs: per-request timeout (120s) via AbortController + retry (2x, backoff) on 429/5xx/timeout. Fails fast on 4xx (bad key/model).
- evals/run-evals.mjs: run (case × model) tasks through a concurrency pool (default 4, --concurrency to tune); preserves result order.
- eval-leaderboard.yml: job timeout-minutes: 20 as a safety net.

Applies to the next run. The hardening also benefits the Action runner and `generate`, which share the client.

Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
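
For readers unfamiliar with the pattern, an order-preserving concurrency pool of the kind described for evals/run-evals.mjs might look roughly like the sketch below. This is an illustration only, not the code from this commit: the name `runPool`, its signature, and the demo tasks are all hypothetical.

```javascript
// Hypothetical helper (not the actual run-evals.mjs code): runs async task
// functions with at most `limit` in flight and returns results in input order.
async function runPool(tasks, limit = 4) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next unclaimed task

  async function worker() {
    while (next < tasks.length) {
      const i = next++; // claimed synchronously, so no two workers share an index
      results[i] = await tasks[i](); // stored by index, not by completion time
    }
  }

  // Start at most `limit` workers; each pulls tasks until none remain.
  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: three fake tasks that finish out of order, at most two in flight.
const demoTasks = [30, 10, 20].map(
  (ms, i) => () => new Promise((resolve) => setTimeout(() => resolve(i), ms))
);
runPool(demoTasks, 2).then((out) => console.log(out)); // results follow input order
```

Because each result is written to the slot of the task that produced it, the output order matches the input order regardless of which call finishes first. In this sketch a rejected task rejects the whole pool; a real harness might instead catch per-task errors so that one failure does not discard the other results.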
2026-06-18 13:30:06 +01:00
parent edb663ad72
commit 827d7f62ec
4 changed files with 83 additions and 30 deletions
@@ -9,6 +9,12 @@ each new wave of skills bumps the **major** version, extensions and fixes bump

 ## [Unreleased]

+### Changed
+- **Faster, hang-proof evals.** The Anthropic client now has a per-request timeout (120s)
+  and limited retries (429/5xx/timeout); the eval harness runs cases concurrently
+  (default 4). The leaderboard workflow has a 20-minute job timeout. A 24-call run that
+  was sequential now finishes in a few minutes and can't stall a job indefinitely.
+
 ### Added
 - **One-click leaderboard updates in CI** — `.github/workflows/eval-leaderboard.yml`
  ("Update Skill Leaderboard") runs the evals with the `ANTHROPIC_API_KEY` secret, commits