CI workflow to run evals and update the leaderboard (#43)

Lets the leaderboard show real numbers without a local key: the new
"Update Skill Leaderboard" workflow (workflow_dispatch) runs the eval harness
with the ANTHROPIC_API_KEY secret, commits evals/results.json, and the Pages
deploy re-renders the public leaderboard with real data.

- .github/workflows/eval-leaderboard.yml: manual trigger, contents: write,
  runs run-evals.mjs + build-leaderboard.mjs, commits results.json.
- deploy-playground.yml: also trigger on evals/results.json (and the build
  scripts) so the committed results refresh the live page.
- evals/README + CHANGELOG document the CI route.


Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px

Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
mohitagw15856
2026-06-18 12:58:45 +01:00
committed by GitHub
parent 3ccfd6b5c7
commit edb663ad72
4 changed files with 83 additions and 1 deletions
+6 -1
View File
@@ -9,7 +9,12 @@ each new wave of skills bumps the **major** version, extensions and fixes bump
## [Unreleased]
_Nothing yet._
### Added
- **One-click leaderboard updates in CI** — `.github/workflows/eval-leaderboard.yml`
("Update Skill Leaderboard") runs the evals with the `ANTHROPIC_API_KEY` secret, commits
`evals/results.json`, and the Pages deploy re-renders the public leaderboard with real
numbers — no local key needed. The deploy workflow now also triggers on
`evals/results.json`.
## [20.0.0] — Agentic Tooling — 2026-06-18