Leaderboard workflow: open a PR instead of pushing to protected main (#45)

The eval run worked (12 scored runs) but the final step failed: it pushed
evals/results.json directly to main, which the branch ruleset blocks
("Changes must be made through a pull request").

- eval-leaderboard.yml: replace the direct commit/push with
  peter-evans/create-pull-request@v7 (branch eval-results), add
  pull-requests: write. Merging that PR triggers the Pages deploy (which
  watches evals/results.json) to publish real numbers.
- evals/README documents the PR flow + the required "Allow GitHub Actions to
  create and approve pull requests" setting.


Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px

Co-authored-by: Claude <noreply@anthropic.com>
mohitagw15856
2026-06-18 13:33:15 +01:00
committed by GitHub
parent 827d7f62ec
commit 4209963cff
3 changed files with 23 additions and 15 deletions
+14 -12
@@ -21,6 +21,7 @@ on:
 permissions:
   contents: write
+  pull-requests: write
 concurrency:
   group: eval-leaderboard
@@ -54,15 +55,16 @@ jobs:
       - name: Build the leaderboard page (sanity check)
         run: node scripts/build-leaderboard.mjs
-      - name: Commit results
-        run: |
-          git config user.name "github-actions[bot]"
-          git config user.email "github-actions[bot]@users.noreply.github.com"
-          git add evals/results.json
-          if git diff --cached --quiet; then
-            echo "No change in results."
-          else
-            git commit -m "chore(evals): refresh leaderboard results"
-            git push
-            echo "Committed evals/results.json — the Pages deploy will render real numbers."
-          fi
+      - name: Open a PR with the refreshed results
+        uses: peter-evans/create-pull-request@v7
+        with:
+          add-paths: evals/results.json
+          branch: eval-results
+          delete-branch: true
+          commit-message: "chore(evals): refresh leaderboard results"
+          title: "chore(evals): refresh leaderboard results"
+          body: |
+            Auto-generated by the **Update Skill Leaderboard** workflow.
+            Merging this publishes the **real** numbers on the live leaderboard — the
+            Pages deploy is triggered by changes to `evals/results.json`.
+2
@@ -10,6 +10,8 @@ each new wave of skills bumps the **major** version, extensions and fixes bump
 ## [Unreleased]
 ### Changed
+- **Leaderboard workflow opens a PR** instead of pushing to `main` (which the branch
+ruleset blocks). After it runs, merge the auto-created results PR to publish real numbers.
 - **Faster, hang-proof evals.** The Anthropic client now has a per-request timeout (120s)
 and limited retries (429/5xx/timeout); the eval harness runs cases concurrently
 (default 4). The leaderboard workflow has a 20-minute job timeout. A 24-call run that
+7 -3
@@ -30,9 +30,13 @@ back to `results.example.json` (clearly labelled) so the page renders before you
 ### No local key? Run it in CI
-Add an `ANTHROPIC_API_KEY` repo secret, then go to **Actions → "Update Skill Leaderboard"
-→ Run workflow**. It runs the evals, commits `evals/results.json`, and the Pages deploy
-re-renders the public leaderboard with real numbers — no laptop required.
+1. Add an `ANTHROPIC_API_KEY` repo secret.
+2. Enable **Settings → Actions → General → Workflow permissions → "Allow GitHub Actions to
+create and approve pull requests"** (so the workflow can open its results PR — `main`
+requires PRs).
+3. **Actions → "Update Skill Leaderboard" → Run workflow.** It runs the evals and opens a
+PR with `evals/results.json`. **Merge that PR** and the Pages deploy re-renders the
+public leaderboard with real numbers — no laptop required.
 ## Add a case
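
For readers who prefer a terminal, steps 1 and 3 of the README hunk above could be driven with the GitHub CLI. This is only a sketch, not part of the commit: it assumes `gh` is installed and authenticated against this repository, the helper names `run` and `publish_leaderboard` are hypothetical, and `--squash` is an arbitrary choice of merge strategy. Step 2 (the "Allow GitHub Actions to create and approve pull requests" setting) is left to the Settings UI. The script defaults to a dry run that prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch only: the README's CI steps expressed as GitHub CLI calls.
# Assumes gh is installed and authenticated for this repository.
set -euo pipefail

# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
# The printout flattens quoting, so it is for reading, not for copy-paste.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper that mirrors README steps 1 and 3.
publish_leaderboard() {
  # Step 1: store the API key as a repo secret (gh prompts for the value).
  run gh secret set ANTHROPIC_API_KEY
  # Step 3a: trigger the workflow by its display name.
  run gh workflow run "Update Skill Leaderboard"
  # Step 3b: merge the results PR by its head branch. The workflow run is
  # asynchronous, so in a real run this must wait until the PR exists.
  run gh pr merge eval-results --squash
}

publish_leaderboard
```

Running the script as-is only lists the three `gh` commands. Setting `DRY_RUN=0` executes them, in which case the merge should be issued separately once the workflow has finished and the `eval-results` PR is open.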