Leaderboard workflow: open a PR instead of pushing to protected main (#45)

The eval run worked (12 scored runs) but the final step failed: it pushed
evals/results.json directly to main, which the branch ruleset blocks
("Changes must be made through a pull request").

- eval-leaderboard.yml: replace the direct commit/push with
  peter-evans/create-pull-request@v7 (branch eval-results), add
  pull-requests: write. Merging that PR triggers the Pages deploy (which
  watches evals/results.json) to publish real numbers.
- evals/README documents the PR flow + the required "Allow GitHub Actions to
  create and approve pull requests" setting.

Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
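
The commit message says the Pages deploy "watches" evals/results.json. One common way to express that in GitHub Actions is a `paths` filter on the `push` trigger. The fragment below is a sketch of that pattern under that assumption; the deploy workflow itself is not part of this commit, so the exact trigger may differ.

```yaml
# Hypothetical trigger for the Pages deploy workflow (not shown in this commit).
# Assumes the deploy runs on pushes to main that touch the results file.
on:
  push:
    branches: [main]
    paths:
      - evals/results.json
```

With a filter like this, merging the results PR produces a push to `main` that changes `evals/results.json`, which is what starts the deploy.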
@@ -21,6 +21,7 @@ on:
 
 permissions:
   contents: write
+  pull-requests: write
 
 concurrency:
   group: eval-leaderboard
@@ -54,15 +55,16 @@ jobs:
       - name: Build the leaderboard page (sanity check)
         run: node scripts/build-leaderboard.mjs
 
-      - name: Commit results
-        run: |
-          git config user.name "github-actions[bot]"
-          git config user.email "github-actions[bot]@users.noreply.github.com"
-          git add evals/results.json
-          if git diff --cached --quiet; then
-            echo "No change in results."
-          else
-            git commit -m "chore(evals): refresh leaderboard results"
-            git push
-            echo "Committed evals/results.json — the Pages deploy will render real numbers."
-          fi
+      - name: Open a PR with the refreshed results
+        uses: peter-evans/create-pull-request@v7
+        with:
+          add-paths: evals/results.json
+          branch: eval-results
+          delete-branch: true
+          commit-message: "chore(evals): refresh leaderboard results"
+          title: "chore(evals): refresh leaderboard results"
+          body: |
+            Auto-generated by the **Update Skill Leaderboard** workflow.
+
+            Merging this publishes the **real** numbers on the live leaderboard — the
+            Pages deploy is triggered by changes to `evals/results.json`.
@@ -10,6 +10,8 @@ each new wave of skills bumps the **major** version, extensions and fixes bump
 ## [Unreleased]
 
 ### Changed
+- **Leaderboard workflow opens a PR** instead of pushing to `main` (which the branch
+  ruleset blocks). After it runs, merge the auto-created results PR to publish real numbers.
 - **Faster, hang-proof evals.** The Anthropic client now has a per-request timeout (120s)
   and limited retries (429/5xx/timeout); the eval harness runs cases concurrently
   (default 4). The leaderboard workflow has a 20-minute job timeout. A 24-call run that
@@ -30,9 +30,13 @@ back to `results.example.json` (clearly labelled) so the page renders before you
 
 ### No local key? Run it in CI
 
-Add an `ANTHROPIC_API_KEY` repo secret, then go to **Actions → "Update Skill Leaderboard"
-→ Run workflow**. It runs the evals, commits `evals/results.json`, and the Pages deploy
-re-renders the public leaderboard with real numbers — no laptop required.
+1. Add an `ANTHROPIC_API_KEY` repo secret.
+2. Enable **Settings → Actions → General → Workflow permissions → "Allow GitHub Actions to
+   create and approve pull requests"** (so the workflow can open its results PR — `main`
+   requires PRs).
+3. **Actions → "Update Skill Leaderboard" → Run workflow.** It runs the evals and opens a
+   PR with `evals/results.json`. **Merge that PR** and the Pages deploy re-renders the
+   public leaderboard with real numbers — no laptop required.
 
 ## Add a case
 
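
Pieced together from the hunks in this commit, the resulting workflow might look roughly like the sketch below. It is illustrative rather than a copy of eval-leaderboard.yml: the manual trigger, job id, runner, checkout step, and the eval script name (`scripts/run-evals.mjs`) are assumptions, since the diff only shows fragments of the file. The permissions, concurrency group, build step, and PR step are taken from the hunks, and the 20-minute timeout from the changelog entry.

```yaml
# Sketch only. Anything marked "assumed" or "hypothetical" does not appear in the commit.
name: Update Skill Leaderboard

on:
  workflow_dispatch:   # assumed: the README's "Run workflow" button implies a manual trigger

permissions:
  contents: write        # lets the action push the eval-results branch
  pull-requests: write   # lets the action open the results PR

concurrency:
  group: eval-leaderboard

jobs:
  leaderboard:                      # assumed job id
    runs-on: ubuntu-latest          # assumed runner
    timeout-minutes: 20             # the changelog mentions a 20-minute job timeout
    steps:
      - uses: actions/checkout@v4   # assumed

      - name: Run the evals
        run: node scripts/run-evals.mjs   # hypothetical script name
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

      - name: Build the leaderboard page (sanity check)
        run: node scripts/build-leaderboard.mjs

      - name: Open a PR with the refreshed results
        uses: peter-evans/create-pull-request@v7
        with:
          add-paths: evals/results.json
          branch: eval-results
          delete-branch: true
          commit-message: "chore(evals): refresh leaderboard results"
          title: "chore(evals): refresh leaderboard results"
```

Because the action only stages `evals/results.json` (via `add-paths`), a run that leaves that file unchanged should produce no new PR, which mirrors the "No change in results." branch of the removed shell step.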