Leaderboard workflow: open a PR instead of pushing to protected main (#45)

The eval run worked (12 scored runs) but the final step failed: it pushed
evals/results.json directly to main, which the branch ruleset blocks
("Changes must be made through a pull request").

- eval-leaderboard.yml: replace the direct commit/push with
  peter-evans/create-pull-request@v7 (branch eval-results), add
  pull-requests: write. Merging that PR triggers the Pages deploy (which
  watches evals/results.json) to publish real numbers.
- evals/README documents the PR flow + the required "Allow GitHub Actions to
  create and approve pull requests" setting.


Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px

Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
mohitagw15856
2026-06-18 13:33:15 +01:00
committed by GitHub
parent 827d7f62ec
commit 4209963cff
3 changed files with 23 additions and 15 deletions
+14 -12
View File
@@ -21,6 +21,7 @@ on:
permissions:
contents: write
pull-requests: write
concurrency:
group: eval-leaderboard
@@ -54,15 +55,16 @@ jobs:
- name: Build the leaderboard page (sanity check)
run: node scripts/build-leaderboard.mjs
- name: Commit results
run: |
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
git add evals/results.json
if git diff --cached --quiet; then
echo "No change in results."
else
git commit -m "chore(evals): refresh leaderboard results"
git push
echo "Committed evals/results.json — the Pages deploy will render real numbers."
fi
- name: Open a PR with the refreshed results
uses: peter-evans/create-pull-request@v7
with:
add-paths: evals/results.json
branch: eval-results
delete-branch: true
commit-message: "chore(evals): refresh leaderboard results"
title: "chore(evals): refresh leaderboard results"
body: |
Auto-generated by the **Update Skill Leaderboard** workflow.
Merging this publishes the **real** numbers on the live leaderboard — the
Pages deploy is triggered by changes to `evals/results.json`.