From 4209963cff4ba7202f3ac2aea3a829771b8e9156 Mon Sep 17 00:00:00 2001
From: mohitagw15856 <119053560+mohitagw15856@users.noreply.github.com>
Date: Thu, 18 Jun 2026 13:33:15 +0100
Subject: [PATCH] Leaderboard workflow: open a PR instead of pushing to protected main (#45)

The eval run worked (12 scored runs) but the final step failed: it pushed
evals/results.json directly to main, which the branch ruleset blocks
("Changes must be made through a pull request").

- eval-leaderboard.yml: replace the direct commit/push with
  peter-evans/create-pull-request@v7 (branch eval-results), add
  pull-requests: write. Merging that PR triggers the Pages deploy (which
  watches evals/results.json) to publish real numbers.
- evals/README documents the PR flow + the required "Allow GitHub Actions to
  create and approve pull requests" setting.

Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude
---
 .github/workflows/eval-leaderboard.yml | 26 ++++++++++++++------------
 CHANGELOG.md                           |  2 ++
 evals/README.md                        | 10 +++++++---
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/.github/workflows/eval-leaderboard.yml b/.github/workflows/eval-leaderboard.yml
index ff24cdc..e6cefff 100644
--- a/.github/workflows/eval-leaderboard.yml
+++ b/.github/workflows/eval-leaderboard.yml
@@ -21,6 +21,7 @@ on:

 permissions:
   contents: write
+  pull-requests: write

 concurrency:
   group: eval-leaderboard
@@ -54,15 +55,16 @@ jobs:
       - name: Build the leaderboard page (sanity check)
         run: node scripts/build-leaderboard.mjs

-      - name: Commit results
-        run: |
-          git config user.name "github-actions[bot]"
-          git config user.email "github-actions[bot]@users.noreply.github.com"
-          git add evals/results.json
-          if git diff --cached --quiet; then
-            echo "No change in results."
-          else
-            git commit -m "chore(evals): refresh leaderboard results"
-            git push
-            echo "Committed evals/results.json — the Pages deploy will render real numbers."
-          fi
+      - name: Open a PR with the refreshed results
+        uses: peter-evans/create-pull-request@v7
+        with:
+          add-paths: evals/results.json
+          branch: eval-results
+          delete-branch: true
+          commit-message: "chore(evals): refresh leaderboard results"
+          title: "chore(evals): refresh leaderboard results"
+          body: |
+            Auto-generated by the **Update Skill Leaderboard** workflow.
+
+            Merging this publishes the **real** numbers on the live leaderboard — the
+            Pages deploy is triggered by changes to `evals/results.json`.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 32be794..6033eaa 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,8 @@ each new wave of skills bumps the **major** version, extensions and fixes bump
 ## [Unreleased]

 ### Changed
+- **Leaderboard workflow opens a PR** instead of pushing to `main` (which the branch
+  ruleset blocks). After it runs, merge the auto-created results PR to publish real numbers.
 - **Faster, hang-proof evals.** The Anthropic client now has a per-request timeout (120s)
   and limited retries (429/5xx/timeout); the eval harness runs cases concurrently (default
   4). The leaderboard workflow has a 20-minute job timeout. A 24-call run that
diff --git a/evals/README.md b/evals/README.md
index 0bf62c2..e7d03ed 100644
--- a/evals/README.md
+++ b/evals/README.md
@@ -30,9 +30,13 @@ back to `results.example.json` (clearly labelled) so the page renders before you

 ### No local key? Run it in CI

-Add an `ANTHROPIC_API_KEY` repo secret, then go to **Actions → "Update Skill Leaderboard"
-→ Run workflow**. It runs the evals, commits `evals/results.json`, and the Pages deploy
-re-renders the public leaderboard with real numbers — no laptop required.
+1. Add an `ANTHROPIC_API_KEY` repo secret.
+2. Enable **Settings → Actions → General → Workflow permissions → "Allow GitHub Actions to
+   create and approve pull requests"** (so the workflow can open its results PR — `main`
+   requires PRs).
+3. **Actions → "Update Skill Leaderboard" → Run workflow.** It runs the evals and opens a
+   PR with `evals/results.json`. **Merge that PR** and the Pages deploy re-renders the
+   public leaderboard with real numbers — no laptop required.

 ## Add a case