diff --git a/.github/workflows/eval-leaderboard.yml b/.github/workflows/eval-leaderboard.yml
index ff24cdc..e6cefff 100644
--- a/.github/workflows/eval-leaderboard.yml
+++ b/.github/workflows/eval-leaderboard.yml
@@ -21,6 +21,7 @@ on:
 
 permissions:
   contents: write
+  pull-requests: write
 
 concurrency:
   group: eval-leaderboard
@@ -54,15 +55,16 @@ jobs:
       - name: Build the leaderboard page (sanity check)
         run: node scripts/build-leaderboard.mjs
 
-      - name: Commit results
-        run: |
-          git config user.name "github-actions[bot]"
-          git config user.email "github-actions[bot]@users.noreply.github.com"
-          git add evals/results.json
-          if git diff --cached --quiet; then
-            echo "No change in results."
-          else
-            git commit -m "chore(evals): refresh leaderboard results"
-            git push
-            echo "Committed evals/results.json — the Pages deploy will render real numbers."
-          fi
+      - name: Open a PR with the refreshed results
+        uses: peter-evans/create-pull-request@v7
+        with:
+          add-paths: evals/results.json
+          branch: eval-results
+          delete-branch: true
+          commit-message: "chore(evals): refresh leaderboard results"
+          title: "chore(evals): refresh leaderboard results"
+          body: |
+            Auto-generated by the **Update Skill Leaderboard** workflow.
+
+            Merging this publishes the **real** numbers on the live leaderboard — the
+            Pages deploy is triggered by changes to `evals/results.json`.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 32be794..6033eaa 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,8 @@ each new wave of skills bumps the **major** version, extensions and fixes bump
 ## [Unreleased]
 
 ### Changed
+- **Leaderboard workflow opens a PR** instead of pushing to `main` (which the branch
+  ruleset blocks). After it runs, merge the auto-created results PR to publish real numbers.
 - **Faster, hang-proof evals.** The Anthropic client now has a per-request timeout (120s) and
   limited retries (429/5xx/timeout); the eval harness runs cases concurrently (default 4).
   The leaderboard workflow has a 20-minute job timeout. A 24-call run that
diff --git a/evals/README.md b/evals/README.md
index 0bf62c2..e7d03ed 100644
--- a/evals/README.md
+++ b/evals/README.md
@@ -30,9 +30,13 @@ back to `results.example.json` (clearly labelled) so the page renders before you
 
 ### No local key? Run it in CI
 
-Add an `ANTHROPIC_API_KEY` repo secret, then go to **Actions → "Update Skill Leaderboard"
-→ Run workflow**. It runs the evals, commits `evals/results.json`, and the Pages deploy
-re-renders the public leaderboard with real numbers — no laptop required.
+1. Add an `ANTHROPIC_API_KEY` repo secret.
+2. Enable **Settings → Actions → General → Workflow permissions → "Allow GitHub Actions to
+   create and approve pull requests"** (so the workflow can open its results PR — `main`
+   requires PRs).
+3. **Actions → "Update Skill Leaderboard" → Run workflow.** It runs the evals and opens a
+   PR with `evals/results.json`. **Merge that PR** and the Pages deploy re-renders the
+   public leaderboard with real numbers — no laptop required.
 
 ## Add a case
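The CHANGELOG entry in this diff describes two mechanisms: a per-request timeout (120s) with limited retries on 429, 5xx and timeouts, and an eval harness that runs cases concurrently (default 4). Below is a minimal Node.js sketch of both ideas. It is not the repository's actual client or harness code. It assumes each model call is an async function that accepts an `AbortSignal` and that failed calls throw an error carrying a numeric `status`. The helper names (`withTimeoutAndRetry`, `runWithConcurrency`, `attemptWithTimeout`) and the retry/backoff defaults are hypothetical.

```javascript
// Hypothetical helpers sketching the CHANGELOG's "timeout + limited retries"
// and "run cases concurrently" behaviour; not the repo's real implementation.

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Assumption: API errors expose a numeric `status`. Retry on HTTP 429,
// any 5xx, or our own per-request timeout; everything else fails fast.
function isRetryable(err) {
  if (err && err.name === "TimeoutError") return true;
  const status = err && err.status;
  return status === 429 || (typeof status === "number" && status >= 500 && status < 600);
}

// One attempt: abort the call and reject if fn(signal) exceeds timeoutMs.
async function attemptWithTimeout(fn, timeoutMs) {
  const controller = new AbortController();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      const err = new Error(`request timed out after ${timeoutMs} ms`);
      err.name = "TimeoutError";
      reject(err);
    }, timeoutMs);
  });
  try {
    return await Promise.race([fn(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // never leave a pending timer behind
  }
}

// Per-request timeout plus a bounded number of retries with exponential backoff.
async function withTimeoutAndRetry(fn, { timeoutMs = 120_000, maxRetries = 2, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await attemptWithTimeout(fn, timeoutMs);
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Run task(item) for every item with at most `limit` tasks in flight.
// Results keep input order; the first rejection rejects the whole run.
async function runWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // shared cursor; safe because JS runs workers on one thread
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

A call site might look like `runWithConcurrency(cases, 4, (c) => withTimeoutAndRetry((signal) => runCase(c, signal)))`, where `runCase` is a hypothetical per-case function. The pool caps in-flight API calls. Because of the per-request timeout, one hung call costs at most a bounded number of timed-out attempts rather than stalling its worker indefinitely.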