pm-claude-skills

justin/pm-claude-skills

Fork 0

Commit Graph

Author	SHA1	Message	Date
mohitagw15856	4209963cff	Leaderboard workflow: open a PR instead of pushing to protected main (#45 ) The eval run worked (12 scored runs) but the final step failed: it pushed evals/results.json directly to main, which the branch ruleset blocks ("Changes must be made through a pull request"). - eval-leaderboard.yml: replace the direct commit/push with peter-evans/create-pull-request@v7 (branch eval-results), add pull-requests: write. Merging that PR triggers the Pages deploy (which watches evals/results.json) to publish real numbers. - evals/README documents the PR flow + the required "Allow GitHub Actions to create and approve pull requests" setting. Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px Co-authored-by: Claude <noreply@anthropic.com>	2026-06-18 13:33:15 +01:00
mohitagw15856	edb663ad72	CI workflow to run evals and update the leaderboard (#43 ) Lets the leaderboard show real numbers without a local key: the new "Update Skill Leaderboard" workflow (workflow_dispatch) runs the eval harness with the ANTHROPIC_API_KEY secret, commits evals/results.json, and the Pages deploy re-renders the public leaderboard with real data. - .github/workflows/eval-leaderboard.yml: manual trigger, contents: write, runs run-evals.mjs + build-leaderboard.mjs, commits results.json. - deploy-playground.yml: also trigger on evals/results.json (and the build scripts) so the committed results refresh the live page. - evals/README + CHANGELOG document the CI route. Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px Co-authored-by: Claude <noreply@anthropic.com>	2026-06-18 12:58:45 +01:00
mohitagw15856	51bf4be52f	AI-powered tooling: GitHub Action, generate command, evals + leaderboard (#41 ) Three features riding 2026 trends (agentic CI, codegen, evals), sharing one dependency-free Anthropic client (bin/lib/anthropic.mjs). 1. GitHub Action (action/) — run any skill in a consumer repo's CI: uses: mohitagw15856/pm-claude-skills/action@main. Composite action + run.mjs (loads the bundled SKILL.md, calls the API, exposes result as a step output / file). Docs with auto-PR-description example. 2. generate command — `npx pm-claude-skills generate --from <url\|file>` turns a team's docs into a SKILL.md following the authoring standard (bin/generate.mjs, wired into the CLI; needs ANTHROPIC_API_KEY). 3. Skill evals + Leaderboard — evals/run-evals.mjs runs each case across models and scores output with an LLM judge (structure/completeness/usefulness/ grounding); scripts/build-leaderboard.mjs renders web/leaderboard.html (built in the Pages deploy, falls back to clearly-labelled example data). Linked from README, catalog, and playground. Offline-testable parts verified (prompt building, skill loading, graceful errors, leaderboard render). SkillCheck/audit/exports all green. Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px Co-authored-by: Claude <noreply@anthropic.com>	2026-06-18 08:37:40 +01:00

Author

SHA1

Message

Date

mohitagw15856

4209963cff

Leaderboard workflow: open a PR instead of pushing to protected main (#45 )

The eval run worked (12 scored runs) but the final step failed: it pushed
evals/results.json directly to main, which the branch ruleset blocks
("Changes must be made through a pull request").

- eval-leaderboard.yml: replace the direct commit/push with
  peter-evans/create-pull-request@v7 (branch eval-results), add
  pull-requests: write. Merging that PR triggers the Pages deploy (which
  watches evals/results.json) to publish real numbers.
- evals/README documents the PR flow + the required "Allow GitHub Actions to
  create and approve pull requests" setting.


Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px

Co-authored-by: Claude <noreply@anthropic.com>

2026-06-18 13:33:15 +01:00

mohitagw15856

edb663ad72

CI workflow to run evals and update the leaderboard (#43 )

Lets the leaderboard show real numbers without a local key: the new
"Update Skill Leaderboard" workflow (workflow_dispatch) runs the eval harness
with the ANTHROPIC_API_KEY secret, commits evals/results.json, and the Pages
deploy re-renders the public leaderboard with real data.

- .github/workflows/eval-leaderboard.yml: manual trigger, contents: write,
  runs run-evals.mjs + build-leaderboard.mjs, commits results.json.
- deploy-playground.yml: also trigger on evals/results.json (and the build
  scripts) so the committed results refresh the live page.
- evals/README + CHANGELOG document the CI route.


Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px

Co-authored-by: Claude <noreply@anthropic.com>

2026-06-18 12:58:45 +01:00

mohitagw15856

51bf4be52f

AI-powered tooling: GitHub Action, generate command, evals + leaderboard (#41 )

Three features riding 2026 trends (agentic CI, codegen, evals), sharing one
dependency-free Anthropic client (bin/lib/anthropic.mjs).

1. GitHub Action (action/) — run any skill in a consumer repo's CI:
   uses: mohitagw15856/pm-claude-skills/action@main. Composite action +
   run.mjs (loads the bundled SKILL.md, calls the API, exposes result as a
   step output / file). Docs with auto-PR-description example.

2. generate command — `npx pm-claude-skills generate --from <url|file>` turns
   a team's docs into a SKILL.md following the authoring standard
   (bin/generate.mjs, wired into the CLI; needs ANTHROPIC_API_KEY).

3. Skill evals + Leaderboard — evals/run-evals.mjs runs each case across models
   and scores output with an LLM judge (structure/completeness/usefulness/
   grounding); scripts/build-leaderboard.mjs renders web/leaderboard.html
   (built in the Pages deploy, falls back to clearly-labelled example data).
   Linked from README, catalog, and playground.

Offline-testable parts verified (prompt building, skill loading, graceful
errors, leaderboard render). SkillCheck/audit/exports all green.


Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px

Co-authored-by: Claude <noreply@anthropic.com>

2026-06-18 08:37:40 +01:00

3 Commits