Lets the leaderboard show real numbers without a local key: the new
"Update Skill Leaderboard" workflow (workflow_dispatch) runs the eval harness
with the ANTHROPIC_API_KEY secret, commits evals/results.json, and the Pages
deploy re-renders the public leaderboard with real data.
- .github/workflows/eval-leaderboard.yml: manual trigger, contents: write,
runs run-evals.mjs + build-leaderboard.mjs, commits results.json.
- deploy-playground.yml: also trigger on evals/results.json (and the build
scripts) so the committed results refresh the live page.
- evals/README + CHANGELOG document the CI route.
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
Three features riding 2026 trends (agentic CI, codegen, evals), sharing one
dependency-free Anthropic client (bin/lib/anthropic.mjs).
1. GitHub Action (action/) — run any skill in a consumer repo's CI:
uses: mohitagw15856/pm-claude-skills/action@main. Composite action +
run.mjs (loads the bundled SKILL.md, calls the API, exposes result as a
step output / file). Docs with auto-PR-description example.
2. generate command — `npx pm-claude-skills generate --from <url|file>` turns
a team's docs into a SKILL.md following the authoring standard
(bin/generate.mjs, wired into the CLI; needs ANTHROPIC_API_KEY).
3. Skill evals + Leaderboard — evals/run-evals.mjs runs each case across models
and scores output with an LLM judge (structure/completeness/usefulness/
grounding); scripts/build-leaderboard.mjs renders web/leaderboard.html
(built in the Pages deploy, falls back to clearly-labelled example data).
Linked from README, catalog, and playground.
Offline-testable parts verified (prompt building, skill loading, graceful
errors, leaderboard render). SkillCheck/audit/exports all green.
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
Closes the remaining gaps vs alirezarezvani/claude-skills across trust, content
types, discoverability, and community.
Security (trust signal + useful):
- scripts/skill-audit.mjs scans skills/*/SKILL.md + each skill's scripts/ for
prompt injection, exfiltration, dynamic code exec, destructive shell, secrets,
and hidden text. HIGH fails CI (.github/workflows/skill-audit.yml) + a badge.
- New skill-security-auditor skill teaches the same review (production tier).
Content types:
- output-styles/ — 4 personas (Startup CTO, Growth Marketer, Solo Founder,
Product Leader) as Claude Code output styles; --agent claude installs them too.
- ORCHESTRATION.md — Skill Chain / Multi-Agent Handoff / Domain Deep-Dive /
Solo Sprint patterns.
Discoverability:
- scripts/build-docs.mjs generates a server-rendered, SEO-indexable
web/catalog.html of all skills (built in the Pages deploy; gitignored).
Linked from README + playground.
Community:
- ROADMAP.md (now/next/later + good-first-issues).
README badges/sections, TIERS (47 production), CHANGELOG, package.json files,
and exports/web index all updated. SkillCheck + security audit + exports verified.
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
On push to main, rebuild web/skills.json from the SKILL.md files and publish
web/ to GitHub Pages, so the live site always reflects the current skill
library. Manual runs supported via workflow_dispatch.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>