The eval run worked (12 scored runs) but the final step failed: it pushed
evals/results.json directly to main, which the branch ruleset blocks
("Changes must be made through a pull request").
- eval-leaderboard.yml: replace the direct commit/push with
peter-evans/create-pull-request@v7 (branch eval-results), add
pull-requests: write. Merging that PR triggers the Pages deploy (which
watches evals/results.json) to publish real numbers.
- evals/README documents the PR flow + the required "Allow GitHub Actions to
create and approve pull requests" setting.
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
The "Run evals" step ran 24 API calls sequentially with no request timeout, so
it was slow and could stall indefinitely if one call hung.
- bin/lib/anthropic.mjs: per-request timeout (120s) via AbortController + retry
(2x, backoff) on 429/5xx/timeout. Fails fast on 4xx (bad key/model).
- evals/run-evals.mjs: run (case × model) tasks through a concurrency pool
(default 4, --concurrency to tune); preserves result order.
- eval-leaderboard.yml: job timeout-minutes: 20 as a safety net.
Applies to the next run. The hardening also benefits the Action runner and
`generate`, which share the client.
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
Lets the leaderboard show real numbers without a local key: the new
"Update Skill Leaderboard" workflow (workflow_dispatch) runs the eval harness
with the ANTHROPIC_API_KEY secret, commits evals/results.json, and the Pages
deploy re-renders the public leaderboard with real data.
- .github/workflows/eval-leaderboard.yml: manual trigger, contents: write,
runs run-evals.mjs + build-leaderboard.mjs, commits results.json.
- deploy-playground.yml: also trigger on evals/results.json (and the build
scripts) so the committed results refresh the live page.
- evals/README + CHANGELOG document the CI route.
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
- .github/workflows/pr-description.yml: uses our own Action (uses: ./action)
to auto-write this repo's PR descriptions when a PR opens empty; skips
quietly without ANTHROPIC_API_KEY and on forks. A living demo.
- Version -> 20.0.0 (Agentic Tooling): bundles the GitHub Action, generate
command, and evals/leaderboard for npm. README badge + What's New (v19
collapsed), CHANGELOG [Unreleased] -> [20.0.0], SECURITY table.
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
Three features riding 2026 trends (agentic CI, codegen, evals), sharing one
dependency-free Anthropic client (bin/lib/anthropic.mjs).
1. GitHub Action (action/) — run any skill in a consumer repo's CI:
uses: mohitagw15856/pm-claude-skills/action@main. Composite action +
run.mjs (loads the bundled SKILL.md, calls the API, exposes result as a
step output / file). Docs with auto-PR-description example.
2. generate command — `npx pm-claude-skills generate --from <url|file>` turns
a team's docs into a SKILL.md following the authoring standard
(bin/generate.mjs, wired into the CLI; needs ANTHROPIC_API_KEY).
3. Skill evals + Leaderboard — evals/run-evals.mjs runs each case across models
and scores output with an LLM judge (structure/completeness/usefulness/
grounding); scripts/build-leaderboard.mjs renders web/leaderboard.html
(built in the Pages deploy, falls back to clearly-labelled example data).
Linked from README, catalog, and playground.
Offline-testable parts verified (prompt building, skill loading, graceful
errors, leaderboard render). SkillCheck/audit/exports all green.
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
Closes the remaining gaps vs alirezarezvani/claude-skills across trust, content
types, discoverability, and community.
Security (trust signal + useful):
- scripts/skill-audit.mjs scans skills/*/SKILL.md + each skill's scripts/ for
prompt injection, exfiltration, dynamic code exec, destructive shell, secrets,
and hidden text. HIGH fails CI (.github/workflows/skill-audit.yml) + a badge.
- New skill-security-auditor skill teaches the same review (production tier).
Content types:
- output-styles/ — 4 personas (Startup CTO, Growth Marketer, Solo Founder,
Product Leader) as Claude Code output styles; --agent claude installs them too.
- ORCHESTRATION.md — Skill Chain / Multi-Agent Handoff / Domain Deep-Dive /
Solo Sprint patterns.
Discoverability:
- scripts/build-docs.mjs generates a server-rendered, SEO-indexable
web/catalog.html of all skills (built in the Pages deploy; gitignored).
Linked from README + playground.
Community:
- ROADMAP.md (now/next/later + good-first-issues).
README badges/sections, TIERS (47 production), CHANGELOG, package.json files,
and exports/web index all updated. SkillCheck + security audit + exports verified.
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
Lets the package ship to npm without a local npm install: publish a GitHub
Release and CI runs `npm publish` using an NPM_TOKEN repo secret.
- .github/workflows/npm-publish.yml: triggers on release published (and manual
dispatch), verifies the release tag matches package.json version, then
publishes with provenance (id-token: write) to the public registry.
One-time setup by the maintainer: create an npm Automation token and add it as
the NPM_TOKEN repository secret. Documented in the workflow header.
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
Three more learnings from alirezarezvani/claude-skills, applied:
1. SkillCheck validator (scripts/skillcheck.mjs) — validates every SKILL.md
against the authoring standard (frontmatter, name/folder match, trigger +
produces clauses, required headings) plus tier referential integrity.
Errors fail CI; --strict fails on warnings too. New skillcheck.yml workflow
and a SkillCheck status badge in the README. Current: 0 errors / 14 advisory
warnings across 172 skills.
2. Cursor export platform — build-exports.mjs now generates
exports/cursor/<bundle>/<skill>/<skill>.mdc rule files. The PLATFORMS
registry now supports per-skill filenames (file as a function).
3. Per-agent installers — scripts/install.sh unifies install for
claude/hermes/codex/openclaw/cursor (--link, --target, --dry-run, --list).
Curl-able one-liners codex-install.sh, openclaw-install.sh, and
cursor-install.sh clone the library and install in a single command.
README documents the one-line installs and Cursor exports; CHANGELOG and the
authoring standard updated.
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
Co-authored-by: Claude <noreply@anthropic.com>
Make the library multi-platform without duplicating content. Each
skills/<name>/SKILL.md body remains the single source of truth; a new
generator renders platform-ready exports from it.
- scripts/build-exports.mjs — dependency-free Node generator with a PLATFORMS
registry so new platforms (Gemini, Cursor, …) are a few lines. Ships ChatGPT
exports at exports/chatgpt/<bundle>/<skill>/SYSTEM_PROMPT.md (172 skills),
plus generated index READMEs. Supports --platform and --check.
- exports/ — generated ChatGPT system prompts, ready to paste into a Custom GPT.
- .github/workflows/check-generated.yml — fails a PR if exports or
web/skills.json drift from the source skills.
- README "Works With" now documents the ready-to-use exports and regen command.
- CHANGELOG + SKILL-AUTHORING-STANDARD note the generated artifacts.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px
On push to main, rebuild web/skills.json from the SKILL.md files and publish
web/ to GitHub Pages, so the live site always reflects the current skill
library. Manual runs supported via workflow_dispatch.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>