feat: compare-mode demo GIF, expanded eval cases, sample-generation workflow
- Add compare-mode demo GIF + its Playwright recorder; embed in README eval section
- Expand evals/cases.json (6 → 15 flagship skills) so more skills can be eval-scored and sample-generated
- Add --generate-missing mode to build-samples.mjs
- Add generate-samples.yml: workflow_dispatch job that generates real sample outputs via the ANTHROPIC_API_KEY secret (key never leaves GitHub) and commits

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -121,6 +121,10 @@ The flagship skills score consistently high (out of 5):
These scores show up as badges in the [Playground](https://mohitagw15856.github.io/pm-claude-skills/) and the [🏆 leaderboard](https://mohitagw15856.github.io/pm-claude-skills/leaderboard.html). Coverage is expanding — run it yourself with `node evals/run-evals.mjs` (needs an API key). *Honest note: 6 skills are eval-scored today; the rest are reviewed against the [authoring standard](SKILL-AUTHORING-STANDARD.md) but not yet auto-scored.*
**See the difference for yourself.** The Playground's *Compare* toggle runs the same inputs with and without the skill, side by side — structured, shippable output on the left; generic mush on the right:
[](https://mohitagw15856.github.io/pm-claude-skills/)
---
## Contents