feat: compare-mode demo GIF, expanded eval cases, sample-generation workflow

- Add compare-mode demo GIF + its Playwright recorder; embed in README eval section
- Expand evals/cases.json (6 → 15 flagship skills) so more skills can be
  eval-scored and sample-generated
- Add --generate-missing mode to build-samples.mjs
- Add generate-samples.yml: workflow_dispatch job that generates real sample
  outputs via the ANTHROPIC_API_KEY secret (key never leaves GitHub) and commits

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Mohit
2026-06-19 10:05:17 +01:00
parent 3f9c319b79
commit 7b02261a3c
6 changed files with 189 additions and 4 deletions
+4
View File
@@ -121,6 +121,10 @@ The flagship skills score consistently high (out of 5):
These scores show up as badges in the [Playground](https://mohitagw15856.github.io/pm-claude-skills/) and the [🏆 leaderboard](https://mohitagw15856.github.io/pm-claude-skills/leaderboard.html). Coverage is expanding — run it yourself with `node evals/run-evals.mjs` (needs an API key). *Honest note: 6 skills are eval-scored today; the rest are reviewed against the [authoring standard](SKILL-AUTHORING-STANDARD.md) but not yet auto-scored.*
**See the difference for yourself.** The Playground's *Compare* toggle runs the same inputs with and without the skill, side by side — structured, shippable output on the left; generic mush on the right:
[![Compare mode — the same prompt with and without the skill, side by side](web/docs-assets/compare-demo.gif)](https://mohitagw15856.github.io/pm-claude-skills/)
---
## Contents