05b6d799f0
Three more learnings from alirezarezvani/claude-skills, applied: 1. SkillCheck validator (scripts/skillcheck.mjs) — validates every SKILL.md against the authoring standard (frontmatter, name/folder match, trigger + produces clauses, required headings) plus tier referential integrity. Errors fail CI; --strict fails on warnings too. New skillcheck.yml workflow and a SkillCheck status badge in the README. Current: 0 errors / 14 advisory warnings across 172 skills. 2. Cursor export platform — build-exports.mjs now generates exports/cursor/<bundle>/<skill>/<skill>.mdc rule files. The PLATFORMS registry now supports per-skill filenames (file as a function). 3. Per-agent installers — scripts/install.sh unifies install for claude/hermes/codex/openclaw/cursor (--link, --target, --dry-run, --list). Curl-able one-liners codex-install.sh, openclaw-install.sh, and cursor-install.sh clone the library and install in a single command. README documents the one-line installs and Cursor exports; CHANGELOG and the authoring standard updated. Claude-Session: https://claude.ai/code/session_016JWn5jRD5tcEFKrubjQ6Px Co-authored-by: Claude <noreply@anthropic.com>
171 lines
6.8 KiB
Plaintext
171 lines
6.8 KiB
Plaintext
---
|
|
description: "Structure AI and ML product decisions with the rigour of any product decision. Use when building AI-powered features, evaluating LLM integrations, designing AI products, or assessing AI readiness. Produces a complete AI product canvas covering problem definition, model approach, data requirements, evaluation framework, UX design, responsible AI checklist, and launch monitoring plan."
|
|
globs:
|
|
alwaysApply: false
|
|
---
|
|
|
|
# AI Product Canvas Skill
|
|
|
|
Define AI products with the same rigour as any product decision — but with additional layers for data, model, evaluation, and responsible AI. This canvas prevents the most common AI product failure: building a technically impressive feature that doesn't solve a real problem.
|
|
|
|
## AI Product Anti-Patterns to Check First
|
|
|
|
Before building, flag if any of these apply:
|
|
- ❌ "We should add AI to [existing feature]" — with no user problem defined
|
|
- ❌ Accuracy target undefined before build begins
|
|
- ❌ No plan for what happens when the model is wrong
|
|
- ❌ User-facing AI output with no human review or fallback
|
|
- ❌ Training data not audited for bias or quality
|
|
- ❌ No evaluation metric — "we'll know it when we see it"
|
|
|
|
---
|
|
|
|
## AI Product Canvas Output Format
|
|
|
|
### AI Product Canvas — [Feature Name] — [Date]
|
|
|
|
**PM Owner:** [Name]
|
|
**ML/AI Lead:** [Name]
|
|
**Status:** Discovery / Design / Build / Evaluation / Live
|
|
|
|
---
|
|
|
|
#### 1. Problem Definition
|
|
**User problem being solved:**
|
|
> [What specific situation is the user in? What job are they trying to get done?]
|
|
|
|
**Why AI?**
|
|
> [What makes this problem require AI vs a deterministic solution? If the answer is "because we can," stop here.]
|
|
|
|
**Success for the user looks like:**
|
|
> [What outcome does the user experience when the AI feature is working well?]
|
|
|
|
---
|
|
|
|
#### 2. AI Approach
|
|
|
|
**Task type:**
|
|
- [ ] Classification
|
|
- [ ] Generation (text, image, code)
|
|
- [ ] Summarisation / extraction
|
|
- [ ] Recommendation
|
|
- [ ] Search / retrieval
|
|
- [ ] Prediction / forecasting
|
|
- [ ] Conversation / agent
|
|
|
|
**Model approach:**
|
|
- [ ] LLM API (GPT-4, Claude, Gemini, etc.) — specify: [Model name + version]
|
|
- [ ] Fine-tuned model on own data
|
|
- [ ] Custom model trained from scratch
|
|
- [ ] RAG (retrieval-augmented generation)
|
|
- [ ] Embedding + vector search
|
|
|
|
**Rationale for chosen approach:** [Why this, not alternatives]
|
|
|
|
---
|
|
|
|
#### 3. Data Requirements
|
|
|
|
| Data Type | Source | Volume | Quality Status | Bias Risk |
|
|
|---|---|---|---|---|
|
|
| [Training data] | [Where it comes from] | [Volume] | [Audit status] | H/M/L |
|
|
| [Evaluation data] | [Where it comes from] | [Volume] | [Audit status] | H/M/L |
|
|
|
|
**Data gaps:** [What's missing and plan to get it]
|
|
**Privacy considerations:** [Any PII in training or inference data]
|
|
**Data ownership:** [Do we own this data? Can we use it for training?]
|
|
|
|
---
|
|
|
|
#### 4. Evaluation Framework
|
|
|
|
**Primary metric:** [The number that defines success — accuracy, F1, BLEU, user rating, task completion rate]
|
|
**Minimum acceptable threshold:** [Below X, the feature does not ship]
|
|
**Human evaluation plan:** [How will humans review model outputs? Sampling rate? Review panel?]
|
|
|
|
| Evaluation Type | Method | Cadence | Owner |
|
|
|---|---|---|---|
|
|
| Offline (pre-launch) | [Test set, benchmark] | Pre-launch | ML Lead |
|
|
| Online (post-launch) | [A/B test, user feedback] | Weekly | PM + ML |
|
|
| Adversarial | [Red-team, edge cases] | Pre-launch | Safety reviewer |
|
|
|
|
---
|
|
|
|
#### 5. User Experience Design
|
|
|
|
**How is AI output presented?**
|
|
- [ ] Direct output shown to user (high trust required)
|
|
- [ ] AI-assisted with user confirmation
|
|
- [ ] Suggestion user can accept/reject
|
|
- [ ] Background action with audit log
|
|
|
|
**Confidence and uncertainty handling:**
|
|
- What happens when confidence is low? [Show alternative, ask for clarification, fallback to manual]
|
|
- How is uncertainty communicated to the user? [UI pattern]
|
|
|
|
**Fallback plan:**
|
|
- If the model fails or returns an error: [Specific fallback behaviour]
|
|
- If accuracy degrades below threshold: [Kill switch or graceful degradation plan]
|
|
|
|
---
|
|
|
|
#### 6. Responsible AI Checklist
|
|
|
|
- [ ] Bias audit completed on training data
|
|
- [ ] Demographic fairness evaluated (does performance differ by user group?)
|
|
- [ ] Hallucination / confabulation risk assessed and mitigated
|
|
- [ ] User can see and correct AI output
|
|
- [ ] Opt-out mechanism exists (can user disable the AI feature?)
|
|
- [ ] Output provenance visible when relevant (does user know AI generated this?)
|
|
- [ ] PII not used in ways user didn't consent to
|
|
- [ ] Regulatory review completed (GDPR, AI Act, sector-specific)
|
|
- [ ] Model cards / documentation completed
|
|
|
|
---
|
|
|
|
#### 7. Launch & Monitoring Plan
|
|
|
|
**Rollout:** [% of users, with staged expansion criteria]
|
|
**Monitoring metrics:**
|
|
- Model performance: [Metric + alert threshold]
|
|
- User engagement with AI output: [Acceptance rate, override rate, feedback score]
|
|
- Error rate: [% of failed inferences]
|
|
- Latency: [P95 target]
|
|
|
|
**Model refresh cadence:** [How often is the model retrained or updated?]
|
|
**Drift detection:** [How will you know when model performance degrades in production?]
|
|
|
|
---
|
|
|
|
## Guidelines
|
|
|
|
- Never skip the "Why AI?" section — it's the most important question in AI product development
|
|
- The fallback UX is not optional — what happens when AI fails defines your product's trustworthiness
|
|
- Responsible AI checklist must be completed before launch, not after
|
|
- Include latency in success metrics — a 5-second AI response is often worse than no AI at all
|
|
- Recommend starting with a human-in-the-loop design and automating only when accuracy is proven
|
|
|
|
## Required Inputs
|
|
|
|
Ask the user for these if not provided:
|
|
- **Feature or product description** (what the AI is intended to do)
|
|
- **User problem** (what problem the AI is solving for users)
|
|
- **Available data** (what training/inference data exists)
|
|
- **ML/AI lead** (who owns the technical implementation)
|
|
|
|
## Anti-Patterns
|
|
|
|
- [ ] Do not skip the "Why AI?" question — if the answer is "we want to use AI," stop and reframe around the user problem first
|
|
- [ ] Do not launch with an undefined accuracy threshold — "good enough" is not a threshold; set a number before build begins
|
|
- [ ] Do not design the UX to hide AI-generated output as if it were system truth — users need to know when AI is involved so they can override it
|
|
- [ ] Do not defer the Responsible AI checklist to post-launch — bias and privacy issues are far harder to fix in production than in design
|
|
- [ ] Do not treat model latency as a post-launch optimisation — a 6-second AI response that replaces a 1-second rule-based response is a regression, not a feature
|
|
|
|
## Quality Checks
|
|
|
|
- [ ] "Why AI?" is answered clearly (not "because we can")
|
|
- [ ] Minimum acceptable accuracy threshold is defined before build begins
|
|
- [ ] Fallback UX is specified for model failures or low-confidence outputs
|
|
- [ ] Responsible AI checklist is completed (not deferred to post-launch)
|
|
- [ ] Monitoring plan includes both model performance and user engagement metrics
|