pm-claude-skills/exports/aider/pm-advanced/ai-product-canvas/ai-product-canvas.md
mohitagw15856 036511ab3e Windsurf + Aider targets, MCP server, and demo placement (#33)
2026-06-17 23:15:38 +01:00


AI Product Canvas Skill

Define AI products with the same rigour as any product decision — but with additional layers for data, model, evaluation, and responsible AI. This canvas prevents the most common AI product failure: building a technically impressive feature that doesn't solve a real problem.

AI Product Anti-Patterns to Check First

Before building, flag if any of these apply:

  • "We should add AI to [existing feature]" — with no user problem defined
  • Accuracy target undefined before build begins
  • No plan for what happens when the model is wrong
  • User-facing AI output with no human review or fallback
  • Training data not audited for bias or quality
  • No evaluation metric — "we'll know it when we see it"

AI Product Canvas Output Format

AI Product Canvas — [Feature Name] — [Date]

PM Owner: [Name]
ML/AI Lead: [Name]
Status: Discovery / Design / Build / Evaluation / Live


1. Problem Definition

User problem being solved:

[What specific situation is the user in? What job are they trying to get done?]

Why AI?

[What makes this problem require AI vs a deterministic solution? If the answer is "because we can," stop here.]

Success for the user looks like:

[What outcome does the user experience when the AI feature is working well?]


2. AI Approach

Task type:

  • Classification
  • Generation (text, image, code)
  • Summarisation / extraction
  • Recommendation
  • Search / retrieval
  • Prediction / forecasting
  • Conversation / agent

Model approach:

  • LLM API (GPT-4, Claude, Gemini, etc.) — specify: [Model name + version]
  • Fine-tuned model on own data
  • Custom model trained from scratch
  • RAG (retrieval-augmented generation)
  • Embedding + vector search

Rationale for chosen approach: [Why this, not alternatives]


3. Data Requirements

| Data Type | Source | Volume | Quality Status | Bias Risk |
|---|---|---|---|---|
| [Training data] | [Where it comes from] | [Volume] | [Audit status] | H/M/L |
| [Evaluation data] | [Where it comes from] | [Volume] | [Audit status] | H/M/L |

Data gaps: [What's missing and plan to get it]
Privacy considerations: [Any PII in training or inference data]
Data ownership: [Do we own this data? Can we use it for training?]


4. Evaluation Framework

Primary metric: [The number that defines success — accuracy, F1, BLEU, user rating, task completion rate]
Minimum acceptable threshold: [Below X, the feature does not ship]
Human evaluation plan: [How will humans review model outputs? Sampling rate? Review panel?]

| Evaluation Type | Method | Cadence | Owner |
|---|---|---|---|
| Offline (pre-launch) | [Test set, benchmark] | Pre-launch | ML Lead |
| Online (post-launch) | [A/B test, user feedback] | Weekly | PM + ML |
| Adversarial | [Red-team, edge cases] | Pre-launch | Safety reviewer |
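
The "minimum acceptable threshold" rule can be enforced as an automated offline release gate. The following is a minimal sketch, assuming a binary classification task with F1 as the primary metric. The function names `f1_score` and `ship_gate` and the example threshold are illustrative assumptions, not part of the canvas.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for labels in {0, 1}; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ship_gate(y_true: Sequence[int], y_pred: Sequence[int], min_threshold: float) -> bool:
    """Return True only if the offline metric meets the agreed minimum threshold."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return f1_score(y_true, y_pred) >= min_threshold


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0]
    predictions = [1, 1, 0, 0, 1]
    # 0.85 is a hypothetical threshold; the canvas asks the team to set its own.
    print("ship" if ship_gate(labels, predictions, min_threshold=0.85) else "do not ship")
```

Putting the threshold in code makes "below X, the feature does not ship" a check that runs on every evaluation rather than a judgement made at launch time.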

5. User Experience Design

How is AI output presented?

  • Direct output shown to user (high trust required)
  • AI-assisted with user confirmation
  • Suggestion user can accept/reject
  • Background action with audit log

Confidence and uncertainty handling:

  • What happens when confidence is low? [Show alternative, ask for clarification, fall back to manual]
  • How is uncertainty communicated to the user? [UI pattern]

Fallback plan:

  • If the model fails or returns an error: [Specific fallback behaviour]
  • If accuracy degrades below threshold: [Kill switch or graceful degradation plan]
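
The confidence and fallback rules above can be sketched as a single routing function. This is an illustrative sketch only: the `ModelResult` type, the `route` function, the two cut-offs (0.9 and 0.6), and the `kill_switch` flag are assumptions chosen for the example, and it presumes the model exposes a usable confidence score.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    SHOW = "show_output"            # direct output shown to the user
    CONFIRM = "ask_user_to_confirm"  # AI-assisted with user confirmation
    MANUAL = "fall_back_to_manual"   # no AI output is used


@dataclass
class ModelResult:
    output: Optional[str]        # None when the model failed or returned an error
    confidence: Optional[float]  # None when no score is available


def route(
    result: ModelResult,
    show_at: float = 0.9,      # hypothetical cut-off for showing output directly
    confirm_at: float = 0.6,   # hypothetical cut-off for asking for confirmation
    kill_switch: bool = False,  # set when accuracy has degraded below threshold
) -> Action:
    """Decide how (or whether) to present a model result to the user."""
    if kill_switch or result.output is None or result.confidence is None:
        return Action.MANUAL
    if result.confidence >= show_at:
        return Action.SHOW
    if result.confidence >= confirm_at:
        return Action.CONFIRM
    return Action.MANUAL


if __name__ == "__main__":
    print(route(ModelResult("Suggested reply", 0.72)))
```

Model errors, missing scores, and the kill switch all lead to the same manual path, so the fallback behaviour is defined in one place.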

6. Responsible AI Checklist

  • Bias audit completed on training data
  • Demographic fairness evaluated (does performance differ by user group?)
  • Hallucination / confabulation risk assessed and mitigated
  • User can see and correct AI output
  • Opt-out mechanism exists (can user disable the AI feature?)
  • Output provenance visible when relevant (does user know AI generated this?)
  • PII not used in ways user didn't consent to
  • Regulatory review completed (GDPR, AI Act, sector-specific)
  • Model cards / documentation completed
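
The demographic fairness item asks whether performance differs by user group. One simple way to check this, shown here as a sketch, is to compute accuracy per group and report the largest gap. The record layout `(group, y_true, y_pred)` and the function names are assumptions for illustration. Accuracy gap is only one of several possible fairness measures.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Record = Tuple[str, Hashable, Hashable]  # (group, true label, predicted label)


def accuracy_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Return the fraction of correct predictions for each user group."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


def max_accuracy_gap(records: Iterable[Record]) -> float:
    """Largest difference in accuracy between any two groups (0.0 if no data)."""
    accuracies = accuracy_by_group(records)
    if not accuracies:
        return 0.0
    return max(accuracies.values()) - min(accuracies.values())


if __name__ == "__main__":
    sample = [
        ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0), ("group_a", 0, 0),
        ("group_b", 1, 1), ("group_b", 0, 1),
    ]
    print(accuracy_by_group(sample))
    print(max_accuracy_gap(sample))
```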

7. Launch & Monitoring Plan

Rollout: [% of users, with staged expansion criteria]

Monitoring metrics:

  • Model performance: [Metric + alert threshold]
  • User engagement with AI output: [Acceptance rate, override rate, feedback score]
  • Error rate: [% of failed inferences]
  • Latency: [P95 target]

Model refresh cadence: [How often is the model retrained or updated?]
Drift detection: [How will you know when model performance degrades in production?]
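
The monitoring metrics above (error rate, acceptance rate, P95 latency) can be computed from a log of inference events and compared against alert thresholds. The sketch below assumes a hypothetical `InferenceEvent` record and uses the nearest-rank method for P95. The field names and threshold values are illustrative, not prescribed by the canvas.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class InferenceEvent:
    latency_ms: float
    failed: bool    # the inference call errored or timed out
    accepted: bool  # the user accepted the AI output (ignored for failed calls)


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    if not values:
        raise ValueError("p95 requires at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def summarise(events: Sequence[InferenceEvent]) -> Dict[str, float]:
    """Compute error rate, acceptance rate, and P95 latency for a batch of events."""
    if not events:
        raise ValueError("summarise requires at least one event")
    served = [e for e in events if not e.failed]
    return {
        "error_rate": (len(events) - len(served)) / len(events),
        "acceptance_rate": sum(e.accepted for e in served) / len(served) if served else 0.0,
        "p95_latency_ms": p95([e.latency_ms for e in events]),
    }


def alerts(summary: Dict[str, float], max_error_rate: float,
           min_acceptance_rate: float, max_p95_ms: float) -> List[str]:
    """Return the names of the metrics that breach their alert thresholds."""
    fired = []
    if summary["error_rate"] > max_error_rate:
        fired.append("error_rate")
    if summary["acceptance_rate"] < min_acceptance_rate:
        fired.append("acceptance_rate")
    if summary["p95_latency_ms"] > max_p95_ms:
        fired.append("p95_latency_ms")
    return fired
```

A sustained drop in acceptance rate with a stable error rate is one possible signal of drift, since the model still responds but users trust its output less.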


Guidelines

  • Never skip the "Why AI?" section — it's the most important question in AI product development
  • The fallback UX is not optional — what happens when AI fails defines your product's trustworthiness
  • Responsible AI checklist must be completed before launch, not after
  • Include latency in success metrics — a 5-second AI response is often worse than no AI at all
  • Recommend starting with a human-in-the-loop design and automating only when accuracy is proven

Required Inputs

Ask the user for these if not provided:

  • Feature or product description (what the AI is intended to do)
  • User problem (what problem the AI is solving for users)
  • Available data (what training/inference data exists)
  • ML/AI lead (who owns the technical implementation)

Anti-Patterns

  • Do not skip the "Why AI?" question — if the answer is "we want to use AI," stop and reframe around the user problem first
  • Do not launch with an undefined accuracy threshold — "good enough" is not a threshold; set a number before build begins
  • Do not design the UX to hide AI-generated output as if it were system truth — users need to know when AI is involved so they can override it
  • Do not defer the Responsible AI checklist to post-launch — bias and privacy issues are far harder to fix in production than in design
  • Do not treat model latency as a post-launch optimisation — a 6-second AI response that replaces a 1-second rule-based response is a regression, not a feature

Quality Checks

  • "Why AI?" is answered clearly (not "because we can")
  • Minimum acceptable accuracy threshold is defined before build begins
  • Fallback UX is specified for model failures or low-confidence outputs
  • Responsible AI checklist is completed (not deferred to post-launch)
  • Monitoring plan includes both model performance and user engagement metrics
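
If a canvas is captured as structured data, the quality checks above can be run automatically before review. This is a hedged sketch: the dictionary keys (`why_ai`, `min_threshold`, `fallback`, `responsible_ai_complete`, `monitoring_metrics`) are hypothetical names invented for this example, not a format defined by the skill.

```python
from typing import Any, Dict, List

# Answers that the canvas explicitly treats as non-answers to "Why AI?".
NON_ANSWERS = {"", "because we can", "we want to use ai"}


def check_canvas(canvas: Dict[str, Any]) -> List[str]:
    """Return a list of failed quality checks; an empty list means all checks pass."""
    problems: List[str] = []

    why_ai = str(canvas.get("why_ai", "")).strip().lower().rstrip(".")
    if why_ai in NON_ANSWERS:
        problems.append('"Why AI?" is not answered clearly')

    threshold = canvas.get("min_threshold")
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        problems.append("minimum acceptable threshold is not a number")

    if not str(canvas.get("fallback", "")).strip():
        problems.append("fallback UX is not specified")

    if canvas.get("responsible_ai_complete") is not True:
        problems.append("Responsible AI checklist is not completed")

    metrics = canvas.get("monitoring_metrics", {})
    if not (metrics.get("model_performance") and metrics.get("user_engagement")):
        problems.append("monitoring plan lacks model performance or user engagement metrics")

    return problems
```

For example, `check_canvas({})` reports all five checks as failed, while a fully populated canvas returns an empty list.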