fix: sync all skill updates and new skills into plugin bundles

- Synced 97 existing skill SKILL.md files from skills/ to their plugin bundle copies
- Added 7 new skills to plugin bundles:
  - seo-content-brief, media-pitch -> pm-gtm
  - tax-planning-checklist -> pm-finance
  - change-management-plan -> pm-hr
  - sales-forecasting-model -> pm-sales
  - workshop-facilitation-guide -> pm-operations
  - teaching-lesson-plan -> pm-cross

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 21:00:00 +01:00
parent d7f6c2cd05
commit 513e1d3ce7
67 changed files with 1851 additions and 507 deletions
@@ -1,6 +1,6 @@
 ---
 name: ai-product-canvas
-description: Structures AI and ML product decisions including model selection, data requirements, evaluation frameworks, and responsible AI considerations. Use when building AI-powered features, evaluating LLM integrations, designing AI products, or assessing AI readiness. Triggers on "AI product", "LLM feature", "AI canvas", "build with AI", "AI integration", "AI-powered", "machine learning feature".
+description: "Structure AI and ML product decisions with the rigour of any product decision. Use when building AI-powered features, evaluating LLM integrations, designing AI products, or assessing AI readiness. Produces a complete AI product canvas covering problem definition, model approach, data requirements, evaluation framework, UX design, responsible AI checklist, and launch monitoring plan."
 ---

 # AI Product Canvas Skill
@@ -143,3 +143,19 @@ Before building, flag if any of these apply:
 - Responsible AI checklist must be completed before launch, not after
 - Include latency in success metrics — a 5-second AI response is often worse than no AI at all
 - Recommend starting with a human-in-the-loop design and automating only when accuracy is proven
+
+## Required Inputs
+
+Ask the user for these if not provided:
+- **Feature or product description** (what the AI is intended to do)
+- **User problem** (what problem the AI is solving for users)
+- **Available data** (what training/inference data exists)
+- **ML/AI lead** (who owns the technical implementation)
+
+## Quality Checks
+
+- [ ] "Why AI?" is answered clearly (not "because we can")
+- [ ] Minimum acceptable accuracy threshold is defined before build begins
+- [ ] Fallback UX is specified for model failures or low-confidence outputs
+- [ ] Responsible AI checklist is completed (not deferred to post-launch)
+- [ ] Monitoring plan includes both model performance and user engagement metrics
@@ -1,13 +1,20 @@
 ---
 name: design-handoff-brief
-description: Transform feature briefs into structured design briefs that give designers the context they need
-tool_integration: Figma, Notion
+description: "Transform feature briefs into structured design briefs that give designers the context they need before opening Figma. Use when asked to write a design brief, create a design handoff, brief a designer on a new feature, or translate a PRD into design requirements. Produces a brief with user goal, emotional context, success criteria, constraints, edge cases, and out-of-scope boundaries."
 ---
+
 # Design Handoff Brief Skill

-## Purpose
 Produce a design brief that sets designers up for success — grounding them in user context and constraints before they open Figma, not after they've gone in the wrong direction.

+## Required Inputs
+
+Ask the user for these if not provided:
+- **Feature brief or PRD** (even rough notes work)
+- **Designer's name or team** (for personalisation)
+- **Technical constraints** (any engineering limitations already known)
+- **Timeline** (when does design need to be done?)
+
 ## What Designers Actually Need (and PMs Often Skip)
 - The user's goal, not the feature name
 - The emotional state of the user at this moment in the journey
@@ -23,8 +30,9 @@ Produce a design brief that sets designers up for success — grounding them in
 4. List edge cases the design must handle
 5. Define success criteria the design should be evaluated against
 6. Write a "not in scope" section to prevent scope creep in design
+7. **Validate** — Confirm every edge case listed is specific enough to design for, and every out-of-scope item is concrete enough to say "no" to

-## Output Format
+## Output Structure

 ### Design Brief: [Feature Name]

@@ -57,3 +65,11 @@ Produce a design brief that sets designers up for success — grounding them in
 - User research: [link]
 - Existing patterns: [Figma component library link]
 - Competitor examples: [links if relevant]
+
+## Quality Checks
+
+- [ ] User goal is written in user language (not feature/product language)
+- [ ] At least one edge case covers an error or failure state
+- [ ] Success criteria are measurable or observable (not "looks good")
+- [ ] Out-of-scope section names at least one thing that might seem in scope but isn't
+- [ ] Technical constraints are specific enough for an engineer to confirm
@@ -1,55 +1,69 @@
 ---
 name: experiment-designer
-description: Designs A/B tests from hypotheses and interprets experiment results 
-with statistical rigour. Use when user says "run an experiment", "design an A/B 
-test", "test this feature", "interpret these results", "was this experiment 
-successful", or "what sample size do I need".
-metadata:
-  author: Mohit Aggarwal
-  version: 1.0.0
-  category: data-and-metrics
-  tags: [experimentation, data, analytics, ab-testing]
-  documentation: https://github.com/mohitagw15856/pm-claude-skills
+description: "Design statistically rigorous A/B tests and interpret experiment results. Use when asked to design an experiment, run an A/B test, calculate sample size, interpret test results, or assess whether an experiment was successful. Produces a complete experiment design with hypothesis, sample size, run time, success criteria, and risk flags — or a results interpretation with ship/iterate/kill recommendation."
 ---
+
 # Experiment Designer Skill

-## Purpose
-Produce rigorous experiment designs from product hypotheses, and interpret 
-results with statistical and practical significance — so you can defend every 
-decision to a sceptical engineering lead or data scientist.
+Produce rigorous experiment designs from product hypotheses, and interpret results with statistical and practical significance — so you can defend every decision to a sceptical engineering lead or data scientist.
+
+## Required Inputs
+
+Ask the user for these if not provided:
+**For experiment design:**
+- Hypothesis (what change, what metric, what expected movement)
+- Current baseline metric value
+- Minimum detectable effect (MDE) — the smallest lift worth caring about
+- Available daily sample size
+
+**For results interpretation:**
+- Control and variant results (raw numbers or percentages)
+- P-value or confidence interval
+- Run duration (days)
+- Any anomalies observed during the test

 ## Two-Phase Process

 ### Phase 1: Experiment Design
-**Required inputs:** hypothesis, primary metric, current baseline, minimum 
-detectable effect (MDE), available sample size per day.
-
-**Output:**
- Hypothesis restated as: "If we [change], we expect [metric] to [move by X%] 
-  because [reason]"
- Control and variant definitions
- Primary metric (one only)
- Secondary guardrail metrics (2-3 max)
- Required sample size (calculated from MDE and baseline)
- Estimated run time in days
- Pre-defined success criteria (before the test runs — no moving goalposts)
- Design risk flags: novelty effects, seasonal confounds, multiple testing issues,
-  network effects, sample ratio mismatch risks
+1. Restate hypothesis as: "If we [change], we expect [metric] to [move by X%] because [reason]"
+2. Define control and variant clearly
+3. Select primary metric (one only) and secondary guardrail metrics (2-3 max)
+4. Calculate required sample size from MDE and baseline
+5. Estimate run time in days
+6. Set pre-defined success criteria before the test runs — no moving goalposts
+7. Flag design risks: novelty effects, seasonal confounds, multiple testing issues, network effects, sample ratio mismatch

 ### Phase 2: Results Interpretation
-**Required inputs:** control results, variant results, p-value or raw numbers, 
-run duration, any anomalies observed.
+1. Assess statistical significance (p < 0.05 threshold)
+2. Assess practical significance: was the lift meaningful for the business, not just real?
+3. Interpret confidence intervals
+4. Investigate confounding factors
+5. Recommend: Ship / Iterate / Kill / Run follow-up test
+6. **Validate** — Confirm the test ran for the full planned duration. Flag if it was stopped early (peeking problem). Confirm sample ratio mismatch did not occur.

-**Output:**
- Statistical significance assessment (p < 0.05 threshold)
- Practical significance: was the lift meaningful for the business, not just real?
- Confidence interval interpretation
- Confounding factors to investigate
- Recommendation: Ship / Iterate / Kill / Run follow-up test
- If "Iterate": specific hypotheses to test next
+## Output Structure
+
+**[Design or Results header based on phase]**
+
+*Hypothesis:* "If we [change], we expect [metric] to [move by X%] because [reason]"
+
+*Primary metric:* [One metric only]
+*Guardrail metrics:* [2-3 max]
+*Required sample size:* [n per variant]
+*Estimated run time:* [days]
+*Pre-defined success threshold:* [specific number]
+*Design risk flags:* [any concerns]
+
+**Results (Phase 2 only):**
+*Statistical significance:* [p-value and conclusion]
+*Practical significance:* [lift size vs. business threshold]
+*Recommendation:* Ship / Iterate / Kill / Follow-up — [rationale]

 ## Quality Checks
- Never interpret results from an underpowered test without flagging it
- Always distinguish statistical from practical significance
- Flag if test was stopped early (peeking problem)
- Note if sample ratio mismatch occurred
+
+- [ ] Hypothesis specifies the change, the metric, the direction, and the reason
+- [ ] Primary metric is singular — guardrail metrics are secondary
+- [ ] Success criteria are defined before the test launches (not after seeing results)
+- [ ] Test was not stopped early (or flagged clearly if it was)
+- [ ] Practical significance assessed separately from statistical significance
+- [ ] Sample ratio mismatch is checked in results interpretation
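
The experiment-designer skill above asks for a required sample size "from MDE and baseline", an estimated run time, and a sample ratio mismatch check, but does not say how to compute them. The following is a minimal sketch of one common approach, not something the skill itself specifies. It assumes a conversion-rate metric, a two-sided two-proportion z-test, an MDE expressed as a relative lift, and 80% power. Only the 0.05 significance threshold comes from the skill text. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.8):
    """Approximate n per variant for a two-sided two-proportion z-test.

    Assumptions (not from the skill): the metric is a conversion rate,
    relative_mde is a relative lift (0.10 means +10%), and power is 0.8.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("baseline and lifted rate must be distinct values in (0, 1)")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def run_time_days(n_per_variant, daily_sample, variants=2):
    """Days needed to fill every variant, given total daily traffic."""
    return math.ceil(n_per_variant * variants / daily_sample)


def srm_p_value(n_control, n_variant, expected_ratio=0.5):
    """Chi-square (1 degree of freedom) p-value for sample ratio mismatch."""
    total = n_control + n_variant
    expected_control = total * expected_ratio
    expected_variant = total * (1 - expected_ratio)
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_variant - expected_variant) ** 2 / expected_variant
    )
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    n = sample_size_per_variant(baseline=0.10, relative_mde=0.10)
    print("n per variant:", n)
    print("run time (days):", run_time_days(n, daily_sample=5000))
    print("SRM p-value:", srm_p_value(5000, 5500))
```

A very small SRM p-value suggests the traffic split itself is broken, in which case the results should be treated as unreliable regardless of the measured lift. The cut-off for "very small" is a team choice and is not defined by the skill.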
@@ -1,62 +1,62 @@
 ---
 name: multi-source-signal-synthesiser
-description: Synthesises user signals from multiple research sources into a 
-unified insight brief, reconciling conflicting feedback. Use when user has data 
-from multiple sources, needs to "make sense of all this user data", "what are 
-users really telling us", "synthesise our research", or has conflicting feedback 
-from different channels.
-metadata:
-  author: Mohit Aggarwal
-  version: 1.0.0
-  category: discovery
-  tags: [user-research, synthesis, discovery, insights]
-  documentation: https://github.com/mohitagw15856/pm-claude-skills
+description: "Synthesise user signals from multiple research sources into a unified insight brief, reconciling conflicting feedback. Use when asked to make sense of data from multiple sources, synthesise user research, reconcile conflicting feedback, or when the user says 'what are users really telling us' or 'make sense of all this user data'. Produces ranked insights with confidence ratings, divergent signal analysis, and research gap identification."
 ---
+
 # Multi-Source Signal Synthesiser Skill

-## Purpose
-Reconcile user signals from multiple sources — interviews, support tickets, NPS, 
-app reviews, sales calls — into a unified, weighted insight brief that surfaces 
-the underlying need rather than the surface-level request.
+Reconcile user signals from multiple sources — interviews, support tickets, NPS, app reviews, sales calls — into a unified, weighted insight brief that surfaces the underlying need rather than the surface-level request.

-## Source Weighting (default — adapt to your context)
- Direct research (interviews, usability tests): weight 5
- Support tickets (unprompted pain signals): weight 4
- NPS verbatims: weight 3
- App store reviews: weight 2
- Sales call summaries (filtered through sales lens): weight 2
- Anecdote or single report: weight 1
+## Required Inputs
+
+Ask the user for these if not provided:
+- **Signal sources** (interviews, support tickets, NPS verbatims, app reviews, sales calls, analytics — any combination)
+- **Time period** covered by the data
+- **Product area or feature** the signals relate to (if scoped)
+
+## Source Weighting (default — adapt to context)
+
+| Source | Weight | Rationale |
+|--------|--------|-----------|
+| Direct research (interviews, usability tests) | 5 | Highest-fidelity, structured |
+| Support tickets (unprompted pain signals) | 4 | Real pain, unfiltered |
+| NPS verbatims | 3 | Broad but shallow |
+| App store reviews | 2 | Public, self-selected |
+| Sales call summaries | 2 | Filtered through sales lens |
+| Anecdote or single report | 1 | Low confidence alone |

 ## Process
-1. Accept inputs from any combination of the source types above
-2. Tag each signal by source and apply weight
-3. Look for CONVERGENCE: same underlying need appearing across 3+ sources
-4. Look for DIVERGENCE: contradictory signals suggesting user segmentation
-5. Distinguish surface request from underlying need
-   (e.g. "faster export" may mean "I don't trust the data will be there when 
-   I need it")
-6. Produce ranked insights by weighted frequency
+1. Tag each signal by source and apply weight
+2. Look for **convergence**: same underlying need appearing across 3+ sources
+3. Look for **divergence**: contradictory signals suggesting user segmentation
+4. Distinguish surface request from underlying need (e.g. "faster export" may mean "I don't trust the data will be there when I need it")
+5. Produce ranked insights by weighted frequency
+6. **Validate** — Confirm each insight has evidence from at least 2 source types. Flag any insight resting on a single source as low-confidence.

-## Output Format
+## Output Structure

 ### User Signal Synthesis — [Date / Period]
-**Sources included:** [list]
+**Sources included:** [list with count per source]
 **Total signals processed:** [n]

 #### Insight 1: [Underlying need, not feature request]
 - **Confidence:** High / Medium / Low (based on source diversity and weight)
 - **Evidence:** [Signals from each source supporting this]
 - **Conflicting signals:** [Any contradicting evidence and how to interpret it]
- **Product implication:** [Specific, not generic]
+- **Product implication:** [Specific next step, not generic]

 [Repeat for top 3-5 insights]

 #### Divergent Signals (Possible Segmentation)
-[Where user groups appear to have genuinely different needs]
+[Where user groups appear to have genuinely different needs — specify which segments]

 #### What the Data Does NOT Tell Us
 [Gaps that require further research before acting]

-## OpenClaw Configuration
-Connect to: Notion (research docs), support inbox, NPS tool, app review feed.
-Schedule: weekly synthesis run, diff output showing new signals only.
+## Quality Checks
+
+- [ ] Every insight references at least 2 distinct source types
+- [ ] Surface requests are translated to underlying needs (not just echoed)
+- [ ] Divergent signals identify the specific user segments, not just "some users disagree"
+- [ ] Confidence ratings are consistent with source diversity and weighting
+- [ ] "What the data does NOT tell us" section is honest about gaps
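
The synthesiser's process (tag each signal by source, apply the default weight, rank by weighted frequency, then flag convergence and single-source insights) can be sketched as below. This is an illustrative reading of the steps, not the skill's implementation. The weights are taken from the Source Weighting table, while the source keys, the `rank_insights` function, and the example signals are hypothetical. It also assumes each signal has already been mapped by hand to an underlying need.

```python
from collections import defaultdict

# Default weights from the Source Weighting table; the key names are hypothetical.
SOURCE_WEIGHTS = {
    "direct_research": 5,
    "support_ticket": 4,
    "nps_verbatim": 3,
    "app_review": 2,
    "sales_call": 2,
    "anecdote": 1,
}


def rank_insights(signals):
    """Rank underlying needs by weighted frequency.

    signals: iterable of (underlying_need, source_type) pairs.
    Returns a list of dicts, highest weighted score first.
    """
    scores = defaultdict(int)
    source_types = defaultdict(set)
    for need, source in signals:
        scores[need] += SOURCE_WEIGHTS[source]
        source_types[need].add(source)

    ranked = sorted(scores, key=lambda need: scores[need], reverse=True)
    return [
        {
            "need": need,
            "score": scores[need],
            "source_types": len(source_types[need]),
            # Convergence: the same need appears across 3+ source types.
            "convergent": len(source_types[need]) >= 3,
            # Validate step: fewer than 2 source types means low confidence.
            "low_confidence": len(source_types[need]) < 2,
        }
        for need in ranked
    ]


if __name__ == "__main__":
    example_signals = [
        ("trust in exports", "direct_research"),
        ("trust in exports", "support_ticket"),
        ("trust in exports", "nps_verbatim"),
        ("dark mode", "app_review"),
        ("dark mode", "app_review"),
    ]
    for insight in rank_insights(example_signals):
        print(insight)
```

Counting distinct source types separately from the weighted score keeps a single noisy channel from looking like convergence: two app reviews raise the score for "dark mode" but still leave it flagged as low confidence.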