mohitagw15856 f3b9d008fe feat: 100 skills milestone — 7 new skills + quality improvements across all 93

New skills added:
- teaching-lesson-plan: structured lesson plans for any subject/audience/setting
- seo-content-brief: complete SEO briefs with intent, competitor gaps, and outline
- media-pitch: story-first journalist pitches with angle development framework
- change-management-plan: stakeholder analysis, comms strategy, adoption metrics
- workshop-facilitation-guide: activity instructions, decision protocols, facilitator moves
- sales-forecasting-model: pipeline model, scenario analysis, assumption log
- tax-planning-checklist: year-end tax planning across income, pension, CGT, reliefs

Quality improvements across all 93 existing skills:
- Standardised description format: "Verb the thing. Use when X. Produces Y."
- Added Required Inputs section to all skills missing it (prompts for missing info)
- Added Quality Checks section to all skills missing it (specific, not generic)
- Fixed broken multiline YAML descriptions
- Removed non-standard frontmatter keys (tool_integration, metadata blocks)

README updated to v6.0.0 with 100-skill count, new skill tables, and article series

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-20 20:52:31 +01:00

3.2 KiB


name	description
experiment-designer	Design statistically rigorous A/B tests and interpret experiment results. Use when asked to design an experiment, run an A/B test, calculate sample size, interpret test results, or assess whether an experiment was successful. Produces a complete experiment design with hypothesis, sample size, run time, success criteria, and risk flags — or a results interpretation with ship/iterate/kill recommendation.

Experiment Designer Skill

Produce rigorous experiment designs from product hypotheses, and interpret results with statistical and practical significance — so you can defend every decision to a sceptical engineering lead or data scientist.

Required Inputs

Ask the user for these if not provided:

For experiment design:

Hypothesis (what change, what metric, what expected movement)
Current baseline metric value
Minimum detectable effect (MDE) — the smallest lift worth caring about
Available daily sample size

For results interpretation:

Control and variant results (raw numbers or percentages)
P-value or confidence interval
Run duration (days)
Any anomalies observed during the test

Two-Phase Process

Phase 1: Experiment Design

Restate hypothesis as: "If we [change], we expect [metric] to [move by X%] because [reason]"
Define control and variant clearly
Select primary metric (one only) and secondary guardrail metrics (2-3 max)
Calculate required sample size from MDE and baseline
Estimate run time in days
Set pre-defined success criteria before the test runs — no moving goalposts
Flag design risks: novelty effects, seasonal confounds, multiple testing issues, network effects, sample ratio mismatch
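The sample size and run time steps above can be sketched as follows. This is a minimal illustration rather than the skill's prescribed method: it assumes the primary metric is a conversion rate (a proportion), the MDE is expressed as a relative lift, the test is two-sided at alpha = 0.05 with 80% power, and traffic is split evenly. The function names `sample_size_per_variant` and `run_time_days` are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate n per variant for a two-sided two-proportion z-test.

    baseline: current conversion rate, e.g. 0.10 for 10%
    relative_mde: smallest relative lift worth detecting, e.g. 0.10 for +10%
    alpha and power defaults are assumptions, not values fixed by the skill.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def run_time_days(n_per_variant, daily_sample, variants=2):
    """Days needed to fill every variant, assuming steady daily traffic."""
    return math.ceil(n_per_variant * variants / daily_sample)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline=0.10, relative_mde=0.10)
    days = run_time_days(n, daily_sample=2000)
    print(f"Required sample size: {n} per variant")
    print(f"Estimated run time: {days} days")
```

Smaller MDEs drive the required sample size up quadratically, so halving the MDE roughly quadruples the run time. If the estimate is shorter than a full weekly cycle, it may still be worth running for whole weeks to limit the seasonal confounds flagged in the design risks.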

Phase 2: Results Interpretation

Assess statistical significance (p < 0.05 threshold)
Assess practical significance: was the lift meaningful for the business, not just real?
Interpret confidence intervals
Investigate confounding factors
Recommend: Ship / Iterate / Kill / Run follow-up test
Validate: confirm the test ran for its full planned duration, and flag it if it was stopped early (the peeking problem). Confirm that no sample ratio mismatch occurred.
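As a rough sketch of the first three interpretation steps and the recommendation, the code below runs a two-proportion z-test, builds a confidence interval for the absolute difference, and maps the outcome to a recommendation. It assumes a conversion-rate metric with raw counts available. The names `two_proportion_test` and `recommend`, and especially the decision rules inside `recommend`, are illustrative assumptions; the skill itself does not define an exact mapping.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_c, n_c, conv_v, n_v, alpha=0.05):
    """Two-sided z-test and confidence interval for variant minus control."""
    p_c = conv_c / n_c
    p_v = conv_v / n_v
    diff = p_v - p_c

    # Pooled standard error for the hypothesis test (assumes no true difference).
    pooled = (conv_c + conv_v) / (n_c + n_v)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_v))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se_unpooled = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_unpooled
    return {"diff": diff, "p_value": p_value, "ci": (diff - margin, diff + margin)}


def recommend(result, min_practical_lift, alpha=0.05):
    """Hypothetical decision rule; thresholds should be pre-defined in Phase 1."""
    significant = result["p_value"] < alpha
    if significant and result["diff"] >= min_practical_lift:
        return "Ship"       # real and large enough to matter
    if significant and result["diff"] < 0:
        return "Kill"       # variant is measurably worse
    if significant:
        return "Iterate"    # real, but below the business threshold
    if result["ci"][1] < min_practical_lift:
        return "Kill"       # even the optimistic bound is too small
    return "Run follow-up test"  # inconclusive: a worthwhile lift is still plausible


if __name__ == "__main__":
    result = two_proportion_test(conv_c=1000, n_c=10000, conv_v=1100, n_v=10000)
    print(result)
    print(recommend(result, min_practical_lift=0.005))
```

Keeping the practical threshold (`min_practical_lift`, an absolute difference here) as a separate argument from `alpha` mirrors the distinction between statistical and practical significance: a result can clear one bar and fail the other.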

Output Structure

[Design or Results header based on phase]

Hypothesis: "If we [change], we expect [metric] to [move by X%] because [reason]"

Primary metric: [One metric only]
Guardrail metrics: [2-3 max]
Required sample size: [n per variant]
Estimated run time: [days]
Pre-defined success threshold: [specific number]
Design risk flags: [any concerns]

Results (Phase 2 only):
Statistical significance: [p-value and conclusion]
Practical significance: [lift size vs. business threshold]
Recommendation: Ship / Iterate / Kill / Follow-up - [rationale]

Quality Checks

Hypothesis specifies the change, the metric, the direction, and the reason
Primary metric is singular — guardrail metrics are secondary
Success criteria are defined before the test launches (not after seeing results)
Test was not stopped early (or flagged clearly if it was)
Practical significance assessed separately from statistical significance
Sample ratio mismatch is checked in results interpretation
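The sample ratio mismatch check in the last item can be sketched as a chi-square goodness-of-fit test on the observed allocation. This is a minimal example assuming two arms and a known intended split; the function name `srm_check` and the 0.001 alert threshold are assumptions (a strict threshold is a common choice for this check, but the skill does not specify one).

```python
import math


def srm_check(n_control, n_variant, expected_control_share=0.5, threshold=0.001):
    """Chi-square test (1 degree of freedom) of observed vs. intended split."""
    total = n_control + n_variant
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_variant - expected_variant) ** 2 / expected_variant
    )
    # For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"chi2": chi2, "p_value": p_value, "srm_detected": p_value < threshold}


if __name__ == "__main__":
    print(srm_check(10000, 10100))  # small imbalance
    print(srm_check(10000, 10500))  # larger imbalance
```

When a mismatch is detected, the effect estimate is usually not trustworthy, so the assignment or logging pipeline is worth investigating before any ship/iterate/kill call is made.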
