---
name: ab-test-analysis
description: "Analyze A/B test results with statistical significance, sample size validation, confidence intervals, and ship/extend/stop recommendations. Use when evaluating experiment results, checking if a test reached significance, interpreting split test data, or deciding whether to ship a variant."
---

## A/B Test Analysis

Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.

### Context

You are analyzing A/B test results for **$ARGUMENTS**.

If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.

### Instructions

1. **Understand the experiment**:
   - What was the hypothesis?
   - What was changed (the variant)?
   - What is the primary metric? Any guardrail metrics?
   - How long did the test run?
   - What is the traffic split?

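2. **Validate the test setup**:
   - **Sample size**: Is the sample large enough to detect the expected effect size?
     - Use the per-group formula: n = ((Z_{α/2} + Z_β)² × 2 × p × (1 - p)) / MDE², where p is the baseline conversion rate and MDE is the absolute minimum detectable effect (Z_{α/2} ≈ 1.96 for α = 0.05, Z_β ≈ 0.84 for 80% power)
     - Flag the test as underpowered if its power is below 80%
   - **Duration**: Did the test run for at least 1-2 full business cycles (for example, full weeks)?
   - **Randomization**: Is there any evidence of sample ratio mismatch (SRM), i.e. group sizes that deviate from the planned traffic split?
   - **Novelty/primacy effects**: Was there enough time to wash out initial behavior changes?
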
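
The sample size and SRM checks in step 2 can be sketched in a few lines of standard-library Python. This is a minimal sketch, not a definitive implementation: it assumes a two-sided test on conversion rates, the normal-approximation sample size formula with a power term, and a two-group chi-squared goodness-of-fit test for SRM. The function and parameter names (`required_sample_size`, `srm_p_value`, `mde_abs`, `expected_control_share`) are hypothetical and chosen for illustration. A very low SRM p-value suggests the observed split does not match the planned one.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided test on proportions.

    baseline_rate: control conversion rate, e.g. 0.10 (assumed input)
    mde_abs: absolute minimum detectable effect, e.g. 0.02 for +2 points
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = 2 * baseline_rate * (1 - baseline_rate)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def srm_p_value(n_control, n_variant, expected_control_share=0.5):
    """Chi-squared goodness-of-fit p-value (1 degree of freedom) for SRM."""
    total = n_control + n_variant
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_variant - expected_variant) ** 2 / expected_variant)
    # With 1 degree of freedom the chi-squared survival function
    # equals erfc(sqrt(x / 2)), so no external library is needed.
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Illustrative inputs only: 10% baseline, +2 point MDE, 5000 vs 5300 users.
    print("users per group:", required_sample_size(0.10, 0.02))
    print("SRM p-value:", srm_p_value(5000, 5300))
```
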
3. **Calculate statistical significance**:
   - **Conversion rate** for control and variant
   - **Relative lift**: (variant - control) / control × 100
   - **p-value**: From a two-tailed two-proportion z-test or a chi-squared test
   - **Confidence interval**: 95% CI for the absolute difference in conversion rates (variant - control)
   - **Statistical significance**: Is p < 0.05?
   - **Practical significance**: Is the lift meaningful for the business?

   If the user provides raw data, generate and run a Python script to calculate these.

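
The calculations in step 3 could look like the following when only conversion counts are available. This is a sketch under stated assumptions: a binary conversion metric, a pooled standard error for the z-test, and an unpooled (Wald) standard error for the confidence interval. The function name `two_proportion_z_test` and the keys of the returned dictionary are hypothetical. It uses only the standard library, and the counts in the usage lines are illustrative, not real data.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_control, n_control, conv_variant, n_variant, alpha=0.05):
    """Two-tailed z-test for the difference between two conversion rates."""
    rate_control = conv_control / n_control
    rate_variant = conv_variant / n_variant
    diff = rate_variant - rate_control

    # Pooled standard error: used for the hypothesis test (H0: rates are equal).
    pooled = (conv_control + conv_variant) / (n_control + n_variant)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error: used for the CI of the absolute difference.
    se_diff = math.sqrt(rate_control * (1 - rate_control) / n_control
                        + rate_variant * (1 - rate_variant) / n_variant)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "rate_control": rate_control,
        "rate_variant": rate_variant,
        "relative_lift_pct": diff / rate_control * 100,
        "z": z,
        "p_value": p_value,
        "ci_low": diff - z_crit * se_diff,
        "ci_high": diff + z_crit * se_diff,
        "significant": p_value < alpha,
    }


if __name__ == "__main__":
    # Illustrative counts: 500/5000 control conversions vs 575/5000 variant.
    result = two_proportion_z_test(500, 5000, 575, 5000)
    for key, value in result.items():
        print(f"{key}: {value}")
```
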
4. **Check guardrail metrics**:
   - Did any guardrail metrics (revenue, engagement, page load time) degrade?
   - A winning primary metric with degraded guardrails may not be a true win

5. **Interpret results**:

| Outcome | Recommendation |
|---|---|
| Significant positive lift, no guardrail issues | **Ship it**: roll out to 100% |
| Significant positive lift, guardrail concerns | **Investigate**: understand trade-offs before shipping |
| Not significant, positive trend | **Extend the test**: the test needs more data to detect an effect of this size |
| Not significant, flat | **Stop the test**: no meaningful difference detected |
| Significant negative lift | **Don't ship**: revert to control, analyze why |

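
The decision table above can be expressed as a small function, which makes the rules easy to apply consistently across metrics. This is a sketch, not a prescribed rule set: the name `recommend` and the `flat_band_pct` threshold that separates a "positive trend" from "flat" are assumptions introduced for illustration. The table does not cover a non-significant negative trend, so the sketch treats it as "Stop the test", which is also an assumption.

```python
def recommend(p_value, lift_pct, guardrails_ok, alpha=0.05, flat_band_pct=1.0):
    """Map test results to a recommendation from the decision table.

    lift_pct: relative lift of the primary metric, in percent
    guardrails_ok: False if any guardrail metric degraded
    flat_band_pct: hypothetical cutoff below which a lift counts as flat
    """
    significant = p_value < alpha

    if significant and lift_pct < 0:
        return "Don't ship"
    if significant and lift_pct > 0:
        return "Ship it" if guardrails_ok else "Investigate"

    # Not significant: separate a positive trend from a flat result.
    if lift_pct > flat_band_pct:
        return "Extend the test"
    # Flat or negative trend without significance (assumed mapping).
    return "Stop the test"


if __name__ == "__main__":
    print(recommend(p_value=0.015, lift_pct=15.0, guardrails_ok=True))
    print(recommend(p_value=0.20, lift_pct=3.0, guardrails_ok=True))
```
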
6. **Provide the analysis summary**:
```
## A/B Test Results: [Test Name]

**Hypothesis**: [What we expected]
**Duration**: [X days] | **Sample**: [N control / M variant]

| Metric | Control | Variant | Lift | p-value | Significant? |
|---|---|---|---|---|---|
| [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
| [Guardrail] | ... | ... | ... | ... | ... |

**Recommendation**: [Ship / Extend / Stop / Investigate]
**Reasoning**: [Why]
**Next steps**: [What to do]
```

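
When the numbers come from a script, the summary template above can be filled programmatically. The sketch below assumes each metric is passed as a dictionary with the hypothetical keys `name`, `control`, `variant`, `lift_pct`, and `p_value`, and that significance is judged at 0.05. The function name `render_summary` and the output file name in the final comment are illustrative choices, not part of the template.

```python
def render_summary(test_name, hypothesis, days, n_control, n_variant,
                   metric_rows, recommendation, reasoning, next_steps):
    """Fill the summary template; metric_rows holds one dict per metric."""
    lines = [
        f"## A/B Test Results: {test_name}",
        "",
        f"**Hypothesis**: {hypothesis}",
        f"**Duration**: {days} days | **Sample**: {n_control} control / {n_variant} variant",
        "",
        "| Metric | Control | Variant | Lift | p-value | Significant? |",
        "|---|---|---|---|---|---|",
    ]
    for row in metric_rows:
        significant = "Yes" if row["p_value"] < 0.05 else "No"
        lines.append(
            f"| {row['name']} | {row['control']:.2%} | {row['variant']:.2%} "
            f"| {row['lift_pct']:+.1f}% | {row['p_value']:.3f} | {significant} |"
        )
    lines += [
        "",
        f"**Recommendation**: {recommendation}",
        f"**Reasoning**: {reasoning}",
        f"**Next steps**: {next_steps}",
    ]
    return "\n".join(lines) + "\n"


# To save the result as markdown (file name is a placeholder):
# from pathlib import Path
# Path("ab_test_results.md").write_text(render_summary(...), encoding="utf-8")
```
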
Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.

---

### Further Reading

- [A/B Testing 101 + Examples](https://www.productcompass.pm/p/ab-testing-101-for-pms)
- [Testing Product Ideas: The Ultimate Validation Experiments Library](https://www.productcompass.pm/p/the-ultimate-experiments-library)
- [Are You Tracking the Right Metrics?](https://www.productcompass.pm/p/are-you-tracking-the-right-metrics)