A/B Testing Calculator | Sample Size, Duration, and Significance
Know how many visitors you need before you start, then check statistical significance when you're done. Plan your test duration, avoid stopping early, and know when you have a real winner.
Use the "Sample Size" tab to figure out how many visitors you need and how long to run. Enter your current conversion rate, the improvement you want to detect, and your daily traffic.
Once you start, commit to running for the calculated number of days. Don't peek at results early or stop when it "looks like a winner."
Use the "Significance" tab to interpret what happened. Enter your final numbers to see if you have a real winner or just noise.
If you have a winner, implement the change. If inconclusive, either run longer or accept that the change probably doesn't make a meaningful difference.
Calculate Sample Size for Your A/B Test
Your Test Requirements
- Statistical power of 80% (if a real improvement exists, this test has an 80% chance of catching it)
- Your daily traffic stays roughly consistent during the test
- Traffic is split evenly between versions (50/50 for A/B)
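If you want to sanity-check the calculator, here's a minimal Python sketch of the standard two-proportion sample size formula under those same assumptions (two-sided 95% confidence, 80% power, even split). The function name and example numbers are illustrative, not the calculator's actual code:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, relative_mde,
                            alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a relative lift."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)        # rate you want to detect
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 3% baseline, looking for a 10% relative lift (3% -> 3.3%)
print(sample_size_per_variant(0.03, 0.10))  # 53208 -- about 53,000 per variant
```

That's why small lifts on low conversion rates take so long to test: halving the MDE roughly quadruples the required sample.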
Check Statistical Significance of Your Results
Results
The probability of seeing results this extreme if there were no real difference between versions.
Lower is better. Below 0.05 = statistically significant at 95% confidence.
| Version | Visitors | Conversions | Rate |
|---|---|---|---|
| A - Control | - | - | - |
| B - Variant | - | - | - |
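If you want to reproduce the Significance tab's math offline, here's a minimal sketch. It assumes a two-sided, two-proportion z-test with a pooled standard error, a common choice for calculators like this, though the page doesn't state its exact method:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(visitors_a, conversions_a,
                           visitors_b, conversions_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # pooled rate under the null hypothesis of no real difference
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Control: 300 of 10,000 convert (3.0%); variant: 360 of 10,000 (3.6%)
print(round(two_proportion_p_value(10_000, 300, 10_000, 360), 3))  # 0.018
```

A z-test like this leans on the normal approximation, so it's most trustworthy once each version has at least a few dozen conversions.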
Need help with your testing strategy?
Knowing what to test is half the battle. If you need help identifying high-impact opportunities or building a testing roadmap, let's talk.
Get in Touch
Related Tools
Lead-to-Revenue Calculator
Work backward from your revenue goal to figure out how many leads you need at each stage of your funnel.
Try calculator →
Campaign Breakeven Calculator
Know what you need to make back before you commit budget. Calculate your breakeven point, 2x, and 3x return targets.
Try calculator →
Frequently Asked Questions
Common questions about A/B testing sample size, statistical significance, and test duration.
What is an A/B test?
An A/B test compares two versions of a page, email, or ad to see which performs better. You split your traffic between version A (the original) and version B (the variant), then measure which one converts more visitors into customers, subscribers, or whatever action you're optimizing for.
How many visitors do I need for an A/B test?
Sample size depends on your baseline conversion rate, the minimum effect you want to detect, and your desired confidence level. The formula uses z-scores for your significance level and statistical power (typically 80%).
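The page doesn't show the formula explicitly, but a common two-sided version is:

$$
n = \frac{(z_{\alpha/2} + z_{\beta})^2 \left[ p_1(1 - p_1) + p_2(1 - p_2) \right]}{(p_2 - p_1)^2}
$$

where $p_1$ is your baseline rate, $p_2$ is the rate you want to be able to detect, $z_{\alpha/2} \approx 1.96$ at 95% confidence, $z_{\beta} \approx 0.84$ at 80% power, and $n$ is visitors per variation.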
Most A/B tests need at least 1,000 visitors per variation to detect a 10-20% relative improvement, and low baseline rates push that number far higher. Use the calculator above to get your specific number.
How long should I run an A/B test?
Run your test long enough to reach statistical significance, typically at least 7 days to capture weekly traffic patterns. The exact duration depends on your traffic volume and the size of the improvement you're trying to detect.
Use the Sample Size tab above to calculate your specific duration before you start.
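As a rough sketch of that arithmetic (assuming an even 50/50 split and steady traffic, per the requirements above; the function and numbers are illustrative):

```python
from math import ceil

def test_duration_days(per_variant_n, daily_visitors,
                       variants=2, min_days=7):
    """Days to reach the required sample, never less than one full week."""
    days = ceil(per_variant_n * variants / daily_visitors)
    return max(days, min_days)

# Need 5,000 visitors per variant with 1,200 total visitors/day
print(test_duration_days(5_000, 1_200))  # 9 days
```

Rounding up to whole weeks (here, 14 days) is a common extra precaution so weekday and weekend traffic are represented equally.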
What is statistical significance?
Statistical significance tells you how confident you can be that your results aren't due to random chance. Testing at a 95% confidence level means you accept at most a 5% chance of declaring a winner when there's actually no difference between versions.
Most businesses use 95% as their threshold for making decisions.
What is the minimum detectable effect (MDE)?
The minimum detectable effect (MDE) is the smallest improvement you want your test to be able to detect. It can be expressed in two ways:
- Relative: a percentage of your current rate (e.g., 10% relative on a 3% baseline = looking for 3% → 3.3%)
- Absolute: percentage points added (e.g., 1pp absolute on a 3% baseline = looking for 3% → 4%)
Smaller MDE requires more traffic and longer tests.
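To make the two forms concrete, here's a tiny helper (hypothetical, not part of the calculator) that turns either kind of MDE into the target rate your test must detect:

```python
def target_rate(baseline, mde, kind="relative"):
    """Conversion rate the test must detect, from a relative or absolute MDE."""
    if kind == "relative":
        return baseline * (1 + mde)  # fraction of the current rate
    if kind == "absolute":
        return baseline + mde        # percentage points added
    raise ValueError("kind must be 'relative' or 'absolute'")

print(round(target_rate(0.03, 0.10, "relative"), 4))  # 0.033 (3% -> 3.3%)
print(round(target_rate(0.03, 0.01, "absolute"), 4))  # 0.04  (3% -> 4%)
```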
Why shouldn't I stop an A/B test early?
Early results are unreliable because random variation can make one version look like a winner when there's no real difference. If you check results daily and stop when it looks good, you're essentially cherry-picking a lucky moment.
This inflates your false positive rate from 5% to potentially 20-30%.
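You can watch that inflation happen with a quick simulation. The sketch below runs repeated A/A tests (two identical versions, so every "winner" is a false positive) and peeks once a day, stopping the moment p dips below 0.05. All parameters are illustrative:

```python
import random
from math import sqrt
from statistics import NormalDist

def peeking_false_positive_rate(trials=1_000, days=14, daily_n=200,
                                rate=0.05, alpha=0.05):
    """Fraction of A/A tests that a daily peek-and-stop rule
    wrongly declares significant."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    false_positives = 0
    for _ in range(trials):
        na = nb = ca = cb = 0
        for _ in range(days):
            # both versions convert at the same underlying rate
            na += daily_n
            nb += daily_n
            ca += sum(random.random() < rate for _ in range(daily_n))
            cb += sum(random.random() < rate for _ in range(daily_n))
            pooled = (ca + cb) / (na + nb)
            if pooled in (0.0, 1.0):
                continue  # no variance yet; z is undefined
            se = sqrt(pooled * (1 - pooled) * (1 / na + 1 / nb))
            if abs(cb / nb - ca / na) / se > z_crit:
                false_positives += 1  # stopped early on a phantom winner
                break
    return false_positives / trials

print(peeking_false_positive_rate())  # typically ~0.15-0.25, not 0.05
```

A single look at a fixed sample size keeps the false positive rate at 5%; fourteen looks give randomness fourteen chances to cross the line.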
What is a p-value?
A p-value is the probability of seeing results as extreme as yours if there were actually no difference between versions.
A p-value of 0.03 means that if the two versions truly performed the same, you'd see a gap this large only 3% of the time. Lower p-values indicate stronger evidence that the difference is real. Below 0.05 is typically considered significant.
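In formula terms, assuming the same pooled two-proportion z-test sketched above:

$$
z = \frac{\hat{p}_B - \hat{p}_A}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}},
\qquad
p\text{-value} = 2\bigl(1 - \Phi(|z|)\bigr)
$$

where $\hat{p}$ is the pooled conversion rate across both versions and $\Phi$ is the standard normal CDF.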