A/B Test Sample Size Calculator
A test is only trustworthy once it has collected enough visitors to tell a real effect apart from random noise. Enter your baseline conversion rate and the smallest lift worth catching, and this calculator returns the sample size you need in each variant.
Most A/B tests are called long before they should be
The moment one variant edges ahead, the temptation is to ship it. But early leads are usually noise, and a test read before it has the numbers is a coin flip dressed up as a decision.
Peeking inflates false positives
Checking the dashboard daily and stopping the first time you see significance is the most common way to ship a losing variant. Every extra look raises the chance of a false win. Fix the sample size up front and only read the result once it is reached.
Low traffic, low power
On a page with modest traffic, a test that looks flat may simply be too small to detect the effect you care about. Underpowered tests miss real wins and leave you concluding "no difference" when there was one.
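To make "too small to detect the effect" concrete, the sketch below rearranges the page's sample size formula to estimate the power a test actually has at a given size. The function name `achieved_power` and the 10,000-visitor scenario are illustrative assumptions, and the result is a normal approximation rather than an exact figure.

```python
import math
from statistics import NormalDist


def achieved_power(baseline, relative_mde, n_per_variant, z_alpha=1.96):
    """Approximate power of a two-sided test with n_per_variant visitors per arm."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # variant rate implied by the relative lift
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    # Solve the sample size formula for Z_beta, then turn it into a probability.
    z_beta = abs(p2 - p1) * math.sqrt(n_per_variant / variance_sum) - z_alpha
    return NormalDist().cdf(z_beta)


# Hypothetical scenario: 3% baseline, 10% relative lift, two different test sizes.
print(f"{achieved_power(0.03, 0.10, 10_000):.0%}")  # well below 80%
print(f"{achieved_power(0.03, 0.10, 53_209):.0%}")  # about 80%
```

Under these assumptions, a test stopped at 10,000 visitors per variant would miss a real 10% lift most of the time, so a flat result at that size says very little.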
The effect size you choose drives everything
Detecting a tiny lift needs vastly more traffic than detecting a large one. Decide the smallest improvement that would actually change your decision, then size the test for that — not for a lift you have no realistic chance of catching.
A/B test sample size
Find how many visitors per variant you need for a statistically valid test.
Baseline conversion rate: your current conversion rate for the control, the page or flow you're testing against.
Minimum detectable effect (MDE): the smallest relative lift worth detecting. A 10% MDE on a 3% baseline means catching a move to 3.3%.
Confidence level: how sure you want to be that a winner is real, not noise. 95% is the standard for most marketing tests.
Statistical power: the chance of detecting a real effect when one exists. 80% is conventional; raise it to miss fewer true wins.
Daily traffic: total visitors per day entering the test, across all variants. Add it to estimate how long the test will run.
Sample size per variant: 53,209 visitors needed in each group
Total sample: 106,418 (control + variant combined)
This is the number of visitors that the control and the variant must each collect before you can read the result at 95% confidence and 80% power.
Two-sided test. n per group = (Zα + Zβ)² × [p₁(1−p₁) + p₂(1−p₂)] ÷ (p₂ − p₁)².
What is sample size?
Sample size is the number of visitors each variant in an A/B test needs before the result is statistically reliable. It is set in advance from four inputs: your baseline conversion rate, the minimum effect you want to detect, your confidence level, and your statistical power.
Calculating it up front turns testing from a guessing game into a plan. You know before launch how many visitors — and roughly how many days — the test will take, and you commit to reading the result only once that target is hit, which is what keeps the statistics honest.
Sample size per variant
n = (Zα + Zβ)² × [p₁(1−p₁) + p₂(1−p₂)] ÷ (p₂ − p₁)²
With a baseline of p₁ = 3% (0.03), a 10% relative lift gives a variant rate of p₂ = 0.033. At 95% confidence (Zα = 1.96, the two-sided critical value) and 80% power (Zβ = 0.8416): n = (1.96 + 0.8416)² × [0.03×0.97 + 0.033×0.967] ÷ (0.003)² = 7.849 × 0.061011 ÷ 0.000009 ≈ 53,208.1, which rounds up to 53,209 visitors per variant.
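The worked example above can be reproduced in a few lines. This is a minimal sketch of the formula as written on this page, not the calculator's actual source; the function name `sample_size_per_variant` is an assumption, and the Z values are the rounded constants from the example.

```python
import math


def sample_size_per_variant(baseline, relative_mde, z_alpha=1.96, z_beta=0.8416):
    """Visitors needed in each variant for a two-sided test of two proportions."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # variant rate implied by the relative lift
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)  # round up: a fraction of a visitor cannot be collected


print(sample_size_per_variant(0.03, 0.10))  # 53209
```

Rounding up rather than to the nearest integer keeps the test at or above the requested power.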
Test duration
Days = (n × 2) ÷ Daily traffic
At 53,209 per variant, the total is 106,418 visitors. With 5,000 visitors per day entering the test: 106,418 ÷ 5,000 ≈ 21.3, so plan for 22 full days.
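The same arithmetic as a small helper, assuming traffic is split evenly across variants and that a partial day counts as a full one. The name `duration_days` and the `variants` parameter are illustrative.

```python
import math


def duration_days(n_per_variant, daily_traffic, variants=2):
    """Days needed for every variant to reach n_per_variant visitors."""
    total_visitors = n_per_variant * variants
    return math.ceil(total_visitors / daily_traffic)  # a partial day still has to run


print(duration_days(53_209, 5_000))  # 22
```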
How to use this sample size calculator
Decide the effect you care about first — the rest follows.
Enter your baseline conversion rate
Use the current rate of the control — the version you are testing against. Pull it from your analytics over a representative recent window, not a single unusual day.
Set the minimum detectable effect
This is the smallest relative lift that would actually change your decision. Be honest: on a 3% baseline at 95% confidence and 80% power, a 1% MDE needs roughly 5.1 million visitors per variant, while a 10–20% MDE needs about 53,000 down to 14,000. Choose the threshold worth the test.
Lock in confidence and power, then commit
Leave confidence at 95% and power at 80% unless you have a reason to change them. Add your daily traffic to see the duration, then run the test to the full sample size before reading the result.
What is the minimum detectable effect (MDE)?
The MDE is the smallest difference between variants that your test is designed to catch, expressed as a relative lift over the baseline. A 10% MDE on a 3% baseline means the test is sized to reliably detect a move to 3.3% or better. Smaller MDEs require dramatically larger samples, so set it to the smallest improvement that would genuinely change what you do.
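To see how quickly the required sample grows as the MDE shrinks, the sketch below evaluates the page's formula across a few MDE values on a 3% baseline. The helper name and the chosen MDE values are assumptions for illustration.

```python
import math

Z_ALPHA, Z_BETA = 1.96, 0.8416  # 95% confidence (two-sided), 80% power


def sample_size_per_variant(baseline, relative_mde):
    """Per-variant sample size from the formula on this page."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((Z_ALPHA + Z_BETA) ** 2 * variance_sum / (p2 - p1) ** 2)


for mde in (0.01, 0.05, 0.10, 0.20):
    n = sample_size_per_variant(0.03, mde)
    print(f"{mde:.0%} MDE -> {n:,} visitors per variant")
```

Because the difference between the two rates is squared in the denominator, halving the MDE roughly quadruples the sample you need.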
Why can’t I just stop the test once it shows significance?
Because conversion data fluctuates, a test will frequently cross the significance line by chance early on and then drift back. Stopping the first time you see a winner — known as peeking — multiplies your false-positive rate well beyond the 5% you think you are running at. Fixing the sample size in advance and reading the result only once is what makes the p-value mean what it claims.
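A small simulation can illustrate the effect. The sketch below runs A/A tests, where both arms share the same true rate so every "winner" is a false positive, and compares stopping at the first significant look with reading the result once at the end. All parameters (5% rate, 10 looks, 300 visitors per arm per look, 500 runs) are illustrative assumptions, and the pooled z-test is one common choice rather than necessarily what any given testing tool uses.

```python
import math
import random


def is_significant(conv_a, conv_b, n, z_crit=1.96):
    """Pooled two-proportion z-test with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return False  # no variation in the data yet, nothing to compare
    return abs(conv_b - conv_a) / n / se > z_crit


def simulate_aa_tests(rate=0.05, looks=10, visitors_per_look=300, runs=500, seed=1):
    """Return (false-win rate when peeking, false-win rate with one final read)."""
    rng = random.Random(seed)
    wins_when_peeking = wins_at_end = 0
    for _run in range(runs):
        conv_a = conv_b = n = 0
        ever_significant = False
        for _look in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            significant_now = is_significant(conv_a, conv_b, n)
            ever_significant = ever_significant or significant_now
        wins_when_peeking += ever_significant  # would have stopped at some look
        wins_at_end += significant_now         # result at the planned sample size
    return wins_when_peeking / runs, wins_at_end / runs


peeking_rate, single_read_rate = simulate_aa_tests()
print(f"false wins when stopping at first significance: {peeking_rate:.1%}")
print(f"false wins when reading once at the end: {single_read_rate:.1%}")
```

The single-read rate should land near the nominal 5%, while the peeking rate comes out several times higher; exact values depend on the seed and the number of looks.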
What do confidence level and statistical power actually control?
Confidence level (typically 95%) controls how often you wrongly call a winner when there is no real difference — a false positive. Power (typically 80%) controls how often you detect a real effect when one genuinely exists — avoiding a false negative. Raising either one increases the sample size you need, because you are demanding stronger evidence.
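The sketch below shows how both settings feed into the sample size by deriving the Z values from the confidence and power inputs with Python's `statistics.NormalDist`. The function and parameter names are assumptions. Because it uses unrounded Z values, its result can differ from the page's 53,209 by a visitor or so.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, confidence=0.95, power=0.80):
    """Per-variant sample size with Z values derived from confidence and power."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2)


for confidence, power in [(0.95, 0.80), (0.99, 0.80), (0.95, 0.90)]:
    n = sample_size_per_variant(0.03, 0.10, confidence, power)
    print(f"confidence {confidence:.0%}, power {power:.0%}: {n:,} per variant")
```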
What if my required sample size is bigger than my traffic?
You have three levers. Increase the MDE so you are only trying to detect larger, more achievable lifts; relax power or confidence slightly if the decision is low-stakes; or run the test longer to accumulate the visitors. If none of those are realistic, the honest conclusion is that this page does not have enough traffic to A/B test reliably, and you should test higher up the funnel or on a higher-traffic surface.
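The first lever can be turned around: given the traffic you have and the longest you are willing to run, what is the smallest MDE the test could be sized for? The sketch below searches in 0.1-point steps using the page's formula. The function names, the step size, and the 1,000-visitors-per-day, 28-day scenario are all hypothetical.

```python
import math


def sample_size_per_variant(baseline, relative_mde, z_alpha=1.96, z_beta=0.8416):
    """Per-variant sample size from the formula on this page."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2)


def smallest_detectable_mde(baseline, daily_traffic, max_days, step=0.001):
    """Smallest relative MDE whose sample size fits the traffic budget, or None."""
    budget_per_variant = daily_traffic * max_days / 2  # two variants share the traffic
    steps = 1
    while baseline * (1 + steps * step) < 1:  # the variant rate must stay below 100%
        mde = steps * step
        if sample_size_per_variant(baseline, mde) <= budget_per_variant:
            return mde
        steps += 1
    return None


# Hypothetical page: 3% baseline, 1,000 visitors per day, four weeks at most.
mde = smallest_detectable_mde(0.03, daily_traffic=1_000, max_days=28)
print(f"smallest detectable relative lift: {mde:.1%}")
```

In this scenario the answer is a 20% relative lift. If a lift that large is not plausible for the change being tested, that is the signal to test on a higher-traffic surface instead.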