Free tool · Sample size

A/B Test Sample Size Calculator

A test is only trustworthy once it has collected enough visitors to tell a real effect apart from random noise. Enter your baseline conversion rate and the smallest lift worth catching, and this calculator returns the sample size you need in each variant.

The challenge

Most A/B tests are called long before they should be

The moment one variant edges ahead, the temptation is to ship it. But early leads are usually noise, and a test read before it has the numbers is a coin flip dressed up as a decision.

Peeking inflates false positives
Checking the dashboard daily and stopping the first time you see significance is one of the easiest ways to ship a losing variant. Every extra look is another chance for random noise to cross the significance threshold. Fix the sample size up front and read the result only once it is reached.
Low traffic, low power
On a page with modest traffic, a test that looks flat may simply be too small to detect the effect you care about. Underpowered tests miss real wins and leave you concluding "no difference" when there was one.
The effect size you choose drives everything
Detecting a tiny lift needs vastly more traffic than detecting a large one. Decide the smallest improvement that would actually change your decision, then size the test for that — not for a lift you have no realistic chance of catching.
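
To make the "Low traffic, low power" point concrete, here is a minimal sketch that estimates the power a test actually has at a given sample size. It assumes the same normal approximation as the calculator's formula and ignores the negligible chance of a significant result in the wrong direction. The function name `achieved_power` and the sample sizes in the loop are illustrative, not part of the calculator.

```python
import math
from statistics import NormalDist


def achieved_power(baseline, relative_mde, n_per_variant, z_alpha=1.96):
    """Approximate power of a two-sided two-proportion test.

    Assumption: normal approximation with unpooled variances, matching
    the sample size formula used on this page.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # variant rate after the relative lift
    standard_error = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)
    return NormalDist().cdf(abs(p2 - p1) / standard_error - z_alpha)


# 3% baseline, 10% relative lift, at three illustrative sample sizes
for n in (5_000, 20_000, 53_209):
    print(f"{n:>6} per variant -> power {achieved_power(0.03, 0.10, n):.0%}")
```

Under these assumptions, a test stopped at 5,000 visitors per variant has well under a one-in-five chance of detecting a real 10% lift, so a flat result at that size says very little.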

A/B test sample size

Find how many visitors per variant you need for a statistically valid test.

Baseline conversion rate

Your current conversion rate for the control — the page or flow you’re testing against.

Minimum detectable effect

The smallest relative lift worth detecting. A 10% MDE on a 3% baseline means catching a move to 3.3%.

Confidence level

How sure you want to be that a winner is real, not noise. 95% is the standard for most marketing tests.

Statistical power

The chance of detecting a real effect when one exists. 80% is conventional; raise it to miss fewer true wins.

Daily traffic (optional)

Total visitors per day entering the test, across all variants. Add it to estimate how long the test will run.

Example result (3% baseline, 10% relative MDE, 95% confidence, 80% power)

Sample size per variant: 53,209 visitors needed in each group

Total sample: 106,418 visitors, control + variant combined

Each group, control and variant alike, must collect this many visitors before you read the result at 95% confidence and 80% power.

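Two-sided test. n per group = (Zα + Zβ)² × [p₁(1−p₁) + p₂(1−p₂)] ÷ (p₂ − p₁)², where p₁ is the baseline rate, p₂ is the variant rate at the minimum detectable effect, Zα is the two-sided critical value for the confidence level, and Zβ is the normal quantile for the power.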

Definition

What is sample size?

Sample size is the number of visitors each variant in an A/B test needs before the result is statistically reliable. It is set in advance from four inputs: your baseline conversion rate, the minimum effect you want to detect, your confidence level, and your statistical power.

Calculating it up front turns testing from a guessing game into a plan. You know before launch how many visitors — and roughly how many days — the test will take, and you commit to reading the result only once that target is hit, which is what keeps the statistics honest.

Sample size per variant
n = (Zα + Zβ)² × [p₁(1−p₁) + p₂(1−p₂)] ÷ (p₂ − p₁)²
With a baseline p₁ = 3% (0.03), a 10% relative lift gives a variant rate p₂ = 0.033. At 95% confidence (Zα = 1.96) and 80% power (Zβ = 0.8416): n = (1.96 + 0.8416)² × [0.03×0.97 + 0.033×0.967] ÷ (0.003)² ≈ 53,208.1, which rounds up to 53,209 visitors per variant.
Test duration
Days = (n × 2) ÷ Daily traffic, rounded up to whole days
At 53,209 per variant, the total is 106,418 visitors. With 5,000 visitors per day entering the test: 106,418 ÷ 5,000 ≈ 21.3, which rounds up to 22 days.
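
The two formulas above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the calculator's own source: the function names are made up for the example, and the Z values are the rounded ones quoted in the worked example (1.96 and 0.8416).

```python
import math


def sample_size_per_variant(baseline, relative_mde, z_alpha=1.96, z_beta=0.8416):
    """Visitors needed in each variant for a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # variant rate after the relative lift
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)  # round up to a whole visitor


def duration_days(n_per_variant, daily_traffic, variants=2):
    """Whole days needed when daily_traffic is shared across all variants."""
    return math.ceil(n_per_variant * variants / daily_traffic)


n = sample_size_per_variant(0.03, 0.10)
days = duration_days(n, 5_000)
print(n, n * 2, days)  # prints: 53209 106418 22
```

Rounding up in both functions is deliberate: stopping a fraction of a visitor or a fraction of a day short would leave the test slightly under the planned sample.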

How to use it

How to use this sample size calculator

Decide the effect you care about first — the rest follows.

Enter your baseline conversion rate
Use the current rate of the control — the version you are testing against. Pull it from your analytics over a representative recent window, not a single unusual day.
Set the minimum detectable effect
This is the smallest relative lift that would actually change your decision. Be honest: a 1% MDE needs enormous traffic, while a 10–20% MDE is within reach of far more pages. Even then, a 10% MDE on a 3% baseline needs about 53,000 visitors per variant. Choose the threshold worth the test.
Lock in confidence and power, then commit
Leave confidence at 95% and power at 80% unless you have a reason to change them. Add your daily traffic to see the duration, then run the test to the full sample size before reading the result.

FAQ

Sample size questions

What is the minimum detectable effect (MDE)?

The MDE is the smallest difference between variants that your test is designed to catch, expressed as a relative lift over the baseline. A 10% MDE on a 3% baseline means the test is sized to reliably detect a move to 3.3% or better. Smaller MDEs require dramatically larger samples, so set it to the smallest improvement that would genuinely change what you do.
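
To see how steeply the sample grows as the MDE shrinks, the sketch below evaluates the page's formula at several relative lifts on a 3% baseline. The function name and the list of MDE values are illustrative choices, and the Z values are the rounded 95% / 80% figures used elsewhere on this page.

```python
import math


def sample_size_per_variant(baseline, relative_mde, z_alpha=1.96, z_beta=0.8416):
    """Visitors per variant from the two-proportion sample size formula."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2)


# Illustrative sweep: 1%, 5%, 10%, and 20% relative lifts on a 3% baseline
sizes = {mde: sample_size_per_variant(0.03, mde) for mde in (0.01, 0.05, 0.10, 0.20)}
for mde, n in sizes.items():
    print(f"{mde:.0%} MDE -> {n:,} visitors per variant")
```

Because the lift is squared in the denominator, halving the MDE roughly quadruples the required sample. On this baseline a 1% MDE runs into the millions of visitors per variant.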

Why can’t I just stop the test once it shows significance?

Because conversion data fluctuates, a test will frequently cross the significance line by chance early on and then drift back. Stopping the first time you see a winner — known as peeking — multiplies your false-positive rate well beyond the 5% you think you are running at. Fixing the sample size in advance and reading the result only once is what makes the p-value mean what it claims.
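
A small simulation can illustrate the effect. The sketch below runs A/A tests, where both groups share the same true conversion rate, and compares two policies: stopping at the first significant look versus reading the result once at the end. Every parameter here (10% rate, 20 looks, 100 visitors per group per look, 300 runs, the seed) is an arbitrary assumption chosen to keep the run fast, and the pooled z-test stands in for whatever test your tool uses.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two groups of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return 0.0  # no variation yet, so no evidence either way
    standard_error = math.sqrt(2 * pooled * (1 - pooled) / n)
    return (conv_b / n - conv_a / n) / standard_error


def false_positive_rate(peek, rate=0.10, looks=20, per_look=100, runs=300, seed=7):
    """Share of A/A tests (no real difference) that get declared a win."""
    rng = random.Random(seed)
    false_wins = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(per_look))
            conv_b += sum(rng.random() < rate for _ in range(per_look))
            n += per_look
            if abs(z_stat(conv_a, conv_b, n)) > 1.96:
                significant_at_any_look = True
        significant_at_end = abs(z_stat(conv_a, conv_b, n)) > 1.96
        false_wins += significant_at_any_look if peek else significant_at_end
    return false_wins / runs


peeking_rate = false_positive_rate(peek=True)
fixed_rate = false_positive_rate(peek=False)
print(f"stop at first significant look: {peeking_rate:.0%} false wins")
print(f"read once at full sample:       {fixed_rate:.0%} false wins")
```

Both policies see exactly the same simulated data because they share a seed, so any gap between the two rates comes from the stopping rule alone.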

What do confidence level and statistical power actually control?

Confidence level (typically 95%) sets your false-positive rate: at 95%, you will wrongly call a winner in about 5% of tests where there is no real difference. Power (typically 80%) sets how often you detect a real effect of at least the MDE when one exists; at 80%, you miss it 20% of the time, which is a false negative. Raising either one increases the sample size you need, because you are demanding stronger evidence.
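
As a rough illustration of how these two settings enter the formula, the sketch below converts confidence and power into Zα and Zβ using Python's standard-library `statistics.NormalDist`, then prints the multiplier (Zα + Zβ)² that scales the sample size. The helper name `z_values` and the combinations in the loop are assumptions made for the example.

```python
from statistics import NormalDist


def z_values(confidence=0.95, power=0.80):
    """Return (Z_alpha, Z_beta) for a two-sided test."""
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    return z_alpha, z_beta


multipliers = []
for confidence, power in [(0.90, 0.80), (0.95, 0.80), (0.95, 0.90), (0.99, 0.90)]:
    z_alpha, z_beta = z_values(confidence, power)
    multiplier = (z_alpha + z_beta) ** 2
    multipliers.append(multiplier)
    print(f"confidence {confidence:.0%}, power {power:.0%}: "
          f"Z_alpha={z_alpha:.3f}, Z_beta={z_beta:.3f}, multiplier={multiplier:.2f}")
```

Everything else in the formula stays fixed, so the required sample grows in direct proportion to this multiplier.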

What if my required sample size is bigger than my traffic?

You have three levers: increase the MDE so you are only trying to detect larger, more achievable lifts; relax power or confidence slightly if the decision is low-stakes; or run the test longer to accumulate the visitors. If none of those is realistic, the honest conclusion is that this page does not have enough traffic to A/B test reliably, and you should test higher up the funnel or on a higher-traffic surface.
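
One way to explore the first lever is to turn the question around: given the traffic and time you have, what is the smallest lift the test could detect? The sketch below answers that with a simple bisection over the page's formula. The function names, the 28-day limit, and the 1,000 visitors per day are hypothetical, and the search assumes a baseline of 50% or less so the variant rate stays valid.

```python
import math


def sample_size_per_variant(baseline, relative_mde, z_alpha=1.96, z_beta=0.8416):
    """Visitors per variant from the two-proportion sample size formula."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2)


def smallest_detectable_mde(baseline, daily_traffic, max_days, variants=2):
    """Smallest relative lift whose required sample fits the visitor budget.

    Returns None if even a 100% relative lift does not fit.
    """
    budget = daily_traffic * max_days / variants  # visitors available per variant
    low, high = 1e-4, 1.0  # search between a 0.01% and a 100% relative lift
    if sample_size_per_variant(baseline, high) > budget:
        return None
    for _ in range(60):  # bisection: required sample shrinks as the MDE grows
        mid = (low + high) / 2
        if sample_size_per_variant(baseline, mid) > budget:
            low = mid
        else:
            high = mid
    return high


mde = smallest_detectable_mde(baseline=0.03, daily_traffic=1_000, max_days=28)
print(f"Smallest detectable relative lift: {mde:.1%}")
```

If the answer is a lift larger than anything your change could plausibly produce, that is a sign to move the test to a higher-traffic surface.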

One metric is a number — Multiply connects them all

When every campaign metric, brief, and account signal lives in one AI operating system, you stop calculating in spreadsheets and start acting on the full picture. That's Multiply.


