You're here when: You have a product change ready to go. Testing means delay. Shipping blind feels risky. You need to decide whether the experiment is worth the wait.
The Heuristic
A/B testing is a tool, not a religion. It's the right call when the stakes are high, the change is reversible, and you have enough traffic to learn something real.
- Is the change easily reversible? If you can roll it back in minutes, the downside of shipping without a test is low. Test when rollback is expensive or impossible.
- Do you have enough traffic? You need roughly 1,000 conversions per variant to detect meaningful differences. Below that, the test either takes months or produces noise you'll misread as signal.
- Is the cost of being wrong high? Changes to pricing, checkout flows, or core activation paths deserve tests. A new icon on a settings page does not.
Decision Tree
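The three questions from The Heuristic can be sketched as a single decision function. This is an illustrative encoding, not a prescription; the function name, inputs, and return strings are invented here, and the 1,000-conversions-per-variant threshold comes from the rule of thumb above.

```python
# A minimal sketch of the decision logic above. Inputs map to the three
# heuristic questions; only the 1,000-conversion threshold is from the article.

def should_ab_test(reversible: bool,
                   conversions_per_variant: int,
                   high_stakes: bool) -> str:
    """Return a recommendation for a proposed product change."""
    if not high_stakes and reversible:
        # Low cost of being wrong plus cheap rollback: shipping beats waiting.
        return "ship with monitoring and a rollback plan"
    if conversions_per_variant < 1_000:
        # Not enough traffic to reach significance in a reasonable time.
        return "ship with monitoring; a test would be noise"
    return "run the A/B test"

# A pricing change (high stakes, plenty of traffic) clears both bars:
print(should_ab_test(reversible=False, conversions_per_variant=50_000,
                     high_stakes=True))  # -> run the A/B test
```

The ordering matters: traffic is checked only after stakes, because a high-stakes change with thin traffic still shouldn't be "tested" into false confidence.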
Traffic Thresholds
The number one mistake is running a test without enough traffic. The math is unforgiving: detecting a realistic lift at standard significance levels takes on the order of a thousand conversions per variant. Below that, an A/B test will either take months or give you a false positive that feels like insight but is actually noise.
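Where does a threshold like that come from? It falls out of the standard two-proportion power calculation. The sketch below is a generic normal-approximation version (not from the article), assuming 5% two-sided significance and 80% power; the baseline rate and lift are illustrative.

```python
import math

def sample_size_per_variant(baseline: float, mde_abs: float) -> int:
    """Approximate users needed per variant to detect an absolute lift of
    mde_abs over a baseline conversion rate, via the two-proportion
    normal approximation (alpha=0.05 two-sided, power=0.80)."""
    z_alpha = 1.96  # critical value for two-sided alpha = 0.05
    z_beta = 0.84   # critical value for 80% power
    p2 = baseline + mde_abs
    p_bar = (baseline + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(baseline * (1 - baseline) + p2 * (1 - p2))) ** 2
         ) / mde_abs ** 2
    return math.ceil(n)

# Detecting a 5% -> 6% move (a 20% relative lift) needs roughly 8,000+
# users per variant -- which is why small changes on thin traffic stall out.
print(sample_size_per_variant(0.05, 0.01))
```

Note the quadratic penalty: halving the detectable lift quadruples the required sample, so "we'll just run it longer" gets expensive fast.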
Quick Example
Netflix runs thousands of A/B tests annually because they have hundreds of millions of users and changes to the recommendation algorithm directly affect retention. When they shifted from star ratings to thumbs up/down, they tested it rigorously and saw 200% more ratings from users. The volume justified the test. A ten-person startup with 200 daily users running the same test would wait months for inconclusive results and learn nothing.
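The startup half of that example can be checked on the back of an envelope. The 200 daily users figure is from the example above; the 5% conversion rate is an assumption made here for illustration, and the 1,000-conversions-per-variant target is the article's rule of thumb.

```python
# Back-of-the-envelope test duration for the hypothetical startup.
daily_users = 200
conversion_rate = 0.05        # assumed for illustration; not from the article
needed_per_variant = 1_000    # the article's rule of thumb
variants = 2

conversions_per_day = daily_users * conversion_rate   # 10 conversions/day total
days = variants * needed_per_variant / conversions_per_day
print(f"{days:.0f} days")  # -> 200 days, i.e. over six months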
The 70% Rule
Most decisions should be made with about 70% of the information you wish you had. If you wait for 90%, you're too slow. An A/B test is a way of getting from 70% to 90% certainty, but that extra certainty has a cost: time. If you're good at course correcting, being wrong is less expensive than being slow. For reversible changes, the cost of running a test often exceeds the cost of just shipping, watching the metrics, and rolling back if needed.
The sweet spot: reserve A/B tests for the 20% of decisions where the stakes justify the delay. Ship the other 80% with monitoring and a rollback plan.
The Anti-Pattern
Testing Theater. Running A/B tests on everything because it feels scientific, even when traffic is too low for significance. The pattern looks like this:
- Declare a winner after three days with 47 conversions per variant
- Claim a "15% lift"
- Ship confidently based on random noise dressed up as data
If you don't have the volume, just ship it and watch the metrics.
Written with ❤️ by a human (still)