You're here when: You have a product change ready to go. Testing means delay. Shipping blind feels risky. You need to decide whether the experiment is worth the wait.
The Heuristic
A/B testing is a tool, not a religion. It's the right call when the stakes are high, the change is reversible, and you have enough traffic to learn something real.
- Is the change easily reversible? If you can roll it back in minutes, the downside of shipping without a test is low. Test when rollback is expensive or impossible.
- Do you have enough traffic? You need roughly 1,000 conversions per variant to detect meaningful differences. Below that, the test either takes months or produces noise you'll misread as signal.
- Is the cost of being wrong high? Changes to pricing, checkout flows, or core activation paths deserve tests. A new icon on a settings page does not.
Decision Tree
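The three questions from The Heuristic can be sketched as a single decision function. This is an illustrative encoding, not a prescription; the function name, inputs, and return strings are invented here, and the 1,000-conversions-per-variant threshold comes from the rule of thumb above.

```python
# A minimal sketch of the decision logic above. Inputs map to the three
# heuristic questions; only the 1,000-conversion threshold is from the article.

def should_ab_test(reversible: bool,
                   conversions_per_variant: int,
                   high_stakes: bool) -> str:
    """Return a recommendation for a proposed product change."""
    if not high_stakes and reversible:
        # Low cost of being wrong plus cheap rollback: shipping beats waiting.
        return "ship with monitoring and a rollback plan"
    if conversions_per_variant < 1_000:
        # Not enough traffic to reach significance in a reasonable time.
        return "ship with monitoring; a test would be noise"
    return "run the A/B test"

# A pricing change (high stakes, plenty of traffic) clears both bars:
print(should_ab_test(reversible=False, conversions_per_variant=50_000,
                     high_stakes=True))  # -> run the A/B test
```

The ordering matters: traffic is checked only after stakes, because a high-stakes change with thin traffic still shouldn't be "tested" into false confidence.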
Traffic Thresholds
The number one mistake is running a test without enough traffic. The math is unforgiving: detecting a realistic lift at standard significance levels takes on the order of a thousand conversions per variant. Below that, an A/B test will either take months or give you a false positive that feels like insight but is actually noise.
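Where does a threshold like that come from? It falls out of the standard two-proportion power calculation. The sketch below is a generic normal-approximation version (not from the article), assuming 5% two-sided significance and 80% power; the baseline rate and lift are illustrative.

```python
import math

def sample_size_per_variant(baseline: float, mde_abs: float) -> int:
    """Approximate users needed per variant to detect an absolute lift of
    mde_abs over a baseline conversion rate, via the two-proportion
    normal approximation (alpha=0.05 two-sided, power=0.80)."""
    z_alpha = 1.96  # critical value for two-sided alpha = 0.05
    z_beta = 0.84   # critical value for 80% power
    p2 = baseline + mde_abs
    p_bar = (baseline + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(baseline * (1 - baseline) + p2 * (1 - p2))) ** 2
         ) / mde_abs ** 2
    return math.ceil(n)

# Detecting a 5% -> 6% move (a 20% relative lift) needs roughly 8,000+
# users per variant -- which is why small changes on thin traffic stall out.
print(sample_size_per_variant(0.05, 0.01))
```

Note the quadratic penalty: halving the detectable lift quadruples the required sample, so "we'll just run it longer" gets expensive fast.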
Quick Example
Netflix runs thousands of A/B tests annually because they have hundreds of millions of users and changes to the recommendation algorithm directly affect retention. When they shifted from star ratings to thumbs up/down, they tested it rigorously and saw 200% more ratings from users. The volume justified the test. A ten-person startup with 200 daily users running the same test would wait months for inconclusive results and learn nothing.
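The startup half of that example can be checked on the back of an envelope. The 200 daily users figure is from the example above; the 5% conversion rate is an assumption made here for illustration, and the 1,000-conversions-per-variant target is the article's rule of thumb.

```python
# Back-of-the-envelope test duration for the hypothetical startup.
daily_users = 200
conversion_rate = 0.05        # assumed for illustration; not from the article
needed_per_variant = 1_000    # the article's rule of thumb
variants = 2

conversions_per_day = daily_users * conversion_rate   # 10 conversions/day total
days = variants * needed_per_variant / conversions_per_day
print(f"{days:.0f} days")  # -> 200 days, i.e. over six months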
The 70% Rule
Most decisions should be made with about 70% of the information you wish you had. If you wait for 90%, you're too slow. An A/B test is a way of getting from 70% to 90% certainty, but that extra certainty has a cost: time. If you're good at course correcting, being wrong is less expensive than being slow. For reversible changes, the cost of running a test often exceeds the cost of just shipping, watching the metrics, and rolling back if needed.
The sweet spot: reserve A/B tests for the 20% of decisions where the stakes justify the delay. Ship the other 80% with monitoring and a rollback plan.
The Anti-Pattern
Testing Theater. Running A/B tests on everything because it feels scientific, even when traffic is too low for significance. The pattern looks like this:
- Declare a winner after three days with 47 conversions per variant
- Claim a "15% lift"
- Ship confidently based on random noise dressed up as data
If you don't have the volume, just ship it and watch the metrics.
Written with ❤️ by a human (still)