You're here when: You have a product change you believe in. Testing costs time. Shipping blind feels risky. You need to decide whether the experiment is worth the delay.
The Heuristic
A/B testing is a tool, not a religion. It's the right call when the stakes are high, the change is reversible, and you have enough traffic to learn something real.
- Is the change easily reversible? If you can roll it back in minutes, the downside of shipping without a test is low. Test when rollback is expensive or impossible.
- Do you have enough traffic for statistical significance? Most teams need 1,000+ conversions per variant to detect meaningful differences. If you don't have the volume, the test will either take months or produce noise you'll misread as signal.
- Are the stakes high? Changes to pricing, checkout flows, or core activation paths deserve tests. A new icon on a settings page does not.
- Is this a Type 1 or Type 2 decision? Type 1 decisions are irreversible (pricing tiers, platform migrations). Type 2 are reversible (UI changes, copy tweaks). Test Type 1. Ship Type 2.
Decision Tree
Quick Example
Netflix runs thousands of A/B tests annually because they have hundreds of millions of users and changes to the recommendation algorithm directly affect retention. But when Netflix decided to shift from star ratings to thumbs up/down, they tested it rigorously: the result was 200% more ratings from users. The volume justified the test. A ten-person startup with 200 daily users running the same test would wait months for inconclusive results and learn nothing.
The 70% Rule
Most decisions should be made with about 70% of the information you wish you had. If you wait for 90%, you're too slow. This applies directly to the "test or ship" question. An A/B test is a way of getting from 70% to 90% certainty, but that extra certainty has a cost: time. If you're good at course correcting, being wrong is less expensive than being slow. For reversible changes (Type 2 decisions, or two-way doors), the cost of running a test often exceeds the cost of just shipping, watching the metrics, and rolling back if needed.
The Anti-Pattern
The Testing Theater. Running A/B tests on everything because it feels scientific, even when traffic is too low for significance. The pattern looks like this:
- Declare a winner after three days with 47 conversions per variant
- Claim a "15% lift"
- Ship confidently based on random noise dressed up as data
If you don't have the volume, just ship it and watch the metrics.
Written with ❤️ by a human (still)