Should I A/B test this or just ship it?

You're here when: You have a product change you believe in. Testing costs time. Shipping blind feels risky. You need to decide whether the experiment is worth the delay.

The Heuristic

A/B testing is a tool, not a religion. It's the right call when the stakes are high, the change is hard to reverse, and you have enough traffic to learn something real.

Is the change easily reversible? If you can roll it back in minutes, the downside of shipping without a test is low. Test when rollback is expensive or impossible.
Do you have enough traffic for statistical significance? As a rough rule of thumb, you need 1,000+ conversions per variant to detect modest differences; the exact number depends on your baseline conversion rate and the smallest lift you care about. If you don't have the volume, the test will either take months or produce noise you'll misread as signal.
Are the stakes high? Changes to pricing, checkout flows, or core activation paths deserve tests. A new icon on a settings page does not.
Is this a Type 1 or Type 2 decision? Type 1 decisions are irreversible one-way doors (pricing tiers, platform migrations). Type 2 decisions are reversible two-way doors (UI changes, copy tweaks). Test Type 1. Ship Type 2.
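
To see where a figure like "1,000+ conversions per variant" comes from, here is a minimal sketch of a common normal-approximation sample-size formula for comparing two conversion rates. The function name visitors_per_variant and the example inputs (5% baseline, 10% relative lift, alpha 0.05, 80% power) are hypothetical choices for illustration, not numbers from any particular team.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH variant for a two-sided
    two-proportion test (normal approximation, no continuity correction)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # the rate we hope to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Hypothetical inputs: 5% baseline conversion, hoping to detect a 10% relative lift.
n = visitors_per_variant(0.05, 0.10)
print(n, "visitors per variant, about", round(n * 0.05), "control conversions")
```

With these assumed inputs the answer works out to roughly 31,000 visitors, or about 1,500 conversions, per variant. Because n scales with 1/(p2 − p1)², chasing a smaller lift drives the required sample up sharply.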

Decision Tree


Quick Example

A massive consumer product can justify rigorous A/B testing on core experience changes because the volume is there and the stakes are high. A ten-person startup with 200 daily users running the same test would wait months for inconclusive results and learn nothing. The point isn't that testing is bad. It's that testing without enough signal is theater.
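
The arithmetic behind "wait months" is easy to sketch. This assumes a hypothetical 5% conversion rate, an even 50/50 traffic split, and the 1,000-conversions-per-variant rule of thumb; days_to_reach is an illustrative helper, not a standard function.

```python
import math


def days_to_reach(conversions_needed: int, daily_users: int,
                  conversion_rate: float, variants: int = 2) -> int:
    """Days until EACH variant collects `conversions_needed` conversions,
    assuming traffic is split evenly and the conversion rate stays constant."""
    conversions_per_variant_per_day = daily_users / variants * conversion_rate
    return math.ceil(conversions_needed / conversions_per_variant_per_day)


# Hypothetical 5% conversion rate, 1,000 conversions needed per variant.
print(days_to_reach(1000, daily_users=200, conversion_rate=0.05))        # 200
print(days_to_reach(1000, daily_users=1_000_000, conversion_rate=0.05))  # 1
```

Under these assumptions the startup needs about 200 days, while the large product gets there in a day.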

The 70% Rule

Most decisions should be made with about 70% of the information you wish you had. If you wait for 90%, you're too slow. This applies directly to the "test or ship" question. An A/B test is a way of getting from 70% to 90% certainty, but that extra certainty has a cost: time. If you're good at course-correcting, being wrong is less expensive than being slow. For reversible changes (Type 2 decisions, or two-way doors), the cost of running a test often exceeds the cost of just shipping, watching the metrics, and rolling back if needed.
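
One way to make "being wrong is less expensive than being slow" concrete is a toy regret model, measured against an oracle that ships good changes immediately and never ships bad ones. Everything here is a hypothetical assumption: the function names, the 50/50 split, the premise that the test always picks correctly, and the example numbers.

```python
def regret_ship_now(p_bad: float, daily_harm: float, days_to_rollback: float) -> float:
    # Ship to everyone at once; a bad change (probability p_bad) hurts
    # every day until it is noticed and rolled back.
    return p_bad * daily_harm * days_to_rollback


def regret_test_first(p_bad: float, daily_gain: float, daily_harm: float,
                      test_days: float) -> float:
    # During a 50/50 test, half the traffic misses a good change (delay cost)
    # and half is exposed to a bad one. Assumes the test picks the right answer.
    delay_cost = (1 - p_bad) * daily_gain * test_days * 0.5
    exposure_cost = p_bad * daily_harm * test_days * 0.5
    return delay_cost + exposure_cost


# Hypothetical: 30% chance the change is bad, gain/harm of 100 units per day,
# a four-week test, and a rollback that takes two days.
print(regret_ship_now(0.3, daily_harm=100, days_to_rollback=2))
print(regret_test_first(0.3, daily_gain=100, daily_harm=100, test_days=28))
```

With these made-up numbers, shipping now costs about 60 units versus about 1,400 for testing first. Set days_to_rollback to 365 to mimic a one-way door and shipping blind costs about 10,950, so the test wins.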

The Anti-Pattern

The Testing Theater. Running A/B tests on everything because it feels scientific, even when traffic is too low for significance. The pattern looks like this:

Declare a winner after three days with 47 conversions per variant
Claim a "15% lift"
Ship confidently based on random noise dressed up as data
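
A standard two-proportion z-test (normal approximation, pooled standard error) shows why that "15% lift" is noise. The 1,000 visitors per variant and the 47 vs. 54 split are hypothetical numbers chosen to produce roughly a 15% relative lift; the function name is illustrative.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided two-proportion z-test using the pooled standard error.
    Returns (relative lift of B over A, z statistic, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    relative_lift = (p_b - p_a) / p_a
    return relative_lift, z, p_value


# Hypothetical: 1,000 visitors per variant, 47 vs. 54 conversions.
lift, z, p = two_proportion_z_test(47, 1000, 54, 1000)
print(f"lift={lift:.1%}  z={z:.2f}  p={p:.2f}")
```

The lift looks like about 15%, but the p-value lands around 0.47, nowhere near the conventional 0.05 threshold.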

If you don't have the volume, just ship it and watch the metrics.


Written with ❤️ by a human (still)