🧱 argil.io

Should I A/B test this or just ship it?

2 min read
Last updated March 30, 2026

You're here when: You have a product change ready to go. Testing means delay. Shipping blind feels risky. You need to decide whether the experiment is worth the wait.

The Heuristic

A/B testing is a tool, not a religion. It's the right call when the stakes are high, the change is reversible, and you have enough traffic to learn something real.

  • Is the change easily reversible? If you can roll it back in minutes, the downside of shipping without a test is low. Test when rollback is expensive or impossible.
  • Do you have enough traffic? You need roughly 1,000 conversions per variant to detect meaningful differences. Below that, the test either takes months or produces noise you'll misread as signal.
  • Is the cost of being wrong high? Changes to pricing, checkout flows, or core activation paths deserve tests. A new icon on a settings page does not.

Decision Tree

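The three questions above collapse into a simple decision function. This is a minimal sketch of the article's heuristic (the function name and string return values are mine, not from the original):

```python
def should_ab_test(reversible: bool, enough_traffic: bool, high_stakes: bool) -> str:
    """Decide between running an A/B test and shipping with monitoring."""
    if not enough_traffic:
        return "ship"   # a test below the traffic threshold is noise, not data
    if high_stakes or not reversible:
        return "test"   # expensive to be wrong, or hard to undo
    return "ship"       # cheap to reverse: ship, watch the metrics, roll back if needed
```

Note the ordering: traffic is checked first, because without it even a high-stakes test can't produce a trustworthy answer.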

Traffic Thresholds

The number one mistake is running a test without enough traffic to ever reach statistical significance.

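You can work out the threshold for your own funnel with a standard two-proportion sample-size formula. This sketch assumes a two-sided z-test at α = 0.05 with 80% power; the article doesn't state its assumptions, so these defaults are mine:

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variant(baseline: float, rel_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant to detect a relative lift over a baseline
    conversion rate, using the standard two-proportion z-test approximation."""
    delta = baseline * rel_lift                      # absolute difference to detect
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    p_bar = baseline + delta / 2                     # pooled conversion rate
    return ceil(2 * z**2 * p_bar * (1 - p_bar) / delta**2)

# Example: 5% baseline conversion, detecting a 10% relative lift
# needs ~31,000 visitors per variant -- about 1,500 conversions each,
# the same ballpark as the rule of thumb above.
n = visitors_per_variant(0.05, 0.10)
```

Smaller lifts or lower baselines push the requirement up fast, since the required sample scales with 1/delta².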

If you're below the roughly 1,000-conversions-per-variant threshold, an A/B test will either take months or give you a false positive that feels like insight but is actually noise.

Quick Example

Netflix runs thousands of A/B tests annually because they have hundreds of millions of users and changes to the recommendation algorithm directly affect retention. When they shifted from star ratings to thumbs up/down, they tested it rigorously and saw 200% more ratings from users. The volume justified the test. A ten-person startup with 200 daily users running the same test would wait months for inconclusive results and learn nothing.

The 70% Rule

Most decisions should be made with about 70% of the information you wish you had. If you wait for 90%, you're too slow. An A/B test is a way of getting from 70% to 90% certainty, but that extra certainty has a cost: time. If you're good at course correcting, being wrong is less expensive than being slow. For reversible changes, the cost of running a test often exceeds the cost of just shipping, watching the metrics, and rolling back if needed.

The sweet spot: reserve A/B tests for the 20% of decisions where the stakes justify the delay. Ship the other 80% with monitoring and a rollback plan.
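The trade-off above can be framed as a toy expected-cost comparison. The model and all the numbers here are hypothetical illustrations, not figures from the article:

```python
def expected_cost(p_wrong: float, cost_if_wrong: float,
                  test_delay_weeks: float, value_per_week: float) -> dict:
    """Toy model: shipping risks being wrong; testing costs delayed value."""
    return {
        "ship": p_wrong * cost_if_wrong,            # expected cost of being wrong
        "test": test_delay_weeks * value_per_week,  # cost of waiting for the test
    }

# Hypothetical numbers: a reversible change with a 30% chance of being wrong
# and $2k to detect and roll back, vs. 4 weeks of testing that delays $5k/week
# of value. Under these assumptions, shipping is far cheaper than testing.
costs = expected_cost(0.30, 2_000, 4, 5_000)
```

The point of the sketch is the structure, not the numbers: for reversible changes, `cost_if_wrong` is bounded by your rollback speed, which is what makes shipping the cheaper option so often.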

The Anti-Pattern

Testing Theater. Running A/B tests on everything because it feels scientific, even when traffic is too low for significance. The pattern looks like this:

  • Declare a winner after three days with 47 conversions per variant
  • Claim a "15% lift"
  • Ship confidently based on random noise dressed up as data

If you don't have the volume, just ship it and watch the metrics.
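You can see testing theater's failure mode directly with an A/A simulation: give both variants the identical true conversion rate and measure how often noise alone produces a "15% lift." The simulation parameters below are mine, chosen to roughly match the 47-conversions scenario (about 940 visitors per variant at a 5% conversion rate):

```python
import random

def aa_test_big_lift_rate(n_visitors: int, p: float, runs: int,
                          seed: int = 0) -> float:
    """Fraction of A/A tests (both variants share the same true rate p)
    whose measured relative 'lift' is at least 15% in either direction."""
    rng = random.Random(seed)
    big = 0
    for _ in range(runs):
        a = sum(rng.random() < p for _ in range(n_visitors))
        b = sum(rng.random() < p for _ in range(n_visitors))
        if a and abs(b - a) / a >= 0.15:
            big += 1
    return big / runs

# At ~47 expected conversions per variant, a large fraction of A/A runs
# show a >= 15% "lift" from pure noise -- with zero real difference.
rate = aa_test_big_lift_rate(n_visitors=940, p=0.05, runs=2_000)
```

If a test with no real effect regularly produces your claimed lift, the lift was never evidence of anything.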


Written with ❤️ by a human (still)