A/B testing (sometimes also called split testing or bucket testing) is the practice of comparing two or more variations of a product to determine which one performs better. The variations are offered to different segments of your customer base at the same time. Generally, one of those variations is called the “Control” and shows the original experience, without the new changes. Once enough data has been collected, statistical analysis is used to determine which variation performs better for a given conversion goal. The variation with the highest conversion rate wins, provided the difference is statistically significant.

A/B testing can be applied to any product, whether we’re talking about websites, mobile apps, logo designs or fruit baskets. The IT industry, however, conducts the bulk of A/B testing, with big players like Google, Facebook, Amazon, LinkedIn, Netflix and others using it extensively and constantly. Many other companies are following suit in a quest for better conversion rates.

How A/B Testing works

To establish a common vocabulary, we’ll refer to A/B tests as experiments and to their variations as variants in the rest of the article. A variant is either the “Control” (the original experience) or a “Treatment” (a new experience).

The first thing you need to do before you A/B test anything is determine the outcome you want. Do you want an increase in overall traffic, in net new customers, or in return rates for existing customers? Or do you simply want to increase conversion rates? Determining the desired outcome is of the utmost importance; otherwise you won’t really know when to stop the experiment and evaluate the data you managed to collect.

Speaking of data collection, in order to evaluate the trend of your A/B test variants, you need to have an adequate tracking system in place so that you can understand how your users are interacting with the app. Depending on the experiment, you may want to track things like clicks, pageviews, initiated purchases, completed purchases, abandoned purchases and so on. This data, besides giving you relevant info on how your experiment is behaving, might also provide insight into other experiments you may want to run to further improve other parts of your app.
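As a rough illustration of what such tracking data can feed into, here is a minimal sketch that computes a per-variant conversion rate from a list of tracked events. The event tuple layout, the event names and the `conversion_rates` helper are all assumptions made for this example, not the format of any particular analytics tool.

```python
from collections import defaultdict

# Hypothetical event log: (user_id, variant, event_name).
events = [
    ("u1", "control", "pageview"),
    ("u1", "control", "completed_purchase"),
    ("u2", "control", "pageview"),
    ("u3", "treatment", "pageview"),
    ("u3", "treatment", "initiated_purchase"),
    ("u3", "treatment", "completed_purchase"),
    ("u4", "treatment", "pageview"),
    ("u4", "treatment", "abandoned_purchase"),
]


def conversion_rates(events, goal="completed_purchase"):
    """Return {variant: users who reached the goal / users seen in that variant}."""
    exposed = defaultdict(set)    # every user who produced any event in a variant
    converted = defaultdict(set)  # users who produced the goal event
    for user_id, variant, event_name in events:
        exposed[variant].add(user_id)
        if event_name == goal:
            converted[variant].add(user_id)
    return {v: len(converted[v]) / len(users) for v, users in exposed.items()}


rates = conversion_rates(events)
```

Counting unique users rather than raw events keeps one very active user from skewing the rate. Changing `goal` lets the same log answer different questions, for example how many users initiated a purchase.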

You will also need to decide how long the experiment should run before you compile your data and evaluate the results. This amount of time can vary greatly depending on factors like customer base size, feature complexity, how different the variants are, the conversion delta you are trying to detect and so on. If you A/B test a font change on your front page, you could get away with running the experiment for as little as a couple of days, while a complete overhaul of your checkout flow might need to run for a couple of months or even a full quarter before you have enough relevant data.
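One common way to turn these factors into a number is a sample size estimate for comparing two conversion rates. The sketch below uses the standard normal-approximation formula. The 5% significance level, 80% power, and the function names are assumptions chosen for illustration; a real experiment may call for different settings.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, expected, alpha=0.05, power=0.8):
    """Approximate users needed per variant to detect baseline -> expected.

    Uses the normal approximation for a two-sided test of two proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = baseline * (1 - baseline) + expected * (1 - expected)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (expected - baseline) ** 2)


def days_needed(n_per_variant, daily_users, variants=2):
    """Days to collect enough users, assuming traffic is split evenly across variants."""
    return math.ceil(n_per_variant * variants / daily_users)


# Example: detect a lift from a 10% to a 12% conversion rate.
n = sample_size_per_variant(0.10, 0.12)
days = days_needed(n, daily_users=1000)
```

The smaller the difference you want to detect, the more users you need, which is why subtle changes on low-traffic pages can take a long time to evaluate.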

Another thing to factor in is the rollout percentage of the experiment. If you’re A/B testing a risky change that could break the app, or an aggressive rebranding whose reception you can’t predict, you may not want to expose your entire customer base to it. In this case, it’s generally best to do a staged rollout. Start with 1% of your customers, let that run for a few days, then increase to 5%, then 10%, then 25% and so on. Sometimes it’s better to be more conservative, especially with complex changes that touch a lot of code. A/B testing provides both the mechanism and the safety net: if something goes wrong, you can roll the change back before most of your users ever notice.
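A staged rollout is often implemented by hashing each user into a stable bucket and comparing that bucket to the current rollout percentage. The following is a minimal sketch of that idea. The function names, the 10,000-bucket resolution and the choice of SHA-256 are assumptions for the example, and users outside the rollout are simply treated as control.

```python
import hashlib

BUCKETS = 10_000  # resolution of 0.01 percentage points


def bucket(user_id, experiment):
    """Map a user to a stable bucket in [0, BUCKETS) for a given experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % BUCKETS


def assign_variant(user_id, experiment, rollout_percent):
    """Return 'treatment' for users inside the rollout, 'control' otherwise."""
    cutoff = rollout_percent / 100 * BUCKETS
    return "treatment" if bucket(user_id, experiment) < cutoff else "control"


# The same user always lands in the same bucket, so raising the percentage
# from 1 to 5 keeps everyone who already had the treatment in the treatment.
variant_at_1 = assign_variant("user-42", "new-checkout", 1)
variant_at_5 = assign_variant("user-42", "new-checkout", 5)
```

Including the experiment name in the hash key means a user who falls into the first 1% of one experiment is not automatically in the first 1% of every other experiment. Setting `rollout_percent` to 0 acts as the kill switch.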

Finally, at the end of your evaluation period, it’s time to evaluate your results and declare a winner. Sometimes a treatment variant performs better than the control, in which case it is integrated into the code (it basically becomes the new standard, or control, for any future experiments). Sometimes the treatment variants do not perform better, in which case they get pulled from the code and the control experience is left in place.
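To decide whether a difference in conversion rates is more than noise, one widely used option is a two-proportion z-test. The sketch below is one way to run it with the Python standard library. The function name and the sample numbers are made up for illustration, and the normal approximation assumes reasonably large sample sizes.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_* are converted users, n_* are total users per variant.
    Returns (z, p_value); a positive z means variant B converted better.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: control converts 200 of 2000, treatment 260 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
is_significant = p_value < 0.05  # 0.05 is a conventional, not mandatory, threshold
```

A low p-value suggests the observed difference would be unlikely if both variants truly converted at the same rate. It does not say how large the real improvement is, so it is worth looking at the rates themselves as well.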

Downsides of A/B testing

Like any tool, A/B testing should be used when needed and when appropriate. There are cases where A/B testing is not recommended:

When being first is more important than being optimized
Our client was the first to offer Apple Pay when it was introduced back in 2014. While we did have some lead time and we implemented an A/B test to control this feature, the fact that we were showcased by Apple made it impossible to do a staged rollout. We could turn off the feature if we really needed to, but for the most part, we had to release it and just hope for the best.
When you are not certain of your hypothesis (though, ideally, this shouldn’t ever be the case, with sufficient planning beforehand)
When you don’t have enough users (and you’re unable to get a reliable result from running an A/B test)
When what you’re testing contradicts your brand or design principles (this could mean any number of things — from rebranding to expanding or contracting the inventory you’re offering, to changing button sizes and so on)

Conclusion

A/B testing, while it may be difficult to implement effectively, is an incredibly powerful tool that can really help drive your business. It doesn’t come without its fair share of challenges and risks, but knowing how, when and for how long to use it can make your good app great.