Why you should A/B test your app

Google changed the blue colour of their links and increased their ad revenue by $200m¹.

How did they find out that changing the link colour was the right change to make?

The answer was to try lots of different colours and see whether any of them changed the click-through rate of their ads. They ran a series of experiments, showing one colour (A) to one group of users and another colour (B) to a different group, and measured whether the behaviour of the two groups differed. This testing approach is called A/B Testing.

A/B Testing lets you learn what works and what doesn’t with your users. You can use it on your product or service to check whether a change you want to make will actually produce the outcome you hope for.

Feature Toggling / Feature Flags

A/B testing relies on a technical design approach called Feature Toggling or Feature Flags. A flag is a value that can be true or false and is used to signal to a program which path, or branch, of the code to execute.

Feature Toggles are switches built into your application that let you turn a feature on or off based on some rules that you define. These rules can live in configuration files or be provided from a backend service API.
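The Python sketch below shows a toggle evaluated against rules. It assumes the rules arrive as a plain dictionary, which stands in for a configuration file or a service API response. The flag name `new_checkout` and the `platforms` rule are hypothetical and do not belong to any particular product.

```python
from typing import Any

# Hypothetical rules; in practice these might be loaded from a configuration
# file or fetched from a Feature Toggle service API.
RULES: dict[str, dict[str, Any]] = {
    "new_checkout": {"enabled": True, "platforms": {"ios", "android"}},
}

def is_enabled(flag: str, user: dict[str, Any], rules: dict[str, dict[str, Any]] = RULES) -> bool:
    """Return True if `flag` is switched on for this user under `rules`."""
    rule = rules.get(flag)
    if rule is None or not rule.get("enabled", False):
        return False  # unknown or globally disabled flags default to off
    platforms = rule.get("platforms")
    # No platform restriction means the flag applies everywhere.
    return platforms is None or user.get("platform") in platforms

def checkout_screen(user: dict[str, Any]) -> str:
    # The flag decides which branch of the code runs.
    if is_enabled("new_checkout", user):
        return "new checkout"
    return "old checkout"
```

Unknown flags default to off. A typo in a flag name therefore hides a feature rather than exposing an unfinished one.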

For example, imagine you want to show a login page with St Patrick’s Day branding to your mobile users in Ireland, but your normal branding to your mobile users in India. You would deploy an app with both variants of the login page built in. When the app starts, it sends the user’s location to the Feature Toggle service API. Depending on where that user is, the service tells the app to show either the St Patrick’s Day login screen or the unbranded one.

You may be wondering why you would use a service rather than just a configuration file to control the Feature Toggle. With a service, you can change the rules at any time without redeploying the app, whether to target a different segment of users or to try different bundles of features. A service like this also isn’t tied to one product or channel: it could just as easily serve your website as your mobile app.

That’s just one example of a feature toggle. How you segment your user base and which features each segment sees is up to you. That opens up a lot of options for your mobile and web-based products beyond A/B testing.
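Here is a rough sketch of both halves of that example. It assumes the service receives an ISO country code (`IE` for Ireland, `IN` for India) and answers with the name of a login variant. `ToggleRequest`, `login_variant` and the variant names are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToggleRequest:
    """What the app sends at start-up (hypothetical shape)."""
    user_id: str
    country_code: str  # ISO 3166-1 alpha-2, e.g. "IE" or "IN"

# Service side: which login variant each country gets (illustrative rule).
LOGIN_VARIANT_BY_COUNTRY = {"IE": "st_patricks_day_login"}
DEFAULT_LOGIN_VARIANT = "standard_login"

def login_variant(request: ToggleRequest) -> str:
    """Return the login screen variant the service tells the app to show."""
    return LOGIN_VARIANT_BY_COUNTRY.get(request.country_code.upper(), DEFAULT_LOGIN_VARIANT)

# App side: both variants are built into the app; the service only picks one.
LOGIN_SCREENS = {
    "st_patricks_day_login": "Login screen with St Patrick's Day branding",
    "standard_login": "Login screen with standard branding",
}

def show_login(variant: str) -> str:
    # Unknown variants fall back to the standard screen.
    return LOGIN_SCREENS.get(variant, LOGIN_SCREENS[DEFAULT_LOGIN_VARIANT])
```

The app falls back to the standard screen when it receives a variant name it doesn't know. A mistake in the service's rules then degrades gracefully instead of breaking the login page.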

How you can use Feature Toggles

Here are some ideas of how you can use Feature Toggles beyond A/B testing. These apply just as well to your mobile app as to your website or digital product.

Early Access

You could run beta programs on your live application by explicitly including the people you want to see a new feature. The target could be an individual, an organization or even a whole country. It’s often good practice to roll out to your smaller markets before your bigger ones: Canada before the United States, or New Zealand before Australia.
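One way to express an early-access list is sketched below. It assumes each request carries a user ID, an organization ID and a country code. The `EarlyAccess` class and the example values are hypothetical, and matching any single entry grants access.

```python
from dataclasses import dataclass, field

@dataclass
class EarlyAccess:
    """Hypothetical early-access list: matching any entry grants access."""
    user_ids: set[str] = field(default_factory=set)
    org_ids: set[str] = field(default_factory=set)
    countries: set[str] = field(default_factory=set)  # ISO country codes

    def allows(self, user_id: str, org_id: str, country: str) -> bool:
        return (
            user_id in self.user_ids
            or org_id in self.org_ids
            or country in self.countries
        )

# Beta open to one named user, one organization and the smaller market first
# (New Zealand before Australia).
beta = EarlyAccess(user_ids={"alice"}, org_ids={"acme"}, countries={"NZ"})
```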

Kill Switch

A feature controlled by a Feature Toggle can also be turned off if it’s performing poorly. This can be the difference between a public relations disaster and minimal impact. Combine this with a percentage rollout, say to 1% of your users, and gather feedback. If something goes wrong, you’ve disrupted the smallest possible audience. Done well, this greatly reduces the risk of a feature rollout.
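A common way to implement a percentage rollout is to hash a stable user identifier into a bucket, so the same user gets the same answer on every request. This sketch makes three assumptions: SHA-256 over `flag:user_id`, 10,000 buckets (so percentages can go down to 0.01%), and a kill switch modelled as a simple boolean that overrides everything. The function names are illustrative.

```python
import hashlib

def rollout_bucket(flag: str, user_id: str) -> int:
    """Map a user to a stable bucket in [0, 10000) for this flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000

def is_rolled_out(flag: str, user_id: str, percent: float, killed: bool = False) -> bool:
    """True if the user is inside the rollout percentage and the kill switch is off."""
    if killed:
        return False  # kill switch: off for everyone, whatever the percentage
    return rollout_bucket(flag, user_id) < percent * 100
```

Including the flag name in the hash means a user who lands in the first 1% for one feature is not automatically in the first 1% for every other feature.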

Opt In

You can allow users to opt in to early access to new features. There’s essentially a contract between you and the early user. Users get something new and cool, and in return you expect them to be more tolerant of the feature’s early stage of development. Google Labs is a great example of this kind of approach.

Incremental Roll Outs

Performance testing is hard. Building a mirror of your production systems with the same load is difficult and costly. Instead, use phased rollouts to a random percentage of your users to check for scalability issues and gather feedback. This is particularly useful for infrastructure projects, such as bringing on a new back-end that you want to be sure can handle real-world load. Instead of releasing the new functionality at 4 am in case it doesn’t work well, you can release during normal business hours. Ramp up to 2% at 9 am to check it’s all working properly, then keep ramping up during the day. You can even turn it off completely while you’re at lunch and not around to monitor it.
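A ramp like this can be written down as a schedule. This minimal sketch covers a single working day. The 9 am, 2% and lunchtime steps follow the example above; the other steps are invented for illustration. `RAMP_PLAN` and `rollout_percent` are hypothetical names. To keep the same early users in the rollout as it grows, map each user to a stable bucket (for example by hashing their ID) and include them whenever their bucket falls below the current percentage.

```python
from bisect import bisect_right
from datetime import time

# Hypothetical one-day ramp plan: (start time, percentage of users on the new back-end).
RAMP_PLAN = [
    (time(9, 0), 2.0),     # start small at 9 am and watch the dashboards
    (time(10, 30), 10.0),
    (time(12, 30), 0.0),   # switched off while the team is at lunch
    (time(13, 30), 10.0),
    (time(15, 0), 50.0),
]

def rollout_percent(now: time, plan: list[tuple[time, float]] = RAMP_PLAN) -> float:
    """Percentage of users who should be on the new back-end at time `now`."""
    starts = [start for start, _ in plan]  # plan must be sorted by start time
    i = bisect_right(starts, now)
    # Before the first step nobody is included; otherwise use the latest step reached.
    return 0.0 if i == 0 else plan[i - 1][1]
```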

Block Users

A feature flag can give a specific user or group access to some element of your site, but it can also protect features by excluding certain users from ever seeing them. One example is regulation that differs by state or country. Drinks and pharmaceutical companies face different laws in different countries, so they need to serve different experiences in each region.

You may also want to block an IP address or anyone from a certain domain. It’s also common practice to exclude people from certain tech blog sites so that a new feature isn’t seen prematurely.
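The sketch below shows deny rules that override any rule enabling the feature. It assumes each request carries a country code, an email address and an IP address. The `BlockList` class is hypothetical. The domain and address range in the example are reserved documentation values, not real targets.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass, field

@dataclass
class BlockList:
    """Hypothetical deny rules: any match hides the feature."""
    countries: set[str] = field(default_factory=set)
    email_domains: set[str] = field(default_factory=set)
    networks: list[ipaddress.IPv4Network | ipaddress.IPv6Network] = field(default_factory=list)

    def blocks(self, country: str, email: str, ip: str) -> bool:
        domain = email.rsplit("@", 1)[-1].lower()
        addr = ipaddress.ip_address(ip)
        return (
            country in self.countries
            or domain in self.email_domains
            or any(addr in net for net in self.networks)  # v4/v6 mismatch is simply False
        )

def feature_visible(enabled_by_rules: bool, blocked: bool) -> bool:
    # Exclusion wins: a blocked user never sees the feature.
    return enabled_by_rules and not blocked

# Example: hide a new feature from a tech blog's domain and one address range.
press_block = BlockList(
    email_domains={"techblog.example"},
    networks=[ipaddress.ip_network("203.0.113.0/24")],
)
```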

Hypothesis Driven Development (A/B Testing)

Feature flags let you run A/B tests to see which version of a feature performs better. When you switch a feature on for one group and not another, you can collect measurements and KPIs to see whether the change supports your hypothesis.
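Once both groups have used their version, you need a way to judge whether a difference in a KPI such as conversion rate is more than noise. The article doesn't prescribe a method. A two-proportion z-test is one common choice, sketched here with Python's standard library. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both groups share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

With made-up counts, `two_proportion_z_test(200, 2000, 260, 2000)` compares a 10% conversion rate (A) with 13% (B). It gives z of roughly 3 and a p-value below 0.01, which would usually be read as a real difference.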

Bucketing users

  • By attribute, e.g. all users with a gmail.com username
  • By percentage, e.g. 1% of your users see the new feature
  • By combining percentage and attribute, e.g. 1% of your gmail users
  • By location, e.g. only UK users
  • By digital channel, e.g. mobile users vs web users
  • By AI allocation (e.g. Adobe Target): trying different attributes, learning what works and what doesn’t, then giving the right screen to the right group
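The rule-based bucketing strategies in this list can be modelled as small predicates that are combined. AI allocation is left out because it depends on the tool. This sketch assumes each user is a dictionary with `id`, `email`, `country` and `channel` keys. All the names are illustrative. The percentage rule hashes the user ID so that assignment stays stable between visits.

```python
import hashlib
from typing import Callable

User = dict[str, str]
Rule = Callable[[User], bool]

def email_domain(domain: str) -> Rule:
    return lambda u: u.get("email", "").lower().endswith("@" + domain)

def country(code: str) -> Rule:
    return lambda u: u.get("country") == code

def channel(name: str) -> Rule:
    return lambda u: u.get("channel") == name

def percentage(flag: str, pct: float) -> Rule:
    def rule(u: User) -> bool:
        # Stable bucket in [0, 10000) derived from the flag and user ID.
        digest = hashlib.sha256(f"{flag}:{u['id']}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % 10_000 < pct * 100
    return rule

def all_of(*rules: Rule) -> Rule:
    return lambda u: all(r(u) for r in rules)

# "1% of your gmail users": attribute combined with percentage.
one_percent_of_gmail = all_of(email_domain("gmail.com"), percentage("new_inbox", 1))
# "Only UK users" on the mobile channel.
uk_mobile = all_of(country("GB"), channel("mobile"))
```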

Calendar Driven Launches

Hitting an exact launch date is stressful. You have to switch on many assets and pieces of functionality at once, and you see them at the same time as your end users. No matter how often you test in staging, the real world is the final judge. With feature flags, however, you can push all of your functionality “live” to production but keep it turned off for everyone except your QA and internal testers. This gives you time to flush out any issues well before your actual launch.
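The sketch below shows a date-driven release. It assumes the rule can see the current time and the user's email. `LAUNCH_AT` and `INTERNAL_TESTERS` are hypothetical values. Internal testers see the feature in production straight away, and everyone else sees it from the launch moment.

```python
from datetime import datetime, timezone

# Hypothetical launch settings for one feature.
LAUNCH_AT = datetime(2030, 3, 17, 9, 0, tzinfo=timezone.utc)
INTERNAL_TESTERS = {"qa@yourcompany.example"}

def launch_feature_visible(email: str, now: datetime) -> bool:
    """Internal testers see the feature immediately; everyone else from LAUNCH_AT."""
    return email in INTERNAL_TESTERS or now >= LAUNCH_AT
```

In many setups someone simply flips the flag for everyone at launch time instead. Building the date into the rule is just one option, and it still leaves the flag available as a kill switch.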

Subscription

It’s often simpler, and certainly more scalable, to control subscription plans by bundling features with feature flags. Perhaps you have users on different subscription plans, with different features for each plan. If you give people on the admin side an interface for managing the flags, they can easily change feature visibility themselves. For instance, a product or sales person can give a big customer access to a feature from the next subscription level to try it out. Once the customer is happy with it, they can move the customer up to that plan. The old way involves an engineer and days of work.
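The sketch below combines plan bundles with per-customer overrides, assuming flags are identified by name. The plan names, feature names and the `bigcorp` customer are hypothetical. In practice the override table is what the admin interface would edit.

```python
# Hypothetical plan bundles: each plan unlocks a set of feature flags.
PLAN_FEATURES: dict[str, set[str]] = {
    "basic": {"reports"},
    "pro": {"reports", "export", "api_access"},
}

# Per-customer grants a product or sales person could add from an admin screen,
# e.g. letting a big customer try a feature from the next plan up.
CUSTOMER_OVERRIDES: dict[str, set[str]] = {"bigcorp": {"api_access"}}

def has_feature(customer: str, plan: str, feature: str) -> bool:
    """A feature is available if the plan bundles it or the customer was granted it."""
    return (
        feature in PLAN_FEATURES.get(plan, set())
        or feature in CUSTOMER_OVERRIDES.get(customer, set())
    )
```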

Newbie vs Power User

You will probably want to show expert users and beginners entirely different features of your product. A new user and a “power” user have different needs, and you want to make them both happy. Rather than compromising, you can give each of them a different experience. Feature flags make it easier to tailor your application to your user base.

Maintenance Mode

Put portions of your application (or all of it) into maintenance mode by simply turning off features for all your users.
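This minimal sketch assumes maintenance switches are named boolean flags: `entire_app`, plus one per area of the product. All of these names are hypothetical. The app checks them before showing each part of the product.

```python
# Hypothetical maintenance switches, e.g. fetched from the toggle service.
MAINTENANCE: dict[str, bool] = {"entire_app": False, "payments": True}

def available(area: str, maintenance: dict[str, bool] = MAINTENANCE) -> bool:
    """An area is unavailable if it, or the whole app, is in maintenance mode."""
    return not (maintenance.get("entire_app", False) or maintenance.get(area, False))
```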

Sunset

Turning off features is often an afterthought during development. If your system has been around for a while, you start to accumulate old features that may be costing you more than they’re worth in maintenance and QA. Hidden features are expensive to keep around. You have to QA them, make sure they don’t conflict with new features, and make sure they still work on new platforms. Old features are a huge hidden tax on software releases, and sometimes the best thing to do with them is to sunset them cleanly.

Whatever change you make, it’s key that you can prove it really is the change that has (or hasn’t) made a difference. To do that, you need enough people to use each version for the result to be statistically significant.

There is a formula to work out how many users you need, and it relies on 3 things:

  1. Your base conversion rate: the number of people who used the feature divided by the total number of people visiting your service
  2. The size of the change you want to test for. The bigger the change you count as a win, the easier it is to spot in the noise of the data and the fewer users you need in the test
  3. How statistically reliable you need the results to be. If you only need to be 80% confident that a feature has changed your metric, you need fewer people to use it than if you need to be 95% confident.
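The article doesn't spell the formula out. One widely used normal-approximation formula for comparing two conversion rates is sketched below. Like most online calculators, it also takes a statistical power, which is the chance of detecting a real change of that size. Here the power is assumed to be 80%. The function name and default values are illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(base_rate: float, min_detectable_change: float,
                            confidence: float = 0.95, power: float = 0.80) -> int:
    """Approximate users needed in EACH of A and B (two-sided test of two proportions)."""
    p1 = base_rate
    p2 = base_rate + min_detectable_change  # absolute change, e.g. 0.10 -> 0.12
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)
```

Suppose your base conversion rate is 10% and you want to detect a 2-percentage-point change at 95% confidence. This comes out at roughly 3,840 users per variant, so around 7,700 in total. Looking for a bigger change, or accepting 80% confidence, brings that number down.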

There is a calculator here that you can use to see how many users you need for your feature.

The number of users needed is often higher than you might expect. You can still learn from an A/B test with fewer people, as long as you accept that you may not be able to fully trust the results.

That might not be as important as you think. Depending on the stage of the project you will likely be looking for different things from your users.

The UK Government breaks its user research down by the agile governance phases of a project.²

The aim of user research in the discovery phase is to find out:

  • who your likely users are and what they’re trying to do
  • how they do it currently (for example, what services or channels they use)
  • the problems or frustrations they experience
  • what users need from your service to achieve their goal

The aim of user research in the alpha phase is to:

  • improve the team’s understanding of your users and their needs
  • test different design ideas and service prototypes with likely users
  • learn how to build or improve your service so that it helps users achieve their goal

The aim of user research in the beta phase is to:

  • test the developing service with likely users to make sure it meets their needs
  • understand and resolve usability issues

The aim of user research in the live phase is to:

  • assess people’s experience of using your service
  • understand evolving user needs
  • test new features, changes or improvements to your service

About Waracle

However you choose to approach A/B testing, your users will no doubt surprise you, which is part of the fun of this kind of user research. Hopefully you’ll find the nugget of gold that moves your metric and delights your users. A good A/B process broadly involves ongoing analysis and measurement, split testing, push messaging, advertising and updates. You need to use analytics to get to grips with who is using your mobile app and why. Then split test your app experience to find out what works. Why not contact us at Waracle and find out how we could help optimise your app!

If you would like to learn more about other trends affecting mobile app development, download one of our free whitepapers on mobile trends and mobile marketing.
