Sunday, March 10, 2013

How to do a simple A/B Test

In digital marketing we often want to test to see what works best. What is most attractive to a potential customer? What message resonates best? What advertising campaign is most likely to get attention? How can we maximize the completion rate on a website landing page?

These kinds of questions often involve a simple test of two competing ideas or approaches. For example, we might have two banner advertisements or two registration pages, and we want to know which results in the most click-throughs or which is most likely to result in more leads. This is called an A/B test. Here’s how to do one right and not have to worry about the stats too much.

What is an A/B Test?
In marketing, an A/B test is a simple way to understand if there’s a meaningful difference between two competing approaches to a problem, such as a banner advertisement or a direct mail campaign. Using the test, we can measure the results of the two approaches and decide if there’s a real (statistically significant) difference.

A/B testing isn’t a new idea. I can remember, decades ago, talking to an executive at AMEX, where they performed extremely elaborate versions of A/B testing on their direct mail campaigns supporting new credit card applications. They were looking for tiny differences in response rates across millions of mailings, and tested everything right down to the color of ink. Usually we’re not nearly as sophisticated, but the same principles apply.

How do I set up the Test?
Setting up the test isn’t that hard. Normally, you’ll have two approaches to a problem and a desired outcome. Here are a few examples:


  • Your agency comes back with a new creative idea for a banner ad, and you’d like to test it against an existing design. You want to see if the new creative gets more click-throughs than the old one.
  • You have a new layout for a web page. Which keeps people on the page the longest, the new idea or the old one you already have in place?
  • You have an email newsletter. You want to test a different subject line to see which gets the best open rate.
  • You have a new direct mail campaign, but can’t agree whether to feature the discount offer on the envelope or a picture of the new product. Which gets the best response to your order hotline?

In each case, you have a couple of alternative approaches to a problem, and an obvious way of measuring (counting) which is best.

What else should I know to make sure the test will work?
Here are a few things to be careful about:

  • Make sure you’re actually measuring the right thing. For example, if you have two banner ads, and each features a different offer, uses different graphics and is placed on a different website – well, it’s really hard to know what is affecting click-throughs. Is it the websites that make the difference, or the offer, or the graphics? The test will not tell you. As much as possible, keep everything constant across the test except the specific thing you’re testing.
  • Randomize as much as possible. What does this mean? Here are some examples:
    • If you’re testing two direct mail pieces across a sample of your database (say, 500 people), make sure to randomly select people from the list and randomly assign them to each group – see the first sketch after this list. This way, the selection process can’t influence the outcome.
    • If you’re running two different banner ads on a website to see which performs best, make sure they’re shown to visitors at random, or as near to random as you can get.
  • Make sure you’re testing across a big enough sample. Bigger is usually better. Statistically, the power of an A/B test is related to the amount of data you gather, and hence the sample size. And remember, your results are often measured in terms of response rates, which in marketing can be very low. There’s no magic number, but aim for as large a sample as you can – the second sketch below gives a rough way to estimate what “large enough” means.
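
If your list lives in a spreadsheet or a database export, a few lines of code can do the random split for you. Here’s a minimal sketch in Python – the 500-person list and the customer IDs are just made-up placeholders, so swap in your own data:

    # Minimal sketch: randomly split a mailing list into two equal test groups.
    # The contact IDs below are placeholders for your real list.
    import random

    contacts = ["customer_%d" % i for i in range(500)]  # stand-in for your database export

    random.seed(42)           # optional: fix the seed so the split is reproducible
    random.shuffle(contacts)  # shuffle first, so the split isn't alphabetical or by signup date

    half = len(contacts) // 2
    group_a = contacts[:half]  # these people receive mail piece A
    group_b = contacts[half:]  # these people receive mail piece B

    print(len(group_a), len(group_b))  # 250 250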

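As for “how large is large enough”, a standard back-of-the-envelope formula for comparing two response rates gives a useful ballpark. The sketch below uses made-up numbers – a 1% baseline response rate and a lift to 1.25% that you’d like to be able to detect – and for rates that low the answer comes out in the tens of thousands of people per group:

    # Rough sample-size estimate for comparing two response rates, using the
    # standard normal-approximation formula. The rates below are assumptions -
    # plug in your own baseline and the smallest lift you care about detecting.
    from statistics import NormalDist

    baseline_rate = 0.0100  # assumed response rate of the current version (1%)
    target_rate = 0.0125    # assumed rate you'd like to be able to detect (1.25%)
    alpha, power = 0.05, 0.80

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-sided 5% test
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power

    variance = baseline_rate * (1 - baseline_rate) + target_rate * (1 - target_rate)
    n_per_group = ((z_alpha + z_power) ** 2 * variance) / (baseline_rate - target_rate) ** 2

    print("Roughly %.0f people needed in each group" % n_per_group)

Notice that halving the lift you want to detect roughly quadruples the required sample – which is why “bigger is usually better” when response rates are low.
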
How do I perform the test, and do I need to understand statistics?
Most people who do A/B testing never perform an actual test – they “eyeball” the data instead. As we’ll see, this can be a big mistake. It’s absolutely worth doing the test, because it’s easy to come to the wrong conclusion otherwise. The test involves statistics, but luckily for us there are online tools that do all the math for you. Here are two examples:

Example One: Banner Advertising
You run two banner ads on the same website page that appear randomly to viewers (Banner A is the old one, and you think Banner B might do better). You use 5,000 impressions, 2,500 for each banner, and here are the results:


            Impressions    Click-Throughs
Banner A    2,500          25 (1%)
Banner B    2,500          32 (1.28%)

Eyeball this data, and most people would conclude that Banner B is clearly better than Banner A. So let’s run a test to see if this is statistically the case – I’m going to use a simple tool supplied by the good people at Optimizer. (The tool is designed around a website registration page example, but we can easily adapt it for our use.)

To use the tool, let Control = Banner A and Variation = Banner B (remember, in this example you’ve a hunch that B is better than A). The “Number of Visitors” can equal the number of impressions, which is 2,500 in each case. The number of Conversions is the number of click-thrus for each banner (25 and 32 in our example).

Ignore the “p-value” unless you’re a stats geek. Hit Calculate and the tool will tell you whether the difference in click-throughs is really statistically significant. And the answer is – NO!

How can this be??? Surely, if there’s a difference in the results, it must mean our Banner B is better than Banner A!

Well, imagine you’re flipping a coin 100 times – assuming the coin is fair, you’d expect 50 heads and 50 tails, but you wouldn’t be surprised to get 48 heads and 52 tails, or even 45 and 55. The same kind of random chance that shows up in coin flips can influence our simple A/B test, so even though there’s a small difference in results, we can’t be sure it isn’t just noise. You could be making a mistake going with Banner B. This is why it is so important to actually do a test – especially in situations where “responses” are very low, as is usual in most marketing activities.
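
If you’d rather not take the online tool’s word for it, the same conclusion drops out of a standard two-proportion z-test, which takes only a few lines of Python. This is a back-of-the-envelope sketch – not necessarily the exact calculation the tool performs – but it tells the same story:

    # Back-of-the-envelope check of Example One: a pooled two-proportion z-test
    # on the banner numbers (25 clicks from 2,500 impressions vs 32 from 2,500).
    from math import sqrt, erfc

    clicks_a, impressions_a = 25, 2500
    clicks_b, impressions_b = 32, 2500

    rate_a = clicks_a / impressions_a  # 1.00%
    rate_b = clicks_b / impressions_b  # 1.28%

    # Pooled click-through rate, assuming (as the test does) that the banners are equal
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))

    z = (rate_b - rate_a) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value from the normal distribution

    print("z = %.2f, p-value = %.2f" % (z, p_value))  # roughly z = 0.93, p = 0.35
    print("Significant at the 5% level?", p_value < 0.05)  # False - Banner B's edge could be noise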

Example Two: Website registration
You now run a test of two landing pages where visitors are asked to register. Page A is the old page you’ve used for a long time, and Page B is the “variation” you want to test. You want to see which gets the most registrations, and here’s the data:


                  Impressions/Visitors    Registrations
Landing Page A    13,000                  83 (0.64%)
Landing Page B    8,900                   78 (0.87%)

Plug in the numbers and… the answer is YES! But what does this mean?

It means that Page B performs better than Page A – the gap in registration rates is big enough that it’s unlikely to be down to random chance. Let’s explore this.

In this case, we should note that the number of visitors to each page is very different: Page A had 13,000 visitors, compared to only 8,900 for Page B. (That isn’t to say the visitors weren’t assigned randomly – we assume they were. Unequal group sizes are pretty common given the practical realities of this kind of testing.) Page B has the better registration rate, 0.87% compared to 0.64% for Page A; the absolute numbers of registrations are very close, but on their own they’re misleading. Eyeballing the data could be confusing, but the test makes it clear that Page B is the better bet.
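
Here’s the same back-of-the-envelope z-test applied to the landing page numbers; again it’s only a sketch, not a reproduction of the online tool, but it points the same way:

    # Back-of-the-envelope check of Example Two: 83 registrations from 13,000
    # visitors on Page A versus 78 registrations from 8,900 visitors on Page B.
    from math import sqrt, erfc

    regs_a, visitors_a = 83, 13000
    regs_b, visitors_b = 78, 8900

    rate_a = regs_a / visitors_a  # about 0.64%
    rate_b = regs_b / visitors_b  # about 0.87%

    pooled = (regs_a + regs_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))

    z = (rate_b - rate_a) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value

    print("z = %.2f, p-value = %.3f" % (z, p_value))  # roughly z = 2.0, p = 0.04
    print("Significant at the 5% level?", p_value < 0.05)  # True - Page B's edge looks real

Note that the test works on rates and sample sizes together, which is why it can call this difference real even though the raw registration counts (83 versus 78) look almost identical.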

Happy testing.