Three reasons to stop A/B testing

Successful (online) companies are organised to experiment continuously. This is preaching to the choir, something we all agree on.

But what exactly is the best way to experiment as a company? We don’t necessarily agree on the answer to that question.

This article deals with one of the most used types of online experiments, A/B tests.

I would like to explain why you should stop running them.

1. Individual visitors are not the same

We optimize for individuals. Unlike other A/B or MVT testing applications, we keep track of individual customers and model each of them individually.

This means that although version A might work best for the majority of your customers (and thus is the first choice for a new customer), we adapt if a specific, recurring customer never responds to A.

We will try B for him or her, and stick with it if it works.
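A minimal sketch of this per-customer adaptation, in Python. It is an illustration under stated assumptions, not the system described in this article: the names `POPULATION_PRIOR` and `CustomerModel` and the prior pseudo-counts are hypothetical. Each customer starts from a population-level prior, so a new customer sees the majority favourite, and the customer's own responses then shift their personal estimate. The sketch is purely greedy and does no exploration.

```python
from dataclasses import dataclass, field

# Hypothetical population-level prior per version, expressed as
# Beta pseudo-counts (conversions, non-conversions). Assumed values.
POPULATION_PRIOR = {"A": (6.0, 4.0), "B": (4.0, 6.0)}


@dataclass
class CustomerModel:
    # Every customer gets a private copy of the population prior.
    counts: dict = field(
        default_factory=lambda: {v: list(p) for v, p in POPULATION_PRIOR.items()}
    )

    def update(self, version, converted):
        # Conjugate Beta-Bernoulli update: add one to the matching count.
        self.counts[version][0 if converted else 1] += 1.0

    def expected_rate(self, version):
        conversions, misses = self.counts[version]
        return conversions / (conversions + misses)

    def best_version(self):
        # Greedy choice: the version with the highest posterior mean.
        return max(self.counts, key=self.expected_rate)
```

A new `CustomerModel()` prefers A because of the prior. After a recurring customer ignores A often enough, that customer's `best_version()` switches to B while every other customer's model is unaffected.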

2. Test the Why instead of the What

We are not interested in button color or size of the font on your page (the What?). We optimize the influence strategies (the Why?) that are used on a site.

That is: should you present customer reviews, discounts, expert endorsements, or limited-time offers? This enables us to build a profile of each individual customer based on these influence strategies, which can be used for future pitches: pitches that are different implementations of the same strategies.

The What’s? stay dumb and the Why’s change the game.

3. Don’t exploit when it’s time to explore

We adopt a Bayesian paradigm to optimize the explore-exploit tradeoff. Thus, we don’t tell people to use version A after A and B have each been visited by 1,000 visitors. We will keep pitching B every now and then if the uncertainty in our estimates is too high.

This enables us to explore whether a version becomes popular at a later point in time, or for a specific market segment. We thus decide in real time which version to show to a new customer. We believe we are the first to provide this option commercially.
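One common way to get this behaviour is randomized probability matching (also called Thompson sampling) on a Beta-Bernoulli model, the technique named in the author's reply in the comments. The following Python sketch is a simplified, non-hierarchical illustration under that assumption, not the production implementation; the function names `choose_version` and `record` are hypothetical.

```python
import random


def choose_version(posteriors, rng):
    """Pick a version by drawing one sample from each Beta posterior.

    posteriors maps version -> (alpha, beta), i.e. conversions and
    non-conversions seen so far (plus any prior pseudo-counts).
    A version with a wide posterior occasionally produces the highest
    draw, so it keeps being shown while its estimate is uncertain.
    """
    draws = {v: rng.betavariate(a, b) for v, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)


def record(posteriors, version, converted):
    """Update the shown version's posterior with one observed outcome."""
    a, b = posteriors[version]
    posteriors[version] = (a + 1, b) if converted else (a, b + 1)
```

Calling `choose_version({"A": (60, 40), "B": (5, 5)}, random.Random())` repeatedly shows A most of the time but still shows B fairly often, because B has been tried only a handful of times. As evidence accumulates through `record`, the posteriors narrow and the weaker version is shown less and less, without ever fixing a hard cut-off.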

Image credit: mil8 via Flickr

Maurits Kaptein is Chief Science Officer at Science Rockstars and a guest blogger on Econsultancy. 

Reader comments (8)

  1. Stuart McMillan

    Deputy Head of Ecommerce at Schuh

    2:07PM on 11th December 2012

    Maurits,
    I was wondering about your point 3. Am I to understand that you use a linear Bayes method to work out the value of epsilon in an epsilon-greedy solution to the multi-armed bandit problem, as I am guessing a more traditional Bayes approach was too expensive? Does this mean that you've had to make some hard computational choices due to some sort of performance problem?

    I guess my real question is, how much of an impact does this have on page load speed, given its real time nature?

  2. Maurits Kaptein

    2:22PM on 11th December 2012

    Hi Stuart,

    For the multi-armed bandit problem we use Randomized Probability Matching as introduced by Scott in 2010 (http://www.economics.uci.edu/~ivan/asmb.874.pdf). We use estimates from a hierarchical Bayesian beta-binomial model. To reduce the page load burden, we use only the individual-level model for within-session updates, which is extremely fast since we do not revert back to the original data: everything is captured in the priors.

    Does that help at all?

  3. Jacob Ajwani

    VP of Client Services at Cognitive Match

    3:47PM on 11th December 2012

    Valid perspective, Maurits, and I appreciate the title. Keep in mind that what you are advocating requires a significant number of conversions to wash out noise. It is ideal for high-traffic areas of a site, but not the long tail. A/B testing has a major role to play in discovering aesthetics and layout... but yes, then get granular with a 1-to-1 approach. See: Touch Clarity, Omniture, Cognitive Match.

  4. Stuart McMillan

    Deputy Head of Ecommerce at Schuh

    4:48PM on 11th December 2012

    Hi Maurits,
    Thanks for the fuller explanation, it's now getting a bit beyond me!

    Stuart

  5. James Gurd

    Owner at Digital Juggler

    10:11AM on 12th December 2012

    Hi Maurits,

    Thanks for an interesting article, even if it's too early for my brain to cope with statistical theory and modelling!

    I take the point about optimising for the individual. My question is, how practical is this for a small business? What is the cost and complexity of modelling the individual instead of using A/B or MVT to determine the 'optimal' blend of page content at a generic level?

    Also, I'm intrigued that you say you're not interested in the button colour or size etc., as you look at the influence strategies. Well, every component of the page influences a customer, whether consciously or subconsciously, and colour is one of these influences.

    Please can you clarify what you mean by 'influence strategies'? In my experience colour, location and size all have an influence on outcomes.

    And finally, how do you keep track of an individual across browsers? If I come via my desktop on a fixed IP, then come back via an iPad on a 3G connection, surely you can't correlate the two? Or would you argue it doesn't matter because you're treating each visit as an individual regardless?

    Thanks
    James

  6. Brewster Barclay

    Consultant at B. F. Barclay & Associates

    12:00PM on 12th December 2012

    Hi Maurits,

    Good ideas but it seems that what you are really saying is not to stop A/B testing but rather to:
    - Deliver different content to different segments and ideally down to a segment size of one
    - Keep on testing as tastes and influences change and your segments and optimal content strategies may change.
    Regards,
    Brewster

  7. dan barker

    E-Business Consultant at Dan Barker

    12:00PM on 12th December 2012

    Here is a silly example 5-step process for A/B testing that I think (firstly) can be used by an average business & that (secondly) gets around a lot of the issues talked about here:

    Step 1: Figure out what the factors are on your site that influence whether/not your ideal audience accomplishes the behaviour you wish them to on the site. (addresses reason 2)
    Step 2: Plan your A/B test(s) based on those factors, and on hypotheses around those.
    Step 3: Whatever the results, dig into them to see if there is big variation among individuals/segments, and therefore what that means to your hypotheses. (addresses reason 1)
    Step 4: If there is variation, understand why and see whether you can address it either by making other changes to the page & retesting, or by creating new paths/elements on the site to cater for those individuals/groups.
    Step 5: Rerun your tests from time to time to revalidate. (your reason 3)

    Not perfect, but a good start?

  8. Peter Ellen

    11:21AM on 13th December 2012

    Interesting comments, but what you test and how you serve need to be backed by pragmatism. Sure, if your content is designed to appeal to different audiences then it's likely they'll respond differently. If you are trying to work out how best to help someone find something or do something then an aggregate approach might be better. If you're honestly suggesting that A/B testing is not a valid way to iterate customer experiences, I think you are trying to boil the ocean: the evidence to support "test and learn" in all aspects of marketing is overwhelming, just as the evidence to support segmentation of different customer requirements is proven. Any target group of customers must be substantial, accessible, unique, appropriate and stable. If that can be achieved across a number of target groups then great, offer them something different. But when they all respond in broadly the same way, then why bother? Neither of these scenarios suggests that your first attempt to target will be the best you can do. To improve you have to test and learn, or be extraordinarily lucky.
