Successful online companies are organised to experiment continuously. That much is preaching to the choir: something we all agree on.

But what exactly is the best way to experiment as a company? We don’t necessarily agree on the answer to that question.

This article deals with one of the most widely used types of online experiment: the A/B test.

I would like to explain why you should stop running them.

1. Individual visitors are not the same

We optimize for individuals. Unlike other A/B or MVT testing applications, we keep track of individual customers and model each of them individually.

This means that although version A might work best for the majority of your customers (and is thus the first choice for a new customer), we adapt if a specific, recurring customer never responds to A.

We will try B for him or her, and stick with it if it works.
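A minimal sketch of that idea (hypothetical names and numbers, not our production system): each returning visitor carries his or her own Beta posterior per version, seeded from the population-level counts, so a visitor who never responds to A drifts toward B while new visitors still start from the population's favourite.

```python
import random
from collections import defaultdict

# Population-level [alpha, beta] pseudo-counts per version.
population = {"A": [1, 1], "B": [1, 1]}

# Each visitor starts from a copy of the population counts, then
# accumulates his or her own successes and failures.
per_visitor = defaultdict(
    lambda: {version: list(counts) for version, counts in population.items()}
)

def choose_version(visitor_id):
    """Draw once from each of the visitor's Beta posteriors; show the winner."""
    posteriors = per_visitor[visitor_id]
    draws = {v: random.betavariate(a, b) for v, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)

def record_outcome(visitor_id, version, converted):
    """Update both the individual model and the population model."""
    for model in (per_visitor[visitor_id], population):
        model[version][0 if converted else 1] += 1
```

A visitor who only ever converts on B will quickly see B on almost every visit, while the population model keeps learning from everyone.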

2. Test the Why instead of the What

We are not interested in the colour of a button or the size of the font on your page (the What). We optimize the influence strategies (the Why) that are used on a site.

Thus: should you present customer reviews, discounts, expert endorsements, limited-time offers, etc.? This enables us to create a profile of individual customers based on these influence strategies, which can be used for future pitches: pitches that are different implementations of the same strategies.

The Whats stay dumb; the Whys change the game.

3. Don’t exploit when it’s time to explore

We adopt a Bayesian paradigm to optimize the explore-exploit tradeoff. Thus, we don't tell people to use version A just because A and B have each been visited by 1,000 visitors. We keep pitching B every now and then if the uncertainty in our estimates is too high.

This enables us to explore whether a version becomes popular at a later point in time, or for a specific market segment. We thus decide in real time which version to show to a new customer. We believe we are the first to provide this option commercially.
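A sketch of that explore-exploit logic (the conversion counts below are made up for illustration): estimate the probability that each version is currently best under its Beta posterior, and keep showing the runner-up in proportion to its remaining chance rather than writing it off.

```python
import random

def prob_best(posteriors, draws=5000):
    """Monte Carlo estimate of P(version is best), assuming independent
    Beta(alpha, beta) posteriors over each version's conversion rate."""
    wins = {version: 0 for version in posteriors}
    for _ in range(draws):
        sampled = {v: random.betavariate(a, b) for v, (a, b) in posteriors.items()}
        wins[max(sampled, key=sampled.get)] += 1
    return {version: count / draws for version, count in wins.items()}

random.seed(7)
# After 1,000 visitors each: A converted 50 times, B converted 40 times.
# A leads, but the posteriors still overlap, so B keeps earning traffic.
probs = prob_best({"A": (50, 950), "B": (40, 960)})
```

Showing each version with the probability that it is best is the essence of Randomized Probability Matching: B is pitched less often as the evidence against it mounts, but never abruptly cut off while uncertainty remains.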

Image credit: mil8 via Flickr

Published 11 December, 2012 by Maurits Kaptein

Maurits Kaptein is Chief Science Officer at Science Rockstars and a contributor to Econsultancy. 


Comments (8)

Stuart McMillan, Deputy Head of Ecommerce at Schuh

I was wondering about your point 3. Am I to understand you use a linear Bayes method to work out the value of epsilon in an epsilon-greedy solution to the multi-armed bandit problem, as I am guessing a more traditional Bayes approach was too expensive? Does this mean that you've had to make some hard computational choices due to some sort of performance problem?

I guess my real question is: how much of an impact does this have on page load speed, given its real-time nature?

over 4 years ago


Maurits Kaptein

Hi Stuart,

For the multi-armed bandit problem we use Randomized Probability Matching, as introduced by Scott in 2010. We use estimates from a hierarchical Bayesian beta-binomial model. To reduce the page-load burden we use only the individual-level model for within-session updates, which is extremely fast since we do not revert back to the original data: all of it is captured in the priors.
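As a sketch of that priors-only update (the starting counts are hypothetical): with a beta-binomial model the posterior after each observation is again a Beta, so a running (alpha, beta) pair is all the state a session needs, and no raw event log is touched at page-load time.

```python
def update(alpha, beta, converted):
    """One conjugate Beta-binomial step: the posterior is again a Beta,
    so the raw data never has to be consulted."""
    return (alpha + 1, beta) if converted else (alpha, beta + 1)

def posterior_mean(alpha, beta):
    return alpha / (alpha + beta)

# A session starts from the individual-level prior handed down by the
# hierarchical model (made-up numbers), then updates in O(1) per event.
alpha, beta = 3.0, 17.0
for converted in (True, False, True):
    alpha, beta = update(alpha, beta, converted)
```

Each update is constant-time arithmetic on two numbers, which is why the within-session path can stay fast.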

Does that help at all?

over 4 years ago

Jacob Ajwani, VP of Strategy at

Valid perspective Maurits, I appreciate the title. Keep in mind what you are advocating requires a significant number of conversions to wash out noise. Ideal for high-traffic areas of a site, but not the long tail. A/B testing has a major role to play in discovering aesthetics and layout... but yes, then get granular with a 1-to-1 approach. See: Touch Clarity, Omniture, Cognitive Match.

over 4 years ago

Stuart McMillan, Deputy Head of Ecommerce at Schuh

Hi Maurits,
Thanks for the fuller explanation, it's now getting a bit beyond me!


over 4 years ago

James Gurd, Owner at Digital Juggler

Hi Maurits,

Thanks for an interesting article, even if it's too early for my brain to cope with statistical theory and modelling!

I take the point about optimising for the individual. My question to that is, how practical is this for a small business? What is the cost/complexity of modelling to the individual instead of using AB/MVT to determine 'optimal' blend of page content at a generic level?

Also, I'm intrigued that you say you're not interested in the button colour or size etc. as you look at the influence strategies. Every component of the page influences a customer, whether consciously or subconsciously, and colour is one of those influences.

Please can you clarify what you mean by 'influence strategies'? In my experience colour, location and size all have an influence on outcomes.

And finally, how do you keep track of individuals cross-browser? So if I come via my desktop on a fixed IP, then come back via an iPad on a 3G connection, surely you can't correlate? Or would you argue it doesn't matter because you're treating each visit as an individual regardless?


over 4 years ago


Brewster Barclay, Consultant at B. F. Barclay & Associates

Hi Maurits,

Good ideas but it seems that what you are really saying is not to stop A/B testing but rather to:
- Deliver different content to different segments and ideally down to a segment size of one
- Keep on testing as tastes and influences change and your segments and optimal content strategies may change.

over 4 years ago

dan barker, E-Business Consultant at Dan Barker

Here is a silly example 5-step process for A/B testing that I think (firstly) can be used by an average business & that (secondly) gets around a lot of the issues talked about here:

Step 1: Figure out the factors on your site that influence whether or not your ideal audience accomplishes the behaviour you wish them to on the site. (addresses reason 2)
Step 2: Plan your A/B test(s) based on those factors, and on hypotheses around those.
Step 3: Whatever the results, dig into them to see if there is big variation among individuals/segments, and therefore what that means to your hypotheses. (addresses reason 1)
Step 4: If there is variation, understand why and see whether you can address it either by making other changes to the page & retesting, or by creating new paths/elements on the site to cater for those individuals/groups.
Step 5: Rerun your tests from time to time to revalidate. (your reason 3)

Not perfect, but a good start?
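The per-segment dig in step 3 could be sketched like this (illustrative numbers only): break one A/B result down by segment and look for segments whose winner disagrees with the aggregate.

```python
# A/B results by segment: {segment: {version: (conversions, visitors)}}.
# In aggregate B wins (100/2000 vs 80/2000), yet new visitors prefer A:
# exactly the hidden variation step 3 is hunting for.
results = {
    "new":       {"A": (60, 1000), "B": (45, 1000)},
    "returning": {"A": (20, 1000), "B": (55, 1000)},
}

def conversion_rate(conversions, visitors):
    return conversions / visitors

def winner_by_segment(results):
    """Pick the higher-converting version within each segment."""
    return {
        segment: max(versions, key=lambda v: conversion_rate(*versions[v]))
        for segment, versions in results.items()
    }

winners = winner_by_segment(results)
```

Disagreement between the segment winners and the aggregate winner is the cue for step 4: retest, or build separate paths for those groups.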

over 4 years ago


Peter Ellen

Interesting comments, but what you test and how you serve need to be backed by pragmatism. Sure, if your content is designed to appeal to different audiences then it's likely they'll respond differently. If you are trying to work out how best to help someone find something or do something, then an aggregate approach might be better. If you're honestly suggesting that A/B testing is not a valid way to iterate customer experiences, I think you are trying to boil the ocean: the evidence to support "test and learn" in all aspects of marketing is overwhelming, just as the evidence to support segmentation of different customer requirements is proven. Any target group of customers must be substantial, accessible, unique, appropriate and stable. If that can be achieved across a number of target groups then great: offer them something different. But when they all respond in broadly the same way, why bother? Neither of these scenarios suggests that your first attempt to target will be the best you can do. To improve you have to test and learn... or be extraordinarily lucky.

over 4 years ago

