Multivariate tests, whilst marvellous things, are becoming "quick and dirty". The ease of deployment, WYSIWYG variant creation, and on-demand "live" results mean that these supposedly scientific tests are being created, executed and reported on in a fashion at odds with their scientific underpinnings.

In this post, I'll try to go through what makes MVT a scientific methodology, the pitfalls of quick testing, and how to get the best out of your tests.

### Multivariate testing is all about Maths

Your website is a mathematical model. Run with me on this analogy; it holds up, I promise.

You have a series of variables, X1, X2, X3, X9million, which denote things on your website that can change. Bits of text, colours, positions, whether a picture of a person is smiling or not, and so on.
So you know that some function of X1, X2, X3 and so on equals Y1, your conversion rate.

Multivariate testing comes from a field of statistics known as multivariate analysis, designed to evaluate the importance of a variable on a result, so that you can get from a set of hundreds of variables in your model to just a handful. MVT goes one step further: not only can we measure the significance of a variable, we can also test multiple values of it.
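
To make the "function of X1, X2, X3" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the three variables, their values and the visitor and conversion counts are invented, not real results. It pools the conversion rate for each value of each variable, which is a crude way of seeing which variables matter and which you could drop from the model.

```python
from collections import defaultdict

# Hypothetical results: (headline, button, image) -> (visitors, conversions).
# All names and numbers are made up for illustration.
RESULTS = {
    ("short", "Register", "smiling"): (1000, 30),
    ("short", "Register", "neutral"): (1000, 28),
    ("short", "Continue", "smiling"): (1000, 36),
    ("short", "Continue", "neutral"): (1000, 34),
    ("long", "Register", "smiling"): (1000, 31),
    ("long", "Register", "neutral"): (1000, 27),
    ("long", "Continue", "smiling"): (1000, 37),
    ("long", "Continue", "neutral"): (1000, 33),
}
VARIABLES = ("headline", "button", "image")


def main_effects(results, variables):
    """Return {variable: {value: pooled conversion rate}}."""
    # For every variable, keep running [visitors, conversions] per value.
    totals = {name: defaultdict(lambda: [0, 0]) for name in variables}
    for combo, (visitors, conversions) in results.items():
        for name, value in zip(variables, combo):
            totals[name][value][0] += visitors
            totals[name][value][1] += conversions
    return {
        name: {value: conv / vis for value, (vis, conv) in values.items()}
        for name, values in totals.items()
    }


effects = main_effects(RESULTS, VARIABLES)
for name, values in effects.items():
    print(name, {value: round(rate, 4) for value, rate in values.items()})
```

In this toy data the headline makes no difference at all, while the button text moves the pooled rate from 2.9% to 3.5%, so the headline would be the first variable to drop. Whether a difference like that is real or just noise is a separate question, and the rest of this post is mostly about that.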

That's the most boring maths-y part of the post and I promise it won't get more maths-y than that. It might get more boring, but that's up to you.

You should proceed as if it were a scientific test. When you are going to create your next MVT, sit down and write out a specification document, which should cover:

Introduction

• Why are you testing?
• What tests have been conducted previously that have relevance to this one?
• What is the objective of the test?
• What are your success criteria - how will the winner be declared?

Method

• The tool that you're using.
• The variants created, and their reason for creation.
• Any segmentations and limiters in place.

Results

• The time the test took to run.
• The ultimate sample size.
• The measured rate of required activity.
• The confidence level you will accept.

Discussion

• What was your "hunch" before running the test?
• Did the test agree with your hunch?
• Why do you think the winning variant won?
• How could the winning variant be improved further?
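
If you'd rather keep the specification next to the test than in a document nobody opens, the headings above map onto a simple record. This is only a sketch: the class name `MvtSpec`, its field names and the 0.95 default are my own assumptions, not part of any MVT tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MvtSpec:
    """Hypothetical record mirroring the specification headings above."""

    # Introduction
    reason: str
    objective: str
    success_criteria: str
    previous_tests: list = field(default_factory=list)
    # Method
    tool: str = ""
    variants: dict = field(default_factory=dict)  # variant name -> why it exists
    segments: list = field(default_factory=list)
    # Results: unknown until the test has finished
    days_run: Optional[int] = None
    sample_size: Optional[int] = None
    measured_rate: Optional[float] = None
    confidence_level: float = 0.95  # assumed default, pick your own
    # Discussion
    hunch: str = ""
    notes: str = ""

    def is_ready_to_run(self) -> bool:
        """True when everything you should know beforehand is written down."""
        return bool(
            self.reason and self.objective and self.success_criteria
            and self.tool and self.variants and self.hunch
        )


spec = MvtSpec(
    reason="Drop-off on the registration step looks high",
    objective="Increase completed registrations",
    success_criteria="Beats control at the agreed confidence level",
    tool="(your MVT tool)",
    variants={"button_continue": "'Register' may sound like a commitment"},
    hunch="'Continue' will beat 'Register'",
)
print(spec.is_ready_to_run())
```

The point of `is_ready_to_run` is simply to stop you launching a test before you've written down why you're running it and what you expect to happen.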

Done? Great! You're ready to run your test. However, before you do, bear in mind...

(be warned, I rant a bit here)

### There are always external factors

In any test, there are always external factors which will affect your results. Your tests aren't being performed in isolation: you have marketing campaigns, PR, new content, sales and promotions, all of which will draw different types of visitors to your site.

If you have a content variant that lends itself to a particular visitor type, your results will be skewed by it. Even by performing the test itself, with download times and extra JavaScript, you're going to be affecting the results.

Yes, it's just like the Heisenberg Uncertainty Principle, only for like websites and stuff.

You can't do anything about this; your marketing team is not going to down tools for a week whilst you run a test.

You can partially protect yourself (if your MVT tool allows) by segmenting your test on entries from a particular keyword or campaign, or cookie value if you've performed an RFM segmentation or similar previously, but still you won't get the full picture.

This might sound like there's no point doing Multivariate Testing, since you can never be completely confident that something is working or not. I'm not saying that, of course, but this makes it clear that whatever MVT tool you're using, it doesn't know all the facts.

### Interpretation is always needed

Since your multivariate testing tool doesn't know everything, whatever result it gives you requires some further analysis. You just can't take it as read that a particular piece of content performs well exactly on its own.

Sometimes you do have to put your cod psychology hat on and think about what it was about that particular variation that made it work. Was there any merchandising near it that complemented it in terms of tone of voice or imagery, for example?

### You're either confident, or you're not

So, you're running your test, and hey, within an hour, you've got a 2000% increase in conversion because you've changed the word 'Register' to 'Continue'. Hooray, let's all go home for tea and cucumber sandwiches!

Poppycock.

Multivariate tests take time, they really do. If you're going to be thorough, and run a full factorial test (running every content variant against every other content variant), then you're in for the long haul.
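
To see why, count the combinations. The sketch below assumes a made-up test with three page elements; the element names and variants are purely illustrative. A full factorial test needs one page version for every possible combination, so the total is the product of the variant counts.

```python
from itertools import product

# Hypothetical test: each page element and the variants you want to try.
ELEMENTS = {
    "headline": ["control", "benefit-led", "question"],
    "button_text": ["Register", "Continue"],
    "hero_image": ["smiling", "neutral", "product only"],
}


def full_factorial(elements):
    """Return every combination of variants as a list of dicts."""
    names = list(elements)
    return [dict(zip(names, combo)) for combo in product(*elements.values())]


combinations = full_factorial(ELEMENTS)
print(len(combinations))  # 3 * 2 * 3 = 18
print(combinations[0])
```

Three modest-looking elements already give you 18 page versions, and your traffic has to be split between all of them. Add a fourth element with three variants and you're at 54.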

Google provides a handy MV test duration calculator. Be prepared to sigh.
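
If you want a rough feel for the numbers before you open a calculator, here is a back-of-the-envelope sketch. To be clear, this is not how Google's calculator works (I don't know its formula); it uses a standard textbook approximation for comparing two conversion rates, and the function names, the 5% significance level and the 80% power are all my assumptions. It also ignores the extra caution you need when comparing many combinations at once, so treat the answer as optimistic.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_combination(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per combination to detect the given lift."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def estimated_days(combinations, daily_visitors, base_rate, relative_lift):
    """Days needed if traffic is split evenly across all combinations."""
    per_combination = visitors_per_combination(base_rate, relative_lift)
    return ceil(per_combination * combinations / daily_visitors)


# Assumed inputs: 18 combinations, 5,000 visitors a day entering the test,
# a 3% conversion rate, and a hoped-for 10% relative lift.
days = estimated_days(combinations=18, daily_visitors=5000,
                      base_rate=0.03, relative_lift=0.10)
print(days)
```

With those invented inputs the answer comes out at several months, not several hours. Halve the number of combinations or aim to detect a bigger lift and the figure drops quickly, which is a good argument for testing fewer, bolder variants.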

The primary danger with multivariate testing is that you're often shown results live. Some testing engines will even call a winner when they think they have statistical significance with 50 or so conversions.

50?! A sample size of 50! You can't even make a spurious claim in a TV ad for hairspray with a sample size of 50!

Just like a hairspray ad, volume is everything.
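
Here's a quick way to check that for yourself. The sketch below runs an ordinary pooled two-proportion z-test, which may or may not be what your testing tool uses, on invented figures: 50 conversions split 20 to 30 between control and variant, then the same conversion rates with ten times the traffic.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# 50 conversions in total: 20 for control, 30 for the variant (a "50% lift").
p_small = two_proportion_p_value(20, 1000, 30, 1000)
# The same conversion rates with ten times the traffic.
p_large = two_proportion_p_value(200, 10000, 300, 10000)

print(f"50 conversions:  p = {p_small:.3f}")
print(f"500 conversions: p = {p_large:.6f}")
```

With 50 conversions, an apparent 50% lift gives a p-value of roughly 0.15, nowhere near the usual 0.05 threshold. The same rates at ten times the volume are overwhelmingly significant. Same "result", completely different level of confidence.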

### One test is rarely enough

When you look at your analytics, you can always see browsing and purchasing patterns. That peak just after payday, that trough when it was sunny that weekend - remember these apply to your test as well.

In any scientific test, no results are said to be conclusive unless they can be externally recreated and ratified. Now of course, you can't run your test on a different website, but you can run it at a different time, which will bring with it a different set of conditions (see external factors above). If you're to be sure that a content variant will outperform all others, you need to run the exact same test again.

In fact, you should take this as an opportunity to run a follow-up test, against the original control content, refining the “winning” content with further variants.

### Most magic bullets don't exist

We’ve read fantastic stories about how changing one word massively increased the conversion rate. Look! It's the Three Hundred Million Dollar Button!

But these examples are the anomalies. Don’t expect to see the same results; manage your expectations. Huge increases in conversion rate rarely come from a single change, but from a larger, transformational event that encompasses not only the site, but also your marketing and merchandising.

This is why MVT is called optimisation: you are using it to finely tweak an already working design. Obviously, you can test out new design ideas, but remember that they might not work as well within the context of the larger site.

### Not every test will work

You're certain that the particular design you've created is going to be the winner. It has to be. It's so obvious…

…but it doesn't happen

Sometimes, if you're lucky, it will impact conversion negatively, which will give you something to further analyse. In the worst cases, it does diddly.

It's quite depressing when it happens but it does. Be prepared for it.

We tend to treat our website users like children, thinking they will blindly follow every link and piece of microcopy to the letter. However, content and usability folks will be the first to admit that most of the text on a website doesn't get read. When it's skimmed, the visitors will infer the meaning, so making some small changes to a piece of microcopy isn't going to change the world for you.

It all sounds pretty negative, huh? Don't be disheartened! Multivariate Testing is a great tool, but before you start posting amazing results all around the internet, make sure those results stand up to further analysis!

Matt Curry is Head of E-commerce for online sex toy retailer LoveHoney. He spends a lot of time working on user experience and customer satisfaction is his highest priority. He frequently has to be penetration tested. You can follow him on Twitter, although he does often talk about dildos. He also has a LinkedIn profile, where he has to act professional.

1. Garious

2:44PM on 20th October 2010

I've been doing multivariant tests myself and I agree with you; you should not treat readers like children. You may get away with a word or two but just because they clicked doesn't mean they will convert into buying customers. Still, it's all a trial and error thing out there and the sooner you fail, the better - so you can get more tests running until you get the right mix. Thanks for the lovely advice here.

2. Tim Watson

Founder at Zettasphere

8:40PM on 20th October 2010

All good points. To the success criteria question ("how will the winner be declared") I'd add: make the success metric as close as possible to your marketing objective. For example, a variant with a higher page bounce rate may still give more conversions to the real page objective.

3. Chris Rourke

Managing Director at User Vision

7:39AM on 21st October 2010

Thanks for pointing out some of the benefits, limits and pitfalls of MV testing. In my experience it can be useful and is attractive to management that wants numbers and assumes numbers = The Truth, as opposed to fluffy opinions that need substantiating. However, there are two questions that typically can't be answered with confidence after a test: why? and what could have been better? MVT is typically better at helping find errors of commission than errors of omission, i.e. it can indicate what visible thing on your page needed to be improved, but less so on the question "what else could we do to help convince you / remove your fear factors / build trust / make it look better" etc., and many other questions that help you understand your customers.

Those things are best brought out in a good old chat with end users - certainly not in the volume needed for MV testing, but with a smaller, representative sample such as for a well-run usability test. The other main drawback is that it needs to be done on a live site to have the volume needed. As is commonly accepted, it is far better to learn your lessons before the launch with prototypes than to go live without any user input and fix after. Some think it is better to simply put something up live then tweak the heck out of it through various A/B & MV tests until it is perfect. Remember, each of those who had a bad experience on version A is unlikely to return to see version B later on, so learn the lessons before where possible. So is MV testing useful? Yes, especially for refining and doing the icing on the cake of a mostly complete site. Is it a magic bullet? No, and certainly not if you want to know why and really understand your customers.

4. Dan Huddart

Head of Analytics & Web Development at RSA Group

2:45PM on 26th October 2010

In regard to the external factors, there is something you can do to isolate them.

Implementing a control group (where you just serve up the existing default web pages) gives a static base against which to measure each tested variant. With a sufficient sample size, this can enable statistically sound results.

I agree that MVT is better suited to an optimisation approach, and have found the greatest value is to be gained from testing alterations to the existing pages before deciding whether to make a permanent change. With a well managed MVT programme, you can ensure that no change you make will ever have a negative impact on conversion.

5. Nate

4:21PM on 27th October 2010

Playing off what Dan Huddart says, you can take a control and an experimental group and change multiple variables within the experimental group simultaneously. After an amount of time determined by sample variance and size (compared to population), one "only" needs to fit the experimental variables by estimation and use the coefficients on the estimated model to determine what to do. I'd also caution that plenty of websites have 300 million dollar buttons, in the sense that if one were to remove them, the business's revenue would drop 300 million dollars (that final button to submit payment comes to mind...). I'd also caution that a sample size of 50 might be fine, provided it produces unanimous results. The trouble is, it always seems like interesting questions usually don't result in consistent answers. In my mind experimental setup, hypothesis testing and multivariate estimation are all clearly definable and delineated, but "Multivariate Testing" always seems to be a hazy combination of these things. I think, probably, referring to "Multivariate Testing" is something that we should only ever do in the context of a specific problem. All in all, love a conversation that makes me think!