Showing posts 11 - 13 of 13
1. Director at Quayside Clothing Limited

06 August 2008 21:07

Thanks Matt and TMGuys

I thought there would be a standard calculation/methodology. How wrong I was!

I asked the question on http://www.measuringusability.com/ and Jeff very kindly answered my question:

Question: I am doing A/B testing on an ecommerce web site. I have been working on a rule of thumb of 500 visitors through each page, or 50 conversions on one of the pages, before ending the test and deciding which is the best page. I assume there is a more scientific way of deciding when to end the test. How is this calculated?

So what you are doing is comparing two proportions, and there is indeed a more "scientific" way of determining whether there is a significant difference. If your sample is sufficiently large (roughly above 100) you can use the normal distribution to make inferences about the difference you observe. What you want to know is whether the difference you see between the two pages is greater than what would be expected by chance alone.

I'll walk through an example. Let's assume you observe 50/200 conversions in one version and 30/300 in another. The two proportions are then .25 and .10, and the difference is .15. Given your sample size of 500 (200 on one page and 300 on the other), can we conclude the difference of .15 is greater than chance?

You divide this difference by the square root of a term that accounts for chance; the result is a z-score. The term under the square root is

(1/n1 + 1/n2) * P*Q

Where P = (x1 + x2)/(n1 + n2) and Q = 1 - P. The x's are just the numbers of conversions and the n's are the sample sizes.

P = (50+30)/(200+300) = .16
Q = 1-.16 = .84
PQ = .84*.16 = .1344

1/n1 + 1/n2 = 1/200 + 1/300 = .0083

So multiply .1344 * .0083 = 0.00112

Now the square root of this is SQRT(0.00112) = .03346

So the z-score is the observed difference divided by this: .15/.03346 = 4.482

That result is the z-score, which is your test statistic. You now look this value up in a z-score-to-percentile calculator using the 2-sided area. You should get about 0.0000074, i.e. less than a 0.0007% chance that the difference is due to chance. With this data I'd conclude with a lot of confidence that the difference is statistically significant. The next question is: is a difference of 15 conversion percentage points good enough? The answer depends on what your goals are, but that sounds pretty good to me.
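The two-proportion z-test Jeff walks through can be sketched in a few lines of Python (standard library only; the function name is mine, not from the thread, and the only statistical assumption is the normal approximation, which needs reasonably large samples):

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two_sided_p) for conversions x1/n1 vs x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)            # pooled proportion P
    q = 1 - p                            # Q = 1 - P
    se = math.sqrt((1 / n1 + 1 / n2) * p * q)
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF, via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = two_proportion_z(50, 200, 30, 300)
print(z, p)   # z comes out around 4.48, p far below 0.05
```

Running it on the 50/200 vs 30/300 example reproduces the z-score of about 4.48 from the worked example above.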

2. Founder and President at Maxymiser

07 August 2008 11:11am

Hi mc33 – We have a test calculator as part of our technology which can give an estimate of test duration based on the number of variants, the visitor level and the existing conversion rate. I would be happy to run your figures through that for you. I don't think this forum allows private messages, so please use our web form at http://www.maxymiser.com/contact-us.htm and mark it FAO Alasdair, or give me a call on 0207 149 3730.

It varies a great deal depending on the relative success of the different variants, but as a very rough rule you need around 2,000 conversions/visitor actions per variant to reach a good confidence level for making a business decision.

Alasdair
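Alasdair's rough rule can be sanity-checked against the standard sample-size formula for comparing two proportions. A minimal sketch (the function and the illustrative numbers — 2% baseline conversion, 10% relative lift, 95% confidence, 80% power — are my assumptions, not Maxymiser's method):

```python
import math
from statistics import NormalDist

def visitors_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a shift from p1 to p2
    (two-sided test at significance alpha with the given power)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)     # ~1.96 for 95% confidence
    z_b = nd.inv_cdf(power)             # ~0.84 for 80% power
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * var / (p1 - p2) ** 2)

n = visitors_per_variant(0.02, 0.022)   # 2% baseline, 10% relative lift
print(n, n * 0.02)                      # visitors, then conversions per variant
```

At a ~2% baseline with a modest lift this lands in the tens of thousands of visitors, i.e. on the order of a couple of thousand conversions per variant, broadly in line with the rule of thumb above.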

3. Founding Partner & CEO at Essence

07 August 2008 12:10pm

Hi Michael,

As I said in my previous post, it is complicated!

What Jeff has described to you is a standard methodology.  Obviously it is a bit of a pain to calculate manually, but if you want the spreadsheet I'm still happy to give it to you.

I am reassured that Jeff can so readily pull up the details.  On the other hand I find it scary that there is so much debate on these issues without referring to the statistical basis for tests of this type.  Much of the debate here has been about 'rules of thumb' for estimating test duration etc.  If that was what you were after then you've got a heap of options already - none of them is perfect for the simple reason that there is no valid statistical basis for them!

However, the key thing to realise is that as soon as you start your test you should stop relying on your 'estimator' and use the proper statistical test to determine if you have reached your desired confidence level.

Now, while we've all been debating here you could have got your test live and be collecting data.  The real key to successful site testing/optimisation is:

1. New design ideas based on user insights/needs
What changes do you think will actually change user behaviour?
2. Proper test design
An A/B split test is pretty simple.  If you get into multi-variate/partial factorial it gets more complicated
3. Just do it
There's no better learning than to do some tests and see what happens.  You'll make a few mistakes but it will all be progress.  Never fear - if your test design makes things worse you can just switch it off and revert to your previous version.

Note: while this may sound like me encouraging you to get to step 3 rapidly, in reality your results will only be as good as your original ideas - i.e. if your ideas are poor you can run tests forever and you will still get no uplift.

But I can say from personal experience that uplifts of >30% on individual tests are perfectly achievable and sustainable.  And this is from seriously large scale tests (at Essence we run TalkTalk's site optimisation programme).

Good luck!

Matt
Essence - online marketing agency
