More than 60% of website tests produce false positives because of flawed testing methodology, according to a report from Qubit.
The research flagged a number of common causes of misleading results, including failing to form a sensible hypothesis before testing, failing to calculate the required sample size, and abandoning tests as soon as a result appears rather than running them to completion.
Simon Jaffery, Qubit senior business and financial analyst, said: "There is no rigid framework for testing in place. There has been a discussion around science versus art for years – when it comes to website design it used to be art, but it has become more about science with the advent of data, but testing is still a black hole."
Qubit simulated the outcomes of four common testing strategies across a sample of 10,000 experiments to compile the research. The tests were carried out on a purpose-built website run by a fictional character called Mr Bean, which uses Google Analytics, converts 5% of visitors, and receives 5,000 visits a day.
For simplicity, the tests targeted 50% of all users, used conversion rate as the end metric, and applied a test of proportions to calculate the p-value: the probability of observing a result at least as extreme when there is no true difference between the variants.
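For illustration, the kind of proportions test described might be sketched as follows. The traffic split, baseline conversion rate, and function name are assumptions based on the article's hypothetical site, not code from the report:

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference between two conversion rates,
    using a pooled z-test of proportions."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Probability of a result at least this extreme if there is
    # no true difference between control and variant.
    return math.erfc(abs(z) / math.sqrt(2))

# One day's traffic on the hypothetical site: 50% of 5,000 visits enter
# the test, split evenly, with a roughly 5% baseline conversion rate.
print(two_proportion_p_value(conv_a=62, n_a=1250, conv_b=70, n_b=1250))
```

A large p-value here (around 0.47) means a day's data like this gives no grounds to declare a winner.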
According to Jaffery, many companies run too many tests at once and check for results daily, abandoning a test as soon as a positive result appears rather than letting it run its course, which can lead to the wrong changes being made to the site.
The simulation showed that in this scenario 63% of positive test results had no positive effect when implemented on the site.
Jaffery said: "It's important to make sure tests run to completion before any conclusions are made."
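The effect of daily "peeking" can be reproduced with a simple simulation. The sketch below runs repeated A/A tests, where no true difference exists, and compares a tester who stops at the first significant daily result against one who waits until the planned end. All figures are illustrative assumptions, not Qubit's simulation code:

```python
import math
import random

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def daily_conversions(visitors, rate):
    # Normal approximation to a binomial count; adequate for a sketch.
    mean, sd = visitors * rate, math.sqrt(visitors * rate * (1 - rate))
    return max(0, round(random.gauss(mean, sd)))

random.seed(1)
DAYS, DAILY, RATE, TRIALS = 30, 1250, 0.05, 2000  # A/A test: no true difference
peeked = completed = 0
for _ in range(TRIALS):
    ca = cb = n = 0
    declared = False
    for _ in range(DAYS):
        ca += daily_conversions(DAILY, RATE)
        cb += daily_conversions(DAILY, RATE)
        n += DAILY
        if not declared and p_value(ca, n, cb, n) < 0.05:
            peeked += 1          # daily peeker stops and declares a "winner"
            declared = True
    if p_value(ca, n, cb, n) < 0.05:
        completed += 1           # single check at the planned end of the test
print(f"False positives with daily peeking: {peeked / TRIALS:.0%}")
print(f"False positives at completion:      {completed / TRIALS:.0%}")
```

With thirty daily checks, the peeking strategy declares a false winner several times more often than the 5% rate the significance threshold is supposed to guarantee.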
Another simulation highlighted the importance of calculating the required sample size before a test begins, without which genuine positive results will be missed.
The simulated test showed that only four out of ten positive results actually created an uplift when implemented on the site, and only 64% of genuinely positive results were identified.
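For a sense of what an up-front sample size calculation involves, the standard power formula for comparing two proportions can be applied to the hypothetical site's numbers. The 10% uplift target, significance level, and power below are assumed for illustration and do not come from the report:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(base_rate, relative_uplift, alpha=0.05, power=0.80):
    """Visitors needed in each variant to detect a relative uplift in
    conversion rate at the given significance level and statistical power."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# Detecting a 10% relative uplift on a 5% baseline (assumed figures):
n = sample_size_per_variant(0.05, 0.10)
days = math.ceil(2 * n / 2500)  # 50% of 5,000 daily visits, split two ways
print(f"{n:,} visitors per variant, i.e. roughly {days} days of traffic")
```

On these assumptions the test needs around 31,000 visitors per variant, or roughly 25 days of traffic, before any conclusion should be drawn.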
It is also vital to formulate a sensible hypothesis before conducting a test, which according to Jaffery reduces the number of results wrongly labelled as having a positive effect.
He said a good starting point is to gather user feedback and customer opinion, then single out where the problem occurs and the behavioural traits of those affected by it. When these three points were taken into account, more than 95% of positive results in the simulated test had a positive effect when implemented on the site.