1. Allison Phillips

    Digital Trust

    27 July 2006 15:34pm

    In a practical world of revenue targets, it can be difficult to slow down long enough to run a test and then carefully analyze your results. Understanding the validity of your data helps you make decisions quickly and see what a test is really telling you.

    Simply put, the larger the difference between two results, the smaller the sample size you need to achieve a strong degree of confidence.
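To put rough numbers on that rule of thumb, here is a minimal sketch using the standard textbook sample-size approximation for comparing two conversion rates. The function name `visitors_per_page`, the 5% significance level, the 80% power, and the example rates are all assumptions chosen for illustration; none of them come from the tool mentioned in this thread.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_page(p1, p2, alpha=0.05, power=0.80):
    """Approximate visitors needed per page to detect p1 vs p2.

    Uses the usual normal-approximation formula for a two-sided
    comparison of two proportions (alpha and power are assumptions).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical conversion rates: a small gap versus a large gap.
small_gap = visitors_per_page(0.010, 0.012)
large_gap = visitors_per_page(0.010, 0.050)
print(small_gap, large_gap)
```

The second figure comes out far smaller than the first: a wide gap between two pages shows up with far fewer visitors than a narrow one.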

    Imagine these are the results of a fictitious landing page optimization test:

    | Treatment      | Unique Visits | Leads | Conversion |
    |----------------|---------------|-------|------------|
    | Landing Page A | 4,203         | 32    | 0.76%      |
    | Landing Page B | 3,454         | 534   | 15.46%     |

    In this particular example, the difference in the number of leads is large. Using our intuition, we can see that Landing Page B outperformed Landing Page A. However, Landing Page A produced only 32 leads, which is still a relatively small count, so there is considerable room for sampling error. Working out the statistical validity of a given data sample by hand involves some fairly complex formulas.
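One common way to check figures like those for Landing Page A and B above is a two-proportion z-test. The sketch below is only an illustration of that general technique, not the method used by the tool linked in this post, and the function name `two_proportion_z_test` is made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(leads_a, visits_a, leads_b, visits_b):
    """Return (z, two-sided p-value) for the difference in conversion rate."""
    rate_a = leads_a / visits_a
    rate_b = leads_b / visits_b
    # Pooled rate under the assumption that both pages convert equally.
    pooled = (leads_a + leads_b) / (visits_a + visits_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Figures from the fictitious landing page test.
z, p_value = two_proportion_z_test(32, 4203, 534, 3454)
print(f"z = {z:.1f}, two-sided p-value = {p_value:.3g}")
```

For these numbers the z score is above 20, so the gap is far too large to put down to sampling error, even with only 32 leads on Landing Page A. With closer results the same check would be much less clear-cut.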

    For a free tool for calculating validity, go to www.marketingexperiments.com/validity.html

  2. Bryan James

    DITIG Inc

    27 July 2006 15:42pm

    Yeah, Marketingexperiments.com is freaking awesome. This is another one of those cool things they give away for free. Why they give it away for free, who knows? They have free clinics and they even offer a course in online testing, like an online college class. I definitely recommend them to anyone who wants to know anything about online testing and marketing.

  3. dan barker

    E-Business Consultant at Dan Barker

    28 July 2006 11:06am

    A very simple way to enhance the reliability of A/B testing is to switch to A/B/A testing.

    As a simplistic example: you send an email to 999 people. Instead of splitting that group into 2, you split it into 3 groups of 333 (group A1, group B, group A2). You use one subject line for both of the A groups, and a different subject line for the B group. This might result in something like the following:

    group A1: 50 emails opened

    group B: 150 emails opened 

    At this stage, it looks like the 'B' subject line has totally outperformed the 'A' subject line. But can you be sure that the results are purely caused by the subject line? Perhaps the email took 3 hours to send, and the 'B' group all got their emails a little later than the 'A1' group. Perhaps the email broadcast engine plucked the 'A1' group from the start of your database (i.e. they're older email addresses). Perhaps the 'B' group just coincidentally contained a large group of your best clients. The results of 'A2' can help to answer that question:

    • Say group 'A2' resulted in 200 emails opened. The same subject line produced 50 opens in one group and 200 in another, so you could conclude that the subject lines themselves made little difference, and that your test wasn't reliable.
    • Say group 'A2' resulted in 100 emails opened. You could conclude that subject B had outperformed subject A, but that the test wasn't totally reliable.
    • Or, best case scenario, say group 'A2' resulted in 50 emails opened. You can be fairly happy that the test was reliable, and that subject B had drastically outperformed subject A.

    The system is by no means bulletproof, but it's a really simple add-on to A/B testing that can tell you a lot about the reliability of your results, and can quickly highlight any large problems with the way you're carrying out tests.
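The A/B/A reasoning above can be sketched in code. This is one possible way to formalise it, assuming a two-proportion z-test and a 5% threshold; the function names `p_value` and `aba_check` and the choice of test are assumptions added for illustration, not part of the original suggestion.

```python
from math import sqrt
from statistics import NormalDist


def p_value(opens_x, sent_x, opens_y, sent_y):
    """Two-sided p-value for the difference between two open rates."""
    pooled = (opens_x + opens_y) / (sent_x + sent_y)
    std_err = sqrt(pooled * (1 - pooled) * (1 / sent_x + 1 / sent_y))
    z = (opens_x / sent_x - opens_y / sent_y) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def aba_check(opens_a1, opens_b, opens_a2, group_size=333, alpha=0.05):
    """Return (controls_agree, b_differs) for an A/B/A split.

    controls_agree: A1 and A2 (same subject line) are statistically alike.
    b_differs: B differs from the two A groups combined.
    """
    controls_agree = p_value(opens_a1, group_size, opens_a2, group_size) >= alpha
    b_differs = (
        p_value(opens_a1 + opens_a2, 2 * group_size, opens_b, group_size) < alpha
    )
    return controls_agree, b_differs


print(aba_check(50, 150, 50))   # best case from the list above
print(aba_check(50, 150, 200))  # A1 and A2 disagree
```

When the first value is `False`, the two identical groups disagree with each other, so whatever the second value says about subject line B should be treated with suspicion.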
