When Google Analytics doesn’t tell you the whole picture (and what you can do)

We’ve been investigating how and when the free version of Google Analytics uses data sampling to give you information on what’s happening on your website, and what that means for your data.

My colleague Adam put together his thoughts on the matter and I wanted to share these with you here.

Briefly, Google Analytics uses sampling in much the same way a research company does – to make assumptions about your site based on a sample of activity.

It only does this once the traffic on your site gets to a certain volume (around the million mark), or if you start changing the standard reports (creating segments, custom reports or secondary dimensions).

Mostly it does it for practical reasons: it takes an enormous amount of time and power to process high volumes of data, so analysing a sample that will still give you a true indication of what’s going on within your site is a good solution.

The way Google Analytics samples works well for most sites. If you have a fairly straightforward site with a relatively small number of sources and a regular flow of traffic, then Google Analytics reports will give you a very accurate picture of your site’s performance. That’s probably enough for a smaller site.

For very big sites however, especially those that have a lot of rapidly rotating content and an elaborate traffic profile, the sampling methods that standard GA uses may not be adequate when you start to use segments, custom reports and secondary dimensions, and can start to throw up some inaccurate results.

We pulled the data from one of our larger e-commerce clients to demonstrate how big a problem this can be (data has been anonymised).

[Image: Periscopix reworked numbers for Econsultancy sampling post]

The ‘s’ columns show the results from the sampled report; the ‘us’ columns show the results from the same report when it was unsampled. The really interesting parts are the % difference columns, which tell you the percentage by which the sampled reports were off.
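
As a rough illustration of how a % difference figure like this can be worked out, here is a minimal Python sketch. The function name and the numbers are hypothetical (they are not taken from the client data), and it assumes the difference is expressed relative to the unsampled value.

```python
def pct_difference(sampled, unsampled):
    """Percentage by which a sampled figure deviates from the unsampled one.

    Assumption: the unsampled figure is treated as the 'true' baseline.
    """
    if unsampled == 0:
        raise ValueError("unsampled value must be non-zero")
    return (sampled - unsampled) / unsampled * 100.0


# Hypothetical figures for a single traffic source (not from the client data).
example = pct_difference(sampled=1150, unsampled=1000)
print(f"Sampled report is off by {example:.1f}%")
```

A positive result means the sampled report over-reports the metric, and a negative one means it under-reports it.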

Have a quick look, and you can see that the sampling is doing a reasonable job for some traffic sources, but with others it’s all over the place.

If you’re playing around with Google Analytics and you notice your results changing like this when you start to customise reports, that’s often a pretty good sign that you’re experiencing the effects of heavy sampling in your account.

If Google Analytics ever needs to sample, there will be a yellow box in the top right of your Google Analytics report telling you so (it’ll say something like ‘this report is based on 249,385 visits, 13.46% of visits’).

If the percentage sampled is lower than 15%, it could be too small to give you a true picture of what’s happening on your site. You can change the size of the sample (but there’s still an upper limit on the free Google Analytics).
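
To make the arithmetic behind that notice concrete, here is a small Python sketch. It is only an illustration: the function names are made up for this example, and the 15% threshold is simply the rule of thumb mentioned above, not a setting in Google Analytics.

```python
def implied_total_visits(sample_visits, sample_pct):
    """Estimate total visits from the two figures in the sampling notice."""
    if not 0 < sample_pct <= 100:
        raise ValueError("sample_pct must be greater than 0 and at most 100")
    return round(sample_visits / (sample_pct / 100.0))


def sample_too_small(sample_pct, threshold_pct=15.0):
    """Apply the rule of thumb: below the threshold, treat the report with caution."""
    return sample_pct < threshold_pct


# Figures from the example notice: 249,385 visits, 13.46% of visits.
total = implied_total_visits(249385, 13.46)
print(f"Implied total visits: about {total:,}")
print("Sample below 15%:", sample_too_small(13.46))
```

With the figures from the example notice, the implied total is roughly 1.85 million visits, and the sample falls below the 15% mark.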

In some ways, sampling is a nice problem to have – it means that your site is seeing high volumes of traffic. But if you need a more accurate picture there are two basic things you can do to improve the accuracy of your reports: invest in an industrial strength analytics package (like Google Analytics Premium) or use one of these workarounds:

  • Use only standard reports. GA doesn’t start to sample until you modify reports with segments or secondary dimensions, or use custom reporting, so if you can get away with only using standard reports, go for it.
  • If you need to look at just a slice of your data, create a filtered profile (for example, to show ‘email only’ traffic). This will show you unsampled data. But of course, you will need to pre-empt your need and set the profile up proactively.
  • If your traffic is fairly consistent, try looking at a smaller date range. The smaller the range, the more data within that range will be analysed. But be careful that you don’t pick such a small range that you miss things like seasonal trends. You can use this method to pull the data out into a spreadsheet and piece the data from multiple periods together, though you’ll still need to be careful with this because you’ll need to recalculate calculated metrics and your unique visitor counts won’t add up properly.
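
If you do piece several shorter date ranges together in a spreadsheet or script, the recalculation mentioned in the last workaround might look something like the Python sketch below. The class, field names and figures are all hypothetical; the point is only that calculated metrics such as conversion rate should be rebuilt from the summed totals rather than averaged across periods.

```python
from dataclasses import dataclass


@dataclass
class PeriodReport:
    """Totals exported for one short date range (hypothetical structure)."""
    visits: int
    transactions: int
    revenue: float


def combine(periods):
    """Sum the additive metrics, then recalculate the calculated ones.

    Unique visitors are deliberately left out: the same person can appear
    in several periods, so per-period counts cannot simply be added up.
    """
    visits = sum(p.visits for p in periods)
    transactions = sum(p.transactions for p in periods)
    revenue = sum(p.revenue for p in periods)
    return {
        "visits": visits,
        "transactions": transactions,
        "revenue": revenue,
        "conversion_rate_pct": 100.0 * transactions / visits if visits else 0.0,
        "avg_order_value": revenue / transactions if transactions else 0.0,
    }


# Two hypothetical weekly exports with very different traffic levels.
weeks = [PeriodReport(1000, 20, 800.0), PeriodReport(3000, 30, 1500.0)]
combined = combine(weeks)
print(combined)
```

In this made-up example the two weekly conversion rates are 2% and 1%, so a simple average would give 1.5%, whereas the rate recalculated from the combined totals is 1.25%.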

If none of those things work for you, you should probably upgrade to something like Google Analytics Premium. The most common reason among our clients for using Google Analytics Premium is to get more accurate data for big or complex sites (particularly e-commerce companies).

It gives you faster data processing (in practical terms you get reports every four hours instead of every 24), and has a data sampling limit around 200 times higher than that of Google Analytics (at the moment; that’s likely to increase), enough for a large e-commerce site.

Google Analytics is a great tool for small to medium websites and even many large sites operating in very stable environments, but a paid tool will give you the flexibility and speed required to gain a more accurate view of a site with heavier-volume data in fast-moving markets.

Ben Gott is Head of Web Analytics at Periscopix and a guest blogger on Econsultancy.

Reader comments (9)

  1. Nicholas Redding

    Senior Web Analyst at Hargreaves Lansdown

    10:01AM on 21st November 2012

    Hi Ben

    That's a very interesting article, and it's good to see this limitation of the free Google Analytics highlighted. It can be a real problem when comparing small segments on a high-traffic site. Often the margin of error from sampling is greater than the difference shown between segments. Not ideal when you're comparing conversion rates!

    One thing, though (and I'd love to be corrected on this). My understanding is that data sampling happens on the full data set, before any profile filters are applied. So I don't think setting up a filtered profile helps.

    Nick

  2. Ben Gott

    Head of Web Analytics at www.periscopix.co.uk

    9:36PM on 21st November 2012

    Hi Nick,

    Thanks for the comment.

    In a sense, you are correct. Sampling happens at the web property (not account) level so if you customise a report by adding a segment, filter etc the resulting dataset will be sampled based on the data at the web property level.

    The aim of creating profiles is to eradicate the need for customising reports by pre-empting that need and using a pre-aggregated profile instead of custom report.

    For example, if you commonly apply a segment to view mobile only traffic you could create a profile which only contains mobile traffic. Thereby removing the need to apply a segment and invoke sampling.

    Of course it is impossible to anticipate every need like this so it can only be applicable to those customisations you use regularly.

    I hope that helps?

    Ben

  3. Angelo Artuso

    10:18AM on 22nd November 2012

    Actually, you are trying to measure a lot of sources (variables) using a sample, so it's quite obvious that the results are inaccurate.
    GA samples, let's say, the first xxx rows of data, or rather a random xxx rows of data collected in a period, so the result it displays relates to that sample.
    If you measure two variables on a sample of 1,000 rows, the result will obviously be more accurate than if you measure ten variables, because with ten the sample per variable is five times smaller.
    The deeper you try to take the analysis, the more inaccurate it will be.
    It seems an oxymoron, but it is sadly true.

  4. Frederic Abrard

    11:31AM on 22nd November 2012

    This is a very interesting comment. The limits of traffic sampling are easily reached when you have traffic and conversions that change with the time of day or the location, for example. Depending on the reports you run, you could indeed easily mix carrots and apples.
    There are alternatives that provide analysis on all the data. At CANDDi we provide real-time analytics that go down to the individual visitor level and allow full analysis on unsampled data, in addition to real-time intervention on the website based on the visitor.

  5. Ander Jáuregui

    5:09PM on 22nd November 2012

    Great Post...

    Also, you should check that the e-commerce tracking code is correct and appears on every single "success" page, and verify that the time zones of your CRM and GA are the same... among a thousand other factors that can cause these differences... and you must know that GA will improve the sampling %

  6. Ulrik Sandholt

    3:55PM on 23rd November 2012

    Aren't unsampled reports only available as a downloadable CSV file in Premium, and not in the interface!?

  7. Anna Lewis

    Digital Marketing Executive at Koozai

    5:12PM on 23rd November 2012

    I've been looking at this issue this week, and it's very useful to see your direct comparison between full and sampled data. I think it's a shame that some data gets so heavily sampled, but how else are Google going to sell their Premium product than by removing data from their free one? We can't expect everything for free and I'm quite excited about Premium.

  8. Marc Pearson

    9:19PM on 25th November 2012

    I think Google holds a lot of information back from us quite frequently. If they didn't they wouldn't be as powerful as they are. I tend to use Omniture whenever Google Analytics doesn't give me the information I need.

  9. Ben Gott

    Head of Web Analytics at www.periscopix.co.uk

    3:39PM on 27th November 2012

    @Ulrik Yes as things stand that is the case.

    @Anna, it's a common misconception that the sampling is a new thing post-premium. However it has been there for ages. Google have just made it more obvious when sampling is invoked so more people notice it.

    @Marc - thanks for the comment, I'm always looking for examples of areas other tools win on. Care to share any of the uses you have for Omniture over GA?
