1. Chris Baker

    Freelance Project Manager at Chris Baker FPM

    31 March 2004 17:10pm

    I've recently come across sites publishing statistics involving "number of distinct hosts", but have found it hard to find a good definition of how this is measured.

    I gather that it is a count of IP addresses, perhaps with some jiggery-pokery to deal with cached pages, and folks who all have the same IP address.

    If anyone could give me the low-down on how this measurement is made and in what ways it is likely to over- or under-count the number of actual people viewing the site, I'd be most grateful.

    Thanks

  2. Dan Zambonini

    Technical Director at Box UK

    31 March 2004 17:27pm

    From: http://www.mangonet.com/mango/faq/stats_info.html

    "This is the total number of distinct machines which have requested files from the server. This is not the total number of distinct people who have visited the site. A large ISP may have hundreds of people who log into one machine. If all these people access a given web site, then only one distinct host will be reported by this statistic since all the request came from one machine."

    In other words, it is the number of distinct IP addresses that have visited your site in a given time. This will (potentially) be an under-count of the total number of unique visitors. For example, in a shared facility (university, library, etc.), each computer will probably have a distinct IP address, but if 20 people visit your site from the same machine, they will only register as 1 distinct host.
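
    A minimal sketch of how such a count could be made, assuming an access log in which each line starts with the client address (as in Common Log Format). The function name and the sample lines are made up for illustration; real stats packages may apply extra rules.

```python
def count_distinct_hosts(log_lines):
    """Count distinct client addresses in access-log lines.

    Assumption: the first whitespace-separated field of each line is the
    client host or IP address, as in Common Log Format.
    """
    hosts = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        hosts.add(line.split()[0])
    return len(hosts)


# Hypothetical sample: four requests, but only two source addresses.
sample_log = [
    '192.0.2.10 - - [31/Mar/2004:17:10:00 +0100] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.10 - - [31/Mar/2004:17:10:05 +0100] "GET /logo.gif HTTP/1.0" 200 2326',
    '198.51.100.7 - - [31/Mar/2004:17:12:41 +0100] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.10 - - [31/Mar/2004:17:30:12 +0100] "GET /about.html HTTP/1.0" 200 512',
]

print(count_distinct_hosts(sample_log))
```

    However many different people sat at the machine with address 192.0.2.10, it contributes exactly one distinct host to the total.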

    Also, depending on the type of network address translation (NAT) or proxying being used, a shared internet connection can send out all its requests from the same IP address. For example, a large corporation's office in London could have one fat pipe of bandwidth running into it. For proxying reasons, or otherwise, all requests from all internal computers go through an internet connection sharing system. This will, potentially, forward the requests from its own IP address, get the response back, then pass it on to the original requester. In this case, again, the same IP address (same distinct host) will be registered for all requests, causing an under-count in your stats.
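
    The shared-connection effect can be sketched as a small simulation. The visitor names, addresses and the `hosts_seen_by_server` helper are all hypothetical; the only assumption carried over from the explanation above is that the server logs the gateway's address rather than the internal one.

```python
def hosts_seen_by_server(visitors):
    """Return the set of source addresses the web server would log.

    visitors: list of (person, internal_ip, gateway_ip) tuples, where
    gateway_ip is None when the person connects directly.
    """
    seen = set()
    for _person, internal_ip, gateway_ip in visitors:
        # Behind a shared gateway, the server only ever sees the gateway.
        seen.add(gateway_ip if gateway_ip is not None else internal_ip)
    return seen


office_gateway = "203.0.113.1"  # hypothetical office gateway address
visitors = [
    ("alice", "10.0.0.11", office_gateway),
    ("bob", "10.0.0.12", office_gateway),
    ("carol", "10.0.0.13", office_gateway),
    ("dave", "198.51.100.7", None),  # home user, direct connection
]

people = {person for person, _, _ in visitors}
hosts = hosts_seen_by_server(visitors)
print(len(people), "people,", len(hosts), "distinct hosts")
```

    Here three office users collapse into one distinct host, so four people are reported as two.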

  3. Chris Baker

    Freelance Project Manager at Chris Baker FPM

    01 April 2004 09:17am

    Many thanks, that's really helpful. And I notice the link you give links on to a good, short, geek-free explanation of why Mango's (or anyone's) statistics on this point break down: http://www.mangonet.com/mango/faq/stats_accuracy.html

    I have seen the statement that distinct hosts will be an undercount by 25%-50% because of the factors you explain, but I guess that this will vary hugely: a site that draws customers from big corporations, or from AOL, will be more affected than a site with lots of home broadband users.

    I recently saw the results of a user survey done by the medical journal publisher bmj.com. For one week every year, "access to the site entails completion of a questionnaire". Among the calculations they can do is the number of distinct hosts among their questionnaire respondents, and they arrive at a figure of 1.4 individuals per distinct host. Now I know what they mean!
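
    As a rough illustration of that arithmetic: the ratio is simply respondents divided by distinct hosts, and it can then be used to scale a host count up to an estimate of people. The respondent and host counts below are invented so that the ratio comes out at the 1.4 quoted above; they are not taken from the BMJ survey.

```python
def individuals_per_host(respondents, distinct_hosts):
    """Average number of individual respondents per distinct host."""
    if distinct_hosts <= 0:
        raise ValueError("distinct_hosts must be positive")
    return respondents / distinct_hosts


def estimate_individuals(distinct_hosts, ratio):
    """Scale a distinct-host count up to an estimated number of people."""
    return distinct_hosts * ratio


# Hypothetical survey week: 1400 respondents seen from 1000 distinct hosts.
ratio = individuals_per_host(respondents=1400, distinct_hosts=1000)

# Apply that ratio to a hypothetical ordinary week with 5000 distinct hosts.
estimate = estimate_individuals(distinct_hosts=5000, ratio=ratio)

print(f"{ratio:.1f} individuals per host, roughly {estimate:.0f} people")
```

    The estimate is only as good as the assumption that the survey week's mix of visitors resembles the rest of the year.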

    Their survey results are worth a look for anyone wanting to do this kind of research. If you're interested in journal publishing or in content sites that are moving from free access to premium content (BMJ.com is doing this in January 2005) there are additional reasons to look.
    Links to their survey data (2003 back to 1997) are at the foot of this page: http://bmj.bmjjournals.com/aboutsite/visitorstats.shtml


  4. Paul Cook

    Founder at TagMan

    16 April 2004 08:41am

    It's probably worth mentioning our study into the accuracy of IP- and cookie-based analytics, The RedEye Report. We found that IP-based stats could overestimate unique users by 7.6 times and cookie-based stats could overestimate by 2.3 times. As far as metric definitions go, they are decided by JICWEBS (http://www.jicwebs.org/), and a good list of approved definitions can be found on ABCe's site (http://www.abce.org.uk/cgi-bin/gen5?runprog=abce/abce&type=page&p=definitions.html&menuid=rulesaregs|definitions).
