Dynamic IP addresses within 'session'
Head of Digital Metrics & Analytics at MindShare Interaction
02 June 2003 12:27pm
I thought other subscribers may be interested in a 'feature' of web-logs which doesn't seem to be common knowledge. About a year ago I discovered that a number of ISPs (AOL in particular) are dynamically assigning IP addresses within a session, meaning that one visit through an 'offending' ISP will generate file requests in the web-log under multiple IP addresses. This obviously has the effect of 'cutting' the visit into multiple shorter visits when the web-logs are parsed for analysis. Depending on the type of site (e.g. consumer) and the type of media being used to promote the site (e.g. banners on AOL) this can have a very negative impact on the quality of data obtained for site analysis.
Since learning of this I have been extremely wary of using any web-log based data, and will only trust data generated by 2nd or 3rd generation JavaScript tags. Slightly more worrying is the fact that a couple of (web-log based analysis) vendors I have spoken to since were unaware of this issue.
Does anybody else have similar experience?
Marketing Director at WebAbacus
03 June 2003 12:43pm
Well, this is certainly something we're aware of. It's a particular problem with AOL, which round-robins people through about 30 proxies during their session. However, there's nothing intrinsic about server logs which makes them vulnerable to this kind of problem - the problem is related to how you identify your visitors.
If you use the visitor's IP address (and perhaps their User Agent) to identify them, then this problem (plus all the other proxy goings-on on the Internet) will give you very inaccurate visitor numbers and session paths. This is as true for a tagging approach as it is for a server log approach.
The thing to do is issue a persistent (or at least session) cookie to the browser and log this, either by getting it written into the server log, or making sure your tag code picks it up and includes it in the tag request. Then you can use the cookie as the session identifier and these problems go away.
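To make the difference concrete, here is a minimal Python sketch of counting visits from the same hits keyed first by IP address and then by cookie. Everything in it is an assumption for illustration: the field names (`time`, `ip`, `cookie`, `page`), the 30-minute inactivity timeout and the sample hits are made up, and it is not how any particular analysis tool works.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that ends a visit (assumed)


def count_visits(hits, key):
    """Count visits when hits are grouped by `key` ('ip' or 'cookie')."""
    times_by_visitor = defaultdict(list)
    for hit in hits:
        times_by_visitor[hit[key]].append(hit["time"])

    visits = 0
    for times in times_by_visitor.values():
        times.sort()
        visits += 1  # the first hit from an identifier opens a visit
        for prev, cur in zip(times, times[1:]):
            if cur - prev > SESSION_TIMEOUT:
                visits += 1  # a long gap starts a new visit
    return visits


# One person browsing through a rotating proxy pool: four page requests,
# one cookie value, three different IP addresses (all values are made up).
hits = [
    {"time": 0,   "ip": "192.0.2.1", "cookie": "abc", "page": "/"},
    {"time": 40,  "ip": "192.0.2.2", "cookie": "abc", "page": "/products"},
    {"time": 95,  "ip": "192.0.2.3", "cookie": "abc", "page": "/basket"},
    {"time": 130, "ip": "192.0.2.1", "cookie": "abc", "page": "/checkout"},
]

print(count_visits(hits, "ip"))      # 3
print(count_visits(hits, "cookie"))  # 1
```

The data is identical in both calls; only the identifier changes, which is the point being made above about logs versus tags.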
On balance, I think it's perhaps slightly neater to go with the tag approach, mainly because it's a one-step operation and doesn't (usually) require any server configuration. And you get the added benefit of cache-busting too, so your PI numbers (though not necessarily your visits) will go up (probably).
Cheers,
Ian Thomas
WebAbacus
www.webacus.com
Engagement Manager at Omniture
03 June 2003 14:32pm
Ian is correct insofar as the degree of inaccuracy caused by dynamic IP addresses depends on whether the IP is being used for unique user identification. However, the vast majority of client-side tracking technologies do employ a cookie-based UUID strategy, if only as a by-product of their cache-busting tracking technique, so log analysis tools are more likely to suffer at the hands of this phenomenon. On the other hand, organisations that pursue a strict "no cookies" policy for their sites will only ever approach a log analysis tool vendor for tracking. Downside and upside in the log tool vs. client-side tracking debate. Who'd have thought it?!
Further points:
IP Masquerading.
Some caching proxies will forward the request they received to the web server, masquerading as the client IP. Others will make the request on behalf of the client, so the IP address of the caching proxy machine is what appears in the web server log file. So IP masquerading is good for user identification where no cookies are in use. If working for a log analysis tool vendor, a useful question to ask of any prospective client who doesn't employ cookies might be: does your cache support IP masquerading?
Workaround for more accurate user identification when missing cookies:
This has limited application, I admit. But whilst working for a previous employer (a log analysis tool vendor), we were faced with a bank which wanted to track usage of its intranet by branch. Unfortunately, all branches were assigned dynamic IPs on the company WAN. I cannot take credit for this, but some bright spark mentioned that IE versions 5 and above (and some other browsers) allow for user configuration of the agent string as it appears in the log file. So all workstations at branch one had their browser agent customized with an appended "branch=001" and so on. Thus, reports by browser gave you reports by branch. A contrived user identification strategy, but it did work!
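For anyone wanting to try the same trick, pulling the branch out of the agent string is a one-line pattern match. A rough Python sketch follows; the sample agent strings and the exact position of the "branch=NNN" token are assumptions, since only the fact that it was appended is known.

```python
import re
from collections import Counter

# Assumed token format: "branch=" followed by digits somewhere in the agent.
BRANCH_RE = re.compile(r"branch=(\d+)")


def branch_of(user_agent):
    """Return the branch code found in the agent string, or None."""
    match = BRANCH_RE.search(user_agent)
    return match.group(1) if match else None


# Made-up agent strings as they might appear in the log file.
agents = [
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; branch=001)",
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; branch=001)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; branch=002)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]

# Requests per branch; workstations without the token fall under None.
hits_by_branch = Counter(branch_of(agent) for agent in agents)
print(hits_by_branch["001"], hits_by_branch["002"], hits_by_branch[None])
```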
The effect of cookies on stats:
The following are all true when moving to cookie-based user identification:
1 - Your user numbers will go up
2 - Your user numbers will go down
3 - The number of visits will go up
4 - The number of visits will go down
5 - The page impressions per visit will go down
6 - The page impressions per visit will go up
If you consider the greater granularity achieved using cookies instead of proxy IPs (think proxy caches which don't IP masquerade, AOL, etc.), then 1, 3 and 5 are all true. If you consider dynamic IP assignment intra-session (load-balanced proxy servers, AOL again), then 2, 4 and 6 are correct.
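Both directions can be shown with a toy calculation. The Python sketch below is only illustrative: the hit data is invented, and as a simplification each distinct identifier is treated as one user making exactly one visit.

```python
def summarise(hits, key):
    """Users, visits and page impressions per visit when grouping by `key`.

    Simplification (assumed): one distinct identifier = one user = one visit.
    """
    identifiers = {hit[key] for hit in hits}
    return {
        "users": len(identifiers),
        "visits": len(identifiers),
        "pi_per_visit": len(hits) / len(identifiers),
    }


# Scenario A: two people behind one proxy that does not masquerade,
# so both appear under the proxy's IP address.
shared_proxy = [
    {"ip": "198.51.100.7", "cookie": "u1"},
    {"ip": "198.51.100.7", "cookie": "u1"},
    {"ip": "198.51.100.7", "cookie": "u2"},
    {"ip": "198.51.100.7", "cookie": "u2"},
]

# Scenario B: one person whose IP address changes on every request.
rotating_ip = [
    {"ip": "192.0.2.1", "cookie": "u3"},
    {"ip": "192.0.2.2", "cookie": "u3"},
    {"ip": "192.0.2.3", "cookie": "u3"},
    {"ip": "192.0.2.4", "cookie": "u3"},
]

for name, hits in (("shared proxy", shared_proxy), ("rotating IP", rotating_ip)):
    print(name, "by IP:    ", summarise(hits, "ip"))
    print(name, "by cookie:", summarise(hits, "cookie"))
```

In scenario A, switching from IP to cookie takes users and visits from 1 to 2 and page impressions per visit from 4 to 2 (points 1, 3 and 5). In scenario B, users and visits drop from 4 to 1 and page impressions per visit rise from 1 to 4 (points 2, 4 and 6).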
As ever, context is everything.
Regards,
David Brown.
Clickstream Technologies.