Log file management

CEO at Econsultancy
08 October 2000 18:24
The log files that your web server generates as users interact with the site are very valuable, as you can extract a lot of useful behavioural information from them. Here are some tips on how to manage log files:
1. ECLF or CLF? You can configure most web servers to write log files in either the Common Log Format (CLF) or the Extended Common Log Format (ECLF). ECLF records a lot more information per request, so in general go for it. Bear in mind, however, that ECLF log files are much larger than their CLF equivalents and take more CPU power to create.
2. Large log files. Although log files are only plain text, they can get very large: over 100 MB a day on busy sites. Be careful not to fill up your disk! By default, log files are often written to the server's system (C:) drive, which can fill up very quickly and stop applications from working properly. Consider writing log files to a larger, non-system drive instead. Back them up and zip them to conserve space. Storing log files by month is common practice and suits month-by-month analysis and reporting.
3. CPU demands. Remember that running a log analysis tool on the server uses a fair amount of CPU power. This can degrade the site's performance, and a report may take several hours to run. Schedule the analysis tool to run automatically when the site is least busy. Many tools also offer real-time log analysis; although it is exciting to watch, remember the performance overhead it incurs.
4. Corrupt log files. If the log file is there but the analysis software cannot read it, the file may be corrupt. Try deleting the first and last 10 lines of the log file and re-running the software; this often cures the problem.
5. Missing log files. Are you sure they have not been written elsewhere on the system? Have they been written to their default location on the system drive? If they really are missing, there is not much you can do. You might be tempted to copy an equivalent day’s log file and change the dates with search and replace…?
6. Misleading logs. Be aware that log files can be misleading. Many ISPs use dynamic IP addressing, which means the same person may appear in your log files with a different IP address on each visit and so look like a different user to the analysis software. A corporate proxy server, on the other hand, registers as a single user even though it may represent the usage of 1,000 people. Caching and frames also distort page impression counts: pages served from a proxy or browser cache never reach your server and so go uncounted, while each frame in a frameset is logged as a separate request and inflates the count. Internal use and the activity of non-humans, e.g. search engine spiders, should ideally be stripped out at the analysis stage.
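For tip 1, the extra detail in ECLF is easiest to see by parsing one line of each. The Python sketch below assumes ECLF means the NCSA extended ("combined") layout, which appends the referrer and user-agent to the seven CLF fields; servers differ, so check your own configuration. `LOG_PATTERN` and `parse_line` are illustrative names, not part of any analysis tool.

```python
import re
from typing import Optional

# CLF fields: host ident authuser [timestamp] "request" status bytes
# ECLF (assumed here to be the NCSA extended/"combined" layout) appends
# "referrer" "user-agent" to the end of each CLF line.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of a CLF or ECLF line, or None if it does not parse."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Referrer and user-agent are None for plain CLF lines.
    fields["format"] = "ECLF" if fields["agent"] is not None else "CLF"
    return fields
```

On a CLF line, `parse_line` returns `"format": "CLF"` with `referrer` and `agent` set to `None`; on an ECLF line those two fields are filled in, which is the extra information (and the extra bytes per line) the tip refers to.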
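Tip 2's advice to zip logs and store them by month can be automated. This is a minimal sketch using only the Python standard library; it groups `*.log` files by last-modified month (an assumption about how your server rotates its files) and writes one `logs-YYYY-MM.zip` per month. The function name and the archive naming are hypothetical.

```python
import zipfile
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def archive_logs_by_month(log_dir: str, archive_dir: str) -> list:
    """Zip every *.log file in log_dir into one archive per calendar month.

    Files are grouped by last-modified time, on the assumption that a daily
    log stops changing once the server starts the next day's file. The
    originals are left in place: delete them only after checking the backup.
    """
    groups = defaultdict(list)
    for path in sorted(Path(log_dir).glob("*.log")):
        month = datetime.fromtimestamp(path.stat().st_mtime).strftime("%Y-%m")
        groups[month].append(path)

    out = Path(archive_dir)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    for month, paths in sorted(groups.items()):
        target = out / f"logs-{month}.zip"
        # "w" rebuilds the month's archive from the files currently on disk.
        with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in paths:
                zf.write(path, arcname=path.name)
        created.append(target)
    return created
```

Pointing `archive_dir` at the larger non-system drive covers both halves of the tip at once.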
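For tip 3, one way to find when the site is least busy is to ask the logs themselves. This hypothetical helper counts requests per hour of day from the CLF timestamp field and returns the quietest hour, which you could then use when scheduling the analysis job (for example with cron or the Windows Task Scheduler). Hours are read in whatever time-zone offset the server wrote into the log.

```python
from collections import Counter
from datetime import datetime


def quietest_hour(timestamps: list) -> int:
    """Return the hour of day (0-23) with the fewest requests.

    `timestamps` are CLF-style strings such as "08/Oct/2000:18:24:12 +0100".
    Ties go to the earliest hour.
    """
    counts = Counter()
    for stamp in timestamps:
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        counts[when.hour] += 1
    # Counter returns 0 for hours never seen, so idle hours are considered too.
    return min(range(24), key=lambda hour: (counts[hour], hour))
```

Running it over several days of timestamps, rather than one, gives a steadier answer.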
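Tip 4 can be done without hand-editing a multi-megabyte file in a text editor. A minimal Python sketch, assuming (as the tip implies) that the damage is confined to the start and end of the file; `trim_log` is a hypothetical name, and it writes the kept lines to a new file so the original stays untouched.

```python
from pathlib import Path


def trim_log(src: str, dst: str, n: int = 10) -> int:
    """Copy src to dst without its first and last n lines; return lines kept.

    Undecodable bytes (a common symptom of corruption) are replaced rather
    than raising an error, so the copy always completes.
    """
    text = Path(src).read_text(encoding="utf-8", errors="replace")
    lines = text.splitlines(keepends=True)
    # A file with 2n lines or fewer has nothing left once both ends are cut.
    kept = lines[n:len(lines) - n] if len(lines) > 2 * n else []
    Path(dst).write_text("".join(kept), encoding="utf-8")
    return len(kept)
```

Point the analysis software at `dst`; if it still fails, the corruption is somewhere in the middle of the file.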
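For tip 5, it helps to know exactly which days are missing before you start searching the system. This sketch assumes a hypothetical naming scheme of one file per day called `access-YYYY-MM-DD.log`; change `DAILY_NAME` to match what your server actually writes, and feed in the file names found in each location you check (default directory, system drive, other drives).

```python
import re
from datetime import date, timedelta
from pathlib import Path

# Hypothetical naming scheme: one file per day called access-YYYY-MM-DD.log.
DAILY_NAME = re.compile(r"access-(\d{4})-(\d{2})-(\d{2})\.log$")


def missing_days(filenames: list, start: date, end: date) -> list:
    """Return the dates between start and end (inclusive) with no log file."""
    present = set()
    for name in filenames:
        match = DAILY_NAME.search(Path(name).name)
        if match:
            present.add(date(*map(int, match.groups())))
    gaps = []
    day = start
    while day <= end:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

An empty result means every day in the range is accounted for; anything else is the list of dates to go looking for.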
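The last point in tip 6, stripping internal and non-human traffic, is usually a simple filter applied before counting. A sketch in Python, where the internal network ranges and spider markers are illustrative placeholders rather than a complete list. A per-line filter like this cannot correct the dynamic-IP and proxy distortions described in the same tip.

```python
import ipaddress

# Hypothetical examples: replace with your own office network ranges and the
# crawler names you actually see in the user-agent field.
INTERNAL_NETWORKS = [ipaddress.ip_network("192.168.0.0/16"),
                     ipaddress.ip_network("10.0.0.0/8")]
SPIDER_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_human_visit(host: str, user_agent: str) -> bool:
    """Return False for requests from internal addresses or known spiders."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # host may be a resolved hostname rather than an IP
    if addr is not None and any(addr in net for net in INTERNAL_NETWORKS):
        return False
    agent = user_agent.lower()
    return not any(marker in agent for marker in SPIDER_MARKERS)
```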
CTO at Siegelgale
16 October 2000 14:01
Re: the interesting article on user log info, does anyone know of the best/recommended tools for analysing log info? I know of Web Trends as a package. Any others?
On 18:24:12 8 October 2000 ashley wrote:
>The log files that your web server generates as users
>interact with the site are very valuable as you can
>extract a lot of useful behavioural information from them.
CEO at Econsultancy
16 October 2000 18:48
Below are some of the usual suspects, roughly in order of increasing price / sophistication. If you just want standard log file analysis then Webtrends is pretty much the industry standard.
If you want deeper customer traffic and behavioural analysis, personalisation, and more customisable and insightful web usage intelligence, things get more advanced and expensive, ultimately ending up with the likes of Broadvision, Vignette, NetPerceptions and Broadbase, which concentrate on using such intelligence and customer profiles for dynamic personalisation, eCRM, online marketing campaigns and so on.
- MediaHouse LiveStats: http://www.mediahouse.com/statisticsserver/
- Webtrends LogAnalyser: http://www.webtrends.com/products/log/default.htm
- Accrue Insight / Hitlist: http://www.accrue.com/products/hitlist.html
- Macromedia's Aria: http://www.macromedia.com/software/aria/
- Engage’s IPRO: http://www.engage.com/ipro
- NetGenesis: http://www.netgenesis.com/products/netgenesis.cfm
CEO at Econsultancy
16 October 2000 19:32
I've posted something on Internet Metrics in the Advertising forum that should be of interest to anyone involved in log file analysis. It describes the metrics which log files give you (unique users, page impressions, session lengths etc.), their pros and cons and other related information.
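As a rough illustration of how the metrics mentioned above fall out of a log file, this Python sketch counts unique users, page impressions and sessions from (visitor, timestamp) pairs. It assumes you have already filtered the records down to page requests and chosen a visitor key such as IP address or IP plus user-agent, which remains an approximation because of dynamic IPs and proxies. The 30-minute `SESSION_TIMEOUT` is a common convention, not a standard, and `summarise` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity timeout: a gap longer than this starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def summarise(hits: list) -> dict:
    """Compute basic metrics from a list of (visitor_key, datetime) pairs."""
    by_visitor = defaultdict(list)
    for visitor, when in hits:
        by_visitor[visitor].append(when)

    session_lengths = []
    for times in by_visitor.values():
        times.sort()
        start = prev = times[0]
        for when in times[1:]:
            if when - prev > SESSION_TIMEOUT:
                # Gap too long: close the current session, open a new one.
                session_lengths.append(prev - start)
                start = when
            prev = when
        session_lengths.append(prev - start)

    return {
        "unique_users": len(by_visitor),
        "page_impressions": len(hits),
        "sessions": len(session_lengths),
        "session_lengths": session_lengths,
    }
```

Note that a single-page session has a length of zero, since the log only records when each page was requested, not when the visitor left.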
CEO at Econsultancy
25 October 2000 14:28
Following up on this thread: there is a very useful and informative paper by Mark Rosenstein that discusses the possibilities and pitfalls of using web server logs to understand customer behaviour on a web site.
It can be found at http://www.apparent-wind.com/mbr/papers/ec2000.pdf