Managerial at Nellis and friends
08 February 2006 11:20am
This topic covers something I've been advocating for some time, but I haven't found anyone else in the industry doing it, so I would very much appreciate a discussion on the subject. That way I can get a grasp on whether I'm brilliant or totally away with the fairies.
The thing is this:
One of the major problems for metrics suppliers, and even more for hosting services, is that accumulated data takes up a lot of storage space on servers, which naturally costs money. To save space and money, many hosting services and metrics providers either delete the data after a certain amount of time or abstract it in order to compress it.
The obvious problem is that many clients don't know about this until it's too late. When they want to go back to historical data to compare and find trends, they find the data either gone or too compressed/abstracted to be useful.
With that in mind, I've been advocating that instead of abstracting data, many clients could benefit from a representative statistics approach. That is, since we're talking about large volumes of visitor data, we can draw general conclusions from smaller samples of the data we collect. Every third visitor, every fifth, or even every tenth could be sufficient.
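To make the sampling idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the function names `in_sample` and `estimate_total`, the visitor ID strings, and the default rate of one in ten are all hypothetical. Note also that it uses a hash of the visitor ID rather than literally counting off every tenth visitor; that is a swapped-in technique, chosen so that the same visitor always lands on the same side of the sample and all of their visits are kept together.

```python
import hashlib


def in_sample(visitor_id: str, rate: int = 10) -> bool:
    """Return True for roughly one in `rate` visitors.

    The decision depends only on the visitor ID, so a given
    visitor is either always sampled or never sampled.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer bucket.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket % rate == 0


def estimate_total(sampled_count: int, rate: int = 10) -> int:
    """Scale a count observed in the sample up to the full population."""
    return sampled_count * rate


# Hypothetical visitor IDs, standing in for real log data.
visitors = [f"visitor-{i}" for i in range(10_000)]
kept = [v for v in visitors if in_sample(v, rate=10)]

# Only `kept` would be stored uncompressed; totals are estimated from it.
estimated_visitors = estimate_total(len(kept), rate=10)
```

Only the sampled visitors would need to be stored in full, so the database shrinks by roughly the sampling rate, and the estimate gets less precise the smaller the sample is.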
With that approach you can also eliminate some of the problems with authenticating visitors, such as third-party cookie rejection. A vast number of visitors can be properly authenticated through a combination of IP numbers, cookies and tags. They should be put in the "sample" category and used for the representative analysis. The result is smaller databases in which you can keep uncompressed data for longer periods, with maintained or even higher accuracy in the analyses.
Thoughts on this?
BR/
Stefan