Lies, Damn Lies, and Web Statistics (Not everything that counts can be counted, and not everything that can be counted counts) [A9]


Web usage figures derived from web server log analysis are known to be rough estimates, useful in general terms only. There are many sources of error, such as the activity of search engine robots and spiders, proxy caching, and the dynamic assignment of IP addresses by many ISPs. A newer type of analysis that tracks users in their browsers claims to remove many of these problems, giving a clearer picture of ‘real user’ activity. Such services also report in near real time, whereas most institutions have previously downloaded logs weekly or monthly for analysis and then spent considerable resources preparing reports.
In this session two different log analysis tools are compared, followed by a comparison of log analysis and browser-based analysis. As a result of this work, a consortium of national museums is considering moving to a hosted external service. A consortium of universities has already moved to browser-based recording. Results from this work, and issues raised by its implementation at several sites, will be presented.
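
To make the sources of error in log analysis more concrete, here is a minimal sketch in Python. It is not one of the tools compared in the session. It assumes logs in the common "combined" format, and the sample lines, the `ROBOT_HINTS` list and the function names are all invented for illustration. It separates self-identified robots by user-agent string and counts distinct hosts among the remaining hits.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Illustrative substrings only; real robot lists are far longer.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def is_robot(agent):
    """Guess whether a user-agent string belongs to a self-identified robot."""
    agent = agent.lower()
    return any(hint in agent for hint in ROBOT_HINTS)


def summarise(lines):
    """Count robot hits, other hits, unparsed lines and distinct non-robot hosts."""
    counts = Counter()
    hosts = set()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            counts["unparsed"] += 1
            continue
        if is_robot(match.group("agent")):
            counts["robot"] += 1
        else:
            counts["other"] += 1
            hosts.add(match.group("host"))
    return {
        "robot": counts["robot"],
        "other": counts["other"],
        "unparsed": counts["unparsed"],
        "distinct_hosts": len(hosts),
    }


# Invented sample data: two different browsers behind one address (as a
# proxy might appear), one self-identified robot, and one malformed line.
SAMPLE_LINES = [
    '192.0.2.10 - - [01/Jan/2007:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U)"',
    '192.0.2.10 - - [01/Jan/2007:10:00:05 +0000] "GET /visit.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Macintosh; U)"',
    '198.51.100.7 - - [01/Jan/2007:10:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    'not a log line',
]

if __name__ == "__main__":
    print(summarise(SAMPLE_LINES))
```

In the sample, two different browsers share one address, so counting hosts reports a single visitor where there may have been two. User-agent filtering of this kind only catches robots that identify themselves, and the log alone cannot undo the effects of proxy caching or dynamic IP assignment.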

Learning Outcomes:

By the end of the session participants will:

  • Have gained an understanding of the differences between log analysis and browser-based usage recording.
  • Have explored the sources of error in both types of analysis.
  • Have considered a hosted service as an alternative to in-house analysis (perhaps via a consortium approach).
  • Have gained an insight into the issues surrounding the implementation of browser-based analysis and the advantages it may bring.
  • Have heard what has been learned from the implementation and usage of a browser-based tagging system, including overall trends and individual cases.
  • Have considered what the future may hold: the possible implications of, and possibilities arising from, browser-based analysis.