Metrics, Part 1: The webstats legacy

Probably the hardest part of any social media project, whether it’s internal or external, is figuring out whether or not the project has been a success. In the early days of social media, I worked with a lot of clients who were more interested in experimenting than in quantifying the results of their projects. That’s incredibly freeing in one sense, but we are (or should be) moving beyond the ‘flinging mud at the walls to see what sticks’ stage into the ‘knowing how much sticks’ stage.

Social media metrics, though, are a bit of a disaster zone. Anyone can come up with a set of statistics, create impressive-sounding jargon for them and pull a meaningless analysis out of their arse to ‘explain’ the numbers. Particularly in marketing, there’s a lot of hogwash spoken about ‘social media metrics’.

This is the legacy of the dot.com era in a couple of ways. Firstly, the boom days attracted a lot of snake-oil salesmen. After the crash, businesses, now sceptical about the internet, demanded proof that a site really was doing well. They wanted cold, hard numbers.

Sysadmins were able to pull together statistics direct from the webserver and the age of ‘hits’ was born. For a time, back there in the bubble, people talked about getting millions of hits on their website as if it was something impressive. Those of us who paid attention to how these stats were gathered knew that ‘hits’ meant ‘files downloaded by the browser’, and that stuffing your website full of transparent gifs would artificially bump up your hits. Any fool could get a million hits – you just needed a web page with a million transparent gifs on it and one page load.
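A rough back-of-the-envelope sketch of why that worked (the numbers here are invented, and a ‘hit’ is simply one line in the server log): every file the browser fetches, the page itself plus each embedded image, produces its own log entry, so the count balloons with the page’s contents rather than its audience.

    # A toy illustration, not real log analysis: 'hits' are just the number
    # of files the browser requests, so a single visit to a page stuffed
    # with transparent gifs produces an enormous number of them.
    embedded_images = 1_000_000   # say, a million 1x1 transparent gifs
    page_loads = 1                # one visitor, loading the page once

    hits = page_loads * (1 + embedded_images)   # the page itself plus every image
    print(f"{page_loads} page load -> {hits:,} hits")
    # 1 page load -> 1,000,001 hits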

This led to the second legacy: an obsession with really big numbers. You see it everywhere, from news sites talking about how many ‘unique users’ they get in comparison to their competitors to internal projects measuring success by how many people visit their wiki or blogs. It’s understandable, this cultural obsession with telephone-number-length stats, but it’s often pointless. You may have tens of thousands of people coming to your product blog, but if they all think it’s crap, you haven’t actually made any progress. You may have 60% of your staff visiting your internal wiki, but if they’re not participating, they aren’t going to benefit from it.

Web stats have become more sophisticated since the 90s, but not by much. Google Analytics now provides bounce rates and absolute unique visitors and all sorts of stats for the numerically obsessed. Deep down, we all know these are the same sorts of stats that we were looking at ten years ago but with prettier graphs.

And just like then, different statistics packages give you different numbers. Server logs, for example, have always produced numbers many times higher than those from a service like StatCounter, which relies on you pasting some Javascript code into your web pages or blog. Even amongst external analytics services there can be wild variation: a comparison of StatCounter and Google Analytics shows that numbers for the same site can be radically different.

Who, exactly, is right? Is Google undercounting? StatCounter overcounting? Your web server overcounting by a factor of 10? Do you even know what they are counting? Most people do not know how their statistics are gathered. Javascript counters, for example, can undercount because they rely on the visitor having Javascript enabled in their browser. Many mobile browsers will not show up at all because they cannot run Javascript. (I note that the iPhone, iTouch and Android do show up, but I doubt that they represent the majority of mobile browsers.)
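As a toy illustration of that gap (the visits below are invented, and no real analytics product works quite this crudely), a script-based counter only registers the visitors whose browsers actually run its Javascript, while the server sees every request:

    # Invented visits: the server logs all of them, but a Javascript-based
    # counter only ever hears from the browsers that run the tracking script.
    visits = [
        {"agent": "desktop browser",      "runs_js": True},
        {"agent": "desktop, JS disabled", "runs_js": False},
        {"agent": "older mobile browser", "runs_js": False},
        {"agent": "search engine spider", "runs_js": False},
    ]

    server_count = len(visits)                             # everything hits the logs
    js_counter_count = sum(v["runs_js"] for v in visits)   # only JS-capable visits

    print(f"server log: {server_count} visits, Javascript counter: {js_counter_count}")
    # server log: 4 visits, Javascript counter: 1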

Equally, server logs tend to overcount, not just because they’ll count every damn thing, whether it’s a bot, a spider or a hit from a browser, but also because they’ll count everything on the server, not just the pages with Javascript code on them. To some extent, different sorts of traffic can be distinguished by the analytics software that processes the logs, but there’s no way round the fact that you’re getting stats for every page, not just the ones you’re interested in. Comparing my server stats with my StatCounter numbers shows the former running at seven times the latter. (In the past, I’ve had sites where it’s been more than a factor of ten.)
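Something like the sketch below is roughly what that processing amounts to; the log lines are invented (though in the common Apache ‘combined’ format), and the bot list and the ‘is this a page?’ test are deliberately crude, but it shows why the raw figure, which includes every stylesheet, image and spider, towers over the filtered one.

    # A rough sketch of filtering server logs: count every request as a 'hit',
    # then drop obvious bots and anything that isn't an actual page.
    import re

    log_lines = [
        '1.2.3.4 - - [01/Jan/2009:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
        '1.2.3.4 - - [01/Jan/2009:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 812 "-" "Mozilla/5.0"',
        '1.2.3.4 - - [01/Jan/2009:10:00:01 +0000] "GET /logo.gif HTTP/1.1" 200 43 "-" "Mozilla/5.0"',
        '5.6.7.8 - - [01/Jan/2009:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    ]

    request_re = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" \d+ \d+ "[^"]*" "(?P<agent>[^"]*)"')
    bots = ("bot", "spider", "crawler")

    raw_hits = 0
    page_views = 0
    for line in log_lines:
        m = request_re.search(line)
        if not m:
            continue
        raw_hits += 1                                        # every file counts as a hit
        if any(b in m.group("agent").lower() for b in bots):
            continue                                         # drop spiders and bots
        if m.group("path").endswith((".html", "/")):
            page_views += 1                                  # only actual pages count

    print(f"raw hits: {raw_hits}, filtered page views: {page_views}")
    # raw hits: 4, filtered page views: 1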

So, you have lots of big numbers and pretty graphs but no idea what is being counted and no real clue what the numbers mean. How on earth, then, can you judge a project a success if all you have to go on are numbers? Just because you could dial a phone with your total visitor count for the month and reach an obscure island in the Pacific doesn’t mean that you have hit the jackpot. It could equally mean that lots of people swung past to point and laugh at your awful site.

And that’s just web stats. Social media stats are even worse, riddled with the very snake oil that web stats were supposed to guard against. But more on that another day.
