Direct visits: A referral data black hole

“Facebook drives more traffic than Twitter” ran the headline in May, after a Pew study seemed to show that Twitter just wasn’t as good for traffic numbers as people had thought. But there were problems with the study’s methodology, as many people, including Steve Buttry, pointed out:

The PEJ report acknowledges that the Nielsen Co., the source of all the data studied, relies “mainly on home-based traffic rather than work-based,” without adding that most use of news sites comes during the workday.

and

The study uses strongly dismissive language about Twitter’s contribution to traffic to news sites. But it never notes that many – probably most – Twitter users come from TweetDeck, HootSuite, mobile apps or some other source than Twitter.com. Twitter “barely registers as a referring source,” the report concludes, ignoring or ignorant of the fact that the data counted only traffic from Twitter.com and did not count most visits from Twitter users.

As the web evolves, the tools that we use to measure and assess activity need to evolve with it, but this hasn’t really happened. We might have managed to ditch the misleading idea of ‘hits’, but web traffic measurement is still immature, with many of the tools remaining basic and unevolved. And this problem is only going to get worse, as Steve’s second point hints.

As I mentioned in this post, earlier this year I did some work looking at referrer logs for a client, OldWeather.org, a citizen science project that is transcribing weather and other data from old ships’ logs. One of the things I noticed was how messy Google Analytics’ data is when it comes to finding out which social networks people have arrived from. Many social networks have multiple possible URLs, which show up in the stats as separate referrers. For example, Facebook has:

  • facebook.com
  • m.facebook.com
  • touch.facebook.com

And Twitter has:

  • twitter.com
  • mobile.twitter.com

So in order to get a better picture of activity from Facebook and Twitter, we need to add the numbers for these subdomains together. But that alone doesn’t provide the full picture. A list compiled by Twitstat.com in August of last year showed that only 13.9% of Twitter users were using the Twitter.com website, with another ~1% using Twitter’s mobile website. That means around 85% of Twitter users are not going to show up in your Twitter referrals at all, because they haven’t come via twitter.com or mobile.twitter.com.
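
As a rough illustration of that tidying-up, here is a minimal sketch in Python, with made-up numbers standing in for a Google Analytics referrer export:

    # Hypothetical referral counts, as they might appear in a Google Analytics
    # export, with each subdomain listed as a separate referring source.
    referrals = {
        "facebook.com": 1200,
        "m.facebook.com": 340,
        "touch.facebook.com": 85,
        "twitter.com": 410,
        "mobile.twitter.com": 30,
    }

    # Roll each referring domain up to its parent network so the subdomains
    # are counted together rather than as separate sources.
    NETWORKS = {
        "facebook.com": "Facebook",
        "m.facebook.com": "Facebook",
        "touch.facebook.com": "Facebook",
        "twitter.com": "Twitter",
        "mobile.twitter.com": "Twitter",
    }

    totals = {}
    for domain, count in referrals.items():
        network = NETWORKS.get(domain, domain)
        totals[network] = totals.get(network, 0) + count

    print(totals)  # {'Facebook': 1625, 'Twitter': 440}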

It is possible to get some other hints about Twitter traffic, as some web-based clients do provide referral data, e.g. twittergadget.com, brizzly.com, hootsuite.com or seesmic.com. But the big problem is that much of the traffic from Twitter clients will simply show up in your stats as direct visits, essentially becoming completely invisible. And when direct visits make up 40% of your traffic, that’s a huge black hole in your data.
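
To give a feel for how visits end up in each bucket, here is another small, hypothetical sketch: referrals from twitter.com, its mobile site and the web-based clients listed above can be folded into a single Twitter bucket, but visits that arrive with no referrer at all simply land in ‘direct’, where they cannot be attributed. The visit data is invented for illustration.

    # A minimal sketch of bucketing individual visits by referrer.
    from collections import Counter

    TWITTER_SITES = {"twitter.com", "mobile.twitter.com"}
    TWITTER_WEB_CLIENTS = {"twittergadget.com", "brizzly.com",
                           "hootsuite.com", "seesmic.com"}

    def classify(referrer):
        """Bucket a visit by its referring domain (None = no referrer sent)."""
        if referrer is None:
            # Desktop and mobile clients typically send no referrer at all,
            # so their visits land here and cannot be attributed.
            return "Direct"
        if referrer in TWITTER_SITES or referrer in TWITTER_WEB_CLIENTS:
            return "Twitter"
        return referrer

    # Illustrative visits: each entry is a referring domain, or None.
    visits = ["twitter.com", None, "hootsuite.com", None,
              None, "facebook.com", None]

    print(Counter(classify(r) for r in visits))
    # Counter({'Direct': 4, 'Twitter': 2, 'facebook.com': 1})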

It used to be assumed that direct visits were people who had your website bookmarked in their browser or who were typing your URL directly into their browser’s address bar. The advent of desktop Twitter clients has undermined this assumption completely, and we need to update our thinking about what a ‘direct visit’ is.

This obfuscation of traffic origins is only going to get worse as clients provide access to other tools. TweetDeck, for example, can no longer be assumed to be a Twitter-only client, because it also allows you to access your LinkedIn, Facebook, MySpace, Google Buzz and Foursquare accounts. So even if you can spot that a referral has come via TweetDeck, you have no idea whether the user clicked on a link in their Twitter stream or via Facebook, LinkedIn, etc.

This makes it nigh on impossible to understand the success of your social media strategy and, in particular, which tools and networks are performing most strongly. What if 20% of your traffic is coming from invisible Twitter clients and only 1% comes from Twitter.com? Because the majority of your Twitter traffic is hidden as direct traffic, you might end up sensibly but wrongly focusing on the 5% that has come via Facebook.com, reworking your strategy to put more effort into Facebook despite the fact that it is actually performing poorly in comparison to Twitter.
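
Putting that hypothetical split into a few lines makes the distortion obvious; the percentages below are the made-up ones from the paragraph above, not real data.

    # The hypothetical split, as percentages of total traffic: what the
    # analytics report shows versus where visits really came from.
    reported = {"twitter.com": 1, "facebook.com": 5, "direct": 40}
    hidden_twitter = 20                      # Twitter-client visits lumped into 'direct'
    actual_twitter = reported["twitter.com"] + hidden_twitter

    # The report suggests Facebook is the stronger network...
    print(reported["facebook.com"] > reported["twitter.com"])   # True
    # ...but Twitter is really driving over four times as much traffic.
    print(actual_twitter / reported["facebook.com"])            # 4.2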

I recommend to all my clients that they keep an eye on their statistics and that, if a tool isn’t working out well for them, they should ditch it and move on to another. There are so many social networks around that you just can’t be everywhere; you have to prioritise your efforts and focus on the networks where you are most likely to reach your target audience. But we need clarity in the stats in order to do this.

The scale of this problem is really only becoming clear to me as I type this. For sites with low direct traffic, a bit of fuzziness in the stats isn’t a big deal, but for sites with a lot of direct traffic – and I see some sites with over 40% direct traffic – this is a serious issue. You could potentially have a single referring source making up a huge part of your total traffic, and you’d never know. And as more services provide APIs that feed more desktop clients, which themselves offer more functionality than the original service, the growth of wrongly attributed ‘direct visits’ is only going to accelerate.

Without meaningful numbers, we’re back to the bad old days of gut feeling about whether one strategy is working better than another. I already see people making huge assumptions about how well Facebook is going to work for them, based on the faulty logic that everyone’s in Facebook, ergo by being in Facebook they will reach everyone.

Now, more than ever, we need reliable web stats so that we can make informed decisions, but these numbers are turning out to be like ghosts: our brains see what they want to see, not what is actually there. Even established research institutions like Pew are suffering from pareidolia, seeing a phantom Facebook in their flawed numbers.