The death of Google Reader: Taking the re- out of search

For hardcore RSS users and journalists, a collective cry of anguish went up as Google decided to kill Reader. As New Zealand developer Aldo Cortesi put it, it wasn’t just the death of a single application but a serious blow to the RSS ecosystem, an ecosystem that he said was already “deeply ill”. The knock-on effects of the death of Google Reader are not trivial.

Cortesi was very direct about the damage:

The truth is this: Google destroyed the RSS feed reader ecosystem with a subsidized product, stifling its competitors and killing innovation. It then neglected Google Reader itself for years, after it had effectively become the only player.

There are alternatives. I’ve used Feedly on and off for a while, and I still use NetNewsWire. I’m excited to hear that Feedly is working to let people migrate easily from Google Reader to its own sync service.

However, as the dust has settled since the announcement, I still haven’t found a drop-in replacement for Reader. As a journalist, I find Google Reader essential to my work. As imperfect as Google Translate is, the ability to translate content easily from feeds in languages I don’t speak has been a godsend. It helped me keep up with developments in the Arabic, Chinese and Turkish markets that I simply wouldn’t have been able to follow without it. Sure, I can put things through Translate manually, but it’s all about efficiency.

Google Reader combined with Google Alerts (how long will that hang around, I wonder?) was another stunning way for me to discover new sources of information, especially after Google ripped out the sharing features that once made Reader a powerful social discovery tool.

I’ll readily admit that I’m an edge case. Using an RSS reader has never been a mainstream activity, but as a journalist, RSS was one way that I kept on top of the firehose of information that I need to sift through as a modern information professional. In my work not only as a journalist but also as a digital media strategist, people ask me how I stay on top of all of the constant changes in the business. Although social and semantic news app Zite is the first thing I look at every morning, RSS and Google Reader have continued to play an essential role, and RSS, with or without Google Reader, will continue to be essential.

Google has built and then killed a number of extremely useful research tools for journalists, and Reader is just the latest. Search Timeline, which showed the frequency of a search term over time, was flawed but still extremely useful for research as a journalist. For journalists working with social media, the death of Realtime, Google’s social media search, was a terrible loss. No other tool has come even close to the functionality that Realtime offered. Topsy comes the closest, but it still lacks Realtime’s best features. Part of Realtime’s death was down to another company, Twitter, killing an ecosystem of its own, but Google could have carried the service on after its deal with Twitter fell apart. It probably didn’t for the same reason it is killing Reader: the search giant wants to push Google+.

The deaths of Search Timeline, Realtime and now Reader form a pattern: the search company keeps killing tools that were very important for journalistic research. I’m not saying my needs as a journalist are more important than those of the vast majority of other Google users (although Suw notes that, as a consultant, she also relied on these tools and often recommended them to clients). My professional needs are quite particular. However, these tools were incredibly useful for research, and I don’t see any drop-in replacements.

My question to fellow journalists: How do we support the special web services that are valuable to us? How do we help create more resilient digital services that serve our special needs? I have some ideas, but that’s for another blog post.

New year, new blog, new report

In a happy coincidence, today my new blog launched on Forbes.com and Chatham House released the report on the effects of the Eyjafjallajökull ash cloud to which I contributed.

My new Forbes.com blog will be covering the rather disparate topics of book publishing and high-impact low-probability (HILP) events. A slightly odd mix, perhaps, but both are fascinating topics and we’ll see how it develops.

The Chatham House report, Preparing for High-impact, Low-probability Events: Lessons from Eyjafjallajökull, looks at the impact that the ash cloud had, as well as examining the need for companies and organisations to be prepared for these HILP events. My contribution was Chapter 4: The Battle for the Airwaves, which looks at the media response to the ash cloud disruption. You can get an overview in my first post on Forbes.

Do let me know what you think, both of the report and the new blog!

Sacrificing web history on the altar of instant

As I said in my last post about Twitter’s lack of a business model, I’ve been doing some research lately for a think tank. My research has basically consisted of three things:

  • Looking back on the media coverage of an event that happened in early 2010
  • Looking back at the way bloggers reacted to said event
  • And having a quick look at Twitter for reactions there too

Pretty simple stuff, I think you’ll agree. My assumption was that I would be able to tap into Google News; Google Blogs, Icerocket and maybe Technorati; and Twitter’s archives. Then I’d be able to scrape the data using something like Outwit Hub, chuck it in Excel and Bob’s your uncle.
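Just to make the plan concrete, here’s roughly what that scraping step looks like in code. This is a purely illustrative Python sketch rather than anything I actually ran – the URL, parameters and page markup are hypothetical placeholders standing in for whichever archive you happen to be querying:

```python
# Illustrative sketch only: fetch one day's worth of search results from a
# hypothetical news archive, pull out headline, source and date, and dump
# them to a CSV that Excel can open. The URL, parameters and CSS selectors
# are placeholders, not any real service's API.
import csv

import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://news-archive.example.com/search"  # hypothetical endpoint


def fetch_results(query, date):
    resp = requests.get(SEARCH_URL, params={"q": query, "date": date})
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    for item in soup.select(".result"):  # selector depends entirely on the site
        yield {
            "headline": item.select_one("h3").get_text(strip=True),
            "source": item.select_one(".source").get_text(strip=True),
            "date": item.select_one(".date").get_text(strip=True),
        }


with open("eyjafjallajokull_2010-04-16.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["headline", "source", "date"])
    writer.writeheader()
    for row in fetch_results("Eyjafjallajökull", "2010-04-16"):
        writer.writerow(row)
```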

Oh, how sadly misguided, how spectacularly wrong.

Now before you talk about how ephemeral the web is and how no one should rely on it for anything, that’s only partly true. A lot of stuff on the web stays on the web, and given how much of our digital selves we are putting on the web, we do need to think about archiving and how we preserve our stuff for the future. But this post is not about archiving, but about accessing what’s already out there.

The first thing I did when I started my research was try to go back in time in Google News to early 2010 and search for news articles about the particular story – the eruption of Eyjafjallajökull – I was interested in.

But Google’s News search results are fuzzy. I wanted to search for the news on particular days, e.g. all the news about the Eyjafjallajökull eruption on 16 April 2010. Do that search, and you’ll be presented with lots of results, many of them not from 16 April 2010 at all, but from 17 April or even 18 or 15 April.

I wanted to refine the search by location, so I restricted it to 'Pages from the UK'. Fascinatingly, this included Der Spiegel, Business Daily Africa, Manila Bulletin, FOX News, Le Post and a whole bunch of media sources that, when I last looked, weren’t based in the UK.

So I now have search results which are limited to neither the date nor the place that I want. But even worse, results are clustered by story, which might seem like a good idea but in reality falls short. These clusters of similar stories are often not clusters of similar stories at all, but clusters of stories that appear to have some keywords in common while actually being about slightly different things. I can see the sense in attempting to cluster stories together for the sake of cutting down on duplication for the reader but equally, sometimes I just want a damn list.

Whilst doing my research, I also found that Google News is not, as I had thought, a subset of Google Web results. If you do the same searches on Google Web you get a slightly different set of data, obviously including non-news sites, but actually also including some news sites that aren’t in Google News, and not including many that are.

So far, so annoying, but Google isn’t the only search engine in the world… Except, Google pwned search years ago and innovation in search appears to be almost entirely absent. Bing does news, but a search on Eyjafjallajökull tosses up just three pages of results, and you can’t sort by date. Yahoo News finds nothing. A friend suggested that my local library might have a searchable news archive, but the one I looked at was unworkable for what I wanted.

I’m sure there are paid archives of digital news, but that wasn’t within my budget and, to be honest, given how much news is out there in the wild, there should be a good way to search it. I even tried the Google News API, but that has exactly the same unwanted behaviours as the website.

But hey, things will be better in the blog search, right?

Years ago, in the golden era of blogging, Technorati worked. Their site used to be really great, and I loved it so much I did some work with them. These days, I’m not quite sure what it’s for. It’s certainly not for search, given it finds nothing for Eyjafjallajökull. Icerocket is a better search engine, and you can refine by date, but it finds nothing on our target date, which is surprising as it’s a day or so after Eyjaf popped her top and the flight ban was well underway and, well, you’d think someone on the internet might have had something to say about it.

So, we’re back to Google Blogs. It lets me restrict by date! And specify UK-only! And it coughs up one page of results. Really? We have 4.57m bloggers in the UK and only 35 of them wrote something? I’ve always had my suspicions that the Google Blog index was poorly formed, but Google Blogs is the only choice I have, so I just have to put up with it. At least the results are in a neat list and all on the target date, even if some of them are clearly not from the UK, or even actually blogs, for that matter.

Now then, Twitter. We all know that Twitter’s archives have been on the endangered list for some time, and although old Tweets aren’t being deleted, accessing them is very difficult. Despite the advanced search page offering dates going back to 2008, you get an error if you try to search for April 2010: “since date or since_id is too old”.

SocialMention is a new search site that I’ve started to find really useful. It searches across the majority of the social web and lets you break the results down by type. So I can search for 'Eyjafjallajökull' in 'microblogs' and get realtime results, but I can’t go back in time further than 'last month'.

So, we’re back to Google again, this time Google Realtime. It only goes back to early 2010, so lucky for me that my target date is within that period. But the only way I can access that date is by a really clunky timeline interface – I can’t specify a date as I can in Google’s other searches.

Furthermore, there’s no pagination. I can’t hit ‘next page’ at the bottom and fish through a bunch of search pages to find something interesting – my navigation through the results is entirely dependent on the timeline interface. Such an interface will and does entirely outwit Outwit, which can normally follow ‘next’ links to scrape data from an entire search. I doubt it knows how to deal with the stupid timeline interface.
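To spell out what tools like Outwit depend on: a scraper can only walk a whole archive if each page of results links to the next one. A minimal sketch of that approach, again with a hypothetical URL and markup, might look like the Python below – and the moment a service swaps the ‘next’ link for a JavaScript-only timeline, this whole style of scraping stops working:

```python
# Minimal pagination-following scraper: start at page one of a results
# listing and keep following the 'next' link until there isn't one.
# The URL and markup are hypothetical; a JavaScript-only timeline exposes
# no such link, which is why this style of scraping fails against it.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

START_URL = "https://search.example.com/results?q=Eyjafjallajokull&page=1"  # placeholder


def scrape_all_pages(start_url):
    url = start_url
    while url:
        resp = requests.get(url)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        for item in soup.select(".result"):  # whatever marks a single result
            yield item.get_text(strip=True)
        next_link = soup.select_one("a[rel=next]")  # the crucial 'next' link
        url = urljoin(url, next_link["href"]) if next_link else None


for result in scrape_all_pages(START_URL):
    print(result)
```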

After all this searching and frustration, I’m left with this question:

What has happened to our web history?
The web is mutable, yes, but there’s an awful lot of fire-and-forget content that, generally speaking, hangs around for years. Individual blogs may come and go, but overall there’s a huge pool of blog content out there. Same for news. Twitter is a slightly weird case because it’s a single service with a huge archive of historically interesting data which it isn’t letting just anyone get at. Not even scholars. Twitter may have given its archive to the Library of Congress, but even that’s going to be limited access if their blog post is anything to go by:

In addition to looking at preservation issues, the Library will be working with academic research communities to explore issues related to researcher access.  The Twitter collection will serve as a helpful case study as we develop policies for research use of our digital archives. Tools and processes for researcher access will be developed from interaction with researchers as well as from the Library’s ongoing experience with serving collections and protecting privacy and rights.

The Library is not Twitter and will not try to reproduce its functionality.  We are interested in offering collections of tweets that are complementary to some of the Library’s digital collections: for example, the National Elections Web Archive or the Supreme Court Nominations Web Archive. We will make an announcement when the collection is available for research use.

I’m not an academic researcher, so whether I’d even get access to the archive for research is up in the air. (I can’t find any updates as to the availability of Twitter’s archive via the Library of Congress, so if anyone has info, please leave a comment.)

I think we have two problems here, one already briefly mentioned above.

1. Google has pwned search
For years, Google has been the dominant search engine, and in some ways they’ve paid a price for this as publishers of all stripes have climbed on the Google haterz bandwagon. My suspicion is that Google’s fuzzy search results are a sop to the news industry, because Google should be capable of producing a rich and valuable search tool that allows the user to see whatever data they want to see, in whatever layout they want. Maybe, after all the stupid shit the news publishers have thrown their way, Google thinks that building failure into their news search product will insulate them from criticism from the industry.

But I don’t think that this absolves Google of responsibility for the lack of finesse in historical search. After all, which bloggers are gathering together to demand Google not index them? And Twitter users who don’t want to be indexed by Google can go private with their account.

But Google’s dominance does seem to have caused other search engines to wither on the vine. It’s almost like no one is bothering to innovate in search anymore. Bloglines used to be a pretty good blog search engine, but it has gone the way of the dodo. Technorati is now useless as a search engine. Bing is a starting point, but needs an awful lot of work if it’s going to compete. News search is completely underserved, and Twitter… really, Twitter archival search is non-existent.

Are Google really so far ahead that they can’t be touched? Are they really so great that no one is going to bother challenging them? The answer to the first question is clearly no: they aren’t so brilliant that their work can’t be improved upon, not just in terms of the search algorithm, which has come in for a lot of criticism lately, but also in terms of their interface and the granularity of advanced searches. And I’d be deeply disturbed if people thought that the answer to the second question was yes. Google are the incumbent, but that makes them vulnerable to smaller, more nimble, more innovative competitors.

2. Historic search has been sidelined to serve instant
What’s going on right now? That’s the question that most search engines seem to be asking these days. Most have limited or zero capacity to look back on our web history, focusing instead on instant search. The immediacy of tools like Twitter and Facebook is alluring, especially for brands and companies who want to know what’s being said about them so that they can respond in a timely fashion.

But focusing on now and abandoning deeper, more nuanced historic searches is a disturbing trend. Searching the web’s past for research purposes might be a minority sport, but can we as a society really afford to cut ourselves off from our own past? Can we afford to alienate the researchers and ethnographers and anthropologists who want to learn about how our digital world has changed? About our reactions to events as they happened, rather than as remembered years later? There is value in archives, but not if they are locked up and the key thrown away by the search engines.

We cannot afford to sacrifice our history on the altar of instant. We can’t just say goodbye to the idea of being able to find out about our past, because it’s ok, we can see just how pretty the present looks. The obsession with instant risks not just our past, it also risks our future.

Twitter’s International Growth: Becoming the World’s Water Cooler? | Fast Company

Kevin: Some fascinating statistics showing Twitter's international growth. Kit Eaton writes at FastCompany: "Specific events around the world sparked peaks in international growth, Sanford notes--with the February 2010 Chilean earthquake prompting a 1,200% spike in member sign-ups. A 300% spike was seen after Colombian politicians began to use the system, and speedier growth was seen in India after local politicos and Bollywood stars began to Tweet."

More Twitter research gives us an insight

Last week I blogged about research by Meeyoung Cha, from the Max Planck Institute for Software Systems in Germany, and her colleagues, which showed that, on Twitter, the number of followers you have doesn’t correlate with the influence you have.

Corroborating that is research from Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon from the Department of Computer Science at the Korea Advanced Institute of Science and Technology. According to Shiv Singh, in this second piece of research:

The researchers also analyzed the influence of Twitter users and found that there’s a discrepancy in the relationship between the number of followers and the popularity of someone’s tweets. This basically means that the number of followers is not the only measure of someone’s value.

Singh draws out seven points of interest from the research, some of which are interesting and some of which are blindingly obvious to anyone who’s spent any time on Twitter:

  1. Twitter users have 4.12 degrees of separation on average
  2. The reTweet is powerful
  3. 75% of reTweets happen within an hour of the original Tweet
  4. Followers != influence
  5. Trending topics are mainly news headlines or ‘persistent news’
  6. Only a minority of users have reciprocal relationships, and there are a lot of observers
  7. ReTweets spread quickly

Read the whole post for Singh’s full analysis.

It’s good to see researchers digging into the nuts and bolts of social media. As I said about Cha’s work, those of us who’ve been in this area for a while have built up through experience and observation a set of instincts about how things work. We use heuristics to get a sense of how the whole system functions, but like any assumption built from personal experience there are risks that we are wrong. So it’s very valuable to have those assumptions tested by research which can then ground us in evidence rather than gut feeling.

How many friends can you make in a week?

The New Scientist reports some research by Susan Jamison-Powell at Sheffield Hallam University which seems to show that prolific bloggers are more popular, regardless of the quality or tone of their posts.

[She] studied the popularity of 75 bloggers on the site Livejournal.com. She looked at the number of friends each blogger had, the number of posts they made, the total number of words written and the overall tone of the posts. She then asked the bloggers to rate how attractive they found each of their peer’s blogs.

She found that the more words a blogger posted, the more friends they had and the higher their attractiveness rating. The tone of their posts – whether they contained mostly positive or negative comments – had no effect.

The BPS goes into a little bit more detail, explaining that the LiveJournalers were invited into a new community and then asked to rate their fellow community members after one week. I’m not sure if this falls within the bounds of Bad Science, but it’s certainly not an accurate reflection of how communities build in the real world.

My first problem is that you just can’t extrapolate from communities on LiveJournal to blogs in general. LiveJournal has always had a different demographic to, say, bloggers using Typepad or WordPress. LiveJournal has always had a gender bias towards women, for example: currently it runs at 62.5% female to 37.5% male among users who state a gender. And the bulk of users are between 18 and 34 (with an impressive spike at 30), historically much younger than the demographics for other tools.

Furthermore, LiveJournal is culturally different to many other blogs and blogging platforms and has traditionally been the meeting place for people who felt that other platforms were too open for them or who felt disenfranchised by mainstream tools and wanted to be with their peers. LiveJournal, for many, was where you could be yourself and enjoy the company of people like you, no matter how weird others thought you were.

LiveJournal isn’t a typical blogging community and results from studies on LiveJournal can’t be applied to other bloggers.

What’s more, after only a week of getting to know someone, you have very little information to go on. Those who talk most will almost certainly get higher rankings than those who are quiet, simply because they stand out and can easily be remembered. If you are trying to get to know 75 people in just seven days – and you have to ask if that is even possible – you’re going to rank the noisier ones higher just because they are the people you’ve had most exposure to. If you’ve had very little conversation with someone, you are bound to rank them near the bottom simply because they are still strangers, and humans tend to be stranger-averse.

How would this study have turned out if they had got to know each other over the course of a month? Or six months? Or a year? You know, real human friendship timescales. And how does the nature of the community change how people react to each other? The study doesn’t say what the raison d’être of the community was, and whether these people were gathered around an issue they cared deeply about or were just mooching around online, killing time.

The lesson that this study appears to be teaching is that bloggers should write more, and not worry about quality. Frankly, I call bullshit on the whole thing. The way that we form relationships through blogging is a complex and nuanced process, just like the way that we form friendships offline. We get to know people over time. We decide whether we agree with their points of view, whether we like the way they present themselves, how they interact with others and we build a picture of them that is either attractive or not.

That this study should get headlines in The Telegraph and BusinessWeek shows how poorly social media is still being covered by the mainstream press, and how little understanding or critical thinking they bring to it.

We do need a lot more research into the use of social media and particularly its use in the UK. Studies like Jamison-Powell’s, however, do not advance the debate in any useful way.

“Users will scroll” says Nielsen

Jakob Nielsen, once an opponent of scrolling, has now said that users will scroll, but only if there’s something worth scrolling to. This totally fits in the “No shit, Sherlock” category, but I suppose it’s good to have one’s experiences backed up by the evidence.

What’s disappointing about Nielsen’s column is that he doesn’t appear to have taken different types of content and behaviour into account. So there’s no sign that he adjusted for interestingness of the content, its relevance to the test subject, or whether the site already prioritised key information at the top of the page. Nor does he say whether he adjusted for content that provokes seeking behaviour or what I shall call here ‘absorbed’ behaviour, e.g. reading an interesting blog post.

All three of Nielsen’s examples are sites where I would expect to see seeking behaviour, i.e. the user glances through the content until they find what they want. If the sites are well designed, then the user should find that information quickly, at the top of the page. It is thus not necessarily surprising that he found participants spent 80.3% of their time above the fold (the fold being the point on your screen beyond which you’d need to scroll to see more) and 19.7% below, and that people’s attention flicked down the page until it settled on something interesting.

If Nielsen had used websites that provoke absorbed behaviour, such as well-written blogs or news sites, I would have expected to see a more evenly distributed eye-tracking trace. The third example, a FAQ, is starting to move towards that territory, but FAQs aren’t known for being fascinating. If a blog post or news article is interesting, I will read to the bottom without even realising I am scrolling. If it’s dull, on the other hand, I’ll either give up quite quickly or I’ll skip to the end to see if there’s anything juicy down there, i.e. the low quality of the content flips me from absorbed behaviour to seeking behaviour as I look for something more interesting.

Overall, I find this research, as presented in this column, rather lacking. You can’t just separate out user behaviour from content type and quality because the content has a huge impact on the user’s behaviour.

Nevertheless, Nielsen’s recommendations are sensible, even if they are also somewhat obvious:

The implications are clear: the material that’s the most important for the users’ goals or your business goals should be above the fold. Users do look below the fold, but not nearly as much as they look above the fold.

People will look very far down a page if (a) the layout encourages scanning, and (b) the initially viewable information makes them believe that it will be worth their time to scroll.

Finally, while placing the most important stuff on top, don’t forget to put a nice morsel at the very bottom.

And for those of you who made it this far, here’s your nice morsel (of cute):

Grabbity and Mewton

Report: Making the Connection: The use of social technologies in civil society

Last year I wrote a report for the Carnegie UK Trust’s Inquiry into the Future of Civil Society in the UK and Ireland. Called Making the Connection: The use of social technologies in civil society, it’s now available for download. Although it focuses primarily on the use of social media by the charitable sector, there’s still a lot of interesting stuff in it for business, I think, not least the future scenarios that try to imagine what the world might be like in 2025 and pose some questions for organisations about their ability to adapt to a rapidly changing environment. Please do take a look and let me know what you think!

Do you have space for incubators?

Robert Biswas-Diener, who studies the psychology of happiness, writes on CNN.com about the difference between people who procrastinate and those who incubate:

Procrastinators may have a habit of putting off important work. They may not ever get to projects or leave projects half finished. Importantly, when they do complete projects, the quality might be mediocre as a result of their lack of engagement or inability to work well under pressure.

[…]

In a pilot study with 184 undergraduate university students, we were able to isolate specific items that distinguished incubators from the rest of the pack. Incubators were the only students who had superior-quality work but who also worked at the last moment, under pressure, motivated by a looming deadline.

This set them apart from the classic “good students,” the planners who strategically start working long before assignments are due, and from the procrastinators, who wait until the last minute but then hand in shoddy work or hand it in late.

I can certainly relate to the concept of the incubator. Whilst I like to have a long run up on important projects, they almost always end up left until the last minute.

This is problematic in a business context, where the slow-and-steady approach is the assumed default. Most project planning, for example, assumes that people will hit intermediate deadlines regularly throughout a project. Yet sometimes, particularly in areas like tech where the ground is constantly shifting beneath your feet, this can be a really bad thing, because work done and decisions made early in the project can be out of date by its end, leaving the final deliverables obsolete as soon as they are completed.

I do think that social media can help with this, letting incubators share their thoughts and their incubation process with their team and manager without having to hit artificial deadlines that ultimately have a negative impact on the final result. I did this myself with a big report that I wrote last year. We agreed that I would not provide a “first draft”, but would instead put each section up on a wiki for the team to look at as it was completed. That meant that, come the “let’s assess your progress” meeting, I didn’t have anything much to show, but my final draft was something I was very proud of.

The major issue with that experience was that I was quite happy with the approach, it being one I am used to taking, but the people I was working with didn’t always seem able to get their heads around it. Such an approach changes how the project should be managed, with ongoing communication the norm instead of sporadic, milestone-based catch-ups. If managers struggle with this different style, then they are unlikely to get the best out of incubator-type personalities.