Ushahidi and Swift River: Crowdsourcing innovations from Africa

For all the promise of user-generated content and contributions, one of the biggest challenges for journalism organisations is that such projects can quickly become victims of their own success. As contributions increase, there comes a point when you simply can’t evaluate or verify them all.

One of the most interesting projects in 2008 in terms of crowdsourcing was Ushahidi. Meaning “testimony” in Swahili, the platform was first developed to help citizen journalists in Kenya gather reports of violence in the wake of the contested election of late 2007. Since that first project, it has been used to crowdsource information around the world, often during elections or crises.

Video: “What is Ushahidi?” from Ushahidi on Vimeo.

Prompted by the difficulty of gathering information during a chaotic event like the attacks in Mumbai in November 2008, members of the Ushahidi developer community discussed how to meet the challenge of what they called a “hot flash event”.

It was that crisis that got two members of the Ushahidi dev community, Chris Blow and Kaushal Jhalla, thinking about what needs to be done when you have massive amounts of information flying around. We’re at the point where any ordinary person can openly share valuable tactical and strategic information. How do you ferret the good data from the bad?

They focused on the first three hours of a crisis. Any working journalist knows that during fast-moving news events, false information is often reported as fact before being challenged. How do you increase the volume of sources while maintaining accuracy, and how do you sift through all of that information to find what is most relevant and important?

Enter Swift River. The project is an “attempt to use both machine algorithms and crowdsourcing to verify incoming streams of information”. Judging by the project description, Swift River allows people to create a bundle of RSS feeds, whether those feeds are users or hashtags on Twitter, blogs or mainstream media sources. Whoever creates the bundle is its administrator, able to add or delete sources. Users, referred to as sweepers, can then tag items or pick out the bits of information in those feeds that they ‘believe’. (I might quibble with the language. Belief isn’t verification.) The links are then analysed, and the “veracity of links is computed”.
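To make the idea a little more concrete, here is a very rough sketch in Python of how a bundle of feeds, a set of sweepers and a crude ‘veracity’ score might hang together. To be clear, this is my own illustration of the concept rather than the actual Swift River code; the class, method and field names are all my assumptions.

```python
# A toy illustration of the Swift River idea: an administrator bundles feeds,
# sweepers mark the links they believe, and each link gets a crude veracity
# score. This is NOT the real Swift River code, just a sketch of the shape.
from collections import defaultdict

class Bundle:
    def __init__(self, admin, feeds):
        self.admin = admin             # whoever created the bundle
        self.feeds = set(feeds)        # Twitter users or hashtags, blogs, mainstream media
        self.votes = defaultdict(set)  # link -> set of sweepers who 'believe' it

    def add_feed(self, requester, feed_url):
        # Only the bundle's creator may add or delete sources.
        if requester != self.admin:
            raise PermissionError("only the bundle admin can change sources")
        self.feeds.add(feed_url)

    def sweep(self, sweeper, link):
        """A sweeper tags a link from one of the feeds as believable."""
        self.votes[link].add(sweeper)

    def veracity(self, link, total_sweepers):
        """Very crude score: the share of sweepers who flagged the link."""
        if total_sweepers == 0:
            return 0.0
        return len(self.votes[link]) / total_sweepers

# Hypothetical usage
bundle = Bundle(admin="kevin", feeds=["http://example.com/mumbai-hashtag.rss"])
bundle.sweep("sweeper_a", "http://example.com/report-1")
bundle.sweep("sweeper_b", "http://example.com/report-1")
print(bundle.veracity("http://example.com/report-1", total_sweepers=10))  # 0.2
```

A real system would obviously have to fetch and parse the feeds, weight sweepers by some measure of reliability and analyse the links themselves; this only shows the basic shape of the data.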

It’s a fascinating idea and a project that I will be watching. While Ushahidi is designed to crowdsource information and reports from people, Swift River is designed to ‘crowdsource the filter’ for reports across the many networks of the internet. For those of you interested, the project code is available under the open-source MIT Licence.

One of the things that I really like about this project is that it’s drawing on talent and ideas from around the world, including some dynamic people I’ve had the good fortune to meet. Last year, when I was back in the US for the elections, I met Dave Troy of Twittervision fame, who helped develop Twitter Vote Report, an application to crowdsource reports of voting problems during the election. The project gained a lot of support, including from MTV’s Rock the Vote and National Public Radio. He has released the code for the Twitter Vote Report application on GitHub.

To help organise the Swift River project for Ushahidi, they have enlisted African tech investor Jon Gosier of Appfrica Labs in Uganda, which he has based loosely on Paul Graham’s Y Combinator. I interviewed Jon at TEDGlobal in Oxford this summer about a mobile phone search service in Uganda. He’s a Senior TED Fellow.

There are a lot of very interesting elements in this project. First off, they have highlighted a major issue with crowdsourced reporting: Current filters and methods of verification struggle as the amount of information increases. The issue is especially problematic in the chaotic hours after an event like the attacks in Mumbai.

I’m curious to see if there is a reputation system built into it. As they say, it works based on the participation of experts and non-experts. How do you gauge the expertise of a sweeper? And I don’t mean to imply, as a journalist, that journalists are ‘experts’ by default. For instance, I know a lot about US politics but consider myself a novice when it comes to British politics.

It’s great to see people tackling these thorny issues and testing them in real world situations. I wonder if this type of filtering can also be used to surface and filter information for ongoing news stories and not just crises and breaking news. Filters are increasingly important as the volume of information increases. Building better filters is a noble and much needed task.


Instapaper: Managing your ‘To Read’ list

I have this dreadfully bad habit of leaving lots of tabs open in my browser. Since the day Firefox introduced tabs, they have been my default way of “managing” large numbers of articles that I want to read. Whether someone sent me a link by email or IM, or I spotted something on Twitter, I’d open it up in a tab, glance at the headline and think, “Oh, I’ll read that later.” Then it would sit in my browser for weeks, sometimes months, whilst I did other stuff.

When Firefox grew to 60+ open tabs it became a bit of a resource pig and, more often than not, would crash horribly, sometimes taking the rest of the OS down with it. I’d be forced to restart my Mac, and when Firefox reopened I would feel compelled to reopen the 60 tabs that had caused it to crash in the first place. Sometimes I’d copy all the URLs into a separate document and start afresh with an empty browser. I almost never go back to that list of URLs (which now stretches back to 10th August 2006!).

I recently discovered Instapaper and now my workflow has totally changed. Instead of leaving tabs open, I open the article I want to read, save it to Instapaper, and close the tab. I can then read it later, either in my browser or in the Instapaper iPhone app. Once I’m done, I can archive the link, or I can share it on Tumblr, Twitter, Feedly, Google Reader, Facebook or via email. Instapaper also plays very nicely with Tweetie on the iPhone, so I can save links directly from my phone without having to star the tweet and open it on my Mac later. The only thing I miss at the moment is that I can’t save links to Delicious, which is my current link storage facility.

It’s not often that an app revolutionises my reading in this way. RSS did it, years back. (If you’re curious, I use NetNewsWire which syncs to Google Reader and thence with Reeder on the iPhone – a fab combination.) But nothing has come close to changing how I consume non-RSS content until now.

The great thing is that I don’t feel the need to read everything that passes into view, but have a much more streamlined way of saving the link and assessing it later. And because Instapaper on the iPhone works offline, I can use some of that wasted time spent sitting on underground trains to flip through my articles. Win!

Just how gullible is the media?

Rather like our own Starsuckers, wherein the British media are shown not to give a fig about whether stories are true or not, Hungry Beast, a show on Australia’s ABC, recently put together their own hoax.


I don’t know if this shows that the media are gullible, or whether it just proves that they don’t care whether what they print is true. If the former were true, we might stand a chance of turning things around. I think the latter is more on the money, which makes it a much more intractable problem.

links for 2009-12-14

Metrics, Part 1: The webstats legacy

Probably the hardest part of any social media project, whether it’s internal or external, is figuring out whether or not the project has been a success. In the early days of social media, I worked with a lot of clients who were more interested in experimenting than in quantifying the results of their projects. That’s incredibly freeing in one sense, but we are (or should be) moving beyond the ‘flinging mud at the walls to see what sticks’ stage into the ‘knowing how much sticks’ stage.

Social media metrics, though, are a bit of a disaster zone. Anyone can come up with a set of statistics, create impressive-sounding jargon for them and pull a meaningless analysis out of their arse to ‘explain’ the numbers. Particularly in marketing, there’s a lot of hogwash spoken about ‘social media metrics’.

This is the legacy of the dot.com era in a couple of ways. Firstly, the boom days attracted a lot of snakeoil salesmen. After the crash, businesses, now sceptical about the internet, demanded proof that a site really was doing well. They wanted cold, hard numbers.

Sysadmins were able to pull together statistics direct from the webserver and the age of ‘hits’ was born. For a time, back there in the bubble, people talked about getting millions of hits on their website as if it was something impressive. Those of us who paid attention to how these stats were gathered knew that ‘hits’ meant ‘files downloaded by the browser’, and that stuffing your website full of transparent gifs would artificially bump up your hits. Any fool could get a million hits – you just needed a web page with a million transparent gifs on it and one page load.
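For the curious, here is a trivial sketch of what that kind of inflation looks like. It is purely illustrative – nobody’s real site, and the filename is made up – but because each image tag carries a unique query string, a browser fetches every one of them, and each fetch lands in the server log as another ‘hit’.

```python
# Purely illustrative: generate a page padded with 1x1 transparent gifs.
# Each <img> has a unique query string so the browser requests it separately,
# and every request shows up in the server log as another 'hit'.
N = 1000  # raise this towards 1_000_000 for the fabled million-hit page

with open("hits.html", "w") as page:
    page.write("<html><body>\n<p>Nothing to see here.</p>\n")
    for i in range(N):
        page.write('<img src="transparent.gif?n={}" width="1" height="1" alt="">\n'.format(i))
    page.write("</body></html>\n")
```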

This led to the second legacy: an obsession with really big numbers. You see it everywhere, from news sites talking about how many ‘unique users’ they get in comparison to their competitors to internal projects measuring success by how many people visit their wiki or blogs. It’s understandable, this cultural obsession with telephone-number-length stats, but it’s often pointless. You may have tens of thousands of people coming to your product blog, but if they all think it’s crap you haven’t actually made any progress. You may have 60% of your staff visiting your internal wiki, but if they’re not participating they aren’t going to benefit from it.

Web stats have become more sophisticated since the 90s, but not by much. Google Analytics now provides bounce rates and absolute unique visitors and all sorts of stats for the numerically obsessed. Deep down, we all know these are the same sorts of stats that we were looking at ten years ago but with prettier graphs.

And just like then, different statistics packages give you different numbers. Server logs, for example, have always produced numbers that are an order of magnitude or more higher than a service like StatCounter, which relies on you pasting some Javascript code into your web pages or blog. Even amongst external analytics services there can be wild variation: a comparison of StatCounter and Google Analytics shows that numbers for the same site can be radically different.

Who, exactly, is right? Is Google undercounting? StatCounter overcounting? Your web server overcounting by a factor of 10? Do you even know what they are counting? Most people do not know how their statistics are gathered. Javascript counters, for example, can undercount because they rely on the visitor having Javascript enabled in their browser. Many mobile browsers will not show up at all because they cannot run Javascript. (I note that the iPhone, iPod touch and Android do show up, but I doubt that they represent the majority of mobile browsers.)

Equally, server logs tend to overcount, not just because they’ll count every damn thing, whether it’s a bot, a spider or a hit from a browser, but also because they’ll count everything on the server, not just the pages with the Javascript code on them. To some extent, different sorts of traffic will be distinguished by the analytics software that processes the logs, but there’s no way round the fact that you’re getting stats for every page, not just the ones you’re interested in. Comparing my server stats to my StatCounter numbers shows the former is seven times the latter. (In the past, I’ve had sites where it’s been more than a factor of ten.)
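As a rough illustration of why the two sets of numbers diverge so wildly, here is a sketch – my own, not any particular analytics package – that counts every request in an Apache-style access log and then counts only the requests a Javascript counter would plausibly ever see: no bots or spiders, and no images, stylesheets or scripts. The bot hints and log format are assumptions for the example.

```python
# Sketch: raw 'hits' in a combined-format access log versus requests that look
# like genuine page views (HTML fetched by something that isn't an obvious bot).
import re

BOT_HINTS = ("bot", "spider", "crawler", "slurp")
ASSET_EXTENSIONS = (".gif", ".png", ".jpg", ".css", ".js", ".ico")
REQUEST = re.compile(
    r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

raw_hits = 0
page_views = 0
with open("access.log") as log:
    for line in log:
        match = REQUEST.search(line)
        if not match:
            continue
        raw_hits += 1
        path = match.group("path").split("?")[0].lower()
        agent = match.group("agent").lower()
        if any(hint in agent for hint in BOT_HINTS):
            continue  # spiders and bots never run the Javascript counter
        if path.endswith(ASSET_EXTENSIONS):
            continue  # images, CSS and scripts are 'hits', not page views
        page_views += 1

print("raw hits:", raw_hits, "plausible page views:", page_views)
```

Even this generous filter leaves in things a Javascript counter would miss for other reasons, which is part of why the two totals never reconcile.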

So, you have lots of big numbers and pretty graphs but no idea what is being counted and no real clue what the numbers mean. How on earth, then, can you judge a project a success if all you have to go on are numbers? Just because you could dial a phone with your total visitor count for the month and reach an obscure island in the Pacific doesn’t mean that you have hit the jackpot. It could equally mean that lots of people swung past to point and laugh at your awful site.

And that’s just web stats. Social media stats are even worse, riddled with the very snakeoil that web stats were supposed to guard against. But more on that another day.

links for 2009-12-12

  • Kevin: A fascinating infographic showing the use of various web 2.0/social web services. The one quick thing to see on this map is how popular photo sharing is, universally so. Social networking is also very popular around the world. Microblogging and blogging show a wide variation in use. One thing that is really interesting is how popular social media is in Asia compared to Europe. For instance, 60% of China's internet users upload photos but only 38% of British users. Some 46% of Chinese internet users have blogged but only 8.4% of British users. Wow. That's huge.
  • Kevin: My colleague Mercedes Bunz has a great interview with media consultant Gary Hayes on how social media services such as Twitter are now being used effectively to reflect the community that builds around television shows. This is a great point by Hayes: "Most broadcasters and programme-makers are really missing a trick in not having a presence in the real-time discussion that surrounds "their" show – they don't need to control the conversation, they just need to be a voice of "the creator" or represent the production." The same could be said for news organisations. It's not about control but about showing up.

Poynter asks: Are journalists giving up on newspapers?

The Poynter Institute in the US hosted an online discussion asking whether journalists are giving up on newspapers, after high-profile departures including Jennifer 8. Lee, who accepted a buyout at the New York Times, and Anthony Moor, who left newspapers to become a local editor for Yahoo. Moor told the US newspaper trade magazine Editor & Publisher – which has just announced it is ceasing publication after 125 years:

Part of this is recognition that newspapers have limited resources, they are saddled with legitimate legacy businesses that they have to focus on first. I am a digital guy and the digital world is evolving rapidly. I don’t want to have to wait for the traditional news industry to catch up.

This frustration has existed among digital journalists for a while, but many chose to stay with newspapers, or with sites tied to other legacy media, because of resources, industry reputation and better job security. However, with the newspaper industry in turmoil, the benefits of staying are now less obvious.

Jim Brady, who was the executive editor of WashingtonPost.com but is now heading up a local project in Washington DC for Allbritton Communications, said on Twitter:

A few years ago, the risk of leaping from a newspaper to a digital startup was huge. Now, the risk of staying at a newspaper is also huge.

Aside from risk, Jim echoed Moor’s comments in an interview with paidContent:

Being on the digital side is where my heart is. Secondly, I think doing something that was not associated with a legacy product was important.

In speaking with other long-time digital journalists, I hear this comment frequently. Many are yearning to see what is possible in digital journalism without having to think of a legacy product – radio, TV or print. There is also a sense among some digital journalists that when print and digital newsrooms merged, it was the digital journalists and editors who lost out. In a special report on the integration of print and online newsrooms for Editor & Publisher, Joe Strupp writes:

Yet the convergence is happening. And as newsrooms combine online and print operations into single entities, power struggles are brewing among many in charge. More and more as these unifications occur, it’s the online side that’s losing authority.

It’s naive to think that these power struggles won’t happen, but they are a distraction that the industry can ill afford during this recession. In the Editor & Publisher report, Kinsey Wilson, former executive editor of USA Today and editor of its website from 2000 to 2005, said that during the convergence at USA Today and the New York Times:

We both had a period of a year or two when our capacity to innovate on the Web stopped, or was even set back a bit

Successful digital models are emerging. Most are focused and lean, such as paidContent (although it has cut back during the recession, I’d consider its acquisition by The Guardian, my employer, a mark of success) and the expanding US political site Talking Points Memo. There are opportunities in the US for journalists who want to focus on the internet as their platform.

Back to the Poynter discussion, where Kelly McBride said:

I talk to a lot of journalists around the country. I don’t think they are giving up journalism at all. I do think some of them have been let down by newspapers. But a lot are holding out. They are committed to staying in newspapers as long as they can, because they are doing good work.

It’s well worth reading through the discussion. I am sure that many journalists have some of the same questions.

What was the verdict? See the Poynter discussion: Are journalists giving up on newspapers?

links for 2009-12-11

News organisations miss opportunity to build community with online photo use

As Charlie Beckett, the director of the politics and journalism think tank POLIS at the LSE, points out, the Daily Mail is getting a lot of grief for using pictures, mainly from the photo-sharing site Flickr, without the permission of the users or in violation of the licensing on those pictures. Charlie’s post is worth reading in full, but here are some of the questions he poses:

At what point does material in the public domain become copyright? The people who published these images didn’t do so for financial gain. There is a genuine, if very slight, news story here which feels worthy of reporting. If I link to those photos am I also infringing people’s copyright? Might it be possible that they will actually enjoy seeing their work on the Mail’s website where it will be connected to millions of other people?

I don’t want to dwell on the copyright issue too much, apart from saying that if the newspaper industry is fuzzy on copyright on the internet, it undermines their arguments with respect to aggregators, ‘parasites’ and ‘thieves’ online. I’d rather make the case that there is a benefit to news organisations not only in respecting the copyright of others but also in being good participants in online communities like Flickr. Here’s part of the comment that I left on Charlie’s post:

Leaving [the copyright] issue aside, this is another example of the news industry missing an opportunity to build community around what they do. When I use Creative Commons photos from sites like Flickr, firstly, I honour the terms of the licence. Secondly, I drop the Flickr user a note letting them know that I’ve used a photo on our site. It’s not only a way to use nice photos, but it’s also a way to build goodwill towards what we’re doing and do a little soft-touch promotion of our coverage. It takes a minute out of my day to write that email, but instead of a backlash, I often get a thank you. They let their friends know that the Guardian has used their picture. It’s brilliant for everyone. There are benefits to being good neighbours online, rather than viewing the internet as a vast repository of free content. As a journalist, I wouldn’t use a photo on Facebook without permission. Besides, the photos on Flickr are very high quality, and with the common use of Creative Commons, I know exactly what the terms of use are. As a user of Flickr who licenses most of my photos under a Creative Commons licence, I also feel that, whatever photos I use, I’m giving back to the community. It’s a much more honest relationship.

Last year during the elections, I found an amazing picture of Democratic candidate John Edwards on Flickr, under a Creative Commons Attribution licence that allows commercial use, and used it in a blog post on the Guardian when he dropped out of the race. I let the photographer, Alex de Carvalho, know that I had used his photo, and he responded:

Thank you, Kevin, for using this picture; I’m honored it’s in The Guardian.

Result:

  • Great picture.
  • Credit where credit is due.
  • Mutual respect for copyright. Creative Commons clearly states the photographer’s wishes regarding their rights.
  • Light touch outreach to promote our work at the Guardian.
  • Building community both on our site and on the broader internet.

That’s what we mean at the Guardian about being of the internet not just on it, and this is why I believe that social media is about creating great journalism and building an audience to support it.

Professionalism

First we caused the twin evils of poor communication and inability to learn from each other through our systematisation and bureaucratisation of the world of work. We devalued relationships and trust as twin pillars of human endeavour. Then we made it worse by sticking plaster on the wound, adding layers of “professional” intervention on top in the form of “internal communicators” and “knowledge managers” in our attempts to make things better. We buried the people trying to do things under increasingly collusive layers of “grown ups” pretending that this is the way things have to be.

And then… (Euan is a great read – if you’re not already subscribed, he’s well worth it.)

Professionalism is, at best, a veneer of objectivity. At worst, it is a false persona that distances us from our colleagues, complicates collaboration and erodes trust. Social media turns all this on its head – instead of being “professional” we can be ourselves; we can have genuine relationships with colleagues that promote trust and understanding. We can finally acknowledge that we are real people with real emotions, and that those emotions matter.