The value of data for readers and the newsroom

When I was at the BBC, a very smart producer, Gill Parker, approached me about pulling together a massive amount of data and information she was collecting with Frank Gardner as they tried to unravel the events that led to the 11 September 2001 attacks in the US. Not only had Gill worked on the BBC’s flagship current affairs programme Newsnight and on ABC’s Nightline in the US, she had also worked in the technology industry. They were interviewing law enforcement and security sources all around the world and collecting masses of information, all of which they kept in Microsoft Word files. She knew that they needed something else to help them connect the dots, and speaking with me in Washington, where I was working as Washington correspondent at the time, she asked if I could help her get some database support.

I thought it was a great idea. My view was that by helping her organise all of the information that they were collecting, the News website could use the resulting database to develop info-graphics and other interactives that would help our audience better understand the complex story. We could help show relationships between all of the main actors in al Qaeda as well as walk people through an interactive timeline of events. I had a vision of displaying the information on a globe. People could move through time and see various events with key actors in the story. This was a bit beyond the technology of the time. Google Earth was still a few years away, and it would have required significant development for some of the visualisations. However, on a story like this, I thought we could justify the effort, and frankly, we didn’t need to go that far. Bottom line: Organising the data would have huge benefits for BBC journalists and also for our audiences.

Unfortunately, it was the beginning of several years of cuts at the BBC, and the News website was coming under pressure. It was beyond the scope of what I had time to do or could do in my position, and we didn’t have database developers at the website who could be spared, I was told.

A few years later, as Google Earth developed, Declan Butler at Nature used data on the global spread of the H5N1 virus to achieve something like the vision I had of showing events over time and distance.

It is great to see my friend and former Guardian colleague Simon Rogers move forward with this thinking of data as a resource both internally to help journalists and also externally to help explain a complex story in his work on the Wikileaks War Logs story. Simon wrote about it on the Guardian Datablog:

we needed to make the data easier to use for our team of investigative reporters: David Leigh, Nick Davies, Declan Walsh, Simon Tisdall, Richard Norton-Taylor. We also wanted to make it simpler to access key information for you, out there in the real world – as clear and open as we could make it.

When I was the digital research editor at The Guardian, data was key to many of my ideas (before I left this March to pursue my own projects). I even thought that data could become a source of revenue for The Guardian. Data and analysis are something that people are willing to pay for. Ben Ayers, the head of social media and community at ITV (speaking for himself, not ITV), said to me on Twitter:

Brilliant. I’d pay for that stuff. Surely the kind of value that could be, er, charged for. Just sayin’ … just an example of where, if people expect great interpretation of data as part of the package, the Guardian could charge subs

As I replied to Ben, I wouldn’t advocate charging for data for the War Logs, but I would suggest charging for data about media, business and sports. That could become an important source of income to help subsidise the cost of investigations like the War Logs. Data wrangling can be time intensive, as I know from my experience in developing the media job cuts series that I wrote at the end of 2009 for The Guardian. However, the data can be a great resource for journalists writing stories as well as for developing interactive graphics like the media job cuts map or the IED attack map for the War Logs story. Data drives traffic, as the Texas Tribune in the US has found, and I believe that certain datasets could be developed into new commercial products for news organisations.

8 thoughts on "The value of data for readers and the newsroom"

  1. Charging for data, really?
    Data is a commodity, if you’re charging for it, someone else can (or should!) supply it free.

    Make data open, and we as a society would still rely on journalists to add value to data. To add context, analysis, emotion. To create stories.

  2. Data is a commodity


    Some data is hard and expensive to gather, and has genuine worth to people. That data is neither a commodity nor likely to be open…

  3. Daniel,

    I don’t disagree that public data should be free, and I don’t advocate charging for it in the post. However, not all data is public, and I know from personal experience that gathering data is time consuming, expensive and definitely a potential source of revenue. Nowhere in the post do I advocate charging for public data.

    Also, from your tweet, I might quibble that journalism+data adds value. There is too much sloppy anecdote-based journalism that highlights sensationalist examples that are outliers based on the data. As a journalism student, I didn’t take a statistics class, which I now regret. It should be mandatory. (Lesson number one: Mean or median and average are not the same thing.)

    Also, if it wasn’t clear in the post, I wouldn’t advocate charging for data such as the War Logs, although that data was definitely not a commodity. My reason for not charging for the War Logs data is one of principle. To me, that’s the kind of social mission journalism that I hope stays at low cost or no cost and is cross subsidised by the sale of commercial data. However, we have to find new sources of revenue to sustain journalism, and I believe that data can be one source, even if it’s merely making public data more accessible and using that to drive traffic.

    Also, Stewart Brand’s original “information wants to be free” needs to be referenced in its full form:

    Information Wants To Be Free. Information also wants to be expensive. … That tension will not go away.

  4. I don’t understand in what context a newspaper could or would act in such a way.

    There are enough companies providing market research, business data, sports stats etc. What would set them apart, where would they create value?

  5. Daniel,

    It depends on the context. For most local newspapers, it might be prohibitively expensive on their own, although most local newspapers are part of larger groups now. That group could offset the cost of data collection. It’s not all that unheard of. Newspapers commission polls, which is a form of data. There are also market opportunities where localised data is of more value to local audiences than generalised national numbers, which often do not demonstrate local or regional differences. There are a lot of market opportunities there.

    In terms of sports stats, yes, they are there, but value added services could be created in conjunction with fantasy league offerings. It would be a differentiator in existing cluttered markets. There are market opportunities there.

    Part of finding new revenue streams is identifying gaps in existing offerings and determining whether they offer enough return to offset the costs. In the post, what I’m suggesting is that while data gathering costs might be high, there are several benefits editorially and also potential benefits with specific datasets to not only justify the cost but also to pay for them.

  6. Thanks Kevin, some good points there.

    I’d like to see such a thing implemented on a commercial licence, with such stuff free for non-commercial use. I agree with newspapers making money, I just disagree with data access being prohibited by money.

  7. Though I concede we’re probably not at a point where data IS commodified. Data will *increasingly* be commodified by Google and it is not firm ground on which to build a business model.

  8. Daniel,

    There is a lot of data that is not published to the web, which means that Google will play no role in whether the data is a commodity (apart from the sales of internal search appliances). I think you’re making the mistake of oversimplifying what data is. Data is not all public. Data is not all published to the web. Data is not all commodity, nor is it valuable to everyone. Knowing your audience and collecting unique data that is of value to them is a very firm ground on which to build a business model.

    As for whether all data should be released free for non-commercial use, while I’m a huge supporter of Creative Commons, I release things under both non-commercial reuse terms and also under terms where I retain the copyright. What I’m advocating is that newspapers look for commercial opportunities to sell data and data analysis to pay for the social mission of journalism, which is often very difficult to monetise on an unbundled basis.

