Reuters’ Connected China: How to win the argument for big data projects at news organisations

Connected China project from Thomson-Reuters

Data has long been a part of journalism, but I think we’re moving into a new and exciting phase in which data is helping to drive innovations in storytelling. Aron Pilhofer and the interactive team at the New York Times, Simon Rogers and the data team at the Guardian and Reg Chua, the data and innovation editor at Reuters, are all exploring ways to bring data and novel storytelling techniques together to reveal context and connections while also engaging audiences with rich narratives. I am very fortunate to call all three friends and, in the case of Simon, a former colleague.

At the recent NICAR conference in the US, Reg unveiled the latest example of this new wave of data-driven projects, Connected China. What is Connected China? Reg explains on his blog:

It’s a little hard to sum up simply; at one level, it’s a microsite that focuses on looking at power in China, explaining how it flows, the key players and institutions, and their relationships, featuring stories and rich multimedia (including fantastic archival footage.) But it’s much more than that: It’s also a series of innovative data visualizations that pull from a rich, underlying database of people, institutions and relationships to illustrate the connections, careers and positions of key officials in China. And more than that: It’s a great example of how the combination of data, visualizations, stories and multimedia can be much more than the sum of their parts.

To me the micro-site is only part of the story. It is one application built from a database.

It’s an amazing database: Tens of thousands of entities, 30,000 relationships, and a million and a half words (not to mention the array of news stories, photos and videos also featured in the app.) The team structured tons of information – connections, the importance of job roles, etc – with an editorial sensibility. In other words, they applied news judgment – but rather than use it just in stories, they used it to structure data.

One of the really powerful features of this database is the relationships. This is an incredibly rich store of context. In the visualisations created on the site, suddenly, a web of power invisible to all but the most knowledgeable experts on China becomes visible.
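To make the idea of a relationship store concrete, here is a minimal sketch in Python. It is not how Connected China is built; the records, the relation labels and the function names are all hypothetical, chosen only to show how a list of "who is connected to what" can be turned into something you can query for connections and for the chain linking two entities.

```python
from collections import defaultdict, deque

# Hypothetical sample records (source, relation, target); illustrative only,
# not taken from the Connected China database.
relationships = [
    ("Official A", "worked_with", "Official B"),
    ("Official B", "member_of", "Institution X"),
    ("Official C", "member_of", "Institution X"),
]


def build_graph(triples):
    """Index relationships by entity so each one lists its direct connections."""
    graph = defaultdict(set)
    for source, relation, target in triples:
        # Store the link in both directions so it can be followed either way.
        graph[source].add((relation, target))
        graph[target].add((relation, source))
    return graph


def connections(graph, entity):
    """Return the direct connections of one entity, sorted for stable output."""
    return sorted(graph.get(entity, set()))


def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest chain of entities linking two names."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _relation, neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # No chain of relationships links the two entities.


graph = build_graph(relationships)
```

Calling `shortest_path(graph, "Official A", "Official C")` walks from one official to the other through their shared colleague and institution, which is the kind of hidden link a visualisation can then surface.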

The power of connections and context

For years, I’ve dreamed of creating projects like this that unveil relationships between events, people and companies. Years ago, I was inspired by a project called the Shakespeare Explorer, developed for the Kennedy Center in Washington DC. It is a wonderful multimedia project that brings together pictures, places, plays and historical events. The timeline highlights connections between the plays and events in history, putting Shakespeare’s plays in a broader context.

You can see something similar with this PBS Frontline interactive showing the connections within Al Qaeda’s network, and how much the US intelligence services knew about the network at various points in time between 1993 and 2001. There were a few features like this around the time of the 9/11 attacks.

These are powerful features, and Connected China shows how a decade of development has moved these concepts forward. The question becomes why we haven’t seen more of them.

How to justify the effort

Projects like Connected China take a lot of time and resources. When I was at the BBC, my colleague Gill Parker worked with a database startup in addition to her work in journalism. Gill was working on the team with BBC security correspondent Frank Gardner looking into the global investigation into the 9/11 attacks. They had huge amounts of material, mainly Microsoft Word documents, but they didn’t have an efficient way to organise them. Gill knew there was a better way, so she reached out to me to see if I could connect her with someone at BBC News Online who could provide database expertise. Unfortunately, we didn’t have the spare resources. Well, we probably did, somewhere, but at the time the BBC wasn’t very good at pooling resources across the organisation.

I’ve thought a lot about strategies for making the case, both then and over the years. These are my recommendations:

Plan for reuse, not artisanal single-use apps – If there is one lesson I learned early on in my digital journalism career, it is that it is almost impossible to justify artisanal web interactives. In the mid and late 1990s, we built a lot of things by hand online. To be honest, the massive effort was almost never justified by the response from the audience. Very quickly, we learned that we needed reusable elements that we could build on easily over time.

For data, think of a database as raw material for other projects and apps. For instance (and knowing Reg, I’m sure that he has thought about this) with Connected China, you’ve got a huge database of structured information. Using something like Thomson-Reuters’ Calais, it would be relatively easy to link people in China stories to elements of Connected China.
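As a rough illustration of that kind of reuse, the sketch below links a story to records in an entity database by looking for known names in the text. This is deliberately naive and is not the Calais API, which does far more sophisticated entity extraction; the entity IDs, names and the `link_entities` function are assumptions made up for the example.

```python
import re

# Hypothetical entity records: database ID -> display name.
entities = {
    "person-001": "Official A",
    "org-001": "Institution X",
}


def link_entities(text, entities):
    """Return the IDs of every known entity whose name appears in the text."""
    found = []
    for entity_id, name in entities.items():
        # Match the whole name only, ignoring case, and escape any
        # punctuation in the name so it is treated literally.
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(entity_id)
    return sorted(found)
```

A story that mentions "Official A" and "Institution X" would come back tagged with both IDs, which is enough to drop a link or a profile box next to the article. Real names are messier (aliases, transliterations, shared surnames), which is why a dedicated extraction service is the more realistic route.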

Building databases takes effort, and knowing the kind of databases that will generate the most applications might help you decide which ones to develop and which ones aren’t worth the effort.

Think of potential revenue streams – The days when journalism could afford to be completely divorced from business realities are over, and if you’ve got to make the case for why your data project should get scarce resources over another project, you’ll need to think of possible revenue sources that might make it more attractive to the powers that be. I’ve worked with a number of news organisations on data journalism, including Reed Business Information, CNN and Czech TV. Some of the techniques that I advocate are about bringing down the cost of charts and graphs, but I also speak to teams about how they can develop revenue streams through data projects.

Always ask:

• Does the data have commercial value?
• Are there obvious sponsors for the data?
• Could you build an app with the data that might be a premium product?

Think of internal and external applications – One of the strategic justifications that I tried to use for the BBC 9/11 project was that it would lead to an important internal resource as well as an external resource. Ultimately, for that project, the argument didn’t win the day, but it can help you get important buy-in if you can make the case that the data resource can help your staff as well as being a compelling feature for your audience.

Think small before taking on big data – Connected China is a massive project beyond the scope of most organisations. However, there are still important concepts about context and data that can be used on much smaller projects. Think of how structured data can be used to add context to local stories and how you can build up databases over time rather than thinking that you have to build big right away.
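Starting small can be as simple as logging the structured facts from each local story in a single table. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout, the column names and the sample figures in the usage note are hypothetical, chosen only to show a database that grows one story at a time.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the facts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        "story_date TEXT, place TEXT, topic TEXT, value REAL)"
    )
    return conn


def add_fact(conn, story_date, place, topic, value):
    """Record one structured fact taken from a story."""
    conn.execute(
        "INSERT INTO facts VALUES (?, ?, ?, ?)",
        (story_date, place, topic, value),
    )
    conn.commit()


def history(conn, place, topic):
    """Return every recorded value for a place and topic, oldest first."""
    rows = conn.execute(
        "SELECT story_date, value FROM facts "
        "WHERE place = ? AND topic = ? ORDER BY story_date",
        (place, topic),
    )
    return rows.fetchall()
```

For example, after a few calls such as `add_fact(conn, "2013-01-01", "Ward 1", "burglaries", 9)`, `history(conn, "Ward 1", "burglaries")` gives a reporter the trend to put the next story in context. Passing a file name to `open_db` instead of the in-memory default keeps the data between sessions.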

This really is an exciting time to be a journalist, and it’s great to see news organisations invest in projects like this. No matter the size of your organisation, there are important ways to use data to add value, in all senses of the word, to your journalism.

A data first digital strategy?

Every once in a while, reading the comments on a good blog is rewarded. I’m an avid reader of Alan Mutter’s Reflections of a Newsosaur. His recent post on Big Data is well worth reading.

To date, publishers have applied the same business model to everything from print and the web to the latest mobile and social platforms: Build the biggest possible audience.

This approach, unfortunately, is exactly at odds with the point of Big Data, whose goal is to connect individuals with information specifically tailored to them.

The quicker Big Data applications develop, the faster the large but un-targetable audiences traditionally delivered by newspapers will become an anachronism, thus limiting their utility to consumers and value for advertisers.

However, read the first comment from Pat Scanlon, the director of digital strategy and business development at the Pittsburgh Post-Gazette. It shows a publisher actually embracing data, and no, I don’t mean embracing data journalism. Scanlon outlines how some newspapers are using data to deliver better, more targeted content to audiences and better results for advertisers. He writes in his comment:

At the beginning of this year we adopted a “DataFirst” digital strategy. Being DataFirst means: “Collecting, analyzing, understanding and using data to create better customer (users, readers & advertisers) experiences – and improve our business insights.”

More and better data means we understand our customers better. If we understand our customers better, we can deliver more relevant content – both Editorial and Advertising.

If we deliver more relevant Editorial and Advertising… We will make more money via increased traffic, more effective advertising and being relevant in the lives of our customers.

If we make more money we will save journalism and be employed.

Amen. This is one to watch and, most likely, to emulate.