Xtech 2006: Paul Hammond – An open (data) can of worms

Used to work for the BBC, but left three weeks ago, so can't talk too much about them. Started working for Yahoo! two weeks ago, which has lots of APIs on its Developer Network. But can't really talk about that either, because he's only been there two weeks.

The ideas he wants to talk about come from his personal experience and from experiences his friends have told him in confidence, so he can't talk about those in detail either. So this talk will be less detailed than he would have liked.

Open data. The BBC and Yahoo! both understand the benefits of open data. Both have made statements about the importance of open data. Both aim to make as much data available as possible. But there are still restrictions on how those data can be used.

People know that the BBC and Yahoo! are opening up their data because it's still relatively rare: when a new company does it, everyone gets excited. So he wanted to see how much open data there really is.

There's a list of open APIs at www.programmableweb.com/apilist. It's a fairly good list, though it's missing a few bits and pieces. It had 201 APIs listed, all on one page. One quarter of the APIs listed are from 7 companies.

Plus one I missed.

Most of the companies are new: only 14 APIs come from companies more than 20 years old. The big old companies are big, and they've collected a lot of useful data that we could do interesting things with, but it's not available.

So everyone in our tech bubble thinks open data is a good idea, but hardly anyone is doing it. If open data is such a good idea, why isn't there more of it? This isn't about the format of the data; he doesn't care about that.

Hasn't mentioned RSS/Atom yet. There are millions of RSS feeds, but these highlight the problem even more. You can now get RSS feeds for almost anything you want, but try getting in-depth sports statistics, or up-to-date stock market data, or flight times. You can't get them. RSS is intended to be read in an aggregator, and most of it can't be reused or republished.

So you can get any data you want from the net, so long as it's the last 10 items on an RSS feed and you don't want to do anything with it.

Why are people happy to put some data out, but not other data? Do the technology and standards need to be better? Yes, they're not perfect, but they never are. Simple things like character encoding are very easy to get wrong. Definitions are difficult.
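To make the character-encoding point concrete, here is a small illustrative sketch in Python (not something shown in the talk). It assumes a hypothetical publisher that writes UTF-8 and a hypothetical consumer that wrongly assumes Latin-1 (ISO-8859-1); the sample string is made up for the example.

```python
# Illustrative sketch only: a publisher emits UTF-8, but a consumer
# guesses the wrong charset (Latin-1) when reading the bytes.
original = "Café naïve"
raw_bytes = original.encode("utf-8")  # what actually goes over the wire

# Consumer assumes Latin-1: each multi-byte UTF-8 sequence becomes
# two separate characters ("mojibake").
misread = raw_bytes.decode("latin-1")
print(misread)

# Consumer uses the charset the publisher actually used.
correct = raw_bytes.decode("utf-8")
print(correct)

# The damage can be undone here only because we know exactly which
# wrong charset was applied.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == original)
```

Nothing fails loudly: the wrong decode succeeds and silently produces plausible-looking garbage, which is part of why this kind of mistake is so easy to ship.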

But they are good enough. Standards have been developed because there’s a real need to use this stuff behind the firewall. RSS is popular, and most of it is not perfect, but it’s good enough.

So if it's not the tech, it must be something else. And there's a simple reason. Organisations don't do anything unless they think it is in their best interests. A company won't do anything unless it makes money, so maybe companies don't think it's worthwhile. That means either:

They’re right.

They’re wrong.

Either could be true. But what matters more is understanding their reasons.

Most companies don’t know what an API is. If they don’t understand the concept of releasing their data online, then standards won’t matter. Explaining the concept of an API is hard when you are talking to people who don’t know how computers work.

People are starting to learn about RSS. They understand that if they use RSS they don't need to visit the site. But to use it you do need to know a little bit about it. However, it fits into an existing business model: it drives interest and visitors to the publisher's site. It's also in a positive feedback loop: the more RSS there is, the more people see it, and the more likely they are to use it.

So assuming the company knows what an API is…

Most companies make money from their data, so they will say 'why give it away?'. For some you can explain why it's good: for a public broadcaster you can say 'we've paid for it already'. For other companies there are reasons (improved branding, etc.), but it's a risk.

Most companies want competitive advantage. So if a competitor has opened up, you have to open up to keep up.

If you sell data and then start giving it away, it reduces the perceived value of the data. If you sell it for tens of thousands of pounds, then why are you giving it away? It gets into a downward spiral over what that data is worth.

Opening up data is risky: you risk losing money you're already making. You could argue that they are wrong, but he's not sure that they are.

Many companies are not allowed to open up, even if they want to.

Lawyers say no. Most companies don't have complete rights over the data they use. So the stock prices on the evening news don't come from the broadcaster; they're bought in. Google don't create their own map data, they buy it from someone like Navteq. It's cheaper that way: the data provider has economies of scale, and it's a waste of time to do it yourself. Some companies also act as middlemen between groups, e.g. Sabre sits between travel agents' ticket bookings and the airlines. Companies outsource things. Then there are exclusivity issues.

So even if they wanted to, some companies are contractually prohibited from sharing their data.

Look at Google Maps mash-ups. Google get their map data from Navteq, but the data used in the Google Maps API is from Tele Atlas. You have to be determined to do this, and it might also cost you more money.

Finally, the general public wouldn’t always like it. Personal data, for example.

Open data is nice to have, but the benefits are second-order, so people label it as low priority.

And even once you have an API, it will be missing features.

So what should we do?

Don't send emails demanding an API. That just makes you look like a moron.

But… what you can do

1. Be aware of the problems

2. Demonstrate usefulness, screen scrape if you need to, but don’t get yourself cease-and-desisted

3. Don’t assume it’s a technology problem

4. Target the right people, find someone on the inside who can help you

5. Talk about benefits to the provider, not the consumer. If you talk about the benefits to you, they’ll see you just as someone who wants something for free.

6. Have patience. It is getting better every day, and it takes time for business to come round.
