Xtech 2006: Paul Hammond – An open (data) can of worms

Used to work for the BBC, but left three weeks ago, so can’t talk too much about them. Started working for Yahoo! two weeks ago, lots of APIs at the Developer Network. But can’t really talk about that because he’s only been there two weeks.

Ideas he wants to talk about are his personal experience, and experiences of his friends which they’ve told him in confidence, so can’t talk about that either. So this talk will not be as detailed as he would have liked.

Open data. BBC and Yahoo! both understand the benefits of open data. Both have made statements about the importance of open data. Both aim to make as much data available as possible. And there are restrictions on the use of those data.

People know that BBC and Yahoo! are opening up their data, because it’s still relatively rare. So when a new company does it, everyone gets excited. So wanted to see how much data there really is.

There is a list of open APIs at www.programmableweb.com/apilist, and it’s a fairly good list, but missing a few bits and pieces. It had 201 APIs listed, all on one page. One quarter of the APIs listed are from 7 companies:

Yahoo!

Google

Amazon

MS

Ebay

AOL

Plus one I missed.

Most of the companies are new, only 14 APIs from companies more than 20 years old. The big old companies are big, and they’ve collected a lot of useful data that we could do interesting things with, but it’s not available.

So everyone in our tech bubble thinks open data is a good idea, but hardly anyone is doing it. So if open data is such a good idea, why isn’t there more of it? Don’t care about the format of the data.

Haven’t mentioned RSS/Atom. There are millions of RSS feeds, but these highlight the problems even more. You can now get RSS feeds for almost anything you want, but try getting in depth sports statistics, or updated stock market data, or flight times. You can’t get it. RSS is intended to be read in an aggregator, and most of it can’t be reused or republished.

So you can get any data you want from the net, so long as it’s the last 10 items on an RSS feed, and you don’t want to do anything with it.

Why are people happy to put some data out, but not others? Do the tech and standards need to be better? Yes, they are not perfect but they never are. Simple things like character encoding are very easy to get wrong. Definitions are difficult.

But they are good enough. Standards have been developed because there’s a real need to use this stuff behind the firewall. RSS is popular, and most of it is not perfect, but it’s good enough.

So if it’s not the tech, it must be something else. But there’s a simple reason. Organisations don’t do anything unless they think it is in their best interests. A company won’t do anything unless it makes money, so maybe companies don’t think it’s worthwhile. That means either:

They’re right.

or

They’re wrong.

Either could be correct. But more important is to understand their reasons.

Most companies don’t know what an API is. If they don’t understand the concept of releasing their data online, then standards won’t matter. Explaining the concept of an API is hard when you are talking to people who don’t know how computers work.

People are starting to learn about RSS. They understand that if they use RSS they don’t need to visit the site. But to use it you do need to know a little bit about it. However, it fits into an existing business model – it drives interest and visitors to their site. It is in a positive feedback loop because the more RSS there is, the more you see it, and the more likely people are to use it.

So assuming the company knows what an API is…

Most companies make money from their data. So they will say ‘why give it away?’. For some you can explain why it’s good – for a public broadcaster you can say ‘we’ve paid for it already’. For some companies there are reasons – improves branding, etc. – but it’s a risk.

For most companies, they want competitive advantage. So if a competitor has opened up then you have to open up to keep up.

If you sell data and then you start giving it away, it reduces the perceived value of the data. If you sell it for tens of thousands of pounds, then why are you giving it away? Gets into a downward spiral as to what that data is worth.

Opening up data is risky – risk losing money that you’re making. Could argue that they are wrong, but not sure that they are.

Many companies are not allowed to open up, even if they want to.

Lawyers say no. Most companies don’t have complete rights over the data they use. So stock prices on the evening news don’t come from the broadcaster, they’re bought in. Google don’t create their own map data, they buy it from someone like Navteq. It’s cheaper that way. Data provider has economies of scale. Also waste of time to do it yourself. Some companies also act as middlemen between groups, e.g. travel agents’ ticket bookings, Sabre and the airlines. Companies outsource things. Then there are exclusivity issues.

So even if they wanted to, some companies are contractually prohibited from sharing their data.

Look at Google Map mash-ups. Google get their map data from NavTeq, but the data used in the Google API is from Tele Atlas. Have to be determined to do this. Might also cost you more money.

Finally, the general public wouldn’t always like it. Personal data, for example.

It’s nice to have. But the benefits are second order. So people label it as low priority.

Once you have an API it will be missing features.

So what should we do?

Not sending emails demanding an API. That just makes you look like a moron.

But… what you can do

1. Be aware of the problems

2. Demonstrate usefulness, screen scrape if you need to, but don’t get yourself cease-and-desisted

3. Don’t assume it’s a technology problem

4. Target the right people, find someone on the inside who can help you

5. Talk about benefits to the provider, not the consumer. If you talk about the benefits to you, they’ll see you just as someone who wants something for free.

6. Have patience. It is getting better every day, and it takes time for business to come round.

Xtech 2006: Tom Loosemore – Treating Digital Broadcast As Just Another API, and other such ruminations

Going to tell us a story about Mr Baird, Mr Moore, and Mr Berners-Lee.

10-15 years ago, Mr Baird ruled the roost, but we know about TV and what makes great TV is great programmes, fabulous stories fabulously told. Mr Moore then came along and said our chips will get faster, our kit will get smaller, and his corollary, that disks will just keep getting bigger. That was 30 years ago. 15 years ago Mr Berners-Lee populated the net, and said the ‘internet is made of people’.

10 years ago, Mr Loosemore started working for Wired in the UK as a journalist before they went bust. One of his jobs was to keep abreast of Moore’s Law, as the editor wanted to do a monthly feature on costs and size of computing equipment. Recently found a spreadsheet from 95 charting ISP costs and it was really expensive. In 95 everything was analogue – TV, satellite, cable.

Then in 98 Mr Murdoch gave away digital set-top boxes, and it cost £2billion, but the market thought he was nuts. News International nearly cost him the business. But he saw that it was an essential move, because it gave him more bandwidth. In the UK in 95 you had 4 maybe 5 channels, but when Murdoch went with his set-top box, you had hundreds.

Then digital terrestrial started, which was rubbish, but then taken over by the BBC and you can have about 30 free channels.

Doesn’t look at digital broadcasting the same way that everyone else does. Sees it as a way of distributing 1s and 0s. Doesn’t see it as programmes, but as data.

Lots of different standards and formats.

Also live P2P being used to stream live TV.

Focus on Freeview, and view it as an API.

Expect from an API:

– rich, i.e. interesting. 30 TV channels and a bunch of radio is rich.

– open. Freeview is unencrypted

– well structured. in theory Freeview is

– scalable

– very high availability, it doesn’t fall over

– accessible

– proven

Doesn’t do so well:

– licence? licence is domestic and personal, so do what you will so long as it is domestic and personal.

– documented? Theoretically, dtg.org.uk. But the documentation is copyrighted and managed by Digital Television Group, so have to be a member before you can get the documentation.

But it’s not hard to reverse engineer, so you can see where the broadcasters are adhering to the standards and where they are being a bit naughty.

Five years ago, Freeview was just taking off in the UK, but other stuff was also going on.

There’s a lot of data: 2 Mbps MPEG-2, 2 GB storage per day, 50 GB per channel per day, so a terabyte will store 4 channels for a week. But linear TV is a bad way to distribute stuff – most of the time you miss most of the stuff. So what if we just record everything?
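The storage figures can be sanity-checked with a little arithmetic. What follows is only a rough sketch, assuming a constant bitrate and decimal units (1 GB = 10^9 bytes); the function names are made up for illustration, and real broadcast bitrates vary from channel to channel, which may be why the figures as noted down don’t line up exactly.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gb_per_day(bitrate_mbps: float) -> float:
    """Decimal gigabytes needed to store one day of a constant-bitrate stream."""
    bits = bitrate_mbps * 1_000_000 * SECONDS_PER_DAY
    return bits / 8 / 1_000_000_000


def days_of_storage(capacity_tb: float, bitrate_mbps: float, channels: int) -> float:
    """How many days of `channels` simultaneous streams fit in `capacity_tb`."""
    capacity_gb = capacity_tb * 1000
    return capacity_gb / (gb_per_day(bitrate_mbps) * channels)


if __name__ == "__main__":
    # Assumed inputs: the 2 Mbps rate from the talk, 1 TB of disk, 4 channels.
    print(f"GB per channel per day: {gb_per_day(2.0):.1f}")
    print(f"Days of 4 channels on 1 TB: {days_of_storage(1.0, 2.0, 4):.1f}")
```

At a constant 2 Mbps this works out to 21.6 GB per channel per day. A figure of 50 GB per channel per day would correspond to an average of roughly 4.6 Mbps.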

So, colleagues built a box to store entire broadcast from the BBC for a week. 2.3 terabytes of storage. About 1000 programmes. Had it for about three weeks. When you’ve got that much choice, existing TV interfaces like the grid layout don’t work. Too much data.

Broadcast metadata alongside the programmes, and the BBC have created an API for that metadata. Got 18 months’ worth of programme metadata, and Phil Gifford turned it into a website. Got genre data, but that’s pretty useless when you have 100,000 programmes and it’s no help in finding stuff you like.

But if you show people stuff that people are in, say programmes with Caroline Quentin, that’s helpful. Mood data was about as useful as genre, but associate it with people and it becomes interesting.

Then discovered the BBC Programme Catalogue. Wonderfully well structured data model, and amazing how disciplined they had been in keeping their vocabularies consistent. So Matt Biddulph put it online, and the crucial thing is that everything is a feed – RDF, FOAF etc.

But that’s only the metadata. Where are the programmes?

So, 12 TB stores all BBC TV for 6 months, and that’s a lot of programmes. But what happens when you give people that amount of content? Can’t make it public, but can make it available to BBC staff, who have to watch a lot of TV in order to do their job. Built an internal pilot, the Archive Testbed, which is no longer live. Took the learning from the metadata only prototype and found a few things.

Keep the channel data. Channels are useful and throwing them away too soon cost them. Channel brands are more than just a navigational paradigm, they are a kite mark of different types of programme. So some programmes scream ‘BBC 2’, for example.

Give people all the metadata, all of which came from external broadcast sources, not internal databases.

Added ratings and comments, links to blog posts, bit of social scheduling – what are my friends watching? What do people recommend? If I don’t know what I want, I want other people to tell me.

Was fantastic, but had to limit it to a couple of hundred people within the BBC. Was a bit too popular for their own good.

In the R&D department, a couple of them worked on a project called Kamaelia to create a framework to plug together components for network applications. About six months ago, he persuaded them they needed a project for that framework, and so they applied it to this.

Hopefully will make the project very successful. Now BBC Macro has been released as a pilot. Will be eventually everywhere.

Xtech 2006: Di-Ann Eisnor – Collaborative Atlas: Post geopolitical boundaries

Platial, trying to help link people to people.

People have been mapping their lives, autobiogeography: where you were born, went to school, etc. They are mapping things of historical importance, e.g. ‘Women who changed the world’. Maps for hobbies and interests, e.g. bird-watchers and cat-lovers.

Over 4000 maps. Everything from food to activists to romantic encounters.

Has tags and comments. Can embed video.

When people get to ‘own’ places, geopolitical boundaries start to melt. Initial analysis. They looked at tags and found a social topography irrelevant to proximity or national borders. Correlated cities based on users, and some cities are gateways to other cities.

Some themes within the tags are universal from city to city, e.g. city names, coffee, restaurants, food, art and home.

Aggregating geodata in Placedb. Taking location point data such as geoRSS or geotagged data, or data that includes city or street names, and then apply comparative analysis algorithms to find the location of documents with no obvious location. So can collect Flickr pictures, Reuters stories, etc. for a specific area, e.g. your home town, and this is fed into Platial, e.g. the London page.
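The notes don’t say what Placedb’s comparative analysis actually does, so the following is not their algorithm. It is a deliberately naive stand-in that shows the general idea of locating a document by the place names it mentions. The tiny gazetteer, its rounded coordinates and the function name are all assumptions for illustration.

```python
import re

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "london": (51.51, -0.13),
    "amsterdam": (52.37, 4.90),
}


def guess_location(text):
    """Return (place, (lat, lon)) for the most-mentioned known place, or None."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = {place: words.count(place) for place in GAZETTEER}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        return None  # no known place name appears in the text
    return best, GAZETTEER[best]


if __name__ == "__main__":
    print(guess_location("Photos from a walk along the canals of Amsterdam"))
    print(guess_location("A story with no place names in it"))
```

A real system would need a far larger gazetteer and some way of handling ambiguous names, which is presumably where the extra geofeeds come in.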

Need more geofeeds into Placedb.

Xtech 2006: Matt Biddulph – Putting the BBC’s Programme Catalogue on Rails

Matt’s talking about the BBC’s experimental Programme Catalogue. It’s amazing. Absolutely great piece of work.

One million contributors, 1.1 million contributors, going back to the 1920s.

The BBC has some 80 years’ worth of archives, which have been catalogued. Only catalogued stuff that was archived, so no record of stuff that was broadcast but then never archived. The default format for archiving audio was, until the 80s, vinyl. They even have stuff on wax cylinder. And it was basically down to Matt Biddulph, with help from Ben Hammersley, to take their database and do something cool with it. So now you can search for anything and you’ll get as much information as possible, including:

– programmes

– xml

– tags

– by date

– search on keywords

– contributors

– feeds

So here’s the Dr Who search, which tells you how many programmes have been made, and when. And the episode page for New Earth, which includes broadcast details, a very ‘terse’ description, categories which the Beeb has been using to organise its categories for years, list of people involved and an RDF feed. So then I can search on any particular person and see what other entries they have, and can see a cool little graph showing frequency of appearances from 1930 to 2006.

There’s also the ability to do maps of who’s appeared with whom from the FOAF feed.
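The ‘who’s appeared with whom’ idea boils down to counting pairs of contributors who share a programme. Here is a minimal sketch of that counting step. It uses made-up programme and contributor names and does not parse the real FOAF feed, whose structure isn’t described in these notes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: programme title -> list of contributors.
programmes = {
    "Programme A": ["Alice", "Bob", "Carol"],
    "Programme B": ["Alice", "Bob"],
    "Programme C": ["Carol", "Dave"],
}


def co_appearances(programmes):
    """Count how many programmes each pair of contributors has in common."""
    pairs = Counter()
    for contributors in programmes.values():
        # Sorting the de-duplicated names gives each pair one canonical order.
        for a, b in combinations(sorted(set(contributors)), 2):
            pairs[(a, b)] += 1
    return pairs


pairs = co_appearances(programmes)

if __name__ == "__main__":
    for (a, b), n in pairs.most_common():
        print(f"{a} and {b}: {n} programme(s) together")
```

The resulting pair counts are the weighted edges of a graph, which is the sort of thing a map of appearances could be drawn from.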

It’s linked into the rest of the web, for example, Wikipedia so that you can correlate data using the API at developer.yahoo.com.

The site was built in two months in Ruby on Rails. No ‘really good ideas, just really good practice’.

Matt is now going through some techie stuff regarding how he created the site.

Can do a fulltext search. People are spending a significant amount of time on the site, and people are linking to it not just because of the programmes, but for all sorts of reasons, perhaps talking about an event that happened in the 70s.

Built it really quickly, in two months. But deployment in the Beeb is difficult because the tech is dealt with by Siemens, so they had never seen Ruby, didn’t know about Rails, didn’t work on a fast turnaround. So five months after Matt delivered, the site was still not deployed properly.

The archives people really bought into the project, because their budget always gets cut ‘because they are librarians’. So then people high up picked up the word ‘metadata’, and the archive guys immediately said ‘we have metadata’. They got into the idea of making their stuff available.

Matt worked offsite, communicated by blog where he said what he was doing, and they would send him bugs. Had hardly any meetings, once every two or three weeks.

On the Web 2.0 meme map, does well. Very long-tail, because most of the stuff in the archives doesn’t get much attention. BBC must put its content online at some point, but they don’t have any way of knowing what people want. This gives them an idea of what people want.

Google searches coming in are people searching for really strange and obscure stuff, so there’s a lot of stuff in the database that isn’t anywhere else.

It is perpetual beta, and falls over a lot. Exactly one URL for everything on the site, and they are canonical identifiers. Not a beautiful user experience, but is very rich, very clickable.

[I have to say, this is what the BBC should be talking about and promoting. This is just the cat’s whiskers, and the Beeb are missing a trick by not shouting about this from the rooftops.]

Xtech 2006: Paul Graham – How American are Startups?

Sitting in the Grand Ballroom of the Hotel Grand Krapolinsky Krasnapolsky in central Amsterdam, at yet another conference. This is the third in three weeks, and I’ll be glad when next week comes and I don’t have to think about writing a presentation. Well, for a few weeks at least.

First up:

Paul Graham – How American are Startups?

I’m here to talk about start-ups. Well, if I was giving a talk about start-ups in the US, there’d be a lot more people here, so maybe that says something, or maybe I’m reading too much into it.

Could you recreate Silicon Valley elsewhere? With the right 10,000 people, yes. It used to be that geography was important, but now it’s having the right people. You need two kinds of people to create a start-up: rich people and nerds. Towns become start-up hubs when there are rich people and nerds. NYC could not be a start-up hub because there are lots of rich people but no nerds. Result: no start-ups. Pittsburgh has the opposite problem – lots of nerds, no rich people. Uni of Washington yielded a hi-tech community in Seattle, but Pittsburgh has a problem with the weather, and no beautiful old city, so rich people don’t want to live in Pittsburgh.

Do you need rich people? Would it be best if the gov’t invested? No. You need rich people, because they tend to have experience and connections, and the fact that it’s their money makes them really pay attention. The idea of gov’t bureaucrats making start-up decisions is comic, it’d be like mathematicians running Vogue… or editors of Vogue running a maths journal. Start-ups funded by bureaucrats would be competing with start-ups run by rich people who have their own money on the line.

Start-ups are people, not the buildings. Creating business parks doesn’t help start-ups, because start-ups do not use that kind of space. Where a start-up starts it stays, so all you need is your three guys sitting round a kitchen table. If you can get rich people and nerds together, you can recreate Silicon Valley.

Smart people like smart people and will go where they are, so universities are good. First rate compsci depts are important to this, preferably one of the top handful in the world, and has to stand up to MIT and Stanford. Professors consider one factor only – they are attracted by good colleagues. So if you can attract the best people then you will create a chain reaction which would be unstoppable. Just takes about half a billion, which is within the reach of any developed country.

But you need a place where investors want to live and students want to live when they graduate. It needs to be a major air travel hub so people can travel easily. Investors and nerds have similar taste because most investors used to be nerds. Taste can’t be too different, but also can’t be too mainstream.

Like the rest of the creative class, nerds want to live somewhere with personality, that’s not mass produced. To create a start-up hub, you need a town that doesn’t have mass development of large tracts of land [so, no Milton Keynes then?]. Most personality is found in older towns. Pre-war apartments are built better, and people like them more.

You can’t build a Silicon Valley, you let it grow.

Any town’s personality needs to have a good nerd personality. Nerds like towns where people walk around smiling, so not LA because people don’t walk around, or NYC where people don’t smile.

Nerds will pay a premium to live where there are smart people. They like quiet, sunlight, hiking. A nerd’s idea of paradise is Berkeley or Boulder. The start-up hubs in the US are very young-feeling, but not new towns. Want a place that tolerates oddness. Get an election map and avoid the red bits.

To attract the young, you need a city with a live centre. None of the start-up hubs have been turned inside out like some American cities have. Young people do not want to live in suburbs.

Within the US, Boulder and Portland have the most potential. They are both only a great university short of being a start-up hub.

The US has some significant advantages:

1. Allows immigration. Would be impossible to reproduce Silicon Valley in Japan, because most of the people in Silicon Valley speak with accents and the Japanese don’t allow immigration. It has to be a Mecca, and you can’t have a Mecca if you don’t let people in.

2. Won’t work in a poor country. India might one day produce a Silicon Valley, because it has the right people but it’s still very poor. US has never been as poor as some countries are now, so we have no data to say how you get from poor to Silicon Valley. There may be a ‘speed limit’ to the evolution of an economy.

3. The US is not (yet) a police state. China might want to create a Silicon Valley, but their tradition of an autocratic central gov’t goes back a long way. Gets you efficiency but not imagination. Can build things, but not sure it can design things. Hard to have new ideas about tech without having new ideas about politics, and many new tech ideas do have political implications so if you squash dissent you squash new ideas. Singapore suffers a similar problem.

4. Need really good unis, and the US has those. Outside of the US, people think of Cambridge in the UK, then pause. The best professors seem to be all spread out, instead of concentrated, so that hinders them because they don’t have good colleagues, and their institutions don’t act as a Mecca.

Germany used to have the best universities, until the 30s. If you took all the Jews out of any university in the US, there’d be a huge hole, so as there are few Jews in Germany, perhaps that would be a lost cause.

5. You can fire people. One of the biggest obstacles is the rigid labour laws in Europe, which are bad for start-ups because they have the least ability to deal with the bureaucracy. Start-ups need to be able to fire people because they need to be flexible. EU public opinion will tolerate people being fired in industries where they care about performance, but that seems limited to football.

6. Work is less identified by employment in the US. EU has the attitude that the employer should protect the employee. Employment has shed these paternalistic overtones, which makes it easier for start-ups to hire people, and easier to start start-ups. In the US, most people still think they need to get a job, but the less you identify work with employment the easier it is to start your own company. All you have to do is imagine it.

A year after the founding of Apple, long after they had sold their stuff, Steve Wozniak was still working for HP. When Jobs found someone to give them venture funding, on the condition that Woz quit, he initially refused.

7. America is not too fussy. If there are any laws regarding businesses, you can assume that start-ups will break them because they don’t know them and don’t care. They get run out of places that they shouldn’t be run out of, like garages or apartments. Try that in Switzerland and you’d likely get reported. To get start-ups you need just the right amount of regulation.

8. US has a huge domestic market. Start-ups usually begin by selling locally, which works because there is a huge domestic market. In Sweden, for example, the market is smaller. The EU was created to form an international market, but everyone still speaks different languages. But it seems as if every educated person in Europe now speaks English, and if present trends continue, more will.

9. Funding. Start-up funding doesn’t only come from VCs, but also from business angels. Google might not have got where they were without angel funding of $100k. All you need to do to get that process started is to get a few start-ups going, but the cycle is slow. It takes five years for a successful start-up to produce potential angel investors. You need angels as well as VCs.

10. America has a more relaxed attitude to careers. In the EU, the idea is that everyone has a specific occupation, but in the US things are more haphazard. That’s a good thing. A start-up founder is not the type of career a high-school kid will choose – they choose conservatively. Start-ups are not things you plan, so you are more likely to get them in a society that allows career changes on the fly.

Compsci was supposed to provide researchers, but in actual fact most students are there because they are curious.

America’s schools might be a benefit – they are so bad that people wait til college before they make their decisions about what their career will be.

This list is not meant to suggest America is the best place for start-ups. It should be possible not just to duplicate, but to improve on Silicon Valley. What’s wrong with SV?

– It’s too far away from San Francisco. You either live in the Valley, or commute from SF. Would be better if the Valley was actually interesting, but it’s the worst sort of strip development.

– Bad public transportation. There’s a train, which is not so bad by American standards, but by EU standards it’s awful. So design a town for trains, bikes and walking, with cars last, but it’ll be a long time before the US does that.

– Have lower capital gains taxes. Low income tax isn’t so much of an issue, but capital gains is. Lowering the tax instantly increases returns on stocks, as opposed to real estate, so to encourage start-ups, have low cap gains. But decreases in cap gains disproportionately favour the rich. Belgium has cap gains of 0.

– Smarter immigration policy. People running Silicon Valley are aware of the shortcomings of the US immigration system, and since 2001 it has become very paranoid. What fraction of the smart people who want to come to the Valley actually do get in? Half? The US policy keeps out most smart people and puts the majority in crap jobs. A country which got immigration right could become a Mecca for smart people simply by letting them in.

So basic recipe for a start-up hub is a great university and a nice town. You could improve on Silicon Valley easily. Just let people in.

Technorati blogtags

Technorati have launched a service so that you can search for blogs on a specific topic using blog-level tags (initially scraped from your categories). I’ve been wanting this for ages, so that I can at last go and find ‘blogs about copyright‘ rather than ‘posts about copyright‘ or ‘posts that happen to mention the word copyright, probably at the bottom of the page instead of using a ©‘.

Go, claim your blog and add in your blogtags.

Last.fm relaunch site – with tags

The guys over at Last.fm have finally merged the Last.fm music playlist sharing and internet radio site with their Audioscrobbler.com site, which served the plugin that you need to make Last.fm work (fyi: my page). I spoke to them about that before Christmas, but sadly didn’t get the chance to work with them. What’s really cool about the new redesign is that they’ve added tags to the mix, so you can forget about all that horrible music taxonomy stuff and just tag stuff however you like. Plus there seems to be a lot more in the way of charts now, for the statistically obsessed. Good work, guys.