Xtech 2006: Matt Biddulph – Putting the BBC’s Programme Catalogue on Rails

Matt’s talking about the BBC’s experimental Programme Catalogue. It’s amazing. Absolutely great piece of work.

1.1 million contributors, going back to the 1920s.

The BBC has some 80 years' worth of archives, which have been catalogued. Only material that was archived got catalogued, so there's no record of stuff that was broadcast but never archived. Until the 80s, the default format for archiving audio was vinyl; they even have material on wax cylinder. And it was basically down to Matt Biddulph, with help from Ben Hammersley, to take their database and do something cool with it. So now you can search for anything and you'll get as much information as possible, including:

– programmes
– XML
– tags
– by date
– search on keywords
– contributors
– feeds

So here's the Dr Who search, which tells you how many programmes have been made, and when. And the episode page for New Earth, which includes broadcast details, a very 'terse' description, the categories the Beeb has been using to organise its archive for years, a list of people involved, and an RDF feed. So then I can search on any particular person, see what other entries they have, and see a cool little graph showing frequency of appearances from 1930 to 2006.

There’s also the ability to do maps of who’s appeared with whom from the FOAF feed.
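Matt didn't go into how those maps are generated, but the basic idea can be sketched in a few lines of Ruby: count how often each pair of contributors appears on the same programme. The programme titles and names below are illustrative, not real catalogue data.

```ruby
# Sketch: build co-appearance counts from per-programme contributor lists.
# Each pair of people who share a programme credit gets its count bumped.
def co_appearances(programmes)
  counts = Hash.new(0)
  programmes.each_value do |contributors|
    # Sort so each pair has one canonical ordering before counting.
    contributors.sort.combination(2) { |pair| counts[pair] += 1 }
  end
  counts
end

# Illustrative data, not from the actual catalogue.
programmes = {
  "New Earth"              => ["David Tennant", "Billie Piper"],
  "The Christmas Invasion" => ["David Tennant", "Billie Piper", "Camille Coduri"],
}

counts = co_appearances(programmes)
counts[["Billie Piper", "David Tennant"]]  # => 2 (credited together on both)
```

A graph library (or the FOAF feed itself) would then turn those pair counts into the who-appeared-with-whom map.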

It's linked into the rest of the web, for example to Wikipedia, so that you can correlate data using the API at developer.yahoo.com.

The site was built in two months in Ruby on Rails. No 'really good ideas', just 'really good practice'.

Matt is now going through some techie stuff regarding how he created the site.

It can do full-text search. People are spending a significant amount of time on the site, and they are linking to it not just because of the programmes but for all sorts of reasons, perhaps talking about an event that happened in the 70s.
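Matt didn't detail how the full-text search is implemented, but the core idea is an inverted index: map each word to the records containing it, then intersect the lists for a multi-word query. A minimal sketch, with made-up documents:

```ruby
# Build an inverted index: word => list of document IDs containing it.
def build_index(docs)
  index = Hash.new { |h, k| h[k] = [] }
  docs.each do |id, text|
    text.downcase.scan(/\w+/).uniq.each { |word| index[word] << id }
  end
  index
end

# Return the IDs of documents containing every term in the query.
def search(index, query)
  terms = query.downcase.scan(/\w+/)
  terms.map { |t| index[t] }.reduce(:&) || []
end

# Illustrative records, not real catalogue entries.
docs = {
  1 => "The Doctor and Rose visit New Earth",
  2 => "An archive recording on vinyl from the 1950s",
}
index = build_index(docs)
search(index, "new earth")  # => [1]
```

A production system would add stemming, ranking, and persistence, but the intersect-the-postings-lists shape is the same.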

He built it really quickly, in two months. But deployment at the Beeb is difficult because the tech is handled by Siemens, who had never seen Ruby, didn't know about Rails, and didn't work on a fast turnaround. So five months after Matt delivered, the site was still not deployed properly.

The archives people really bought into the project, because their budget always gets cut 'because they are librarians'. Then people high up picked up on the word 'metadata', and the archive guys immediately said 'we have metadata'. They got into the idea of making their stuff available.

Matt worked offsite and communicated by blog, where he said what he was doing, and they would send him bugs. They had hardly any meetings, just one every two or three weeks.

On the Web 2.0 meme map, it does well. It's very long-tail, because most of the stuff in the archives doesn't get much attention. The BBC must put its content online at some point, but it has no way of knowing what people want. This gives them an idea of what people want.

The Google searches coming in are people looking for really strange and obscure stuff, so there's a lot in the database that isn't anywhere else.

It's a perpetual beta, and falls over a lot. There's exactly one URL for everything on the site, and those URLs are canonical identifiers. It's not a beautiful user experience, but it's very rich, very clickable.
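The one-canonical-URL-per-record idea can be sketched simply: derive the path from the record type and its stable database ID only, never from anything mutable like a title, with an optional format extension for the XML and RDF views. The path scheme below is an assumption for illustration, not the catalogue's actual URL layout.

```ruby
# Hypothetical canonical-path helper: one stable path per record,
# keyed only by type and database ID, with optional format suffix.
def canonical_path(type, id, format = nil)
  base = "/catalogue/#{type}/#{id}"
  format ? "#{base}.#{format}" : base
end

canonical_path("programmes", 123)          # => "/catalogue/programmes/123"
canonical_path("contributors", 45, "rdf")  # => "/catalogue/contributors/45.rdf"
```

In Rails this kind of scheme falls out naturally from resourceful routing, which is presumably part of why 'one URL for everything' was cheap to get right.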

[I have to say, this is what the BBC should be talking about and promoting. This is just the cat’s whiskers, and the Beeb are missing a trick by not shouting about this from the rooftops.]