NowPublic NowFunded

I was the first person to blog about the launch of Michael Tippett’s participatory news network, NowPublic, which marries news stories from the media and public with “crowd-sourced” media such as photos and videos. I saw Michael demo NowPublic last February at the fabulous Northern Voice conference in Vancouver. Over a year later, just a few weeks ago, Michael, Kevin and I met up at a conference in London and had a really nice evening talking about almost everything except NowPublic.

I’m delighted to announce that NowPublic has raised a healthy US$1.4 million in angel financing, led by Brightspark Ventures. Congratulations to co-founders Michael, Leonard Brody and Michael Meyers and all the angels involved.

NowPublic met with early success when U2 played a ‘secret’ gig in New York. The photos posted on the site were fantastic – a realtime record of a gig posted without the aid of paid photographers or the traditional media. As an event of national and international interest to U2 fans, it was a bit of a no-brainer for people who were there to take and post photographs.

Since then, NowPublic has become one of the fastest growing news networks, with (and here I quote from the press release) “over 15,000 reporters in 130 countries and over 2 million unique visits a month. During Hurricane Katrina, NowPublic had more reporters in the affected area than most news organizations have on their entire staff.”

But what is news? We frequently think of news as being events that have national or international importance, but much more news happens at a local or hyperlocal level, and these are the types of events that we are less likely to share because they don’t ‘seem like news’ to us. We also tend to think of ‘news’ as being the same as ‘current events’, but in actual fact it spreads far wider than that, into technology, science, sports and beyond.

This is where NowPublic has huge potential – to be a repository of hyperlocal and focused news that is defined not by the sections in your newspaper or the packages on the 1 o’clock bulletin, but by the people who are involved or who witness what happened. We can make our own news – we just have to remember that what we are experiencing is newsworthy.

I myself have contributed to the site a paltry once, when I reported on a “five alarm” fire in San Francisco last July that happened just a few blocks away from where I was staying. I could have contributed more often, and one missed opportunity in particular springs to mind.

Kevin and I were walking to Holborn station in London, only to find the area sealed off. To find a tube station shut is not that big of a deal in London, but the fact that the surrounding roads were sealed off and the place was swarming with police was much more unusual. Had I had any presence of mind (or a decent cameraphone), I would have taken some snaps, posted them on NowPublic and asked if anyone knew what had happened. Something patently had, but the traditional news outlets didn’t cover it, and the London Underground site never even mentioned the closure of the station. Yet there was news there – I could smell it. My curiosity nearly killed me.

But much participatory media happens at the behest of an authoritative source – XYZMediaCo requests photos of a specific event, or a news anchor invites people to text or email in questions. Under some circumstances – such as the London bombings or the Buncefield fire – the media can be inundated with images and reportage. But we, the public, frequently forget that smaller events are news too, and retraining us to think more critically about what is news is a hefty challenge I am sure that Michael will relish.

How many news outlet staff actually read their own RSS feeds?

I don’t have a TV. I also don’t have a radio. I get my news the same way any self-respecting geek does, via the intarwebthingy. It used to be that I would pop along to news websites and see what was going on, but then Dave Winer invented RSS and that saved me all the fuss and bother of having to figure out whether a site had been updated or not by conveniently feeding new articles into my aggregator. Wonderful.

Blogs, you see, have been using RSS for almost as long as both have been around, because blog software is written by geeks, and geeks do like to save themselves some effort whenever they can. RSS was invented in 1997 when Winer created an XML syndication format for use on his blog. Now no self-respecting blog is without a feed. Yay us.

News outlets, on the other hand, suffer from Chronic 90s Web Buzzword Syndrome, which means that they are still thinking about ‘stickiness’ and ‘eyeballs’. I don’t know about you, but the thought of sticky eyeballs quite makes my stomach churn. However, they have – slowly, painfully, and with no small amount of looking over their shoulder to see if the Big Nasty Sticky Eyeball Eating Monster was creeping up behind them – adopted RSS. Despite the fact that bloggers saw RSS as a no-brainer, the media had to think long and hard before they committed to using a technology which made it easier for people to find out what they had published on their websites and which could, therefore, drive lots of traffic their way.

But they’ve got there. Sort of.

I’m glad that The Times, The Guardian, The Independent, the BBC et al are using RSS. I am, at heart, a lazy wossit, and I much prefer my news to come to me, rather than for me to have to go out and find it. However, I am afeared that the media has not quite paid enough attention to RSS, making the consumption of news via my aggregator a painful and unpleasant experience.

Firstly, no one seems to have figured out that when you change a story, the changes show up in some aggregators. I use NetNewsWire, and it’s set to show me the differences between old and new versions of an RSS feed. It’s true to say that sometimes NNW misinterprets what constitutes a change, but it also exposes all the real changes made to news stories.
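For the curious, here’s a rough sketch of how an aggregator can spot these silent edits – my own illustration in Python, not NetNewsWire’s actual code, and it assumes the feedparser library; the feed URL is just a placeholder.

```python
# A sketch, not NetNewsWire's logic: key each item on its guid/link and
# compare a hash of the text between polls to catch silently edited stories.
import hashlib
import feedparser  # assumes the feedparser library is installed

seen = {}  # guid or link -> hash of title + summary

def poll(feed_url):
    for entry in feedparser.parse(feed_url).entries:
        key = entry.get("id") or entry.get("link")
        digest = hashlib.sha256(
            (entry.get("title", "") + entry.get("summary", "")).encode("utf-8")
        ).hexdigest()
        if key in seen and seen[key] != digest:
            print("Changed since last poll:", entry.get("title"))
        elif key not in seen:
            print("New item:", entry.get("title"))
        seen[key] = digest

poll("http://example.org/news/rss.xml")  # placeholder feed URL
```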

The BBC seems to have the biggest problem with constantly changing RSS feeds. I brought this up once with a meeting of senior BBC news execs, and they failed to understand why this is a problem. It’s not just that it’s irritating – changing a story even a little bit causes it to be republished which then flags it up as ‘unread’ in my aggregator, even though I have actually read it. It’s also that changing stories after they have been published is unprofessional and damages the news source’s credibility. When I link to a news story, I want it to say the same thing next week as it said when I linked to it.

When I explained this to the BBC’s news execs, they cried in exasperation that they couldn’t possibly be expected to be right all the time, and where do you draw the line between a major update, which gives the story a new URL, and a minor update? Well, that is a good question. Another good question is, why do you even do minor updates? Perhaps better sub-editing, along with not rushing too fast to publish, would help get rid of the need for minor updates, with any major changes to the story dealt with by a new article. Or perhaps there is an even better way to deal with additional facts coming in, such as saying ‘Update’ in the article, or some other methodology that I haven’t thought of that doesn’t screw with the integrity of the original.

To be fair, not all of the BBC’s RSS output is affected. Out of nearly 40 items from the BBC in my aggregator at the moment, five have changes. That’s only ~13%, which you might think is negligible, but I think that figure should be zero.

You’ll also find, when you click through to the site, that the first paragraph is different from the excerpt that’s published in the RSS feed. Considering how concise some of these first paragraphs on the site are, it makes you wonder why the BBC are writing separate excerpts at all, and particularly makes me question why those excerpts get edited. Seems like make-work to me.

Here are a few examples from today, picked in sequence from headlines published around the 15:30 mark and including the copy from the website as well. I have replicated the additions (in italics) and the deletions (strikethrough) exactly as they show up in NetNewsWire, so you have to take into account its inherent over-enthusiasm for marking things as changed.

Virus-hit cruise firm apologises

Five hundred UK holidaymakers are sent home after their Hundreds of passengers whose cruise ship was detained because of holidays were ruined by a severe virus.virus outbreak are to be offered refunds.

Virus-hit cruise firm apologises

A travel company has apologised and offered a refund to hundreds of passengers whose cruise holidays were ruined by a virus outbreak.

US crash sparks Afghanistan riot

At least seven people are killed in the Afghan capital, Kabul, Violent disturbances rock Kabul after a deadly traffic accident involving a US convoy crashes, triggering a riot.military convoy.

US crash sparks Afghanistan riot

At least seven people have been killed in the Afghan capital Kabul after a traffic accident involving a US military convoy sparked mass rioting.

Race against time in Java quake

The United Nations warns that the task of bringing taking aid to the survivors of the earthquake in Indonesia is “enormous”.“enormous”, the United Nations warns.

Race against time in Java quake

The task of helping survivors of Saturday’s earthquake on the Indonesian island of Java is “a race against the clock”, the United Nations has warned.

Worse than the BBC is Google News. By its very nature, Google News is all about change, but by god that screws with your RSS feeds. Out of about 80 items, 68 had changes. Now, Google News aims to track news stories from multiple sources, so it is inevitable that their items should change frequently, but it makes it completely useless in an RSS aggregator, because every time I refresh, the items that I had read become marked as unread again because Google News have either done something as minor as changing the timestamp from “5 hours ago” to “6 hours ago”, which is not hugely useful, or added a new source, or substantively changed the copy.

This breaks Google News’ RSS feed in terms of usability. There’s just no way I can continue to have Google News in my RSS reader.

Now, what the BBC does get very right is its timestamps. Items published today have the time published as their timestamps, and items published yesterday and before have the date.

Would that The Times could learn that timestamps are important. Instead, RSS items from The Times are timestamped with the time that I refresh my aggregator, not the time that they are published. I have my news feeds grouped in one folder and I read them en masse. News is highly time-sensitive, and I want to read stuff as it comes in, so having an accurate timestamp is essential. The Times, however, hasn’t figured this out yet. Instead, I get a cluster of items grouped around a single timestamp, and when I refresh, I not only get new items, I get repeats of old items with the new ‘timestamp’. This is not helpful.

For example, I learnt that ‘3,000 UK troops are Awol since war began’ both at 11:09 and at 13:58; and that ‘Abbas threatens Hamas with referendum over blueprint’ both on 25th and 26th of May. These items are in the same feed, and appear to be identical, yet they are showing up twice.
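If you wanted to catch that sort of repetition programmatically, the trick is to trust the link rather than the timestamp. Another small sketch of my own, again assuming feedparser and a placeholder feed URL:

```python
# Dedupe on the item link, so an old story reappearing with a fresh
# timestamp is reported as a repeat rather than treated as new.
import time
import feedparser

seen_links = {}  # link -> the first timestamp we saw for it

def poll(feed_url):
    for entry in feedparser.parse(feed_url).entries:
        link = entry.get("link")
        stamp = entry.get("published_parsed")  # a time.struct_time, or None
        shown = time.strftime("%H:%M", stamp) if stamp else "unknown"
        if link in seen_links:
            print("Repeat item, now stamped", shown, "->", entry.get("title"))
        else:
            seen_links[link] = stamp

poll("http://example.org/times/rss.xml")  # placeholder feed URL
```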

The Guardian is pretty good, compared to The Times, Google News, and the BBC, in the way they treat RSS, as I would expect considering they have people like Neil McIntosh and Ben Hammersley to advise. They get timestamping right – the clusters of articles all being published at once is more to do with their editorial time-table than bugs in their RSS feed.

But there is still room for improvement. Whilst they do edit their RSS excerpts, sometimes just as pointlessly as the BBC, they do it a lot less often, so my main criticism would be that they are inconsistent in their excerpt writing habits. Some articles get a sentence, others get two bullet points; and sometimes the excerpt (and headline) is the same as on the site, and sometimes it isn’t. I have to say, I’d prefer a single sentence excerpt and headline which was the same as the site.

A few examples of what I mean.

Ghost ship washes up in Barbados

· 11 petrified corpses found in cabin

· Letter left by dying man gives clue

After four months at sea, ghost ship with 11 petrified corpses washes up in Barbados

· Letter left by dying man gives clue to investigators

· Dozens of others thought to have perished en route

Climber left for dead rescued from Everest

A climber who was left for dead on Mount Everest has been found alive.

Climber left for dead rescued from Everest

· Team forced to leave Australian at 8,800 metres

· ‘I imagine you’re surprised to see me,’ he tells rescuer

Cage swaps Malibu for own desert island

Nicolas Cage has bought a 40-acre undeveloped island in the Bahamas for $3m (£1.6m)

Cage swaps Malibu for own desert island

Dan Glaister in Los Angeles

Monday May 29, 2006

The Guardian

Wherever you go, people stare at you. Paparazzi take pictures, fans ask for autographs, absolute strangers wonder aloud if they once met you at a party. For the hard-pressed celebrity there’s only one way to get away from it all: hide on your own desert island.

The surprise in all this is that the one newspaper who gets it spot on is The Independent. I really can’t fault their RSS feed at all. Timestamps are reliable, and again reveal their editorial timetable with many articles being published in the small hours and few being published during the day. Their excerpts vary in length, but some of the longer ones are more useful than those of other news outlets. Personally, I like longer excerpts because I would rather skim two sentences that give me a better feel for whether I want to read the article than have just one short sentence that doesn’t tell me much.

Some examples, again with the copy from the RSS and the copy from the site:

British journalists killed in Iraq

Two British television journalists were killed in Iraq by a roadside bomb today.

British journalists killed in Iraq

PA

Published: 29 May 2006

Two British television journalists were killed in Iraq by a roadside bomb today.

Bush ‘planted fake news stories on American TV’

Federal authorities are actively investigating dozens of American television stations for broadcasting items produced by the Bush administration and major corporations, and passing them off as normal news. Some of the fake news segments talked up success in the war in Iraq, or promoted the companies’ products.

Bush ‘planted fake news stories on American TV’

By Andrew Buncombe in Washington

Published: 29 May 2006

Federal authorities are actively investigating dozens of American television stations for broadcasting items produced by the Bush administration and major corporations, and passing them off as normal news. Some of the fake news segments talked up success in the war in Iraq, or promoted the companies’ products.

Indonesia Earthquake: As a people, they already had little – now they are left with nothing

In the morning, Salim retrieved the lifeless body of his three-year-old son, Sihman, from the ruins of their brick and bamboo hut. In the afternoon, he buried him, digging the grave himself. As night fell, he searched through the rubble of his former home for scraps of food. “I have lost everything,” he said.

Indonesia Earthquake: As a people, they already had little – now they are left with nothing

By Kathy Marks in Bantul, Indonesia

Published: 29 May 2006

In the morning, Salim retrieved the lifeless body of his three-year-old son, Sihman, from the ruins of their brick and bamboo hut. In the afternoon, he buried him, digging the grave himself. As night fell, he searched through the rubble of his former home for scraps of food. “I have lost everything,” he said.

I like the fact that the RSS feed and the website copy are identical. To me, that’s ideal – what I see in RSS is what I get on the site. I also can’t see any evidence of changes, although I will say that this RSS feed is new in my aggregator so maybe this point will clarify itself over time.

Now, all this criticism may seem like pointless nit-picking. Perhaps some of it is down to my inner editor screaming for consistency and my inner blogger begging for honesty, but certainly some of this has a direct impact on the usability of RSS feeds for the reading of news.

I want news. I have no problem with the idea of clicking on a link in my aggregator and reading the full article on the news outlet’s website – this is not a plea for full posts (although hell, that’d be great and if Corante can put advertising in their RSS feed, so can anyone, but that’s not the point I want to make).

It’s a plea for journalists and media IT staff to think a little harder about how news is being read these days. RSS is not a fad, and it’s not going to go away. It is going to flourish, with more and more people using it to get their news from many disparate sources. It is in your best interests to ensure that your RSS feeds work, that your editorial policies take into account the effect of new technology on the transparency of your medium, and that you strive at all times for honesty even if that means owning up to your updates (note, ‘update’ does not necessarily mean ‘mistake’).

Google News is revealing your reliance on syndicated content, and RSS is revealing your edits. If you want to remain credible, you must adapt. In an increasingly competitive world, where people choose which news sources to read not just based on content but also on usability and accessibility, can you really afford not to?

Webinar: News as conversation

It was live from North London as I did a ‘webinar’ Tuesday night on the nitty gritty of how we do a global interactive radio programme five nights a week on the BBC World Service. Francois Nel from University of Central Lancashire invited me to take part in their Journalism Leaders Forum. You can watch the whole thing here.

First off, we try to eavesdrop on conversations around the world, virtually getting a sense of what people are talking about in cafes and around water coolers the world over. What are the most viewed, most e-mailed stories on major news sites? What are bloggers talking about? We check Global Voices, the global blog network based out of Harvard. What are the stories being picked up by BBC Monitoring, our global media monitoring department? We do a roundup on our blog and ask the audience what is important to them.

With the help of our audience, we settle on topics to discuss that day. We often post debates on the Have Your Say section of the BBC News Website. We use a discussion system based on Jive Software. People can not only comment, but also leave an e-mail address and phone number. Personal information apart from name and place doesn’t appear on the public site, but we can log in and see those contact details to invite people to join our on air discussion.

Our blog is beginning to gain some momentum. We’ve got on average four comments per post, and I’m really pleased with how the blog allows the conversations to continue long after our on air discussions finish. This is what I meant by saying that blogs can overcome the limits of linear media. We’ve only got one hour on air, but our audience can explore other threads of discussion online for weeks to come.

We’ve had some amazing conversations grow out of it. I remember recently when we had a south Asian sailor calling on a sat phone from a ship in the Molucca Straits, talking to another Asian Muslim living in Stockholm, being asked questions by a caller from Austin, Texas, in the United States about recent violent clashes between Hamas and Fatah factions in the West Bank and Gaza.

We’ve still got more to do. As I said on the webinar, we’re still building community around the programme. People often say that the BBC has a huge audience. Recent figures show the BBC World Service has 163 million listeners. But a sense of community is different from a large audience. Community is a sense of ownership, belonging and participation. The greater the community we build around the programme, the more the audience will feel a sense that this is their programme. As I’ve said before, building community around a global discussion programme is difficult. Community develops around shared things: a place, or shared passions and interests.

Another question asked was how to make money with blogs. Suw often says that she doesn’t make money with her blog but because of her blog. There is a lot of truth in that, even for us in traditional media. I remember in the late 90s people in traditional media said that the web was great but there was no way to make money with it. Now, many media websites turn a profit – not necessarily a profit that replaces the revenue lost from their traditional business, but a profit. And I believe that blogs can renew our relationship with our audiences.

It’s not simply a commercial relationship. A lot of my colleagues ask me why I blog. I found that when I wrote the blog during the US elections in 2004, it reminded me a lot of the relationship I had with my readers when I first started out in journalism as a local newspaper reporter. I was part of the communities that I wrote about in western Kansas. That was one of the things that made journalism a fulfilling job for me.

Even though in 2004, I was writing the blog for people all over the world, I felt I was writing for a community again, not just readers. I got more response from the blog I wrote than almost anything I have done for the BBC. I think there are a lot of opportunities for news organisations to embrace blogging to renew our relationship with our audiences. While I won’t outline a business model with facts and figures about a return on investment, I know that blogs can help us create compelling content. And that is the start for any media business model.

Xtech 2006: Wrapup

Well, Xtech ended on Friday afternoon, and I’ve had the weekend to recover and to think about it all. Actually, I’ll need a lot longer than a weekend to process all the stuff that I took in, but it’ll be fun cogitating on everything I heard. I think Edd Dumbill and Matt Biddulph lined up some fantastic speakers, and having produced a conference myself in the past, I know just how much work it is.

The talks that stood out for me were:

– Matt Biddulph, talking about putting the BBC’s programme catalogue online.

– Paul Hammond, on open data and why there isn’t more of it about.

– Tom Coates, doing his web of data talk, which is always good.

Had some fun conversations too, with a whole host of people, including but not limited to: Teh Ryan King, Brian Suda, Thomas Vander Wal, Jeremy Keith, Simon Willison, Matt Patterson, Jeffrey McManus, and Jay Gooby. I’m sure I’ve missed someone off: sorry if that’s you!

The next Xtech will be in Paris in 2007. Can’t wait.

Creative procrastination

Derek Powazek, with whom I worked at Technorati for a week or so last summer and whose design chops I highly rate, writes about how important it is to let things stew sometimes. This is what’s missing from my life right now. Although I’ve had more time lately – sitting on planes, waiting at airports – which is already combining with some really cool conversations with some really bright people and threatening to splurge out of my brain just as soon as I get a moment free.

Xtech 2006: Jeff Barr – Building Software With Human Intelligence

Amazon have released a number of APIs, but going to focus on the Mechanical Turk.

[He has ‘ground rules’ and says ‘please feel free to write about my talk’. Er… command and control anyone? Give me a break.]

[Lots of guff about Amazon.]

Artificial artificial intelligence.

The original Mechanical Turk was a chess playing robot. No one could figure out how it worked. Turns out there was a ‘chessmaster contortionist’ inside.

So the MTurk is powered by people. Machines can’t tell the difference between a table and a chair, whereas a person can do it immediately.

HIT – Human Intelligence Task.

Can check people have the appropriate skills, e.g. be able to speak Chinese or tell a chair from a table.

So you make a request, say what skills etc. are required, figure out the fee you’re willing to pay. Workers go to MTurk.com. 45 types of work, can filter by price or skills required etc. Transcriptions are very popular. E.g. CastingWords.com do transcriptions of podcasts. Question and answer kinds of things.

You decide if your workers get paid, but there are ratings on both sides, so you can rate the employers as well as the workers.
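To give a flavour of that request/fee/qualification flow, here’s a hedged sketch against today’s boto3 MTurk client – obviously not what was on offer in 2006, and the reward, question and qualification ID below are placeholders of my own.

```python
# A sketch of creating a HIT: describe the task, set the fee, and attach a
# qualification requirement so only suitable workers can take it.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# Minimal QuestionForm XML; check the schema URL against the current MTurk docs.
question_xml = """<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2017-11-06/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>label</QuestionIdentifier>
    <QuestionContent><Text>Is the object in the photo a table or a chair?</Text></QuestionContent>
    <AnswerSpecification><FreeTextAnswer/></AnswerSpecification>
  </Question>
</QuestionForm>"""

hit = mturk.create_hit(
    Title="Tell a table from a chair",
    Description="Look at one photo and say what it shows",
    Reward="0.05",                       # the fee you're willing to pay, in USD
    MaxAssignments=3,                    # how many workers should do it
    AssignmentDurationInSeconds=300,
    LifetimeInSeconds=86400,
    Question=question_xml,
    QualificationRequirements=[{         # placeholder qualification check
        "QualificationTypeId": "REPLACE_WITH_QUALIFICATION_ID",
        "Comparator": "Exists",
    }],
)
print("HIT created:", hit["HIT"]["HITId"])
```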

Software Developers

– can use APIs with this to include humans in their applications

Businesses

– can get stuff done that humans need to do

Anyone

– can make money

– new businesses feasible

Public use – massive scale image clean-up, i.e. which picture best represents the thing it’s of? Got Slash-Dotted to death. People did greasemonkey scripts to help them do it. Had so many people doing this that they ran out of images. Had more workers than work for a while.

HIT Builder, helps you create your HIT (task thing).

Could use it for market research or surveys. E.g. wanted a survey for developers, so added some qualifications to weed out the non-developers by making people answer questions like ‘which of these four aren’t programming languages’.

Translations services.

Translate written transcripts to audio.

Image Den, photo editing and retouching, e.g. removing red-eye, cutting things up etc.

CastingWords, podcast transcription service.

Need an Amazon account to work, which requires a credit card, and that’s their way of trying to ensure no child labour.

[I think this could possibly have been an interesting talk, but it wasn’t. I like the idea of having APIs for something like MTurk, but this guy was really dull. I guess I could cut and paste from the back channel to spice things up a bit, but that might be mean.]

Xtech 2006: David Beckett – Semantics Through the Tag

Common way to think of tags is as a list of resources. You tag something, then you get a list of stuff you’ve tagged, etc.

Another way is tag clouds. Size of the tag represents how popular it is.

Suggested tags. Discovery process for other tags that people are using that are similar to your tags.

Flickr – clustering of photos with tags that are related, also interestingness which is partly tagged-based but also takes interactions with the site from people into account.

Mash-ups – assume tag is a primary key. Works well on events such as Xtech: Technorati, Planet Xtech. Photo sites are more place/time centric; del.icio.us ones are more pic related. So the further away you get from place/time centric and the more generic the tag gets, the weaker its usefulness for mash-ups. Generic tags don’t work so well – they don’t work as a connection across tag space, and won’t tell you anything.

Emergent tag structures

There’s no documentation because it’s so lightweight so people use it however they like. Pave the paths that people follow.

– Geo tagging, lat/long, places

– Cell tagging, from mobile phone cell towers, associated with cameraphone pictures

– Blue tagging, bluetooth devices that were in context at the time

Hierarchy

Not about creating a taxonomy, but looking at emergent hierarchy, e.g.

– programming

– programming:html or programming/html

These appeared on their own, no one is thinking of consistency. Tag system may not understand the hierarchical system, but it helps people find them.

Grouping

Bundles in delicious. Similar to Flickr sets.

So whose stuff is tagged?

– Yours on the tagging site

– Other people’s on the tagging site

– Anybody’s

Flickr is a closed system, and you can only tag certain things like your photos and your contacts; whereas Del.icio.us is more open.

So whose tag is it anyway?

If it were a domain name, ownership would be clear, but really, who cares. This is lightweight, don’t need to think about who owns it, just use it.

Tags are vocabularies per service, tagonomies. Each service uses different words as tags, and they may or may not be the same across services. So they can use terms differently.

What’s the point of tagging semantics?

1. for people to understand what some use of a tag means: there’s no way of finding out what a tag means without looking at it in context and figuring it out.

2. for computers to gather information about a tag, supporting #1.

What does a tag mean to someone?

– ask them? not scalable

– look it up in a canon – a dictionary or encyclopaedia – but it isn’t distributed and it’s too much like hard work. need a mechanism that you can just use. don’t want anything heavyweight.

Good things

– low barrier to use, just start typing

– few restrictions on syntax

– unrestricted word space, if you were looking at it from a librarian’s point of view, Dewey Decimal is restricted to what’s defined by the system

– social description, folksonomy, can see what friends are doing, look at groups and sets, and make up your own tags

– if you have lots of tags, over time as the no. of tags increases, the descriptions merge towards an average, because any one individual’s version of a description becomes less important, so over time meanings converge.

– easy to experiment, because there is no authority that says it’s not allowed.

Problems

– formalism problems: mixing types of things, names of things, genres, made up things, ambiguity, synonyms

– meaning is implicit

– power curves, nobody explains the long tail tags, so individuals’ meanings get lost and subdued by the mass of people tagging (this is a plus – see above – and a minus)

– naive tag mashups mix up meanings

– syntax problems – stemming, plurals. some services try to join things by ignoring spaces, plurals, caps, lower case, etc. by using natural language.

– tricky to make a short, unique tag. computer wants something unique, humans want something short and easy.

All these are the usual human-entered metadata problems.
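Here’s a little sketch of the sort of normalisation a tagging service might apply to paper over those syntax problems – my own illustration, not how any particular service actually does it.

```python
# Naive tag normalisation: fold case, strip separators, and crudely
# singularise, so "Web Design", "web-design" and "webdesigns" collapse.
import re

def normalise_tag(raw: str) -> str:
    tag = raw.strip().lower()
    tag = re.sub(r"[\s_-]+", "", tag)      # "web design" / "web-design" -> "webdesign"
    if tag.endswith("ies"):
        tag = tag[:-3] + "y"               # "ponies" -> "pony"
    elif tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]                     # "tags" -> "tag", but keep "css"
    return tag

print(normalise_tag("Web Design"))   # webdesign
print(normalise_tag("ponies"))       # pony
print(normalise_tag("CSS"))          # css
```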

Possible solutions

– microformats: no good hook for software, and are read only

– web APIs: read/write but are for programmers only, not much use to 99% of tag users

– RSS: but it’s read-only, so more about me giving you stuff than getting stuff back.

Separate from service

Need to then understand the words out of context, with no service behind it.

Want

– a description

– a community

– a historical record

Answer: a wiki

– a description page

– a community of people to discuss and/or edit

– a historical record

Example, raptor tag

Raptor is a bird of prey, a hard drive, a plane, dinosaurs. So what does it mean if you tag something raptor?

So there is ambiguity. Wikipedia uses disambiguation pages to help clean up meanings from words. People can read this, but so can machines. This stuff is recorded semantically, so machines can tell the term is ambiguous. Can also look up in Wiktionary, and can then leap across languages too.

Wikitag

– Easy to create

– record the ambiguity, and synonyms/preferred names

– microformat compatible: metadata is wiki markup, so is visible; reuse of existing format.

http://en.wikitag.org/wiki/Raptor

Defined the meaning of a tag.

– can discuss the term

– and can add disambiguation if need be

This isn’t perfect.

– discussion, needs easy-to-use threaded discussion

– wikipedia rules, e.g. NPOV and encyclopedic style are not appropriate for something as lightweight as this. Needs fewer rules, maybe just ‘keep it legal’.

– Centralised. 🙁 don’t want a ‘one true way’ of doing this.

Can also add in semantic wiki mark-up.

Tagging is a social process with a gap: the place for a community to build the meaning. A wiki can fill the gap.

Xtech 2006: Mikel Maron – GeoRSS

90% of information has a spatial component. Needed to agree a format.

Definitive history of RSS

– Syndicus Geographum – ancient Greek treaty for sharing of maps between city states

– blogmapper/rdfmapper – 2002, specifying locations in weblog posts, little map with red dots

– w3c rdfig geo vocabulary – 2003, came up with simple vocab on irc and published a doc, and this is the basis of geocoding RDF and RSS

– geowanking – May 2003, on this discussion list GeoRSS first uttered

– World as a Blog/WorldKit – realtime weblog geo info nabbing tools, World as a Blog looks at geotags in real time then plots them on a map so you can see who’s up too late.

– USGS 2004, started their Earthquake Alerts Feed.

– Yahoo! maps supports georss 2005

Lots has happened. Google released GoogleMaps, and shook everyone up with an amazing resource of map data, and released an API. Lots of map-based mash-ups.

OSGeo Foundation, Where 2.0, OpenStreetMap.

Format then only specified points, not lines or polygons. GeoRSS.org. Alternatives were KML, used by Google Earth – rich, similar to GML, but too complicated and tied to Google Earth, so some stuff is more for 3D; GPX, the XML interchange format for GPS data, but tied to its application, extensible but not so useful; and GML, from the Open Geospatial Consortium, useful for defining spatial geometries – essentially an XML version of a shape file – but a quite complicated spec at over 500 pages, with a bit of confusion on how you use it because it’s not a schema, it’s similar to RDF, so it provides geometric objects for your own schema.

OGC got involved in GeoRSS because they wanted to help promote GML. So some of GeoRSS is drawn from GML. Two types of GeoRSS: Simple and GML. Simple is a compressed version of GML. Neutral regarding feed type, e.g. RSS1.0/RDF, RSS2.0, Atom.
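For reference, this is roughly what a GeoRSS Simple point looks like inside an Atom entry – generated here with Python’s standard library; the earthquake entry itself is made up, and the point element carries latitude then longitude, space separated.

```python
# Build an Atom entry carrying a GeoRSS Simple point with ElementTree.
import xml.etree.ElementTree as ET

ET.register_namespace("", "http://www.w3.org/2005/Atom")
ET.register_namespace("georss", "http://www.georss.org/georss")

entry = ET.Element("{http://www.w3.org/2005/Atom}entry")
ET.SubElement(entry, "{http://www.w3.org/2005/Atom}title").text = "M 5.3 earthquake, Java"
point = ET.SubElement(entry, "{http://www.georss.org/georss}point")
point.text = "-7.96 110.46"   # latitude, then longitude

print(ET.tostring(entry, encoding="unicode"))
```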

Looking for potential to create a Microformat.

[Now goes into some detail re the spec which I’m not going to try to reproduce].

EC JRC Tsunami Simulator. Subscribed to USGS earthquake feed, ran tsunami model, and dependent on outcome, they would send out an alert. Also had RSS feed. Produces maps of possible tsunamis.

Supported by, or about to be supported by:

– Platial

– Tagzania

– Ning

– Wayfairing, Plazes

Commercial support

– MSFT announced intention

– Yahoo! (Upcoming, Weather, Traffic, Flickr may potentially use it, and the Maps API)

– Ning

– CadCorp

Google

– OGC member

– MGeoRSS

– Acme GeoRSS

– GeoRSS2KML

And other stuff

– Feed validator

– WordPress Plugin in the works

– Weblog

– A press release

– Feed icon

Aggregation

http://placedb.org

http://www.jeffpalm.com/geo/

http://fofredux.sourceforge.net/

http://mapufacture.com/georss/

Mapufacture, create and position a map, select georss feeds and put them together in a map. Then can do keyword searches and location searches. Being able to aggregate them together is very useful. Rails app. E.g. several weather feeds, added to a map and then when you click on the pointer on the map, the content shows up.

Social maps, e.g. places tagged as restaurants in Platial and Tagzania on one simple map.

Can search, and navigate the map to show the area you’re interested in, then it searches the feeds and grabs everything in that location. All search results produce a GeoRSS feed which you can then reuse.

Odds and ends

– mobile device potential, sharing info about where you are

– sensors, could be used for publishing sensor data

– GIS Time Navigation, where you navigate through space and see things happening in time, e.g. a feed of events in Amsterdam which provided you with a calendar and location.

– RSS to GeoRSS converter, taking RSS, geocode place names and produce GeoRSS

Xtech 2006: Ben Lund – Social Bookmarking For Scientists

Connotea, social bookmarking for scientists.

Why for scientists? Obviously, scientists and clinicians are a core market. Doesn’t exclude others, but by concentrating on users with a common interest they could increase discovery benefits. Hooks into academic publishing technologies.

Connotea is an open tool, is social so connects to other users, and has tags. But what it does is identify articles solely from the bookmark URLs. So it can pull up the citation from the URL – title, author, journal, issue no., page, publication date. This is important for scientists.

Way it does it is by ‘URL scanning’. So user is on a page, e.g. PubMed which is a huge database of abstracts from biomed publications. When the user clicks ‘Add to Connotea’, this opens a window, it recognises that this is a scholarly article, and imports the data.

Uses ‘citation source plug-ins’ – perl modules for each API. It asks each plug-in to see if it recognises the URL, and when one does, it goes and gets the information, which is then associated with the bookmark in the database.

[Now runs through some programming stuff.]
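I won’t try to reproduce that either, but the plug-in pattern itself is simple enough to sketch. The real plug-ins are Perl modules; the Python below is purely my own illustration, and the class, regex and URL in it are made up.

```python
# The plug-in pattern: each plug-in says whether it recognises a URL, and if
# so returns citation data to attach to the bookmark.
import re

class PubMedPlugin:
    """Hypothetical plug-in for PubMed-style URLs."""

    def understands(self, url: str) -> bool:
        return "pubmed" in url

    def citation(self, url: str) -> dict:
        # The real system would call the service's API; here we just pull an
        # article ID out of the URL as a placeholder.
        match = re.search(r"(\d+)", url)
        return {"source": "pubmed", "id": match.group(1) if match else None}

PLUGINS = [PubMedPlugin()]

def citation_for(url: str):
    for plugin in PLUGINS:
        if plugin.understands(url):
            return plugin.citation(url)
    return None   # plain bookmark, no citation attached

print(citation_for("https://pubmed.example.org/16407148/"))  # made-up URL
```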

Bookmarks on a lot of these scientific resources are far from clean or permanent and have a lot of session data in. So this needs cleaning off.

So what’s important? Retrieval and discovery. Already has tagging for navigation. Also has search in case there are some articles that haven’t been accurately tagged.

Provides extra link options for bookmarks. Main title links to the article, say in PubMed; but there are links to other sources for this article, e.g. to the original Nature article; plus other databases, and cross-referencing services.

System also produces a long open URL with all the bibliographic information in it.

Now … the hate.

First hate:

– poorly documented and poorly implemented data formats. Variety of different XML schema. Liberal interpretations of standards.

Second hate:

– have to do lots of unnecessary hoop-jumping to get this data. Lots of pinging different urls to get cookies, POSTs, etc.

Third hate:

– have to do everything on a case-by-case basis. have to reverse engineer each publisher’s site. have to write ad hoc rules and custom procedures for each case.

A wish

Nature released a proposal called OTMI, open text mining interface – wants to make Nature’s text open for data mining, but not the articles themselves. So researchers are looking for raw XML for doing data mining research, but every time someone asks they have to make ad hoc arrangements for each case. So OTMI does some pre-processing to make the data more usable.

Publishers could choose to be supported by Connotea and remove the need for them to reverse engineer. Publisher just puts a link through to an ATOM doc with the relevant data in so that the citation can be easily retrieved.

Blogs already do autodiscovery of ATOM feeds, so can test idea using a citation source plug-in for a blog. It works, so can treat any source as a citation, but only whilst the post is still in the RSS feed.

Another wish

Citation microformat. Connotea would work really well with a citation microformat, so is going to look into that.

Summary

How to do URL to metadata

– manual entry

– scraping the page

– recognise and extract some ID, Connotea does that, but it doesn’t scale to the whole web.

– follow a metadata link from page, this is the blog plug-in

– parse the page directly, not possible yet.

Useful not just for Nature as publishers of data, but also anyone else who wants to be discoverable and bookmarkable.

Nature blog about this, Nascent.

Xtech 2006: Tom Coates – Native to a Web of Data: Designing a part of the Aggregate Web

This is a developed version of the talk Tom gave first at the Carson Summit on the Future of Web Apps. So now you can compare and contrast, and maybe draw the conclusion that either I am typing more slowly these days, or he just talked faster.

Was working at the BBC, now at Yahoo! Only been there a few months so what he’s saying is not corporate policy. [Does everyone who leaves the BBC go to work for Yahoo?]

Paul’s presentation was a little bit ‘sad puppy’ but mine is going to be more chichi. Go to bingo.scrumjax.com for buzzword bingo.

I’m going to be talking about design of W2.0. When people think about design they think rounded corners, gradient fills, like Rollyo, Chatsum. Now you have rounded corners and aqua effects – FeedRinse. All started with Blogger and the Adaptive Path group.

Could talk for hours about this, the new tools at our disposal, about how the Mac or OmniGraffle change the way people design. But going to talk about products and how they fit into the web. Web is gestalt.

What is the web changing into?

What can you or should you build on top of it?

Architectural stuff

Web of data, W2.0 buzzwords, lots of different things going on, at design, interface, server levels, social dynamics. Too much going on underneath it to stand as a term, but W2.0 is condensing as a term. These buzzwords are an attempt to make sense of things, there are a lot of changes and innovations, and I’m going to concentrate on one element. On the move into a web of data, reuse, etc.

Web is becoming aggregate web of connected data sources and service. “A web of data sources, services for exploring and manipulating data, and ways that user can connect them together.”

Mashups are pilot fish for the web. By themselves, not that interesting. But they are a step on the way to what’s coming.

E.g. Astronewsology. Take Yahoo! news, and star signs, so can see what news happens to Capricorns. Then compare to predictions. Fact check the news with the deep, important spiritual nature of the universe.

Makes two sets of data explorable by each other, put together by an axis of time.

Network effect of services.

– every new service can build on top of every other existing service. the web becomes a true platform.

– every service and piece of data that’s added to the web makes every other service potentially more powerful.

These things hook together and work together so powerfully that it all just accelerates.

Consequences

– massive creative possibilities

– accelerating innovation

– increasingly competitive services

– increasing specialisation

API-ish thing is a hippy dream… but there is money to be made. Why would a company do this?

– Use APIs to drive people to your stuff. Amazon, eBay. Make it easier for people to find and discover your stuff.

– Save yourself money, make service more attractive and useful with less central dev’t

– Use syndicated content as a platform, e.g. stick ads on maps, or target banner ads more precisely

– turn your API into a pay-for service

Allows the hippies and the money men to work together, and the presence of the money is good. The fact that they are part of this ecosystem is good.

If you are part of this ecosystem, you benefit from this acceleration. If you’re not, you’re part of a backwater.

What can I build that will make the whole web better? (A web of data, not of pages.) How can I add value to the aggregate web?

Data sources should be pretty much self-explanatory. Should be able to commercialise it, open it out, make money, benefit from the ecosystem around you. How can you help people use it?

If you’re in social software, how can you help people create, collect or annotate data?

There is a land grab going on for certain types of data sources. People want to be the definitive source. In some areas, there is opportunity to be the single source. In others, it’s about user aggregation, reaching critical mass, and turning that aggregated data into a service.

Services for exploring/manipulating data. You don’t need to own the data source to add value, you can provide people with tools to manipulate it.

Users, whether developers or whomever. Feedburner good at this. Slicing information together.

Now will look at the ways to build these things. Architectural principles.

Much of this stuff from Matt Biddulph’s Application of Weblike Design to Data: Designing Data for Reuse, which Tom worked on with Matt.

The web of data comprises these components.

– Data sources

– Standard ways of representing data

– Identifiers and URLs

– Mechanisms for distributing data

– Ways to interact with/enhance data

– Rights frameworks and financial

These are the core components that we have to get right for this web of data to emerge properly.

Want people to interrogate this a bit more, and think about what’s missing.

Ten principles.

1. Look to add value to the aggregate web of data.

2. Build for normal users, developers and machines. Users need something beautiful. Developers need something useful, that they can build upon, show them the hooks like consistent urls. Machines need predictability. How can you automate stuff? E.g. tagspaces on Flickr can be automated, getting those photos thus tagged.

3. Start with explorable data, not pages. How are you going to represent that data? Designers think you need to start with user needs, but most user needs stuff is based on knowing what the data is for to start with. Need to work out the best way to explore the data.

4. Identify your first order objects and make them addressable. What are the core concepts you are dealing with? First order objects are things like people, addresses, events, TV shows, whatever.

5. Correlate with external identifier schemes (or coin a new standard).

6. Use readable, reliable and hackable URLs.

– Should have a 1-1 correlation with the concept.

– Be permanent references to resources, use directories to represent hierarchy

– not reflect the underlying tech.

– reflect the structure of the data – e.g. tv schedules don’t reflect the tv show but the broadcast, so if you use the time/date when a show is broadcast, that doesn’t correlate to the show itself, it’s too breakable.

– be predictable, guessable, hackable.

– be as human readable as possible, but no more.

– be – or expose – identifiers. E.g. if you have an identifier for every item, like IMDb film identifiers, it could be used by other services to relate to that film.

Good urls are beautiful and a mark of design quality.
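To make principles 4 to 6 a bit more concrete, here’s a tiny sketch of slug-based, hackable URLs for first order objects – the site and the show are invented, and this is just one way to do it.

```python
# Give each first order object a stable slug and build readable, guessable
# URLs from it, rather than from something breakable like a broadcast time.
import re

def slugify(name: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def show_url(show_name: str) -> str:
    return f"https://example.org/programmes/{slugify(show_name)}"

def episode_url(show_name: str, episode_number: int) -> str:
    # hanging /episodes/ off the show URL keeps the scheme hackable
    return f"{show_url(show_name)}/episodes/{episode_number}"

print(show_url("The Archers"))             # https://example.org/programmes/the-archers
print(episode_url("The Archers", 14862))   # .../the-archers/episodes/14862
```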

7. Build list views and batch manipulation interfaces

Three core types of page

– destination

– list-view

– manipulation interface, data handled in pages that are still addressable and linkable.

8. Create parallel data service using understood standards

9. Make your data as explorable as possible

10. Give everything an appropriate licence

– so people know how they can and can’t use it.

Are you talking about the semantic web? Yes and no. But it’s a web of dirty semantics – getting data marked up, describable by any means possible. The nice semantic stuff is cool, but use any way you can to get it done.