Xtech 2006: David Beckett – Semantics Through the Tag

Common way to think of tags is as a list of resources. You tag something, then you get a list of stuff you’ve tagged, etc.

Another way is tag clouds. Size of the tag represents how popular it is.
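
A minimal sketch of how a tag cloud might turn popularity into size. The function name, the pixel range and the log scaling are my assumptions, not anything shown in the talk; log scaling is just a common way to stop one very popular tag from dwarfing the rest.

```python
import math

def tag_cloud_sizes(counts, min_px=12, max_px=36):
    """Map each tag's usage count to a font size in pixels (log-scaled)."""
    # Ignore tags with no uses; log() is undefined for them
    logs = {tag: math.log(n) for tag, n in counts.items() if n > 0}
    if not logs:
        return {}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    sizes = {}
    for tag, value in logs.items():
        # If every tag is equally popular, give them all the midpoint size
        weight = 0.5 if span == 0 else (value - lo) / span
        sizes[tag] = round(min_px + weight * (max_px - min_px))
    return sizes
```

With made-up counts such as `{"xtech": 100, "rdf": 10, "wiki": 1}`, the most used tag gets the largest size and the least used the smallest.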

Suggested tags. Discovery process for other tags that people are using that are similar to your tags.
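
One simple way a service could produce suggestions is to count which tags co-occur with yours. This is a sketch under that assumption; the talk did not say how any particular site does it, and `suggest_tags` is a hypothetical name.

```python
from collections import Counter

def suggest_tags(tagged_items, my_tag, limit=3):
    """Suggest the tags that most often appear alongside my_tag."""
    co_occurrence = Counter()
    for tags in tagged_items:
        tags = set(tags)
        if my_tag in tags:
            co_occurrence.update(tags - {my_tag})
    # Most frequent first; break ties alphabetically so the result is stable
    ranked = sorted(co_occurrence.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:limit]]
```

Here `tagged_items` is just a list of tag lists, one per tagged resource.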

Flickr – clustering of photos with related tags, also interestingness, which is partly tag-based but also takes people’s interactions with the site into account.

Mash-ups – assume the tag is a primary key. Works well for events such as Xtech: Technorati, Planet Xtech. Photo sites are more place/time centric; del.icio.us ones are more topic related. The further you get from place/time centric and the more generic the tag gets, the less useful it is for mash-ups. Generic tags don’t work so well – they don’t work as a connection across tag space, and won’t tell you anything.
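
To make the 'tag as primary key' idea concrete, here is a small sketch that joins items from several services on the tag. The service names, titles and tags in the usage note are invented for illustration, and `mash_up` is a hypothetical helper, not a real API.

```python
from collections import defaultdict

def mash_up(feeds):
    """Group items from several services by tag, using the tag as the join key."""
    merged = defaultdict(list)
    for service, items in feeds.items():
        for title, tags in items:
            for tag in tags:
                # Case-insensitive join; no other normalisation is attempted
                merged[tag.lower()].append((service, title))
    return dict(merged)
```

An event tag like `xtech2006` pulls together exactly the photos and bookmarks from that event, whereas a generic tag like `web` would pull in unrelated items from everywhere, which is the weakness described above.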

Emergent tag structures

There’s no documentation because it’s so lightweight so people use it however they like. Pave the paths that people follow.

– Geo tagging, lat/long, places

– Cell tagging, from mobile phone cell towers, associated with cameraphone pictures

– Blue tagging, bluetooth devices that were in context at the time

Hierarchy

Not about creating a taxonomy, but looking at emergent hierarchy, e.g.

– programming

– programming:html or programming/html

These appeared on their own; no one is thinking about consistency. The tag system may not understand the hierarchy, but it helps people find things.
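
A tag system that did want to understand these conventions could split on the two separators seen above. This is only a sketch: the function names are mine, and treating every ':' or '/' as a hierarchy separator is an assumption that would misread tags which use those characters for something else.

```python
import re

def split_hierarchy(tag):
    """Split a tag such as 'programming:html' or 'programming/html' into parts."""
    return [part for part in re.split(r"[:/]", tag.strip().lower()) if part]

def ancestors(tag):
    """Return every level of the emergent hierarchy, from broadest to narrowest."""
    parts = split_hierarchy(tag)
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
```

Both spellings then land in the same place: `ancestors("programming:html")` and `ancestors("programming/html")` give the same list, so a search for `programming` can also find items tagged with the longer form.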

Grouping

Bundles in del.icio.us. Similar to Flickr sets.

So whose stuff is tagged?

– Yours on the tagging site

– Other people’s on the tagging site

– Anybody’s

Flickr is a closed system, and you can only tag certain things like your photos and your contacts; whereas Del.icio.us is more open.

So whose tag is it anyway?

If it was a domain name it is clear, but really, who cares. This is lightweight, don’t need to think about who owns it, just use it.

Tags are vocabularies per service, tagonomies. Each service uses different words as tags, and they may or may not be the same across services. So services can use the same term differently.

What’s the point of tagging semantics?

1. for people to understand what some use of a tag means: there’s no way of finding out what a tag means without looking at it in context and figuring it out.

2. for computers to gather information about a tag, supporting #1.

What does a tag mean to someone?

– ask them? not scalable

– look it up in a canon: a dictionary or encyclopaedia. But it isn’t distributed and it’s too much like hard work. Need a mechanism that you can just use; don’t want anything heavyweight.

Good things

– low barrier to use, just start typing

– few restrictions on syntax

– unrestricted word space, if you were looking at it from a librarian’s point of view, Dewey Decimal is restricted to what’s defined by the system

– social description, folksonomy: can see what friends are doing, look at groups and sets, and make up your own tags

– if you have lots of tags, then as the number of tags increases the descriptions merge towards an average, because any one individual’s version of a description becomes less important, so over time meanings converge.

– easy to experiment, because there is no authority that says it’s not allowed.

Problems

– formalism problems: mixing types of things, names of things, genres, made up things, ambiguity, synonyms

– meaning is implicit

– power curves: nobody explains the long tail tags, so individuals’ meanings get lost and subdued by the mass of people tagging (this is a plus – see above – and a minus)

– naive tag mashups mix up meanings

– syntax problems – stemming, plurals. Some services try to join variants together by ignoring spaces, plurals, caps, lower case, etc., using natural-language techniques.

– tricky to make a short, unique tag. computer wants something unique, humans want something short and easy.

All these are the usual human-entered metadata problems.
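
The syntax problems above are the easiest to picture in code. Below is a deliberately crude sketch of the kind of folding a service might do to join variants; the rules (lower-casing, dropping spaces and punctuation, stripping a trailing 's') are my assumptions and not any particular service's algorithm.

```python
from collections import defaultdict

def normalize_tag(tag):
    """Fold case, spacing, punctuation and simple plurals into one key."""
    key = "".join(ch for ch in tag.lower() if ch.isalnum())
    # Very rough plural handling: strip one trailing 's', but not 'ss'
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key

def group_variants(tags):
    """Group the spellings people actually typed under their normalised key."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalize_tag(tag)].add(tag)
    return dict(groups)
```

This joins `Semantic Web`, `semantic-web` and `semanticweb`, but it also turns `news` into `new`, which shows how naive joining can mix up meanings.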

Possible solutions

– microformats: no good hook for software, and are read-only

– web APIs: read/write but are for programmers only, not much use to 99% of tag users

– RSS: but it’s read-only, so more about me giving you stuff than getting stuff back.

Separate from service

Need to then understand the words out of context, with no service behind it.

Want

– a description

– a community

– a historical record

Answer: a wiki

– a description page

– a community of people to discuss and/or edit

– a historical record

Example, raptor tag

Raptor is a bird of prey, a hard drive, a plane, dinosaurs. So what does it mean if you tag something raptor?

So there is ambiguity. Wikipedia uses disambiguation pages to help clean up meanings from words. People can read this, but so can machines. This stuff is recorded semantically, so software can tell that this term is ambiguous. Can also look it up in Wiktionary, and can then leap across languages too.
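
To show what 'machines can tell this term is ambiguous' might look like, here is a toy lookup over a hand-written table. The raptor senses come from the example above; the table itself, the `xtech` entry and the `describe` function are hypothetical stand-ins for whatever a real disambiguation source would provide.

```python
# Hypothetical, hand-written sense table standing in for a disambiguation page
SENSES = {
    "raptor": ["bird of prey", "hard drive", "plane", "dinosaur"],
    "xtech": ["conference"],
}

def describe(tag):
    """Report whether a tag is unknown, unambiguous, or ambiguous."""
    senses = SENSES.get(tag.lower(), [])
    if not senses:
        return "unknown"
    if len(senses) == 1:
        return senses[0]
    return "ambiguous: " + ", ".join(senses)
```

A mash-up could use a check like this to warn before joining on a tag such as `raptor`.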

Wikitag

– Easy to create

– record the ambiguity, and synonyms/preferred names

– microformat compatible: metadata is wiki markup, so is visible; reuse of existing format.

http://en.wikitag.org/wiki/Raptor

Defines the meaning of a tag.

– can discuss the term

– and can add disambiguation if need be

This isn’t perfect.

– discussion, needs easy-to-use threaded discussion

– Wikipedia rules, e.g. NPOV and encyclopedic style, are not appropriate for something as lightweight as this. Needs fewer rules, maybe just ‘keep it legal’.

– Centralised. 🙁 don’t want a ‘one true way’ of doing this.

Can also add in semantic wiki mark-up.

Tagging is a social process with a gap: the place for a community to build the meaning. A wiki can fill the gap.