More on Technorati tags

Over on Burningbird, Shelley has written a great summary/analysis of the current thinking on Technorati’s tags. It is beautifully written, sports some wonderful photographs, and is well worth reading. I’m not even going to attempt to summarise it here, because to do so would be like reinventing the wheel in a triangular shape – pointless and nowhere near as good as the original.

The thoughts that follow are an elaboration of the comment I left on Shelley’s post, so if you read that then some of this may seem eerily familiar.

As I said in my previous post about Technorati tags, I can’t help feeling that we’re really only at the very beginning of the creation of meaningful tagsonomies and tagsonomical tools. Technorati’s implementation of tags is one step on a long road, but until we can sort by what Technorati calls ‘authority’ (but which is really a sort of popularity), pull the search results into our aggregators by RSS, search using Boolean operators on multiple tags and do all sorts of complicated bespoke filtering, tags will remain a bit of a kludge.

Tags are, at the moment, at the ‘sledgehammer to crack a walnut’ stage, and there’s a lot of work to be done before we get it refined down to the toffee hammer stage.

A big issue is obviously implementation. People are lazy – I certainly am and I am sure I am not alone. Until we have a way to automatically tag or create tag suggestions that can be approved or disapproved by the user, we are going to have to rely on people bothering to tag their posts, and we’re going to have to put up with the way that the variable quality of their metadata affects this metadata-reliant system.

Of course, we have movement in that direction in terms of the various tagging tools which have sprung up with impressive rapidity. Ecto supports tags using the Custom Tag facility – just create a custom HTML tag with the code below and it will automatically create a tag from the selected text.

<a href="http://technorati.com/tag/%*" rel="tag"></a>
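As a rough illustration of what such a template does, here is a minimal Python sketch that builds the same kind of link from a piece of selected text. It assumes that `%*` in the Ecto template stands for the selection, and that the selection should also appear as the visible link text; the function name `tag_link` and the constant `TAG_BASE` are made up for this example and are not part of Ecto or Technorati.

```python
from html import escape
from urllib.parse import quote

# Base URL taken from the template above; everything else here is illustrative.
TAG_BASE = "http://technorati.com/tag/"


def tag_link(text: str) -> str:
    """Return an anchor that marks `text` as a tag via rel="tag"."""
    label = text.strip()
    # Percent-encode the tag so spaces and non-ASCII characters survive in the URL path.
    href = TAG_BASE + quote(label, safe="")
    # Escape both the attribute value and the visible label for safe HTML output.
    return f'<a href="{escape(href, quote=True)}" rel="tag">{escape(label)}</a>'


print(tag_link("Cymraeg"))
# prints: <a href="http://technorati.com/tag/Cymraeg" rel="tag">Cymraeg</a>
```

The only part that matters to a tag-aware service is the `rel="tag"` attribute together with the last segment of the URL; the rest is ordinary link markup.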

Stephanie Booth has created a plug-in for WordPress, and there is of course the Oddiophile bookmarklet I have mentioned previously. All good starts, but they still require the blogger to bother using them and think clearly about which tags are relevant. As Shelley and others have noted, people are not necessarily very good at creating accurate tags – even people knowledgeable in the area of taxonomy and metadata don’t always create good tags for their own work.

That said, I think there are a few uses for which tags, even as they stand, beat every other system hands down, and one of those is classifying posts by language. At the moment, there really isn’t a consistent way to mark blogs or blog posts by language and that makes it very difficult if one is interested in finding blogs in a given tongue.

If I want to find blogs written in Welsh, then I have a bit of a challenge ahead of me. I can search in Google for ‘blog cymraeg’ but all that gives me are blog posts which use the word ‘cymraeg’, so if the post is in Welsh but doesn’t mention the word ‘cymraeg’ it’s not going to show up. For more popular languages, I can choose which language Google should search in, but that still means I need to pick some keywords to search on.

There is a similar problem even with specialised blog search engines, including the keyword search on Technorati – they all search content. I’m no metadata expert, but I see a clear difference between metadata that describes the contents of a post, i.e. what it is about, and metadata that describes the format of the post, such as what language it is in.

By allowing people to add format metadata, tags give bloggers the power to describe aspects of their posts that would not be accurately reflected by keywords selected from the content. Tagging all Welsh posts with ‘Cymraeg’, for example, allows anyone interested in Welsh blogging to locate the most recent posts in that language, regardless of what those posts might be about.

Using tags to make up for this shortfall in existing blog metadata, we can then use Technorati as an engine for discovery (as opposed to search) within a set of given criteria. At the moment there is just no other way to do this.

Tags may be a bit kludgy at the moment, but because they are capable of filling a gap in the way we locate blog posts that may be of interest, I think they are going to be with us for the long haul.


7 thoughts on “More on Technorati tags”

  1. Great thought on the language potential of Tags. I can see this being really useful on the Tags site (I can’t read Welsh).

    Alex.

  2. But, if one were to invent a triangular wheel, it wouldn’t be pointless… it would be quite pointy, in fact. (Just not terribly useful.)

    Heh.

  3. Aaah I set ’em up… 😉

  4. I’m a bit slow to all the RSS stuff, so forgive me if this is just too newbie…

    But there is a meta-tag in the header for you to identify your document’s language. Does Welsh not have a code for this?

    Perhaps a bit of background here: http://community.roxen.com/developers/idocs/rfc/rfc1766.html

    —-
    Ok I couldn’t help but find the solution to your Welsh searching woes… Welsh is an option for the language meta-tag when you code your page. You can see a full list of languages at:

    http://www.oasis-open.org/cover/iso639a.html

    —-

    So including the tag

    in the header section should identify your page as Welsh. Obviously the usefulness of the search will depend on who uses the tag. And, moreover, on Google adding two or three lines of code to its advanced search so that it supports the language…

    I guess, getting to my point (apologies for taking a while to get there): there already is a method of tagging each page for a wide list of languages (and it includes codes for private-use if you don’t want to mess with IANA or ISO or W3C/WTF). Is there perhaps a useful way to take advantage of this?

    Cook up a search engine that is focused on a full implementation of understanding the language codes that are already in use and develop a truly multi-lingual/global-reach version of Google? Especially for obscure languages? Someone out there must be in search of a graduate thesis.

  5. For an HTML document, yes, you can use metatags to describe the language of that document using the language code – cy in the case of Welsh.

    With some blog software that uses templates, like Movable Type or Blogger, it would be possible to insert the metatag into the template, and some hosted blogs allow you to add custom metatags to the header. But in both cases that indicates that the whole blog is written in the given language(s), and doesn’t indicate which post is in which language.

    There would be no point inserting the metatag by hand into the HTML of every post you wrote, because it would then be in the body of the page, instead of the header, and not readable by search engines as a metatag.

    One could use divs in the way that Stephanie Booth’s ClimbToTheStars (http://climbtothestars.org/) does, e.g.:

    <div class="post" lang="fr">
    <div class="post" lang="en">
    <div class="other-excerpt" lang="fr">
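    To give a feel for how a search tool might read per-post markup like this, here is a small Python sketch using the standard library’s `html.parser`. The class name `LangCollector` and the sample page are invented for illustration; this is not how any existing search engine is known to work.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Collect (class, lang) pairs from every element carrying a lang attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "lang" in attr:
            self.found.append((attr.get("class"), attr["lang"]))


# Hypothetical page fragment modelled on the divs shown above.
page = """
<div class="post" lang="fr"><p>Bonjour</p></div>
<div class="post" lang="en"><p>Hello</p></div>
<div class="other-excerpt" lang="fr"><p>Extrait</p></div>
"""

collector = LangCollector()
collector.feed(page)
print(collector.found)
# prints: [('post', 'fr'), ('post', 'en'), ('other-excerpt', 'fr')]
```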

    But that is not an answer for everyone – these sorts of fixes just don’t have legs for two reasons:

    1. You need to be capable of coding your own fix for your own blog software, or using software that someone else has created a plug-in for. That rules out using hosted blog platforms such as Typepad or Blogware that are not amenable to third-party plug-ins.

    2. Once you have sorted out some way to insert the relevant language metadata, there is no way for people to get any serious use from it. I don’t know of any search engine that has a comprehensive list of languages within which one can search, and specialised blog search tools have yet to address the issues of multilingual (or, in many cases, even non-English) blogging.

    What we need is for the main blog software and hosting providers to use existing metadata standards to add in the ability to metatag individual posts (and excerpts) by language, as well as being able to pick multiple languages for metatagging the blog as a whole. (Mine would need En and Cy, for example, with the occasional post in Pl.)

    We then need the search engines and blog search tools to pick up on this and provide the ability to search in any language.

    Trouble is, I just can’t see that happening any time soon. Why? Well, most blogging software is created by monoglot English developers for a monoglot English audience and therefore the incentives to develop multilingual support are few. They already have other features to develop that are more important to a bigger number of people. I’m not excusing them, but that’s just how I suspect it works.

    This is why tags are powerful – they are here, now, and they are easy to use.

    Of course, if someone wants to cook up a decent search engine that can make full use of the language metatags, then I’m all for it. I could use a search engine like that. But for the moment, our choices remain limited.

  6. First, the restrictions placed upon me by my TypePad account prevent me from modifying the HEAD section of the HTML of my web pages, and the META tag that indicates the content type of the page MUST be placed inside the HEAD.

    Second, how do I indicate that a BLOCKQUOTE contains Welsh, while the article contains English?

  7. Of course, the emergent properties of the blogosphere will help even in the absence of well-formed meta-data: people who blog with particular language combinations will get known in the cultural contexts they write for simply by dint of their interactions with other bloggers and readers.
