How metafeeds will lead the way to RSS nirvana. Maybe.

I have blogged before about RSS overload, the problem of simply having too many feeds in your aggregator to be able to read them all. Now Bill Burnham gives it a name, Feed Overload Syndrome, and discusses how “RSS threatens to sow the seeds of its own failure by creating such a wealth of data sources that it becomes increasingly difficult for users to sift through all the ‘noise’ to find the information that they actually need.”

He then describes the problem in detail and discusses possible solutions. Syndicating the results of keyword searches instead of actual blogs, he says, is not an ideal approach for three reasons. First, many RSS feeds contain excerpts rather than full posts, which prevents comprehensive indexing. Second, keyword searches become less effective the more data you index. Third, keywords can have multiple meanings, which produces noise in the results.

The new Technorati tag system is also ‘fundamentally flawed’ in his view:

The problem at the core of tagging is the same problem that has bedeviled almost all efforts at collective categorization: semantics. In order to assign a tag to a post, one must make some inherently subjective determinations including: 1) what’s the subject matter of the post and 2) what topics or keywords best represent that subject matter. In the information retrieval world, this process is known as categorization. The problem with tagging is that there is no assurance that two people will assign the same tag to the same content. This is especially true in the diverse “blogsphere” where one person’s “futbol” is undoubtedly another’s “football” or another’s “soccer”.

I agree that this is a big problem with tagging, if what you are aiming to achieve is a flawless, cross-referenced database of blog posts. In an ideal world that would be nice, but this is not an ideal world, and people are used to the internet not working quite right. Users learn how to rephrase their search terms to improve results. Once Technorati allows more complex tag searches or starts to produce clustered search results, the semantic issue will become less important, although I doubt it will ever become irrelevant, whatever is done.

Instead, Bill Burnham believes that the way to RSS nirvana is through the use of metafeeds – “RSS feeds comprised solely of metadata about other feeds”.

Combining meta-feeds with the original source feeds enables RSS readers to display consistently categorized posts within rich and logically consistent taxonomies. The process of creating a meta-data feed looks a lot like that needed to create a search index. First, crawlers must scour RSS feeds for new posts. Once they have located new posts, the posts are categorized and placed into a taxonomy using advanced statistical processes such as Bayesian analysis and natural language processing. This metadata is then appended to the URL of the original post and put into its own RSS meta-feed. In addition to the categorization data, the meta-feed can also contain taxonomy information, as well as information about such things as exact/near duplicates and related posts.

RSS readers can then request both the original raw feeds and the meta-feeds. They then use the meta-feed to appropriately and consistently categorize and relate each raw post.
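Bill doesn’t specify a format for metafeeds, so what follows is only a minimal Python sketch of the reader-side step he describes. It rests on several assumptions. Both feeds are plain RSS 2.0. Each metafeed item points back to its post through `<link>`. Categories travel in standard `<category>` elements. Duplicates are flagged with a `duplicateOf` element in a made-up namespace (`http://example.org/metafeed`). The crawling and statistical categorisation that would actually produce the metafeed are out of scope, so the metafeed is simply hard-coded. `load_metadata` and `categorize` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical namespace for metafeed-only fields; this is NOT part of RSS 2.0.
META_NS = "{http://example.org/metafeed}"

# A raw feed as an aggregator might fetch it (sample data, invented for illustration).
RAW_FEED = """<rss version="2.0"><channel>
<item><title>New PowerBook rumour</title><link>http://a.example/1</link></item>
<item><title>Same rumour, reposted</title><link>http://b.example/7</link></item>
<item><title>Football results</title><link>http://c.example/3</link></item>
</channel></rss>"""

# An assumed metafeed: metadata only, keyed to the original posts by URL.
META_FEED = """<rss version="2.0" xmlns:meta="http://example.org/metafeed"><channel>
<item><link>http://a.example/1</link><category>Apple/Hardware</category></item>
<item><link>http://b.example/7</link><category>Apple/Hardware</category>
<meta:duplicateOf>http://a.example/1</meta:duplicateOf></item>
<item><link>http://c.example/3</link><category>Sport/Soccer</category></item>
</channel></rss>"""


def load_metadata(meta_xml):
    """Map each original post URL to its categories and duplicate target (or None)."""
    meta = {}
    for item in ET.fromstring(meta_xml).iter("item"):
        url = item.findtext("link")
        meta[url] = {
            "categories": [c.text for c in item.findall("category")],
            "duplicate_of": item.findtext(META_NS + "duplicateOf"),
        }
    return meta


def categorize(raw_xml, meta_xml):
    """Group raw post titles by metafeed category, skipping flagged duplicates."""
    meta = load_metadata(meta_xml)
    groups = defaultdict(list)
    for item in ET.fromstring(raw_xml).iter("item"):
        url = item.findtext("link")
        # Posts the metafeed knows nothing about fall into a catch-all bucket.
        info = meta.get(url, {"categories": [], "duplicate_of": None})
        if info["duplicate_of"]:
            continue  # the metafeed says we already have this story
        for category in info["categories"] or ["Uncategorized"]:
            groups[category].append(item.findtext("title"))
    return dict(groups)


if __name__ == "__main__":
    print(categorize(RAW_FEED, META_FEED))
    # {'Apple/Hardware': ['New PowerBook rumour'], 'Sport/Soccer': ['Football results']}
```

The “Football results” post lands under the metafeed’s “Sport/Soccer” category regardless of what its author called it, which is the kind of consistency Bill is after. The reposted rumour is dropped because the metafeed marks it as a duplicate.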

The benefits of metafeeds as outlined by Bill are appealing. You would be able to find related documents, eliminate duplicates, create custom taxonomies, combine metafeeds and have your information “consistently sorted and grouped into meaningful categories”.

I have to admit, that sounds great. It would be wonderful to be able to create complex search strings and to get a feed back from the web that would contain only relevant posts and no duplicates. It would indeed be a form of RSS bliss.

It won’t, however, solve the problem of RSS overload – it is likely that it will just make it worse. Bill’s fix is a technical solution to a non-technical problem, and as such it is only half a fix.

We have always lived in a world where there was more information available than any one person can comprehend. Before email, the internet, blogs and RSS feeds, however, the limiting factor was not the existence of the information but gaining access to it. The form of the information limited the speed with which it could be accessed: having to go to a library, find the right book or journal and turn its pages one by one; gaining an introduction to an expert and persuading them to sit down with you and discuss the matter at hand; or doing empirical studies in order to reveal the information sought. It all took time.

Now the data we seek is easily accessible and the problem has shifted – it’s not finding information that’s the issue, it’s finding the right amount of the right information. The limiting factor is no longer access but discrimination. There is so much information available that it’s hard to know which bits to trust.

Anyone who paid attention at university learnt that the way you do library research is to cross-reference your sources – you can’t trust one single source to be telling the truth, so you learn to triangulate. The more sources that tell you that zebras are black and white, the more you believe it. Then you learn to weight your sources by credibility and reputation. If Learnéd Academic Journal tells you that zebras are black and white, then you feel confident that all other sources are going to agree with that. It is then easier to discount the Tabloid Freakshow Magazine article that claims to have discovered a purple zebra.

That’s basic research methodology. Cross reference. Consider the source. Keep a bibliography. And it’s a hard, hard habit to break, even for people who didn’t know that they were doing it.

RSS overload is partly to do with trying to triangulate the ‘truth’ from too many sources. There are many blogs devoted to Macs, for example, and the urge is to read them all to see what each one is saying, to compare the information in order to draw some conclusion as to what is most likely to be true. In blogging, there really aren’t any Learnéd Academic Journal-type sources with the sort of standing that allows you to immediately trust them. There are many reliable blogs written by many well-informed people, but it is difficult to tell which they are until you have completed your triangulation, reached your own conclusion and found that it syncs with what your now trusted blog tells you.

Of course, this is not necessarily a bad thing, as many previously trusted data sources are being shown to be less than trustworthy, but we do have to recognise that this whole process of building up a list of trusted blogs takes time and effort. Although to some degree trust can be passed on to other readers through word of mouth recommendations, we are still doing more work to locate trusted sources than we used to.

Another problem not solved by Bill’s metafeeds is that of completism. If you’ve ever met a rabid collector of stuff then you have probably met a completist, someone who just can’t bear not to have every last Star Wars toy, or every last scrap of Elliott Smith memorabilia. That’s what makes collectors collectors.

Many bloggers are completists too – information completists. To go back to the Mac example, you may rapidly decide which feeds are most reliable and which are mainly talking rubbish, but that doesn’t mean you are going to delete the rubbish feeds from your aggregator because there is the possibility, however slim, that they might just break the rumour of the G5 PowerBook that you’ve been desperately waiting for all these months.

Then there are the long link trails left for us to follow when we are researching our next post. You come across an interesting post, it contains links, which you follow, and then that contains more links which seem relevant so you follow those too… and then you check Technorati and read the posts you find there, and they lead to more and more posts and before you know it you’ve spent a day researching a blog post that is only two paragraphs long.

Information completism is dangerous – it leads to chronic information overload and can turn into a form of ‘legitimate procrastination’. Because link trails are convoluted and potentially exceedingly long, it’s easy to over-research instead of actually getting on with the post.

The only cure is to accept that we are human and flawed and we cannot possibly know everything about everything. We can’t even know everything about one thing, because there is too much to know, too many perspectives to take on board, too many angles to look at it from. We cannot and should not attempt to read every post and comprehend everyone’s point of view on a subject.

Instead we should refine our lists of sources down to a few trusted writers, and let the rest go. Is the Mac idiot whose blog makes you fume really going to break news about a new G5 PowerBook? No. Ditch it. Is reading every post about RSS really going to make your post about RSS overload any better? No. Read what you need then get on with the writing.

If anything, Bill’s metafeeds could well worsen RSS overload by adding more sources to the mix. Instead of cutting down the number of feeds people try to read, they will add to them by providing alternative concretions of data, which supplement existing sources rather than supplant them. This is because of the third flaw in his plan: blogs are social, and his fix is technological.

Most of the blog feeds I read on a daily basis I read for social reasons rather than informational reasons. I have 56 feeds in my ‘friends/dailies’ group in NetNewsWire, another ten under ‘acquaintances’. None of these feeds have anything to do with information per se. They could not be replaced by any sort of keyword search and metafeeds would be simply irrelevant in this context. I read them because I want to know what these people are up to – they are friends or people I wish were friends.

But even here, where you would think that the territory is fairly well defined, there is a problem of bloat. Social networking is great. It allows you to meet a whole bunch of interesting people you would never otherwise have met, but widening your social circle also means you have more friends and acquaintances to keep up to date with. Whilst individuals may not expect you to read their blog (indeed, I remain in a state of permanent surprise that anyone reads any of my blogs at all), there remains a nebulous feeling that one really ought to. I’m now connected to a ludicrous number of people, and in all honesty there is no way I can read everyone’s blog.

The problem of RSS overload is not purely technological, so a purely technological fix will not work. Instead it is partly technological, partly cultural, partly social, and partly down to our own personality quirks and habits. Metafeeds may help us find more relevant information more easily, but they won’t cure the information overload problem. Only we can do that, by cutting down on the number of feeds we read, the number of tabs we leave open in Firefox, and the number of people whose blogs we follow.

6 thoughts on “How metafeeds will lead the way to RSS nirvana. Maybe.”

  1. A wonderful post, Suw.

    I particularly like the distinction between social, news and informational reading. The emphasis on minimising the information overload to improve the signal-to-noise ratio is particularly good advice.

    I think the most promising approach is to find an expert blog reader with time on their hands to check corners of the web you don’t have time to and report back their insights. There are a number of bloggers I use for this, and others who would be perfect if they weren’t heavily slanted by corporate or political sponsorship.

    Whilst I agree this is not a technological problem there are some technology enhancements that could help. However, I think they will involve reducing the information architecture rather than expanding it.

    For example, I really wish Bloglines had a “filter out” function so I could skip posts in certain genres by bloggers I like. For instance, I usually read Brad DeLong’s weblog for everything but the economics and the stuff on US Social Security.

    I also think there will be an eventual role for surrogate machine readers who filter content based on your reading patterns. Right now the efforts I’ve seen require too much interaction, are too coarse (usually at the blog rather than post level) and are fairly dumb in their suggestions.

  2. Uh…wow. There’s so much here it’s going to take a while to absorb, but a couple of things spring immediately to mind.

    I may be misreading this, but a technology fix like metadata tagging forces us back into fitting our data into someone else’s taxonomies, instead of using tags that make sense to us. To cover all the bases, you have 10 Techno-tags. It seems awkward and counterintuitive, but it opens the way to many more possible connections. That’s one reaction.

    Another comes from my need for serendipity: finding things I didn’t know I was looking for, or needed to know. Right now, it’s provided by link trails and 15 open tabs in my browser, while I’m monitoring email, my aggregator is checking the 70 or so blogs there, and iPodder is downloading podcasts.

    I get rid of stuff as I find human “aggregators” who have built a level of credibility (results over time) as a source for stuff that interests me (internet, journalism, Mac, whatever), and who also deliver the serendipity hits. I don’t need more tech: I need more trusted voices.

    Thanks for a deep, deep post.

  3. Hi,

    I am developing a tagging search engine for websites (in beta), and my remark is:

    Don’t we need the ‘overload’ to learn and cross borders? In that overload are the things that are not related to any of the cross-posts. Isn’t that where the new stuff is?

    Thanks indeed for a very interesting and insightful article. I somewhat agree with henkjan’s comment that there is a possibility that the human ability to cope with the ever-growing masses of information on the net may improve significantly in the future.

    What I mean is that we are already able to ‘digest’ much more media content today – audio, video and text – than the generation of our grandparents would have been able to cope with, because we’ve been conditioned to intuitively filter information. Maybe the stage we are in now is somewhat similar to the time after the advent of television – people are fascinated by the new medium and ‘overload’ themselves with too much information because they haven’t learned their own limits (and I’m not talking about ‘the internet’ here, but about RSS and blogging – the internet is too versatile a platform to be called just one medium).

    Possibly future generations will be able to digest much more than we can and at the same time be able to discard information that goes over the personal threshold more effectively than we do today.

    Either way, I absolutely agree that this is a social or intellectual problem and not a technical one. Right now we’re a bit like the wizard’s apprentice – we have all the tools but maybe we don’t know how to use them effectively yet.

  5. Hrmm… the technology just empowers us to ignore even more. I think that’s truly revolutionary. It will give people time to focus on topics they want, and situations they can actually affect.

  6. Seyed: thanks! I agree that finding a trusted expert who can scour the net for news and insights is certainly a good way to cut down on your feeds. It takes time and effort to decide which bloggers can be trusted and are experts, but one needs to remember to shed the dross once the trusted sources have been located.

    I agree that it would be handy if we could filter feeds based on category. There is no good reason why aggregators couldn’t provide that sort of filtering functionality, as much blogging software includes categories in its RSS. Of course, not all blogs have categories (e.g. Blogger), and not all feeds include category data, but in theory it could be done. We just need to hassle the aggregator programmers until they do it.

    Mark: Yes, metafeeds would force us to use someone else’s taxonomy, but it’s swings and roundabouts – at least that taxonomy would, I presume, be consistent. But the use of that taxonomy is why I think that metafeeds would provide additional sources of information, rather than become a replacement for existing sources.

    However, I’d like to point out that talking about metafeeds and an all-encompassing taxonomy for them is a far cry from actually having these tools. Building it all would be problematic at best, in my opinion.

    You’re also right about serendipity being important. This is one reason why I like the Technorati tags – they allow discovery rather than constrain one by search.

    henkjan: It depends what you mean by ‘overload’. On an individual basis, we need a diverse range of input, because it is at the boundaries between areas of expertise that innovation happens. What we don’t need is an overload of synonymous sources. Having a folder of dozens of feeds about Macs, for example, doesn’t help us to come up with new ideas; it just swamps us with too much to read.

    However, collectively, overload can cause a jump in technology in order to deal with it. An overload of blogs led to RSS aggregators. Now it’ll be interesting to see what is developed because of the overload of RSS feeds. Tags are one innovation, but I am sure there will be more.

    CP: On a generation-to-generation basis, yes, I think humans are dealing with more non-personal information than ever before, and our children will undoubtedly be capable of processing more than we do.

    However, I do wonder sometimes what we have sacrificed. As individuals, we know less about our natural environment than ever before, we interact less with our local community and have less input from extended families. How many people in information-based societies now know how to find food without the intervention of a supermarket?

    I think what we’ve done is just replace one sort of information with another sort of information, and it takes us time to learn new filters to deal with this new data. Future generations may be able to deal with more info than us, but what will they sacrifice in order to do so?

    Thomas: Yes, I think that’s the nub of it!

