September 27, 2005

The Present Failure of Tagging

This is a response to Rashmi Sinha's write-up, A Cognitive Analysis of Tagging.  She distinguishes categorization from tagging to explain why tagging is the better approach.  Categorization, she says, adds an extra step to the mental process of relating a new concept to the concepts we already know.

To summarize:

Categorization
1) Discover concept A -> 2) Brainstorm what A relates to -> 3) Pick one (Concept X) and save the relationship between A and X.

Tagging
1) Discover concept A -> 2) Brainstorm what A relates to, saving all relationships between A and X1...Xn.
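The two flows above can be sketched as data structures.  This is only an illustration in Python, not how any tagging service stores its data; the names categories, tags, categorize and tag are made up for the example.

```python
from collections import defaultdict

# Categorization: each concept maps to exactly one other concept.
categories = {}

def categorize(concept, category):
    """Save a single relationship; a later choice overwrites the earlier one."""
    categories[concept] = category

# Tagging: each concept maps to a set of related concepts.
tags = defaultdict(set)

def tag(concept, *related):
    """Save every relationship that came to mind, with no need to pick one."""
    tags[concept].update(related)

# Hypothetical usage: the same concept stored both ways.
categorize("battery article", "car")
categorize("battery article", "fuel-savings")  # replaces "car"

tag("battery article", "car", "fuel-savings", "battery")
```

With categorization, the second call discards the first relationship.  With tagging, all three relationships survive, which is the step Sinha says we get to skip choosing between.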

What follows is a short review of the problem with tagging as it now stands, an example of the problem and brief ideas for fixing it.

The Problem
For historical reasons, categorizing has always been an artificially constrained process.  Tagging is just another electronic way to relax these constraints.  Being forced to associate a concept with only one other concept is a holdover from the physical world of data persistence: I have a written or typed piece of paper that I must store in, at most, one file folder for future reference.

To be a useful shift in the way we store and retrieve relevant information, tagging must be able to match the dynamic and relatable nature of our brain.  As it now stands, tagging doesn't evolve with our changing ideas of how our saved mental landmarks (tags) relate.  This makes retrieval of relevant information based on these landmarks barely more useful than a single category-style bookmark (even a public one).

The ability to relate electronic data seems to solve the problem, right?  Databases are one iteration of this.  'Mind maps' or 'concept maps' are another method.  IHMC has an interesting tool for working with concept maps. 

Only recently has the ability to quickly link a concept to more than one other concept become practical for the masses, in web applications such as del.icio.us and Flickr.  Now a web page can be related to as many other concepts as can be held in a text field in del.icio.us.  Getting better, but still far from being truly useful.

The general problem can be seen as two tasks: 1) externalizing knowledge-retrieval 'landmarks' when you encounter information you want to store (in some context) and 2) quickly finding these landmarks when you try to recall the information later on, potentially in a completely different context from the one in which you created the landmark in the first place.  The Semantic Web and its associated technologies should go a long way toward addressing this.

In Semantic Web terms, you'd need to write your own OWL document and then build a tool that uses this document to help you create and search tags.  Then, you'd need to use an OWL document editor to constantly update your OWL ontology to reflect new information you discover and tag (keep it in sync with all other forms of your tag libraries).  This is "ontology engineering" and no average Joe Tagger will ever do it.

I dare say that even if Joe Tagger were a computer engineer, he couldn't do it short of changing careers.  What follows is a concrete example of Joe's problem.

Example: The Battery Breakthrough
Let's say Joe reads a new article about a battery technology breakthrough in Scientific American.  Joe has been thinking about buying a fuel-efficient car lately.  When Joe goes to tag the article's web page, he uses the following tags: "battery," "fuel-savings," "car," "future-vehicle."  Let's say the article comes with a .gif of a high-level schematic for how the battery works.  Joe saves the .gif in his Flickr account, tagging it with "battery," "schematic," and "fuel-savings."

Eighteen months and many tags later, Joe, who works as an engineer at Intel, has an electric moment and realizes the battery tech breakthrough is more relevant to something he's directly working on, in nano-tech.  Given the keywords he chose, 1) will he recall how he tagged the original article well enough to find it, and 2) if he can find it at all, will he be able to easily re-tag the article and the schematic .gif to match the new context in which he finds these ideas relevant?  I wouldn't bet on either outcome.
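Joe's retrieval problem can be reproduced with a toy tag index.  This is a minimal sketch in Python under assumed names: index, save and find are hypothetical, as are the two item names; only the tags come from the example above.

```python
from collections import defaultdict

# Inverted index: tag -> set of saved items.
index = defaultdict(set)

def save(item, *tags):
    """File an item under every tag Joe typed at the time."""
    for t in tags:
        index[t].add(item)

def find(tag):
    """Exact-match lookup; .get avoids creating an empty entry on a miss."""
    return index.get(tag, set())

# What Joe saved, using the tags from the example (item names are made up).
save("sciam-battery-article", "battery", "fuel-savings", "car", "future-vehicle")
save("battery-schematic.gif", "battery", "schematic", "fuel-savings")

# Eighteen months later Joe thinks in terms of his current work.
nano_hits = find("nano-tech")    # empty: Joe never used this tag
battery_hits = find("battery")   # both items, if he remembers the old tag
```

The lookup only succeeds if Joe can reconstruct the frame of mind he was in when he tagged the article.  Nothing in the index connects "nano-tech" to what he saved.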

The Solution
We need refactoring for tagging.

Refactoring is the programming activity of reevaluating and then changing the names of, and relationships between, program subcomponents in order to more clearly express their intent and actual behavior.  This makes it easier to fix, reuse, or extend these components later on.  The advent of refactoring tools, first for Smalltalk and then for Java, has been a boon for programmers in the last two decades and has contributed significantly to the value of working programs.  Web application users need something like this for their tag libraries in del.icio.us, etc.
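To make the analogy concrete, here is a rough sketch in Python of what two refactoring operations on a tag library might look like: renaming a tag everywhere it is used, and merging several tags into one.  The TagLibrary class and its methods are hypothetical and are not the API of del.icio.us or Flickr; the tags "energy-storage" and "vehicle" are invented for the example.

```python
class TagLibrary:
    """A toy in-memory tag store (an assumption for illustration only)."""

    def __init__(self):
        self.items = {}  # item -> set of tags

    def tag(self, item, *tags):
        self.items.setdefault(item, set()).update(tags)

    def find(self, tag):
        return {item for item, tags in self.items.items() if tag in tags}

    def rename_tag(self, old, new):
        """Replace `old` with `new` on every item that carries `old`."""
        for tags in self.items.values():
            if old in tags:
                tags.discard(old)
                tags.add(new)

    def merge_tags(self, sources, target):
        """Fold several tags into one, e.g. near-duplicate names."""
        for source in sources:
            self.rename_tag(source, target)


lib = TagLibrary()
lib.tag("battery article", "battery", "fuel-savings", "car", "future-vehicle")
lib.tag("battery schematic", "battery", "schematic", "fuel-savings")

# Joe's view of the world changes, so the names change with it.
lib.rename_tag("battery", "energy-storage")
lib.merge_tags(["car", "future-vehicle"], "vehicle")
```

As with a rename in a programming tool, the point is that one operation updates every use of the name, so Joe doesn't have to hunt down and re-tag each item by hand.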

How does Joe Tagger find and change his del.icio.us and Flickr tags to reflect his changing view of the world?  Heck, today, he can't even relate simple tags he already has.  I predict doing this much isn't far off.  What still is far off is the ability to quickly and efficiently, if not automatically, evolve our brave new-age 'categories' to consistently reflect our constantly changing mental model of the world in which we think we live.
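Relating tags Joe already has could be as simple as the following sketch, written in Python under assumed names (items, related, relate and find are all hypothetical).  It records a link between two tags after the fact and widens a search by one step along those links.

```python
from collections import defaultdict

# Items and the tags Joe gave them at the time.
items = {
    "battery article": {"battery", "fuel-savings", "car", "future-vehicle"},
    "battery schematic": {"battery", "schematic", "fuel-savings"},
}

# Symmetric "is related to" links between tags, added long after tagging.
related = defaultdict(set)

def relate(a, b):
    related[a].add(b)
    related[b].add(a)

def find(tag):
    """Return items carrying `tag` or any tag directly related to it."""
    wanted = {tag} | related.get(tag, set())
    return {item for item, tags in items.items() if tags & wanted}

before = find("nano-tech")        # nothing is tagged or linked yet
relate("nano-tech", "battery")    # Joe's new insight, recorded once
after = find("nano-tech")         # now reaches both battery items
```

This covers only the first, near-term half of the prediction.  Joe still has to notice the relationship and record it himself; evolving the links automatically is the part that remains far off.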

Comments

Hey Benjamin, thanks a lot for this critical analysis of tagging. I think your insights are right-on. With the current zeitgeist of "tagging" - being everybody's darling - it's useful to see the cracks around the edges so we can see in what direction we can improve things. I'm currently designing something that involves large data-sets, and tagging is going to play a part - so I appreciate this notion of "refactoring" being important for "evolving" tags in sync with the evolution of one's mental model. Very interesting. Cheers [now...what else is wrong with tagging?? :)]

I think that categorization is more like a decision than tagging is. To make a decision, more 'historic' information must be retrieved from long-term memory, so it's time consuming; in other words, a higher cognitive cost is paid. Tagging, on the other hand, allows the decision to be delayed, which means it can be made just in time. It's short-term memory friendly.
What do you think?

Linan,

I agree. A decision implies a choice. And it's a tough one if you assume you must choose only one from many alternatives. Working with our current definition of categorization, this one choice must be right and remain right until some indefinite time in the future.

It's pretty unrealistic to expect that you've made the right choice for every future search context in which you find yourself. This is why our bookmarks so often go unused and out of date.

If, however, you're choosing n out of many, you may still make bad choices but you're more likely to make a few right ones. For this reason, 'tags' are a better memory-retrieval device than 'categories.'

But tagging, alone, is still not good enough. Even our many tags become useless if/when their meaning changes (in our minds) by the time we go retrieve the data they point to. This could be years after we tagged something. Somehow, whether manually or automatically, we need agents and tools to help us keep our tags updated and relevant.

Thanks for the commentary.

Benjamin,

You are right. Tagging is not good enough as a next-generation approach to storing and representing knowledge, although it's better than categorization. I think there are several key points in the process, which are:
1, Efficiency of the storing process.
2, Efficiency of the recalling process.
3, Support for evolving the methodology.

I think Rashmi Sinha's contribution to this topic is introducing cognitive psychology into the field. (I found the blog through the keyword 'cognitive cost-benefit analysis' in Google)

I guess Microsoft's WinFS and Namesys's ReiserFS are splendid attempts and worth a hack :)
