The Present Failure of Tagging
This is in response to Rashmi Sinha's write-up: A Cognitive Analysis of Tagging. She makes a distinction between categorization and tagging to describe why tagging is a better approach. Categorization, she says, is an extra step in the mental process of relating a concept to other concept(s) we already know about.
To summarize:
Categorization
1) Discover concept A -> 2) Brainstorm what A relates to -> 3) Pick one (Concept X) and save the relationship between A and X.
Tagging
1) Discover concept A -> 2) Brainstorm what A relates to, saving all relationships between A and X1...Xn.
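The two processes can be sketched as data structures. This is only a minimal illustration in Python, not how del.icio.us or any other service stores its data; the names `categories`, `tags`, `categorize`, and `tag` are hypothetical.

```python
from collections import defaultdict

# Categorization: each item is filed under exactly one concept.
categories = {}

# Tagging: each item keeps every relationship that came to mind.
tags = defaultdict(set)


def categorize(item, concept):
    """Pick one concept; a later choice overwrites the earlier one."""
    categories[item] = concept


def tag(item, *concepts):
    """Save all relationships at once; no single choice is required."""
    tags[item].update(concepts)


# Categorizing A forces a choice between X1 and X2.
categorize("A", "X1")
categorize("A", "X2")  # the relationship to X1 is lost

# Tagging A keeps every relationship from the brainstorm.
tag("A", "X1", "X2", "X3")
```

The difference is the value type: a single string versus a set. Everything else in the argument follows from that one constraint being relaxed.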
What follows is a short review of the problem with tagging as it now stands, an example of the problem and brief ideas for fixing it.
The Problem
For historical reasons, categorizing has
always been an artificially constrained process. Tagging is just
another electronic way to relax these constraints. Being forced to
associate a concept with only one other, single concept is a holdover
from the physical world of data persistence: I have a written/typed
piece of paper that I must store in, at most, one file folder for
future reference.
To be a useful shift in the way we store and retrieve relevant
information, tagging must be able to match the dynamic and relatable
nature of our brain. As it now stands, tagging doesn't evolve with our
changing ideas of how our saved mental landmarks (tags) relate. This makes
retrieval of relevant information based on these landmarks barely more
useful than a single category-style bookmark (even a public one).
The ability to relate electronic data seems to solve the problem, right? Databases are one iteration of this. 'Mind maps' or 'concept maps' are another method. IHMC has an interesting tool for working with concept maps.
Only recently has the ability to quickly link a concept to more than one other concept become practical for the masses, in web applications such as del.icio.us and Flickr. Now a web page can be related to as many other concepts as can be held in a text field in del.icio.us. Getting better, but still far from being truly useful.
The general problem can be seen as the task of 1) externalizing knowledge-retrieval 'landmarks' when encountering information you want to store (in some context) and then, 2) being able to quickly find these landmarks when trying to recall the information later on, potentially in a completely different context from the one in which you created the landmark in the first place. The Semantic Web and its associated technologies should go a long way to addressing this.
In Semantic Web terms, you'd need to write your own OWL document and then build a tool that uses this document to help you create and search tags. Then, you'd need to use an OWL document editor to constantly update your OWL ontology to reflect new information you discover and tag (keep it in sync with all other forms of your tag libraries). This is "ontology engineering" and no average Joe Tagger will ever do it.
I dare say, even if Joe Tagger were a computer engineer, short of a career change, he couldn't do it. What follows is a concrete example of Joe's problem.
Example: The Battery Breakthrough
Let's say Joe reads a new article about a battery technology breakthrough in Scientific American.
Joe has been thinking about buying a fuel-efficient car lately. When
Joe goes to tag the article's web page, he uses the following tags:
"battery," "fuel-savings," "car," "future-vehicle." Let's say the
article comes with a .gif of a high-level schematic for how the battery
works. Joe saves the .gif in his Flickr account, tagging it with
"battery," "schematic," and "fuel-savings."
Eighteen months and many tags later, Joe, an engineer at Intel, has an electric moment and realizes the battery tech breakthrough is more relevant to something he's working on directly, in nano-tech. Given the keywords he chose, will he be able to 1) recall how he tagged the original article so he can find it again, and 2) if he finds it at all, easily re-tag the article and the schematic .gif to match the new context in which he finds these ideas relevant? I wouldn't bet on either outcome.
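Joe's retrieval problem can be made concrete with a few lines of Python. This is a sketch under simple assumptions: his whole tag library fits in one dictionary, search is an exact tag match, and the item names and the `find` helper are made up for illustration.

```python
# Joe's tags exactly as he chose them; the item names are illustrative.
library = {
    "battery-article": {"battery", "fuel-savings", "car", "future-vehicle"},
    "battery-schematic.gif": {"battery", "schematic", "fuel-savings"},
}


def find(library, query_tag):
    """Return the items carrying exactly this tag, sorted for stable output."""
    return sorted(
        item for item, item_tags in library.items() if query_tag in item_tags
    )


# Eighteen months later, Joe searches from his new context.
print(find(library, "nano-tech"))  # prints []

# Only the tags he chose in the old context still work.
print(find(library, "battery"))  # prints ['battery-article', 'battery-schematic.gif']
```

Nothing is broken here in the technical sense. The lookup does what it was built to do; it just has no way of knowing that Joe's idea of what "battery" relates to has moved on.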
The Solution
We need refactoring for tagging.
Refactoring is the programming activity of reevaluating and then changing the names of, and relationships between, program subcomponents in order to more clearly express their intent and actual behavior. This makes it easier to fix, reuse, or extend these components later on. The advent of refactoring tools, first for Smalltalk and then for Java, has been a boon for programmers in the last two decades and has contributed significantly to the value of working programs. Web application users need something like this for their tag libraries in del.icio.us, etc.
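To give a feel for what the smallest tag refactorings might look like, here is a rough sketch in Python. It assumes Joe's tags from both services have already been pulled into one dictionary; `rename_tag` and `add_tag_where` are hypothetical helpers, not features of del.icio.us or Flickr.

```python
# Joe's tags from the battery example; the item names are illustrative.
library = {
    "battery-article": {"battery", "fuel-savings", "car", "future-vehicle"},
    "battery-schematic.gif": {"battery", "schematic", "fuel-savings"},
}


def rename_tag(library, old, new):
    """Replace `old` with `new` on every item; return how many items changed."""
    changed = 0
    for item_tags in library.values():
        if old in item_tags:
            item_tags.discard(old)
            item_tags.add(new)
            changed += 1
    return changed


def add_tag_where(library, existing, extra):
    """Add `extra` to every item already tagged `existing`; return the count."""
    changed = 0
    for item_tags in library.values():
        if existing in item_tags and extra not in item_tags:
            item_tags.add(extra)
            changed += 1
    return changed


# "Rename": Joe now thinks of this work as energy storage, not fuel savings.
print(rename_tag(library, "fuel-savings", "energy-storage"))  # prints 2

# "Extend": everything about batteries is now relevant to nano-tech too.
print(add_tag_where(library, "battery", "nano-tech"))  # prints 2
```

These are the tagging equivalents of a rename refactoring applied across a whole code base: one decision, carried out consistently everywhere the old name appears.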
How does Joe Tagger find and change his del.icio.us and Flickr tags to reflect his changing view of the world? Heck, today, he can't even relate simple tags he already has. I predict doing this much isn't far off. What still is far off is the ability to quickly and efficiently, if not automatically, evolve our brave new-age 'categories' to consistently reflect our constantly changing mental model of the world in which we think we live.
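Relating tags could be as simple as recording that one tag leads to another and widening a search accordingly. The following is a rough sketch, assuming a hand-maintained `related` mapping; the mapping and the `expand` and `find_related` helpers are hypothetical, and no existing service is claimed to work this way.

```python
# Joe's tags from the battery example; the item names are illustrative.
library = {
    "battery-article": {"battery", "fuel-savings", "car", "future-vehicle"},
    "battery-schematic.gif": {"battery", "schematic", "fuel-savings"},
}

# Joe records one new relationship instead of re-tagging every item.
related = {"nano-tech": {"battery"}}


def expand(tag, related):
    """Return the tag plus every tag reachable through `related` links."""
    seen = {tag}
    stack = [tag]
    while stack:
        current = stack.pop()
        for neighbour in related.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def find_related(library, tag, related):
    """Return items carrying the tag or any tag related to it, sorted."""
    wanted = expand(tag, related)
    return sorted(
        item for item, item_tags in library.items() if item_tags & wanted
    )


print(find_related(library, "nano-tech", related))
# prints ['battery-article', 'battery-schematic.gif']
```

The items themselves are untouched; only the relationships between tags changed. Keeping that mapping current by hand is, of course, a small version of the ontology-engineering chore described earlier, which is why the automatic part remains the hard problem.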

Hey Benjamin, thanks a lot for this critical analysis of tagging. I think your insights are right on. With the current zeitgeist of "tagging" - being everybody's darling - it's useful to see the cracks around the edges so we can see in what direction we can improve things. I'm currently designing something that involves large data-sets, and tagging is going to play a part - so I appreciate this notion of "refactoring" being important for "evolving" tags in sync with the evolution of one's mental model. Very interesting. Cheers [now...what else is wrong with tagging?? :)]
Posted by: Phil Cockfield | September 28, 2005 at 06:24 PM
I think that categorization is more like a decision than tagging is. To make a decision, more 'historic' information must be retrieved from long-term memory, so it's time-consuming; in other words, a higher cognitive cost is paid. Tagging, on the other hand, allows the decision to be delayed, which means it can be made just in time. It's short-term-memory friendly.
What do you think?
Posted by: Linan Wang | November 02, 2005 at 03:11 PM
Linan,
I agree. A decision implies a choice. And it's a tough one if you assume you must choose only one from many alternatives. Working with our current definition of categorization, this one choice must be right and remain right until some indefinite time in the future.
The probability that you've made the right choice for every future search context in which you find yourself is pretty low. This is why our bookmarks so often go unused and out of date.
If, however, you're choosing n out of many, you may still make bad choices but you're more likely to make a few right ones. For this reason, 'tags' are a better memory-retrieval device than 'categories.'
But tagging, alone, is still not good enough. Even our many tags become useless if/when their meaning changes (in our minds) by the time we go retrieve the data they point to. This could be years after we tagged something. Somehow, whether manually or automatically, we need agents and tools to help us keep our tags updated and relevant.
Thanks for the commentary.
Posted by: Ben | November 02, 2005 at 03:37 PM
Benjamin,
You are right. Tagging is not good enough as a next-generation approach to storing and representing knowledge, although it's better than categorization. I think there are several key points in the process:
1. Efficiency of the storing process.
2. Efficiency of the recalling process.
3. Support for evolving the methodology.
I think Rashmi Sinha's contribution to this topic is introducing cognitive psychology into this field. (I found the blog by searching Google for the keywords 'cognitive cost-benefit analysis'.)
I guess Microsoft's WinFS and Namesys's ReiserFS are splendid attempts and worth a hack :)
Posted by: Linan Wang | November 03, 2005 at 07:44 AM