2005-10-10

You are not alone

I finally got around to reading The Structure of Collaborative Tagging Systems by Scott Golder and Bernardo A. Huberman (via You're It via Ed Vielmetti), and although it contains some interesting observations, I was left with the feeling that they didn't really grasp the collaborative power of these systems very well.

Here are a few notes on some statements I disagree with:

From 2.1 Semantic and Cognitive Aspects of Classification:

Synonymy, or multiple words having the same or closely related meanings, presents a greater problem for tagging systems because inconsistency among the terms used in tagging can make it very difficult for one to be sure that all the relevant items have been found [1]. It is difficult for a tagger to be consistent in the terms chosen for tags [2]; for example, items about television may be tagged either television or tv. This problem is compounded in a collaborative system, where all taggers either need to widely agree on a convention, or else accept that they must issue multiple or more complex queries to cover many possibilities [3]. Synonymy is a significant problem because it is impossible to know how many items “out there” one would have liked one’s query to have retrieved, but didn’t.


[1] This point is raised over and over again, and in my opinion it's just plain wrong. If you tag some item tv, someone else will tag it television - or vice versa, so the item will be found using either term. One could probably argue that if an item is bookmarked by very few people, one might miss relevant results, but so what? As items become popular (which is an implicit indicator of relevance) they will be found.

[2] Why is it difficult? del.icio.us provides you with a list of tags you previously used and with another list of tags other people used. And even if you do tag inconsistently, it is very easy to homogenize your tags later by renaming and/or merging them.

[3] Widely agreeing on a convention is not only impracticable, it's actually also bad advice, since this would eliminate any special meaning individuals might add to the cloud. Again, if one user doesn't apply a term, another user will, so there usually is no need for multiple or complex queries for finding stuff.
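To make the tv/television argument concrete, here is a minimal sketch. It assumes a toy in-memory index rather than anything del.icio.us actually does, and the users, URLs, and the build_index helper are all made up for illustration. It shows that an item bookmarked by two people under different synonyms turns up under either term, while an item bookmarked by only one person does not.

```python
from collections import defaultdict

# Hypothetical bookmarks as (user, url, tags). Users and URLs are invented.
bookmarks = [
    ("alice", "http://example.com/show-guide", {"tv", "guide"}),
    ("bob", "http://example.com/show-guide", {"television"}),
    ("carol", "http://example.com/obscure-page", {"tv"}),
]


def build_index(bookmarks):
    """Map each tag to the set of URLs that any user tagged with it."""
    index = defaultdict(set)
    for _user, url, tags in bookmarks:
        for tag in tags:
            index[tag].add(url)
    return index


index = build_index(bookmarks)

# The item bookmarked by two people is reachable through both synonyms.
tv_hits = index["tv"]
television_hits = index["television"]
```

The obscure page, bookmarked once, is only reachable via tv. That is the "bookmarked by very few people" case from note [1].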

Relatedly, plurals and parts of speech and spelling can stymie a tagging system. For example, if tags cat and cats are distinct, then a query for one will not retrieve both, unless the system has the capability to perform such replacements built into it.


Same as above (someone will use cats, and someone else will use cat, so there is no need for tag-stemming), but that's even worse advice, since such a system would blur the meanings one might associate with each tag for one's personal use.

For the purposes of tagging systems, however, conflicting basic levels can prove disastrous, as documents tagged perl and javascript may be too specific for some users, while a document tagged programming may be too general for others.


This is not disastrous, quite the contrary: this is one of the qualities of social tagging systems. Basic-level tags will be used by folks who don't draw a distinction, specific-level tags by folks who do, and everybody is happy. Again they seem to forget that you are not alone.


From 3.2 User Activity and Tag Quantity

That is, some users use Delicious very frequently, and others less frequently...

Some users have comparatively large sets of tags, and other users have comparatively small sets...


duh

Because sensemaking is a retrospective process, information must be observed before one can establish its meaning (Weick et al. forthcoming). Therefore, a distinction may go unnoticed for a long time until it is finally created by the individual, who then continues to find that distinction important in making sense of future information. Since finding previously encountered information is extremely important (Dumais et al. 2003), this is deeply problematic for past information. For example, user #575 (Figure 4a) did not use “tag 3” until approximately the 2500th bookmark. If ‘tag 3” indeed constitutes a new distinction among the kinds of items this user bookmarks, though Delicious does allow users to alter previous bookmarks, it would be arduous to reconsider each of the earlier 2500 bookmarks to decide whether to add “tag 3” to them. Further, if in the future this user needs to filter his bookmarks by “tag 3”, then no bookmark before the 2500th will be retrieved, compromising the practical usefulness of the tag.


Here they raise a good point, but they underestimate taggers' ability to reflect on their own past tagging behaviour. Anyone who started using the tag ajax at some point, for instance, will still be able to retrieve relevant items by searching for javascript and/or xmlhttprequest and/or whatever previously seemed most appropriate. Does this compromise the practical usefulness of ajax? I don't think so.
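As a rough sketch of that "and/or" search, assume one user's bookmarks are a simple list of (url, tags) pairs. The URLs and the find_any helper are invented for illustration and are not a del.icio.us feature. A query for any of several tags then recovers the items saved before the new tag was adopted.

```python
# Hypothetical personal bookmarks in the order they were saved.
my_bookmarks = [
    ("http://example.com/async-requests", {"javascript", "xmlhttprequest"}),
    ("http://example.com/dom-tricks", {"javascript"}),
    ("http://example.com/ajax-intro", {"ajax"}),
    ("http://example.com/cooking", {"recipes"}),
]


def find_any(bookmarks, wanted):
    """Return URLs carrying at least one of the wanted tags, in saved order."""
    wanted = set(wanted)
    return [url for url, tags in bookmarks if tags & wanted]


# Filtering by the newer tag alone misses the older bookmarks...
ajax_only = find_any(my_bookmarks, ["ajax"])

# ...but adding the tags used before ajax was adopted brings them back.
ajax_and_older = find_any(my_bookmarks, ["ajax", "javascript", "xmlhttprequest"])
```

The wider query also pulls in javascript items that have nothing to do with ajax, so it trades some precision for recall.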


Overall this is an interesting paper of course, so if you are into tagging and del.icio.us (which you are, since you are still reading this), head over and read it.

