January 6th, 2005


Categorisation - a few thoughts

Various sites, including the bookmarking site del.icio.us, the email service GMail, the photo site Flickr and our very own LiveJournal Photo Hosting, allow users to categorise the items they store. This is generally done with 'tags', which let us label photos as "Family", emails as "To-Do" and bookmarks as "Porn", so that we can find what we need when we need it.
Rather than imposing arbitrary 'top-down' categories on users, these sites let us define our own tags using the labels we would naturally use, which makes the resulting filing system much simpler to use.

However, allowing this creates problems. Tags can conflict both between users (for instance, one user might use "Mac" for items related to Apple Macintoshes while another uses "Macintosh" and a third uses "Apple") and within a single user's own tags (when their nomenclature changes or they mistype). This can make sharing information difficult, and can even make it hard to find everything you have stored yourself.

There are a few obvious solutions to this:
1) Reuse: Help users re-use their old tags by offering them a list of previously used tags; this prevents typos and unintentional changes.
2) Synonyms: Help users lump tags together by stating that, as far as they are concerned, "Mac" and "Macintosh" mean the same thing. When they look for items tagged "Mac", the search is automatically broadened to include its synonyms.
3) Anchors: Build categories from the most commonly used tags. This returns to the top-down imposition of structure, but builds it from the tags that people actually use. If a tag is used by more than x% of users, categorise it and assign it a detailed description. For instance, if more than 1% of people use "Mac" as a tag, it could be given the detailed description "Apple Macintosh Computer". Users could then choose to use this 'official' tag. Synonyms would still apply, so that "Macintosh" and "Apple" would both link to this single 'anchor'.
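The three ideas above can be sketched as a small in-memory tag store. This is a minimal illustration, not how any of the sites mentioned actually work: the class `TagStore` and its methods `suggest`, `link`, `search` and `popular_tags` are hypothetical names, prefix matching for suggestions is assumed to be case-insensitive, synonyms are kept per user (since they hold "as far as they are concerned"), and "the population" is assumed to mean every user who has tagged at least one item, with the x% threshold given as a fraction (0.01 for 1%).

```python
from collections import defaultdict


class TagStore:
    """Hypothetical in-memory tag store illustrating reuse, synonyms and anchors."""

    def __init__(self):
        # (user, tag) -> set of item ids the user filed under that tag
        self.items = defaultdict(set)
        # user -> {tag: parent tag}: a per-user union-find forest of synonyms
        self.parents = defaultdict(dict)

    def tag(self, user, item, tag):
        self.items[(user, tag)].add(item)

    def tags_of(self, user):
        return {t for (u, t) in self.items if u == user}

    # 1) Reuse: offer the user's own previous tags that match what they have typed.
    def suggest(self, user, prefix):
        p = prefix.lower()
        return sorted(t for t in self.tags_of(user) if t.lower().startswith(p))

    # 2) Synonyms: follow parent links to the representative tag of a group.
    def _root(self, user, tag):
        parents = self.parents[user]
        while parents.get(tag, tag) != tag:
            tag = parents[tag]
        return tag

    def link(self, user, a, b):
        """Record that, for this user only, tags a and b mean the same thing."""
        ra, rb = self._root(user, a), self._root(user, b)
        if ra != rb:
            self.parents[user][rb] = ra

    def search(self, user, tag):
        """Return the user's items filed under `tag` or any of its synonyms."""
        root = self._root(user, tag)
        found = set()
        for t in self.tags_of(user):
            if self._root(user, t) == root:
                found |= self.items[(user, t)]
        return found

    # 3) Anchors: tags used by more than `threshold` (a fraction) of all users.
    def popular_tags(self, threshold):
        users_by_tag = defaultdict(set)
        for (u, t) in self.items:
            users_by_tag[t].add(u)
        total = len({u for (u, _) in self.items})
        if total == 0:
            return []
        return sorted(t for t, us in users_by_tag.items()
                      if len(us) / total > threshold)
```

A union-find structure is used for synonyms so that linking "Mac" to "Macintosh" and then "Macintosh" to "Apple" puts all three tags into one group without the user having to link every pair. The tags returned by `popular_tags` are the candidates that would then be given a detailed description by hand.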

Attaching more detailed descriptions would allow the same tag to carry multiple meanings, so that someone using "Apple" as a tag could be offered the choice of attaching it to the definition "Apple Macintosh Computer", "Apple Fruit" or "Apple Music Corporation". The user could, of course, also attach it to any other definition, or leave it without one.
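One way to model a single tag with several possible meanings is to keep definitions separate from tag text, as in the minimal sketch below. `Definition`, `DefinitionRegistry` and their methods are hypothetical names, and the definition keys are made up for illustration. Tag lookups are assumed to ignore case, and `attach` accepts any definition (not only the offered candidates) or none at all, matching the freedom described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Definition:
    """Hypothetical 'anchor': one agreed meaning that several tags may point to."""
    key: str
    description: str


class DefinitionRegistry:
    def __init__(self):
        self.by_tag = {}   # lower-cased tag -> list of candidate definitions
        self.tagged = []   # (user, item, tag text, definition or None)

    def add_definition(self, definition, *tags):
        """Register a definition and the tags (synonyms) that can point to it."""
        for t in tags:
            self.by_tag.setdefault(t.lower(), []).append(definition)

    def candidates(self, tag):
        """Definitions to offer a user who has just typed `tag`."""
        return list(self.by_tag.get(tag.lower(), []))

    def attach(self, user, item, tag, definition: Optional[Definition] = None):
        """Store the user's own tag text, optionally bound to a definition."""
        self.tagged.append((user, item, tag, definition))

    def items_for(self, definition):
        """Everyone's items bound to `definition`, whatever tag text they used."""
        return {item for (_, item, _, d) in self.tagged if d == definition}
```

Because items are grouped by definition rather than by tag text, one person's "Apple" photo and another's "Macintosh" bookmark can be found together when both are bound to "Apple Macintosh Computer", while an "Apple" tag bound to "Apple Fruit" stays separate.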

I am, of course, assuming that most people would find common definitions useful, since they would make it possible to find other things that use the same tags, whilst still leaving everyone free to use any tag they like for their own purposes.