Translations, Glosses, Tags and Folksonomies


There's been some recent discussion on Slashdot and in the blogosphere on the incremental, bottom-up taxonomies ("folksonomies") created via tags in things like del.icio.us and Flickr.

Besides the fact that I've long been interested in taxonomies, I've been thinking about some of these issues recently because (a) I'll soon be implementing categories in Leonardo and (b) I've just started reading John Lee's A History of New Testament Lexicography (which, for all you New Testament Greek scholars out there, is a must-read).

What does New Testament Lexicography have to do with del.icio.us tags? Read on.

When I'm explaining to people some of the challenges with translation and reading translated works (whether the New Testament or any other work), I like to use a Venn diagram of two overlapping circles, with the three regions numbered 1, 2 and 3 from left to right.

Consider A to be the word in the original language and the circle on the left to represent the range of possible meanings of that word. A translator chooses to translate A as the word B, with the circle on the right representing the range of possible meanings of that word.

Very few words match up exactly between two languages. There will be senses of A that B doesn't have (marked '1' above) and senses of B that A doesn't have (marked '3').

The first thing that can go wrong is the translator assuming the wrong sense of A. If the original author meant '1' then B will be a bad translation.

But even if the translator gets the sense right, there is still the possibility that the reader of the translation will assume the wrong sense of B (marked '3').
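The two failure modes above can be sketched with plain sets. This is only an illustration: the sense labels are placeholders I've made up rather than real lexical data, and the names `senses_a`, `senses_b` and `classify` are my own.

```python
# Hypothetical sense inventories for a source word A and its translation B.
# The labels are placeholders, not real lexical data.
senses_a = {"sense only A has", "shared sense"}
senses_b = {"shared sense", "sense only B has"}

region_1 = senses_a - senses_b  # senses of A that B lacks
region_2 = senses_a & senses_b  # senses the two words share
region_3 = senses_b - senses_a  # senses of B that A lacks


def classify(intended, understood):
    """Say what went wrong, if anything, when A is translated as B.

    intended   -- the sense of A the original author meant
    understood -- the sense of B the reader of the translation assumed
    """
    if intended not in senses_a:
        raise ValueError("the author can only intend a sense of A")
    if understood not in senses_b:
        raise ValueError("the reader can only assume a sense of B")
    if intended in region_1:
        # B cannot carry the author's sense at all.
        return "translator error"
    if understood != intended:
        # B was a fair choice, but the reader took another of its senses.
        return "reader error"
    return "ok"


print(classify("sense only A has", "shared sense"))
print(classify("shared sense", "sense only B has"))
print(classify("shared sense", "shared sense"))
```

Only the last call succeeds, and it needs both the translator and the reader to land in the overlap.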

This challenge arises not only in translating texts but also in dictionaries, and this is where Lee's book is so fascinating. Looking up an individual word in a bilingual dictionary is subject to the same challenge, particularly if the dictionary just provides a gloss rather than a full definition. In just providing a gloss (an equivalent word in the target language), there is a risk that a user of the dictionary will take the wrong sense of the gloss.

Full definitions are generally much better, although, as Lee points out, there are cases where a gloss does just fine and is even preferable. χιών is adequately defined by the gloss snow and there is no need to define χιών as "the aqueous vapour of the atmosphere precipitated in partially frozen crystalline form and falling to the earth in white flakes" (which is how one dictionary, cited by Lee, defines "snow").

In the realm of New Testament lexicography, lexicons such as Louw and Nida's Greek-English Lexicon of the New Testament Based on Semantic Domains do an excellent job of teasing out the different senses of Greek words and making clearer which senses of corresponding English words they map to.

What does all this mean for tags? There is a tremendous practicality in tag-based folksonomies but they do suffer from many of the same problems as glossing. Perhaps the biggest issue is disambiguation. A given tag can have multiple senses.

Say I used the tag "leonardo" for my software. I'd then need to come up with a different tag if I wanted to talk about Leonardo da Vinci. If I'd talked about the latter first and chosen "leonardo" for him, I would have then needed to come up with a different tag for my software.

That doesn't sound like that big a deal, but in a common tag set, it's much more difficult to coordinate that kind of disambiguation. Someone might have already started using "leonardo" for one sense, and someone else might come along and use it for another sense without realising.

In a way, the problem is that the tags are their own gloss. There's no definition of what their sense or scope is. How might one provide a disambiguated version of a tag, without adding complexity that would drive people away from using them at all? Using URIs instead of tags is, of course, the "right" thing to do (inasmuch as it would provide a unique identifier for each sense) but it just won't fly with the majority of Flickr or even del.icio.us users.

That's why I previously suggested Wikipedia as the basis for disambiguation. Wikipedia provides an excellent platform for disambiguation, not at the level a lexicographer or translator might expect, but good enough that the benefit would justify the cost of disambiguating folksonomy tags.
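As a rough sketch of how that might look, a tagging system could keep the bare tag for everyday use and only ask for more when the tag is known to be ambiguous. Everything here is an assumption for illustration: the `TagRegistry` class, its methods and the page titles are hypothetical, and nothing below talks to Wikipedia or to any real tagging service.

```python
from collections import defaultdict


class TagRegistry:
    """Map bare tags to encyclopedia-style page titles, one per sense."""

    def __init__(self):
        # bare tag -> list of page titles, in the order they were registered
        self._senses = defaultdict(list)

    def register(self, tag, page_title):
        """Record that `tag` can refer to the subject of `page_title`."""
        if page_title not in self._senses[tag]:
            self._senses[tag].append(page_title)

    def resolve(self, tag, hint=None):
        """Return the candidate page titles for `tag`.

        With no hint, every known sense is returned. With a hint, only
        titles containing the hint (case-insensitively) are kept.
        """
        candidates = list(self._senses.get(tag, []))
        if hint is not None:
            candidates = [t for t in candidates if hint.lower() in t.lower()]
        return candidates


registry = TagRegistry()
# Illustrative titles only; they are not looked up anywhere.
registry.register("leonardo", "Leonardo da Vinci")
registry.register("leonardo", "Leonardo (software)")

print(registry.resolve("leonardo"))
print(registry.resolve("leonardo", hint="software"))
```

A tag with a single registered sense needs no extra input from the user, so the added complexity only shows up where there is a genuine clash.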

Also see Tag the Tags, which suggests an easy way to add expressiveness to the tagging approach to classification without adding too much complexity.