Economics of Taxonomies

In his latest post on folksonomies, Clay argues that we have no choice about moving to folksonomies, because of the economics. I’d like to tackle those economics a bit.

(Some background: There was recently a fascinating exchange between Clay Shirky and Louis Rosenfeld on the subject of taxonomies versus “folksonomies,” lightweight, uncontrolled terms that users attach to things as classification. Now, as the name of my blog implies, I’m all in favor of such emergent and chaotic phenomena as folksonomies. At the same time, some of the work I’m doing may involve the creation of a taxonomy. Worse, it’s a taxonomy where the items being classified are subject to a great many potential classifications, and really, a folksonomy may well be a better choice. So how to decide where to go?)

I don’t think that there is a single economics of taxonomies. We could compare effort of creation to effort of use. Flickr users create a folksonomy because it’s trivial to create, and the work needed to use it for tagging is also low. In contrast, the Linnaean taxonomy of life is the subject of a huge amount of work.
Once you’ve learned to use both Flickr and the plethora of modern library systems to search, the effort to search the Flickr site is higher than the effort to search in a library. So Flickr (and perhaps all folksonomies) offload costs from classifiers to searchers.

There’s also an economic question of the cost of failure. Flickr is not there to help you find precisely the photo you’re looking for, nor the paper or book you mean to find. It’s there to make surfing easier. If you want to see specific people’s photos, you can subscribe to their site. So the folksonomy works where there’s a very low cost of not seeing a result. Does it work as well where the costs are higher? If you’re searching for a specific book in a library, and can’t guess the tags attached to it, you can fall back to other, organized search criteria. I’m finding it hard to quantify the search failure costs here, because once you move from photos to, say, reference specimens of butterflies, the specimen and its name act as an index into all sorts of scientific work.

Another tension is speed of change. Fast-changing taxa are hard to search, but easy to create. Is it worthwhile to spend the effort to enable effective searching? To whom is it worthwhile?

To relate this back to the work I’m doing, I think that the cost of failed searches may be very high. High enough to dominate? Unclear.

“Metadata for the masses”

In “Metadata for the masses,” Peter Merholz presents an interesting idea, which is to build a classification scheme from free-form data that users apply. He points to Flickr’s “Cameraphone” category, which would probably not exist if there were only a pull-down list.

He also points up problems: Many categories for one thing (nyc, NewYork, NewYorkCity), one category that means many things (“Flow, for instance, can either mean optimal creative experience, or the movement of a fluid,”), and categorizations that are wrong.
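The first of those problems, many spellings for one thing, is the easiest to picture in code. Here is a minimal sketch in Python, assuming a hand-written alias table; the names ALIASES, normalize, and group_tags are made up for illustration and are not anything Flickr or Merholz describes.

```python
from collections import defaultdict

# Hypothetical alias table: maps a normalized spelling to one canonical tag.
# It is hand-written here; a real system would have to curate or learn it.
ALIASES = {
    "nyc": "newyorkcity",
    "newyork": "newyorkcity",
    "newyorkcity": "newyorkcity",
}


def normalize(tag):
    """Lowercase the tag, drop non-alphanumerics, then apply the alias table."""
    key = "".join(ch for ch in tag.lower() if ch.isalnum())
    return ALIASES.get(key, key)


def group_tags(tags):
    """Group the raw tags under their canonical form."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalize(tag)].add(tag)
    return dict(groups)


groups = group_tags(["nyc", "NewYork", "NewYorkCity", "flow", "Flow"])
```

Note what this does not do: both senses of “flow” still land in the same bucket, because nothing in the tag itself says which meaning was intended. The second problem needs context, not string cleanup.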

I think there’s a tie here to memes, or ideas which encourage you to adopt them. If I see a tag which strikes me, is evocative to me, or I see as useful, I’m likely to use it myself. If I create a tag which I find evocative, but no one else does (say, “Bastiat-ic”), it’s unlikely to get picked up. I am a big fan of evolutionary, or memetic, systems like this, and am sorely tempted to try to include one in my project, but the goal of that project isn’t actually to create a taxonomy, it’s to create a useful naming scheme. I think a taxonomy is part of that, but others who get a say in the final analysis disagree, and so I’d like to focus on getting a taxonomic name space, rather than a cool evolutionary method for creating it.

(Via Nudecybot. Oh, and it’s too bad that there’s no RSS on Merholz’s page. I’d like to see their essays, but not their “appearance dates and other news.”)

The Tree of Life, COI-ly

The September 30th issue of the Economist points to an article in PLoS Biology by Hebert, et al, discussing a new technique for identifying species. The technique relies on the mitochondrial gene for cytochrome c oxidase I (COI), using a 648 base-pair sequence. [1]

This technique helps settle the question of “Is Astraptes fulgerator one species or several?”[2]. The butterflies in question look the same as adults, but there are important variations in the caterpillar forms.
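The general idea of telling specimens apart by a short gene sequence can be pictured as a pairwise distance. What follows is only a rough sketch in Python, not the method from the paper: it computes the fraction of positions at which two already-aligned sequences differ. The function name p_distance and the toy sequences are invented for illustration, and the sequences are far shorter than 648 base pairs.

```python
def p_distance(seq_a, seq_b):
    """Fraction of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Invented toy sequences; they differ at exactly one of ten positions.
specimen_1 = "ACGTACGTAC"
specimen_2 = "ACGTTCGTAC"
distance = p_distance(specimen_1, specimen_2)
```

Specimens that look identical as adults but sit far apart by a measure like this are candidates for being separate species, which is the sort of question the caterpillars raised.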

Which, as I struggle to create a taxonomy for a specific set of computer security issues, shows that I am doomed to fail, and that may just be ok.

[1] Who the heck told them they could throw a ‘c’ out in the midst of a protein name like that? Do these people have no respect for the English language?
[2] It was keeping me awake at night, too. (As many as 10 species in Costa Rica alone.)

Taxonomic Software

A small window into a large world, with its own software:
biological software, including DELTA, a DEscription Language for TAxonomy, database software, ecology software, morphometric, paleontologic, and phylogenetics software. (Hey, I need a taxonomy just to keep the breakdowns straight!)

Or DMOZ has a page, but it doesn’t seem as comprehensive.


What I want to do is to throw keywords at a database and have them organized for me. I suspected that this may be sufficiently specialized as to not have software available for it, but I’m no longer so sure.
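To make “organized for me” a bit more concrete, here is one very simple reading of it, sketched in Python: count how often keywords appear together on the same item, and use those counts to suggest which keywords belong near each other. This is an assumption about what such software might do, not a description of any of the packages above; the function names and the sample keywords are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(items):
    """Count how often each pair of keywords is attached to the same item."""
    counts = Counter()
    for keywords in items:
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


def related(keyword, counts):
    """Keywords seen alongside `keyword`, most frequent first."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == keyword:
            scores[b] += n
        elif b == keyword:
            scores[a] += n
    return [k for k, _ in scores.most_common()]


# Invented sample data: each inner list is the keywords thrown at one item.
items = [
    ["overflow", "memory", "remote"],
    ["overflow", "memory", "local"],
    ["injection", "remote"],
]
counts = cooccurrence(items)
neighbors = related("overflow", counts)
```

Pair counts like these are only raw material. Turning them into an actual hierarchy would take a clustering step on top, and somebody still has to name the clusters.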

Taxonomies

Biological taxonomy is not fixed, and opinions about the correct status of taxa at all levels, and their correct placement, are constantly revised as a result of new research, and many aspects of classification will always remain a matter of judgement. The ITIS database is updated to take account of new research as it becomes available, and the information it yields is likely to represent a fair consensus of modern taxonomic opinion. Inevitably, however its information cannot be final, and is likely to be more reliable for some groups than others.

So says Wikipedia, in discussing ITIS, the Integrated Taxonomic Information System. Who knew that the USDA was in charge of calling us Homo sapiens?

Mathematical Classifications

Mathematicians use a scheme called the Mathematics Subject Classification (MSC), which includes a “how to use” as well as a long history of being revised to reflect changes in the field, and, I would guess, practice in how to effectively classify things.

It has a General and Miscellaneous Topics section, too.

Articles must be given a primary classification, and may be given arbitrary additional classifications. The first article in the first volume I was published in was 54C40, 14E20 secondary 46E25, 20C20.

That’s (54C40 Algebraic properties of function spaces), (14E20 Birational Geometry: Coverings), (46E25 Rings and algebras of continuous, differentiable or analytic functions {For Banach function algebras, see 46J10, 46J15})*, (20C20 Modular representations and characters).
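The codes themselves have a regular shape: two digits, a letter, two digits. Here is a small sketch in Python of pulling one apart and recording a paper’s primary and secondary codes. The field names (area, section, topic) and the Classification class are my own labels for illustration, not official MSC terminology, and the pattern only covers the five-character form quoted above.

```python
import re
from dataclasses import dataclass, field

# Two digits, one capital letter, two digits, as in 54C40.
MSC_CODE = re.compile(r"(\d{2})([A-Z])(\d{2})")


@dataclass(frozen=True)
class MscCode:
    area: str     # hypothetical label for the leading two digits
    section: str  # hypothetical label for the letter
    topic: str    # hypothetical label for the trailing two digits


def parse_msc(code):
    """Split a code such as '54C40' into its three parts."""
    match = MSC_CODE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a five-character MSC code: {code!r}")
    return MscCode(*match.groups())


@dataclass
class Classification:
    primary: list = field(default_factory=list)
    secondary: list = field(default_factory=list)

    def areas(self):
        """Every top-level area the paper touches, primary or secondary."""
        return {c.area for c in self.primary + self.secondary}


paper = Classification(
    primary=[parse_msc("54C40"), parse_msc("14E20")],
    secondary=[parse_msc("46E25"), parse_msc("20C20")],
)
```

Calling paper.areas() gives the four two-digit areas the paper sits in, which is already more than a single-slot taxonomy could say about it.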

Google doesn’t seem to be specialized in searching these things. Those 4 numbers as a search don’t return the specific paper, but then, the specific paper isn’t online. There are search engines that are able to search by MSC. (It’s under “Class” in that link, or you can try to navigate in Norwegian. I did, before finding the English link.)

UPDATE: The * after the {see 46J10, 46J15} was going to be a footnote, explaining that {braces} represent prioritization–you must check to see if 46J10 or 46J15 are better fits.