Naive catalogers
No Authority

If you've got a large, ill-defined corpus, if you've got naive users, if your cataloguers aren't expert, if there's no one to say authoritatively what's going on, then ontology is going to be a bad strategy.

The list of factors making ontology a bad fit is, also, an almost perfect description of the Web -- largest corpus, most naive users, no global authority, and so on. The more you push in the direction of scale, spread, fluidity, flexibility, the harder it becomes to handle the expense of starting a cataloguing system and the hassle of maintaining it, to say nothing of the amount of force you have to get to exert over users to get them to drop their own world view in favor of yours.

The reason we know SUVs are a light truck instead of a car is that the Government says they're a light truck.

This is voodoo categorization, where acting on the model changes the world -- when the Government says an SUV is a truck, it is a truck, by definition. Much of the appeal of categorization comes from this sort of voodoo, where the people doing the categorizing believe, even if only unconsciously, that naming the world changes it.

Unfortunately, most of the world is not actually amenable to voodoo categorization. The reason we don't know whether or not Buffy, The Vampire Slayer is science fiction, for example, is because there's no one who can say definitively yes or no. In environments where there's no authority and no force that can be applied to the user, it's very difficult to support the voodoo style of organization.

Merely naming the world creates no actual change, either in the world, or in the minds of potential users who don't understand the system.

Mind Reading

One of the biggest problems with categorizing things in advance is that it forces the categorizers to take on two jobs that have historically been quite hard: It forces categorizers to guess what their users are thinking, and to make predictions about the future.

The mind-reading aspect shows up in conversations about controlled vocabularies. Whenever users are allowed to label or tag things, someone always says "Hey, I know! Let's make a thesaurus, so that if you tag something 'Mac' and I tag it 'Apple' and somebody else tags it 'OSX', we all end up looking at the same thing!"

The assumption is that we both can and should read people's minds, that we can understand what they meant when they used a particular label, and, understanding that, we can start to restrict those labels, or at least map them easily onto one another.

I learned this from Brad Fitzpatrick's design for LiveJournal, which allows users to list their own interests. LiveJournal makes absolutely no attempt to enforce solidarity or a thesaurus or a minimal set of terms, no check-box, no drop-box, just free-text typing.

Some people say they're interested in movies.

Some people say they're interested in film. Some people say they're interested in cinema. The cataloguer's first reaction to that is, "Oh my god, that means you won't be introducing the movies people to the cinema people!"

The movie people don't want to hang out with the cinema people. If you think the movies and cinema people were going to have a fight, wait til you get the queer politics and homosexual agenda people in the same room. You can't do it. You can't collapse these categorizations without some signal loss.
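The signal loss is easy to see in miniature. What follows is only a sketch under invented assumptions: the user names, the interest labels, and the thesaurus mapping are all hypothetical, and nothing here reflects how LiveJournal actually stores interests. It groups people by the label they typed, then groups them again after a thesaurus has collapsed the variants.

```python
from collections import defaultdict

# Hypothetical free-text interests, one self-chosen label per user.
interests = {
    "ann": "movies",
    "bob": "film",
    "cat": "cinema",
    "dan": "movies",
}

# Hypothetical thesaurus mapping every variant onto one preferred term.
thesaurus = {"movies": "film", "film": "film", "cinema": "film"}


def group_by_label(labels):
    """Group user names by the label attached to them."""
    groups = defaultdict(set)
    for user, label in labels.items():
        groups[label].add(user)
    return dict(groups)


def collapse(labels, mapping):
    """Rewrite each user's label through the thesaurus mapping."""
    return {user: mapping.get(label, label) for user, label in labels.items()}


raw_groups = group_by_label(interests)
collapsed_groups = group_by_label(collapse(interests, thesaurus))

print(sorted(raw_groups))        # the labels people chose for themselves
print(sorted(collapsed_groups))  # the single label the thesaurus leaves
```

Before the collapse there are three self-described groups; afterward there is one, and nothing in the collapsed data lets you recover who called themselves what. That unrecoverable distinction is the loss.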

The problem is, because the cataloguers assume their classification should have force on the world, they underestimate the difficulty of understanding what users are thinking, and they overestimate the degree to which users will agree, either with one another or with the cataloguers, about the best way to categorize.

They also underestimate the loss from erasing difference of expression, and they overestimate loss from the lack of a thesaurus.

Fortune Telling

The other big problem is that predicting the future turns out to be hard, and yet any classification system meant to be stable over time puts the categorizer in the position of fortune teller.

Alert readers will be able to spot the difference between Sentence A and Sentence B. Sentence A is a statement.

Sentence B is a prediction. But this is the ontological dilemma. Consider the following statements: Cities are real, physical facts. Countries are social fictions. It is much easier for a country to disappear than for a city to disappear, so when you're saying that the small thing is contained by the large thing, you're actually mixing radically different kinds of entities.

We pretend that 'country' refers to a physical area the same way 'city' does, but it's not true, as we know from places like the former Yugoslavia.

There is a top-level category, you may have seen it earlier in the Library of Congress scheme, called Former Soviet Union. The best they were able to do was just tack "former" onto that entire zone that they'd previously categorized as the Soviet Union.
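A toy model makes the asymmetry concrete. This is a sketch only: the titles, the two-level (country, city) filing scheme, and the `patch_country` helper are hypothetical and are not how the Library of Congress scheme is actually represented. Each item is filed under a country and then a city, and when the country goes away the only cheap repair is to relabel the whole upper layer.

```python
# Hypothetical catalog: each title is filed as (country, city).
catalog = {
    "Guide to Moscow": ("Soviet Union", "Moscow"),
    "Guide to Kyiv": ("Soviet Union", "Kyiv"),
    "Guide to Belgrade": ("Yugoslavia", "Belgrade"),
}


def patch_country(entries, old_country, new_country):
    """Return a new catalog with every entry under old_country relabeled."""
    return {
        title: (new_country if country == old_country else country, city)
        for title, (country, city) in entries.items()
    }


patched = patch_country(catalog, "Soviet Union", "Former Soviet Union")

# Count entries whose country layer had to be rewritten.
rewritten = sum(
    1 for title in catalog if catalog[title][0] != patched[title][0]
)
# The city layer is the same before and after the patch.
cities_before = sorted(city for _, city in catalog.values())
cities_after = sorted(city for _, city in patched.values())

print(rewritten)
print(cities_before == cities_after)
```

The city layer survives untouched while every entry above it has to be rewritten, which is what tacking "former" onto the category amounts to: the containment was a prediction, and the patch records that the prediction failed.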

It was supposed to be the next big thing.

