I say movie, you say film, our personal clouds can still work together

When you combine hundreds of calendar feeds in cities like Boston, Houston, and Seattle, the unfiltered list of things to do on any given day is too long to scroll through. Good categorization is essential. But do you rely on a top-down taxonomy? A bottom-up folksonomy? My answer is: both.

The curator of an elmcity hub can assign tags to whole feeds. All events in the feed inherit that tag. If the feed comes from the YMCA, the tag might be recreation. If it's a Meetup group for birdwatchers, the tag would be birds. That's the top-down taxonomy.
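Here is a minimal Python sketch of feed-level inheritance. The Event class and the apply_feed_tags function are hypothetical, not part of the elmcity service. They only show the idea: every event in a curated feed picks up the curator's tag, alongside whatever tags the event already carries.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    # Hypothetical event record: a title plus its own (possibly empty) set of tags
    title: str
    categories: set[str] = field(default_factory=set)

def apply_feed_tags(events: list[Event], feed_tags: set[str]) -> list[Event]:
    """Return new events whose categories are the union of their own tags and the feed-level tags."""
    return [Event(e.title, e.categories | feed_tags) for e in events]

# Made-up YMCA feed: one event with no tags, one that carries its own tag
ymca_events = [Event("Lap swim"), Event("Friday movie night", {"movie"})]
tagged = apply_feed_tags(ymca_events, {"recreation"})
for e in tagged:
    print(e.title, sorted(e.categories))
```

The function returns copies rather than mutating the input, so the feed's original tags stay available if the curator later changes the feed-level tag.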

But sometimes the YMCA will show a movie. If you miss it because it wasn't listed in the movie category, you'll be annoyed.

The iCalendar standard provides a CATEGORIES property that can be used to tag individual calendar entries. That opens the way to a complementary bottom-up folksonomy. Sadly, most personal calendar clients don't use the CATEGORIES property, but some enterprise-class content management systems do.
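The following minimal Python sketch makes the CATEGORIES property concrete. It pulls per-event categories out of raw iCalendar text using only the standard library. It assumes well-formed input and handles just three details from RFC 5545: folded lines, comma-separated values, and backslash escapes. The function names and the sample feed are hypothetical, and a production parser would need to cover much more of the spec.

```python
import re

def unfold(ics_text: str) -> list[str]:
    """Undo RFC 5545 line folding: a line break followed by a space or tab continues the previous line."""
    return re.sub(r"\r?\n[ \t]", "", ics_text).splitlines()

def split_text_list(value: str) -> list[str]:
    """Split a comma-separated TEXT list, honoring backslash escapes such as '\\,'."""
    items, buf, i = [], [], 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            # '\n' or '\N' encodes a newline; any other escaped character stands for itself
            buf.append("\n" if nxt in "nN" else nxt)
            i += 2
            continue
        if ch == ",":
            items.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    items.append("".join(buf))
    return [item.strip() for item in items if item.strip()]

def event_categories(ics_text: str) -> list[dict]:
    """Collect the SUMMARY and all CATEGORIES values for each VEVENT."""
    events, current = [], None
    for line in unfold(ics_text):
        if line == "BEGIN:VEVENT":
            current = {"summary": "", "categories": []}
        elif line == "END:VEVENT" and current is not None:
            events.append(current)
            current = None
        elif current is not None:
            # The property name precedes any ';' parameters and the ':' value separator
            head, _, value = line.partition(":")
            name = head.split(";")[0].upper()
            if name == "SUMMARY":
                current["summary"] = value
            elif name == "CATEGORIES":
                # CATEGORIES may appear more than once per event, so accumulate
                current["categories"].extend(split_text_list(value))
    return events

# Made-up sample; the second CATEGORIES value is folded across two lines
SAMPLE = (
    "BEGIN:VCALENDAR\r\n"
    "BEGIN:VEVENT\r\n"
    "SUMMARY:Toddler story time\r\n"
    "CATEGORIES:story time,children\r\n"
    " 's program\r\n"
    "END:VEVENT\r\n"
    "BEGIN:VEVENT\r\n"
    "SUMMARY:Friends of the Library book sale\r\n"
    "CATEGORIES:book sale\r\n"
    "END:VEVENT\r\n"
    "END:VCALENDAR\r\n"
)

for event in event_categories(SAMPLE):
    print(event["summary"], event["categories"])
```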

One such folksonomy, for Houston, comes from the Harris County Public Library. It uses Evanced, a library management system that supports iCalendar's CATEGORIES property. Here are some of the categories defined in the library's current feed:

bilingual
book discussion
book sale
children's program
computer class
esl/literacy
family program
gulf coast reads - book discussion
gulf coast reads
story time
young adult/teen program

These specific tags are useful for library patrons. If you widen your view to include everything that's happening in Houston, though, you'd rather find these events associated with more general tags:

specific                            general
bilingual                           language
book discussion                     books
book sale                           books
children's program                  children
computer class                      classes, technology
esl/literacy                        language, education
family program                      family
gulf coast reads - book discussion  books
gulf coast reads                    books
story time                          books
young adult/teen program            children
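One way to express that mapping is as a lookup table from each specific tag to one or more general tags. The Python sketch below is hypothetical: the CATEGORY_MAP dictionary and the map_categories function are illustrations, not the elmcity service's actual configuration format. It assumes the library's own tags should be kept, so local patrons still see them, with the general tags added alongside.

```python
# Hypothetical representation of the specific -> general table above (one-to-many)
CATEGORY_MAP = {
    "bilingual": ["language"],
    "book discussion": ["books"],
    "book sale": ["books"],
    "children's program": ["children"],
    "computer class": ["classes", "technology"],
    "esl/literacy": ["language", "education"],
    "family program": ["family"],
    "gulf coast reads - book discussion": ["books"],
    "gulf coast reads": ["books"],
    "story time": ["books"],
    "young adult/teen program": ["children"],
}

def map_categories(specific: list[str], mapping: dict[str, list[str]] = CATEGORY_MAP) -> list[str]:
    """Keep each (lowercased) specific tag and append the general tags it maps to, without duplicates."""
    result: list[str] = []
    for tag in specific:
        normalized = tag.strip().lower()
        # Unmapped tags pass through unchanged
        for t in [normalized, *mapping.get(normalized, [])]:
            if t not in result:
                result.append(t)
    return result

print(map_categories(["Computer Class", "ESL/Literacy"]))
print(map_categories(["story time", "book sale"]))
```

Because the table is one-to-many, a computer class lands in both classes and technology. Because duplicates are dropped, an event tagged both story time and book sale gets books only once.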

This mapping between specific and general taxonomies, which helps the Harris Library feed serve both a local and a city-wide constituency, is a new feature of the elmcity service. Along with iCalendar filters, it helps curators achieve the best possible categorization of feeds coming from many sources, without requiring those sources to change their behavior.

This notion of category mapping is, of course, famously controversial. In Ontology is Overrated, Clay Shirky questions its value:

Whenever users are allowed to label or tag things, someone always says "Hey, I know! Let's make a thesaurus, so that if you tag something 'Mac' and I tag it 'Apple' and somebody else tags it 'OSX', we all end up looking at the same thing!" They point to the signal loss from the fact that users, although they use these three different labels, are talking about the same thing.

The assumption is that we both can and should read people's minds, that we can understand what they meant when they used a particular label, and, understanding that, we can start to restrict those labels, or at least map them easily onto one another.

This looks relatively simple with the Apple/Mac/OSX example, but when we start to expand to other groups of related words, like movies, film, and cinema, the case for the thesaurus becomes much less clear. I learned this from Brad Fitzpatrick's design for LiveJournal, which allows users to list their own interests. LiveJournal makes absolutely no attempt to enforce solidarity or a thesaurus or a minimal set of terms, no check-box, no drop-box, just free-text typing. Some people say they're interested in movies. Some people say they're interested in film. Some people say they're interested in cinema.

The cataloguer's first reaction to that is, "Oh my god, that means you won't be introducing the movies people to the cinema people!" To which the obvious answer is "Good. The movie people don't want to hang out with the cinema people." Those terms actually encode different things, and the assertion that restricting vocabularies improves signal assumes that there's no signal in the difference itself, and no value in protecting the user from too many matches.

Often, though, the terms really do encode the same thing. Suppose it's Thursday evening in Houston and you want to watch a movie, or see a film, or go to the cinema. Online calendars use various categories: movie, film, cinema, movie/film, etc. Most people, most of the time, don't care about the nuances of terminology. They'd just like to see a complete list of choices in one place. For many reasons that isn't (yet) possible. The main reason is that calendars mostly don't form syndication networks, so you have to dig through many information silos to get the whole story. I'm trying to bootstrap those syndication networks.

Once a calendar syndication network is up and running, though, category mapping does start to matter. In Semantic web mashups for the rest of us, I described the work of the team that created Timeline and Potluck at MIT and, later, at Google, the amazing Google Refine.

After fighting ontology wars in standards committees for years, what Stefano Mazzocchi and David Huynh learned was: don't fight. Let people name things however they will. Then provide intermediary mapping services that enable local namespaces to also work in global contexts.

Naming is, and should be, a matter of exquisite personal choice. But our cloud services aren't just for our own personal or organizational use. We want them to play well with others too. With intermediary services like the elmcity project's category mapper I can have my cake and eat your pastry too.