Open-ended query and tag spam

In response to yesterday's Flickr hack, Doug Marttila, who works at Visual i|o, wrote to point out a couple of Flash-based alternatives to Flickr's user interface: his own findr, as well as tagnautica and flappr. All three are intriguing and worth a look.

One thing I notice about all three is that they appear to rely on Flickr's tag clustering feature. So when I start with trapeze, for example, they all present the set of tags in the first trapeze cluster: nyc, newyork, manhattan, nycpb, ny, hudsonriverpark. Following that trail, you'll never get to my trapeze photos, which are tagged trapeze and keene.

Open-ended querying is a lot more powerful, but needs to be exposed in a more usable way. My tagshow feature helps a little, by pulling the URL-line API up into the user interface where it will make sense to more people, and by connecting it directly to the slideshow for better visualization of results. Surfacing the tags for each photo in that viewer would close the feedback loop, enabling open-ended tag discovery to aid query refinement.
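To make the contrast with cluster-driven browsing concrete, here is a minimal sketch of an open-ended tag query, evaluated over a small in-memory list of photos rather than against Flickr itself. The record layout, the function names, and the sample data are all hypothetical, invented for illustration; only the tags trapeze, keene, nyc, and hudsonriverpark come from the example above.

```python
def matches(photo_tags, query):
    """Return True if the photo satisfies the query.

    The query is a list of OR-groups. Every group must be satisfied (AND),
    and a group is satisfied when the photo carries at least one of its tags.
    """
    tags = set(photo_tags)
    return all(tags & set(group) for group in query)


def run_query(photos, query):
    """Return the ids of the photos that match, in their original order."""
    return [p["id"] for p in photos if matches(p["tags"], query)]


# Hypothetical sample data, not real Flickr records.
photos = [
    {"id": "a", "tags": ["trapeze", "nyc", "hudsonriverpark"]},
    {"id": "b", "tags": ["trapeze", "keene"]},
    {"id": "c", "tags": ["keene", "autumn"]},
]

# "trapeze AND keene" reaches photo b, which the nyc cluster never would.
print(run_query(photos, [["trapeze"], ["keene"]]))
```

Because the query is just a combination of arbitrary tags, it can land on photos that no precomputed cluster leads to.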

Then, of course, there's the problem of tag spam. Doug Marttila alludes to it on findr's home page. I ran into the same thing yesterday, when I cited this set of crying babies in the bath. One of the terms in that query, bathtub or sink or bathtime, should really have been bathtub or sink or bath. When you run the corrected query, the results do include more crying babies in bathtubs. But they also include a bunch of photos like this one, which carry long lists of tags unrelated to the photo.

Of course, all social software is inherently spammable. And Flickr certainly never promised us high-precision retrieval based on tags. But even in the current system, you can get a sense of how well it could work.

Here's a challenge for Flickr or third-party hackers: let me refine my query results by removing unreliable taggers.
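As a rough illustration of what such a refinement might look like, here is a small sketch that drops results from taggers who look unreliable. It rests on a loud assumption: that a tagger whose photos in the result set average more than some number of tags is probably spamming. That heuristic, the threshold of 15, the record layout, and the names below are all hypothetical, not anything Flickr provides.

```python
from collections import defaultdict


def unreliable_taggers(photos, max_avg_tags=15):
    """Return the owners whose photos average more than max_avg_tags tags.

    The threshold is an arbitrary assumption, chosen only for illustration.
    """
    tag_counts = defaultdict(list)
    for photo in photos:
        tag_counts[photo["owner"]].append(len(photo["tags"]))
    return {
        owner
        for owner, counts in tag_counts.items()
        if sum(counts) / len(counts) > max_avg_tags
    }


def refine(photos, blocked):
    """Keep only the photos whose owner is not in the blocked set."""
    return [p for p in photos if p["owner"] not in blocked]


# Hypothetical query results: one ordinary tagger, one who piles on tags.
results = [
    {"id": 1, "owner": "alice", "tags": ["baby", "crying", "bath"]},
    {"id": 2, "owner": "tagpiler", "tags": ["tag%d" % i for i in range(40)]},
    {"id": 3, "owner": "alice", "tags": ["baby", "sink"]},
]

blocked = unreliable_taggers(results)
refined = refine(results, blocked)
print(sorted(blocked), [p["id"] for p in refined])
```

A real version would want a better signal than raw tag counts, and ideally would let the person running the query choose which taggers to exclude.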
