Last year I created an intermediary service to reorganize the output of InfoWorld's Ultraseek search engine. The InfoWorld power search[1,2] wound up serving three purposes. First, it deduplicated the results and organized them by story type. Second, it made searches subscribable by way of RSS. And third, it used the OpenSearch RSS extensions to integrate with Amazon's A9. That worked well, but doing a better job required more metadata.
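For the curious, here's a minimal Python sketch of what the deduplicate-and-organize step amounts to. It is not the power search's actual code, which isn't shown here. It assumes each hit carries a URL and a story type (the `Result` class and its `story_type` field are hypothetical), and it assumes duplicates are hits whose URLs differ only in host case, trailing slash, query string, or fragment.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Result:
    url: str
    title: str
    story_type: str  # hypothetical field, e.g. "review", "news", "column"


def normalize_url(url: str) -> str:
    """Build a duplicate-detection key. Assumption: hits that differ only in
    host case, trailing slash, query string, or fragment are the same story."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), "", ""))


def dedupe_and_group(results: list[Result]) -> dict[str, list[Result]]:
    """Keep the first (highest-ranked) hit for each story and bucket the
    survivors by story type, preserving the engine's ranking in each bucket."""
    seen: set[str] = set()
    groups: dict[str, list[Result]] = {}
    for r in results:
        key = normalize_url(r.url)
        if key in seen:
            continue  # a later, lower-ranked duplicate of a story already kept
        seen.add(key)
        groups.setdefault(r.story_type, []).append(r)
    return groups
```

Because Python dicts preserve insertion order, the story-type buckets come out in the order their first hit appeared in the engine's ranking. A real service might prefer a fixed order instead (reviews first, say).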
We've since moved to a structured format for our HTML page titles, and I've just updated the power search to exploit the extra metadata. As a result you can now see, and sort by, authors and publication dates as well as story types. Tags are included when available. For example:
all results containing intrusion detection
just reviews containing intrusion detection
an RSS feed for reviews containing intrusion detection
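The title format isn't spelled out here, so the following sketch invents one purely for illustration: `Type: Headline | Author | YYYY-MM-DD | tags`. Given titles in that hypothetical layout, `parse_title` pulls out the metadata, and the hypothetical `select` helper filters by story type and sorts, which is roughly what a query like "just reviews, newest first" needs.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class StoryMeta:
    story_type: str
    headline: str
    author: str
    published: date
    tags: list[str] = field(default_factory=list)


def parse_title(title: str) -> Optional[StoryMeta]:
    """Parse a HYPOTHETICAL layout: 'Type: Headline | Author | YYYY-MM-DD | tag1, tag2'.

    The tag field is optional. Titles that don't fit the layout return None,
    so unstructured pages can still be listed, just without sortable metadata.
    """
    fields = [f.strip() for f in title.split("|")]
    if len(fields) not in (3, 4) or ":" not in fields[0]:
        return None
    story_type, headline = (s.strip() for s in fields[0].split(":", 1))
    try:
        published = date.fromisoformat(fields[2])
    except ValueError:
        return None
    tags = [t.strip() for t in fields[3].split(",") if t.strip()] if len(fields) == 4 else []
    return StoryMeta(story_type.lower(), headline, fields[1], published, tags)


def select(titles: list[str], story_type: Optional[str] = None,
           sort_by: str = "published", descending: bool = True) -> list[StoryMeta]:
    """Parse titles, optionally keep one story type, and sort by a metadata field."""
    metas = [m for m in map(parse_title, titles) if m is not None]
    if story_type is not None:
        metas = [m for m in metas if m.story_type == story_type.lower()]
    return sorted(metas, key=lambda m: getattr(m, sort_by), reverse=descending)
```

A headline that itself contains `|` would break this toy parser, so a real template would want a delimiter that can't occur in headlines.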
It's a big improvement over the first version, but there's always more to do. For starters, I was wrong when I said last week that search engines can't discriminate between the core of an article and its templated periphery. Some can, including Atomz and, as it turns out, Ultraseek. David Schnepper, who's with Ultraseek, informs me that you can exclude peripheral content by using the HTML comments <!--startindex--> and <!--stopindex--> to mark which parts of a page the indexer should read. I'd like to give that a try and see what difference it makes. Wider support for this kind of granular exclusion, in Google and elsewhere, would be useful.
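To get a feel for what those markers do, here's a small Python sketch that mimics an indexer honoring them. The semantics are assumed, not taken from Ultraseek's documentation: `<!--stopindex-->` switches indexing off and `<!--startindex-->` switches it back on. Whether indexing is on or off at the top of a page is also an assumption, so the hypothetical `indexable_text` function takes a `start_on` flag to model either reading.

```python
import re

# One indexing marker comment. Tolerating extra whitespace and mixed case is
# an assumption of this sketch, not documented engine behavior.
MARKER = re.compile(r"<!--\s*(startindex|stopindex)\s*-->", re.IGNORECASE)


def indexable_text(html: str, start_on: bool = True) -> str:
    """Return the parts of `html` an indexer honoring the markers would keep.

    <!--stopindex--> switches indexing off and <!--startindex--> switches it
    back on. `start_on` sets the state at the top of the page.
    """
    kept: list[str] = []
    indexing = start_on
    pos = 0
    for m in MARKER.finditer(html):
        if indexing:
            kept.append(html[pos:m.start()])  # text up to this marker was indexed
        indexing = m.group(1).lower() == "startindex"
        pos = m.end()  # the marker comment itself is never indexed
    if indexing:
        kept.append(html[pos:])
    return "".join(kept)
```

Running a templated page through this with each setting is a cheap way to see how much navigation and boilerplate would otherwise land in the index.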
More broadly, I'd like to find a way to merge the search feature of this service with the navigational flavor of the InfoWorld explorer. I'm still not sure how best to combine the two styles. But my hunch is that there's a way, and I plan to keep noodling on the problem. Meanwhile, this is a step in the right direction.
Update: Turns out we were using startindex/stopindex, just not for blogs. This update should fix that for mine.
Further update: See also: Information architectures: print versus online.
Former URL: http://weblog.infoworld.com/udell/2006/02/13.html#a1386