Structured search, phase two

The next phase of my structured search project is coming to life. For the new version I'm parsing all 200+ of the RSS feeds to which I subscribe, XHTML-izing the content, storing it in Berkeley DB XML, and exposing it to the same kinds of searches I've been applying to my own content. Here's a taste of the kinds of queries that are now possible:

quotes from Dare Obasanjo

links from Tim Bray

links from Brent Simmons to InfoWorld.com

books mentioned by AKMA

books, with XQuery in the title, mentioned by Michael Rys
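To give a feel for what a query like "links from Brent Simmons to InfoWorld.com" boils down to once the content is XHTML, here is a minimal sketch in Python. It is not the Berkeley DB XML setup itself: it uses the standard library's ElementTree over an in-memory document, and the `items`/`item` wrapper, the `author` attribute, the sample entries, and the `links_from` helper are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, made-up feed items. The wrapper elements and the
# author attribute are illustrative, not the project's real schema.
ITEMS = """
<items>
  <item author="Example Author A">
    <body xmlns="http://www.w3.org/1999/xhtml">
      <p>See <a href="http://www.infoworld.com/article/1">this piece</a> and
         <a href="http://example.org/other">this one</a>.</p>
    </body>
  </item>
  <item author="Example Author B">
    <body xmlns="http://www.w3.org/1999/xhtml">
      <p>Also <a href="http://www.infoworld.com/article/2">here</a>.</p>
    </body>
  </item>
</items>
"""


def links_from(root, author, host_suffix=None):
    """Return hrefs of XHTML links in items by `author`,
    optionally restricted to hosts at or under `host_suffix`."""
    hrefs = []
    for item in root.findall("item"):
        if item.get("author") != author:
            continue
        # Walk every XHTML <a> element inside this item, in document order.
        for a in item.iter(f"{{{XHTML_NS}}}a"):
            href = a.get("href")
            if href is None:
                continue
            host = urlparse(href).hostname or ""
            if (host_suffix is None
                    or host == host_suffix
                    or host.endswith("." + host_suffix)):
                hrefs.append(href)
    return hrefs


root = ET.fromstring(ITEMS.strip())
print(links_from(root, "Example Author A", "infoworld.com"))
```

The point is only that once the markup is well-formed, "links from X to Y" becomes an element-and-attribute lookup rather than a text search.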

The paint's not dry on this thing yet. I have yet to normalize the dates, and I'm still getting the hang of DB XML, but here are some things that become immediately obvious:

Until now, I've thought the major roadblock standing in the way of more richly structured content was the lack of easy-to-use XML writing tools. But maybe I've been wrong about that. If it's going to be practical to XHTML-ize what current HTML writing tools produce, maybe we can make a whole lot more progress than I thought by working toward CSS styling standards that will also provide hooks for more powerful searching.
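As a rough illustration of the "CSS hooks" idea, suppose writers tagged elements with agreed-upon class names. The class names `book` and `quote` below are purely hypothetical, as are the sample document and the `elements_with_class` helper. A search tool could then select elements by class in the same way a stylesheet does:

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Made-up XHTML fragment; the class names are assumed conventions,
# not an existing standard.
DOC = """
<div xmlns="http://www.w3.org/1999/xhtml">
  <p>Reading <span class="book title">An Example Book</span> this week.</p>
  <blockquote class="quote">A made-up quotation.</blockquote>
  <p class="aside">Not a book.</p>
</div>
"""


def elements_with_class(root, class_name):
    """Return elements whose space-separated class attribute
    contains `class_name`, in document order."""
    return [
        el for el in root.iter()
        if class_name in (el.get("class") or "").split()
    ]


root = ET.fromstring(DOC.strip())
books = ["".join(el.itertext()) for el in elements_with_class(root, "book")]
print(books)
```

The same attribute that lets a stylesheet render book titles differently would let a query pull them out, which is why styling conventions and search hooks could be one and the same.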

At the very least, this will be a nice laboratory in which to experiment with a growing pool of XML content, using a variety of XML-capable databases. My hope, of course, is to offer a service that's as useful to you -- the writers of the blogs I'm reading, aggregating and searching -- as it is to me. And ideally, useful to you in ways that invite you to think about how to make what you write even more useful to all of us. We'll see how it goes.


Former URL: http://weblog.infoworld.com/udell/2004/01/29.html#a901