Unified data theory

I've always regarded the Web as a programmable data source as well as a platform for the document/software hybrid that we call a Web page. Early on, programmable access to Web data entailed a lot of screen scraping. Nowadays it often still does, but it's becoming common to find APIs that serve up the Web's data.

The holistic view of that network [of databases] should be our focus. In [Kingsley] Idehen's view, you'll use something like SPARQL -- a query language for the semantic Web -- to traverse a graph of interlinked sites, and to merge interesting sources into a virtual collection. Then you'll dispatch queries to each member of that collection. Those members will offer query styles ranging from free-text search to iteration over simple key/value pairs (accessed by way of RSS or Atom) to tree traversal (XPath, XQuery) and relational query (SQL). I think he's got it exactly right. [Full story at InfoWorld.com]
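To make the dispatch step concrete, here is a minimal Python sketch, assuming the virtual collection has already been assembled. The SPARQL traversal that would build it is skipped, and the collection is simply a hard-coded dict. Every member name, document, feed item and table row below is hypothetical sample data. Each member answers the same request in its own native style: substring search for free text, iteration over key/value items (standing in for parsed RSS or Atom entries), the limited XPath subset in `xml.etree.ElementTree`, and SQL against an in-memory SQLite table.

```python
import sqlite3
import xml.etree.ElementTree as ET


def text_member(docs):
    """Free-text search: return documents containing the term (case-insensitive)."""
    def query(term):
        return [d for d in docs if term.lower() in d.lower()]
    return query


def feed_member(items):
    """Key/value iteration; the dicts stand in for parsed RSS/Atom items (assumption)."""
    def query(term):
        return [item["link"] for item in items if term.lower() in item["title"].lower()]
    return query


def xml_member(xml_text):
    """Tree traversal using the XPath subset supported by ElementTree."""
    root = ET.fromstring(xml_text)
    def query(term):
        # Selects <style> children of servers whose name attribute equals term.
        # Sketch only: term must not contain a single quote.
        return [s.text for s in root.findall(f"./server[@name='{term}']/style")]
    return query


def sql_member(rows):
    """Relational query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, vendor TEXT)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    def query(term):
        cur = conn.execute("SELECT vendor FROM products WHERE name = ?", (term,))
        return [vendor for (vendor,) in cur.fetchall()]
    return query


def dispatch(collection, term):
    """Send one request to every member; each answers in its own query style."""
    return {name: query(term) for name, query in collection.items()}


# Hypothetical virtual collection, hard-coded in place of a SPARQL traversal.
collection = {
    "text": text_member(["Virtuoso speaks SPARQL", "LINQ comes to C# and VB"]),
    "feed": feed_member([{"title": "Virtuoso goes open source", "link": "http://example.org/a"}]),
    "xml": xml_member(
        "<catalog><server name='Virtuoso'><style>SPARQL</style><style>SQL</style></server></catalog>"
    ),
    "sql": sql_member([("Virtuoso", "OpenLink"), ("LINQ", "Microsoft")]),
}

results = dispatch(collection, "Virtuoso")
print(results)
```

The design choice worth noticing is that the dispatcher knows nothing about query languages; it only knows that each member is a callable. The style-specific work stays inside each member, which is roughly the division of labor the paragraph above describes.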

This week's column alerts the open source community to the arrival of Virtuoso, a universal server that supports a wide range of access methods and query styles. Yesterday I met with Anders Hejlsberg and Paul Vick to discuss LINQ (language integrated query), which takes apart all those access methods and query styles and reassembles them as a new style of data-oriented programming.
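LINQ itself lives in C# and Visual Basic, so what follows is only a Python analogy, not LINQ's API. The idea it tries to capture is that once each source is adapted into ordinary in-language objects, a single query form can filter, join and project across all of them. Here a list comprehension plays the role of a `from ... where ... select` expression, joining an XML tree with SQL rows. The `Feature` class, the adapter functions and all sample data are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Feature:
    product: str
    style: str


def from_xml(xml_text):
    """Adapt an XML tree into plain objects: tree traversal becomes iteration."""
    root = ET.fromstring(xml_text)
    for server in root.iter("server"):
        for style in server.findall("style"):
            yield Feature(server.get("name"), style.text)


def from_sql(conn):
    """Adapt SQL rows into (name, vendor) tuples."""
    yield from conn.execute("SELECT name, vendor FROM products")


# Hypothetical sample sources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, vendor TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Virtuoso", "OpenLink"), ("LINQ", "Microsoft")],
)
xml_text = "<catalog><server name='Virtuoso'><style>SPARQL</style><style>SQL</style></server></catalog>"

rows = list(from_sql(conn))

# One in-language query joins and projects across both sources
# (a nested-loop join, loosely like two "from" clauses plus a "where").
query = [
    (f.product, vendor, f.style)
    for f in from_xml(xml_text)
    for name, vendor in rows
    if name == f.product
]
print(query)
```

The point of the analogy is that the comprehension never mentions XPath or SQL; those details are pushed into small adapters, and the query itself is written in the host language.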

If you are a data hacker -- and what programmer isn't? -- the good times are getting ready to roll.

Former URL: http://weblog.infoworld.com/udell/2006/05/09.html#a1445