Mining the intranet

Of course sites such as Amazon and Google have reasons to create formal APIs and gate access to them. But on an enterprise intranet the threat is disuse, not overuse. You're publishing information that you want people to find, exploit, and recombine. When it's appropriate to use SOAP and WSDL -- for example, when queries require fancy authorization or complex inputs -- then do so. But when a simpler strategy will suffice, don't be ashamed to use it. Between the primordial tag soup of HTML and the formal realm of Web services lies a large and fertile middle ground: XHTML. Information that you publish in XHTML can be directly consumed by browsers, and it's much friendlier to spiders than ill-formed HTML. If you hope people will mine your intranet, make the job as easy as it can be.
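To make the "friendlier to spiders" point concrete, here is a minimal sketch in Python using only the standard library's `xml.etree.ElementTree`. It assumes a hypothetical intranet page published as well-formed XHTML, where the `class="project"` and `class="owner"` markers are illustrative conventions, not anything prescribed above. Because the page is well-formed XML, an ordinary XML parser plus a simple XPath-style query is enough to pull out structured data, while tag soup is rejected outright.

```python
import xml.etree.ElementTree as ET

# Map a prefix to the XHTML namespace so path queries can name elements.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical intranet page; the class names are illustrative assumptions.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Team status</title></head>
<body>
  <div class="project">
    <h2>Search upgrade</h2>
    <span class="owner">Alice</span>
  </div>
  <div class="project">
    <h2>Wiki migration</h2>
    <span class="owner">Bob</span>
  </div>
</body>
</html>"""


def mine_projects(xhtml: str) -> list[tuple[str, str]]:
    """Return (title, owner) pairs for every div whose class is exactly 'project'."""
    root = ET.fromstring(xhtml)
    results = []
    # ElementTree supports a subset of XPath, including [@attr='value'] predicates.
    # Note: this is an exact attribute match, not a match on one token of class="a b".
    for div in root.iterfind(".//h:div[@class='project']", NS):
        title = div.findtext("h:h2", default="", namespaces=NS).strip()
        owner = div.findtext("h:span[@class='owner']", default="", namespaces=NS).strip()
        results.append((title, owner))
    return results


def is_well_formed(markup: str) -> bool:
    """True if a plain XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(mine_projects(page))  # [('Search upgrade', 'Alice'), ('Wiki migration', 'Bob')]
print(is_well_formed("<p>Unclosed <b>tag soup</p>"))  # False
```

The design choice is deliberate: no screen-scraping heuristics or HTML-repair step is needed, because well-formedness lets the consumer rely on a standard parser and path expressions.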
I sometimes worry that I harp too much on these kinds of simple home truths. But Mike Champion's review of my XML 2003 keynote was a nice bit of validation:

Jon Udell gave a keynote speech on Tuesday that pierced the jaded, slightly cynical shell I've acquired after about 8 years in the XML world. He didn't talk about "maybe someday..." or "if only ...", he showed what a little imagination can do with the widely deployed XHTML, CSS, and XPath technologies today.


So why did this pierce my cynical shell? Most would agree that we need more metadata on the Web for it to live up to its full potential -- that's the very premise of the Semantic Web effort in which Tim Berners-Lee has invested much of the W3C's resources (and credibility). On the other hand, the historical difficulty of getting real people to put metadata in their content is believed by many to doom such efforts to failure. (Cory Doctorow's essay is the most colorful and cogent, if widely reviled, statement of this position.) Udell's insight is that we can leverage the technology we have, salted by human vanity, to get usable metadata without technological breakthroughs or unrealistic demands on humans.

As Dorothea Salo recently pointed out, this isn't only my insight. I'm just one of the people who keep noticing, and drawing attention to, ways we can make more out of what we already have.
