Thomas Bayes, a Presbyterian minister and mathematician born just over 300 years ago, would be shocked to see most of the e-mail messages that bid for our attention nowadays. He would be thrilled to know, however, that his statistical inference theorem has inspired a potent counterattack. An open source project called SpamBayes has emerged as a powerful weapon in the war on spam. There are a few different implementations of SpamBayes. I'll focus here on an Outlook add-in, written by renowned Python hacker Mark Hammond. I've been skeptical about the long-term prospects for content-based e-mail filtering. But the Python-based SpamBayes engine, and Hammond's brilliant add-in (also written in Python), are rapidly making me a believer. [Full story at InfoWorld.com]
This article was, among other things, an experiment in combining print journalism with blogging. When I wrote the review, I was so excited about the results I was getting with SpamBayes that I wanted to blog it immediately. Instead, I wrote a companion piece online and then, based on feedback, another one the next day. The URL of that second piece will, presciently, appear in the print article that InfoWorld subscribers receive next week. This time-travel effect is kind of cool, but it also demonstrates what I think is a really powerful kind of print/online synergy.
I love the idea of a story that begins online, is snapshotted for a print readership, and then continues online. I first tried the experiment in 1996, when I was researching a BYTE cover story. Why, I wondered, should the Internet serve only as a mechanism for after-the-fact feedback? Why couldn't I post the general outline of the story I was envisioning, in order to attract perspectives that could usefully shape the story? That's just what I did, and it was a transforming experience. An engineer at JPL told me about a compelling use of Java for distributed data visualization, and that became a central element of the story.
The SpamBayes review worked a bit differently. It was already in production when I blogged the companion pieces, but there was still time to incorporate feedback into the review. (I see that one correction wasn't made, though. Because of what I learned here, I'd asked to remove the reference to Mac OS X Mail in this sentence: "Several e-mail programs, including the Mail program bundled with Mac OS X, use Bayesian techniques.")
A more complete example of print/online synergy is an article I'm working on right now, a companion piece to a review of J2EE servers. I posted some initial thoughts here, and collected lots of useful feedback both in comments and privately. Based on that feedback, I've realized that the original "is J2EE/EJB overkill?" theme was a bit dated. The folks I've talked to have gotten past that issue. They are choosing from the smorgasbord of J2EE services in thoughtful and clever ways. But it also became clear that the "agility versus robustness" theme continues to resonate with everybody. I hope that the story I'll write tomorrow or Monday will do justice to that theme. But the material I've gathered, from interviews with Adam Bosworth, Marc Fleury, Steve Muench, Annrai O'Toole, and others, plus online commentary and email correspondence, is far more than an 800- or 1000-word InfoWorld print article can accommodate. Happily, the blog exists, and it can carry the theme forward at greater length, over time, and in collaboration with other blogs.
A great deal has been said and written about weblogs and journalism, but I've not seen the following point articulated clearly. In a world full of weblogs, written from all kinds of perspectives, information and opinion are commodities. But selection, analysis, synthesis, and coherent storytelling -- the highest and best functions of journalism -- are arguably more valuable than ever. That value cannot be delivered from an ivory tower, though. It must flow from an intense collaboration with what Dan Gillmor calls the former audience.
Former URL: http://weblog.infoworld.com/udell/2003/05/18.html#a694