New directions in source code analysis

Large-scale software systems are staggeringly complex works of engineering. Bugs inevitably come with the territory, and for decades the software profession has looked for ways to fight them. We may not see perfect source code in our lifetime, but we are seeing much better analysis tools and promising new approaches to remedy the problem.

TDD (test-driven development) is one increasingly popular approach to finding bugs. The overhead can be substantial, however, because the test framework that ensures a program's correctness may require as many lines of code as the program itself. Run-time checking is another popular approach. By injecting special instrumentation into programs or by intercepting API calls, tools such as IBM's Rational Purify and Compuware's BoundsChecker can find problems such as memory corruption, resource leakage, and incorrect use of operating system services. TDD and run-time checking are both useful techniques and are complementary. But ultimately, all errors reside in the program's source code. Although it's always important for programmers to review their own code (and one another's), comprehensive analysis demands automation.

One compelling demonstration of the power of automated source code analysis is Coverity's Linux bugs database. Viewable online, this April 2004 snapshot pinpointed hundreds of bugs in the Linux 2.6 source code. Coverity's analyzer, called SWAT (Software Analysis Toolset), grew out of research by Stanford professor Dawson Engler, now on leave as Coverity's chief scientist.

In the Windows world, a static source code analyzer called PREfast, which has been used internally at Microsoft for years, will be included in Microsoft Visual Studio 2005 Team System. PREfast is a streamlined version of a powerful analyzer called PREfix, a commercial product sold in the late 1990s by a company called Intrinsa. Microsoft acquired Intrinsa in 1999 and brought the technology into its Programmer Productivity Research Center. Full story at [InfoWorld.com]

For more background on this story, see this set of links I posted to del.icio.us. Although I haven't drawn attention to it until now, this is another aspect of my ongoing quest to bring more transparency and accountability to journalism -- or anyway, to the little corner of the journalistic world that I inhabit when I'm wearing my journalistic hat.

Elsewhere I've outlined my general approach to blog/print synergy. Social bookmarking is a natural complement to that set of strategies. Here are some of the outcomes I foresee:

Passive amplification. By "amplification" I mean using the Net to amplify the effects of my research efforts. Such amplification can occur, for example, when I navigate from my set of sourcecodeanalysis links to everyone's sourcecodeanalysis links. Of course, if nobody else is using that tag, as is currently true in this case, then those two sets are identical and I've gained no wider view of the subject.

Recently, though, del.icio.us has added a related-tag feature. So for example I can navigate from judell/sourcecodeanalysis to rickduarte/codeanalysis -- where I'll find some overlapping links but also some new ones.

Active amplification. Passive amplification flows automatically from posting research links in a public place where a correlation engine, such as del.icio.us, can work on them. Active amplification is a new concept I haven't actually tried yet. Here's how it would work. Consider, for example, the reading list I accumulated for a recent feature article on Microsoft security. Because this story was posted on the editorial calendar, I was flooded with email from public relations folk hoping to inject their clients' perspectives into the story. What if I ask people -- and not just PR folk, but anyone with a perspective that ought to be considered -- to route links to a del.icio.us tag that I'll preannounce?

I've already established the precedent of blogging, in advance, the key points I see as relevant to a story in progress, and inviting the story's constituency to help me refine that agenda. The idea here would be to extend that collaborative process to the development of the story's background reading list.

As with all schemes involving a world-writable database, this one will be inherently spammable. But the two-tier arrangement -- everyone's links versus my links -- affords some control.
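As a rough illustration of that two-tier control, here is a small Python sketch. It assumes the contributed links have already been fetched into a list of (contributor, URL) pairs. The names `promote`, `contributed`, and `approved`, along with the example.com and example.org addresses, are hypothetical and not part of del.icio.us.

```python
def promote(contributed, approved):
    """Return the contributed URLs the author has chosen to keep.

    contributed: list of (contributor, url) pairs from the world-writable tier
    approved: set of URLs the author judged significant
    Anything not approved, spam included, simply stays in the public tier.
    """
    kept = []
    for _contributor, url in contributed:
        # Keep each approved URL once, in the order it was first contributed
        if url in approved and url not in kept:
            kept.append(url)
    return kept


# Hypothetical sample data: two people suggest the same link, one posts spam
contributed = [
    ("alice", "http://example.com/static-analysis-paper"),
    ("spammer", "http://example.org/buy-now"),
    ("bob", "http://example.com/static-analysis-paper"),
    ("bob", "http://example.com/runtime-checking-notes"),
]
approved = {"http://example.com/static-analysis-paper"}

print(promote(contributed, approved))
```

The point of the sketch is only that spam can land in the shared tier without ever reaching the curated one, because promotion is a deliberate act by the author.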

Here's the protocol I envision:

For a story on TOPIC, I'm collecting links at del.icio.us/judell/TOPIC. To monitor that evolving collection, subscribe to del.icio.us/rss/judell/TOPIC. If you would like to suggest links I've missed, please post them to del.icio.us/YOURNAME/TOPIC. I guarantee that I'll monitor the aggregation of contributed links at del.icio.us/rss/tag/TOPIC. I don't guarantee a response to follow-up email requests. If I judge an item you've posted to be significant, however, you'll see it show up at del.icio.us/judell/TOPIC.
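The addresses in that protocol follow a simple pattern, which a short Python sketch can spell out. The function name `protocol_urls` and the dictionary keys are made up for illustration; only the URL shapes come from the protocol above, and nothing here contacts del.icio.us.

```python
def protocol_urls(topic, author="judell", contributor="YOURNAME"):
    """Build the four addresses named in the protocol for one topic tag.

    The URL shapes are copied from the protocol text; the function and
    key names are hypothetical, chosen only for this illustration.
    """
    base = "del.icio.us"
    return {
        # The author's curated collection for the story
        "curated": f"{base}/{author}/{topic}",
        # Feed to subscribe to in order to watch the curated collection evolve
        "curated_feed": f"{base}/rss/{author}/{topic}",
        # Where a contributor posts suggested links under the same tag
        "suggest": f"{base}/{contributor}/{topic}",
        # Aggregated feed of everyone's links for the tag
        "everyone_feed": f"{base}/rss/tag/{topic}",
    }


if __name__ == "__main__":
    for name, url in protocol_urls("sourcecodeanalysis").items():
        print(f"{name}: {url}")
```

Everything hangs off the single preannounced tag, so the only things a contributor needs to know are the tag and their own account name.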

Interesting, eh? Next time I write a big feature story I'll give this a try, and we'll see whether theory translates into practice.

Former URL: http://weblog.infoworld.com/udell/2004/11/01.html#a1105