Server-side XSLT filtering

When Michael Rothwell wrote to tell me how and why he uses server-side XSLT, and that he had built an XSLT ISAPI filter to do it, the reasons weren't immediately obvious to me. If you're running IIS, why not just use the existing MSXML capabilities? The reasons finally began to sink in when I read his answer to my question, which has now become part of his XSLT Filter for IIS subsite. His key points include:

- MSXML doesn't operate in a transparent pipelining mode:

Microsoft's XML and XSLT COM components (MSXML) can be useful, but require generation of XML and transformation to be explicitly handled in the ASP application: building or acquiring the XML, instantiating the XML and XSLT parsers, doing the transformation, and handling the output must all be done explicitly. It is not something you can just add to an existing application and realize immediate benefits -- you have to re-structure portions of the application to explicitly use XML and XSLT.

- The filter approach supports incremental enhancement of existing apps:

Using an XSLT filter is a sort of half-way point in the separation of presentation from logic. Most web applications mix presentation and logic together. ASP-based sites seem to be as prone to this as other web application frameworks, or more so. Upgrading and standardizing the presentation of a web application is often difficult and time-consuming. By using an XSLT filter at the output stage, some extra separation and modularization can be achieved. It also allows complete separation of presentation and logic if one wishes, but does not force an immediate move to that paradigm.

- Using the XMLSoft libraries (Gnome's libxml/libxslt) makes this technique available both on Unix/Linux (by way of Apache::ASP) and Windows:

At home, I use Linux and MacOS X most of the time. At work, I use IIS and ASP most of the time. I searched around for an XSLT filter for IIS, and could not find one (update: there is an XSL filter from Microsoft, but it's kludgy). So, I wrote one. I did not use the MSXML COM libraries, because COM isn't the fastest, and it's more or less a royal pain in the butt to program with from C or C++. Instead, I chose the XML and XSLT libraries from xmlsoft.org, because they are fast, portable, and easy to use from C. Also, LibXML has the HTML parsing mode I mentioned earlier, which provides a lot of advantages over using a straight (and strict) XML parser.
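
To make the first two points concrete, here is a minimal sketch of filtering at the output stage. It is not Michael's ISAPI filter: it swaps in Python's WSGI middleware convention as a stand-in for the IIS filter hook, and `legacy_app`, `restyle`, `OutputFilter`, and `fetch` are all hypothetical names invented for illustration. The `restyle` function is only a placeholder for the real XSLT transformation, which would need an XSLT library such as libxslt.

```python
from wsgiref.util import setup_testing_defaults


def legacy_app(environ, start_response):
    """Hypothetical existing page that mixes logic and markup."""
    body = b"<html><body><p>Hello</p></body></html>"
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]


def restyle(markup):
    """Placeholder for the XSLT step (assumption: a real filter would
    parse the markup and apply a stylesheet here)."""
    return markup.replace(b"<body>", b'<body class="site-theme">')


class OutputFilter:
    """Wrap an unmodified app and transform its HTML on the way out."""

    def __init__(self, app, transform):
        self.app = app
        self.transform = transform

    def __call__(self, environ, start_response):
        captured = {}

        # Hold back the app's status and headers until the body has been
        # transformed. Apps that rely on WSGI's legacy write() callable
        # are not supported by this sketch.
        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = headers

        chunks = self.app(environ, capture)
        try:
            body = b"".join(chunks)
        finally:
            if hasattr(chunks, "close"):
                chunks.close()

        header_map = {k.lower(): v for k, v in captured["headers"]}
        if header_map.get("content-type", "").startswith("text/html"):
            body = self.transform(body)

        # The body length may have changed, so recompute Content-Length.
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]


def fetch(app):
    """Call a WSGI app once with a default test environ."""
    environ = {}
    setup_testing_defaults(environ)
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status
        seen["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], body


if __name__ == "__main__":
    print(fetch(OutputFilter(legacy_app, restyle))[2].decode())
```

Note that `legacy_app` is never edited. The presentation change lives entirely in the wrapper, which is the incremental, transparent quality the filter approach is after.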

Michael's last point, about HTML parsing, is especially interesting. When I reviewed Traction, for example, I learned from its developers that the software will absorb HTML into its XML data store. Since they kept talking about an XML parser, I was puzzled. Every XML parser I know barfs at the first instance of non-well-formedness. Turns out that Traction's roots are in SGML, and it uses an SGML parser that doesn't behave this way.
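
The difference between strict and forgiving parsing is easy to see in a small sketch. This one uses only Python's standard library, so it is a stand-in for the idea rather than for libxml's HTML parsing mode or Traction's SGML parser, neither of which it touches. The names `SOUP`, `SoupToTree`, `strict_parse_ok`, and `tidy` are hypothetical, and the recovery rules are deliberately simplistic.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Tag soup: an unquoted attribute, an unclosed <p>, and a bare <br>.
SOUP = "<div class=note><p>Hello<br>world</div>"

# Elements that never take an end tag (partial list, enough for the sketch).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class SoupToTree(HTMLParser):
    """Build a well-formed ElementTree from forgiving HTML input."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        attrib = {name: value or "" for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element, implicitly
        # closing anything left open inside it. Stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text that follows a child element belongs in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def strict_parse_ok(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def tidy(text):
    """Return the first top-level element of the soup as well-formed XML."""
    parser = SoupToTree()
    parser.feed(text)
    parser.close()
    return ET.tostring(parser.root[0], encoding="unicode")


if __name__ == "__main__":
    print("strict parser accepts the soup:", strict_parse_ok(SOUP))
    print("recovered markup:", tidy(SOUP))
    print("strict parser accepts the result:", strict_parse_ok(tidy(SOUP)))
```

Once the soup has been recovered into a well-formed tree, it can be handed to ordinary XML tooling, including an XSLT processor.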

There are good reasons for XML parsers to be strict. But there's a world full of generated HTML that doesn't end tags properly or quote attributes, and XML technology ought to be able to add value incrementally to that world. I really like Michael's approach. Thanks for taking the time to explain it!

Former URL: http://weblog.infoworld.com/udell/2002/08/07.html#a373