RSS redirection and regex/Frontier/XSLT XML hacking

After I posted yesterday's note about RSS redirection, Dave Winer wrote to remind me that there is a mechanism known to work for both Radio UserLand and NetNewsWire. It looks like this:

<?xml version="1.0"?> 
<redirect>
<newLocation>
 http://jonudell.net/udell/gems/longDescriptionFeed.xml
</newLocation>
</redirect>

If you hit my alternate feed with your browser, you'll see this XML redirect. I'm happy to report that I've just tested it successfully in both RU and NNW. Both adjust the original address to the new one. I'll be interested to know which other readers do, or don't, make the same adjustment.
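
For anyone writing a reader, here's a minimal sketch of how a client might honor this redirect. It's in Python and is purely illustrative: I don't know how RU or NNW implement it, and the function name resolve_redirect and the example.invalid address are made up for the example.

```python
import xml.etree.ElementTree as ET

def resolve_redirect(feed_url, body):
    """Return the new address if body is a <redirect> document, else feed_url."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return feed_url  # not well-formed XML; keep the old address
    if root.tag != "redirect":
        return feed_url  # an ordinary feed, nothing to adjust
    new_location = root.findtext("newLocation")
    if new_location is None or not new_location.strip():
        return feed_url  # malformed redirect; keep the old address
    # The URL sits on its own line, so trim the surrounding whitespace.
    return new_location.strip()

REDIRECT = """<?xml version="1.0"?>
<redirect>
<newLocation>
 http://jonudell.net/udell/gems/longDescriptionFeed.xml
</newLocation>
</redirect>"""

# Hypothetical old address, used only to exercise the function.
print(resolve_redirect("http://example.invalid/old.xml", REDIRECT))
```

A real reader would presumably also save the new address in the subscription list, so that it only has to follow the redirect once.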

The reason for the switch is that I wanted to clean up my primary feed. There's no reason for it to contain <content:encoded> anymore, and that element confuses NewsMonster. So now, my primary feed is just a short description and an <xhtml:body>. My alternate feed contains the whole body encoded within the description, for folks who want to read the blog in a non-<xhtml:body>-aware aggregator.

Making these adjustments was trickier than I thought it would be. I'm narrating the process here partly so I can remember it later, and partly because it glosses some recent discussion about XML processing [1, 2]. There were two parts to the task:

  1. Copy the original feed -- that is, what my alternate RSS writer produces. Remove <description> and <xhtml:body> from the copy, and rename <content:encoded> to <description>, in order to create the alternate feed.

  2. Modify the original feed in place, removing <content:encoded>, to create the standard feed.
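
To make the two steps concrete, here's a minimal Python sketch of them using the standard library's ElementTree. It isn't one of the approaches described below. It assumes that <description>, <xhtml:body>, and <content:encoded> all sit directly inside each <item>, and the function names and the SAMPLE feed are invented for illustration.

```python
import copy
import xml.etree.ElementTree as ET

CONTENT = "http://purl.org/rss/1.0/modules/content/"
XHTML = "http://www.w3.org/1999/xhtml"
# Only affects serialization: keeps the familiar prefixes on output.
ET.register_namespace("content", CONTENT)
ET.register_namespace("xhtml", XHTML)

def drop(item, tag):
    """Remove every direct child of item with the given tag."""
    for child in item.findall(tag):
        item.remove(child)

def make_alternate(rss):
    """Step 1: on a copy, the encoded body replaces the description."""
    alt = copy.deepcopy(rss)
    for item in list(alt.iter("item")):
        drop(item, "description")
        drop(item, f"{{{XHTML}}}body")
        for encoded in item.findall(f"{{{CONTENT}}}encoded"):
            encoded.tag = "description"  # rename in place
    return alt

def make_standard(rss):
    """Step 2: modify the original in place, dropping <content:encoded>."""
    for item in list(rss.iter("item")):
        drop(item, f"{{{CONTENT}}}encoded")
    return rss

# Hypothetical one-item feed, just enough to exercise both steps.
SAMPLE = f"""<rss version="2.0" xmlns:content="{CONTENT}" xmlns:xhtml="{XHTML}">
<channel><item>
<description>short</description>
<xhtml:body><p>full</p></xhtml:body>
<content:encoded>&lt;p&gt;full&lt;/p&gt;</content:encoded>
</item></channel></rss>"""

rss = ET.fromstring(SAMPLE)
alternate = make_alternate(rss)  # must run first, while the original is intact
standard = make_standard(rss)
```

The order matters for the same reason it does in the task list: the alternate feed has to be made from the original before <content:encoded> is stripped out of it.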

I tried three approaches: regular expression hacking, Frontier-style XML hacking, and XSLT.

The regex approach

Despite my earlier regex advocacy, I didn't have much luck. That's partly, I guess, because Radio's regex.dll doesn't think quite like Perl's regex engine. Anyway, it got to be a mess.
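
For illustration only, the regex idea boils down to something like the following sketch. It uses Python's re module, not Radio's regex.dll, so it says nothing about how regex.dll behaves, and the name strip_encoded is made up.

```python
import re

# Non-greedy match from the opening tag to the nearest closing tag.
# DOTALL lets "." cross line breaks, since the encoded body spans lines.
ENCODED = re.compile(r"<content:encoded>.*?</content:encoded>\s*", re.DOTALL)

def strip_encoded(feed_text):
    """Delete every <content:encoded> element from the raw feed text."""
    return ENCODED.sub("", feed_text)
```

Even this simple case has sharp edges: the pattern misses an opening tag that carries attributes, and it has no idea where one <item> ends and the next begins. Renaming elements and removing several at once only makes it hairier.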

The Frontier approach

To use the Frontier-style approach, I started like so:

xml.compile ("c:\\radio\\www\\rss.xml", @rss);

This failed embarrassingly, because (as Dave kindly pointed out) I was trying to compile the filename, not the content. This is the right way to turn your RSS file into a Frontier table:

xml.compile (file.readWholeFile("c:\\radio\\www\\rss.xml"), 
  @scratchpad.rss);
edit ( @scratchpad.rss );

The edit statement brings up a Frontier editing window, where you can inspect the RSS file as a Frontier table. You can also write code to walk around inside the table, making changes as needed. You have to use special verbs to get at the addresses of the subtables, which I found confusing, but this hybrid interactive/programmatic approach is nifty. It started to add up to a lot of code to do what I wanted, though, so I decided to try things the XSLT way.

The XSLT approach

I started with msxsl.exe, a command-line tool for running XSLT transforms. The first stylesheet, longDescription.xml, was a minor variation on the one that was already transforming my primary feed into my alternate feed (by way of the W3C XSLT service). The only change needed here was to remove <xhtml:body>, so I added this template:

<xsl:template match="xhtml:body">
</xsl:template>

Here's the second stylesheet, xhtmlBody.xml:

<?xml version="1.0"?> 
<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:content="http://purl.org/rss/1.0/modules/content/" 
  xmlns:dc="http://purl.org/dc/elements/1.1/" 
  xmlns:xhtml="http://www.w3.org/1999/xhtml" 
  version="1.0">
<xsl:output method="xml" indent="yes" encoding="us-ascii"/>
<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>
<xsl:template match="content:encoded">
</xsl:template>
</xsl:stylesheet>

The first template in this stylesheet is something the XSLT geeks call "the identity transform." It just echoes the input to the output. But as you do that, you get the chance to override aspects of the transform. In this case, my second template matches <content:encoded> and does nothing with it. As a result, that element drops out of the feed.
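
If the identity-plus-override pattern is unfamiliar, here's a rough Python analogue: a recursive copy that echoes everything except the elements you tell it to skip. This is a sketch of the idea, not of how an XSLT processor works internally, and the name transform is my own.

```python
import xml.etree.ElementTree as ET

CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"

def transform(node, dropped=frozenset({CONTENT_ENCODED})):
    """Copy node recursively, omitting any element whose tag is in dropped."""
    # The "identity" part: same tag, same attributes, same text.
    result = ET.Element(node.tag, dict(node.attrib))
    result.text, result.tail = node.text, node.tail
    for child in node:
        if child.tag in dropped:
            continue  # the "empty template": emit nothing for this element
        result.append(transform(child, dropped))
    return result
```

With an empty dropped set this is a plain copy. Each tag you add to the set plays the role of one more do-nothing template in the stylesheet.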

To integrate this with Radio UserLand, I packaged up my two command-line transforms into a CMD file, fixrss.cmd:

rem Build the alternate feed from the original
c:\radio\tools\msxsl.exe c:\radio\www\rss.xml ^
  c:\radio\www\gems\longDescription.xml ^
  -o c:\radio\www\gems\longDescriptionFeed.xml
rem Strip content:encoded from the original, in place
c:\radio\tools\msxsl.exe c:\radio\www\rss.xml ^
  c:\radio\www\gems\xhtmlBody.xml -o c:\radio\www\rss.xml

Then, I added this line to my alternate RSS writer:

launch.application("c:\\radio\\tools\\fixrss.cmd");

There's one more XSLT stylesheet involved here. The alternate feed's original address invokes longFeed.xml, which used to look a lot like longDescription.xml, but now simply returns the XML redirect.

Here's a tip, by the way. When you're hacking around with your Radio feeds, turn upstreaming off. Otherwise you'll torment your subscribers.

Whew! That was kind of confusing, but I think everything's straightened out now. (Sanity check: is the primary feed valid? Is the alternate feed valid?) As the Perl guys like to say, There's More Than One Way to Do It. Were Radio's regex engine more Perl-like, I'd probably have solved the problem that way. It's a reflex. Were I more accomplished at Frontier XML hacking, I might have gotten the quickest result using xml.compile and friends. In this case, however, XSLT wound up being my weapon of choice. I'm glad to have figured out how to incorporate it into my Radio repertoire.

Finally, I'm really glad to know that RSS redirection works, at least for RU and NetNewsWire.


Former URL: http://weblog.infoworld.com/udell/2003/04/17.html#a670