Shortening RSS descriptions to lead sentences

I'm really enjoying my ability to scan a lot of sources in my Radio news aggregator. What slows me down, though, are the channels with long or irregularly formatted descriptions. One of the worst offenders is the Privacy Digest. I want to scan that channel, but the cluttered format of its descriptions makes that really hard.

By contrast, take a look at Eric Snowdeal's feed. Although it's mildly disconcerting that the truncation doesn't respect word or sentence boundaries, the effect -- especially in the aggregator -- is to make this feed much more easily scannable.

For my own reading convenience, I may wind up pulling feeds out of Radio's aggregator database and reformatting them. But on the flip side, I've been feeling kind of antisocial about dumping these longish essays into an RSS feed that I myself wouldn't want to read.

So for now, I've changed this line in system.verbs.builtins.radio.weblog.writeRssFile:

add ("<description>" + adritemcache^.text + "</description>")

to this:

local (s = adritemcache^.text);
s = string.replaceAll( s, "&lt;" , "<" );
s = string.replaceAll( s, "&gt;" , ">" );
s = string.replaceAll( s, "&quot;" , '"' );
s = string.replaceAll( s, "&apos;" , "'" );
s = string.replaceAll( s, "&amp;nbsp;" , " " );
regex.subst("<[^>]+>", "", @s);
add ("<description>" + string.firstSentence(s) + "</description>")

Radio's string.firstSentence seems like just the ticket. A lead sentence is supposed to be special. Knowing that it's all a reader of your RSS feed might see (after the title) makes it even more so.
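For anyone who wants to try the same idea outside Radio, here is a rough Python sketch of the steps: unescape the entity-encoded markup, strip the tags, and keep only the lead sentence. The names first_sentence and lead_description are hypothetical, and the sentence rule (stop at the first ".", "!" or "?" followed by whitespace or the end of the text) is my own assumption. I don't know how Radio's string.firstSentence actually decides where a sentence ends.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # same pattern as the regex.subst call above
# Assumed rule: a sentence ends at the first . ! or ? followed by whitespace or end of text.
SENTENCE_END_RE = re.compile(r"[.!?](?=\s|$)")


def first_sentence(text: str) -> str:
    """Return text up to and including the first sentence terminator, or all of it."""
    text = text.strip()
    match = SENTENCE_END_RE.search(text)
    return text[:match.end()] if match else text


def lead_description(escaped_html: str) -> str:
    """Turn an entity-escaped HTML description into a plain-text lead sentence."""
    markup = html.unescape(escaped_html)   # "&lt;p&gt;" becomes "<p>"
    plain = TAG_RE.sub("", markup)         # drop the tags themselves
    plain = html.unescape(plain)           # resolve leftovers such as "&nbsp;" or "&amp;"
    plain = " ".join(plain.split())        # collapse runs of whitespace, including non-breaking spaces
    # Re-escape &, < and > so the result is safe inside <description>.
    return html.escape(first_sentence(plain), quote=False)


if __name__ == "__main__":
    item = "&lt;p&gt;Radio&apos;s aggregator is &lt;b&gt;fast&lt;/b&gt;. It has other virtues too.&lt;/p&gt;"
    print("<description>" + lead_description(item) + "</description>")
```

A rule this simple will cut a sentence short at an abbreviation such as "e.g." followed by a space, so treat it as a starting point rather than a finished sentence splitter.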

Of course, it's completely non-kosher to hack into a system script that can be overwritten by UserLand at any time, as happened most recently a few days ago:

4/7/02; 9:55:21 AM by DW {
Do the macro processing or descriptions unconditionally, and in-line instead of calling radio.string.processMacros, which was only processing the descriptions if it contained "<%". This broke shortcuts. Prior art is the Manila-Blogger Bridge Tool, which unconditionally processes macros.}

Nevertheless, you've got to love the openness of a system that makes it possible, and easy, to do this non-kosher thing.


Former URL: http://weblog.infoworld.com/udell/2002/04/11.html#a186