My report on June 3's access_log sparked a fair bit of commentary. I realized today, though, that I neglected to account for the rss.xml requests that were answered with an HTTP 304 ("not modified") response. The RSS community owes Simon Fell a vote of thanks for noticing, about a year ago, that the volume of redundant RSS traffic could be radically reduced by exploiting the HTTP/1.1 ETag and If-None-Match headers. Aggregators promptly jumped on Simon's suggestion, and soon after things got a lot faster and more efficient.
From a log analysis perspective, the result is that an RSS request can look like this:
"GET /udell/rss.xml HTTP/1.1" 304 0 "http://ranchero.com/software/netnewswire/" "NetNewsWire Lite/1.0.2 (Mac OS X)"
Or this:
"GET /udell/rss.xml HTTP/1.1" 200 39861 "http://ranchero.com/software/netnewswire/" "NetNewsWire Lite/1.0.1 (Mac OS X)"
In the former case, the newsreader was informed that there was no change to the RSS file since the last time it was fetched. In the latter case, there was a change, and 39861 bytes of RSS data were shipped to the client.
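The decision the server makes between those two responses can be sketched as follows. This is a minimal illustration, not the code any particular web server runs: the function name `conditional_get` and its parameters are hypothetical, and it assumes entity tags contain no commas.

```python
def conditional_get(current_etag, if_none_match, body):
    """Return (status, payload) for a GET, honoring If-None-Match.

    current_etag  -- the ETag the server currently assigns to the feed
    if_none_match -- the raw If-None-Match request header, or None
    body          -- the bytes of the current feed
    """
    def opaque(tag):
        # If-None-Match uses weak comparison, so a leading "W/" is ignored.
        return tag[2:] if tag.startswith("W/") else tag

    if if_none_match is not None:
        # The header may carry several comma-separated tags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or opaque(current_etag) in {opaque(t) for t in candidates}:
            # The client's copy is still current: no body is sent.
            return 304, b""
    # No validator, or the feed has changed: ship the full document.
    return 200, body
```

A newsreader that remembers the ETag from its last fetch and sends it back in If-None-Match gets the 304 branch until the feed changes.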
For my 55354 log entries on June 3, the big picture looks like this:
total requests: 55354
requests for rss.xml: 19165
for rss.xml, non-304: 6986
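Counts like these can be produced with a short script. The sketch below is an assumption about how one might do it, not the script actually used here: the regular expression, the `tally` function, and the third sample line (an HTML request) are all illustrative.

```python
import re

# Matches the quoted request plus the status and byte count that follow it,
# e.g. "GET /udell/rss.xml HTTP/1.1" 304 0
REQUEST = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def tally(lines):
    """Count all requests, rss.xml requests, and rss.xml requests not answered 304."""
    total = rss = rss_non_304 = 0
    for line in lines:
        match = REQUEST.search(line)
        if match is None:
            continue  # lines that do not parse are skipped, not counted
        total += 1
        if match.group("path").endswith("/rss.xml"):
            rss += 1
            if match.group("status") != "304":
                rss_non_304 += 1
    return {"total": total, "rss": rss, "rss_non_304": rss_non_304}

# The first two lines are the examples shown earlier; the third is made up.
SAMPLE = [
    '"GET /udell/rss.xml HTTP/1.1" 304 0 "http://ranchero.com/software/netnewswire/" "NetNewsWire Lite/1.0.2 (Mac OS X)"',
    '"GET /udell/rss.xml HTTP/1.1" 200 39861 "http://ranchero.com/software/netnewswire/" "NetNewsWire Lite/1.0.1 (Mac OS X)"',
    '"GET /udell/index.html HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0"',
]
```

Because `REQUEST.search` scans for the quoted request anywhere in the line, the same pattern should also work on full log lines that begin with the host and timestamp fields.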
Here's a more detailed breakdown:
user agent | rss.xml requests | http 200 responses | % 200 | http 304 responses | % 304 |
total | 19165 | 6986 | 36% | 12179 | 64% |
newswire | 4280 | 1179 | 28% | 3101 | 72% |
sharpreader | 2941 | 506 | 17% | 2435 | 83% |
newsgator | 1105 | 189 | 17% | 916 | 83% |
feedreader | 1128 | 640 | 57% | 488 | 43% |
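The percentage columns follow directly from the counts. As a small sketch of that arithmetic (the function name `split_percentages` is illustrative, and it assumes every rss.xml request was answered with either a 200 or a 304, which holds for the counts in the table):

```python
def split_percentages(total_requests, ok_200):
    """Return (% answered 200, % answered 304), rounded to whole percents."""
    not_modified_304 = total_requests - ok_200
    pct_200 = round(100 * ok_200 / total_requests)
    pct_304 = round(100 * not_modified_304 / total_requests)
    return pct_200, pct_304

# (rss.xml requests, http 200 responses) taken from the table above
ROWS = {
    "total": (19165, 6986),
    "newswire": (4280, 1179),
    "sharpreader": (2941, 506),
    "newsgator": (1105, 189),
    "feedreader": (1128, 640),
}

BREAKDOWN = {agent: split_percentages(*counts) for agent, counts in ROWS.items()}
```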
Clearly Simon's technique is saving everybody a ton of bandwidth.
Does this diminish the importance of RSS readers? It depends what you count. If you subtract the rss.xml requests from the total, there were 36,189 non-RSS requests. But of course, many of these were for images of coffee cups and other UI paraphernalia. The number of requests for HTML pages was only 7752, a number only slightly greater than the 6986 RSS requests that yielded fresh content. That's quite remarkable. And yes, I am curious to see whether and how today's <xhtml:body> experiment might affect that ratio.
Update: More fiddling yields this:
format | "views" [1] | unique IPs | adjusted unique IPs [2] | adjusted % |
HTML | 8410 | 1306 | 996 | 75% |
RSS | 6685 | 2005 | 1695 | 85% |
[1] For HTML, requests for */ or *.html. For RSS, requests for rss.xml, excluding HEAD requests and 304 responses.
[2] For HTML, IPs found in HTML requests and not in RSS requests. For RSS, the inverse.
It appears there are two quite disjoint populations of readers. 75% of the IP addresses that made HTML requests were not seen making RSS requests, and 85% of the IP addresses that made RSS requests were not seen making HTML requests. Another way to look at this is to combine the two lists of unique IP addresses and check how many appear twice. The combined list contains 3001 distinct IPs. Of these, only 310 appear in both lists.
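The overlap calculation is plain set arithmetic. Here is a minimal sketch, assuming the IP addresses for each kind of request have already been extracted from the log; the function name `overlap_summary` and the dictionary keys are illustrative.

```python
def overlap_summary(html_ips, rss_ips):
    """Compare the IPs seen in HTML requests with those seen in RSS requests."""
    html_ips, rss_ips = set(html_ips), set(rss_ips)  # dedupe each list
    return {
        "html_unique": len(html_ips),
        "rss_unique": len(rss_ips),
        "combined_unique": len(html_ips | rss_ips),  # distinct IPs overall
        "in_both": len(html_ips & rss_ips),          # IPs that "appear twice"
        "html_only": len(html_ips - rss_ips),        # adjusted unique IPs, HTML
        "rss_only": len(rss_ips - html_ips),         # adjusted unique IPs, RSS
    }
```

The reported figures are consistent with this: 1306 + 2005 - 310 = 3001 distinct IPs, 1306 - 310 = 996 seen only in HTML requests, and 2005 - 310 = 1695 seen only in RSS requests.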
Former URL: http://weblog.infoworld.com/udell/2003/06/08.html#a716