User agents revisited

It's been a while since I took a look at my own browser stats. So long, in fact, that the term "browser stats" is really obsolete, given the rise of the RSS newsreader. We might as well call the things that fetch web pages what they technically are: user agents. Anyway, I started by looking for a comprehensive list of user-agent signatures, and found a promising candidate at PGTS. (Got a better one? Let me know.) Their compilation of about 6600 user-agent strings seemed reasonably current. I ran yesterday's roughly 55000 log entries for this blog through it and got this:

User agent Hits Percent of total
unclassified 30085 54.350
MSIE 17554 31.712
Mozilla 3852 6.959
Safari 1380 2.493
Netscape 842 1.521
Opera 611 1.104
Galeon 433 0.782
Konqueror 170 0.307
Python-urllib 170 0.307
Java 82 0.148
Powermarks 52 0.094
Lynx 38 0.069
Crazy Browser 18 0.033
iCab 15 0.027
OmniWeb 14 0.025
PHP 14 0.025
lwp-trivial 13 0.023
Wget 8 0.014
CFNetwork 2 0.004
Download Ninja 1 0.002
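
For anyone who wants to try this kind of tally at home, here is a minimal sketch in Python. It is not the script I ran. It assumes the log is in the combined log format, with the user-agent as the last quoted field, and it stands in a tiny, hypothetical `SIGNATURES` list for the real PGTS compilation, whose format I am not reproducing here.

```python
import re
from collections import Counter

# Assumption: combined log format, where the user-agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

# Hypothetical stand-in for a signature list: (substring, family) pairs,
# checked in order, so the more specific signatures come first.
SIGNATURES = [
    ("Opera", "Opera"),
    ("MSIE", "MSIE"),
    ("Safari", "Safari"),
    ("Gecko", "Mozilla"),
]


def user_agent(line):
    """Return the user-agent string from one log line, or '' if absent."""
    match = UA_FIELD.search(line)
    return match.group(1) if match else ""


def classify(ua):
    """Return the first matching family, or 'unclassified'."""
    for needle, family in SIGNATURES:
        if needle in ua:
            return family
    return "unclassified"


def tally(lines):
    """Count hits per family; return (family, hits, percent) rows, largest first."""
    counts = Counter(classify(user_agent(line)) for line in lines)
    total = sum(counts.values())
    return [(family, hits, round(100.0 * hits / total, 3))
            for family, hits in counts.most_common()]
```

Calling `tally(open("access.log"))` (the filename is only a placeholder) yields rows shaped like the ones in the table above: name, hits, and percent of total to three decimal places.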

Clearly that unclassified category wants to be unpacked. So I scanned the log for user-agent names, producing a list like this:

amaya/5.1
aolbrowser/1.0
curl/7.7.1
curl/7.9.8
gazz/2.1
gnome-vfs/1.0.1
iCab/2.8
iCab/2.9
Mozilla/4.5
iCab/2.9.1
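
A scan like that can be approximated with one regular expression. This is a rough sketch, not my actual code: it assumes the name of interest is the leading `name/version` token of each user-agent string, and the helper names `product_token` and `product_names` are made up for the example.

```python
import re

# Assumption: the interesting name is the leading "name/version" token.
PRODUCT = re.compile(r'^\s*([^\s/()]+)/(\S+)')


def product_token(ua):
    """Return the leading 'name/version' token, or None if there is none."""
    match = PRODUCT.match(ua)
    return f"{match.group(1)}/{match.group(2)}" if match else None


def product_names(user_agents):
    """Return the distinct leading tokens, sorted case-insensitively."""
    tokens = {product_token(ua) for ua in user_agents}
    tokens.discard(None)
    return sorted(tokens, key=str.lower)
```

Agents that announce themselves without a slash-separated version fall through as `None` here, so a real pass over a log would likely need a second, looser rule for them.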

I threw away the versions, deduped, and scanned my log entries again, giving preference to the PGTS list (bolded in the tables) but then falling back to my secondary names (italicized in the tables). A name that both sources match, such as Mozilla, therefore appears twice in the revised table, once per source. Of the many interesting points that could be drawn from this data, I'll just focus on one for now. Browsers whose names begin with "Mozilla" make up more than a third of what was the unclassified category (11052 of 30085 hits). Those plus the Mozillas recognized by the PGTS list add up to about 27% of all hits, versus MSIE's 32%. Meanwhile, as I showed yesterday, Mozilla has become a platform that can support a rather interesting XML application (a specialized information viewer, with its own built-in structured search engine) on Windows, Mac, and Linux.

Having reached this point after long struggle, will the Mozilla project now find a sponsor worthy of its ambition? I hope so.

Here's the revised table:

User agent Hits Percent of total
MSIE 17554 31.712
Mozilla 11052 19.966
NetNewsWire 4339 7.839
Mozilla 3852 6.959
SharpReader 2998 5.416
Radio 2364 4.271
Safari 1380 2.493
Feedreader 1123 2.029
NewsGator 1114 2.013
Wildgrape 924 1.669
Netscape 842 1.521
Syndirella 673 1.216
Opera 611 1.104
Web 581 1.050
RssBandit 554 1.001
Java 479 0.865
Galeon 433 0.782
unclassified 377 0.681
nntp 340 0.614
AmphetaDesk 287 0.518
curl 220 0.397
LWP::Simple 218 0.394
Konqueror 170 0.307
Python-urllib 170 0.307
clevercactus 150 0.271
Hep 133 0.240
Soup 130 0.235
gnome-vfs 129 0.233
PHP 107 0.193
Wget 106 0.191
Python-urllib 100 0.181
SwitchCrawler 94 0.170
Genecast 86 0.155
Java 82 0.148
Hapax 78 0.141
Broked 72 0.130
Straw 59 0.107
http://www.almaden.ibm.com/cs/crawler 55 0.099
blagg 54 0.098
libwww-perl 53 0.096
Powermarks 52 0.094
PostNuke: 49 0.089
Syndic8 48 0.087
Hatena 41 0.074
Googlebot 39 0.070
Lynx 38 0.069
NIF 37 0.067
Awasu 36 0.065
Scooter 34 0.061
rssSearch 33 0.060
Frontier 31 0.056
MagpieRSS 30 0.054
MovableType 30 0.054
Opera 30 0.054
Channel 30 0.054
Aggie 28 0.051
Zao 28 0.051
CFMX 24 0.043
ia_archiver 24 0.043
spnlib 24 0.043
KNewsTicker 24 0.043
Edu_RSS 24 0.043
XSA 24 0.043
servalBlagg.py 23 0.042
mt-rssfeed 21 0.038
Twisted 21 0.038
OpenTextSiteCrawler 19 0.034
Dual 19 0.034
Crazy Browser 18 0.033
ScoopRDF 16 0.029
timboBot 16 0.029
iCab 15 0.027
OmniWeb 14 0.025
PHP 14 0.025
ActiveRefresh 14 0.025
lwp-trivial 13 0.023
Popdexter 12 0.022
larbin_2.6.2 12 0.022
QuepasaCreep 11 0.020
FeedDemon 11 0.020
MyHeadlines 11 0.020
IdeaLibHttp 10 0.018
Fresh 9 0.016
ovidiubot 8 0.014
RSSMirandaPlugin 8 0.014
Browser 8 0.014
lwp-trivial 8 0.014
Wget 8 0.014
effnews 8 0.014
janes-blogosphere 7 0.013
FAST-WebCrawler 6 0.011
RPT-HTTPClient 6 0.011
Microsoft 5 0.009
FeedOnFeeds 5 0.009
vw-http 4 0.007
Gazette 4 0.007
vspider 4 0.007
eCatch 4 0.007
synerge 4 0.007
httpSocket 3 0.005
Mail 3 0.005
Feedster 3 0.005
Plucker 3 0.005
DMonitor 3 0.005
MobiPocket 2 0.004
grimp: 2 0.004
NPBot 2 0.004
The 2 0.004
ColdFusion 2 0.004
MnogoSearch 2 0.004
ASPseek 2 0.004
iSiloX 2 0.004
EbiNess 2 0.004
linkhype.com 2 0.004
MiracleAlphaTest 2 0.004
LinkWalker 2 0.004
CFNetwork 2 0.004
SURF 1 0.002
InfoMinder 1 0.002
PocketFeed 1 0.002
Watchfire 1 0.002
daypopbot 1 0.002
htdig 1 0.002
Blogosphere 1 0.002
Internet 1 0.002
Download Ninja 1 0.002
lachesis 1 0.002
Calzilla 1 0.002
Openbot 1 0.002
LinkScan 1 0.002
FlickBot 1 0.002
BlogBot 1 0.002
MSProxy 1 0.002
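
The two-pass approach behind the revised table (strip versions, dedupe, prefer the signature list, fall back to the secondary names) might look something like the sketch below. Treat it as an illustration under assumptions rather than the code I ran: the `primary` mapping is a hypothetical stand-in for the PGTS list, primary signatures are matched as substrings, and secondary names are matched as prefixes of the user-agent string.

```python
from collections import Counter


def strip_versions(tokens):
    """Drop the '/version' part and dedupe, keeping first-seen order."""
    return list(dict.fromkeys(token.split("/", 1)[0] for token in tokens))


def classify_two_pass(ua, primary, secondary):
    """Return (name, source): the primary list wins, secondary is the fallback."""
    for signature, name in primary.items():
        if signature in ua:  # assumption: substring match
            return name, "primary"
    for name in secondary:
        if ua.startswith(name):  # assumption: prefix match
            return name, "secondary"
    return "unclassified", "none"


def tally_by_source(user_agents, primary, secondary):
    """Return (name, source, hits, percent) rows, largest first."""
    counts = Counter(classify_two_pass(ua, primary, secondary)
                     for ua in user_agents)
    total = sum(counts.values())
    return [(name, source, hits, round(100.0 * hits / total, 3))
            for (name, source), hits in counts.most_common()]
```

Because the tally is keyed on the name and its source together, a name matched by both lists gets two rows, which is why some names show up twice in the table.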

Former URL: http://weblog.infoworld.com/udell/2003/06/04.html#a712