The browser war, or What server logs are telling us

Introduction

I have been running a small web server since 1995. It is not terribly different from thousands of similar sites: mostly stuff which I find useful, or entertaining, or just plain funny enough to keep. The most frequently visited section is the archive of song lyrics; I picked up the once-famous University of Wisconsin collection and added what I have gathered over the years. I also have quite a large number of scanned photos taken on several trips around the world, a very outdated pile of game solutions and hints, and other similar garbage of which the Internet is quite full. My users (laboratory co-workers) also keep some interesting things there, from collections of links to astronomy picture galleries. Except for one relatively large part, 99.9% of the information is in English. But by now you are probably starting to wonder where my thoughts are going and whether you should read any further.

In January 1999, while upgrading the server's operating system from Slackware 3.4 to Slackware 3.6, I had an idea: what if I took the server logs for 1998 and tried to extract something useful from them before archiving them to CD or deleting them? The logs actually contain a lot of information, but this article deals with only a single but important aspect: which browser was used to view my pages. Technically, this log is called the 'User Agent Log' (a browser is an agent from the server's point of view).

What the logs contain and how I analyzed them

I was using a not-very-up-to-date version of the Apache server and did not read its documentation thoroughly enough, so I missed the directive which produces logs in Extended Common Format. As a result, the logs contain only the browser identifier, with no link to where the user came from, which page he viewed, etc. Therefore I can't do anything with them except compile some raw statistics. But even that turned out to be quite interesting.

The total size of the logfile is about 100MB uncompressed (this is the pure User-Agent log, without the Access, Referrer or Error logs). It contains 2,757,324 records. Of these, at least 400,000 belong to various web indexers, robots and mass downloaders, so we discard them from the start. The remaining 2,350,000 entries will be considered real browser data, and all further percentages are given relative to this number.
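
The discarding step can be sketched as a simple keyword filter. This is only an illustration: the function name drop_robots and the keyword list are assumptions, assembled from agent names quoted elsewhere in this article, and not the filter that was actually applied to the 1998 logs.

```shell
# drop_robots: pass through only the lines that do not look like a robot,
# indexer or mass downloader. The keyword list is an assumption assembled
# from agent names quoted in this article, not the original filter.
drop_robots() {
    grep -v -i -E 'slurp|spider|scooter|sidewinder|googlebot|ia_archiver|rambler|aport|robot|crawler|teleport|webzip'
}

# Demo on three sample lines; on a real log one would run
#   drop_robots <agent_log | wc -l
printf '%s\n' \
    'Mozilla/2.02 (OS/2; I)' \
    'Slurp/2.0 (slurp@inktomi.com; http://www.inktomi.com/slurp.html)' \
    'Teleport Pro/1.28' | drop_robots
```

A keyword list like this will always miss some robots and may catch an innocent browser, so the resulting count is best treated as approximate.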

Every line of the log contains a string which identifies the user agent (which is not necessarily a WWW browser). Sometimes it is as simple as

Mozilla/2.02 (OS/2; I)

but sometimes it's as complex as

Mozilla/2.01S (X11; I; IRIX 5.3 IP22)  via proxy gateway  CERN-HTTPD/3.0 libwww/2.17

The initial processing was to run

sort agent_log | uniq -c | sort -r >agent_log.stats

This produced a listing in which every distinct browser string appears on its own line, together with the number of hits it generated. This file is only 970KB. However, every version of a browser is on its own line, and for some kinds of queries that is not very convenient. When counting all hits from all versions of some browser, it is much easier to use a simple fgrep:

fgrep -i lynx agent_log | wc -l
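
Both steps can be wrapped into small reusable functions. The sketch below is a variation on the commands above, not a record of what was run: the names rank_agents and count_family are made up, the sample lines are invented for the demo, and sort -rn is used so that the counts are compared as numbers rather than as text.

```shell
# rank_agents: one line per distinct agent string, most frequent first.
# sort -rn compares the leading counts numerically.
rank_agents() {
    sort | uniq -c | sort -rn
}

# count_family WORD: count input lines containing WORD, ignoring case.
# Same idea as `fgrep -i WORD | wc -l`; grep -c prints the count directly.
count_family() {
    grep -c -i -F -e "$1"
}

# Invented sample data standing in for agent_log.
sample='Lynx/2.8rel.2 libwww-FM/2.14
Mozilla/2.02 (OS/2; I)
Lynx/2.8rel.2 libwww-FM/2.14
Mozilla/3.0 (Win95; I)'

printf '%s\n' "$sample" | rank_agents | head -n 1
printf '%s\n' "$sample" | count_family lynx
```

On a real log the calls would be `rank_agents <agent_log >agent_log.stats` and `count_family lynx <agent_log`.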

What is the most popular browser?

Here is the "Top 30". It lists specific browser versions (we'll sum them later). Of course, preinstalled software such as MSIE 3/4 on Windows takes several of the top lines; quite a lot of people won't change anything if it works for them, or at least seems to work most of the time.
 121277	Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
 119546	Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
 119033	Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
 110693	Mozilla/3.0 (compatible; MSIE 3.0)
  97114	Mozilla/2.0 (compatible; MSIE 3.02; Update a; Windows 95)
  94184	Mozilla/4.04 [en] (Win95; I)
  79309	Mozilla/3.0 (Win95; I)
  61271	Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)
  61214	Teleport Pro/1.28
  58402	Mozilla/2.0 (compatible; MSIE 3.0; Windows 95)
  55084	Mozilla/4.05 [en] (Win95; I)
  39879	Mozilla/4.0 (compatible; MSIE 4.01; Windows NT)
  39485	Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)
  32486	Arkanavt/1.02.015 (compatible; Win16; I)
  31868	Mozilla/4.03 [en] (Win95; I)
  29425	Teleport Pro/1.29
  27537	Mozilla/4.01 [en] (Win95; I)
  25959	Mozilla/3.01 (Win95; I)
  25100	Mozilla/3.01Gold (Win95; I)
  21457	Mozilla/2.02 (OS/2; I)
  20410	Mozilla/3.01 (WinNT; I)
  19758	WebZIP/2.32 (http://www.spidersoft.com)
  18604	Mozilla/3.0 (Win95; I; HTTPClient 1.0)
  17280	IBM-WebExplorer-DLL/v1.2 
  17027	Mozilla/4.0 [en] (Win95; I)
  17010	Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95)
  16915	Mozilla/2.0 (compatible; MSIE 3.02; AK; Windows 95)
  16897	Mozilla/3.0
  16082	Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
  15107	Mozilla/4.04 [en] (WinNT; I)
As you probably know, many browsers (including MSIE) pretend to be called Mozilla, which is a kind of internal name for Netscape Navigator. Apparently this is done to fool overly clever web designers who check the browser id string and generate different output for different browsers or direct them to different pages. Browsers not from Netscape Corp. usually have the keyword "compatible" in their id string, which distinguishes them from genuine Netscape software. MSIE also adds its own signature, for example: Mozilla/2.0 (compatible; MSIE 3.02; Windows 95). This is how we can tell Navigator from IE and some others (Opera, BeOS NetPositive).
2164967 92.1% Mozilla (total)
1067083 45.4% Netscape
1077591 45.8% MSIE  
  20293  0.9% Mozilla (neither Netscape nor MSIE)
In other words, the two most popular browsers are tied in this race.
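
The rule for telling the Mozilla impersonators apart can be written down as a short awk classifier. This is a simplified sketch under stated assumptions: the function name classify_mozilla and the four category labels are made up, and anything that starts with "Mozilla/" and contains neither "MSIE" nor "compatible" is simply assumed to be Netscape. Devices that also claim to be MSIE (WebTV, for example) would land in the msie bucket.

```shell
# classify_mozilla: tally agent strings from stdin into four buckets.
# Rules are tried top to bottom; the first match wins.
classify_mozilla() {
    awk '
        $0 !~ "^Mozilla/"  { n["non-mozilla"]++;      next }  # Lynx, WebExplorer, robots...
        /MSIE/             { n["msie"]++;             next }  # carries the MSIE signature
        /compatible/       { n["other-compatible"]++; next }  # Opera, NetPositive, ...
                           { n["netscape"]++ }                # assumed genuine Netscape
        END { for (k in n) print n[k], k }
    ' | sort -rn
}

printf '%s\n' \
    'Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)' \
    'Mozilla/4.04 [en] (Win95; I)' \
    'Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)' \
    'Mozilla/2.0 (compatible; QNX Voyager 1.0 ;Photon)' \
    'IBM-WebExplorer-DLL/v1.2' \
    'Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)' | classify_mozilla
```

The output is one "count bucket" pair per line, largest count first.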

Life beyond Netscape and IE

The most popular machine among Internet surfers is undoubtedly the PC, and the most popular OS is Windows. If you don't run Windows, you usually either have Netscape (Mac, most free and commercial Unixes, OS/2) or you don't, and have to stick to whatever the OS vendor can offer (BeOS, VM/CMS, QNX, less widely used Unixes). IE runs only on Windows, with two exceptions: one important (MacOS) and one not (Solaris and HP-UX, slightly over 100 hits in total). In other words, there is little competition outside of Windows. On Windows, however, Netscape and IE aren't the only choices.

The most touted alternative browser is Opera. It produced 4734 hits (0.2%), not a big number. Arkanavt for Windows 3.1 collected about 32,500 hits (1.3%); it is little known outside of Russia. Teleport Pro is an `offline browser', which means it is closer to robots than to real browsers (robots differ in that they download more pages than a human will ever want to read). None of the versions and direct derivatives of Windows Mosaic reached even 500 hits:

    314	SPRY_Mosaic/v8.24 (Windows 16-bit) SPRY_package/v4.00
    111	SPRY_Mosaic/v8.32 (Windows 16-bit) SPRY_package/v4.00
     39	SPRY_Mosaic/v8.08 (Windows 16-bit) SPRY_package/v4.00
     29	Spyglass_Mosaic/2.10 Windows Datastorm/3.00
     18	SPRY_Mosaic/v8.25 (Windows 16-bit) SPRY_package/v3.10
     12	SPRY_Mosaic/v7.36 (Windows 16-bit) SPRY_package/v4.00
     11	Multilingual_Mosaic/1.0e Win32 Accent/69
      6	Multilingual Mosaic/1.0c Win32 Accent/0024
All these numbers tell us that the two giants (Netscape and IE) dwarf any competing products. Need I remind you that their combined share is about 91%?

However, a good Web writer should always keep in mind that even if Windows is popular, it is not the only game in town. Linux is quickly gaining share, loyal users of commercial Unixes won't give up their boxes, and OS/2, while proclaimed dead for years, is still in use. The most cross-platform browser is undoubtedly Lynx, but since it is text-mode only, not many people use it.

   8687 0.37% Lynx (virtually any OS)
  17541 0.75% IBM WebExplorer (OS/2)
    959 0.04% NetPositive (BeOS)
WebExplorer is an older browser for OS/2. IBM stopped its development in approximately 1994; it supports HTML 2.0 and tables. Its hit count is probably not representative because it's my primary browser. BeOS does not seem to have a user base of any significance, and I really wonder why this number is so low. Do these people tend to visit only Be-specific sites? Or maybe most BeOS users are software developers who don't have enough time to wander around aimlessly? :-)

How often do people upgrade their browser software

We can only analyze Netscape and MSIE here, because the other browsers just don't produce enough hits for anything close to statistically interesting. Have a look at the data below:
   7154   0.6% Netscape 1.x
  69622   6.5% Netscape 2.x
 443537  41.5% Netscape 3.x
 544036  51.0% Netscape 4.x (current)
     93   0.0% Netscape 5   (alpha; Open source Mozilla project)
1067083        Netscape total

      4   0.0% MSIE 1.x
  17455   1.6% MSIE 2.x
 532351  49.4% MSIE 3.x
 520922  48.3% MSIE 4.x (current)
   6799   0.6% MSIE 5.x (beta)
1077591        MSIE total

The data tell us that about half of the installed browser base runs a previous version. This is a rather shaky generalization, and it apparently depends on when new versions are released. Nevertheless, users of MSIE seem to upgrade more often; at least the number of MSIE 1.x and 2.x users is negligible, unlike the corresponding counts for Navigator. The number of hits from Netscape 1 (which does not even support frames) is rather surprising.
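
A per-version breakdown of this kind can be produced by pulling the major version number out of each id string. The sketch below is one possible way to do it, not the procedure used for the table: the function name version_breakdown is made up, and it assumes that the MSIE version follows the word "MSIE" while the Netscape version follows "Mozilla/" in strings that lack the "compatible" keyword.

```shell
# version_breakdown: count hits per major version of MSIE and Netscape.
version_breakdown() {
    awk '
        $0 !~ "^Mozilla/" { next }        # not a Mozilla-style string
        /MSIE [0-9]/ {
            v = $0
            sub(/.*MSIE /, "", v)         # keep the text after "MSIE "
            sub(/[^0-9].*/, "", v)        # keep only the leading digits
            n["MSIE " v ".x"]++
            next
        }
        /compatible/ { next }             # other Mozilla impersonators
        {
            v = substr($0, 9)             # text after the 8 chars of "Mozilla/"
            sub(/[^0-9].*/, "", v)
            n["Netscape " v ".x"]++
        }
        END { for (k in n) print n[k], k }
    ' | sort -k2,2 -k3,3n                 # by browser name, then by version
}

printf '%s\n' \
    'Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)' \
    'Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)' \
    'Mozilla/4.04 [en] (Win95; I)' \
    'Mozilla/3.01Gold (Win95; I)' \
    'Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)' \
    'Mozilla/2.02 (OS/2; I)' | version_breakdown
```

Note that for MSIE the number after "Mozilla/" is ignored on purpose: MSIE 3.02 announces itself as Mozilla/2.0, so only the MSIE token says anything about the real version.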

8.5% of the hits from Windows 98 come from Navigator, so roughly that share of Windows 98 users have replaced their built-in IE4:

  13649   8.5% Netscape on Win98
 159645 100.0% Windows 98 (total)
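
The percentages in these tables are plain ratios of hit counts. As a small worked example (pct is a hypothetical helper name, and the rounding to one decimal place is a choice made here), the Windows 98 figure and the overall Netscape share can be recomputed like this:

```shell
# pct PART TOTAL: print PART as a percentage of TOTAL, one decimal place.
# LC_ALL=C keeps the decimal point a dot regardless of the locale.
pct() {
    LC_ALL=C awk -v part="$1" -v total="$2" \
        'BEGIN { printf "%.1f%%\n", 100 * part / total }'
}

pct 13649 159645      # Netscape hits among Windows 98 hits -> 8.5%
pct 1067083 2350000   # Netscape hits among all browser hits -> 45.4%
```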

Robots, funny and rare browsers

The top lines of the statistics show how search engines were doing their work:
  89880	Slurp/2.0 (slurp@inktomi.com; http://www.inktomi.com/slurp.html)
  33227	AltaVista Intranet V1.0 msu.ru webmaster@msu.ru
  33191	ArchitextSpider
  30277	InfoSeek Sidewinder/0.9
  12545	Aport
   8113	Scooter/1.0 scooter@pa.dec.com
   6229	Googlebot/1.0 googlebot@googlebot.com http://googlebot.com/
   4047	Scooter/2.0 G.R.A.B. X2.0
   4044	StackRambler/1.4
   3776	Scooter/2.0 G.R.A.B. V1.1.0
   3376	ia_archiver/1.6
   2909	Openfind Robot/1.1A2
    631	WebCrawler/3.0 Robot libwww/5.0a
    564	ArchitextSpider/ libwww/5.0a
    472	Lycos_Spider_(T-Rex)
The most active are Inktomi (Slurp), Excite (ArchitextSpider) and Infoseek. The second line tells me that somebody wants to index Moscow State University pages. Google Search is a newcomer; personally, I like its clean and uncluttered interface. I hope they won't pollute it with ad banners too soon. Google usually gives very relevant links (unlike many other search engines). Aport and Rambler are Russian search engines. AltaVista (Scooter) is rather quiet, WebCrawler is forgotten, and how am I supposed to trust Lycos after that number of hits? Perhaps they buy datasets from others?

Some people prefer to alter their browser id string to look funny:

    343	Nutscrape/1.0 (CP/M; 8-bit)
     36	Lord Vishnu/Transcendental (Vaikuntha;Supreme Personality of Godness)
     21	TakSebeProxy/0.0 (CP/M; 1-bit)   -- you have to be Russian to understand this
     14	Nutscrape/1.0 (CP/M; 8-bit) via NetCache version NetApp Release 3.2.1: Thu May 21 16:33:01 PDT 1998
     11	Microsoft Pocket Internet Explorer/0.6   -- hmmm... will it explore my pockets???
      9	James_Bond/007 (CP/M; 8-bit)
      8	Nutscrape/1.001beta (CP/M; 8-bit)
      6	Psychoscape/1.11 (MaSaDoSa;12-bit)
      6	None/1.0 (CP/M; 8-bit) 
      6	MFC_Tear_Sample     -- what is that???
      5	Nutscrape/1.1 (DRDOS; 8-bit)
      5	InterNetscape Operasaic Browser, Zeta Release (The one that don't crash)   -- doesn't crash???
      4	This is not text/Warpzilla 0.005
      4	Nutscrape-1.0 (CP/M; 8-bit)
      4	I am not a number, I am a human being!
      2	Boomscape/1.0 (Dog; 4-bit)
      2	Bidon/1.0 (CP/M; 12-bit)
      1	The Knights Who Say Ni (Highlander; M; There Can Be Only One)
      1	Nutscrape/1.0 (DRDOS; 8-bit)
      1	Nutscrape/1.0 (CP/M; 8-bit) 
      1	MSIE/4.0 (not Windows,really,AWEB for Amiga!); (Spoofed by Amiga-AWeb/3.2)
      1	Fake Browser #2
      1	BPFTP/1.07/ILLEGAL COPY!  THIS PROGRAM HAS BEEN MODIFIED!
The top line reflects a hint in the Squid configuration file about using the fake_user_agent directive.

There are browsers for rather obscure operating systems or devices:

   7269	Mozilla/3.0 WebTV/1.2 (compatible; MSIE 2.0)
    342	Nokia-Communicator-WWW-Browser/1.0 (Geos 3.0 Nokia-9000)
     48	xChaos_Arachne/1.20;overlaid;beta 7 (DOS x86; 640x480,16c; info@main.naf.cz; http://www.naf.cz/arachne/)
     42	Charlotte/2.1.0 VM_ESA/2.2.0 CMS/13
      1	ArcWeb/1.91 (Acorn RISC OS; StrongARM)
      1	Mozilla/1.22 (compatible; NetBox/1.0 R77; NEOS 5.15)
      1	Mozilla/2.0 (compatible; QNX Voyager 1.0 ;Photon)
      1	Mozilla/2.0 EasyRider-XT/1.3.5.10 (ARM; 32bit; compatible; MSIE 2.0; IA-PAL) libwww/2.17 modified  

Operating systems

We all know that Windows 95/98 is the most widely used operating system (at least for now).
Windows 95           1372044  63.9%
Windows NT            184982   8.6%
Windows 98            159645   7.4%
Windows 3.1            96809   4.5%
Win32 ?               126123   5.8%
  Windows total      1939603  90.4%
Linux                  25601   1.2%
Solaris/SunOS          17906   0.8%
FreeBSD                 6981   0.3%
HP-UX                   4287   0.2%
IRIX                    3516   0.16%
BSD/OS                  2298   0.11%
AIX                     1868   0.09%
OSF/1                   1534   0.07%
  Unix total           63991   3.0%
Macintosh              78189   3.6%
OS/2                   54433   2.5%
WebTV                   7288   0.3%
Amiga                   1297   0.06%
BeOS                     959   0.04%
------------------------------------
Total                2145760   100%
Since not all browsers indicate which system they run on, the total does not match the total number of hits (the difference is about 8.5%). You can draw your own conclusions from this table.
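
An operating system table like this can be approximated by matching platform tokens in the id strings. The sketch below is an illustration only: the function name guess_os is made up, and the token list is an assumption based on strings quoted in this article plus the Macintosh tokens; it covers just a few of the systems in the table and is not the rule set behind the numbers above.

```shell
# guess_os: print one operating-system label per agent string on stdin.
# Rules are tried top to bottom; strings with no known token are "unknown".
guess_os() {
    awk '
        /Windows 98|Win98/           { print "Windows 98";  next }
        /Windows 95|Win95/           { print "Windows 95";  next }
        /Windows NT|WinNT/           { print "Windows NT";  next }
        /Windows 3\.1|Win16|16-bit/  { print "Windows 3.1"; next }
        /OS\/2|WebExplorer/          { print "OS/2";        next }
        /Linux/                      { print "Linux";       next }
        /IRIX/                       { print "IRIX";        next }
        /Macintosh|Mac_/             { print "Macintosh";   next }
        /BeOS|NetPositive/           { print "BeOS";        next }
                                     { print "unknown" }
    '
}

printf '%s\n' \
    'Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)' \
    'Mozilla/4.04 [en] (Win95; I)' \
    'Mozilla/2.02 (OS/2; I)' \
    'Mozilla/3.0' \
    'Mozilla/3.0 (Win95; I)' | guess_os | sort | uniq -c | sort -rn
```

Strings such as the bare "Mozilla/3.0" end up as "unknown", which is exactly the kind of hit that makes the table total smaller than the total number of hits.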

Disclaimer

I hope you understand that these figures must be taken with a grain of salt (or a glass of tequila if you prefer). Statistics are very susceptible to many pitfalls! For example, I use OS/2, and therefore its share in the operating systems roster is artificially inflated. Conversely, since my site is in Russia and the percentage of Russian users is higher than usual, the Macintosh has got quite a low number (Macs are virtually unknown here). Not all browsers support Russian Internet encodings, and so on; the list of factors goes on ad infinitum!

Further reading: list of browsers on BrowserWatch, web server survey on NetCraft.

Copyright (C) 1999 Sergey Ayukov. No part of this text can be reproduced without permission.
20 March 1999