Speed up Awstats by using GeoIP instead of DNS Lookups

Awstats is probably the best open source web stats program available (other projects, like webalizer and analog, are no longer being maintained). Besides the security problems that keep being found in Awstats, the main problem that has bothered me is its speed. Awstats is written in perl, which means it will be considerably slower than a similar program written in C (like webalizer). There is not much we can do to speed up awstats, and on high traffic sites (with logs growing by more than a few gigabytes per day) this can be a real problem…

What I want to show here is how DNS lookups affect awstats performance and why you should not use them. Why would you want DNS lookups enabled anyway? (By the way, in the default configuration DNS lookups are off.) You might want DNS lookups to resolve the IPs to hostnames, which awstats then uses for its country statistics. I don’t see any other reason to use DNS lookups while running awstats (but hey, I might be wrong, and you might use them for something else ;)). Now why is this not a good idea?

  • first of all, the country stats will not be very useful. Looking only at the hostname to find the country does not give very relevant information…

  • second, awstats doesn’t have a DNS caching mechanism (like webalizer has, for example) that keeps already resolved IPs in a local file across consecutive runs. It has a feature that caches the results in a file, but only during the same run… Now this sucks…

  • third, DNS lookups will considerably slow down awstats because it has to resolve each IP from the logs to a hostname, and this operation is ‘expensive’
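
To make the first point more concrete, here is a minimal Python sketch of what guessing a country from a resolved hostname amounts to. It is not awstats code, only an illustration of the idea: the suffix table and the hostnames are made up, and a real table would be much longer.

```python
# Hypothetical illustration: guess a visitor's country from the last label of
# a resolved hostname. This is NOT how awstats is implemented.

# Tiny sample of country-code suffixes, chosen only for this example.
COUNTRY_SUFFIXES = {"de": "Germany", "fr": "France", "ro": "Romania"}


def country_from_hostname(hostname):
    """Return a country name, or 'Unknown' if the suffix carries no country."""
    suffix = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    return COUNTRY_SUFFIXES.get(suffix, "Unknown")


# Made-up hostnames and a documentation-range IP address.
samples = [
    "dsl-42.provider.example.de",  # country-code suffix: works
    "cache7.bigisp.example.net",   # generic suffix: says nothing about location
    "crawler.example.com",         # same problem for .com
    "192.0.2.17",                  # lookup failed, only the IP is left
]

for host in samples:
    print(f"{host:32} -> {country_from_hostname(host)}")
```

Only the first sample gets a country. Generic suffixes and unresolved IPs all end up as unknown, which is why hostname-based country stats tend to be of limited use.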

So if you are using DNS lookups only to get country stats, then stop doing that… Start using GeoIP to get much faster results and much more accurate reports. To enable the awstats GeoIP plugin you first need the perl port of the Maxmind GeoIP library (Geo::IP) installed, and you can use the free country database (which in my opinion provides great results). Installing the perl module is pretty simple (just be sure to install the GeoIP C library first), and in case you want some short details on the installation you can check this small post: “Install Geo::IP Perl Module on Debian”. It is also possible to use the smaller perl module Geo::IPfree, but in my experience the Geo::IP module gives much more accurate results.

Once you have the GeoIP perl module installed all you have to do is:

  • have DNS lookups off:

    DNSLookup=0
    
  • enable the GeoIP plugin:

    #LoadPlugin="geoip GEOIP_STANDARD /pathto/GeoIP.dat"
    LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"
    

Now in order to demonstrate this, I ran some tests on a large apache log file (gathered on a live server, with real data). The exact same file was used for each of the tests.
Hardware: CPU: 2 x Xeon 3.20GHz, RAM: 4G, HDD: SCSI
Log file: 1.8G size, 7,169,736 records

Here are the results:

  • Test 1: no GeoIP and no DNS lookups:

    • time to complete 15m56s
    • 7499 records / sec
  • Test 2: no GeoIP and DNS lookups:

    • time to complete 66m58s
    • 1784 records / sec
  • Test 3: GeoIP and no DNS lookups:

    • time to complete 17m32s
    • 6815 records / sec

Conclusion: as we can see, Test 1 is the fastest (obviously, as no DNS or GeoIP lookups are done). This is the baseline for how fast awstats can run on this box. Test 2 (with DNS lookups) takes approximately 4.2 times as long to complete (a 320% increase in run time, which means throughput drops by about 76%), while Test 3 (with GeoIP lookups) takes only about 10% longer (roughly in line with the awstats docs, which state 8%). Given the other advantages GeoIP has over DNS lookups, I think this clearly shows why GeoIP is the better choice.
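
The figures from the results can be rechecked with a few lines of Python. This is only a small sketch that redoes the arithmetic from the timings and the record count listed above; the labels and variable names are my own.

```python
# Recompute the benchmark figures from the timings and record count above.

RECORDS = 7169736  # records in the test log file


def seconds(minutes, secs):
    """Convert a 'XmYs' timing into seconds."""
    return minutes * 60 + secs


runs = {
    "Test 1 (no GeoIP, no DNS)": seconds(15, 56),
    "Test 2 (DNS lookups)": seconds(66, 58),
    "Test 3 (GeoIP)": seconds(17, 32),
}

baseline = runs["Test 1 (no GeoIP, no DNS)"]

for name, elapsed in runs.items():
    rate = RECORDS // elapsed   # whole records per second
    ratio = elapsed / baseline  # run time relative to Test 1
    extra = (ratio - 1) * 100   # percent more time than Test 1
    print(f"{name}: {rate} records/sec, {ratio:.2f}x the baseline time "
          f"(+{extra:.0f}%)")
```

The records per second computed this way match the numbers listed for each test, and the ratios are measured against the run time of Test 1.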

For official awstats benchmarks, you can check this page: http://awstats.sourceforge.net/docs/awstats_benchmark.html
