Better HTTP logging for RubyForge

07 Aug 2006

RubyForge hosts lots of projects - over 1800. Each project has its own virtual host, like ruby-doom.rubyforge.org. Until recently, this also meant that each project had its own Apache log file. This made generating whizzy Webalizer charts easy. But it also meant that when Apache started up it opened 1800+ log files. This was bad because it chewed up file descriptors, and although raising fs.file-max can make more room for them, that seemed ugly. Also, keeping 1800 log files open seemed a bit clunky since most projects don't get much traffic on their virtual hosts.

But anyhow, all that's changed now. I added a %v (the virtual host name) to the front of the LogFormat directive and all the logs now go to one file, which grows to around 100 MB each day. At midnight a cron job uses the Apache split-logfile utility to split out all the entries into individual per-project files. It then archives the master file (which, by the way, compresses down to about 4 MB) and restarts Apache with an empty log. Finally, Webalizer runs on the separate log files to make the charts.
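To make the splitting step concrete, here is a minimal Python sketch of the idea. It is not the actual Apache utility or the cron script; it only assumes that each line of the master log starts with the virtual host name (which is what Apache's %v format code logs) followed by the usual entry. The function names and the one-file-per-host naming are my own choices for illustration.

```python
from collections import defaultdict
from pathlib import Path


def group_by_vhost(lines):
    """Group log lines by their first whitespace-delimited field (the vhost)."""
    groups = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split(None, 1)
        if len(parts) < 2:
            continue  # blank or malformed line: nothing to attribute
        vhost, rest = parts
        # Host names are case-insensitive, so fold them to one spelling.
        groups[vhost.lower()].append(rest)
    return dict(groups)


def write_split_logs(lines, out_dir):
    """Append each vhost's entries to <out_dir>/<vhost>.log; return the paths."""
    out_dir = Path(out_dir)
    written = []
    for vhost, entries in sorted(group_by_vhost(lines).items()):
        if "/" in vhost or vhost.startswith("."):
            continue  # refuse names that could escape out_dir
        path = out_dir / f"{vhost}.log"
        with path.open("a") as handle:
            handle.writelines(entry + "\n" for entry in entries)
        written.append(path)
    return written
```

Passing an open master log file and an existing output directory to write_split_logs leaves one log per project, with the leading host field stripped so that a tool like Webalizer sees ordinary entries.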

I also tweaked the mod_log_config code (src/modules/standard/mod_log_config.c) to batch up the log writes - so BUFFERED_LOGS is enabled and LOG_BUFSIZE is set to 16384 vs its default value of 512. I'm not sure how much (if at all) this helps, but I would think it helps a little to avoid writing a separate entry to disk for every hit.
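For a rough feel of why a bigger buffer should mean fewer writes, here is a small Python sketch. It uses Python's own io.BufferedWriter, not Apache's C code, so it only illustrates the general batching idea. The CountingSink class, the writes_needed function and the 100-byte entry size are all made up for the example.

```python
import io


class CountingSink(io.RawIOBase):
    """Stand-in for a log file that counts how many writes reach it."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def writable(self):
        return True

    def write(self, data):
        self.write_calls += 1
        return len(data)  # pretend everything was written


def writes_needed(buffer_size, entries=1000, entry_size=100):
    """Return how many writes hit the sink when logging through a buffer."""
    sink = CountingSink()
    log = io.BufferedWriter(sink, buffer_size=buffer_size)
    entry = b"x" * (entry_size - 1) + b"\n"  # one fake log line
    for _ in range(entries):
        log.write(entry)
    log.flush()  # push out whatever is still sitting in the buffer
    return sink.write_calls
```

Comparing writes_needed(512) with writes_needed(16384) shows the larger buffer reaching the sink far less often for the same number of entries. Whether that translates into a measurable win on a real server is a separate question.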

Props to the folks at Slacksite for doing a nice writeup of how to do this. I'm using their logcron script with some modifications - for example, using bzip2 instead of gzip for compression.