RubyForge hosts lots of projects - over 1800. Each project has its own virtual host, like ruby-doom.rubyforge.org. Until recently, this also meant that each project had its own Apache log file. This made generating whizzy Webalizer charts easy, but it also meant that Apache opened 1800+ log files at startup. That chewed up file descriptors, and although raising fs.file-max can make more room for them, it seemed ugly. Keeping 1800 log files open also seemed clunky, since most projects don't get much traffic on their virtual hosts.
But anyhow, all that's changed now. I added a %v (the virtual host name) to the front of the LogFormat directive, and all the logs now go to one file, which grows to around 100 MB each day. At midnight a cron job uses the Apache splitlog utility to split the entries out into individual per-project files. It then archives the master file (which, by the way, compresses down to about 4 MB) and restarts Apache with an empty log. Finally, Webalizer runs on the separate log files to make the charts.
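To make the splitting step concrete, here is a minimal Python sketch of the idea. It is not the Apache utility or the logcron script. It assumes the first space-separated field of every line is the virtual host name, and the function name split_log, the .log suffix, and the choice to strip the host field from each entry are all made up for illustration.

```python
import os
import re
from collections import defaultdict

# Assumption: virtual host names contain only letters, digits, dots and hyphens.
SAFE_VHOST = re.compile(r"[a-z0-9][a-z0-9.-]*")


def split_log(master_path, out_dir):
    """Append each entry of the master log to <out_dir>/<vhost>.log.

    Returns a dict mapping each virtual host to the number of entries written.
    """
    per_host = defaultdict(list)
    with open(master_path, "r", encoding="utf-8", errors="replace") as master:
        for line in master:
            # The first field is taken to be the virtual host; the rest is the entry.
            vhost, sep, rest = line.partition(" ")
            vhost = vhost.lower()
            # Skip lines with no second field or a name unsafe to use as a filename.
            if not sep or not SAFE_VHOST.fullmatch(vhost):
                continue
            per_host[vhost].append(rest)

    os.makedirs(out_dir, exist_ok=True)
    for vhost, entries in per_host.items():
        # Only one output file is open at a time, so the splitter itself
        # never needs thousands of file descriptors.
        path = os.path.join(out_dir, vhost + ".log")
        with open(path, "a", encoding="utf-8") as out:
            out.writelines(entries)
    return {vhost: len(entries) for vhost, entries in per_host.items()}
```

Grouping the entries in memory before writing is a simplification that is fine for a master file of around 100 MB. A much larger file would need a streaming approach instead.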
I also tweaked the mod_log_config code (src/modules/standard/mod_log_config.c) to batch up the log writes: BUFFERED_LOGS is enabled and LOG_BUFSIZE is set to 16384 instead of its default value of 512. I'm not sure how much (if any) this helps, but it should help a little by not writing a new entry to disk for every hit.
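The effect of a bigger buffer is easy to see in a toy model. The Python sketch below is not the mod_log_config code. BufferedLog is a hypothetical class that collects entries in memory and only hands them to the underlying file when the next entry would overflow the buffer, counting how many writes actually reach the file.

```python
import io


class BufferedLog:
    """Toy model of buffered logging: batch entries, write them in chunks."""

    def __init__(self, sink, bufsize=16384):
        self.sink = sink          # any object with a write(bytes) method
        self.bufsize = bufsize    # buffer capacity in bytes
        self.buf = bytearray()
        self.writes = 0           # number of writes passed to the sink

    def _emit(self, data):
        self.sink.write(data)
        self.writes += 1

    def flush(self):
        # Write out whatever is pending, if anything.
        if self.buf:
            self._emit(bytes(self.buf))
            self.buf.clear()

    def log(self, entry):
        # Flush first if this entry would not fit in the remaining space.
        if len(self.buf) + len(entry) > self.bufsize:
            self.flush()
        if len(entry) >= self.bufsize:
            # Oversized entries bypass the buffer entirely.
            self._emit(entry)
        else:
            self.buf += entry


if __name__ == "__main__":
    entry = b"x" * 99 + b"\n"  # a made-up 100-byte log line
    for size in (512, 16384):
        log = BufferedLog(io.BytesIO(), bufsize=size)
        for _ in range(1000):
            log.log(entry)
        log.flush()
        print(size, log.writes)
```

One trade-off of any scheme like this is that entries still sitting in the buffer are not in the file yet, so the buffer has to be flushed before the log is rotated or the process exits.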
Props to the folks at Slacksite for doing a nice writeup of how to do this. I'm using their logcron script with some modifications - for example, using bzip2 instead of gzip for compression.