A strange thing happened when importing a large number of logfiles into AWStats. Apparently only three out of twelve months were being imported correctly. Even over several years, only the same three months would be imported, the rest discarded.
I pipe the Apache logfile into Cronolog, to make sure the logs get put into the following folder structure: YYYY/MM/DD/access.log This makes it easy to find something in the log files. Each night, the giant access.log file gets split up into several files, one for each vhost. This can easily be accomplished by appending the vhost to each log line. A script then regexes for the vhost, puts it in the relevant log folder for the specific vhost and gzips the logfile. This way I only need to have two running cronolog processes during the day, instead one for each of the vhosts.
So, when migrating a lot of sites I had to reprocess the AWStats as well, seeing as I didn't copy the AWStats databases. This is quite easy, as you can specify in the .conf file which files it needs to parse. I put in the following :
LogFile="find /path/to/site.com/logs/ | grep access.log.gz | xargs zcat |"
This will find all the files in the log folder, filter out the access logs, and zcat them for input into AWStats. This is where it went wrong, and most of the months went missing. The cause was actually really simple. AWStats expects logfiles to be fed in chronological order. Running find on itself clearly showed that the result was not in chronological order (indicated by the folder names).
This is apparently a difference in the find implementation. The old server, running CentOS, sorted the find output perfectly, so I never encountered this problem before. The new server, running a different Linux flavor mixes up the find result.
The solution was obvious :
LogFile="find /path/to/site.com/logs/ | grep access.log.gz | sort | xargs zcat |"