Cute Tricks With Perl and Apache (test document for filtering)

Author: Lincoln Stein
Date: 7/3/98


PART I: WEB SITE CARE AND FEEDING

These scripts are designed to make your life as a Webmaster easier, leaving you time for more exciting things, like tango lessons.


Logs! Logs! Logs!

Left to their own devices, the log files will grow without limit, eventually filling up your server's partition and bringing things to a grinding halt. But wait! Don't turn off logging or throw them away. Log files are your friends.


Log rotation

Script I.1.1 shows the basic script for rotating log files. It renames the current ``access_log'' to ``access_log.1'', ``access_log.1'' to ``access_log.2'', and so on, up to ``access_log.5''. The oldest log is overwritten on the next rotation, which is how it gets deleted. Run it from a cron job to keep your log files from taking over. The faster your log files grow, the more frequently you should run the script.

---------------- Script I.1.1: Basic Log File Rotation ----------

 #!/usr/local/bin/perl
 $LOGPATH='/usr/local/apache/logs';
 @LOGNAMES=('access_log','error_log','referer_log','agent_log');
 $PIDFILE = 'httpd.pid';
 $MAXCYCLE = 4;

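 chdir $LOGPATH or die "Can't chdir to $LOGPATH: $!";  # Change to the log directory
 foreach $filename (@LOGNAMES) {
    # Work from the highest suffix down, so each target name is already vacated
    for (my $s=$MAXCYCLE; $s >= 0; $s-- ) {
        $oldname = $s ? "$filename.$s" : $filename;
        $newname = join(".",$filename,$s+1);
        rename $oldname,$newname if -e $oldname;
    }
 }
 # Tell Apache to reopen its log files; skip the signal if no PID was read
 chomp($pid = `cat $PIDFILE`);
 kill 'HUP',$pid if $pid;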

-----------------------------------------------------------------
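
To see what one pass of the rotation loop does without touching any files, here is a minimal dry-run sketch. The helper name rotation_plan is made up for illustration and is not part of Script I.1.1. It lists every rename the loop would attempt, whereas the real script skips any file that doesn't exist.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the renames one rotation pass would attempt, in order,
# as a list of [old name, new name] pairs. Nothing on disk is touched.
# (rotation_plan is a hypothetical helper, not part of the original script.)
sub rotation_plan {
    my ($filename, $maxcycle) = @_;
    my @plan;
    for (my $s = $maxcycle; $s >= 0; $s--) {
        my $oldname = $s ? "$filename.$s" : $filename;
        push @plan, [$oldname, "$filename." . ($s + 1)];
    }
    return @plan;
}

print "$_->[0] -> $_->[1]\n" for rotation_plan('access_log', 4);
```

With a cycle count of 4, the plan starts with access_log.4 -> access_log.5 and ends with access_log -> access_log.1, so the file previously named access_log.5 is the one that disappears.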


Log rotation and archiving

But some people don't want to delete the old logs. Wow, maybe some day you could sell them for a lot of money to a marketing and merchandising company! Script I.1.2 appends each log to a gzip archive just before it is rotated into the last slot. Only the logs listed in %ARCHIVE are archived. Log files compress extremely well and make great bedtime reading.

---------- Script I.1.2: Log File Rotation and Archiving ---------

 #!/usr/local/bin/perl
 $LOGPATH    = '/usr/local/apache/logs';
 $PIDFILE    = 'httpd.pid';
 $MAXCYCLE   = 4;
 $GZIP       = '/bin/gzip';

 @LOGNAMES=('access_log','error_log','referer_log','agent_log');
 %ARCHIVE=('access_log'=>1,'error_log'=>1);

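 chdir $LOGPATH or die "Can't chdir to $LOGPATH: $!";  # Change to the log directory
 foreach $filename (@LOGNAMES) {
    # Append the log that is about to enter the last slot to the gzip archive
    system "$GZIP -c $filename.$MAXCYCLE >> $filename.gz"
        if -e "$filename.$MAXCYCLE" and $ARCHIVE{$filename};
    for (my $s=$MAXCYCLE; $s >= 0; $s-- ) {
        $oldname = $s ? "$filename.$s" : $filename;
        $newname = join(".",$filename,$s+1);
        rename $oldname,$newname if -e $oldname;
    }
 }
 # Tell Apache to reopen its log files; skip the signal if no PID was read
 chomp($pid = `cat $PIDFILE`);
 kill 'HUP',$pid if $pid;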

-----------------------------------------------------------------
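
Appending with ``gzip -c >>'' leaves an archive made of several gzip members, one per rotation. If you want to read such an archive back from Perl, the following sketch is one way to do it, assuming the IO::Compress modules are installed (they ship with recent Perls). The helper name read_archive and the two sample strings are invented for illustration. The MultiStream option is what tells gunzip to keep going past the first member.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Decompress every gzip member in $input (a file name or a reference
# to a buffer) and return the concatenated text.
# (read_archive is a hypothetical helper, not part of Script I.1.2.)
sub read_archive {
    my ($input) = @_;
    my $text;
    gunzip($input => \$text, MultiStream => 1)
        or die "gunzip failed: $GunzipError";
    return $text;
}

# Simulate two rotations appending to one archive, as "gzip -c >>" does.
my @logs    = ("first log\n", "second log\n");
my $archive = '';
for my $log (@logs) {
    my $member;
    gzip(\$log => \$member) or die "gzip failed: $GzipError";
    $archive .= $member;
}
print read_archive(\$archive);
```

To read a real archive, pass the file name instead of a buffer reference, for example read_archive('access_log.gz').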


Log rotation, compression and archiving

What's that? Someone broke into your computer, stole your log files and now they're selling them to a Web marketing and merchandising company? Shame on them. And on you for letting it happen. Script I.1.3 uses idea (part of the SSLeay package) to encrypt each log after compressing it. You need GNU tar to run this one, because the script relies on its --remove-files option. The log files are individually compressed and encrypted, stamped with the current date and time, and added to a tar archive.

---------- Script I.1.3: Log File Rotation and Encryption ---------

 #!/usr/local/bin/perl
 use POSIX 'strftime';
 
 $LOGPATH     = '/home/www/logs';
 $PIDFILE     = 'httpd.pid';
 $MAXCYCLE    = 4;
 $IDEA        = '/usr/local/ssl/bin/idea';
 $GZIP        = '/bin/gzip';
 $TAR         = '/bin/tar';
 $PASSWDFILE  = '/home/www/logs/secret.passwd';
 
 @LOGNAMES=('access_log','error_log','referer_log','agent_log');
 %ARCHIVE=('access_log'=>1,'error_log'=>1);
 
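 chdir $LOGPATH or die "Can't chdir to $LOGPATH: $!";  # Change to the log directory
 foreach $filename (@LOGNAMES) {
     # Archive the log that is about to enter the last slot
     my $oldest = "$filename.$MAXCYCLE";
     archive($oldest) if -e $oldest and $ARCHIVE{$filename};
     for (my $s=$MAXCYCLE; $s >= 0; $s-- ) {
         $oldname = $s ? "$filename.$s" : $filename;
         $newname = join(".",$filename,$s+1);
         rename $oldname,$newname if -e $oldname;
     }
 }
 # Tell Apache to reopen its log files; skip the signal if no PID was read
 chomp($pid = `cat $PIDFILE`);
 kill 'HUP',$pid if $pid;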
 
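 # Compress and encrypt one rotated log, then add it to the tar archive
 sub archive {
     my $f = shift;
     my $base = $f;
     $base =~ s/\.\d+$//;   # access_log.4 -> access_log
     my $fn = strftime("$base.%Y-%m-%d_%H:%M.gz.idea",localtime);
     # Skip the tar step if the shell reports that the pipeline failed
     if (system("$GZIP -9 -c $f | $IDEA -kfile $PASSWDFILE > $fn") != 0) {
         warn "Can't compress and encrypt $f\n";
         return;
     }
     # --remove-files (GNU tar) deletes $fn once it is in the archive
     system "$TAR rvf $base.tar --remove-files $fn";
 }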

-----------------------------------------------------------------
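
The date-stamped file name is the part of this script most worth trying out on its own. Here is a small sketch of how the name is put together. The helper archive_name is hypothetical, and unlike the script it passes only the timestamp through strftime, so that a stray ``%'' in a log name can't be mistaken for a format code. It also accepts an optional broken-down time, which makes it easy to check against a known date.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX 'strftime';

# Build the archive member name for a rotated log such as "access_log.4".
# Only the timestamp goes through strftime; the base name is added around it.
# (archive_name is a hypothetical helper, not part of Script I.1.3.)
sub archive_name {
    my ($f, @time) = @_;
    @time = localtime unless @time;        # default to the current time
    (my $base = $f) =~ s/\.\d+$//;         # strip the rotation suffix
    return "$base." . strftime("%Y-%m-%d_%H:%M", @time) . ".gz.idea";
}

# Fixed time for illustration: 3 Feb 1998, 17:42:15
# (sec, min, hour, mday, mon counted from 0, year minus 1900)
print archive_name('access_log.4', 15, 42, 17, 3, 1, 98), "\n";
```

For the fixed time shown, this prints access_log.1998-02-03_17:42.gz.idea.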


Log Parsing

There's a lot you can learn from log files. Script I.1.4 does the basic access log regular expression match. What you do with the split-out fields is limited only by your imagination. Here's a typical log entry so that you can follow along:

portio.cshl.org - - [03/Feb/1998:17:42:15 -0500] "GET /pictures/small_logo.gif HTTP/1.0" 200 2172
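
As a warm-up, here is a minimal sketch that splits this one entry into named fields, with the request in ordinary double quotes as it appears in a real log file. The pattern, the helper name parse_entry and the field names are assumptions made for illustration, and may differ from what Script I.1.4 uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one common-log-format line into named fields.
# Returns an empty list if the line does not match.
# (parse_entry and its field names are hypothetical.)
sub parse_entry {
    my ($line) = @_;
    my ($host, $ident, $user, $date, $method, $url, $status, $bytes) =
        $line =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\w+) (\S+)[^"]*" (\d+) (\S+)/
        or return;
    return (host => $host, ident => $ident, user => $user, date => $date,
            method => $method, url => $url, status => $status, bytes => $bytes);
}

my %f = parse_entry('portio.cshl.org - - [03/Feb/1998:17:42:15 -0500] '
                  . '"GET /pictures/small_logo.gif HTTP/1.0" 200 2172');
print "$f{host} asked for $f{url} (status $f{status}, $f{bytes} bytes)\n";
```

The byte count is matched with \S+ rather than \d+ because servers commonly log ``-'' there when no body was sent.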