At the end of this chapter we discuss the Apache::File class, which provides advanced functionality for HTTP/1.1 requests, and the various ``magic'' globals, subroutines and literals that mod_perl recognizes.
my $query = $r->args;
my %in    = $r->args;
One trap to be wary of: if the same argument name is present several times (as can happen with a multi-selection list in a fill-out form), assignment of args() to a hash will discard all but the last argument. To avoid this, you'll need to use the more complex argument processing scheme described in the next chapter.
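The trap is easy to reproduce outside the server. The sketch below does not use the Apache API at all. It assumes a hypothetical helper, parse_pairs(), that splits a query string into a flat name/value list in roughly the way args() does in a list context, and then shows one possible way to keep every value by collecting them into a hash of array references:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Apache API): split a query
# string into a flat list of name/value pairs, URL-decoding each part.
sub parse_pairs {
    my $query = shift;
    my @pairs;
    for my $pair (split /[&;]/, $query) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;
            s/%([0-9a-fA-F]{2})/chr hex $1/ge;
        }
        push @pairs, $name, $value;
    }
    return @pairs;
}

my $query = 'flavor=vanilla&flavor=mint+chip&size=large';

# Plain hash assignment keeps only the last value for each name.
my %in = parse_pairs($query);

# Collecting into array references keeps them all.
my %multi;
my @pairs = parse_pairs($query);
while (my ($name, $value) = splice @pairs, 0, 2) {
    push @{ $multi{$name} }, $value;
}

print "hash:  $in{flavor}\n";
print "multi: @{ $multi{flavor} }\n";
```

The hash %in ends up holding only the second ``flavor'' value, while %multi holds both in the order they were submitted.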
my $c = $r->connection;
POST, which generally occurs when the remote client is submitting the contents
of a fill-out form, the $r->content method returns the submitted
information, but only if the request content type is of type application/x-www-form-urlencoded. When called in a scalar context, the entire string is returned. When
called in a list context, a list of parsed name=value pairs is returned.
To handle other types of PUT or POSTed content, you'll need to use a module such as CGI.pm or Apache::Request, or use the
read() method and parse the data yourself. Ways of doing this, as well as a module
that simplifies the task, are described in the next chapter.
NOTE: you can only call content() once. If you call the method more than once, it will return undef (or an empty list) after the first try.
Examples:
my $fname = $r->filename;
unless (open(FH, $fname)) {
die "can't open $fname: $!";
}
my $fname = do_translation($r->uri);
$r->filename($fname);
When finfo() is called, it copies the cached stat information into Perl's special
filehandle _ which Perl uses to cache its own stat operations. You can then perform file
test operations directly on this filehandle rather than on the file itself,
which would incur the penalty of another stat() system call. For convenience,
finfo() returns a reference to the _ filehandle, so file tests can be done directly on the return value of finfo().
The following three examples all result in the same value for
$size. However, the first two avoid the overhead of the implicit
stat() performed by the last.
my $size = -s $r->finfo;
$r->finfo; my $size = -s _;
my $size = -s $r->filename;
It is possible for a module to be called upon to process a URL that does not correspond to a physical file. In this case, the stat() structure will contain the result of testing for a nonexistent file, and Perl's various file test operations will all return false.
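The behavior of the _ filehandle can be tried in an ordinary Perl script. The following sketch uses no Apache calls: it performs one explicit stat() on a scratch file (created here with File::Temp purely for illustration), reuses the cached result for several file tests, and then shows that every test on a nonexistent path is false.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so there is something to stat.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close $fh;

# One explicit stat() fills Perl's cache...
stat($path) or die "can't stat $path: $!";

# ...and these tests reuse it through the special _ filehandle,
# without another system call.
my $size     = -s _;
my $is_dir   = -d _;
my $is_plain = -f _;

# A path that does not exist: the file tests return false.
my $missing = "$path.does-not-exist";
my $exists  = -e $missing;

printf "%s: %d bytes, directory: %s, plain file: %s\n",
    $path, $size, $is_dir ? 'yes' : 'no', $is_plain ? 'yes' : 'no';
printf "%s exists: %s\n", $missing, $exists ? 'yes' : 'no';
```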
The Apache::Util package contains a number of routines that are useful for manipulating the contents of the stat structure. For example, the ht_time() routine turns Unix timestamps into HTTP-compatible human readable strings. See the Apache::Util manpage and The Apache::Util Class section later in this chapter for more details.
Example:
use Apache::Util qw(ht_time);
if(-d $r->finfo) {
printf "%s is a directory\n", $r->filename;
}
else {
printf "Last Modified: %s\n", ht_time((stat _)[9]);
}
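To see what kind of string such a conversion produces, here is a minimal stand-alone sketch. The http_date() function is a hypothetical stand-in written for this illustration, not the Apache::Util implementation; it formats epoch seconds as a GMT date in the style used by HTTP headers. In a handler the timestamp would come from something like (stat _)[9] rather than a literal.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical stand-in for ht_time(): format epoch seconds as a
# GMT date string of the kind used in HTTP headers.
sub http_date {
    my $time = shift;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($time);
    return sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT",
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900,
        $hour, $min, $sec;
}

# A fixed timestamp keeps the example repeatable.
my $mtime = 784111777;
print "Last Modified: ", http_date($mtime), "\n";
# prints "Last Modified: Sun, 06 Nov 1994 08:49:37 GMT"
```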
POST and
PUT requests. This protocol exactly mirrors the C language API described in
Chapter 10 and provides for timeouts and other niceties. Although the Perl
API supports them, Perl programmers should generally use the simpler read() method instead.
This method takes an optional argument. The type of lookup performed by this method is affected by this argument as well as the value of the HostNameLookups directive. Possible arguments to this method, whose symbolic names can be imported from the Apache::Constants module using the :remotehost import tag, are one of:
In recent versions of Apache, double-reverse name lookups are always performed for the name-based access checking implemented by mod_access.
my $remote_host = $r->get_remote_host;
# same as above
use Apache::Constants qw(:remotehost);
my $remote_host = $r->get_remote_host(REMOTE_NAME);
# double-reverse DNS lookup
use Apache::Constants qw(:remotehost);
my $remote_host = $r->get_remote_host(REMOTE_DOUBLE_REV) || "nohost";
The success of the call also depends on the status of the IdentityCheck configuration directive. Since identity checks can adversely impact Apache's performance, this directive is off by default.
Example:
my $remote_logname = $r->get_remote_logname;
Examples:
my %headers_in = $r->headers_in;
my $headers_in = $r->headers_in;
Once you have copied the headers to a hash, you can refer to them by name. See Table 9.1 for a list of incoming headers that you may need to use. For example, you can view the length of the data that the client is sending by retrieving the key ``Content-length'':
%headers_in = $r->headers_in;
my $cl = $headers_in{'Content-length'};
You'll need to be aware that browsers are not required to be consistent in their capitalization of header field names. For example, some may refer to ``Content-Type'' and others to ``Content-type''. The Perl API copies the field names into the hash as is, and like any other Perl hash, the keys are case-sensitive. This is a potential trap.
For these reasons it's better to call headers_in() in a scalar context and use the returned tied hash. Since Apache::Table sits on top of the C table API, lookup comparisons are performed in a case-insensitive manner. The tied interface also allows you to add or change the value of a header field, in case you want to modify the request headers seen by handlers downstream. This code fragment shows the tied hash being used to get and set fields:
my $headers_in = $r->headers_in;
my $cl = $headers_in->{'Content-Length'};
$headers_in->{'User-Agent'} = 'Block this robot';
It is often convenient to refer to header fields without creating an intermediate hash or assigning a variable to the Apache::Table reference. This is the usual idiom:
my $cl = $r->headers_in->{'Content-Length'};
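The capitalization trap, and the effect of a case-insensitive lookup, can be sketched with ordinary hashes. The header_lookup() helper below is hypothetical and written only for this illustration; it approximates what the tied Apache::Table interface does for you, under the assumption that the headers have been copied into a plain hash.

```perl
use strict;
use warnings;

# Headers as two different clients might capitalize them.
my %from_client_a = ('Content-Type' => 'text/html');
my %from_client_b = ('Content-type' => 'text/html');

# Hypothetical helper: look a field up without regard to case.
sub header_lookup {
    my ($headers, $wanted) = @_;
    for my $name (keys %$headers) {
        return $headers->{$name} if lc $name eq lc $wanted;
    }
    return;
}

for my $headers (\%from_client_a, \%from_client_b) {
    # A plain hash lookup is case-sensitive and misses client B.
    my $plain  = $headers->{'Content-Type'};
    my $folded = header_lookup($headers, 'Content-Type');
    printf "plain: %-10s case-insensitive: %s\n",
        defined $plain ? $plain : '(undef)', $folded;
}
```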
Certain request header fields, such as ``Accept'' and ``Cookie'', are multivalued. When you retrieve their values, they will be packed together into one long string separated by commas. You will need to parse the individual values out yourself. Individual values can include parameters, which will be separated by semicolons. Cookies are common examples of this:
Set-Cookie: SESSION=1A91933A; domain=acme.com; expires=Wed, 21-Oct-1998 20:46:07 GMT
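As a rough illustration of parsing such a string yourself, the sketch below splits an ``Accept''-style value on commas and then on semicolons. The parse_multivalued() helper and the sample header are made up for this example. Note that a naive comma split like this one is only suitable for fields whose values cannot themselves contain commas; it would mangle a date such as the expires parameter shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a multivalued header into its values,
# each paired with a hash of the parameters that followed it.
sub parse_multivalued {
    my $header = shift;
    my @values;
    for my $item (split /\s*,\s*/, $header) {
        my ($value, @params) = split /\s*;\s*/, $item;
        my %params;
        for my $param (@params) {
            my ($k, $v) = split /\s*=\s*/, $param, 2;
            $params{$k} = defined $v ? $v : '';
        }
        push @values, { value => $value, params => \%params };
    }
    return @values;
}

my $accept = 'text/html, text/plain;q=0.8, image/png;q=0.5;level=1';
my @parsed = parse_multivalued($accept);
for my $entry (@parsed) {
    # A missing q parameter is treated as 1 in this sketch.
    my $q = exists $entry->{params}{q} ? $entry->{params}{q} : 1;
    print "$entry->{value} (q=$q)\n";
}
```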
A few clients send headers with the same key on multiple lines. In this case you can use the Apache::Table::get() method to retrieve all of the values at once.
For full details on the various incoming headers, see the documents at http://www.w3.org/Protocols. Non-standard headers, such as those that experimental browsers transmit, can also be retrieved with this method call.
Field             | Description
-----------------------------------------------------------------------------
Accept            | MIME types that client accepts
Accept-encoding   | Compression methods that client accepts
Accept-language   | Language(s) that client accepts
Authorization     | Used by various authorization/authentication schemes
Connection        | Connection options, such as Keep-alive
Content-length    | Length, in bytes, of data to follow
Content-type      | MIME type of data to follow
Cookie            | Client-side Data
From              | E-mail address of the requesting user (deprecated)
Host              | Virtual host to retrieve data from
If-modified-since | Return document only if modified since specified
If-none-match     | Return document if it has changed
Referer           | URL of document that linked to the requested one
User-agent        | Name and version of the client software
undef, the header will be removed from the list of header fields:
my $cl = $r->header_in('Content-length');
$r->header_in($key, $val); #set the value of header '$key'
$r->header_in('Content-length' => undef); #remove the header
The key lookup is done in a case insensitive manner. The header_in() method predates the Apache::Table class, but remains for backwards compatibility and as a bit of a shortcut to using the headers_in method.
HEAD request it wants to receive the HTTP response headers only. Content
handlers should check for this by calling header_only() before generating the document body. The method will return true in the
case of a HEAD request, and false in the case of other requests. Alternatively, you could
examine the string value returned by method() directly, although this would be less portable if the HTTP protocol were
some day expanded to support more than one header-only request method.
Example:
# generate the header & send it
$r->send_http_header;
return OK if $r->header_only;
# now generate the document...
Do not try to check the numeric value returned by method_number() to identify a header request. Internally, Apache uses the M_GET
number for both HEAD and GET methods.
GET, HEAD or POST. Passing an argument will change the method, which is occasionally useful
for internal redirects (Chapter 4) and for testing authorization
restriction masks (Chapter 6).
Examples:
my $method = $r->method;
$r->method('GET');
If you update the method, you probably want to update the method_number accordingly as well.
M_GET, M_POST,
M_PUT and M_DELETE. Passing an argument will set this value, mainly of use for internal
redirects and for testing authorization restriction masks. If you update
the method number, you probably want to update the method() accordingly as well.
Note that there isn't an M_HEAD constant. This is because Apache sets the method number to M_GET when it receives a HEAD request and sets header_only() to return true.
Example:
use Apache::Constants qw(:methods);
if ($r->method_number == M_POST) {
# change the request method
$r->method_number(M_GET);
$r->method("GET");
$r->internal_redirect('/new/place');
}
There is no particular advantage to using method_number() over method() for Perl programmers, other than its being very slightly more efficient.
uri_components structure. The parsed_uri()
method will return an object blessed into the Apache::URI class, which provides methods for fetching and setting various parts of the
URI. See The Apache::URI Class for details.
Example:
use Apache::URI ();
my $uri = $r->parsed_uri;
my $host = $uri->hostname;
If you provide an argument to path_info(), you can change the value of the additional path information.
Examples:
my $path_info = $r->path_info;
$r->path_info("/some/additional/information");
Note that in most cases, changing the path_info() requires you to sync the uri() with the update. In this example, we calculate the original uri minus any path info, change the existing path info, then properly update the uri:
my $path_info = $r->path_info;
my $uri = $r->uri;
my $orig_uri = substr $uri, 0, length($uri) - length($path_info);
$r->path_info($new_path_info);
$r->uri($orig_uri . $r->path_info);
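The string arithmetic involved can be checked on its own. This sketch uses literal values in place of the request object; the URI and path information shown are made-up examples, not values taken from any particular configuration.

```perl
use strict;
use warnings;

# Values a request might carry: the script lives at /cgi-bin/search
# and everything after it is additional path information.
my $uri       = '/cgi-bin/search/by/author';
my $path_info = '/by/author';

# Strip the old path info from the end of the URI...
my $orig_uri = substr $uri, 0, length($uri) - length($path_info);

# ...then append the replacement to get the URI to store back.
my $new_path_info = '/by/title';
my $new_uri       = $orig_uri . $new_path_info;

print "base:    $orig_uri\n";
print "new uri: $new_uri\n";
```

Here $orig_uri comes out as /cgi-bin/search and $new_uri as /cgi-bin/search/by/title, which keeps the two values consistent with each other.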
my $protocol = $r->protocol;
This method is read-only.
Example:
sub handler {
my $r = shift;
return DECLINED unless $r->proxyreq;
# do something interesting...
}
POST and PUT
requests. It should be used when the information submitted by the browser
is not in the application/x-www-form-urlencoded format that the content() method knows how to handle.
Call read() with a scalar variable to retrieve the read data, and the length of the data to read. Generally you will want to ask for the entire data sent by the client, which can be recovered from the incoming Content-length field:*
my $buff;
$r->read($buff, $r->header_in('Content-length'));
Internally, the Perl API sets up a timeout in case the client breaks the connection prematurely. The exact value of the timeout is set by the Timeout directive in the server configuration file. If a timeout does occur, the script will be aborted.
Within a handler you may also recover client data by simply reading from
STDIN using Perl's read(), getc() and readline (<>) functions. This works because the Perl API ties STDIN to
Apache::read() before entering handlers.
*At the time of this writing, HTTP/1.1 requests that do not have a Content-Length header, such as those that use chunked encoding, are not properly handled by this API.
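The pattern of reading a declared number of bytes into a buffer can be tried without a server. In the sketch below an in-memory filehandle stands in for the client connection and a literal string stands in for the request body; both are assumptions made for illustration. It uses Perl's built-in read(), which takes the same buffer and length arguments, plus an optional offset.

```perl
use strict;
use warnings;

# Stand-ins for the request: a body and the length the client declared.
my $body           = 'name=Fred&comment=Yabba+dabba+doo';
my $content_length = length $body;

# An in-memory filehandle plays the part of the client connection.
open my $in, '<', \$body or die "can't open in-memory handle: $!";

# read() may return fewer bytes than requested, so loop until the
# declared length has arrived or the input runs out.
my $buff = '';
while (length($buff) < $content_length) {
    my $got = read $in, $buff, $content_length - length($buff), length($buff);
    die "read failed: $!" unless defined $got;
    last if $got == 0;    # end of input before the declared length
}
close $in;

print "read ", length($buff), " of $content_length bytes\n";
```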
Example:
my $s = $r->server;
This method is read-only.
Example:
my $request_line = $r->the_request;
print LOGFILE $request_line;
Note that the_request() is functionally equivalent to this code fragment:
my $request_line = join ' ', $r->method, $r->uri, $r->protocol;
Examples:
my $uri = $r->uri;
$r->uri("/something/else");
Most of the methods in this section are concerned with setting the values of the outgoing HTTP response header fields. We give a list of all of the fields you are likely to use in Table 9.2. For a comprehensive list, see the HTTP/1.0 and HTTP/1.1 specifications found at http://www.w3.org/Protocols.
Field             | Description
-----------------------------------------------------------------------------
Allowed           | The methods allowed by this URI, such as POST
Content-Encoding  | The compression method of this data
Content-Language  | The language in which this document is written
Content-Length    | Length, in bytes, of data to follow
Content-Type      | MIME type of this data
Date              | The current date (GMT)
Expires           | Date the document expires
Last-Modified     | Date the document was last modified
Link              | The URL of this document's "parent," if any
Location          | The location of the document in redirection responses
ETag              | Opaque ID for this version of the document
Message-Id        | The ID of this document, if any
MIME-Version      | The version of MIME used (currently 1.0)
Pragma            | Hints to the browser, such as "no-cache"
Public            | The requests that this URL responds to (rarely used)
Server            | Name and version of the server software
Set-Cookie        | Give the browser a client-side cookie
WWW-Authenticate  | Used in the various authorization schemes
Vary              | Criteria that can be used to select this document
Example:
my $bytes_sent = $r->bytes_sent;
Table 9.3 lists the headers that trigger special actions by cgi_header_out().
Header | Actions
-----------------------------------------------------------------------------
Content-Type | Set $r->content_type to the given value
Status | Set $r->status to the integer value in the string
| Set $r->status_line to the given value
Location | Set Location in the headers_out table to the given value
| and perform an internal redirect if URI is relative
Content-Length | Set Content-Length in the headers_out table to the
| given value
Transfer-Encoding | Set Transfer-Encoding in the headers_out table to
| the given value
Last-Modified | Parse the string date, feeding the time value to
| ap_update_mtime() and invoke ap_set_last_modified()
Set-Cookie | Call ap_table_add() to support multiple Set-Cookie headers
Other | Call ap_table_merge() with given key and value
You generally can use the Apache::Table or header_out() methods to achieve the results you want. cgi_header_out() is provided for those who wish to create a CGI emulation layer, such as Apache::Registry. Those who are designing such a system should also look at send_cgi_header(), described below in Sending Data to the Client.
Getting or setting content_encoding() is equivalent to using headers_out() or header_out() to change the value of the ``Content-encoding'' header. Chapters 4 and 7 give examples of querying and manipulating the content encoding field.
Examples:
my $enc = $r->content_encoding;
if($r->filename =~ /\.gz$/) {
$r->content_encoding("gzip");
}
content_languages() is a convenient interface to the lower-level header_out and headers_out methods.
Examples:
my $languages = $r->content_languages;
$r->content_languages(['en']);
Examples:
my $ct = $r->content_type;
$r->content_type('text/plain');
OK, DECLINED or
DONE, Apache aborts processing and throws an error. When an error is thrown,
application programs can catch it and replace Apache's default processing
with their own custom error handling routines by using the ErrorDocument
configuration directive. The arguments to ErrorDocument are the status code to catch and a custom string, static document, or CGI
script to invoke when the error occurs.
The module-level interface to Apache's error handling system is custom_response(). Like the directive, the method call takes two arguments.* The first argument is a valid response code from Table 3.1. The second is either a string to return in response to the error, or a URI to invoke to handle the request. This URI can be a static document, a CGI script, or even a content handler in an Apache module. Chapters 4 and 6 have more extensive coverage of the error handling system.
Examples:
use Apache::Constants qw(:common);
$r->custom_response(AUTH_REQUIRED, "sorry, I don't know you.");
$r->custom_response(SERVER_ERROR, "/perl/server_error_handler.pl");
Unlike ordinary header fields, error fields are sent to the browser even when the module aborts or returns an error status code. This allows modules to do such things as setting cookies when errors occur, or implementing custom authorization schemes. Error fields also persist across internal redirects when one content handler passes the buck to another. This feature is necessary to support the ErrorDocument mechanism.
Examples:
my %err_headers_out = $r->err_headers_out;
my $err_headers_out = $r->err_headers_out;
$r->err_headers_out->{'X-Odor'} = "Something's rotten in Denmark";
Example:
my $loc = $r->err_header_out('Location');
$r->err_header_out(Location => 'http://www.modperl.com/');
$r->err_header_out(Location => undef);
When called in a scalar context, this method returns a hash reference tied to the Apache::Table class. This class provides an interface to the underlying headers_out data structure. Fetching a key from the tied hash will retrieve the corresponding HTTP field in a case insensitive fashion, and assigning to the hash will change the value of the header so that it is seen by other handlers further down the line, and ultimately affects the header that is sent to the browser.
The headers that are set with headers_out() are cleared when an error occurs, and do not persist across internal redirects (in which a module hands off its content-handling responsibility to a different URI). To create headers that persist across errors and internal redirects, use err_headers_out(), described above.
Examples:
my %headers_out = $r->headers_out;
my $headers_out = $r->headers_out;
$headers_out->{'Set-Cookie'} = 'SESSION_ID=3918823';
The ``Content-type'', ``Content-encoding'' and ``Content-language'' response fields have special meaning to the Apache server and its modules. These fields occupy their own slots of the request record itself and should always be accessed using their dedicated methods rather than the generic headers_out() method. If you forget, and use headers_out() instead, Apache and other modules may not recognize your changes, leading to confusing results. In addition, the ``Pragma: no-cache'' idiom, used to tell browsers not to cache the document, should be set indirectly using the no_cache() method.
The many features of the Apache::Table class are described in more detail in its own section.
If passed a single argument, header_out() returns the value of the corresponding field from the outgoing HTTP response header. If passed a key/value pair, header_out() changes the value of the corresponding header field. A field can be removed entirely by passing undef as its value. The key lookups are done in a case-insensitive manner.
Examples:
my $loc = $r->header_out('Location');
$r->header_out(Location => 'http://www.modperl.com/');
$r->header_out(Location => undef);
Chapter 7 gives examples of how to use handler() to create a handler that dispatches to other modules based on the document's type.
Example:
my $handler = $r->handler;
if($handler eq "cgi-script") {
warn "shame on you. Fixing.\n";
$r->handler('perl-script');
}
handler() cannot be used to set handlers for anything but the response phase. Use set_handlers() or push_handlers() to change the handlers for other phases (see mod_perl Specific Methods).
Examples:
$current_flag = $r->no_cache();
$r->no_cache(1);   # set no-cache to true
Unlike most of the other methods, this one is read only.
Example:
my $date = scalar localtime $r->request_time;
warn "request started at $date";
*In case you were wondering, the epoch began at 00:00:00 GMT
on January 1, 1970, and is due to end in 2038. There's probably a good
explanation for this choice.
OK status code, but actually send the browser a non-OK status.
Call the method with no arguments to retrieve the current status code. Call it with a numeric value to set the status. Constants for all the standard status codes can be found in Apache::Constants.
Examples:
use Apache::Constants qw(:common);
my $rc = $r->status;
$r->status(SERVER_ERROR);
Example:
my $status_line = $r->status_line;
$r->status_line("200 Bottles of Beer on the Wall");
If you update the status line, you probably want to update status() accordingly as well.
The print() method is similar to Perl's built-in print() function except that all the data you print eventually winds up being displayed on the user's browser. Like the built-in print() this method will accept a variable number of strings to print out. However, the Apache print() method does not accept a filehandle argument for obvious reasons.
Like the read() method, print() sets a timeout so that if the client connection is broken the handler won't hang around indefinitely trying to send data. If a timeout does occur, the script will be aborted.
The method also checks the Perl autoflush global
$|. If the variable is non-zero, print() will flush the buffer after every command, rather than after every line.
This is consistent with the way the built-in print() works.
Example:
$r->print("hello" , " ", "world!");
An interesting feature of the Apache Perl API is that the STDOUT filehandle is tied to Apache so that if you use the built-in print() to print to standard output, the data will be redirected to the request object's print() method. This allows CGI scripts to run unmodified under Apache::Registry, and also allows one content handler's output to be transparently ``chained'' to another handler's input. The TieHandle Interface section later in this chapter goes into more detail on how filehandles can be tied to the Perl API, and Chapter 4 has more to say about chained handlers.
Example:
print "hello world!"; # automatically invokes Apache::print()
There is also an optimization built into print(). If any of the arguments to the method are scalar references to strings, they are automatically dereferenced for you. This avoids needless copying of large strings when passing them to subroutines.
Example:
$a_large_string = join '', <GETTYSBURG_ADDRESS>;
$r->print(\$a_large_string);
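The idea of accepting either a string or a reference to one can be sketched in plain Perl. The print_all() subroutine below is a hypothetical illustration of the technique, not the code behind the Apache method, and it writes to an in-memory filehandle so that the result can be inspected.

```perl
use strict;
use warnings;

# Hypothetical printer that accepts plain strings and references to
# strings in the same argument list.
sub print_all {
    my $out = shift;    # filehandle to write to
    for my $arg (@_) {
        # Scalar references are dereferenced at the point of output.
        print {$out} ref($arg) eq 'SCALAR' ? $$arg : $arg;
    }
}

my $a_large_string = "Four score and seven years ago...\n" x 3;

my $captured = '';
open my $fh, '>', \$captured or die "can't open in-memory handle: $!";
print_all($fh, "Header\n", \$a_large_string, "Footer\n");
close $fh;

print $captured;
```

Passing \$a_large_string hands the subroutine a small reference rather than the string itself, which is the saving the optimization is after.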
Example:
$r->printf("Hello %s", $r->connection->user);
Don't call rflush() if you don't need to, as it causes a performance hit.* This method is also
called automatically after each
print() if the Perl global variable $| is set to non-zero.
Example:
$r->rflush;
*If you are wondering why this method has an r prefix, it is carried over from the C API I/O methods (described in Chapter
10), all of which have an ap_r prefix. This is the only I/O method from the group for which there is a
direct Perl interface. If you find that the r
prefix is not pleasing to the eye, this is no accident. It is intended to
discourage the use of rflush() due to the performance implications.
Don't forget to put a blank line at the end of the headers, just as a CGI script would:
$r->send_cgi_header(<<EOF);
Status: 200 Just Fine
Content-type: text/html
Set-Cookie: open=sesame

EOF
You're welcome to use this method even if you aren't emulating the CGI environment, since it provides a convenient one-shot way to set and send the entire HTTP header. However, there is a performance hit associated with parsing the header string.
As an aside, this method is used to implement the behavior of the PerlSendHeader directive. When this directive is set to ``On'', mod_perl scans the first lines of text printed by the content handler until it finds a blank line. Everything above the blank line is then sent to send_cgi_header().
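The scan for a blank line can be pictured with a short stand-alone sketch. The split_cgi_output() helper is hypothetical and is not how mod_perl implements PerlSendHeader; it simply divides CGI-style output at the first blank line and parses the header lines above it.

```perl
use strict;
use warnings;

# Hypothetical helper: split CGI-style output at the first blank line
# and return the parsed header fields plus the remaining body.
sub split_cgi_output {
    my $output = shift;
    my ($head, $body) = split /\r?\n\r?\n/, $output, 2;
    my @fields;
    for my $line (split /\r?\n/, $head) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/
            or die "malformed header line: $line";
        push @fields, [$name, $value];
    }
    return (\@fields, defined $body ? $body : '');
}

my $output = <<'EOF';
Status: 200 Just Fine
Content-type: text/html
Set-Cookie: open=sesame

<html><body>Hello</body></html>
EOF

my ($fields, $body) = split_cgi_output($output);
print "$_->[0] => $_->[1]\n" for @$fields;
print "body: $body";
```

Keeping the fields as an ordered list of pairs, rather than a hash, preserves repeated fields such as multiple Set-Cookie lines.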
This method is generally used by content handlers that wish to send the browser the unmodified contents of a file.
Example:
my $fh = Apache::gensym();   # generate a new filehandle name
open($fh, $r->filename) || return NOT_FOUND;
$r->send_fd($fh);
close($fh);
Because setting the document's MIME type is such a common operation, the Perl version of this API call allows you to save a few keystrokes by specifying the content type as an optional argument to send_http_header(). This is exactly equivalent to calling content_type() followed by send_http_header().
Examples:
$r->send_http_header;
$r->send_http_header('text/plain');
A content type passed to send_http_header() will override any previous calls to content_type().
Example:
$r->chdir_file($r->filename);
Example:
$r->child_terminate;
The hard_timeout() method initiates a ``hard'' timeout. If the client read or write operation takes longer than the time specified by Apache's Timeout directive, then the current handler will be aborted immediately and Apache will immediately enter the logging phase. hard_timeout() takes a single string argument which should contain the name of your module or some other identification. This identification will be incorporated into the error message that is written to the server error log when the timeout occurs.
soft_timeout(), in contrast, does not immediately abort the current handler. Instead, when a timeout occurs control returns to the handler, but all read and write operations are replaced with no-ops so that no further data can be sent to or received from the client. In addition, the Apache::Connection object's aborted() method will return true. As with hard_timeout(), you should pass this method the name of your module in order to be able to identify the source of the timeout in the error log.
The reset_timeout() method can be called to set a previously initiated timer back to zero. It is usually used between a series of read or write operations in order to avoid killing the timeout and restarting it completely.
Finally, the kill_timeout() method is called to cancel a previously initiated timeout. It is generally called when a series of I/O operations is completely done.
The examples below will give you the general idea of how these four methods are used. Remember, however, that in the Perl API these methods are not really necessary because they are called internally by the read() and print() methods.
# typical hard_timeout() usage
$r->hard_timeout("Apache::Example while reading data");
while (... read data loop ...) {
...
$r->reset_timeout;
}
$r->kill_timeout;
# typical soft_timeout() usage
$r->soft_timeout("Apache::Example while reading data");
while (... read data loop ...) {
...
$r->reset_timeout;
}
$r->kill_timeout;
The required argument is an absolute URI path on the current server. The server will process the URI as if it were a whole new request, running the URI translation, MIME type checking, and other phases before invoking the appropriate content handler for the new URI. The content handler that eventually runs is not necessarily the same as the one that invoked internal_redirect(). This method should only be called within a content handler.
Do not use internal_redirect() to redirect to a different server. You'll need to do a full redirect for that. Both redirection techniques are described in more detail in the next chapter.
Example:
$r->internal_redirect("/new/place");
Apache implements its ErrorDocument feature as an internal redirect, so many of the techniques that apply to internal redirects also apply to custom error handling.
Example:
$r->internal_redirect_handler("/new/place");
With the exception of the logging phase, which is run just once for the primary request, secondary requests are run through each of the transaction processing phases, and the appropriate handlers are called each time. There may be times when you don't want a particular handler running on a subrequest or internal redirect, either to avoid performance overhead or to avoid infinite recursion. The is_initial_req() method will return a true value if the current request is the primary one, and false if the request is the result of a subrequest or an internal redirect.
Example:
return DECLINED unless $r->is_initial_req;
is_main() is commonly used to prevent infinite recursion when a handler gets reinvoked after it has made a subrequest.
return DECLINED unless $r->is_main;
Like is_initial_req() this is a read-only method.
main() will return the request object of the parent request, the top of the chain. last() will return the last request in the chain. prev() and next() will return the previous and next requests in the chain, respectively. Each of these methods will return a reference to an object belonging to the Apache class, or undef if the request doesn't exist.
The prev() method is handy inside an ErrorDocument handler to get at the information from the request that triggered the error. For example, this code fragment will find the URI of the failed request:
my $failed_uri = $r->prev->uri;
The last() method is mainly used by logging modules. Since Apache may have performed several subrequests while attempting to resolve the request, the last object will always point to the final result.
Example:
my $bytes_sent = $r->last->bytes_sent;
Should your module wish to log all internal requests, the next() method will come in handy. Example:
sub My::logger {
my $r = shift;
my $first = $r->uri;
my $last = $r->last->uri;
warn "first: $first, last: $last\n";
for (my $rr = $r; $rr; $rr = $rr->next) {
my $uri = $rr->uri;
my $status = $rr->status;
warn "request: $uri, status: $status\n";
}
return OK;
}
Assuming the requested URI was /, which was mapped to /index.html by the DirectoryIndex configuration, the example above would output these messages to the ErrorLog:
first: /, last: /index.html
request: /, status: 200
request: /index.html, status: 200
The next() and main() methods are rarely used, but are included for completeness. Handlers that
need to determine whether they are in the main request should call $r->is_main() rather than
!$r->main(), as the former is marginally more efficient.
For example, given this <Location> section:
<Location /images/dynamic_icons>
SetHandler perl-script
PerlHandler Apache::Icon
</Location>
then location() will return /images/dynamic_icons.
This method is handy for converting the current document's URI into a relative path. Example:
my $base = $r->location;
(my $relative = $r->uri) =~ s/^\Q$base\E//;
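One caution when adapting this idiom: the location is interpolated into a regular expression, so characters in it such as ``+'' or ``.'' are treated as metacharacters unless they are quoted. The following stand-alone sketch uses made-up values in place of the request object and quotes the base with \Q...\E:

```perl
use strict;
use warnings;

# Made-up values standing in for $r->location and $r->uri. The "+"
# characters would be regex metacharacters if left unquoted.
my $base = '/images/c++_icons';
my $uri  = '/images/c++_icons/folder.gif';

# \Q...\E quotes $base so that it is matched literally.
(my $relative = $uri) =~ s/^\Q$base\E//;

print "relative: $relative\n";
```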
Both methods take a single argument corresponding to an absolute filename or a URI path respectively. lookup_uri() performs the URI translation on the provided URI, passing the request to the access control and authorization handlers, if any, and then proceeds to the MIME type checking phase. lookup_file() behaves similarly, but bypasses the initial URI translation phase and treats its argument as a physical file path.
Both methods return an Apache::SubRequest object, which is identical for all intents and purposes to a plain old Apache request object, as it inherits all methods from the Apache class. You can call the returned object's content_type(), filename() and other methods to retrieve the information left there during subrequest processing.
The subrequest mechanism is extremely useful, and there are many practical examples of using it in Chapters 4, 5 and 6. The following code snippets show how to use subrequests to look up the content type of a file and a URI:
my $subr = $r->lookup_file('/home/http/htdocs/images/logo.tif');
my $ct = $subr->content_type;
my $ct = $r->lookup_uri('/images/logo.tif')->content_type;
In the lookup_uri() example, /images/logo.tif will be passed through the same series of Alias, ServerRoot and URI rewriting translations that the URI would be subjected to if it were requested by a browser.
If you need to pass certain HTTP header fields to the subrequest, such as a particular value of Accept, you can do so by calling headers_in() before invoking lookup_uri() or lookup_file().
It is often a good idea to check the status of a subrequest in case something went wrong. If the subrequest was successful, the status value will be that of HTTP_OK. Example:
use Apache::Constants qw(:common HTTP_OK);
my $subr = $r->lookup_uri("/path/file.html");
my $status = $subr->status;
unless ($status == HTTP_OK) {
die "subrequest failed with status: $status";
}
When called with two arguments this method sets a note. When called with a single argument, it retrieves the value of that note. Both the keys and the values must be simple strings.
Examples:
$r->notes('CALENDAR' => 'Julian');
my $cal = $r->notes('CALENDAR');
When called in a scalar context with no arguments, a hash reference tied to the Apache::Table class will be returned. Example:
my $notes = $r->notes;
my $cal = $notes->{CALENDAR};
This method comes in handy for communication between a module written in Perl and one written in C. For example, the logging API saves error messages under a key named ``error-notes'', which could be used by ErrorDocuments to provide a more informative error message.
The LogFormat directive, part of the standard mod_log_config
module, can incorporate notes into log messages using the formatting
sequence %{name}n, where name is the key of the note. See the Apache documentation for details.
my $env = $r->subprocess_env;
my $docroot = $env->{'DOCUMENT_ROOT'};
Call the method with a single argument to retrieve the current value of the corresponding entry in the environment table, or undef if no entry by that name exists:
my $doc_root = $r->subprocess_env("DOCUMENT_ROOT");
You may also call the method with a key/value pair to set the value of an entry in the table:
$r->subprocess_env(DOOR => "open");
Finally, if you call subprocess_env() in a void context with no arguments, it will reinitialize the table to contain the standard variables that Apache adds to the environment before invoking CGI scripts and server-side include files:
$r->subprocess_env;
Changes made to the environment table only persist for the length of the request. The table is cleared out and reinitialized at the beginning of every new transaction.
In the Perl API, the primary use for this method is to set environment
variables for other modules to see and use. For example, a fixup handler
could use this call to set up environment variables that are later
recognized by mod_include and incorporated into server-side include pages. You do not ordinarily need
to call subprocess_env()
to read environment variables, because mod_perl automatically copies the
environment table into the Perl %ENV hash before entering the response handler phase.
A potential confusion arises when a Perl API handler needs to launch a
subprocess itself using system(), backticks, or a piped open. If you need to pass environment variables to
the subprocess, set the appropriate keys in %ENV just as you would in an ordinary Perl script.
subprocess_env() is only required if you need to change the environment in a subprocess
launched by a different handler or module.
The method expects a code reference argument:
sub callback {
my $r = shift;
my $uri = $r->uri;
warn "process $$ all done with $uri\n";
}
$r->register_cleanup(\&callback);
The PerlSetVar directive can occur in the main part of a configuration file, in a <VirtualHost>, <Directory>, <Location> or <Files> section, or in a .htaccess file. It takes a key/value pair separated by whitespace.
In the following two examples, the first directive sets a key named ``Gate'' to a value of ``open''. The second sets the same key to a value of ``wide open and beckoning''. Notice how quotes are used to protect arguments that contain whitespace:
PerlSetVar Gate open
PerlSetVar Gate "wide open and beckoning"
Configuration files can contain any number of PerlSetVar directives. If multiple directives try to set the same key, the usual rules of directive precedence apply. A key defined in a .htaccess file has precedence over a key defined in a <Directory>, <Location>, or <Files> section, which in turn has precedence over a key defined in a <VirtualHost> section. Keys defined in the main body of the configuration file have the lowest precedence of all.
Configuration keys set with PerlSetVar can be recovered within Perl handlers using dir_config(). The interface is simple. Called with the name of a key, dir_config() looks up the key and returns its value if found, or undef otherwise.
Example:
my $value = $r->dir_config('Gate');
If called in a scalar context with no arguments, dir_config() returns a hash reference tied to the Apache::Table class. See The Apache::Table Class for details.
my $dir_config = $r->dir_config;
my $value = $dir_config->{'Gate'};
Only scalar values are allowed in configuration variables set by
PerlSetVar. If you want to pass an array or hash, separate the items by a character
that doesn't appear elsewhere in the string and call split()
to break the retrieved variable into its components.
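As a minimal sketch of this technique, the fragment below unpacks a list and a hash from delimited strings. The key name AllowedHosts, the separators, and the values are all invented for illustration; in a real handler the strings would come from $r->dir_config() rather than from literals.

```perl
use strict;
use warnings;

# Hypothetical value, standing in for $r->dir_config('AllowedHosts')
# after a directive such as:  PerlSetVar AllowedHosts "alpha:beta:gamma"
my $raw = 'alpha:beta:gamma';

# Break the scalar into a list on the chosen separator.
my @hosts = split /:/, $raw;

# A hash can be packed the same way, using two separators:
# a comma between pairs and an equals sign inside each pair.
my $raw_map = 'red=stop,green=go';
my %map;
for my $pair (split /,/, $raw_map) {
    my ($key, $value) = split /=/, $pair, 2;
    $map{$key} = $value;
}

print join(', ', @hosts), "\n";
print "$_ => $map{$_}\n" for sort keys %map;
```

The separator has to be a character that can never occur in the data itself; otherwise an item will be split in the wrong place.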
Example:
my $doc_root = $r->document_root;
If you are used to using the environment variable DOCUMENT_ROOT within your CGI scripts in order to resolve URIs into physical pathnames, be aware that there's a much better way to do this in the Apache API. Perform a subrequest with the URI you want to resolve, and then call the returned object's filename() method. This works correctly even when the URI is affected by Alias directives or refers to user-maintained virtual directories:
my $image = $r->lookup_uri('/~fred/images/cookbook.gif')->filename;
If you're interested in fetching the physical file corresponding to the current request, call the current request object's filename() method:
my $file = $r->filename;
Example:
my $port = $r->get_server_port;
If UseCanonicalName is configured to be On (the default), this method will return the value of the Port configuration directive. If no Port directive is present, the default port 80 is returned. If UseCanonicalName is Off and the client sent a Host header, then the method returns the actual port specified in that header, regardless of the value of the Port directive.
Example:
my $name = $r->get_server_name;
This method is sensitive to the value of the UseCanonicalName configuration directive. If UseCanonicalName is On (the default), the method will always return the value of the current ServerName configuration directive. If UseCanonicalName is Off, then this method will return the value of the incoming request's Host header if present, or the value of the ServerName directive otherwise. These values can be different if the server has several different DNS names.
The lower-level server_name() method in the Apache::Server class always acts as if UseCanonicalName were on.
Examples:
# return ServerRoot
my $ServerRoot = $r->server_root_relative;
# return $ServerRoot/logs/my.log
my $log = $r->server_root_relative("logs/my.log");
The server_root_relative method can also be invoked without a request object by calling it directly from the Apache class. The example below, which might be found at the beginning of a Perl startup file, first imports the Apache module, and then uses server_root_relative() to add a site-specific library directory to the search path. It does this in a BEGIN {} block to ensure that this code is evaluated first. It then loads a local module named My::App, which presumably will be found in the site-specific directory.
#!/usr/bin/perl
# modify the search path
BEGIN {
use Apache ();
use lib Apache->server_root_relative("lib/my_app");
}
use My::App ();
First we cover the interface to the earlier API. Later we'll discuss the Apache::Log class, which implements the 1.3 interface.
*In fact, the loglevel API now provides direct syslog support.
See the Apache documentation for the ErrorLog directive, which explains how to enable logging via syslog.
For example, this code:
$r->log_error("Can't open index.html $!");
results in the following ErrorLog entry:
[Tue Jul 21 16:28:51 1998] [error] Can't open index.html No such file or directory
[$DATE] [error] access to $URI failed for $HOST, reason: $MESSAGE
where $DATE is the time and date of the request,
$URI is the requested URI, $HOST is the remote
host, and $MESSAGE is a message that you provide. For example,
this code fragment:
$r->log_reason("Can't open index.html $!");
might generate the following entry in the error log:
[Tue Jul 21 16:30:47 1998] [error] access to /perl/index.pl failed for w15.yahoo.com, reason: Can't open index.html No such file or directory
The argument to log_reason() is the message you wish to display in the error log. If you provide an additional second argument, it will be displayed rather than the URI of the request. This is usually used to display the physical path of the requested file:
$r->log_reason("Can't open file $!", $r->filename);
This type of log message is most often used by content handlers that need to open and process the requested file before transmitting it to the browser, such as server-side include systems.
Example:
$r->warn("Attempting to open index.html");
$r->warn("HTTP dump:\n", $r->as_string);
[Tue Jul 21 16:51:51 1998] [warn] HTTP dump:
GET /perl/index.pl HTTP/1.0
User-Agent: lwp-request/1.32
Host: localhost:9008

200 OK
Connection: close
Content-Type: text/plain
The Apache::Log API provides eight methods named for each of the severity levels. Each acts like the request object's log_error() method, except that it logs the provided message using the corresponding severity level.
In order to use the new logging methods, you must use Apache::Log
in the Perl startup file or within your module. You must then fetch an Apache::Log object by calling the log() method of either an
Apache object ($r->log()) or an Apache::Server object ($r->server->log()). Both objects have access to the same methods described below. However,
the object returned from
$r->log() provides some additional functionality. It will include the client IP
address, in dotted decimal form, with the log message. In addition, the
message will be saved in the request's
notes table, under a key named ``error-notes''. It is the equivalent of the C
language API's ap_log_rerror() function (Chapter 10).
The methods described below can be called with one or more string arguments or a subroutine reference. If a subroutine reference is used, it is expected to return a string which will be used in the log message. The subroutine will only be invoked if the LogLevel is set to the given level or higher. This is most useful to provide verbose debugging information during development, while saving CPU cycles during production.
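To see why a subroutine reference saves work, consider the following plain-Perl imitation of the idea. It is only a sketch and is not how Apache::Log is implemented: log_at() is a hypothetical function that returns the text it would have logged, and the level names are simply ranked in the order of Apache's severity levels.

```perl
use strict;
use warnings;

# Severity names in Apache's order, most severe first.
my @LEVELS = qw(emerg alert crit error warn notice info debug);
my %RANK;
@RANK{@LEVELS} = (0 .. $#LEVELS);

# Hypothetical stand-in for a log object. Returns the text that would be
# logged, or undef when the message is filtered out by the configured level.
sub log_at {
    my ($configured, $level, @args) = @_;
    return undef if $RANK{$level} > $RANK{$configured};

    # Only now is a code reference invoked to build its part of the message.
    my $text = '';
    for my $arg (@args) {
        $text .= ref($arg) eq 'CODE' ? $arg->() : $arg;
    }
    return $text;
}

my $calls     = 0;
my $expensive = sub { $calls++; "The request: (large dump)" };

my $quiet   = log_at('warn',  'debug', $expensive);  # filtered, sub never runs
my $verbose = log_at('debug', 'debug', $expensive);  # sub runs exactly once
```

With the level set to warn, the debugging subroutine is never called, so whatever expensive work it does to build its string is skipped entirely.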
Example:
use Apache::Log ();
my $log = $r->log;          # messages will include client ip address
my $log = $r->server->log;  # message will not include client ip address
$log->emerg("Cannot open lock file!");
$log->alert("getpwuid: couldn't determine user name from uid");
$log->crit("Cannot open configuration database!");
$log->error("Parse of script failed: $@");
$log->warn("No database host specified, using default");
$log->notice("Cannot connect to master database, trying slave $host");
$log->info("CGI.pm version is old, consider upgrading") if
$CGI::VERSION < 2.42;
$log->debug("Reading configuration from file $fname");
$log->debug(sub {
"The request: " . $r->as_string;
});
For example, a script engine such as Apache::Registry or Apache::SSI might want to check if it's OK to execute a script in the current location using this code:
use Apache::Constants qw(:common :options);
unless($r->allow_options & OPT_EXECCGI) {
$r->log_reason("Options ExecCGI is off in this directory",
$r->filename);
return FORBIDDEN;
}
A full list of option constants can be found in the Apache::Constants manual page.
If the requested file or directory is password protected, auth_name() will return the realm name. An authentication module can then use this realm name to figure out which database to authenticate the user against. This method can also be used to set the value of the realm for use by later handlers.
Examples:
my $auth_name = $r->auth_name();
$r->auth_name("Protected Area");
my $auth_type = $r->auth_type;
unless (lc($auth_type) eq "basic") {
$r->warn(__PACKAGE__, " can't handle AuthType $auth_type");
return DECLINED;
}
The differences between Basic and Digest authentication are discussed in Chapter 6.
OK and the second will be the plaintext password entered by the user. Other
possible return codes include DECLINED, SERVER_ERROR and AUTH_REQUIRED; the meaning of each is described in Chapter 6.
Example:
my($ret, $sent_pw) = $r->get_basic_auth_pw;
You can get the username part of the pair by calling
$r->connection->user as described in The
Apache::Connection Class.
my($ret, $sent_pw) = $r->get_basic_auth_pw;
unless($r->connection->user and $sent_pw) {
$r->note_basic_auth_failure;
$r->log_reason("Both a username and password must be provided");
return AUTH_REQUIRED;
}
Although it would make sense for note_basic_auth_failure() to return a status code of AUTH_REQUIRED, it actually returns no value.
Authorization and access control modules gain access to this configuration
variable through the satisfies() method. It will return one of the three constants SATISFY_ALL, SATISFY_ANY or
SATISFY_NOSPEC. The latter is returned when there is no applicable
satisfy directive at all. These constants can be imported by requesting the
``:satisfy'' tag from Apache::Constants.
The following code fragment illustrates an access control handler that
checks the status of the satisfy directive. If the current document is forbidden by access control rules the
code checks whether
satisfy any is in effect, and if so, whether authentication is also required (using the some_auth_required() method call described next). Unless both these conditions are true, the
handler logs an error message. Otherwise it just returns the result code,
knowing that any error logging will be performed by the authentication
handler.
use Apache::Constants qw(:common :satisfy);
if ($ret == FORBIDDEN) {
$r->log_reason("Client access denied by server configuration")
unless $r->satisfies == SATISFY_ANY && $r->some_auth_required;
return $ret;
}
Example:
unless ($r->some_auth_required) {
$r->log_reason("I won't go further unless the user is authenticated");
return FORBIDDEN;
}
For this reason mod_perl's version of this function call,
Apache::exit(), does not cause the process to exit. Instead, it calls Perl's croak() function to halt script execution, but does not log a message to the ErrorLog. If you really want the child server process to exit, call Apache::exit() with an optional status argument of DONE (available in Apache::Constants). The child process will be shut down, but only after it has had a chance
to properly finish handling the current request.
In scripts running under Apache::Registry, Perl's built-in exit() is overridden by Apache::exit() so that legacy CGI scripts don't inadvertently shoot themselves in the foot. In Perl versions 5.005 and higher, exit() is overridden everywhere, including within handlers. In versions of mod_perl built with Perl 5.004 handlers can still inadvertently invoke the built-in exit(), so you should be on the watch for this mistake. One way to avoid it is to explicitly import the ``exit'' symbol when you load the Apache module.
Here are various examples of exit():
$r->exit;
Apache->exit;
$r->exit(0);
$r->exit(DONE);
use Apache 'exit';  # this overrides Perl's builtin exit
If a handler needs direct access to the Perl builtin version of exit() after it has imported Apache's version, it should call CORE::exit().
my $fh = Apache->gensym;
open $fh, $r->filename or die $!;
$r->send_fd($fh);
close $fh;
Because of its cleanliness most of the examples in this book use the Apache::File interface for reading and writing files (See The Apache::File Class). If you wish to squeeze out a bit of overhead, you may wish to use Apache::gensym() with Perl's builtin open() function instead.
if($r->current_callback eq "PerlLogHandler") {
$r->warn("Logging request");
}
my $handlers = $r->get_handlers('PerlAuthenHandler');
Examples:
$r->set_handlers(PerlAuthenHandler => [\&auth_one, \&auth_two]);
$r->set_handlers(PerlAuthenHandler => undef);
This method takes two arguments, the name of the phase you want to manipulate, and a reference to the subroutine you want to handle that phase.
Example:
$r->push_handlers(PerlLogHandler => \&my_logger);
Example:
do {
    # something
} if Apache->module('My::Module');
This method can also be used to test if a C module is loaded. In this case, pass it the filename of the module, just as you would use with the IfModule directive. It will return a true value if the module is loaded.
Example:
do {
    # something
} if Apache->module('mod_proxy.c');
if(Apache->define("SSL")) {
#the server was started with -DSSL
}
The Apache->request() class method returns a reference to the current request object, if any.
Handlers that use the vanilla Perl API will not need to call this method
because the request object is passed to them in their argument list.
However, some modules may not have a subroutine entry point and therefore
need a way to gain access to the request object. For example, CGI.pm uses this
method to provide proper mod_perl support.
Called with no arguments, request() returns the stored Apache request object. It may also be called with a single argument to set the stored request object. This is what Apache::Registry does before invoking a script.
Example:
my $r = Apache->request;   # get the request
Apache->request($r);       # set the request
Actually, it's a little known fact that Apache::Registry scripts can access the request object directly via @_. This is slightly faster than using Apache->request, but has the disadvantage of being obscure. This technique is demonstrated in Subclassing the Apache Class.
directive(s)
that you wish Apache to process. Using string interpolation, you can use
this method to dynamically configure Apache according to arbitrarily
complex rules.
httpd_conf() can only be called during server startup, usually from within a Perl startup file. Because there is no request object at this time, you must invoke httpd_conf() directly through the Apache class.
Example:
my $ServerRoot = '/local/web';
Apache->httpd_conf(<<EOF);
Alias /perl $ServerRoot/perl
Alias /cgi-bin $ServerRoot/cgi-bin
EOF
Should a syntax error occur, Apache will log an error and the server will exit, just as it would if the error was present in the httpd.conf configuration file. A more sophisticated way of configuring Apache at startup time via <Perl> sections is discussed in Chapter 9.
The Apache class supports the full TIEHANDLE interface, as described in
perltie(1). STDIN and STDOUT are already tied to
Apache by the time your handler is called. If you wish to tie your own input or
output filehandle, you may do so by calling tie() with the request object as the function's third parameter:
tie *BROWSER, 'Apache', $r;
print BROWSER 'Come out, come out, wherever you are!';
Of course, it is better not to hard-code the Apache class name, as
$r might be blessed into a subclass:
tie *BROWSER, ref $r, $r;
my $subr = $r->lookup_file($filename);
my $subr = $r->lookup_uri($uri);
The Apache::SubRequest class adds a single new method, run().
my $status = $subr->run;
When you invoke the subrequest's response handler in this way, it will do everything a response handler is supposed to, including sending the HTTP headers and the document body. run() returns the content handler's status code as its function result. If you are invoking the subrequest run() method from within your own content handler, you must not send the HTTP header and document body yourself, as this would be appended to the bottom of the information that has already been sent. Most handlers that invoke run() will immediately return its status code, pretending to Apache that they handled the request themselves:
my $status = $subr->run;
return $status;
server_rec data structure, which contains lots of low-level information about the
server configuration. Within a handler, the current Apache::Server object can be obtained by calling the Apache request object's server() method. At Perl startup time (such as within a startup script or a module
loaded with PerlModule) you can fetch the server object by invoking Apache->server directly. By convention, we use the variable $s for server objects.
Examples:
#at request time
sub handler {
my $r = shift;
my $s = $r->server;
....
}
#at server startup time, e.g. PerlModule or PerlRequire
my $s = Apache->server;
This section discusses the various methods that are available to you via
the server object. They correspond closely to the fields of the
server_rec structure, which we revisit in Chapter 10.
Example:
my $is_virtual = $s->is_virtual;
Example:
use Apache::Log ();
my $log = $s->log;
The Apache::Server::log() method is identical in most respects to the Apache::log() method discussed earlier. The difference is that messages logged with Apache::log() will include the IP address of the browser and add the messages to the notes table under a key named ``error-notes''. See the description of notes() under Server Core Functions.
Example:
my $port = $r->server->port || 80;
This method is read-only.
Example:
my $admin = $s->server_admin;
This method is read-only.
Example:
my $hostname = $s->server_hostname;
This method is read-only.
Example:
my $s = $r->server;
my $names = $s->names;
Example:
for(my $s = Apache->server; $s; $s = $s->next) {
printf "Contact %s regarding problems with the %s site\n",
$s->server_admin, $s->server_hostname;
}
my $s = Apache->server;
$s->log_error("Can't open config file $!");
my $s = Apache->server;
$s->warn("Can't preload script $file $!");
conn_rec data structure, which provides various low-level details about the network
connection back to the client. Within a handler, the connection object can
be obtained by calling the Apache request object's connection() method. The connection object is not available outside of handlers for the
various request phases because there is no connection established in those
cases. By convention, we use the variable $c for connection objects.
Example:
sub handler {
my $r = shift;
my $c = $r->connection;
...
}
In this section we discuss the various methods that are available through
the connection object. They correspond closely to the fields of the C API conn_rec structure discussed in Chapter 10.
Example:
if($c->aborted) {
warn "uh,oh, the client has gone away!";
}
Example:
if($c->auth_type ne 'Basic') {
warn "phew, I feel a bit better";
}
This method is read-only.
Example:
use Socket ();
sub handler {
my $r = shift;
my $local_add = $r->connection->local_addr;
my($port, $ip) = Socket::unpack_sockaddr_in($local_add);
...
}
For obvious reasons, this method is read-only.
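The packed address format can be explored outside the server with the standard Socket module. In this sketch we build a packed address by hand, since there is no connection object to ask; the port number and IP address are made up. Note the order of the values returned by unpack_sockaddr_in(): the port comes first.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa pack_sockaddr_in unpack_sockaddr_in);

# Build a packed address by hand. In a handler this value would come
# from $r->connection->local_addr. Port and address here are made up.
my $packed = pack_sockaddr_in(8080, inet_aton('127.0.0.1'));

# unpack_sockaddr_in() returns the port first, then the packed IP address.
my ($port, $ip) = unpack_sockaddr_in($packed);

# Convert the packed four-byte address to dotted decimal form.
my $dotted = inet_ntoa($ip);

print "$dotted:$port\n";
```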
Among other things, the information returned by this method and local_addr() can be used to perform RFC1413 ident lookups on the remote client even when the configuration directive IdentityCheck is turned off. Using Jan-Pieter Cornet's Net::Ident module for example:
use Net::Ident qw(lookupFromInAddr);
...
my $remoteuser = lookupFromInAddr ($c->local_addr,
$c->remote_addr, 2);
It is almost always better to use the high-level get_remote_host() method available from the Apache request object (see above). The high level method returns the dotted IP address of the remote host if its DNS name isn't available, and it caches the results of previous lookups, avoiding overhead if you call the method multiple times.
Example:
my $remote_host = $c->remote_host || "nohost";
my $remote_host = $r->get_remote_host(REMOTE_HOST);  # better
This method is read-only.
Example:
my $remote_ip = $c->remote_ip;
The value returned by remote_ip() can also be changed, which is helpful if your server is behind a proxy such as the squid accelerator. By using the X-Forwarded-For header sent by the proxy, the remote_ip can be set to this value so logging modules include the address of the real client. The only subtle point is that X-Forwarded-For may be multi-valued in the case of a single request that has been forwarded across multiple proxies. It's safest to choose the last IP address in the list, since this is the address recorded by the proxy nearest your server, whereas earlier entries are supplied by upstream hosts and can be forged.
Example:
my $header = $r->headers_in->{'X-Forwarded-For'};
if( my $ip = (split /,\s*/, $header)[-1] ) {
$r->connection->remote_ip($ip);
}
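The header parsing can be tried out in isolation. The following sketch wraps it in a small function that also copes with a missing header and with stray whitespace; client_from_xff() is a hypothetical helper name, not part of the Apache API, and the addresses are invented.

```perl
use strict;
use warnings;

# Return the last address in an X-Forwarded-For value, or undef if the
# header is missing or contains no addresses. Hypothetical helper.
sub client_from_xff {
    my $header = shift;
    return undef unless defined $header;

    my @addrs;
    for my $part (split /,/, $header) {
        $part =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        push @addrs, $part if length $part;   # skip empty fields
    }
    return @addrs ? $addrs[-1] : undef;
}

my $single   = client_from_xff('10.1.1.1');
my $multiple = client_from_xff('10.0.0.1, 192.168.1.5');
my $missing  = client_from_xff(undef);
```

Within a handler, the result would be passed to $r->connection->remote_ip() only when it is defined.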
It is better to use the high level get_remote_logname() method which is provided by the request object. When the high level method is called the result is cached and reused if called again. This is not true of remote_logname().
Example:
my $remote_logname = $c->remote_logname || "nobody";
my $remote_logname = $r->get_remote_logname;  # better
Example:
my $username = $c->user;
The five C data structures listed below are implemented as tables. This list is likely to grow in the future.
The TIEHASH interface is easy to use. Simply call one of the methods listed above in a scalar context to return a tied hash reference. For example:
my $table = $r->headers_in;
The returned object can now be used to get and set values in the headers_in table by treating it as an ordinary hash reference, but the keys are looked up case insensitively. Examples:
my $type = $table->{'Content-type'};
my $type = $table->{'CONTENT-TYPE'}; # same thing
$table->{'Expires'} = 'Sat, 08 Aug 1998 01:39:20 GMT';
If the field you are trying to access is multi-valued, then the tied hash interface suffers the limitation that fetching the key will only return the first defined value of the field. You can get around this by using the object-oriented interface to access the table (we show an example of this below), or use the each operator to access each key and value sequentially. The following code snippet shows one way to fetch all the Set-Cookie fields in the outgoing HTTP header:
my @cookies;
while (my($key, $value) = each %{$r->headers_out}) {
push @cookies, $value if lc($key) eq 'set-cookie';
}
When you treat an Apache::Table object as a hash reference, you are accessing its internal get() and set() methods (among others) indirectly. To gain access to the full power of the table API, you can invoke these methods directly by using the method call syntax.
Here is the list of publicly available methods in Apache::Table, along with brief examples of usage.
my $out = $r->headers_out;
for my $cookie (@cookies) {
$out->add("Set-Cookie" => $cookie);
}
Another way to add multiple values is to pass an array reference as the second argument. This code has the same effect as the previous example:
my $out = $r->headers_out;
$out->add("Set-Cookie" => \@cookies);
$r->notes->clear;
This example dumps the contents of the headers_in field to the browser:
$r->headers_in->do(sub {
my($key, $value) = @_;
$r->print("$key => $value\n");
1;
});
For another example of do(), see listing 7.12 from the previous chapter, where we use it to transfer the incoming headers from the incoming Apache request to an outgoing LWP HTTP::Request object.
my $ua = $r->headers_in->get('User-Agent');
my @cookies = $r->headers_in->get('Cookie');
get() is the underlying method that is called when you use the tied hash interface to retrieve a key. However the ability to fetch a multi-valued key as an array is only available when you call get() directly using the object-oriented interface.