9.2.1 Requesting URL Header From a Web Server

We start with a program that asks a Web server for the header information of a file.

 Program 9.5

#!/usr/bin/perl
#file fetchHead1.pl

use LWP::UserAgent;
use HTTP::Request;
use URI;
use strict;
$" = "\n\t";

my ($url, $uri, $ua);
my ($headerRequest, $headerResponse, $headers);

#URL to fetch
$url = "http://www.cs.uccs.edu/~kalita";
$uri = URI->new($url);

#Creating a user agent and sending a request, and getting response
$ua = LWP::UserAgent->new();
$headerRequest = HTTP::Request->new(HEAD=>$uri);
$headerResponse = $ua->request ($headerRequest);
if ($headerResponse->is_success){
    $headers = $headerResponse->headers;
}
else{
    #report the failure as an HTML page and exit with a nonzero status
    print $headerResponse->error_as_HTML;
    exit 1;
}

print  "-" x 60, "\n";
print "THE HEADER RESPONSE IS...\n", 
     $headerResponse->as_string, "\n";
print  "-" x 60, "\n";
print "status line: ", $headerResponse->status_line, "  status\n";
print  "-" x 60, "\n";

print "HEADERS AS STRING IS...\n", $headers->as_string, "\n";
print  "-" x 60, "\n";
print "content type: ", $headers->content_type, "\n";
print "date: ", $headers->date, "\n";
print "server: ", $headers->server, "\n";

printf "%-15s %-30s\n", "Header Name", "Header Value";
print  "-" x 60, "\n";
$headers->scan (\&headerScanner);
print  "-" x 60, "\n";

#callback subroutine to process header entries
sub headerScanner{
   my ($headerName, $headerValue) = @_;
   printf "%-15s %-30s\n", $headerName, $headerValue;
}

The program uses the LWP::UserAgent and HTTP::Request modules. It also uses a module called URI to create a Uniform Resource Identifier for the URL to fetch. It is not really necessary to create a URI out of a URL to use it. We do so for illustration purposes only.

The URL to fetch is given to the program as a string, which the program converts to a URI object. A URI is a more general abstraction of an address than a URL, but the differences are not important to us at this time.

 

$url = "http://www.cs.uccs.edu/~kalita";
$uri = URI->new($url);
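Although the conversion is optional here, a URI object offers accessors that a plain string does not. The following is a minimal sketch, assuming only that the URI module is installed; it takes the same URL apart into its components without contacting any server.

```perl
use strict;
use warnings;
use URI;

# Same URL as in the program above
my $uri = URI->new("http://www.cs.uccs.edu/~kalita");

# Accessors provided by the URI object
my $scheme = $uri->scheme;   # protocol part of the address
my $host   = $uri->host;     # name of the server
my $port   = $uri->port;     # default port of the scheme when none is given
my $path   = $uri->path;     # location of the resource on the server

print "scheme: $scheme\n";
print "host:   $host\n";
print "port:   $port\n";
print "path:   $path\n";

# A URI object turns back into the URL when used as a string
print "string: $uri\n";
```

Because the object stringifies to the original URL, it can be used wherever a URL string is expected.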

 

As mentioned earlier, this conversion is not necessary for the purpose of this program. Next, the program creates a user agent object using the following line.

 

$ua = LWP::UserAgent->new();
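The constructor can also take options. The following is a small sketch, not part of the program above, that sets a timeout and an identification string for the user agent; the values 10 and fetchHead-example/0.1 are assumptions chosen only for illustration. Creating the object does not contact any server.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical settings, chosen only for illustration
my $ua = LWP::UserAgent->new(
    timeout => 10,                        # seconds to wait for a response
    agent   => 'fetchHead-example/0.1',   # sent as the User-Agent header
);

# The same names work as accessor methods on the object
print "timeout: ", $ua->timeout, "\n";
print "agent:   ", $ua->agent, "\n";
```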

 

The program then creates an HTTP request, which it later passes to the user agent.

 

$headerRequest = HTTP::Request->new(HEAD=>$uri);

 

The request uses the HEAD method, which asks for header information only, and takes the URI created earlier as an argument. The argument could have been a URL in string form as well. Next, the request is given to the user agent’s request method.

 

$headerResponse = $ua->request ($headerRequest);

 

The user agent’s request method opens a TCP-based communication socket with the server. If successful, the user agent converts the request to the proper HTTP format and sends it to the server over the socket. We discussed sockets in detail in Chapter 7. The LWP bundle of modules makes it unnecessary to perform low-level tasks such as creating sockets for Web client programming, making things simple for the programmer. The user agent waits for the response to come back from the server. When the response comes, the user agent automatically composes an HTTP::Response object out of it. Thus, $headerResponse is an instance of the HTTP::Response class, whether or not the request was successful in fetching the header.

The program uses two methods of the HTTP::Response class at this point, is_success and headers. If the HTTP request is successful in obtaining the HEAD information for the URI, is_success returns true. If the request fails, the program calls the error_as_HTML method of the HTTP::Response class, prints the HTML page it returns describing the error, and exits.
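The success test can be tried without a network connection. The following is a minimal sketch in which two hand-built HTTP::Response objects stand in for what the user agent would return; the 200 and 404 responses are assumptions made up for illustration, not responses from the server used in this section.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built responses standing in for the result of $ua->request
my $ok      = HTTP::Response->new(200, 'OK');
my $missing = HTTP::Response->new(404, 'Not Found');

# The same branch as in the program: is_success decides what happens next
for my $response ($ok, $missing) {
    if ($response->is_success) {
        print "success: ", $response->status_line, "\n";
    }
    else {
        print "failure: ", $response->status_line, "\n";
    }
}

# error_as_HTML returns a small HTML page describing the failure
my $errorPage = $missing->error_as_HTML;
print $errorPage;
```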

The HTTP::Response object has an internal format for storage in the program. To print the content of the object in string form, we can use the as_string method provided by the HTTP::Response object. If we look at the output printed below, we see that the statement below

 

print "THE HEADER RESPONSE IS...\n",
     $headerResponse->as_string, "\n";

 

prints the complete response that comes back from the server. The information printed by the statement is given later in this discussion. The first line is the HTTP response line we discussed earlier. It says that the protocol used is HTTP/1.1 and that the request was successful: it came back with a 200 status code, given in text form as OK. Following this, there is a list of HTTP header and value pairs. The headers listed here are Connection, Date, Server, Content-Length, etc. The Client-Date and Client-Peer headers at the end of the list are not sent by the server; LWP adds them to the response on the client side. The value of $headerResponse when printed as a string contains one or more blank lines at the end although the HTTP specification says there should be only one blank line.

The HTTP::Response class has several useful methods such as is_success, is_error, as_string, error_as_HTML, etc. Another method that we use in this program is status_line, which returns just the status code and its text message, 200 OK in this case. Two other useful methods that the HTTP::Response object has are called headers and content. Every HTTP response contains headers. However, for the HEAD HTTP request, no content comes back. Thus, we capture only the headers in the response if the request is successful. The capturing is done in the program inside the if-else statement.

 

    $headers = $headerResponse->headers;
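The parts of a response described above can be examined without contacting a server. The following is a minimal sketch, assuming a hand-written response string that imitates part of the output of this program; it uses the parse class method of HTTP::Response to build the object and then reads the individual parts.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-written text imitating a reply to a HEAD request
my $raw = join "\n",
    "HTTP/1.1 200 OK",
    "Connection: close",
    "Content-Length: 4225",
    "Content-Type: text/html",
    "",    # blank line that ends the headers
    "";    # nothing follows: a HEAD reply carries no content

my $response = HTTP::Response->parse($raw);

print "protocol:    ", $response->protocol, "\n";
print "code:        ", $response->code, "\n";
print "message:     ", $response->message, "\n";
print "status line: ", $response->status_line, "\n";

# headers returns an HTTP::Headers object; content is empty here
my $headers = $response->headers;
print "length header: ", $headers->header('Content-Length'), "\n";
print "content bytes: ", length($response->content), "\n";
```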

 

The headers method of the HTTP::Response class returns an object of yet another class called HTTP::Headers, which is also available as a part of the LWP bundle of packages. When sending out an HTTP request, one can optionally create headers to be sent out with the request. On the flip side, when a response has come back, the headers that have come back can be captured for examination.
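To illustrate the sending side, the following sketch builds a HEAD request that carries its own headers and prints it without sending it. The Accept value and the User-Agent string are assumptions chosen for illustration; the program in this section sends no headers of its own.

```perl
use strict;
use warnings;
use HTTP::Headers;
use HTTP::Request;

# Hypothetical request headers, chosen only for illustration
my $requestHeaders = HTTP::Headers->new(
    'Accept'     => 'text/html',
    'User-Agent' => 'fetchHead-example/0.1',
);

# The headers object is passed as the third argument of the constructor
my $request = HTTP::Request->new(
    HEAD => "http://www.cs.uccs.edu/~kalita",
    $requestHeaders,
);

# Show the request in string form; nothing is sent to a server
print $request->as_string;
```

The request object built this way can be given to the user agent's request method in the same manner as the request in the program.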

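The HTTP::Headers class also has several methods that are quite useful in examining the contents of the headers. Examples of these methods are content_type, date, and server used in this program. Note that date returns the time as a number of seconds since the epoch rather than as the original string, which is why the program prints a number for the date. One interesting method that the HTTP::Headers class has is called scan. A call to scan is given below.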

 

$headers->scan (\&headerScanner);

 

The argument taken by scan is a reference to a function. The function must take two arguments, the name of a header and the value of a header. The definition of such a function used in the program is given below.

 

sub headerScanner{
   my ($headerName, $headerValue) = @_;
   printf "%-15s %-30s\n", $headerName, $headerValue;
}

 

A function such as this is called a callback function. The scan method applies the callback function to every header in turn, calling it with a header name and a single value. The callback can do whatever it pleases with the arguments. In this case, it simply prints them in the form of a formatted string. Note that if a header has several values, the function is called once for each value. The output of this program is given below.

 

------------------------------------------------------------
THE HEADER RESPONSE IS...
HTTP/1.1 200 OK
Connection: close
Date: Wed, 04 Apr 2001 20:01:12 GMT
Accept-Ranges: bytes
Server: Apache/1.3.14 (Unix)  (Red-Hat/Linux) PHP/3.0.18 mod_perl/1.23
Content-Length: 4225
Content-Type: text/html
ETag: "650328-1081-3ab64734"
Last-Modified: Mon, 19 Mar 2001 17:51:48 GMT
Client-Date: Wed, 04 Apr 2001 13:04:48 GMT
Client-Peer: 128.198.162.68:80



------------------------------------------------------------
status line: 200 OK  status
------------------------------------------------------------
HEADERS AS STRING IS...
Connection: close
Date: Wed, 04 Apr 2001 20:01:12 GMT
Accept-Ranges: bytes
Server: Apache/1.3.14 (Unix)  (Red-Hat/Linux) PHP/3.0.18 mod_perl/1.23
Content-Length: 4225
Content-Type: text/html
ETag: "650328-1081-3ab64734"
Last-Modified: Mon, 19 Mar 2001 17:51:48 GMT
Client-Date: Wed, 04 Apr 2001 13:04:48 GMT
Client-Peer: 128.198.162.68:80


------------------------------------------------------------
content type: text/html
date: 986414472
server: Apache/1.3.14 (Unix)  (Red-Hat/Linux) PHP/3.0.18 mod_perl/1.23
Header Name     Header Value
------------------------------------------------------------
Connection      close
Date            Wed, 04 Apr 2001 20:01:12 GMT
Accept-Ranges   bytes
Server          Apache/1.3.14 (Unix)  (Red-Hat/Linux) PHP/3.0.18 mod_perl/1.23
Content-Length  4225
Content-Type    text/html
ETag            "650328-1081-3ab64734"
Last-Modified   Mon, 19 Mar 2001 17:51:48 GMT
Client-Date     Wed, 04 Apr 2001 13:04:48 GMT
Client-Peer     128.198.162.68:80
------------------------------------------------------------
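Two details of this output can be explored further without a server. The date method prints 986414472 because it returns seconds since the epoch, and scan calls its callback once per value. The following is a minimal sketch, assuming a hand-built HTTP::Headers object; the X-Example header is hypothetical and is added only to give one header two values. The time2str function of the HTTP::Date module converts the number back to a readable date.

```perl
use strict;
use warnings;
use HTTP::Headers;
use HTTP::Date qw(time2str);

# Hand-built headers; the Date value is taken from the output above
my $headers = HTTP::Headers->new(
    'Date'         => 'Wed, 04 Apr 2001 20:01:12 GMT',
    'Content-Type' => 'text/html',
);

# Hypothetical header given two values to show repeated callback calls
$headers->push_header('X-Example' => 'first');
$headers->push_header('X-Example' => 'second');

# date returns seconds since the epoch; time2str makes it readable again
my $epoch    = $headers->date;
my $readable = time2str($epoch);
print "date as number: $epoch\n";
print "date as string: $readable\n";

# Collect every value under its lowercased header name
my %seen;
$headers->scan(sub {
    my ($headerName, $headerValue) = @_;
    push @{ $seen{lc $headerName} }, $headerValue;
});

for my $name (sort keys %seen) {
    printf "%-15s %d value(s)\n", $name, scalar @{ $seen{$name} };
}
```

Here the callback is an anonymous subroutine rather than a named one; scan accepts either, since both are passed as code references.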