9.2.4 Automatically Filling a POST Form

    We have seen how forms that use the GET method can be submitted automatically. Forms that use the POST method can be submitted automatically as well, although this takes a little more work. The following program looks up the price of a book, given its ISBN, at an Internet bookstore called powells.com.

 Program 9.10

#!/usr/bin/perl
#file powellISBN1.pl 

use strict;
#use LWP::Debug qw(+);
use HTTP::Response;
use HTTP::Request;
use LWP::UserAgent;

my ($ua, $response, $url); 
my ($content, $req);
my $searchFor = "1565922433";
my $searchType = "ISBN";
print "ISBN = $searchFor\n"; 

$ua = LWP::UserAgent->new();
$ua->agent ("Mozilla/6.0; compatible");
$ua->timeout(200);

$url = "http://www.powells.com/search/DTSearch/search";
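$req = HTTP::Request->new(POST=>$url);
# Label the body as form data so that the server parses it as form fields
$req->content_type('application/x-www-form-urlencoded');
$req->content(qq{isbn=$searchFor});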

$response = $ua->request($req);
$content = $response->content();

# Needs more work: only the first price is picked up, but
# powells.com may return several entries for the same book.
my ($price) =  ($content =~ m#RESULT ITEM START.+?\$(.+?)<#si);
print "price = $price\n";

The program creates a user agent $ua and sets two of its attributes. Using the agent method, it identifies itself as a browser compatible with Mozilla/6.0. It also sets a timeout of 200 seconds for the response to arrive. It then stores the URL of the form’s action attribute in the variable $url. This URL has to be found by examining the source of the Web page that contains the book search form at powells.com. The same source tells us that the form uses the POST method; GET is the default if no method is specified.

An HTTP::Request object is then created. The new method of HTTP::Request takes the request method and the URL as its first two arguments, so POST=>$url is simply this pair written with the => operator, which acts as a comma that also quotes the word POST. The HTTP::Request object $req uses the content method to specify the form field names and their values as the body of the request. In this case, there is only one relevant form field, called isbn, and its value is the value of the variable $searchFor. qq is the quoting operator that allows interpolation of variable values; it is like using double quotes to delimit a string. If there were several fields, the name=value pairs would be separated from one another by an ampersand (&), as in a GET query string, and any special characters in the values would need to be URL-encoded.
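
The way several fields are combined into one request body can be sketched as follows. This is only an illustration under stated assumptions: the second field, binding, is hypothetical and is not known to exist in the powells.com form. A scratch URI object is used purely for its query_form method, which URL-encodes the values and joins the pairs with ampersands. No request is sent.

```perl
#!/usr/bin/perl
# Sketch: building a multi-field POST body by hand (no request is sent).
use strict;
use warnings;
use HTTP::Request;
use URI;

# "binding" is a hypothetical second field, used only for illustration.
my @fields = (isbn => "1565922433", binding => "paperback");

# A scratch URI object URL-encodes the values and joins the pairs with "&".
my $encoder = URI->new("http:");
$encoder->query_form(@fields);
my $body = $encoder->query;
print "$body\n";          # isbn=1565922433&binding=paperback

my $req = HTTP::Request->new(POST => "http://www.powells.com/search/DTSearch/search");
$req->content_type("application/x-www-form-urlencoded");
$req->content($body);

print $req->as_string;    # shows the request line, headers, and body
```

Letting a module do the encoding avoids mistakes when a value contains spaces, ampersands, or other characters that have a special meaning in form data.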

The user agent’s request method sends the HTTP request to the Web server. The Web server responds, and the request method returns this response as an HTTP::Response object. The content method of the HTTP::Response object gives the body of the response, as opposed to its headers. The body is parsed very simply to obtain the price of the book with the given ISBN. Once again, to find out where the price occurs in the returned Web page, we need to fill in the form manually several times and look for a simple pattern that locates it. In this specific case, the price always occurs after the phrase RESULT ITEM START. However, a search for a single book at powells.com can return several prices, and the program captures only the first one.
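
One possible way to collect every price rather than only the first is sketched below. It is a sketch under assumptions, not the bookstore's actual page layout: the HTML in $html is invented for illustration, and a hand-built HTTP::Response object stands in for the result of $ua->request($req) so that the example runs without a network connection. It also checks is_success before parsing, which the program above does not do.

```perl
#!/usr/bin/perl
# Sketch: collecting every price from a results page (runs offline).
use strict;
use warnings;
use HTTP::Response;

# Hypothetical markup standing in for a powells.com results page.
my $html = <<'END_HTML';
<!-- RESULT ITEM START -->
<b>Used</b> $12.50<br>
<!-- RESULT ITEM START -->
<b>New</b> $24.95<br>
END_HTML

# A hand-built response takes the place of $ua->request($req).
my $response = HTTP::Response->new(200, "OK",
    [ "Content-Type" => "text/html" ], $html);

die "Request failed: ", $response->status_line, "\n"
    unless $response->is_success;

# With the g modifier in list context, the match returns every captured price.
my @prices = ($response->content =~ m#RESULT ITEM START.+?\$(.+?)<#sig);
print "price = $_\n" for @prices;
```

With the sample markup above, the loop prints one line for 12.50 and one for 24.95. On a real page, the pattern would still need to be checked against the returned HTML.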

We have seen in section 9.2.3 how the HTTP::Request::Common module can be used to process GET forms. The same module can be used to submit POST forms as well. The following program is a rewrite of Program 9.10.

 Program 9.11

#!/usr/bin/perl
#file powellISBN2.pl 

use strict;
#use LWP::Debug qw(+);
use HTTP::Response;
use LWP::UserAgent;
use HTTP::Request::Common;

my ($ua, $response, $url); 
my ($content, $req);
my $searchFor = "1565922433";
my $searchType = "ISBN";
print "ISBN = $searchFor\n"; 

$ua = LWP::UserAgent->new();
$ua->agent ("Mozilla/5.0");
$ua->timeout(600);

$url = "http://www.powells.com/search/DTSearch/search";
$req =   POST "$url", ["isbn" => $searchFor];
$response = $ua->request($req);
$content = $response->content();

my ($price) =  ($content =~ m#RESULT ITEM START.+?\$(.+?)<#si);
print "price = $price\n";

In this program, the statement that builds the POST request is given below.

$req =   POST "$url", ["isbn" => $searchFor];

The POST function exported by HTTP::Request::Common takes the URL and a reference to an array of field names and values, and returns an HTTP::Request object. The request is sent only when it is passed to the request method of the user agent. There is only one relevant field in the form. If there were several fields, the name => value pairs would be separated from each other by commas. Of course, instead of defining the URL earlier, we could have specified the URL in the POST call directly, as given below.

$req = POST "http://www.powells.com/search/DTSearch/search",
     ["isbn" => $searchFor];
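
As a rough illustration of the several-fields case, the sketch below builds a request with two fields and inspects it without sending it. The second field, binding, is hypothetical and is included only to show the syntax.

```perl
#!/usr/bin/perl
# Sketch: a multi-field POST built with HTTP::Request::Common (not sent).
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# "binding" is a hypothetical second field, used only for illustration.
my $req = POST "http://www.powells.com/search/DTSearch/search",
     [ isbn => "1565922433", binding => "paperback" ];

my $type = $req->content_type;    # scalar context: the media type only
print $req->method, "\n";         # POST
print "$type\n";                  # application/x-www-form-urlencoded
print $req->content, "\n";        # isbn=1565922433&binding=paperback
```

The module sets the content type and encodes the field list into the request body, so neither has to be done by hand.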