9.1.2 Filling a GET form on the Web: Automatically Finding Book Prices

Web pages are full of forms. A form is a way to interact with a Web server and get a response back. For example, we fill in a form to join a mailing list, or to search for items using a search engine or an electronic store. A simple form may have only one or two elements, whereas a complex form can have many elements to fill in.

A filled-in form can be submitted to a Web server, asking it to perform an action and send back a response, in one of two ways. The two methods are called GET and POST. The GET method appends the form's contents to the form's URL as a query string, so the whole submission travels in the request line, whereas the POST method sends the form's data in the body of the request, after the URL and the headers. The server accepts the form's data and passes it to a program, such as a CGI program, for processing, and the response is sent back to the user either via the Web server or directly.
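To make the difference concrete, the following sketch builds the same form data first as a raw GET request and then as a raw POST request, POST being the other method HTML forms use. The host www.example.com, the path /cgi-bin/search and the field values are made up, and a real client such as LWP would add further headers; the point is only where the form data is placed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# GET: the encoded form data is appended to the path after a question mark,
# so it travels in the request line itself and the request has no body.
sub get_request {
    my ($host, $path, $query) = @_;
    return "GET $path?$query HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "\r\n";
}

# POST: the same encoded data travels in the body of the same request,
# after the blank line that ends the headers.
sub post_request {
    my ($host, $path, $query) = @_;
    return "POST $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . "Content-Length: " . length($query) . "\r\n"
         . "\r\n"
         . $query;
}

# Hypothetical host, path and form data, for illustration only.
my $query = 'code=1565922433&mediaType=Book';
print get_request('www.example.com', '/cgi-bin/search', $query);
print "----\n";
print post_request('www.example.com', '/cgi-bin/search', $query), "\n";
```

Both requests carry identical data; only its position differs, which is why a GET submission can be reproduced simply by fetching a URL.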

The designer of a form decides which of the two methods to use. For simple forms, GET is sufficient. The differences between the two methods are covered in any book that discusses HTML and the HTTP protocol. Mimicking a form submitted with POST takes a little more work, but a GET-submitted form can be treated as a regular URL, as we see below.

The following program fills in a form at the Web site of the electronic book and music store Borders, http://www.borders.com. The site has a simple form that allows a visitor to search for books by ISBN. This form is at the location

http://search.borders.com/fcgi-bin/db2www/search/search.d2w/BookISBN.

Before discussing the program, it must be noted that Web sites change frequently, and therefore, the form being discussed in the book may not exist at a future date. The form, as it looks on the Web site at the time of writing, is shown in Figure 9.17.

 

Figure 9.17:  A Search Form at www.borders.com

On the top left, we see a form that allows a visitor to search for book, music or video/DVD titles. A second form allows a search using a ten-digit ISBN. The HTML for this second form looks like the following.

 

<TD BGCOLOR="#ffffcc"><FORM NAME="ISBNUPC_form" METHOD="get" action="Details">
           <B>Enter ISBN to look for:</B><SMALL><BR>&nbsp;<BR></SMALL>
           <INPUT TYPE="text" name="code" SIZE="30" MAXLENGTH="100">
           <INPUT TYPE="submit" VALUE="Find ISBN">
           <INPUT TYPE="hidden" NAME="mediaType" VALUE="Book">
           <INPUT TYPE="hidden" NAME="searchType" VALUE="ISBNUPC">
           <INPUT TYPE="hidden" NAME="prodID" VALUE="">
           <P><B>Instructions:</B><BR>
           Enter a book's 10-digit ISBN number. Do not use spaces.</P>
         </TD>
</FORM>

 

Finding the correct form in the source of the Web page may take a little time. The form is inside an HTML table cell, represented by the TD tag. The form's METHOD attribute has the value get, so the form is submitted using the GET method of the HTTP protocol. Note that HTML is case-insensitive in the way we specify tags such as FORM and their attributes such as ACTION. The form's ACTION attribute is the relative URL Details, which the browser resolves against the address of the page containing the form by replacing its last path component, BookISBN; the form is therefore submitted to http://search.borders.com/fcgi-bin/db2www/search/search.d2w/Details. The form has one visible text field, code, which is the box we see in the Web browser, and three hidden fields, mediaType, searchType and prodID, whose values are fixed in the HTML. These name/value pairs are sent to the server when the form is submitted by clicking the button labeled Find ISBN. In the program that follows, we fill in this form automatically and submit it to the Web server. The Web server performs the search and returns an HTML-formatted page with the result. Examining such pages carefully, we see that the price of the book is always preceded by certain keywords. We key in on this repeated occurrence, parse the returned page to obtain the price of the book, and print it on the screen.
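Reading the METHOD, ACTION and INPUT elements off a large page by hand is tedious. The following is a rough, regex-based sketch that lists them for the form shown above; the subroutine name form_fields is made up, it only understands double-quoted attribute values, and a real HTML parser (for example the HTML::Form module that ships with libwww-perl) is more robust on messy pages.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough sketch: return the METHOD, ACTION and named INPUT fields of the
# first FORM in $html. Only double-quoted attribute values are handled.
sub form_fields {
    my ($html) = @_;
    my %form;
    if ($html =~ /<FORM\b([^>]*)>/i) {
        my $attrs = $1;
        ($form{method}) = $attrs =~ /\bMETHOD="([^"]*)"/i;
        ($form{action}) = $attrs =~ /\bACTION="([^"]*)"/i;
    }
    my @inputs;
    while ($html =~ /<INPUT\b([^>]*)>/gi) {
        my $attrs = $1;
        my ($type)  = $attrs =~ /\bTYPE="([^"]*)"/i;
        my ($name)  = $attrs =~ /\bNAME="([^"]*)"/i;
        my ($value) = $attrs =~ /\bVALUE="([^"]*)"/i;
        # An input without a NAME (the submit button here) is not sent
        next unless defined $name;
        push @inputs, { type  => lc($type // 'text'),
                        name  => $name,
                        value => $value // '' };
    }
    $form{inputs} = \@inputs;
    return \%form;
}

# The form from the Borders page, trimmed to its FORM and INPUT tags
my $html = <<'HTML';
<FORM NAME="ISBNUPC_form" METHOD="get" action="Details">
<INPUT TYPE="text" name="code" SIZE="30" MAXLENGTH="100">
<INPUT TYPE="submit" VALUE="Find ISBN">
<INPUT TYPE="hidden" NAME="mediaType" VALUE="Book">
<INPUT TYPE="hidden" NAME="searchType" VALUE="ISBNUPC">
<INPUT TYPE="hidden" NAME="prodID" VALUE="">
</FORM>
HTML

my $form = form_fields($html);
print "method=$form->{method} action=$form->{action}\n";
printf "%-7s %-11s '%s'\n", @{$_}{qw(type name value)} for @{ $form->{inputs} };
```

The listing makes it easy to copy every hidden field, with its exact value, into the query string of a program like the one below.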

 Program 9.4

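#!/usr/bin/perl
#file bordersISBN1.pl

use strict;
use warnings;
use LWP::Simple;

my $ISBN = "1565922433";
print "ISBN = $ISBN\n";

#Make up the URL to search for the book's ISBN
my $url = "http://search.borders.com/fcgi-bin/db2www/search/search.d2w/Details?";
$url .= "code=$ISBN&mediaType=Book&searchType=ISBNUPC";

#Fetch the result page; LWP::Simple::get returns undef on failure
my $content = LWP::Simple::get($url);
die "Could not fetch $url\n" unless defined $content;

#The price follows the keywords "Our Price:" and a dollar sign;
#[\d.,]+ captures the whole amount, such as 35.96
my ($price) = ($content =~ m#Our Price:.+?\$([\d.,]+)#si);
print "price = ", (defined $price ? $price : "not found"), "\n";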
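The parsing step can be tried offline on a made-up fragment of HTML. In the sketch below, the page text (including its List Price line) is invented for illustration, and extract_price is a hypothetical helper name. Note the character class [\d.,]+ after the dollar sign: a lazy (.+?) placed at the very end of a pattern is satisfied by a single character, so it would capture only the first digit of the amount.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the amount that follows "Our Price:" and a dollar sign, or undef.
# .+? (with /s it also crosses newlines) skips the markup up to the first $,
# and [\d.,]+ then takes every digit, comma and period of the amount.
sub extract_price {
    my ($html) = @_;
    my ($price) = $html =~ m#Our Price:.+?\$([\d.,]+)#si;
    return $price;
}

# A made-up fragment standing in for the store's result page.
my $page = <<'HTML';
<TR><TD>List Price: $44.95</TD></TR>
<TR><TD><B>Our Price:</B>
<FONT COLOR="red">$35.96</FONT></TD></TR>
HTML

my $price = extract_price($page);
print "price = ", (defined $price ? $price : 'not found'), "\n";
```

Anchoring on the keywords rather than on the first dollar sign is what keeps the list price from being picked up by mistake.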

The program uses the module LWP::Simple. The ISBN to look up, 1565922433, is hard-coded; it could just as well have been given as a command-line argument or read from the terminal after an appropriate prompt. The program builds a URL that asks the server to perform the search. To submit a GET form, a browser sends the form's ACTION URL followed by a question mark and then one or more name=value pairs, one for each form field, with the pairs separated by ampersands (&). Names and values that contain spaces or other special characters must be URL-encoded; an ISBN contains only digits and possibly the letter X, so it needs no encoding. The hidden fields must be given the same values that appear in the form's HTML. In this case, the URL sent to the server is the following. We have broken it into two lines, whereas it is sent to the server as a single line.

 

http://search.borders.com/fcgi-bin/db2www/search/search.d2w/Details?
            code=1565922433&mediaType=Book&searchType=ISBNUPC

 

There are no intervening spaces. The search URL seems to work without a prodID value.
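The program above writes its query string by hand, which works because none of its values needs encoding. For form fields that might contain spaces or punctuation, a small builder is safer. The following sketch uses only core Perl; the subroutine names form_encode and get_url are hypothetical, and the encoding follows the application/x-www-form-urlencoded rules for a byte string (unreserved characters pass through, a space becomes +, every other byte becomes %XX).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode a name or value (assumed to be a byte string):
# letters, digits and - _ . ~ pass through, a space becomes +,
# and every other byte becomes %XX.
sub form_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~ ])/sprintf("%%%02X", ord($1))/ge;
    $s =~ tr/ /+/;
    return $s;
}

# Join name=value pairs with & and append them to the ACTION URL after a ?.
# Pairs are given as [name, value] array references to keep their order.
sub get_url {
    my ($action, @pairs) = @_;
    my $query = join '&',
        map { form_encode($_->[0]) . '=' . form_encode($_->[1]) } @pairs;
    return "$action?$query";
}

# The ISBN may be given on the command line; otherwise use the book's ISBN.
my $isbn = shift(@ARGV) // '1565922433';
print get_url(
    'http://search.borders.com/fcgi-bin/db2www/search/search.d2w/Details',
    [code => $isbn], [mediaType => 'Book'], [searchType => 'ISBNUPC'],
), "\n";
```

Run without arguments, it prints the same single-line URL that Program 9.4 builds; with an ISBN argument, it builds the URL for that book instead.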

This program simply mimics what a Web browser does. When the Web server receives the submitted form, it performs the search and returns the results. Had we performed this search in a Web browser, we would have seen a result page like the one shown in Figure 9.18.

 

Figure 9.18:  A Search Result Page at www.borders.com

We looked carefully at the HTML of this result page and of the result pages of several other searches, and found that every time we perform an ISBN search, the price the store charges for the book follows the keywords Our Price. We therefore obtain the price by simply parsing the page's HTML. If we also wanted information such as the book's title, author, or shipping details, we would have to parse other parts of the page's HTML as well. The output of running this program is given below.

 

%bordersISBN1.pl
ISBN = 1565922433
price = 35.96

 

This printout says that the book with ISBN 1565922433 is sold by www.borders.com at a price of $35.96. The program can easily be extended to read a sequence of ISBNs from a file or a database and obtain the price for each one. If prices can be fetched from several bookstores in the same way, we can easily compare the prices of books.
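A minimal sketch of that extension follows. It reads ISBNs one per line from a filehandle, builds the same search URL as Program 9.4 for each, and collects the prices in a hash. The subroutine name prices_for is hypothetical, and the page fetcher is passed in as a code reference so the sketch can run offline: with LWP::Simple loaded, \&LWP::Simple::get could be passed instead, while the stub below returns made-up pages.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $SEARCH = 'http://search.borders.com/fcgi-bin/db2www/search/search.d2w/Details';

# Look up each ISBN read from $fh (one per line) and return a hash
# reference of ISBN => price; the price is undef when the page could not
# be fetched or contained no "Our Price:" amount. $fetch takes a URL and
# returns the page text or undef.
sub prices_for {
    my ($fetch, $fh) = @_;
    my %price;
    while (my $isbn = <$fh>) {
        $isbn =~ s/\s+//g;            # drop the newline and stray spaces
        next unless length $isbn;     # skip blank lines
        my $url = "$SEARCH?code=$isbn&mediaType=Book&searchType=ISBNUPC";
        my $content = $fetch->($url);
        my ($p) = defined $content
            ? $content =~ m#Our Price:.+?\$([\d.,]+)#si
            : ();
        $price{$isbn} = $p;
    }
    return \%price;
}

# Offline demonstration: a stub fetcher serving made-up pages, and an
# in-memory "file" of ISBNs.
my %fake = (
    '1565922433' => 'Our Price: <B>$35.96</B>',
    '0000000000' => 'No items matched your search.',
);
my $stub = sub {
    my ($url) = @_;
    my ($isbn) = $url =~ /code=(\w+)/;
    return $fake{$isbn};
};

open my $fh, '<', \"1565922433\n0000000000\n" or die "open: $!";
my $prices = prices_for($stub, $fh);
for my $isbn (sort keys %$prices) {
    printf "%s %s\n", $isbn, $prices->{$isbn} // 'not found';
}
```

Running the same list against a second store would only need a different URL pattern and price pattern, after which the two hashes can be compared ISBN by ISBN.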