8.3.5 HTML Forms

    Many CGI programs work with HTML forms. Such a program accepts the form information sent to it by a browser and processes it to produce results that are sent back to the browser for display. Usually, the CGI program also carries out other requested actions, such as enrolling a person in a mailing list or starting the process needed to accept a credit card payment and ship a book to the customer. Therefore, it is necessary to understand the syntax of forms well. The reader is advised to consult a book such as HTML and XHTML: The Definitive Guide [MK00], XHTML By Example [Nav00] or Web Design in a Nutshell [Nie99].

  There is a very powerful Perl module called CGI.pm that makes many aspects of CGI programming much easier than starting from scratch [Ste98]. In this book, we examine only a few of its capabilities. In particular, we use CGI.pm to obtain the parameters that have been passed to the CGI program. Many of the functions that CGI.pm provides allow one to produce HTML tags in the correct manner. However, the author of this book does not find much use for most such constructs. If one produces HTML using a CGI program, the author feels that he or she should know HTML syntax. There is no need to learn another language that produces HTML code, albeit at a slightly higher level. There are too many programming languages to learn and use, and the author prefers one less! Besides, HTML is extremely simple.

  There are two types of HTML forms on the Web: GET forms and POST forms. The names come from the method the form uses to submit its data. When a viewer fills in an HTML form, the data needs to be sent to the server. The server acts on the form data by delegating the task to a helper program such as a CGI program. This program is accessible on the Web through a URL, and this URL is given as the value of the ACTION attribute of the HTML form. Each data field in the form has a name, and each field that is filled in has a value. A field that remains unfilled has no associated value. The browser sends the name-value pairs to the Web server in a specific format, using a simple, standard encoding. Before constructing the name-value pairs, the browser performs some character transformations: for example, a space is replaced by a plus sign, and a character with a special meaning, such as an ampersand or an equal sign, is replaced by a percent sign followed by its two-digit hexadecimal code. In a name-value pair, the name is followed by an equal sign and then the value, with no spaces in between. Name-value pairs are separated from each other by the ampersand character. Thus, assuming the form has three fields, the form data may look like the following.

name1=value1&name2=value2&name3=value3
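
The construction of such a string can be sketched in Perl. This is only an illustration of the format, not the code a browser actually runs: the function names form_encode and build_form_data and the sample field names are made up for this example, and the sketch assumes plain ASCII text.

```perl
use strict;
use warnings;

# Encode one string in the style used for form data: letters,
# digits and a few safe characters pass through, every other
# character becomes '%' followed by two hexadecimal digits, and
# a space becomes '+'. Assumes plain ASCII input.
sub form_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.* -])/sprintf('%%%02X', ord($1))/ge;
    $text =~ tr/ /+/;
    return $text;
}

# Join name-value pairs as name=value, separated by ampersands.
# The pairs arrive as a flat list so that field order is kept.
sub build_form_data {
    my @pairs = @_;
    my @encoded;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @encoded, form_encode($name) . '=' . form_encode($value);
    }
    return join('&', @encoded);
}

# Prints: name=Ada+Lovelace&email=ada%40example.org&topic=Perl+%26+CGI
print build_form_data(
    name  => 'Ada Lovelace',
    email => 'ada@example.org',
    topic => 'Perl & CGI',
), "\n";
```

Note that the ampersand inside the last value is encoded as %26, so it cannot be mistaken for the separator between pairs.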

The difference between a GET form and a POST form lies in the manner in which the browser sends the data to the server. In a GET form, the browser sends the ACTION URL and the form data to the server in one transmission, with a question mark character between the two.

URL?name-value-pairs

URL is the full URL of the program named in the ACTION attribute of the form. name-value-pairs follows the format given earlier. In a POST form, the data does not travel as part of the URL, unlike in a GET form. The URL is sent first, on its own, and the name-value pairs, constructed in the manner discussed earlier, follow separately in the body of the request.
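
In HTTP terms, the difference shows up in where the name-value pairs are placed in the text of the request. The following sketch builds the text of the two kinds of request so that they can be compared side by side. The host name www.example.com, the path /cgi-bin/handler.pl and the function names are assumptions made for illustration, and only the headers needed to make the point are included.

```perl
use strict;
use warnings;

# Text of a GET request: the encoded pairs are attached to the
# path of the ACTION URL after a question mark.
sub get_request {
    my ($host, $path, $data) = @_;
    return "GET $path?$data HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "\r\n";
}

# Text of a POST request: the first line carries the path alone,
# and the encoded pairs follow the headers as the request body.
sub post_request {
    my ($host, $path, $data) = @_;
    return "POST $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . "Content-Length: " . length($data) . "\r\n"
         . "\r\n"
         . $data;
}

my $data = 'name1=value1&name2=value2&name3=value3';
print get_request('www.example.com', '/cgi-bin/handler.pl', $data);
print post_request('www.example.com', '/cgi-bin/handler.pl', $data), "\n";
```

The encoded string itself is identical in the two requests; only its position changes.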

 Since the data are sent in two different ways, a general servicing CGI program has to capture the data in two different ways. The encoding process is the same in both cases, and hence the decoding process is the same as well. The CGI.pm module makes capturing the data easy. If we use CGI.pm, we do not have to worry about whether the form uses the GET or the POST method, and we do not need to know the encoding process in order to undo it. One of the most useful features of the CGI.pm module is the param method, which captures the value of a field given the name of the form’s field. It is also possible to write a form-handling CGI program without using the CGI.pm module [Gun00], although we do not discuss it here.
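
A minimal sketch of the param method follows. It assumes that the CGI.pm module is installed (it is no longer bundled with Perl as of version 5.22), and the field names name, email and phone are hypothetical. So that the sketch can be tried from the command line, an encoded string is handed to the constructor; in a real CGI program the constructor is called with no argument.

```perl
use strict;
use warnings;
use CGI;

# Under a Web server, CGI->new with no argument collects the form
# data itself, whether the form used GET or POST. For a trial run
# from the command line, an encoded string is passed in instead.
my $query = CGI->new('name=Ada+Lovelace&email=ada%40example.org');

# param returns the decoded value of the named field.
my $name  = $query->param('name');
my $email = $query->param('email');

# A field that was not submitted has no value.
my $phone = $query->param('phone');

print "Name:  $name\n";
print "Email: $email\n";
print "Phone: ", (defined $phone ? $phone : '(not supplied)'), "\n";
```

The values come back already decoded: the plus sign has become a space and %40 has become an at sign.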

    For a CGI program that responds to a user-filled HTML form, two things usually have to work together. First, we must have an HTML file that contains the form. A CGI program is usually called by the server when a browser submits an HTML form. The form has an ACTION attribute whose value must be the URL of a program that can take the form input, process it, and return a result. In the early years of the Web, CGI programs were the only way to handle forms, although there are now other mechanisms such as Microsoft ASPs and Java servlets. Thus, the second requirement for handling form data is a program. The CGI program must examine the form data and check whether they are acceptable (although JavaScript and other scripting languages can help by doing some filtering in the browser itself). If they are, it produces the appropriate output in HTML format and sends it back to the browser for display. The CGI program may be able to produce the output directly by performing some computation, but frequently it contacts a data source such as a database and gets data from it before producing the HTML page it returns. The output of the CGI program is automatically sent to the browser that made the request by submitting the form.
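
The second requirement can be sketched as follows for a mailing-list form. The field names name and email, the function names, and the very loose test for an acceptable e-mail address are all assumptions made for illustration. To keep the sketch self-contained, the form data is passed in as a list of names and values rather than captured from a browser.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML so that text
# typed by the user cannot inject markup into the page.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Examine the form data and build the complete reply: a header,
# a blank line, and then the HTML page.
sub build_response {
    my (%field) = @_;
    my $name  = $field{name}  // '';
    my $email = $field{email} // '';

    my $message;
    if ($name eq '' or $email !~ /^[^\@\s]+\@[^\@\s]+$/) {
        $message = 'Please go back and supply a name and an e-mail address.';
    }
    else {
        $message = 'Thank you, ' . html_escape($name)
                 . '. You have been added to the mailing list.';
    }

    return "Content-type: text/html\n\n"
         . "<html><head><title>Mailing List</title></head>\n"
         . "<body><p>$message</p></body></html>\n";
}

# Whatever the program prints to standard output goes to the browser.
print build_response(name => 'Ada Lovelace', email => 'ada@example.org');
```

A real program would obtain the two values from the submitted form and would record the address in a file or a database before replying.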