8.3.4 Printing CGI Environment Variables


The CGI standard allows a CGI program to be parameterized. In other words, a server can pass parameters to a CGI program that it invokes. The values of some of these parameters are supplied by the browser. When a browser sends a request for a page to a server, it sends additional information such as its identity and the machine on which it is running. This information is passed on to the CGI program so that the program can use it if necessary. Among other things, the information can be used to customize the output for a specific browser. For example, if the program is invoked from a personal digital assistant (PDA) such as a palmtop computer, or from a cellular phone, the CGI program can produce smaller pages that fit the device's screen and reduce the amount of information transmitted.
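The idea of customizing output for a small device can be sketched as follows. This is only an illustration: the subroutine names is_small_device and page_style, and the substrings used to recognize a small-screen client, are assumptions made for this sketch and not an authoritative list.

```perl
use strict;
use warnings;

# Returns 1 if the user-agent string hints at a small-screen client.
# The substrings tested here are illustrative assumptions only.
sub is_small_device {
    my ($user_agent) = @_;
    return 0 unless defined $user_agent;
    return $user_agent =~ /Mobile|PalmOS|Windows CE/i ? 1 : 0;
}

# Chooses a hypothetical page style based on the client.
sub page_style {
    my ($user_agent) = @_;
    return is_small_device($user_agent) ? 'compact' : 'full';
}

# Under a Web server, HTTP_USER_AGENT carries the browser's identity;
# on the command line it is normally unset, so the style is 'full'.
print "Page style: ", page_style($ENV{HTTP_USER_AGENT}), "\n";
```

A real program would use the chosen style to decide how much HTML to generate.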

A Web server can parameterize a CGI program in two ways. 

•    The Web server directly sends the CGI program the parameters and values that it received from the browser. These are usually names of HTML form fields and their values. They are sent following the same simple syntax that the browser uses to send them to the server. This syntax is discussed in Section 8.3.5.

•    The Web server sends additional parameters to the CGI program using environment variables. Environment variables are an unusual convention for passing arguments. The Web server places certain parameter values in environment variables of the operating system, such as Unix, and then invokes the CGI program. The CGI program inherits a copy of these environment variables, and can extract the values and use them.
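The second mechanism relies on a child process inheriting a copy of its parent's environment. The following sketch imitates it with an ordinary Perl program in place of a Web server. The variable name DEMO_PARAM and the subroutine value_seen_by_child are hypothetical, and the list form of a piped open is assumed to be available, as it is on Unix-like systems.

```perl
use strict;
use warnings;

# DEMO_PARAM is a hypothetical variable name used only in this sketch.
sub value_seen_by_child {
    my ($value) = @_;
    # Set the variable in the parent; "local" restores it on return.
    local $ENV{DEMO_PARAM} = $value;
    # $^X is the path of the running perl. The child inherits a copy
    # of the parent's environment and prints what it finds there.
    open(my $child, '-|', $^X, '-e', 'print $ENV{DEMO_PARAM}')
        or die "cannot start child process: $!";
    my $seen = do { local $/; <$child> };
    close($child);
    return $seen;
}

print "Child saw: ", value_seen_by_child('hello from parent'), "\n";
```

A Web server does essentially the same thing before it starts a CGI program, except that the values it stores describe the current HTTP request.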

In this section, we convert the program that prints the environment variables on the command line, first discussed in Section 8.3.3, into a CGI program. We do so by printing one or more HTTP headers to the standard output, followed by a page that follows HTML syntax. The meat of the program is still the loop that prints the key-value pairs from the environment hash %ENV. The variable values are printed in the body of the page. The program thus prints the values of the environment variables as seen by a CGI program. The environment variables define the context in which the CGI program runs.

Whether a Perl program runs from the command line or as a CGI program, the environment variables are available to it in the %ENV hash. However, the contents of %ENV are different in the two cases. When a Web server is started, it runs in its own environment. That is, it knows quite a bit about its identity, the machine it is running on, the programs it can call for various purposes, etc. These variables are available to a CGI program.
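Because the contents of %ENV differ in the two cases, a program can make a rough guess about how it was started. The sketch below tests for GATEWAY_INTERFACE, a variable that a Web server sets for CGI programs. The subroutine name running_as_cgi is an assumption of this sketch, and the test is only a heuristic, since a user could set the same variable in a shell.

```perl
use strict;
use warnings;

# Takes a hash reference so that it can be tried on any set of variables.
sub running_as_cgi {
    my ($env) = @_;
    return defined $env->{GATEWAY_INTERFACE} ? 1 : 0;
}

my $mode = running_as_cgi(\%ENV)
    ? "Started by a Web server"
    : "Started from the command line";
print $mode, "\n";
```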

The environment variables that a Web server maintains may hold long-term information. For example, the IP address and the name of the machine on which the Web server is running usually never change. Another environment variable that does not change is the type of Web server software, say an Apache server. However, other environment variables, such as the name and the IP address of the machine on which the browser is running, change from one browser session to another. A variable that records the name of the current CGI script obviously changes with every CGI program that is started, either by clicking on an HTML link or by giving the URL of the CGI program in the location box of a browser.

 Program 8.5

#!/usr/bin/perl
#file environ.pl
use strict;

# HTTP headers; the blank line ends the header section
print "Content-type: text/html", "\n";
print "Pragma: no-cache", "\n\n";

print "<HTML>", "\n";
print "<HEAD><TITLE>Printing Environment Variables...</TITLE></HEAD>", "\n";
print "<BODY>
<H1>
Printing Environment Variables...
</H1>
";

# One line per environment variable, sorted by name
my $key;
foreach $key (sort (keys %ENV)){
    print "The value of the ", $key, " field is ", $ENV{$key}, "<BR>\n";
}
print "</BODY></HTML>", "\n";

Figure 8.8:  Printing Environment Variables in a CGI Program
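Some of the values printed by Program 8.5 are supplied by the client and may contain characters such as < or & that have a special meaning in HTML. A cautious variant, sketched here, escapes such characters before printing. The subroutines escape_html and env_line are assumptions of this sketch and are not part of Program 8.5. The sketch prints only the body lines, without the HTTP headers.

```perl
use strict;
use warnings;

# Replaces characters that are special in HTML with character entities.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;    # must come first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Builds one line of the page body for a variable and its value.
sub env_line {
    my ($key, $value) = @_;
    return "The value of the " . escape_html($key)
         . " field is " . escape_html($value) . "<BR>\n";
}

my $body = join '', map { env_line($_, $ENV{$_}) } sort keys %ENV;
print $body;
```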

We can run this CGI program from a browser using a URL such as the following, assuming the script has been saved in the appropriate location on the server.

 

http://pikespeak.uccs.edu/cgi-bin/kalita/perlbook/environ.pl

 

In a Web page using HTML, the CGI program can be invoked by using a construct such as the following.

 

<a href="http://pikespeak.uccs.edu/cgi-bin/kalita/perlbook/environ.pl">
    Click for environment variables</a>

 

The output produced by the CGI program on a Web browser is shown in Figure 8.8. A cursory glance at Figure 8.8 and the output shown in Section 8.3.3 shows that the contents of the %ENV hash in the two cases are entirely different. In the case of the CGI program, the %ENV hash stores information about the current state of the HTTP interaction, the participants in the interaction, and attributes of the participants as well as of the interaction itself. We discuss some of these environment variables below. The reader is advised to look at a book such as Webmaster in a Nutshell [SQ96], CGI Programming on the World Wide Web [Gun00], or Writing CGI Applications with Perl [MM01] for more details on these and other CGI environment variables.

CGI environment variables can be classified into several categories.

•    Server: Information such as the server name, the port used by the server, the version of the server protocol, etc.

•    Client: Information such as the identity of the remote host on which the client is running, the port used by the client, etc.

•    HTTP Protocol: Information about the protocol such as the protocol version, languages acceptable, character set used, etc.

•    CGI: Information about the CGI interaction such as its version, method used, name of the script, the query string, location of the script, etc.

•    Redirection: Information about redirection, present if the original HTTP request is redirected by the original server, after modifying the query string, to a new server at a new port with a new URL. Such redirection is not uncommon, especially on commercial sites such as a Web-based bookstore.
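The classification above can be approximated by looking at the variable names. The following sketch groups the variables of the current environment by category. The subroutine env_category is hypothetical, and treating the REDIRECT_ prefix as a sign of redirection is an assumption based on the naming used by Apache; other servers may name such variables differently.

```perl
use strict;
use warnings;

# Guesses the category of an environment variable from its name.
sub env_category {
    my ($name) = @_;
    return 'Redirection'   if $name =~ /^REDIRECT_/;   # assumption: Apache-style prefix
    return 'Server'        if $name =~ /^SERVER_/;
    return 'Client'        if $name =~ /^REMOTE_/;
    return 'HTTP Protocol' if $name =~ /^HTTP_/;
    return 'CGI'
        if $name =~ /^(?:GATEWAY_INTERFACE|REQUEST_METHOD|QUERY_STRING|SCRIPT_\w+)$/;
    return 'Other';
}

# Group the names of the current environment variables by category.
my %groups;
push @{ $groups{ env_category($_) } }, $_ foreach keys %ENV;

foreach my $category (sort keys %groups) {
    print "$category: ", join(', ', sort @{ $groups{$category} }), "\n";
}
```

Run from the command line, nearly everything falls under Other. Run as a CGI program, the remaining categories fill up.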

First, we specify some of the server environment variables.

•    SERVER_NAME, SERVER_ADDR, and SERVER_PORT give the name of the machine on which the server is running, its IP address, and the listening port used by the server, respectively.

•    SERVER_PROTOCOL specifies the version of the HTTP protocol used by the Web server.

•    SERVER_SOFTWARE gives a somewhat detailed specification of the Web server.

The client information includes fields such as REMOTE_ADDR and REMOTE_PORT. Usually, a Web client is automatically given an unused port by the machine on which it is running. The HTTP protocol information includes the following fields and a few additional ones.

•    HTTP_HOST: It contains the name of the server or one of its aliases that the client says it wants to contact when it sends out the HTTP request.

•    HTTP_CONNECTION: This specifies whether the network connection between the server and the browser is used for transmitting one document or several documents. In the early days of the Web, the HTTP protocol transmitted only one document from a server to a client per connection. If several documents needed to be sent from a server to the same client, the communication channel between the client and the server was opened and closed several times, once for each document. Therefore, if a Web page had several images that had to be loaded from the same Web server, each was obtained separately, and each involved elaborate opening and closing of the channel of communication between the same pair of computers. This inefficiency can be eliminated in the current version of the HTTP protocol by using the Keep-Alive facility. If several files need to be transferred from the same server to the same client, Keep-Alive asks the server to keep the connection open until all required files have been transmitted.

•    HTTP_ACCEPT_LANGUAGE: It specifies the languages the browser can accept. The languages and their abbreviations are specified in Request for Comments (RFC) 1766. A list can be found in a book such as Web Design in a Nutshell [Nie99]. A few language codes are given in Table 8.8.

        

 


     Code    Language
     as      Assamese
     el      Greek
     en      English
     es      Spanish
     fi      Finnish
     no      Norwegian
     pt      Portuguese
     ru      Russian
     sv      Swedish

   

 

Table 8.8:  RFC 1766 Language Codes Used by the HTTP Protocol

•    HTTP_ACCEPT_CHARSET: It specifies the character set or sets that the browser can display. The value names one or more character set encodings. A character set encoding assigns a number to each character. The HTML specification uses the ISO-8859-1 (Latin 1) character set for encoding documents. If we want to create an HTML document that is universally viewable, it must use this character set. It uses 8 bits for each character. ISO-8859-1 is an internationally standardized character set used to type accented characters. It contains all characters necessary for all major languages of Western Europe. ISO-8859-1 is only one of the ISO-8859 standards; Table 8.9 lists several of them, most of which cover languages outside Western Europe. There are additional standards for Asian and other regions of the world. The Unicode specification, a universal character encoding scheme, is a superset of all these standards.

   It is hoped that Unicode will replace ASCII and Latin-1 everywhere in a few years. Unicode handles practically any script and any language in the world, and also provides a comprehensive set of mathematical and technical symbols. The UTF-8 encoding allows Unicode to be used in a convenient and backward-compatible way in environments that, like Unix, were designed entirely around ASCII. UTF-8 is the way in which Unicode is going to be used under Unix, Linux and similar systems.

The asterisk (*) says that the browser accepts any character encoding other than the two explicitly specified. However, the browser may not know how to display the text that uses other character encoding schemes.

                    

 


     Character set    Coverage
     8859-1           Europe, Latin America, Caribbean, Canada, Africa
     8859-2           Eastern Europe
     8859-3           SE Europe/miscellaneous (Esperanto, Maltese, etc.)
     8859-4           Scandinavia/Baltic (mostly covered by 8859-1 also)
     8859-5           Cyrillic
     8859-6           Arabic
     8859-7           Greek
     8859-8           Hebrew
     8859-9           Latin5, same as 8859-1 except for Turkish instead of Icelandic
     8859-10          Latin6, for Lappish/Nordic/Eskimo languages

   

 

Table 8.9:  Some Non-Western European Language Standards

•    HTTP_ACCEPT: It is a list of Internet media types in which the client prefers to receive data. The browser decides how to display the information based on the media type. For example, a data item of type image/gif is a GIF graphic file and needs to be rendered by the browser. Certain content, say of type application/pdf, requires the browser to run an auxiliary application or plug-in. In this case, a PDF viewer such as the Adobe Acrobat Reader needs to be run. Internet media types used by HTTP closely resemble Multipurpose Internet Mail Extension (MIME) types, originally designed as a method for sending attachments in mail over the Internet. Like MIME types, media types follow the type/subtype format. An asterisk (*) represents a wild card, so */* means accept all formats. Although it nominally accepts all formats, the browser brings up a dialog box for a format it does not know how to deal with, asking the user what to do. See a book such as Webmaster in a Nutshell [SQ96] for a list of Internet media types.

•    HTTP_USER_AGENT: It identifies the browser or other client that issued the request.

•    HTTP_ACCEPT_ENCODING: It specifies whether the browser can accept compressed data and automatically uncompress it. Browsers such as Netscape, Internet Explorer and Lynx can all accept data that is compressed using the gzip technique. Which additional compression methods are acceptable depends entirely on the browser. Compressing textual data can reduce the amount of data transmitted by 70% or more.
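Several of the HTTP variables above, such as HTTP_ACCEPT_LANGUAGE and HTTP_ACCEPT_ENCODING, hold comma-separated lists in which an item may carry a preference weight written as q=number. The following is a minimal sketch of how a CGI program might read such a value. The subroutines parse_accept and accepts_gzip are hypothetical, and the parsing is simplified: it assumes a weight between 0 and 1, treats a missing weight as 1, and drops items whose weight is 0.

```perl
use strict;
use warnings;

# Splits a value such as "en-us,en;q=0.5" into items ordered by preference.
sub parse_accept {
    my ($header) = @_;
    return () unless defined $header && length $header;
    $header =~ s/^\s+|\s+$//g;
    my @items;
    foreach my $part (split /\s*,\s*/, $header) {
        my ($value, @params) = split /\s*;\s*/, $part;
        next unless defined $value && length $value;
        my $q = 1;                                  # default weight
        foreach my $param (@params) {
            $q = $1 if $param =~ /^q=(\d*\.?\d+)$/i;
        }
        next if $q == 0;                            # weight 0: not acceptable
        push @items, { value => $value, q => $q };
    }
    # Highest weight first; the index comparison keeps the original order on ties.
    my @order = sort { $items[$b]{q} <=> $items[$a]{q} || $a <=> $b } 0 .. $#items;
    return map { $items[$_]{value} } @order;
}

# True if gzip appears among the acceptable encodings.
sub accepts_gzip {
    my ($header) = @_;
    return scalar grep { lc($_) eq 'gzip' } parse_accept($header);
}

print "Preferred languages: ",
      join(', ', parse_accept($ENV{HTTP_ACCEPT_LANGUAGE})), "\n";
print "Client accepts gzip: ",
      (accepts_gzip($ENV{HTTP_ACCEPT_ENCODING}) ? "yes" : "no"), "\n";
```

The same subroutine can be applied to HTTP_ACCEPT and HTTP_ACCEPT_CHARSET, although this sketch does not interpret wild cards such as * in any special way.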

The CGI information in the environment variables includes the following.

•    GATEWAY_INTERFACE: It specifies the version of the CGI standard that the Web server implements.

•    REQUEST_METHOD gives the method used for transmitting data. It is usually GET or POST. A regular Web page is accessed using GET. An HTML form can be submitted using GET or POST. The Web browser uses a simple encoding scheme to send data to the server. The GET method sends the requested URL and any additional data specific to the request together, with the data appended to the URL. The POST method sends the data separately from the URL, in the body of the HTTP request.

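•    SCRIPT_NAME and SCRIPT_FILENAME: These are two ways of specifying the script being executed. SCRIPT_NAME is the path of the script as it appears in the URL, and SCRIPT_FILENAME is the location of the script file on the server's file system.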

•    QUERY_STRING provides the query that the client sends to the server. The query is essentially the data sent by the client to the server when it makes a CGI request. It is clearly seen only in the case of a GET request. It is what follows the question mark (?) after the URL when a GET form sends data to the server.
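As a rough illustration of how a CGI program might use QUERY_STRING, the sketch below splits a query into name-value pairs. The subroutines url_decode and parse_query are hypothetical, and the decoding rules used here (pairs separated by & or ;, + standing for a space, and %XX standing for a character code) are stated as assumptions ahead of the discussion of the encoding in Section 8.3.5.

```perl
use strict;
use warnings;

# Undoes the simple encoding that the browser applies to form data.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;                                # "+" stands for a space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # "%41" becomes "A"
    return $text;
}

# Returns a reference to a hash of field names and decoded values.
sub parse_query {
    my ($query) = @_;
    my %fields;
    return \%fields unless defined $query && length $query;
    foreach my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        # A name that occurs more than once keeps its last value.
        $fields{ url_decode($name) } = url_decode($value);
    }
    return \%fields;
}

my $fields = parse_query($ENV{QUERY_STRING});
print "$_ = $fields->{$_}\n" foreach sort keys %$fields;
```

For a POST request the data does not arrive in QUERY_STRING, so a program that handles both methods would first look at REQUEST_METHOD.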

Document information and redirection information may also appear among the environment variables. There is no redirection information in Figure 8.8.