8.3.7 Security Issues in CGI Programs: Untaint.pm
CGI programs are usually accessed through links on Web pages or through submit buttons (i.e., via the ACTION attribute of a form). There are two primary issues one needs to be concerned about when dealing with a CGI program.
1. A CGI program, especially when called in the context of an HTML form, gets input from outside the CGI program. The data entered by an individual into the input and other boxes in an HTML form are sent across the network to the CGI program on the server. Nowadays, a browser can validate the data entered using scripts in a language such as JavaScript. However, a CGI program should not assume that such validation has taken place, and should therefore do its own part to validate the data.
2. The data sent to a CGI program on the server can potentially be very large, especially if file uploading is allowed. If the data size is extremely large, not only can it slow down the server because the server needs to allocate space, it may bring the server down, resulting in a Denial of Service (DoS) attack. Even worse, a buffer overflow may leave the computer hosting the Web server in a state in which it can be attacked by other means.
Both these issues are discussed below.
In Perl, any data that comes to a program from outside is considered tainted. Thus, any input to any program, not just a CGI program, can be considered tainted. However, it is crucial that we consider data input to a CGI program as tainted. There have been instances in the past when tainted data have caused damage to a Web server and the machine hosting the Web server.
The process of examining a tainted piece of information and validating it is called untainting or laundering. A programmer can do so fairly easily by passing the parameter data that arrives at a CGI program through a regular expression to see if the parameter’s value is what it expects. For example, in the program discussed in Section 8.3.6, the program can untaint the first name submitted from the form as follows.
$firstName = param ("firstName");
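($firstName) = ($firstName =~ /^([a-zA-Z]+)$/);
This piece of code captures the value of the CGI query parameter firstName and checks whether it consists of one or more alphabetic characters. If it does, the text captured by the parentheses in the regular expression is assigned back to $firstName; otherwise $firstName becomes undefined. Ideally, the code should also check that the value is a string whose length is bounded by a small integer, say 15 or 20. We are not doing this here.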
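The length bound mentioned above can be folded into the same regular expression. The following is a minimal sketch rather than part of the original program: the helper name launder_name and the limit of 20 characters are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the laundered name, or undef if the
# value is missing, contains non-alphabetic characters, or is longer
# than 20 characters (an assumed limit).
sub launder_name {
    my ($value) = @_;
    return undef unless defined $value;
    # \A and \z anchor the whole string; {1,20} bounds the length.
    return $value =~ /\A([a-zA-Z]{1,20})\z/ ? $1 : undef;
}

my $good = launder_name("Jugal");          # accepted
my $bad  = launder_name("Jugal; rm -rf");  # rejected: shell characters
my $long = launder_name("a" x 21);         # rejected: too long
```

The sketch uses \z instead of $ because $ also matches just before a trailing newline, while \z matches only at the very end of the string.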
There is a Perl module, Untaint.pm, that works in conjunction with the module Taint.pm and allows a Perl program, whether CGI or not, to untaint input that comes to the program from outside. For a Perl program to untaint inputs, whether from CGI queries, file input, or other outside sources, the program must run in taint mode. This is done by running the Perl interpreter with the -T option or switch. Thus, the first line of the CGI program, which provides the location of the Perl interpreter, must include the -T option. It can have other options or switches as well.
A Perl program in the taint mode is very cautious. It does not trust any data or value that is not generated within the program. It does not allow the program to do much with such tainted or possibly corrupt or harmful data till the data have been untainted or laundered or validated.
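To see what taint mode does to a value, one can ask Perl directly. The following minimal sketch uses the tainted function from the core Scalar::Util module and the ${^TAINT} variable; the helper taint_status and the pattern used to launder PATH are assumptions made for illustration. When run without -T, every value is reported as clean; when run as perl -T, the PATH value is reported as tainted until it has passed through a capturing regular expression.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Report whether a value currently carries the taint flag.
sub taint_status {
    my ($label, $value) = @_;
    return sprintf "%s: %s", $label, tainted($value) ? "tainted" : "clean";
}

# ${^TAINT} is nonzero when the interpreter was started with -T.
print "taint mode is ", (${^TAINT} ? "on" : "off"), "\n";

# A literal created inside the program is never tainted.
print taint_status("literal", "hello"), "\n";

# Under -T, values from the environment arrive tainted.
my $path = defined $ENV{PATH} ? $ENV{PATH} : "";
print taint_status("PATH", $path), "\n";

# Capturing through a regular expression yields a clean copy.
# The character class here is an assumed, deliberately narrow whitelist.
my ($clean) = $path =~ m{\A([\w/:.\-]*)\z};
print taint_status("laundered PATH", $clean), "\n" if defined $clean;
```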
Note that, ideally speaking, every CGI program, even the program that prints the values of CGI environment variables discussed in Section 8.3.4, should untaint the values of the environment variables. The values come from outside the program and hence cannot be trusted in the strictest sense till they are validated. It is possible that someone who has access to the computer where the Web server is located has put invalid, even harmful data (say, containing Unix shell special characters such as > or < or & with calls to harmful programs) as values of environment variables.
The following program is a rewrite of the program in Section 8.3.6 that echoes back values of CGI query parameters to the browser. This program works with the same HTML form shown in Figure 8.10. Of course, the value of the ACTION attribute must be changed to reflect the new CGI program.
Program 8.7
#!/usr/bin/perl -Tw
#assamListFormEcho1.pl
use CGI qw(:standard -nodebug);
use CGI::Carp qw(fatalsToBrowser);
use Untaint;
use strict;
#######Set CGI size limit; disable file upload
$CGI::POST_MAX = 1024; #max 1024 bytes posts
$CGI::DISABLE_UPLOADS = 1;
##########################
my $email = param ("email");
#$email = untaint (qr/^\w+@\w+$/, $email);
$email = untaint (qr/^[\w.]+@[\w.]+$/, $email);
my $firstName = param ("firstName");
$firstName = untaint (qr/^[a-z]+$/i, $firstName);
my $lastName = param ("lastName");
$lastName = untaint (qr/^[a-zA-Z]+$/, $lastName);
my $address = param ("surfaceAddress");
$address = untaint (qr/^[\d\w\s]+$/, $address);
my $telephone = param ("telephoneNumber");
$telephone = untaint (qr/^\d+$/, $telephone);
my $motherName = param ("motherMaidenName");
$motherName = untaint (qr/^[a-zA-Z]+$/, $motherName);
my $homeTown = param ("homeTown");
$homeTown = untaint (qr/^[a-zA-Z]+$/, $homeTown);
#######Write to browser
print "Content-type: text/html\n\n";
print <<BROWSER_TEXT;
Subscribe to Assam List
Subscribing to Assam List
This is the data you entered.
Email:
$email
First Name:
$firstName
Last Name:
$lastName
Address:
$address
Telephone Number:
$telephone
Mother's Maiden Name:
$motherName
Home Village/Town/City:
$homeTown
This is just a test form handling program that echoes what you entered
and does nothing else.
Thank you!
Assam List Administrators
BROWSER_TEXT
The program uses the CGI.pm module and imports its standard set of functions, which is extensive and sufficient for most needs. We use the functional interface to the CGI.pm module. The -nodebug pragma indicates that we do not provide input from the command line for testing purposes; changing it to -debug lets us provide inputs from the command line when testing. We use the CGI::Carp.pm module and import fatalsToBrowser so that fatal errors are reported to the browser. We use the Untaint.pm module to validate, untaint or launder the input from the CGI query. We should untaint the inputs as soon as they are obtained so that they do not get any chance to corrupt other data by taking part in computation. In fact, in the taint mode, Perl dies if we try to use tainted data in an operation that affects something outside the program, such as running a shell command or opening a file for writing.
Let us look at one untainting episode.
my $email = param ("email");
$email = untaint (qr/^[\w.]+@[\w.]+$/, $email);
The email CGI query parameter’s value is read and assigned to the scalar $email. The value is next validated using the untaint function of the Untaint.pm module. untaint takes two arguments: a regular expression compiled with the qr operator, and a value to untaint. The regular expression performs the validation: it must match the value for untaint to succeed. If untaint succeeds, it returns the value; if it fails, it croaks. croak is a function made available by the Carp.pm module, and also by the CGI::Carp.pm module. It is similar to die, except that it reports the error from the perspective of the calling code. For example, if a function or method in a module called X.pm is called and croaks, the error is reported not at the line inside X.pm where the failure occurred, but at the line in the code that called the function or method defined in X.pm. In our program, if we enter a value for the email field that the corresponding regular expression does not accept, say jugal.kalita, the CGI program sends the message shown in Figure 8.11 to the browser. To be doubly secure, beyond untainting, we should check that the value is within a reasonable length.
Figure 8.11: HTML Form whose email Field Cannot be untainted
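The behaviour described above, returning the value on a match and croaking otherwise, can be sketched with the core Carp module alone, together with the length check suggested at the end of the paragraph. The function checked_untaint, its third argument, and the limit of 64 characters are hypothetical; this is not the implementation used by Untaint.pm.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in for the behaviour described above; it is not
# the implementation used by Untaint.pm.
sub checked_untaint {
    my ($pattern, $value, $max_len) = @_;
    croak "no value supplied" unless defined $value;
    croak "value longer than $max_len characters"
        if length($value) > $max_len;
    # Capturing the match is what produces an untainted copy.
    return $1 if $value =~ /($pattern)/;
    croak "value does not match the expected pattern";
}

my $email_re = qr/\A[\w.]+\@[\w.]+\z/;

# eval traps the croak so the caller can decide how to respond.
my $accepted = eval { checked_untaint($email_re, 'jugal@example.org', 64) };
my $rejected = eval { checked_untaint($email_re, 'jugal.kalita', 64) };
my $error    = $@;

print "accepted: $accepted\n";
print "rejected with: $error";
```

Wrapping the call in eval is one way to handle a bad value gracefully instead of letting fatalsToBrowser display the raw error message.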
In this program, we have untainted scalars. Arrays can be untainted as well. Elements of a hash can be validated using untaint, using different regular expressions for values corresponding to different keys, if necessary. The reader should consult the documentation on Untaint.pm to learn more about how tainted values can propagate through a program, what operations are not allowed on tainted variables, and how data structures other than scalars can be untainted.
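The idea of validating hash elements with a different regular expression for each key can be sketched in plain Perl, without relying on any particular module interface. The function launder_fields, the rule table, and the length limits below are all assumptions made for illustration; the field names follow Program 8.7.

```perl
use strict;
use warnings;

# Hypothetical per-field rules; each pattern captures the whole value.
my %rules = (
    firstName       => qr/\A([a-zA-Z]{1,20})\z/,
    telephoneNumber => qr/\A(\d{1,15})\z/,
    homeTown        => qr/\A([a-zA-Z ]{1,40})\z/,
);

# Returns a hash ref of laundered values and an array ref naming the
# fields that were missing or failed validation.
sub launder_fields {
    my ($input, $rules) = @_;
    my (%clean, @rejected);
    for my $field (sort keys %$rules) {
        my $value = $input->{$field};
        if (defined $value && $value =~ $rules->{$field}) {
            $clean{$field} = $1;    # keep only the captured text
        } else {
            push @rejected, $field;
        }
    }
    return (\%clean, \@rejected);
}

my ($clean, $rejected) = launder_fields(
    { firstName => "Jugal", telephoneNumber => "555-1234", homeTown => "Nagaon" },
    \%rules,
);
print "accepted: ", join(", ", sort keys %$clean), "\n";
print "rejected: ", join(", ", @$rejected), "\n";
```

Here the telephone number is rejected because the assumed rule allows digits only, while the other two fields pass.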
It is easy to see that untainting a scalar requires writing a regular expression that matches the expected value. This may not always be straightforward. For example, if we want to validate a URL that has been input to a CGI program, it may require writing a complex regular expression that covers all possible ways in which a URL can be specified, e.g., with or without a port number, with or without the http:// in front, absolute or relative, with or without a trailing /, etc. Validating a credit card number may require complex regular expressions as well. Validating a date in all possible formats is not easy either. Therefore, there are a few modules that have been written specifically to launder different kinds of inputs. Some examples include CGI::Untaint::creditcard.pm, CGI::Untaint::date.pm, CGI::Untaint::email.pm, and CGI::Untaint::url.pm.
Before finishing up the discussion of the program, it must be noted that it takes more precautions in addition to untainting the values of the CGI parameters. This is done right in the beginning of the program before any CGI parameter values have been captured using the param function.
$CGI::POST_MAX = 1024; #max 1024 bytes posts
$CGI::DISABLE_UPLOADS = 1;
The first line says that the maximum amount of data that the form can send to the CGI program is 1024 bytes. This includes all the input boxes and file uploads, if any. The HTML form that calls the current CGI program does not have a file upload box. If there were one or more file upload boxes, the total size of all files and other input strings would have to be within the byte limit specified by $CGI::POST_MAX. Thus, it should be set to a value that is reasonable for the purpose at hand. There are ways in which a CGI program can be coaxed to accept file uploads even though the corresponding HTML form does not have any upload box. To be extremely security conscious, the program disallows file uploading by setting $CGI::DISABLE_UPLOADS to a value other than zero. If file uploads are desired, this variable should be set to 0.
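To make the effect of a size ceiling concrete, the following minimal sketch compares the size declared in the standard CONTENT_LENGTH environment variable with a limit before any data are read. It only illustrates the idea and is not how CGI.pm enforces its limit internally; the helper post_within_limit is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the declared request body size is
# absent or no larger than $max bytes, false otherwise.
sub post_within_limit {
    my ($env, $max) = @_;
    my $declared = $env->{CONTENT_LENGTH};
    return 1 unless defined $declared && length $declared;  # no body sent
    return 0 unless $declared =~ /\A(\d+)\z/;               # malformed header
    return $1 <= $max ? 1 : 0;
}

my $limit = 1024;   # same ceiling as used in Program 8.7
for my $size (512, 1024, 4096) {
    my $ok = post_within_limit({ CONTENT_LENGTH => $size }, $limit);
    print "$size bytes: ", ($ok ? "accepted" : "refused"), "\n";
}
```

In a real CGI program the first argument would be \%ENV; a plain hash reference is passed here so that the sketch can be run from the command line.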
It is extremely important that a CGI program is safe. Therefore, it is recommended that the two lines of code given above be placed at the top of every CGI program, before any CGI-related processing starts. If the functional interface to CGI.pm is used, the values should be set before param or any similar CGI.pm function is called. If the object-oriented interface to CGI.pm is used, the values should be set before any call to the new constructor of the CGI.pm module. Quite frequently, even cautious programmers forget to write these lines of code in their CGI programs. Thus, there is a module called CGI::Safe.pm that can be used to set these two values automatically. The default value set by CGI::Safe.pm for the $CGI::POST_MAX scalar is 512*1024, or 512 kilobytes. The default value for $CGI::DISABLE_UPLOADS is 1. The default values can be easily overridden. The CGI::Safe.pm module takes a few other safety precautions as well. The interested reader should consult the documentation on CGI.pm and CGI::Safe.pm.
