8.2 The Apache Web Server
To understand how a Web server serves static HTML or XML pages, or dynamic pages created by CGI programs, we need to be somewhat familiar with Web servers. Several Web servers are in common use; of these, the Apache Web server is the most popular. The Apache project has a home page at www.apache.org, from which the server can be downloaded for free. The Apache Web server runs on machines with Unix or Unix-like operating systems such as Linux, Sun Solaris, and BSD. It also runs on Macintosh machines with OS X, which is built on top of BSD Unix, and is available for various flavors of the Windows operating system from Microsoft. To understand how the Apache Web server works, one should look at the documentation at www.apache.org or a book such as Apache Desktop Reference [Eng01].
The following discussion pertains primarily to the Apache installation on Red Hat Linux, a Unix-like operating system. Details may differ on other operating systems. A Web server other than Apache may allow one to configure it using a GUI, but the basic ideas are the same. An administrator or webmaster needs to make many decisions when setting up a Web server. The Apache Web server comes as part of the standard distribution in most Linux distributions, including Red Hat Linux. It is also available as a standard tool in Mac OS X.
Modern operating systems are very security conscious. In the latest versions of Red Hat Linux, for example, almost every network service comes shut off by default. So, the Apache Web server daemon httpd needs to be explicitly started by the systems administrator. If we want the Web server to start at boot time, the systems administrator must arrange for this by changing the boot configuration. This can be done by running a program such as ntsysv on a Red Hat Linux machine and indicating that httpd is one of the services to be started when the machine boots.
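Once httpd has been started, one simple way to confirm that something is accepting connections on the Web server port is to attempt a TCP connection to it. The following is a minimal sketch in Python (our choice of language for illustration; the function name is_listening is hypothetical). It only shows that some process is listening on the port, not that the process is Apache.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (for example, "connection
        # refused") when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Port 80 is the standard HTTP port; the host is an assumption.
    print(is_listening("localhost", 80))
```

If the call reports False on the server machine itself, the daemon has most likely not been started or is listening on a different port.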
In Red Hat Linux, the Apache Web server is configured by editing the configuration file, which is usually located at /etc/httpd/conf/httpd.conf. On Mac OS X Server, the configuration file for the Apache Web server is located at /Library/WebServer/Configuration/apache.conf. This file can be edited using a text editor. The information it contains is crucial, and thus the utmost caution should be exercised in editing the file. Only the root user, or super-user, has permission to edit it. Editing this file without a clear understanding of what is being changed is an invitation to trouble.
The configuration file has many well-defined lines or sections that are described reasonably well with explanatory text or comments. A few of the lines relevant to our present discussion are shown below. These lines are called directives to the server. The directives are shown in the order in which they appear.
The first directive of interest in the Red Hat Linux Apache configuration file is the following.
ServerType standalone
This line specifies that the server runs as a stand-alone daemon process. A daemon process is a Unix server program. A server is a program that runs continuously, doing nothing but waiting for client requests. When a client makes a request, it responds appropriately. The alternative on a Unix machine is to have the Web server started on demand by inetd for each request, but this option is not advised since it is inefficient. The next directive in the file is the following.
ServerRoot "/etc/httpd"
This line specifies the root directory in which the administrative files related to the server reside. Typically, it contains sub-directories conf/ and logs/. The conf/ directory stores configuration files including the httpd.conf file under discussion. The logs/ directory contains details of Web page accesses as well as errors encountered in Web page accesses. A directive that follows a little later in the httpd.conf file specifies a listening port.
Port 80
This directive specifies the port number on the system at which the Web server receives requests from clients. The standard HTTP server port number is 80, although other port numbers can be used. Acceptable port numbers range from 0 to 65,535 (that is, 2^16 - 1), although numbers below 1024 are usually reserved for system services. Another directive that follows shortly is the following.
DocumentRoot "/var/www/html"
This directive sets the directory from which the Apache Web server serves documents. In this case, Web pages reside on the system at /var/www/html. One can create sub-directories under this directory, as needed. The server usually prepends this path to the path obtained from a requested URL to create the fully qualified path to the requested Web page on the server. For the Apache Web server program to work, one should avoid trailing slashes when specifying a directory in the configuration file, unless instructed to use them. The next directive of interest is the following.
UserDir public_html
The UserDir directive gives the name of the directory that is appended to a user’s home directory if a ~user request is received. Thus, if kalita is a user on the machine pikespeak.uccs.edu, the URL http://pikespeak.uccs.edu/~kalita/jk1.jpg refers to a file in the directory public_html in kalita’s home directory on pikespeak.uccs.edu. So, if the user kalita’s home directory is /home/kalita, the URL is translated to the path /home/kalita/public_html/jk1.jpg. The next directive we discuss specifies the name of the default file that serves as the directory index.
DirectoryIndex index.html index.htm index.shtml index.php index.php4 index.php3 index.cgi
This directive lists the names of the files to use when a client requests a directory by ending a URL with a slash. When several files are listed, the server looks for them in the order given. Thus, if one types the URL http://pikespeak.uccs.edu/ in a browser’s Location box, the request is sent to pikespeak.uccs.edu, where a Web server is running. The Web server looks in the directory /var/www/html for the files listed above, in sequence, and serves the first one it finds. As a second example, the URL http://pikespeak.uccs.edu/~kalita/ causes the server to look for an index file in the directory /home/kalita/public_html on pikespeak.uccs.edu. Another directive that is relevant to our discussion is the following.
ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
The ScriptAlias directive specifies the directory where server scripts are found. A document found in a ScriptAliased directory is automatically treated as an application by the server. Thus, whenever a URL beginning with http://pikespeak.uccs.edu/cgi-bin/ is requested, the /cgi-bin/ portion of the URL tells the server that the target is a script and that an application program must be run. In addition, the server replaces the /cgi-bin/ portion of the URL with the ScriptAlias target. In this case, it becomes the path /var/www/cgi-bin/ on the pikespeak.uccs.edu machine. The path given as the ScriptAlias for the string /cgi-bin/ is the system CGI directory. Usually, the Apache Web server is set up such that, in addition to the CGI files in the system directory (here, /var/www/cgi-bin), individual users can keep CGI programs in their own home directories. This can be achieved on a Unix machine by creating a soft link from the system CGI directory to a sub-directory in the user’s home directory. A soft link is a software-based pointer from a name to an existing file or directory. Users need to consult their systems administrator to have such soft links created so that their CGI programs can be placed in their home directories instead of the system CGI directory.
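The URL-to-path translations described above for DocumentRoot, UserDir, DirectoryIndex, and ScriptAlias can be summarized in a short sketch. The Python code below is only an illustration of the rules as stated in the text, not a description of how Apache is implemented. The function names and the HOME_BASE constant are hypothetical, and the sketch assumes that every user's home directory is /home/<user>; it also ignores path normalization and all security checks.

```python
import posixpath

# Values copied from the directives discussed in this section.
DOCUMENT_ROOT = "/var/www/html"
USER_DIR = "public_html"
SCRIPT_ALIAS = ("/cgi-bin/", "/var/www/cgi-bin/")
DIRECTORY_INDEX = ["index.html", "index.htm", "index.shtml",
                   "index.php", "index.php4", "index.php3", "index.cgi"]

# Assumption for illustration: home directories live under /home.
HOME_BASE = "/home"


def translate(url_path):
    """Map the path part of a URL to (filesystem path, is_script)."""
    alias, target = SCRIPT_ALIAS
    if url_path.startswith(alias):
        # ScriptAlias: replace the URL prefix with the CGI directory.
        return target + url_path[len(alias):], True
    if url_path.startswith("/~"):
        # UserDir: /~user/rest becomes <home>/public_html/rest.
        user, _, rest = url_path[2:].partition("/")
        return posixpath.join(HOME_BASE, user, USER_DIR, rest), False
    # Otherwise: prepend DocumentRoot to the URL path.
    return DOCUMENT_ROOT + url_path, False


def pick_index(directory, exists):
    """Return the first DirectoryIndex file found in directory, or None.

    exists is a caller-supplied test such as os.path.exists.
    """
    for name in DIRECTORY_INDEX:
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            return candidate
    return None
```

For instance, translate("/~kalita/jk1.jpg") returns ("/home/kalita/public_html/jk1.jpg", False), matching the example given for UserDir, and translate("/cgi-bin/hello.cgi") returns ("/var/www/cgi-bin/hello.cgi", True), where hello.cgi is a made-up script name.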
There is one more change that needs to be made to the httpd.conf file so that CGI programs actually work. The httpd.conf file, as distributed, does not allow the execution of CGI programs. The section of httpd.conf that deals with CGI programs initially looks like the following.
<Directory "/var/www/cgi-bin">
AllowOverride None
Options None
Order allow,deny
Allow from all
</Directory>
This section needs to be changed so that ExecCGI is an option. We may also allow FollowSymLinks as an option so that users can have their CGI programs in their home directories, as briefly alluded to earlier. This allows the Web server to follow symbolic or soft links when starting a CGI program. For more details, readers should consult their systems administrator. The change is incorporated in the directive section as shown below.
<Directory "/var/www/cgi-bin">
AllowOverride None
Options ExecCGI FollowSymLinks
Order allow,deny
Allow from all
</Directory>
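After editing the file, it can be useful to check that the intended options are actually present in the section. The Python sketch below reads the Options line of a given Directory section from configuration text. It is a simplified illustration under stated assumptions: the function name directory_options is hypothetical, and the parsing handles only the plain form shown in this section (no comments inside the block, no nested sections, no + or - prefixes on option names).

```python
import re


def directory_options(config_text, directory):
    """Return the set of Options named in the given <Directory> section."""
    pattern = re.compile(
        r'<Directory\s+"?' + re.escape(directory) + r'"?\s*>(.*?)</Directory>',
        re.DOTALL | re.IGNORECASE,
    )
    match = pattern.search(config_text)
    if match is None:
        return set()
    options = set()
    for line in match.group(1).splitlines():
        words = line.split()
        if words and words[0].lower() == "options":
            options.update(words[1:])
    return options


# The modified section from the text, used as sample input.
SAMPLE = """<Directory "/var/www/cgi-bin">
AllowOverride None
Options ExecCGI FollowSymLinks
Order allow,deny
Allow from all
</Directory>"""
```

With this sample, directory_options(SAMPLE, "/var/www/cgi-bin") yields a set containing ExecCGI and FollowSymLinks, whereas the unmodified section would yield only None.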
There are many other directives in the configuration file. We have discussed just a few of the directives to make the basics of setting up a server clear. In summary, the configuration file specifies a host of details essential for the functioning of the Web server such as the port at which the server listens, the directory where log and configuration files are, the directory where static Web pages are, and the directory in which CGI script files are situated.
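As a rough illustration of this summary, the following Python sketch collects the handful of directives discussed here from configuration text. The function name read_directives and the WANTED set are hypothetical, and the parsing is deliberately naive: it skips comment lines and section tags, and it assumes each directive sits on a single line.

```python
import shlex

# The single-line directives discussed in this section.
WANTED = {"ServerType", "ServerRoot", "Port", "DocumentRoot",
          "UserDir", "DirectoryIndex", "ScriptAlias"}


def read_directives(config_text):
    """Return a dict mapping each wanted directive to its argument list."""
    found = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and <Directory>-style section tags.
        if not line or line.startswith("#") or line.startswith("<"):
            continue
        words = shlex.split(line)  # strips the quotes around paths
        if words and words[0] in WANTED:
            found[words[0]] = words[1:]
    return found


# Sample input built from the values shown in this section.
SAMPLE = """
ServerType standalone
ServerRoot "/etc/httpd"
Port 80
DocumentRoot "/var/www/html"
UserDir public_html
ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
"""
```

Calling read_directives(SAMPLE) gives, among other entries, ["80"] for Port and ["/cgi-bin/", "/var/www/cgi-bin/"] for ScriptAlias.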
