Chapter 9

On Web Client Programming

In this chapter, we discuss at length how to write programs that act as Web clients. The World Wide Web is a ubiquitous application that sits atop the Internet, a distributed network of networks that spans the globe. It came into existence in 1990 with the first Web server and Web client, both developed by Tim Berners-Lee. Early clients were text-based, whereas the dominant browsers at the time of writing, namely Netscape Navigator and Microsoft Internet Explorer, are GUI-based.

As we are well aware, the World Wide Web employs the client-server model of computing, which we discuss at length in Chapter 7. The dominant servers at the time of writing are the Apache server and Microsoft’s IIS. Other servers include AppleShare IP, which runs on Macintosh computers. Web servers are extremely complex programs that serve Web pages to Web clients, and commercial Web clients are extremely complex programs as well. We therefore do not attempt to write complex clients. Instead, we write simple ones that can fetch Web pages and perform useful computation on them.

By definition, a Web client communicates with Web servers. A Web server usually listens on port 80 of the machine that hosts it. Web clients and servers communicate using HTTP. A protocol such as HTTP defines the syntax of the messages exchanged and constrains what can be said when. Web pages are written mostly in HTML, although XML is increasingly finding wider acceptance. In addition, when a Web server serves dynamic Web pages, it does not answer a client’s request directly; it obtains the response by talking to an intermediary. This intermediary can take various forms, such as a CGI program, an ASP program, a PHP program, or a JSP program.
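To make the idea of a protocol's syntax concrete, the following minimal sketch builds the text of an HTTP/1.0 GET request and takes apart the status line of a response. The function names, the host www.example.com, and the User-Agent string are illustrative assumptions, not part of any standard module.

```perl
use strict;
use warnings;

# Build the text of a minimal HTTP/1.0 GET request.
# Every line ends with CRLF, and an empty line marks the end of the headers.
sub build_get_request {
    my ($host, $path) = @_;
    return join "\r\n",
        "GET $path HTTP/1.0",
        "Host: $host",
        "User-Agent: simple-perl-client/0.1",    # hypothetical client name
        "", "";
}

# Split a status line such as "HTTP/1.0 200 OK" into its three parts.
# Returns a hash reference, or nothing if the line is malformed.
sub parse_status_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                        # drop the line terminator
    if ($line =~ m{\A(HTTP/\d+\.\d+)\s+(\d{3})\s*(.*)\z}) {
        return { version => $1, code => $2, reason => $3 };
    }
    return;
}

print build_get_request('www.example.com', '/index.html');

my $status = parse_status_line("HTTP/1.0 404 Not Found\r\n");
print "$status->{code} $status->{reason}\n";     # prints "404 Not Found"
```

The request is only a few lines of text, and the first line of the reply carries a three-digit code that tells the client whether the request succeeded.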

In this section, we discuss how to write programs that send a request to a Web server, receive the response, and perform useful computation on the page that is returned. Unlike a commercial Web browser, our programs do not format the returned page to make it pretty, nor do they deal with graphics, audio, or video. Nevertheless, such programs can be immediately useful. For example, a comparison-shopping program that presents a table of prices for a specific product from competing on-line stores performs Web client programming. So does a program that monitors various on-line auction sites, and so does a program that obtains news stories from several Web sites and presents the stories best suited to an individual’s tastes.
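As a rough illustration of "useful computation" on returned pages, the sketch below produces a small comparison-shopping table. It assumes the pages have already been fetched; the store names and page contents are invented placeholders, and the price pattern is deliberately crude (it takes the first dollar amount it sees).

```perl
use strict;
use warnings;

# Return the first dollar amount found in a page as a number,
# or nothing if the page contains no recognizable price.
sub first_price {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;          # crude tag removal
    return unless $text =~ /\$\s*(\d[\d,]*(?:\.\d\d)?)/;
    (my $amount = $1) =~ tr/,//d;                # "1,299.00" becomes "1299.00"
    return $amount + 0;
}

# Given store => page text pairs, return [store, price] rows, cheapest first.
# Stores whose page shows no price are left out.
sub price_table {
    my (%pages) = @_;
    my %price;
    for my $store (keys %pages) {
        my $found = first_price($pages{$store});
        $price{$store} = $found if defined $found;
    }
    return map { [ $_, $price{$_} ] }
           sort { $price{$a} <=> $price{$b} } keys %price;
}

# Hypothetical pages standing in for responses from three on-line stores.
my %page_for = (
    'store-a.example' => '<html><body><h1>Widget</h1><p>Our price: <b>$24.95</b></p></body></html>',
    'store-b.example' => '<html><body>Widget, now only $19.99!</body></html>',
    'store-c.example' => '<html><body>Widget (out of stock)</body></html>',
);

printf "%-20s %8.2f\n", @$_ for price_table(%page_for);
```

With the sample data, store-b.example (19.99) is listed before store-a.example (24.95), and store-c.example is omitted because its page shows no price. A real program would need site-specific patterns or an HTML parser, since every store lays out its pages differently.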

The communication that takes place between a Web server and a Web client uses HTTP, which sits atop the TCP protocol we discuss in Chapter 7. TCP in turn sits on top of IP, which is discussed at length in any book on computer networking. Thus, Web clients and servers must understand TCP and IP. A Web client, such as a program that fetches a single Web page or one that crawls the Web fetching many relevant pages, can be written using sockets that employ TCP. We wrote such programs in Chapter 7.
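As a reminder of what the socket-level approach involves, here is a minimal sketch that uses the standard IO::Socket::INET module. It is not the code from Chapter 7; the function names fetch_raw and split_response are made up for this illustration, and www.example.com is a placeholder host.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Fetch one page over a plain TCP socket and return the complete response
# (status line, headers, and body) as a single string.
sub fetch_raw {
    my ($host, $path, $port) = @_;
    $port ||= 80;                                # the usual Web server port

    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "Cannot connect to $host:$port: $@\n";

    # With HTTP/1.0 the server closes the connection after responding,
    # so reading until end-of-file collects the whole response.
    print $sock "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";

    local $/;                                    # slurp mode: read to EOF
    my $response = <$sock>;
    close $sock;
    return $response;
}

# Separate the header block from the body at the first empty line.
sub split_response {
    my ($response) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $response, 2;
    return ($head, defined $body ? $body : '');
}

# Example use (requires network access):
#   my ($head, $body) = split_response(fetch_raw('www.example.com', '/'));
#   print "$head\n";
```

Even this small client has to format the request by hand and pick the response apart itself. It also ignores redirects, chunked transfers, and error handling beyond a failed connection.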

However, Perl makes writing Web clients much easier than starting from scratch with TCP-based sockets. It provides a set of related modules for developing fairly sophisticated Web clients. We discuss the most important of these modules in this section.
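To give a flavor of the difference, here is a minimal sketch of a page fetch at the module level. The text above does not name the modules, so this assumes they are the libwww-perl (LWP) collection from CPAN and uses its LWP::UserAgent class. The function names, the agent string, and the URL are illustrative assumptions.

```perl
use strict;
use warnings;
use LWP::UserAgent;    # assumed: part of libwww-perl, installed from CPAN

# Fetch a URL and return the page text, or die with the server's status line.
sub fetch_page {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(
        timeout => 10,
        agent   => 'simple-perl-client/0.1',     # hypothetical client name
    );
    my $response = $ua->get($url);
    die "Request for $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Pull the text of the <title> element out of a page, if there is one.
# A regular expression is a rough tool for HTML, but it suffices for a sketch.
sub page_title {
    my ($html) = @_;
    return $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is ? $1 : undef;
}

# Example use (requires network access):
#   my $html = fetch_page('http://www.example.com/');
#   print page_title($html), "\n";
```

The module handles the connection, the request syntax, and the parsing of the response, so the program is left with only the computation it cares about.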