7.4 Sockets
The Unix operating system uses sockets as one means of allowing two programs to communicate. The two programs can reside on the same machine, but it is more interesting when they run on two different computers on a network, possibly the Internet, whether across the hall or across the world. Such programs typically follow the client-server model of computing. The server is a program that runs continuously on a computer. It usually does nothing on its own except wait for a request to arrive. The client program does not run all the time. It runs only when the user activates it.
A client usually sends a request to a server program running on the same or another machine in the network. When a server program gets a request from a client program, it performs the requested operation and sends the result back to the client. Many modern service-based programs work this way. Some of the most prominent and useful client-server programs are Web servers and clients (browsers), electronic mail servers and clients (e-mail readers), and FTP servers and clients for transferring files between two machines on a network.
Sockets are a tool for building such client-server programs. As we already know, such programs have two counterparts: the client runs on one machine, and the server runs on (possibly) another machine. For the two programs to communicate, each program must create a socket. When a client program sends a request to the server, it sends the information to a certain port on the server’s machine. The server has a socket attached to this port. A port is simply a number (between 0 and 65535) that the operating system uses to decide which socket, and therefore which program, should receive incoming data. The server program waits on its socket for a request or some data to arrive. Once something arrives, the server performs the requested operations with the data given, obtains any results, and writes the results to the socket associated with its port. The socket sends the information back through the network to the client program’s machine, at the port used by the client. The client program has a socket associated with a local port on its machine. It reads the information that has arrived at the socket and does whatever it deems appropriate.
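The exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production server: both programs run on the same machine (address 127.0.0.1), the "requested operation" is a made-up one (converting the request to upper case), and the helper name serve_once is hypothetical. Port 0 is used so that the operating system picks any free port.

```python
import socket
import threading

def serve_once(listener):
    """Accept one client, upper-case its request, and send the result back."""
    conn, client_addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())

# Assumption: port 0 asks the operating system for any free port, so the
# sketch does not collide with a real service.
listener = socket.create_server(("127.0.0.1", 0))
server_port = listener.getsockname()[1]
worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# The client creates its own socket; the operating system assigns it a
# local port, and replies from the server arrive at that port.
with socket.create_connection(("127.0.0.1", server_port)) as client:
    client_port = client.getsockname()[1]
    client.sendall(b"hello server")
    reply = client.recv(1024)

worker.join()
listener.close()
print("server port:", server_port, "client port:", client_port)
print("reply:", reply.decode())
```

The two port numbers printed differ from run to run. What matters is that each end of the conversation has its own socket and its own port.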
A specific example pair is a Web server and a Web client. The Web server runs a daemon process on a machine connected to the Internet. A daemon is the Unix name for a process that runs continuously in the background. On a Unix machine, the Web server daemon is commonly called httpd. The httpd daemon process listens on a certain port. The default port is 80, although someone setting up a Web server can specify another port such as 1080 or 8080.
The Web client is a browser program that runs on a user’s machine. Examples of Web browsers include Netscape Communicator, Microsoft Internet Explorer, NCSA Mosaic, Opera, and Lynx. The user uses a browser program to get on the Web. When a user specifies a URL such as http://www.shillong.com and wants to read a Web page, the browser creates a socket associated with a specific port on the local machine. This socket connects to a socket on port 80 of the machine serving the Web pages for www.shillong.com. When the request for a page arrives at port 80 of www.shillong.com, the HTTP server works on the request. It retrieves the page requested by the browser and sends it to the socket on the specific port on the machine running the client program. Once the browser gets the requested HTML file, it displays the file and its contents.
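To make the browser-and-server exchange concrete, here is a minimal sketch that plays both roles on one machine. Several things are assumptions made for illustration: a small server from Python's standard http.server module stands in for httpd, the class name HelloHandler and the page it returns are invented, and the server listens on an arbitrary free port on 127.0.0.1 rather than on port 80 of a real host. The "browser" is just a socket that sends a plain-text HTTP request and reads the reply.

```python
import http.server
import socket
import threading

class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that serves one fixed HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello from the server</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet

# Stand-in for httpd; port 0 lets the operating system choose a free port.
server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
worker = threading.Thread(target=server.handle_request)  # serve one request
worker.start()

# The "browser": open a socket to the server's port and send an HTTP request.
with socket.create_connection(("127.0.0.1", port)) as browser:
    browser.sendall(b"GET / HTTP/1.0\r\nHost: 127.0.0.1\r\n\r\n")
    chunks = []
    while True:
        data = browser.recv(4096)
        if not data:  # the server closed the connection: reply is complete
            break
        chunks.append(data)

worker.join()
server.server_close()

# An HTTP reply is a block of headers, a blank line, and then the page.
response = b"".join(chunks)
headers, _, page = response.partition(b"\r\n\r\n")
status_line = headers.split(b"\r\n")[0].decode()
print(status_line)
print(page.decode())
```

A real browser does the same thing at a larger scale: it opens a socket to the server's port, writes a request, reads the reply, and then renders the HTML instead of printing it.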
The socket is an interface that lets programmers develop their own distributed applications. The socket interface originated in Berkeley (BSD) Unix and was a de facto standard long before it was written into the POSIX specification, largely because sockets are seamlessly integrated with the Unix operating system. An implementation of the socket programming interface is also available for the Windows operating systems. It is called WinSock and has become the standard in the IBM-compatible PC world. There are also sockets that work with Macintosh computers, whether the older Mac OS (up to OS 9) or the Unix-based OS X and later.
The socket interface is largely symmetrical in server programs and client programs. With minor differences, the client program executes the same sequence of system calls to set up a connection as the server program executes to accept one.
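The following sketch spells out the two call sequences side by side, using Python's socket module, whose methods closely mirror the underlying system calls. It is an illustration only: the function name accept_one and the messages "ping" and "ack" are invented, and both ends run on one machine, with a thread standing in for the server program.

```python
import socket
import threading

# Server side: socket() -> bind() -> listen() -> accept()
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))  # assumption: port 0 = any free port
server_sock.listen(1)
address = server_sock.getsockname()

received = []

def accept_one():
    """Accept a single connection, record the request, and acknowledge it."""
    conn, _ = server_sock.accept()
    with conn:
        received.append(conn.recv(1024))
        conn.sendall(b"ack")

t = threading.Thread(target=accept_one)
t.start()

# Client side: socket() -> connect()
client_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_sock.connect(address)
client_sock.sendall(b"ping")
answer = client_sock.recv(1024)
client_sock.close()

t.join()
server_sock.close()
print(received[0], answer)
```

Both sides begin with the same socket() call and, once connected, use the same send and receive calls. The differences are in the middle: the server binds to a port, listens, and accepts, while the client simply connects.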
Sockets work with two lower-level network protocols. A protocol defines a precise sequence of steps for communication. Thus, a protocol is a very constrained language for communication. The two lower-level protocols that sockets work with are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). We do not need to know many details about these protocols to write socket programs. Readers who want details on how networks work and on the differences between TCP and UDP can consult a book such as Computer Networks and Internets [Com99] or Data and Computer Communications [Sta97].
In very simple terms, TCP is a connection-oriented protocol. Thus, TCP can be likened to the way telephones work. When two people talk on the phone, a clear line of communication is established between the two parties. The voice data goes back and forth between the two individuals on this line. Similarly, for TCP to work, a line or channel of communication (a logical connection rather than a physical wire) must be established between the client and the server. Well-known applications such as the World Wide Web, which use the higher-level (application layer) protocol called HTTP, use TCP as the protocol at a lower level. TELNET, used for remotely logging on to a computer, and FTP, used for transferring files between two computers, also use TCP as a lower-level protocol.
UDP is also a lower-level protocol, at the same level as TCP in network operations. UDP is connectionless. In other words, it is like the postal system. When we mail a letter at the post office for a specific destination, no direct line of communication is established between the source and the destination. The mailed letter moves from postal station to postal station, and is finally delivered to the destination address by a mailman. UDP is similar. When data is sent by UDP, no connection is established between the source and the destination. Each piece of data is sent from computer to computer along the route until it arrives at the destination. Remote Procedure Calls (RPC), available on Unix and other systems, can use UDP as a lower-level protocol.
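The connectionless style can be sketched as follows. As before, this is a small illustration under assumptions: both sockets live on one machine, the port is chosen by the operating system, and the names receiver, sender, and the "letter" messages are invented to match the postal analogy. Notice that there is no listen(), accept(), or connect() step; each datagram is simply addressed and sent.

```python
import socket

# The receiver only needs a socket bound to a port; there is nothing to accept.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # assumption: port 0 = any free port
receiver.settimeout(2.0)          # do not wait forever if a datagram is lost
destination = receiver.getsockname()

# The sender addresses each datagram individually, like a letter.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"letter 1", destination)
sender.sendto(b"letter 2", destination)

letters = []
for _ in range(2):
    data, source = receiver.recvfrom(1024)  # source is the sender's address
    letters.append(data)

sender.close()
receiver.close()
print(letters)
```

On a single machine both datagrams normally arrive, and in order, but UDP itself does not promise either. That is why the receiver sets a timeout instead of assuming the data will come.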
There are other significant differences between TCP and UDP, but we do not discuss them in this book.
