On Data and Communication Security
In Chapter 7, we discussed at length how two programs, or more precisely two processes, can communicate using the socket interface. It does not matter whether the two communicating programs are on the same machine, on two different machines in the same room, or on different continents. We assume that all computers involved are on the Internet.
Computers on the Internet communicate through a framework called the TCP/IP architecture, in which communication takes place in four layers:
• the application layer,
• the transport layer,
• the Internet layer, and
• the network interface layer.
When we write socket-based programs, we deal with the application layer. Most programs, whether simple like those in Chapter 7 or complex like a Web browser, use sockets behind the scenes for communication. A socket, in abstract terms, is a five-tuple containing the following components: the source machine’s name or IP address, the source port, the destination machine’s name or IP address, the destination port, and the protocol used. The protocol is usually TCP, but can be UDP as well.
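The five-tuple can be seen concretely in a short program. The chapter’s own examples are in Perl, but the following Python sketch shows the same idea: a TCP server and client on the loopback address, where accepting a connection fixes all five components (source address, source port, destination address, destination port, protocol TCP). The port number 0 simply asks the operating system for any free port.

```python
import socket
import threading

def run_server(server_sock, result):
    conn, addr = server_sock.accept()   # connection completes the five-tuple
    result.append(conn.recv(1024))      # read what the client sent
    conn.sendall(b"got it")             # send a small reply
    conn.close()

# Server side: bind to the loopback address on an OS-chosen port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCK_STREAM = TCP
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

received = []
t = threading.Thread(target=run_server, args=(server, received))
t.start()

# Client side: connect to the server's address and port, also over TCP.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
t.join()
server.close()

print(received[0].decode(), reply.decode())   # hello got it
```

Changing `SOCK_STREAM` to `SOCK_DGRAM` would select UDP instead, i.e., change the protocol component of the tuple.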
Although we program using sockets, the actual communication between programs takes place at lower levels, beneath what the application program sees. The commands and data transmitted by an application program are translated into what the transport layer, the layer immediately below the application layer, understands. This is where the protocol specified in the socket description matters. If the protocol is TCP, the application-level information is converted into information that follows the TCP specification; if the socket uses UDP, the translation follows the UDP specification. These are complex protocols, and we need not worry about their details. Information at this level is broken into small chunks called packets.
The Internet layer, immediately below the transport layer, uses a set of communication rules called the Internet Protocol (IP). The TCP or UDP packets are not sent across the Internet directly; they are translated into IP-based information, once again in small chunks called IP packets. IP packets are also known as datagrams.
Finally, the IP datagrams are converted into what is acceptable to the lowest layer, the network interface layer. This layer deals with the actual physical network and converts the information into a form the network hardware can understand; Ethernet is one example of such a physical network. Thus, the commands and data that the application program sends through sockets are actually transmitted as pieces of information at the lowest layer.
The Internet is a network of many networks, built on different underlying technologies. Network hardware and software route the information at the network interface layer from network to network. Specialized computers called routers guide the information toward its destination; on the way from the source to the destination, the information may pass through many such routers, each forwarding a packet from one network to the next.
Finally, the information arrives at the destination. The lowest-level information is translated back into Internet-layer information, i.e., IP packets. The IP packets that arrive are converted back into transport-layer (TCP or UDP) information, and the software at the transport level reassembles the packets into a whole. The composed information is translated back to the application layer, and the socket at the destination is handed the commands and data that were sent by the application at the source. The application program at the destination processes the data appropriately and possibly sends a response back to the source.
Thus, information moves from the source to the destination in small chunks called packets. There may be tens of thousands of packets, at various layers, involved in one application-layer communication, and copies of these packets exist all along the way from the source computer to the destination computer. There are ways in which packets can be viewed and captured in transit. When a captured message is read, the privacy of the message is violated, and the captured packets can then be used maliciously. An eavesdropper may want to change the commands or the accompanying data, in part or in whole, or may try to destroy the original information, fabricate something, and send the fake information instead. These are some of the security issues with Internet-based communication.
The main information security objectives that a cryptographic algorithm or tool provides are the following [MvOV96, Nic99, Sta99]:
Data Confidentiality: The content of information should be secret, except for those who are authorized to see it.
Data Integrity: It should be impossible to forge or tamper with the data in storage or during transmission. Thus, one must be able to detect data manipulation by unauthorized programs or individuals. Tampering includes operations such as insertion, deletion and substitution.
Authentication: Authentication involves identifying an entity or the data itself. Two parties entering into a communication may be required to identify each other. It should also be possible to authenticate data delivered over a communication network with regard to its origin, the date of origin, the time sent, and so on.
Non-repudiation: An entity should not be able to deny actions it has performed. For example, the receiver of a message should be able to prove that the message was sent by the purported sender. The sender should also be able to prove that the message was received by the purported receiver. When there is a dispute, there must be a means to resolve and find out who performed what action.
In this chapter, we deal with issues in data and communication security. The chapter discusses the concept of message digests and the functionality Perl provides for creating them. A message digest is a short fingerprint of a message that is, for all practical purposes, unique. However short or long the data or message is, the digest is always the same size for a given digesting algorithm. The sender can compute the digest and send it, together with the message or separately, to the recipient. The receiver then computes the digest again on the received message. If the digest sent by the sender and the one computed by the receiver match, the message can be considered untampered.
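The digest-and-compare procedure can be sketched in a few lines. The chapter uses Perl’s digest modules for this; the Python sketch below uses SHA-256 from the standard `hashlib` library, but the protocol is identical: both sides hash, then compare. Note that the digest is a fixed size (64 hex characters for SHA-256) regardless of the message length.

```python
import hashlib

message = b"Transfer $100 to account 42"
digest_sent = hashlib.sha256(message).hexdigest()   # computed by the sender

# ... message and digest travel separately to the receiver ...

digest_received = hashlib.sha256(message).hexdigest()  # recomputed on arrival
print(digest_sent == digest_received)               # True: message is intact

# If an eavesdropper alters even one byte, the digests no longer match.
tampered = b"Transfer $900 to account 42"
print(hashlib.sha256(tampered).hexdigest() == digest_sent)   # False
```
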
Cryptographic algorithms are classified into two broad categories:
• Conventional or symmetric or private-key cryptographic algorithms,
• Asymmetric or public-key cryptographic algorithms.
Conventional cryptographic algorithms have been used since ancient times, primarily in diplomacy and in times of war, to send and receive sensitive information. In addition to the data to encode, conventional cryptographic algorithms use an additional piece of information called the key to scramble or encode the data. The key can be a number, a string, or a sequence of bits; for conventional cryptography it is usually small, with modern techniques typically using keys 8 or more bytes long. Conventional cryptographic algorithms use only one key, and hence are called symmetric. The key is sent by the sender to the receiver over a secure channel, e.g., during times of war, by heavily armed guards. The key is used to encipher the data, the encoded data is sent over an unsecured channel, and the receiving party decodes the data using the copy of the key he or she has. With the advent of computers, conventional cryptographic methods have become very complex. A symmetric method called the Data Encryption Standard (DES) was widely used by governments and commercial enterprises to secure data for more than two decades. More recently, the strength of the security provided by DES has been questioned and found inadequate. In 2001, a new conventional cryptographic algorithm called Rijndael was adopted in the US as the Advanced Encryption Standard (AES), to become the future encryption workhorse. In this chapter, we primarily discuss Perl’s implementation of DES, although we briefly note other conventional cryptographic algorithms.
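The defining feature of a symmetric cipher, one shared key that both enciphers and deciphers, can be illustrated with a deliberately simple toy. The Python sketch below is not DES and is not secure; it merely repeats the key and XORs it against the data, so applying the same operation twice recovers the plaintext. Real ciphers such as DES and AES replace the XOR step with far more elaborate, key-driven scrambling, but the usage pattern is the same.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each data byte with the repeating key; XOR is its own inverse,
    # so the same function both enciphers and deciphers.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"8bytekey"                          # the one shared secret key
plaintext = b"attack at dawn"
ciphertext = xor_cipher(plaintext, key)    # sender enciphers
recovered = xor_cipher(ciphertext, key)    # receiver deciphers with same key

print(ciphertext != plaintext, recovered == plaintext)   # True True
```
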
This chapter also discusses a very well-known public-key cryptographic, or information scrambling, algorithm called the RSA algorithm. This algorithm is based on some very interesting properties of large prime numbers. There are other public-key cryptographic algorithms, but RSA is the most celebrated and most widely known. In the RSA algorithm, every individual possesses two large, related numbers called keys: a public key and a private key. The need for two large keys for each individual, instead of one relatively small key for each pair of participants, distinguishes public-key cryptography from conventional cryptography. Suppose two parties A and B communicate. A has two keys, A’s public key and A’s private key; B likewise has a public key and a private key. The sender A encrypts, or scrambles, a message using the receiving party’s, i.e., B’s, public key. By prior agreement, B has made B’s public key available to A or any other potential sender of encoded data to B. Thus, B’s public key is potentially known to the whole world, whereas an individual keeps his or her private key secure. Scrambling involves performing mathematical computations with large numbers. The scrambled or encoded message is sent by A over a normal, unsecured channel to the receiver B. The receiver B is in possession of the second key, B’s private key; B performs a mathematical computation on the received message using B’s private key and obtains the original message.
In cryptographic terms, the original message is called the plaintext or cleartext, and the encoded message is called the ciphertext. The two keys in the RSA algorithm are extremely large numbers, say 1024 bits in length each, related to each other through certain properties of very large prime numbers. In fact, A first chooses two large prime numbers and then derives A’s public and private keys from them. A does not let others know what the prime numbers are, and guards the private key closely. Although A’s public key is known to B and the whole world, it is computationally infeasible to derive A’s private key from it using any computer that exists today, or even millions of computers working together, provided the keys are sufficiently large. This is because of some surprising, seemingly simple properties of large prime numbers. The algorithm was invented in the late 1970s by Rivest, Shamir and Adleman, three professors at MIT, and bears their names. It changed the manner in which data is encrypted. For a while, an algorithm such as RSA was of primary importance mostly to governments, particularly the military, and to large financial institutions, which used such algorithms for secure communication. However, with the invention of the World Wide Web in the late 1980s and its enormous and increasing popularity, encryption has become important to the common man, so that private data can be securely communicated between a Web browser and a Web server. Such data includes bank records, credit card information used during electronic shopping, and medical records, among others.
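The whole RSA scheme can be walked through with tiny textbook numbers (real keys are 1024 or more bits, so the arithmetic below is only illustrative). The Python sketch chooses two small primes, derives the key pair, and shows that encrypting with the public key and decrypting with the private key recovers the plaintext.

```python
# Two small primes, kept secret by the key owner (real ones are enormous).
p, q = 61, 53
n = p * q                    # 3233: the modulus, part of both keys
phi = (p - 1) * (q - 1)      # 3120: knowable only if you can factor n

e = 17                       # public exponent, chosen coprime with phi
d = pow(e, -1, phi)          # private exponent: d * e == 1 (mod phi)

# Public key: (e, n), published to the world. Private key: (d, n), kept secret.
m = 65                       # plaintext, represented as a number less than n
c = pow(m, e, n)             # sender encrypts with the public key
m2 = pow(c, d, n)            # receiver decrypts with the private key

print(c, m2)                 # 2790 65
```

Recovering `d` from `(e, n)` requires factoring `n` into `p` and `q`, trivial for 3233 but infeasible for a 1024-bit modulus, which is exactly the property the previous paragraphs describe. (The modular inverse via `pow(e, -1, phi)` requires Python 3.8 or later.)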