7.3.3 Forking and Files
When we fork() a child process, the child gets its own copy of the parent's data. This is true for filehandles also: the child process gets a copy of each filehandle the parent has open. In other words, there are now two filehandles with the same name, one in each process's data segment. However, both filehandles are associated with the same open file. To repeat: there are two filehandles (each with the same name), but only one file. So, if either process writes to its copy of the filehandle, the same file is written to. If either process reads from its copy, the read pointer moves ahead in the same file.
We will first look at a program that opens a file and then forks, so that two processes, the parent and the child, write to the same file.
Program 7.15
#!/usr/bin/perl
use strict;
open (OUT, ">out.txt");
fork ();
#two processes writing to the same file
print OUT "After forking\n";
close OUT;
This program does not write to the file before forking. After the fork, both processes, parent and child, write to the file. Each has its own copy of the filehandle OUT, but both copies are associated with the same file out.txt, which is open for output. Therefore, the same line gets written twice. Here are the contents of out.txt after the program runs.
After forking
After forking
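The claim that the two copies of the filehandle refer to one open file, with a single file position, can be checked directly. The following is a minimal sketch, assuming a Unix-like system; the file name offset_demo.txt is a hypothetical scratch file, and sysseek and waitpid are standard Perl functions that this section has not otherwise used. Only the child writes, yet the parent sees that the file position has moved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);

my $file = "offset_demo.txt";    # hypothetical scratch file
open(OUT, ">", $file) or die "Cannot open $file: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: write through its own copy of the filehandle, then exit.
    print OUT "child\n";
    close OUT;                   # closing writes the child's buffer to the file
    exit 0;
}

# Parent: wait for the child to finish, then ask the operating system
# for the current file position. The parent itself has written nothing.
waitpid($pid, 0);
my $offset = sysseek(OUT, 0, SEEK_CUR);
print "Parent sees file position: ", $offset + 0, "\n";
close OUT;
```

On a Unix-like system the parent should report position 6, the length of the string the child wrote, because the position belongs to the shared open file rather than to either copy of the filehandle.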
Next, we modify this program a little bit so that the program writes to the file before forking. Here is the program.
Program 7.16
#!/usr/bin/perl
use strict;
open (OUT, ">out.txt");
#one process writing
print OUT "Before forking\n";
fork ();
#two processes writing
print OUT "After forking\n";
close OUT;
When we examine the contents of the file out.txt, we see the following.
Before forking
After forking
Before forking
After forking
This is not what we expected. Only one process is running before the fork, so the string Before forking should not have been printed twice. However, the way Perl performs input and output readily explains the anomaly. When Perl writes to a filehandle (i.e., the associated file), it usually buffers the output. This means that a print statement does not write to the file right away. Instead, a print statement writes to a buffer, and the buffer is written out to the physical file from time to time (typically when it fills up) or when the filehandle is closed.
In this case, the parent process writes Before forking to the buffer associated with the filehandle OUT. Then, before the buffer is written out to the file, the parent process forks, creating a child process. The parent’s OUT filehandle continues to be associated with the file. The child receives a copy of the OUT filehandle, which is associated with the same file. At the time of forking, the child also receives a copy of the buffer associated with OUT, including whatever the buffer held at that moment. Thus, the child’s copy of the buffer contains the line Before forking, just as the parent’s buffer does. After forking, each process writes After forking to its own buffer. When each process closes its filehandle (or earlier, if its buffer fills up), its buffer is written out to the physical file. Each buffer holds both lines, which explains the contents of the output file.
To avoid this situation, we can write code that flushes the parent’s filehandle just before forking. Flushing means we force the contents of the buffer to be written out to the physical file, leaving the buffer empty. There are several ways to flush a buffer and, from the moment of flushing, make the filehandle unbuffered. In the program that follows, we use a module called IO::Handle, which provides a method for this purpose. The method is called autoflush.
Program 7.17
#!/usr/bin/perl
use IO::Handle;
use strict;
open (OUT, ">out.txt");
print OUT "Before forking\n";
#flush the buffer associated with the parent process
autoflush OUT 1; #flush the output, i.e., print to file immediately
fork ();
#Both processes print this line
print OUT "After forking\n";
close OUT;
The statement that flushes the buffer is given below.
autoflush OUT 1;
autoflush takes two arguments: the filehandle and an integer. A value of 1 means unbuffer the output: flush what is already in the buffer and, from now on, write each print to the file right away instead of holding it in a buffer to be written later. Therefore, in this program, the parent process’s buffer is written out before the fork. After the fork, both processes have copies of the filehandle OUT, and both copies are unbuffered. Therefore, the output printed to the file is what we expect.
Before forking
After forking
After forking
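As noted earlier, there is more than one way to flush a buffer. One alternative, sketched below, is the flush method that IO::Handle also provides. It empties the buffer once but leaves the filehandle buffered afterward. The file name flush_demo.txt is a hypothetical scratch file, and the read-back at the end is added only so that the sketch displays its own result.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

my $file = "flush_demo.txt";     # hypothetical scratch file
open(OUT, ">", $file) or die "Cannot open $file: $!";
print OUT "Before forking\n";
OUT->flush();                    # empty the buffer once; OUT stays buffered

my $pid = fork();
die "fork failed: $!" unless defined $pid;

print OUT "After forking\n";     # both processes run this line
close OUT;                       # each process writes out its own buffer
exit 0 if $pid == 0;             # the child is done

waitpid($pid, 0);                # parent waits so that the file is complete
open(IN, "<", $file) or die "Cannot open $file: $!";
my @lines = <IN>;
close IN;
print @lines;
```

Because the buffer is already empty when the fork happens, the child’s copy of the buffer holds nothing, and Before forking should appear only once in the file.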
We must note that unbuffering is quite likely to slow the program down. This is because every print statement now triggers an actual write to the file, and such writes are time consuming. The buffers are in RAM, and writing to RAM is much faster than writing to disk.
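The cost of unbuffering can be estimated with a rough timing comparison. The sketch below is illustrative only: the helper time_writes, the file names, and the line count of 100,000 are all assumptions made for this example, and it uses lexical filehandles and the Time::HiRes module, neither of which appears in the programs above. The timings will vary from system to system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use Time::HiRes qw(time);

# Hypothetical helper: write 100,000 short lines to $file and return
# the elapsed time in seconds. If $unbuffered is true, autoflush is
# turned on first, so every print is written out immediately.
sub time_writes {
    my ($file, $unbuffered) = @_;
    open(my $fh, ">", $file) or die "Cannot open $file: $!";
    $fh->autoflush(1) if $unbuffered;
    my $start = time();
    print $fh "line $_\n" for 1 .. 100_000;
    close $fh;
    return time() - $start;
}

my $buffered_time   = time_writes("buffered.txt",   0);
my $unbuffered_time = time_writes("unbuffered.txt", 1);

printf "Buffered:   %.3f seconds\n", $buffered_time;
printf "Unbuffered: %.3f seconds\n", $unbuffered_time;
```

Both files end up with identical contents; only the number of separate writes made along the way differs.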
Sharing files for reading is a little more complex and is discussed later in the chapter. In Perl, it is difficult to read a line of text in an unbuffered mode
