6.3 Reading Directories Recursively
6.3 Reading Directories Recursively
We now write a program that reads a directory recursively and prints the list of files in the directory and in all subdirectories. We expand the subroutine used in the previous program and make it recursive for achieving our objective.
Many people find it difficult to write recursive programs. Recursive programs can be avoided in many situations, but in a context like the present one, they are the natural choice. A recursive subroutine calls itself one or more times. For a recursive subroutine to terminate, the subroutine must have one or more termination conditions that must be reached by the subroutine after a sequence of calls. Finding the termination conditions and specifying them in the program correctly is half the battle in writing recursive programs. Recursive subroutines are usually more expensive than subroutines that use loops, but if time efficiency is not of crucial concern, they are useful and natural in many
situations.
The program that follows has a subroutine called readdirR that is called recursively.
Program 6.3
#!/usr/bin/perl
use strict;
sub readdirR{
my @FDList = @_;
my $first = shift @FDList;
if (!$first){
return ();
}
elsif (-f $first){
return ($first, readdirR (@FDList));
}
elsif (-d $first){
opendir DIR, $first || warn "Cannot open directory $first: $!";
my @files = readdir DIR ;
closedir DIR;
@files = grep {$_ !~ /^[.]{1,2}$/} @files;
@files = map {"$first/$_"} @files;
return ($first, readdirR (@files), readdirR(@FDList));
}
}
####main program###########
$" = "\n";
my @FDToRead = @ARGV;
if (!@FDToRead)
{@FDToRead = ".";}
my @allFiles = readdirR (@FDToRead);
@allFiles = sort @allFiles;
print "@allFiles\n";
If the program is stored in file readdirR0.pl, the following are some example calls in Unix.
%readdirR0.pl *
%readdirR0.pl .
%readdirR0.pl ..
%readdirR0.pl a
In the first call, the program’s command line arguments consist of the names of all the files in the current directory. In the second call, the program is given the current directory as the only command line argument. In the third call, the program is given the parent directory of the current directory as its argument. In the fourth call, the argument is a file or directory with the name a. Thus, the program takes a list of files and directories as argument. The program returns the names of all the files and directories included recursively in the arguments given. If the program is called with a list of files only, the same list is
returned. But, if there are one or more directories in the command line arguments, these directories are opened recursively and the list of files in them returned. If there is no command line argument given, it lists the list of files in the current directory recursively.
Let us first look at the main part of the program. The program takes the command line arguments and calls the subroutine readdirR with the list of command line arguments. If no command line argument is given, readdirR is called with the directory named . which in the case of Unix stands for the current directory. readdirR reads the list of all files and directories recursively and returns the result list which is sorted and printed.
Let us now look at the subroutine readdirR. It is a recursive subroutine. In other words, there are one or more calls to itself inside its body. We see three recursive calls.
As mentioned earlier, a recursive subroutine must have one or more termination conditions. Otherwise, it runs for ever. Not providing a correct termination condition is one very common error that people make in writing recursive programs. The subroutine readdirR uses the local variable name @FDList for the list of files and directories sent to it. Here F stands for files and D stands for directories. The subroutine obtains the first element of @FDList by shifting it and calls it $first. Depending on what this first element is, it takes various actions in the code that follows. Now, it is possible that @_, the list of files and directories passed to the subroutine, is empty to begin with, i.e., before assigning @FDList and shifting it. This condition is used as a termination condition in what follows.
When readdirR is called the first time, it always gets a non-empty list of files and directories. The subroutine makes one or more recursive calls depending on the arguments the initial call gets. These recursive calls may make other recursive calls in turn. The recursive calls can be visualized in the form of a recursion tree showing which call makes which other calls with what arguments. The recursion tree must end in leaves that correspond to the termination conditions. In other words, every leaf corresponds to a call to readdirR with an empty argument. This call returns
immediately and returns nothing. When a recursive call returns, it passes the result back to the previous call which processes them to obtain its own results. Results are passed back until the initial call to the subroutine returns its results to the main program. This returned value is used to assign the variable @allFiles in the main program. This returned value contains all the files and directories with their relative path names starting from the current directory. This list is sorted and printed.
The subroutine readdirR has an if-elsif-elsif statement. The main action takes place here. Before anything is done by the subroutine, it tests to see if the termination condition corresponding to an empty input list @_ is satisfied. If the condition is satisfied, no recursive call is made and the subroutine does nothing. This ensures termination. The input list @_ is empty if the value of $first obtained by
shifting @_ is empty.
Next, the subroutine proceeds by case. Depending on whether $first is a file or a directory, it takes appropriate actions discussed below.
If $first is a file, the subroutine performs the following return statement.
return ($first, readdirR (@FDList));
To compute the returned value, a recursive call is made to readdirR with the argument @FDList, that is, the list of all files and directories but the first. The argument passed to the recursive call of readdirR is smaller than the argument passed to the original call. Similarly, other recursive calls that this recursive call makes have still smaller arguments, ensuring eventual termination of the program. This recursive call makes as many recursive calls as necessary to
readdirR. Once all recursive calls are finished and it has the list of all files and directories, it makes a list by placing $first in the beginning of the list returned by the recursive call. This new list contains all the files and directories that we are seeking and is returned to the main program.
If $file is a directory itself, the subroutine opens this directory, reads the list of files in it, and removes the two special files named . and .. referring to the current directory and the parent directory, respectively. It makes a call to map to prepend each file name with the name of the directory. The call is given below.
@files = map {"$first/$_"} @files;
This can be done with a loop also. This step is crucial because each Perl program has a home directory where it is situated and if it has to open and read files and directories that are not situated in the home directory, the names have to be qualified with directory information. Otherwise, the program will not be able to access the files or directories although it has their names. Finally, the subroutine makes two recursive calls and creates a list of files by placing $first in the front and returns the resulting list. This is the list of all files and directories requested. The return statement is given below.
return ($first, readdirR (@files), readdirR(@FDList));
The first recursive call recursively reads the list of all files and directories in the files and directories contained within the directory $first, except of course the . and .. directories. The second recursive call obtains the list of files and directories contained in the rest of the argument list passed to the original subroutine.
We now show the list printed by this program for several calls. Let the file where the program is stored be readdirR0.pl. The call
%readdir .
produces the output like what is given below. Here % is the Unix prompt.
.
./a
./a/a1
./a/a2
./a/a3
./a/ad1
./a/ad1/ad1a
./a/ad1/ad1b
./a/ad1/ad1c
./a/ad1/ad1d1
./a/ad2
./a/ad3
./argv.pl
./b
./c
./copy.pl
./copy0.pl
./copy1.pl
./file-age.pl
./file-read.pl
./file-read1.pl
./fileread.pl
./include
./junk
./oldest.pl
./printfile.pl
./printfile1.pl
./printfile2
./printfile2.1
./printfile2.2
./read-file.pl
./readdir.pl
./readdir1.pl
./readdirR.pl
./readdirR0.pl
./readdirR1.pl
./readdirR2.pl
./readdirR3.pl
./readdirR4.pl
./remove
./remove/rmTeXfiles.pl
./remove/rmTeXfiles1.pl
./remove/rmTeXfiles2.pl
./tls
./tls2
The output contains the list of all files in the current directory. The . is contained in this list because it is the name of the directory with which we call the program. If the call were
%readdirR0.pl *
the same list is produced, but it does not contain the . as an element. Also, the file and directory names are not prefixed with ./ as it is in the earlier call. Finally, if we make the call
%readdirR0.pl a
the output is the following.
a
a/a1
a/a2
a/a3
a/ad1
a/ad1/ad1a
a/ad1/ad1b
a/ad1/ad1c
a/ad1/ad1d1
a/ad2
a/ad3
We now present another recursive program that does exactly the same as the program given above, however, it does it a little differently.
Program 6.4
#!/usr/bin/perl
use strict;
sub readdirR{
my ($dir) = @_;
opendir DIR, $dir || warn "Cannot open directory $dir: $!";
my @files = readdir DIR ;
closedir DIR;
@files = grep {$_ !~ /^(\.|\.\.)$/} @files;
my @simpleFiles = grep -f, (map {"$dir/$_"} @files);
my @directories = grep -d, (map {"$dir/$_"} @files);
my @recursiveFileList = map {readdirR ($_)} @directories;
return (@simpleFiles, @directories, @recursiveFileList);
}
##main program###########
$" = "\n";
my @FDToRead = @ARGV;
if (!@FDToRead)
{@FDToRead = ".";}
my @simpleFiles = grep -f, @FDToRead;
my @directories = grep -d, @FDToRead;
my @recursiveFiles = map {readdirR ($_);} @directories;
my @allFiles = sort (@FDToRead, @recursiveFiles);
print "@allFiles\n";
This program uses map liberally. In the main program, we separate out the command line arguments into two lists: @simpleFiles—a list of files and @directories—a list of directories. It then calls the subroutine readdirR once for each member of the list @directories. This is accomplished in the line given below.
my @recursiveFiles = map {readdirR ($_);} @directories;
The subroutine readdirR takes only one argument–a directory. The map function is used to call readdirR on each element of @directories. The names of the files and directories returned by this recursive call is stored in @recursiveFiles. The main program then creates the list @allFiles by concatenating two lists:
@FDToRead—the list of files and directories given as command line argument; and
@recursiveFiles—the list of files and directories obtained by recursively reading the directories from among the command line arguments.
The subroutine readdirR gets only one directory as argument. It reads the list of files in the directory and removes the special files . and .. from the list. From this list, like in the main program, it separates out the files and directories, and then recursively reads the directories. It returns the list of files and directories after all recursive calls terminate by executing the statement given below.
return (@simpleFiles, @directories, @recursiveFileList);
