6.14 Exercises
1. (Easy: Documentation Reading)
Read the Perl documentation on file tests by running
% perldoc perlfunc
This section of the Perl on-line documentation covers a great deal of other material, so you will have to search a little to find the file test operators. They are listed under the -X entry, which perldoc -f -X displays directly.
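As a starting point for the reading, here is a minimal sketch that exercises a few of the file test operators. The subroutine name describe_path is a hypothetical helper chosen for this illustration, and only four of the many operators are shown.

```perl
use strict;
use warnings;

# describe_path is a hypothetical helper name used only for this illustration.
sub describe_path {
    my ($path) = @_;
    return "missing"   unless -e $path;            # -e: the path exists
    return "directory" if -d $path;                # -d: the path is a directory
    return "text file" if -f $path && -T $path;    # -f: plain file, -T: looks like text
    return "other";
}

# Classify every path named on the command line.
print "$_: ", describe_path($_), "\n" for @ARGV;
```

Note that -T is a heuristic: Perl examines the beginning of the file and guesses whether it is text.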
2. (Medium to Hard: Removing Comments, Recursive Directory Processing, Text Processing, Long)
Write a program that removes all comments from Perl programs. Given a directory containing sub-directories, with program files dispersed among them, it creates a modified mirror of the top-level directory. The new top-level directory has the original name with the suffix .new attached. The program creates every contained directory under its original name, and stores each program file after taking out its comments. There is one exception: if the first line of a program file has a comment starting at column 1, the program does not remove this comment. A comment starts with #.
In general, a comment can start at any column. A comment starting with # continues to the end of the current line. The name of the top-level directory is given as a command-line argument.
Assume the name of a Perl script file ends in .pl or .pm. The program modifies only Perl scripts. Each modified program has the same name as the original, and is written to the corresponding directory in the mirror, which has been created earlier. The program copies non-Perl files unchanged to the right location in the mirror.
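The per-line rule described above might be sketched as follows. This is a deliberately naive version: it assumes that every # starts a comment, which is wrong when # appears inside a string, inside a regular expression, or in $#array. The name strip_comment is hypothetical.

```perl
use strict;
use warnings;

# strip_comment is a hypothetical name. Naive assumption: every '#' starts a
# comment. This mishandles '#' inside strings, regexes, and $#array.
sub strip_comment {
    my ($line, $line_no) = @_;
    # Keep a comment that starts at column 1 of the first line.
    return $line if $line_no == 1 && $line =~ /^#/;
    # Drop the comment together with the blanks in front of it.
    # '.' does not match a newline, so the line terminator survives.
    $line =~ s/[ \t]*#.*//;
    return $line;
}

print strip_comment('my $x = 1;   # set x' . "\n", 2);
```

A complete solution has to decide how carefully to treat the cases that this sketch ignores.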
3. (Medium: Recursive Directory Processing, Text Processing, Long)
Write a program that takes a command-line argument that is the name of a directory containing embedded directories and program files. The program creates a mirror of the top-level directory. It asks for the location of the Perl interpreter on your computer. It then traverses the directory recursively and, if the name of a file ends in .pl, checks whether the first line is a comment specifying the location of the Perl interpreter. If not, it adds the comment as seen in programs in this book. This first comment line is not needed on PCs and older Macintosh operating systems. However, it is not a bad idea to include it if we want the programs to be portable across systems. Remember that paths on PCs and older Macintoshes are written differently than on a Unix machine. Files that are not Perl programs are copied as they are.
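The check on the first line could look roughly like this. The name ensure_shebang is hypothetical, and the interpreter location is passed in as a parameter because it differs from machine to machine; the sketch assumes that the interpreter comment has the form #! followed by a path containing the word perl.

```perl
use strict;
use warnings;

# ensure_shebang is a hypothetical name. It receives the interpreter location
# typed in by the user and the lines of one program file.
sub ensure_shebang {
    my ($perl_path, @lines) = @_;
    # Assumed form of the interpreter comment: "#!" followed by a path with "perl" in it.
    return @lines if @lines && $lines[0] =~ /^#!.*perl/;
    return ("#!$perl_path\n", @lines);
}

print ensure_shebang("/usr/bin/perl", "print \"hello\\n\";\n");
```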
4. (Medium: Recursive Directory Processing)
Write a program that prints the name of every file in a directory and the number of lines in it. It prints the result to a file. The directory has embedded directories in it. Print the results in the form of a horizontal histogram. Scale the numbers so that the histogram lines fit the screen. Use repetition of an alphabetic character, say x, to draw the lines.
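One possible way to scale the bars is sketched below. The name histogram_lines and the parameter $width (the length of the longest bar) are assumptions made for this illustration; the line counts are supplied by hand instead of being read from files.

```perl
use strict;
use warnings;

# histogram_lines is a hypothetical helper; $width is the assumed maximum bar length.
sub histogram_lines {
    my ($counts, $width) = @_;     # $counts: hash reference, file name => line count
    my ($max) = sort { $b <=> $a } values %$counts;
    my @out;
    for my $name (sort keys %$counts) {
        # Scale so that the largest count maps to exactly $width characters.
        my $len = $max ? int($counts->{$name} * $width / $max + 0.5) : 0;
        push @out, sprintf("%-12s %s", $name, 'x' x $len);
    }
    return @out;
}

print "$_\n" for histogram_lines({ 'a.pl' => 200, 'b.pl' => 50 }, 40);
```

With linear scaling, a file with very few lines may get an empty bar; whether to give it at least one character is a design choice left to the reader.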
5. (Medium: Recursive Directory Processing, File Tests)
Write a program that prints out the names of all text files in a directory recursively. The directory name is given as a command-line argument. It prints the results in a manner that makes the containment of directories and files clear.
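Indentation is one simple way to make containment visible. The sketch below assumes two spaces per nesting level and marks directories with a trailing slash; both choices, and the name list_tree, are arbitrary. It treats the directory structure as a tree and does not guard against loops created by soft links.

```perl
use strict;
use warnings;

# list_tree is a hypothetical name; it returns one output line per entry.
sub list_tree {
    my ($dir, $depth) = @_;
    my @lines;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @entries = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    for my $entry (@entries) {
        my $path = "$dir/$entry";
        if (-d $path) {
            push @lines, ('  ' x $depth) . "$entry/";
            push @lines, list_tree($path, $depth + 1);   # recurse one level deeper
        }
        elsif (-f $path && -T $path) {                   # report text files only
            push @lines, ('  ' x $depth) . $entry;
        }
    }
    return @lines;
}

if (@ARGV) {
    print "$_\n" for list_tree($ARGV[0], 0);
}
```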
6. (Medium: Recursive Directory Processing, Text Processing, Letter Frequency Counting)
Write a program that counts the frequency of letters in English (irrespective of case). Ignore all non-alphabetic characters. Process as much text as you can obtain, from any source. For example, process all the text files that you have under a certain large directory. Let the directory contain sub-directories. As you encounter a letter, increment the number of times it has occurred so far. Use a hash to store the frequency of occurrence. A letter is a key and its frequency of occurrence is its value.
Study what are called Huffman codes that can be used to represent letters. Generate Huffman codes for the letters based on their frequency count.
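The counting step of this exercise can be sketched in a few lines. The name count_letters is hypothetical, and the sketch counts only the unaccented letters A to Z; generating the Huffman codes is left to the reader.

```perl
use strict;
use warnings;

# count_letters is a hypothetical helper that adds the letters of $text to %$freq.
sub count_letters {
    my ($text, $freq) = @_;
    # Each match captures one letter; lc folds upper case into lower case.
    $freq->{ lc $1 }++ while $text =~ /([A-Za-z])/g;
    return $freq;
}

my %freq;
count_letters("Hello, World!", \%freq);
# Print the most frequent letters first; break ties alphabetically.
print "$_: $freq{$_}\n"
    for sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq;
```

Because the hash is passed by reference, the same hash can accumulate counts over every file in the directory structure.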
7. (Medium: Recursive Directory Processing, Text Processing)
Write a program that reads a directory recursively and creates a mirror. It looks at text files and breaks long lines into shorter lines. It takes a command-line argument that is an integer n and checks to make sure that it is an integer. It breaks each long line after the last non-blank character that occurs before the nth column of input. It removes the original files and directories once the mirroring is over.
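A rough sketch of the line-breaking step follows. It assumes the line has already been chomped, that only the space character counts as a blank, and that a word longer than n is simply cut at column n; the name break_line is hypothetical.

```perl
use strict;
use warnings;

# break_line is a hypothetical name. Assumes a chomped line and that only
# spaces count as blanks.
sub break_line {
    my ($line, $n) = @_;
    die "width must be a positive integer\n"
        unless defined $n && $n =~ /\A[1-9][0-9]*\z/;
    my @pieces;
    while (length($line) > $n) {
        my $cut = rindex($line, ' ', $n);   # last blank at or before index $n
        $cut = $n if $cut <= 0;             # no usable blank: assumed fallback, cut mid-word
        my $piece = substr($line, 0, $cut);
        $piece =~ s/ +$//;                  # no trailing blanks on the piece
        push @pieces, $piece;
        $line = substr($line, $cut);
        $line =~ s/^ +//;                   # the next piece starts at a non-blank
    }
    push @pieces, $line;
    return @pieces;
}

print "$_\n" for break_line("the quick brown fox jumps", 10);
```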
8. (Medium: Recursive Directory Processing, Text Processing)
Write a program to remove trailing blanks and tabs from every file in a directory, recursively. Also, reduce every run of consecutive blank lines to a single blank line. Make a mirror first and, once the mirroring is complete, remove the original directories and files.
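The text-processing part might be sketched as below, assuming the lines of one file have been read into a list and chomped. The name tidy_lines is hypothetical.

```perl
use strict;
use warnings;

# tidy_lines is a hypothetical name. It expects chomped lines.
sub tidy_lines {
    my @out;
    my $prev_blank = 0;
    for my $line (@_) {
        (my $clean = $line) =~ s/[ \t]+$//;   # remove trailing blanks and tabs
        my $blank = $clean eq '' ? 1 : 0;
        next if $blank && $prev_blank;        # skip the second and later blank lines of a run
        push @out, $clean;
        $prev_blank = $blank;
    }
    return @out;
}

print "[$_]\n" for tidy_lines("a  ", "", " \t", "", "b");
```

A line that holds only blanks or tabs becomes empty after trimming, so it counts as a blank line here.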
9. (Medium: Recursive Directory Processing, Text Processing)
Write a program that reads a directory recursively. It reads the text files in it a paragraph at a time. It reports the lengths of the longest and the shortest paragraphs in the whole directory structure, in terms of numbers of words. It also reports the names of the files where they occur. The program also reports the lengths of the shortest and the longest sentences, in numbers of words, along with the file names. Make any simple assumptions you need regarding where a sentence starts and ends.
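Reading a paragraph at a time can be done by setting the input record separator $/ to the empty string. The sketch below assumes that paragraphs are separated by one or more blank lines and that words are separated by whitespace; the name paragraph_word_counts is hypothetical.

```perl
use strict;
use warnings;

# paragraph_word_counts is a hypothetical name. It takes an open filehandle
# and returns the number of words in each paragraph.
sub paragraph_word_counts {
    my ($fh) = @_;
    local $/ = '';                        # paragraph mode: blank lines end a record
    my @counts;
    while (my $para = <$fh>) {
        my @words = split ' ', $para;     # split on runs of whitespace
        push @counts, scalar @words;
    }
    return @counts;
}

# Read from an in-memory string so that the example needs no file.
my $text = "one two\nthree\n\n\nfour\n";
open(my $fh, '<', \$text) or die "Cannot open string: $!";
print join(", ", paragraph_word_counts($fh)), "\n";
close $fh;
```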
10. (Medium: Recursive Directory Processing, Text Processing, Program File Dependencies)
In Perl, you can include a module in your program by writing use file. Write a program that takes a set of directories as command-line arguments and prints out the names of all the modules that have been used in the Perl programs in the contained directories, recursively. It prints each module only once. It then checks to see if the modules are available in a library directory given to the program as the first command-line argument.
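The two core steps, collecting module names and mapping a name to a file, can be sketched as follows. The sketch assumes at most one use statement per line, placed at the start of the line; the names used_modules and module_file are hypothetical. Perl locates a module such as File::Find in a file named File/Find.pm under a library directory.

```perl
use strict;
use warnings;

# used_modules is a hypothetical name. It returns each module name once.
# Assumes at most one use statement per line, at the start of the line.
sub used_modules {
    my %seen;
    for my $line (@_) {
        next unless $line =~ /^\s*use\s+([A-Za-z_]\w*(?:::\w+)*)/;
        $seen{$1} = 1;
    }
    return sort keys %seen;
}

# module_file is a hypothetical name. It builds the path where the module
# would be found under the library directory $lib_dir.
sub module_file {
    my ($module, $lib_dir) = @_;
    (my $rel = $module) =~ s{::}{/}g;     # File::Find becomes File/Find
    return "$lib_dir/$rel.pm";
}

print "$_\n" for used_modules("use strict;", "use File::Find;", "use File::Find qw(find);");
```

Pragmas such as strict are reported too, since they are written with use; whether to filter them out is up to you. A file test such as -e on the result of module_file then tells whether the module is available.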
11. (Medium: Recursive Directory Processing, Text Processing, Text Substitution)
Write a subroutine replace that examines a file given to it and replaces the occurrence of $old by $new everywhere in the file. The function takes three arguments as specified in the example given below.
$old = "MCI, Inc.";
$new = "MCI WorldComm, Inc.";
$file = "index.html";
replace ($old, $new, $file);
Assume that $old can have several words that can occur in one or more consecutive lines of the file. For example, MCI, can occur in one line, and Inc. can occur in the next line. There can be any amount of whitespace between two words.
Now write a subroutine replaceR that is like replace, but performs the substitution recursively in all directories. That is, the substitution is performed in every file of every directory given as argument. A generic call to this function is given below.
replaceR ($old, $new, @FDList);
Here @FDList is a list of files and directories. You can call replace from replaceR if you want.
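One way to let the words of $old match across line breaks is to build a pattern in which any run of whitespace may separate them. The sketch below works on a string that holds the whole contents of a file; the name replace_text is hypothetical, and reading and writing the file is left out.

```perl
use strict;
use warnings;

# replace_text is a hypothetical helper that works on the contents of a file
# held in one string.
sub replace_text {
    my ($old, $new, $text) = @_;
    # Quote each word so that characters such as '.' match literally, then
    # allow any run of whitespace, including newlines, between the words.
    my $pattern = join '\s+', map { quotemeta($_) } split ' ', $old;
    return $text unless length $pattern;   # nothing to search for
    $text =~ s/$pattern/$new/g;
    return $text;
}

print replace_text("MCI, Inc.", "MCI WorldComm, Inc.", "Call MCI,\n   Inc. today.\n");
```

Note that a match spanning two lines is replaced by $new on a single line, so the line break inside the match disappears.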
12. (Medium: Recursive Directory Processing, Soft Links)
In all the programs we have seen in this chapter and in these exercises, we consider a directory structure to be a tree. However, it can be a little more complicated. There can be links from a name to a directory or file. For example, in Unix, we can make soft links using the ln command with the -s option. Thus, we can actually have a graph or a network instead of a tree. A tree has no loops, whereas a graph or a network can have them. Modify all the programs in these exercises so that they work with a network of files.
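One common way to avoid going around a loop forever is to remember which files and directories have already been visited. The sketch below identifies each one by the device and inode numbers returned by stat, which assumes a Unix-like system; the name walk_once is hypothetical.

```perl
use strict;
use warnings;

# walk_once is a hypothetical name. Assumes a Unix-like system where stat
# returns meaningful device and inode numbers.
sub walk_once {
    my ($path, $seen, $found) = @_;
    my ($dev, $ino) = stat $path;          # stat follows soft links
    return unless defined $ino;            # dangling link or unreadable entry
    return if $seen->{"$dev:$ino"}++;      # already reached under another name
    unless (-d $path) {
        push @$found, $path;               # record a file the first time it is seen
        return;
    }
    opendir(my $dh, $path) or return;
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    walk_once("$path/$_", $seen, $found) for sort @entries;
    return;
}

if (@ARGV) {
    my @found;
    walk_once($ARGV[0], {}, \@found);
    print "$_\n" for @found;
}
```

A side effect of this design is that a file with several hard links is reported only once, under the first name by which it is reached.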
13. (Easy: File Processing, Directory Processing)
Redo all the directory and file processing exercises so far using the file and directory modules discussed. Compare the amount of time taken by the two approaches, especially when the data being handled, i.e., the file structures, are large.
14. (Medium: Tar and Untar, Software Installation, Research)
Being able to archive a set of directories and files is useful for writing software for a complex project. Suppose you can archive the relevant files for a project in the tar format. The tarred archive can be transferred to a machine where the project is to be installed. Study how the installation process for a project can be automated as much as possible. Describe how this can be done so that a complex Perl-based project can be installed on a Linux, a Macintosh, and a Windows machine. The same project tarball is to be installed on the three different platforms. Remember that a complex project normally has many files that need to be housed at different locations on the target machine.
