On Pattern Matching and Text Processing
One of Perl’s main strengths is its elaborate facilities for finding patterns in text. A short program can scan many hundreds of files for simple or complex textual patterns, and it can also perform a global substitution in each of these files, replacing one word or pattern with another. For example, a person managing a World Wide Web site for an organization or company may keep all relevant files in a hierarchy of directories. Assume this directory structure holds several thousand files. Suppose the company is named AssamSoft, and that it buys another company called Maoi Technologies and renames itself AssamMaoi Technologies. The person who manages the Web pages now has the daunting task of scanning thousands of files and changing every reference to AssamSoft to the new name, AssamMaoi Technologies. Done by hand, going through all the directories and files takes a long time. Some directories and files may be missed, and some occurrences of AssamSoft may be missed even in files that are checked. The job becomes arduous, expensive, and error-prone, and it can take many days or weeks.
However, a language such as Perl can come to the rescue in a situation such as this. It is possible to write a quite short Perl script that performs this task without leaving out any files or directories, or any occurrences of AssamSoft in the files that it scans. In addition, the program does so far faster than any human being can.
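To make this concrete, here is a minimal sketch of such a script. It is only one possible approach, written under several assumptions of ours: the pages are files ending in .htm or .html, each file is small enough to read into memory, and the directory to process is given on the command line. The subroutine names replace_in_file and replace_in_tree are hypothetical names chosen for illustration. The sketch uses the standard File::Find module to walk the directory tree.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Names taken from the example in the text.
my $old_name = 'AssamSoft';
my $new_name = 'AssamMaoi Technologies';

# Replace every occurrence of $old in the file $path, rewriting the file
# only if something changed. Returns the number of replacements made.
sub replace_in_file {
    my ($path, $old, $new) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };   # read the whole file at once
    close $in;
    $text = '' unless defined $text;

    # \Q...\E quotes any regex metacharacters in the old name;
    # /g replaces every occurrence, not just the first one.
    my $count = ($text =~ s/\Q$old\E/$new/g);
    return 0 unless $count;

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
    return $count;
}

# Walk the tree under $root and fix every .htm/.html file found.
# Returns the total number of replacements.
sub replace_in_tree {
    my ($root, $old, $new) = @_;
    my $total = 0;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path && $path =~ /\.html?$/i;
                $total += replace_in_file($path, $old, $new);
            },
        },
        $root,
    );
    return $total;
}

# Do the work only when a directory is given on the command line.
if (@ARGV) {
    my $n = replace_in_tree($ARGV[0], $old_name, $new_name);
    print "Replaced $n occurrence(s) of $old_name\n";
}
```

Because the script overwrites files in place, it is prudent to try it on a copy of the directory tree first.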
Situations such as this arise frequently in big organizations, and in many forms. The person in charge of Web pages for an organization may want to see how many links to other Web pages there are in his or her site. He or she may want to check whether people are inserting huge graphic, audio, or video files in their Web pages. He or she may also want to find out whether any links in the site’s thousands of Web pages are dead, in that they lead to pages that no longer exist or were wrongly typed in the first place.
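The first two of these checks can be sketched with a single pattern. The code below is a rough illustration only, not a full HTML parser: it assumes attribute values are enclosed in quotes, and the list of file extensions is our own guess at what counts as a graphic, audio, or video file. The subroutine names extract_targets and media_targets are hypothetical.

```perl
use strict;
use warnings;

# Return every href or src target found in a string of HTML.
# Handles quoted attribute values only.
sub extract_targets {
    my ($html) = @_;
    my @targets;
    # (["']) remembers the opening quote; \1 requires the same quote to close.
    while ($html =~ /\b(?:href|src)\s*=\s*(["'])(.*?)\1/gis) {
        push @targets, $2;
    }
    return @targets;
}

# Keep only targets that look like graphic, audio, or video files.
# The extension list is an illustrative assumption.
sub media_targets {
    return grep { /\.(?:gif|jpe?g|png|wav|mp3|avi|mpe?g|mov)$/i } @_;
}

my $page = <<'HTML';
<a href="http://www.example.com/">Home</a>
<img src="images/logo.gif">
<a href='movies/tour.MPG'>Tour</a>
HTML

my @all   = extract_targets($page);
my @media = media_targets(@all);
printf "%d targets, %d media files\n", scalar @all, scalar @media;
# prints "3 targets, 2 media files"
```

Checking whether a link is dead requires more than pattern matching, since the program must also test whether each target exists, but the same pattern supplies the list of targets to test.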
Among the high-level languages that are widely available and popular, Perl has the most sophisticated pattern-matching capabilities. We look at these capabilities in depth in this chapter.
