4.8.2 The m//x Modifier

  Regular expressions often become complex and, as a result, difficult to read. One complex regular expression that we saw earlier in this chapter is repeated below.

 

/\\cite{([A-Z][a-zA-Z]*(19|20)?\d{2},)*[A-Z][a-zA-Z]*(19|20)?\d{2}/

 

In such a situation, readability can be improved by allowing whitespace (spaces, tabs, newlines, etc.) to separate characters or groups of characters inside the regular expression.

The x modifier (written m//x) allows exactly this: whitespace inside the regular expression is ignored, so it can be used freely for layout. If we need to match a space character, it must be escaped with a backslash or placed inside a character class. Whitespace inside a character class is not ignored, so the elements of a character class must still be written compactly, without intervening spaces. Finally, under the x modifier an unescaped # outside a character class begins a comment that runs to the end of the line; to match a literal #, escape it with a backslash. A comment must not contain the pattern delimiter (here, /), because Perl would take it as the end of the pattern.
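The rules above can be tried out with a short sketch. The patterns, the %pattern hash, and the matches helper are made up for illustration and are not part of the chapter's programs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical patterns, for illustration only. All use the x modifier.
my %pattern = (
    ignored => qr/^ \d+ [a-z]+ $/x,      # unescaped spaces are layout only
    escaped => qr/^ \d+ \  [a-z]+ $/x,   # backslash-space matches one space
    class   => qr/^ \d+ [ ] [a-z]+ $/x,  # a space inside [...] stays literal
    hash    => qr/^ \# \d+ $/x,          # backslash-# matches a literal #
);

# Return 1 if $text matches the named pattern, 0 otherwise.
sub matches {
    my ($name, $text) = @_;
    return $text =~ $pattern{$name} ? 1 : 0;
}

for my $case (['ignored', '42abc'], ['ignored', '42 abc'],
              ['escaped', '42 abc'], ['class', '42 abc'],
              ['hash', '#7']) {
    my ($name, $text) = @$case;
    print "$name '$text' => ", matches($name, $text), "\n";
}
```

Only the second case should print 0: with the spaces in the pattern ignored, nothing is left to match the space in the text.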

Here is the program that uses the complex regular expression given above, rewritten with the x modifier.

 Program 4.28

#!/usr/bin/perl
use strict;
use warnings;

while (<>){
    if (/
        \\                #a literal backslash
        cite              #the string
        \{                #the opening brace, escaped as newer Perls require
         (                #BEGIN SUB-PATTERN
          [A-Z]           #citation index starts with an uppercase letter
          [a-zA-Z]*       #followed by zero or more letters
          (19|20)?        #first two digits of year are 19 or 20, if given
          \d{2},          #last two year digits, then a comma
         )*               #END SUB-PATTERN, repeat sub-pattern 0 or more times

        [A-Z][a-zA-Z]*(19|20)?\d{2} #the same sub-pattern, without the comma

       /x)
    {
        print $_;
    }
}
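As a quick sanity check, the compact and the spaced-out forms of the pattern can be compared on a few lines. This is a minimal sketch: the sample lines and the names $compact, $extended, and both_match are assumptions made for illustration, and the opening brace is written as \{ in both patterns so that they compile on recent Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern in its compact, single-line form.
my $compact = qr/\\cite\{([A-Z][a-zA-Z]*(19|20)?\d{2},)*[A-Z][a-zA-Z]*(19|20)?\d{2}/;

# The same pattern laid out with the x modifier.
my $extended = qr/
    \\ cite \{                        # literal backslash, cite, opening brace
    (                                 # zero or more entries followed by a comma
      [A-Z] [a-zA-Z]* (19|20)? \d{2} ,
    )*
    [A-Z] [a-zA-Z]* (19|20)? \d{2}    # final entry, no trailing comma
/x;

# Return the results (1 or 0) of matching $line against both forms.
sub both_match {
    my ($line) = @_;
    my $c = $line =~ $compact  ? 1 : 0;
    my $e = $line =~ $extended ? 1 : 0;
    return ($c, $e);
}

# Hypothetical sample lines: two valid citations, two that should fail.
for my $line ('\cite{Knuth1984,Lamport94}', '\cite{Wall2000}',
              'cite{Wall2000}', '\cite{wall2000}') {
    my ($c, $e) = both_match($line);
    print "$line: compact=$c extended=$e\n";
}
```

Both forms should accept the first two lines and reject the last two, since the layout whitespace and comments do not change what the pattern matches.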