4.8.2 The m//x Modifier
Regular expressions often become complex and, hence, difficult to read. One complex regular expression that we saw earlier in this chapter is repeated below.
/\\cite\{([A-Z][a-zA-Z]*(19|20)?\d{2},)*[A-Z][a-zA-Z]*(19|20)?\d{2}/
In such a situation, readability improves if we can use whitespace (space characters, tabs, newlines, etc.) to separate characters or groups of characters inside a regular expression.
The x modifier (m//x) tells Perl to ignore whitespace in the pattern, so we can space the pattern out freely. To match a literal space character under the x modifier, we must escape it with a backslash, unless it appears inside a character class. Whitespace inside a character class is not ignored, so the elements of a character class must still be written compactly, without intervening spaces. Finally, the x modifier lets us use # inside the regular expression to begin a comment that runs to the end of the line. A literal # must therefore also be escaped or placed inside a character class.
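The rules above can be checked with a short sketch. The test strings and variable names here are made up for illustration. Each variable is 1 if the corresponding match succeeds.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unescaped spaces in the pattern are ignored under x,
# so this pattern behaves like the pattern ab.
my $ignored = ("ab" =~ / a b /x) ? 1 : 0;

# For the same reason, the pattern does not match a string
# that has a space between a and b.
my $no_space = ("a b" =~ / a b /x) ? 1 : 0;

# A backslashed space is a literal space, even under x.
my $escaped = ("a b" =~ / a \  b /x) ? 1 : 0;

# A space inside a character class is significant.
my $in_class = ("a b" =~ / a [ ] b /x) ? 1 : 0;

# A backslashed # is a literal character; a bare # starts a comment.
my $hash = ("a#b" =~ / a \# b   # a, then a literal hash sign, then b
                     /x) ? 1 : 0;

print "$ignored $no_space $escaped $in_class $hash\n";   # prints 1 0 1 1 1
```

Note that a comment inside the pattern must not contain the pattern delimiter (here /), since Perl would take it as the end of the regular expression.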
Here is a rewrite, using the x modifier, of the program that uses the complex regular expression given above.
Program 4.28
#!/usr/bin/perl
while (<>) {
    if (/
        \\                            # a literal backslash
        cite                          # the string cite
        \{                            # a literal left brace, escaped
        (                             # BEGIN SUB-PATTERN
          [A-Z]                       # citation index starts with an uppercase letter
          [a-zA-Z]*                   # followed by zero or more letters
          (19|20)?                    # first two digits of year are 19 or 20, if given
          \d{2},                      # last two digits of the year, then a comma
        )*                            # END SUB-PATTERN, repeated zero or more times
        [A-Z][a-zA-Z]*(19|20)?\d{2}   # the same sub-pattern, without the trailing comma
       /x)
    {
        print $_;
    }
}
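To try the pattern without an input file, it can be stored with qr//x and applied to a few sample strings. This is a minimal sketch: the sample strings and the names $cite, @samples and @results are made up for illustration, and the left brace is written as \{ on the assumption that a newer Perl may warn about or reject an unescaped literal brace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern of Program 4.28, compiled once with qr and the x modifier.
my $cite = qr/
    \\                            # a literal backslash
    cite                          # the string cite
    \{                            # a literal left brace, escaped
    (                             # begin sub-pattern
      [A-Z][a-zA-Z]*              # an uppercase letter, then zero or more letters
      (19|20)?                    # optional century digits
      \d{2},                      # two year digits, then a comma
    )*                            # end sub-pattern, repeated zero or more times
    [A-Z][a-zA-Z]*(19|20)?\d{2}   # the final citation index, with no comma
/x;

# Hypothetical input lines; single quotes keep the backslash literal.
my @samples = (
    '\cite{Knuth1984}',           # one citation index
    '\cite{Wall91,Knuth1984}',    # two indexes, separated by a comma
    '\cite{knuth1984}',           # index starts with a lowercase letter
    'no citation here',
);

# 1 for each sample that matches, 0 otherwise.
my @results = map { /$cite/ ? 1 : 0 } @samples;
print "@results\n";               # prints 1 1 0 0
```

Compiling the pattern with qr keeps the commented layout in one place, so the same pattern can be reused in several matches.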
