4.3.2 Character Classes

It is possible to specify that a single character in a string must be a member of a certain set, or class, of characters. For example, if we write

/[abcdefghijklmnopqrstuvwxyz]/

we are saying that this pattern matches a single lowercase alphabetic character. So, if we match this pattern against the string

123456 xyz

it matches at x, the first lowercase letter in the string. Some other examples of character classes are given below.

/[0123456789]/

/[a-h]/

/[a-zA-Z0-9_]/

Here, the first class matches a single decimal digit. The second matches any lowercase letter from a through h; a dash between two characters, as here, specifies a contiguous range. The third example shows that several individual characters and ranges can be combined in a single class; it matches a single alphanumeric character, that is, a lowercase or uppercase letter, a decimal digit from 0 to 9, or the underscore character _. To include a literal dash (-) in the middle of a character class, it must be backslashed, as in [a\-z]. However, a - at the beginning or the end of the class needs no backslash, as in [-+] or [+-].
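The following Perl sketch tries each of the classes above on a sample string and prints the first character that matches, starting with the spelled-out lowercase class against 123456 xyz. The helper first_match and the sample strings are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first character of $str matched by the
# single-character pattern $re, or undef if there is no match.
sub first_match {
    my ($re, $str) = @_;
    return $str =~ /($re)/ ? $1 : undef;
}

# The lowercase class from the first example, spelled out in full.
my $lower = qr/[abcdefghijklmnopqrstuvwxyz]/;
printf "%s\n", first_match($lower, '123456 xyz');          # prints "x"

printf "%s\n", first_match(qr/[0123456789]/, 'abc 42');    # prints "4"
printf "%s\n", first_match(qr/[a-h]/, 'xyz cab');          # prints "c"
printf "%s\n", first_match(qr/[a-zA-Z0-9_]/, '  +_ok');    # prints "_"

# A literal dash in the middle of a class is backslashed;
# at either end of the class it needs no backslash.
printf "%s\n", first_match(qr/[a\-z]/, 'bcd-e');           # prints "-"
printf "%s\n", first_match(qr/[-+]/, '3+4');               # prints "+"
```

Note that [a\-z] matches only the three characters a, -, and z, whereas [a-z] would match any lowercase letter.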

Perl also supports negated (so-called negative) character classes, which match any single character that is not in the class. A class is negated by putting a caret (^) immediately after its opening bracket. For example,

/[^0123456789]/

or

/[^0-9]/

is a pattern that matches any single character that is not a decimal digit. Similarly,

/[^a-zA-Z0-9_]/

represents a single non-alphanumeric character, that is, any character other than a letter, a digit, or the underscore.
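A small Perl sketch, assuming an arbitrary sample string, that applies the negated classes above with the /g modifier in list context, which returns every captured match:

```perl
use strict;
use warnings;

my $text = 'Order #42: 3 items!';

# [^0123456789] and [^0-9] spell the same negated class.
my @spelled_out = $text =~ /([^0123456789])/g;
my @with_range  = $text =~ /([^0-9])/g;
printf "%s\n", join('', @with_range);    # prints "Order #:  items!"
print "same result\n" if join('', @spelled_out) eq join('', @with_range);

# [^a-zA-Z0-9_] picks out the non-alphanumeric characters:
# the three spaces, '#', ':' and '!'.
my @non_alnum = $text =~ /([^a-zA-Z0-9_])/g;
printf "%d\n", scalar @non_alnum;        # prints 6
```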

To simplify writing common character classes, Perl provides a few built-in classes, along with their negated counterparts. For plain ASCII text, the built-in classes are equivalent to the following:

\d is the same as [0-9]

\w is the same as [0-9a-zA-Z_]

\s is the same as [ \f\n\r\t]

The last one is the whitespace class: it matches the space character, the form feed \f, the newline \n, the carriage return \r, and the tab \t. (Since Perl 5.18, \s also matches the vertical tab.)
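To check these equivalences, the following sketch counts how many characters of an ASCII sample string each built-in class matches, next to its bracketed spelling. The sample string and the %pairs table are illustrative only.

```perl
use strict;
use warnings;

# An ASCII-only sample with letters, digits, '_', a tab, a space and a newline.
my $text = "id_7\tx 9\n";

# Each built-in class paired with the bracketed class it abbreviates.
my %pairs = (
    '\d' => [ qr/\d/, qr/[0-9]/ ],
    '\w' => [ qr/\w/, qr/[0-9a-zA-Z_]/ ],
    '\s' => [ qr/\s/, qr/[ \f\n\r\t]/ ],
);

for my $name (sort keys %pairs) {
    my ($builtin, $spelled) = @{ $pairs{$name} };
    # "() =" forces list context, so the scalar receives the number of matches.
    my $n_builtin = () = $text =~ /$builtin/g;
    my $n_spelled = () = $text =~ /$spelled/g;
    printf "%s: %d vs %d\n", $name, $n_builtin, $n_spelled;
}
```

On this input it prints \d: 2 vs 2, \s: 3 vs 3, and \w: 6 vs 6. On Unicode strings, however, \d, \w and \s can match additional characters, such as digits from other scripts, so the counts may differ there; the /a modifier restricts these classes to ASCII.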

Perl also provides the corresponding negative classes, each written with an uppercase letter:

\D is the same as [^0-9]

\W is the same as [^0-9a-zA-Z_]

\S is the same as [^ \f\n\r\t]
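The negated built-in classes are handy for stripping, splitting, and tokenizing. The sketch below, using a made-up sample line, deletes non-digits with \D, splits on runs of non-word characters with \W, and collects runs of non-whitespace with \S:

```perl
use strict;
use warnings;

my $line = "width: 80px;  height: 24px";

# \D: delete every character that is not a digit (from a copy of $line).
(my $digits_only = $line) =~ s/\D//g;

# \W+: split on runs of non-word characters.
my @words = split /\W+/, $line;

# \S+: collect runs of non-whitespace characters.
my @tokens = $line =~ /(\S+)/g;

printf "%s\n", $digits_only;          # prints "8024"
printf "%s\n", join('|', @words);     # prints "width|80px|height|24px"
printf "%s\n", join('|', @tokens);    # prints "width:|80px;|height:|24px"
```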