4.8.1.1 Counting Frequencies of Letters: Again
We now write two more versions of the letter frequency counting program that we discussed in an earlier section. The first version is almost identical to the previous program, except that the conditional containing the pattern match has changed slightly.
Program 4.26
#!/usr/local/bin/perl

# Read every line from the files named on the command line (or standard input)
while (<>){
    @letters = split(//,$_);
    foreach $letter (@letters){
        if ($letter =~ /[a-z]/i){
            $frequency{$letter} += 1;
        }
    }
}

# Print the counts in sorted order of the letters
@indexes = keys(%frequency);
foreach(sort(@indexes)){
    print $_,": ",$frequency{$_},"\n";
}

The conditional of the if block inside the while loop is now written as
$letter =~ /[a-z]/i
instead of
$letter =~ /[a-zA-Z]/
that we had earlier. In the newer version, we use the modifier i after the regular expression /[a-z]/. It is the ignore case option or modifier. It instructs Perl to ignore the case of alphabetic characters when matching. In this case, both match operations behave exactly the same because
/[a-z]/i
matches both lower case and upper case letters. The frequencies printed by the new version of the program are exactly the same as those printed by the previous version. In both cases, frequencies for lower case and upper case letters are computed and printed separately.
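The equivalence of the two conditionals can be checked with a small sketch. It is only an illustration, assuming plain ASCII input; the helper names matches_with_modifier and matches_with_ranges and the list of sample characters are hypothetical and do not appear in the programs of this section.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: test one character with the case-insensitive pattern.
sub matches_with_modifier {
    my ($char) = @_;
    return ($char =~ /[a-z]/i) ? 1 : 0;
}

# Hypothetical helper: test one character with the explicit two-range pattern.
sub matches_with_ranges {
    my ($char) = @_;
    return ($char =~ /[a-zA-Z]/) ? 1 : 0;
}

# Arbitrary samples: letters in both cases, a digit, a space, punctuation.
my @samples = ('a', 'Z', 'q', '7', ' ', '?');

foreach my $char (@samples) {
    print "char '", $char, "': with i = ", matches_with_modifier($char),
          ", with ranges = ", matches_with_ranges($char), "\n";
}
```

For each sample character the two values printed on a line agree: 1 for the letters and 0 for everything else.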
Now, suppose we want to print frequencies of letters in a set of files without regard to the cases of the letters. That is, frequencies for a lower case letter and the corresponding upper case letter are lumped together. The following is a modification of our program that does so.
Program 4.27
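# Read every line from the files named on the command line (or standard input)
while (<>){
    @letters = split(//,$_);
    foreach $letter (@letters){
        if ($letter =~ /[a-z]/i){
            # Fold the letter to lower case before counting it
            $frequency{lc ($letter)} += 1;
        }
    }
}

@indexes = keys(%frequency);
foreach(sort(@indexes)){
    print $_,": ",$frequency{$_},"\n";
}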
Here, the only line of code that has changed from the previous version is the one that increments the frequency count. It is now
$frequency{lc ($letter)} += 1;
instead of
$frequency{$letter} += 1;
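that we had in the previous two versions of the program. lc is a built-in function that takes a string argument and returns a lower case copy of it. As a result, a lower case letter and the corresponding upper case letter increment the same entry of %frequency. The output printed by the program for a sample call is given below.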
a: 18322
b: 4978
c: 7420
d: 7563
e: 25381
f: 4642
g: 4280
h: 8009
i: 16633
j: 561
k: 1744
l: 8888
m: 7075
n: 15376
o: 16012
p: 5847
q: 296
r: 15241
s: 14672
t: 19196
u: 7123
v: 1986
w: 2497
x: 1231
y: 3427
z: 337
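To see the effect of lc on a small scale, the same counting logic can be applied to a fixed string instead of a set of files. The following is a minimal sketch, not a replacement for the program above; the subroutine name count_letters and the sample string are assumptions made for illustration.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the letters of a string, lumping together
# a lower case letter and the corresponding upper case letter.
sub count_letters {
    my ($text) = @_;
    my %frequency;
    foreach my $letter (split(//, $text)) {
        if ($letter =~ /[a-z]/i) {
            # lc returns a lower case copy, so 'H' and 'h' share one key
            $frequency{lc($letter)} += 1;
        }
    }
    return %frequency;
}

# Arbitrary sample input containing both cases and some non-letters.
my %frequency = count_letters("Hello, World");

foreach my $letter (sort(keys(%frequency))) {
    print $letter, ": ", $frequency{$letter}, "\n";
}
```

For the sample string, this prints d: 1, e: 1, h: 1, l: 3, o: 2, r: 1 and w: 1, one entry per line. No upper case keys appear, and the comma and the space are not counted.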
