4.7.2 Counting Frequencies of Letters
We now write two more scripts that use split and pattern matching. The first one counts the frequency of alphabetic letters in a set of files. The second counts the frequency of words in a set of files. The first program follows.
Program 4.24
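#!/usr/local/bin/perl
# Read every line of every file named on the command line.
while (<>){
    # An empty pattern splits the line into single characters.
    @letters = split(//, $_);
    foreach $letter (@letters){
        if ($letter =~ /[a-zA-Z]/){
            $frequency{$letter} += 1;
        }
    }
}
# Sort the keys so that the letters print in a fixed order.
@indexes = keys (%frequency);
foreach (sort (@indexes)){
    print $_, ": ",$frequency{$_},"\n";
}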
The program reads every line of every file given to it as an argument. As it reads a line, it splits that line into a list, @letters. The split uses an empty pattern, which means that every character of the input line, including spaces, punctuation, and the trailing newline, becomes a separate element of the list. The program then goes through the characters one by one. If a character is alphabetic, lowercase or uppercase, the program increments that character's count in an associative array called %frequency. Uppercase and lowercase forms of a letter are counted under separate keys.
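The effect of the empty pattern can be seen on a single string. The following is a minimal sketch, not part of Program 4.24: the sample string in $line is hypothetical, and the use strict, use warnings, and my declarations are stylistic additions that the original program does not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line standing in for one line of input.
my $line = "Perl splits text.\n";

# An empty pattern splits between every character, so the list
# also holds the spaces, the period, and the newline.
my @letters = split(//, $line);

my %frequency;
foreach my $letter (@letters) {
    # Count only alphabetic characters; case is kept distinct.
    $frequency{$letter} += 1 if $letter =~ /[a-zA-Z]/;
}

print scalar(@letters), " characters in the list\n";
foreach my $letter (sort keys %frequency) {
    print "$letter: $frequency{$letter}\n";
}
```

The list holds 18 elements for this string, but only the 14 alphabetic characters contribute to %frequency. The pattern match is what filters out the rest.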
After the while loop, the program sorts the keys of %frequency. The sort is needed because keys returns the keys of an associative array in no particular order. Finally, the program prints each letter with its frequency. Running the program with a single document file as its argument produces the output below.
A: 1314
B: 1143
C: 1372
D: 807
E: 361
F: 382
G: 1423
H: 338
I: 740
J: 125
K: 213
L: 666
M: 611
N: 982
O: 391
P: 1019
Q: 11
R: 660
S: 855
T: 837
U: 769
V: 122
W: 357
X: 169
Y: 495
Z: 65
a: 17008
b: 3835
c: 6048
d: 6756
e: 25020
f: 4260
g: 2857
h: 7671
i: 15893
j: 436
k: 1531
l: 8222
m: 6464
n: 14394
o: 15621
p: 4828
q: 285
r: 14581
s: 13817
t: 18359
u: 6354
v: 1864
w: 2140
x: 1062
y: 2932
z: 272
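In the output, all uppercase letters come before all lowercase letters. This is because sort, by default, compares strings by character code, and the codes for A to Z are lower than those for a to z. If a single count per letter is wanted, one possible variant folds each letter to lowercase before counting. The sketch below assumes a hypothetical sample string in $text instead of file input. It is an illustration of the idea, not a replacement for Program 4.24.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Default sort compares character codes: uppercase sorts first.
my @raw = sort qw(b A a B);
print "@raw\n";

# Hypothetical sample text; the original program reads files instead.
my $text = "Zebra and apple, Banana and ant";

my %frequency;
foreach my $letter (split(//, $text)) {
    next unless $letter =~ /[a-zA-Z]/;
    # lc folds the letter to lowercase so 'B' and 'b' share one key.
    $frequency{lc $letter} += 1;
}

# With folded keys, a plain sort lists each letter once, a to z.
foreach my $letter (sort keys %frequency) {
    print "$letter: $frequency{$letter}\n";
}
```

With folding, the counts for B and b in this sample are merged under the key b, so each letter appears once in the listing.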
