3.1.17 Sorting Elements of an Array
Elements of a list can be sorted using the built-in sort function. Consider the following simple program.
Program 3.15
#!/usr/bin/perl
#file sort2.pl
use strict vars;
$" = ", ";
my @friends = ('Reeves P. Smith III', 'Matthew Van Zandt',
'Kristofer Todd', 'Ryan Slaughter');
@friends = sort @friends;
print "@friends\n";
The program prints
Kristofer Todd, Matthew Van Zandt, Reeves P. Smith III, Ryan Slaughter
as the sorted list with the comma as the item separator.
Sorting takes place using the default ascending order for strings. Undefined values go to the front, defined null values are next, and then strings with values in the proper alphabetical order. In the following program, one of the values to be sorted is undefined, being an uninitialized element of an array.
Program 3.16
#!/usr/bin/perl
#file sort3.pl
use strict vars;
$" = ", ";
my @friends99;
my @friends = ($friends99[12], 'Reeves P. Smith III', 'Matthew Van Zandt',
'Kristofer Todd', 'Ryan Slaughter');
@friends = sort @friends;
print "@friends\n";
The output of this program is
, Kristofer Todd, Matthew Van Zandt, Reeves P. Smith III, Ryan Slaughter
The undef value comes in the front, but we know that undef is printed as the empty string.
Now, if we sort a list of numbers, say (53, 29, 11, 32, 7, 112, 291), Perl returns (11, 112, 29, 291, 32, 53, 7). Perl does not care that we have all numbers in the list. It sorts them as if they were all strings. Once again, if we sort the list (1, "2", "three", 4, ÕfiveÕ), Perl returns (1, 2, 4, five, three). Perl does so because it assumes every element in the list to be sorted is a string, and as a results sorts them as characters although some of the elements may be numbers.
sort can optionally take a first argument that is a subroutine. If such an argument is given, the subroutine must be one that takes two arguments. The subroutine must compare these two arguments and return either a negative number, 0, or a positive number. If a negative number is returned, the first argument given to the subroutine comes before the second argument in the final sort order. If 0 is returned, the two arguments are equal and they can occur in any order in the sorted list. If a positive number is returned, the first element appears after the second element in the final sorted order.
Thus, to sort a list of numbers, we write a subroutine that compares two numbers at a time, and returns a negative value, zero, or a positive number, depending on if the first number is smaller than, equal to, or greater than the second number.
Program 3.17
#!/usr/bin/perl
use strict vars;
$" = ", ";
#Subroutine for comparing two numbers
sub inAscendingOrder{
if ($a < $b) {return -1.1;}
elsif ($a == $b) {return 0;}
else {return 1.2}
}
my @numbers = (1, 2, 11, 12, 121);
@numbers = sort inAscendingOrder @numbers;
print "@numbers\n";
The output of the program is a printout of the sorted list.
1, 2, 11, 12, 121
There are several things to note about the subroutine used for sorting. It takes two arguments. However, the arguments are implicit. The first argument must be referred to by the name $a inside the subroutine, and the second argument by the name $b. This is kludgy, but this how Perl works. The authors of Perl say that Òin the interest of efficiency, the normal calling code for subroutines is bypassedÓ in this situation.
The subroutine must not be recursive; that is it must not make a call to itself. This is also for the sake of efficiency. In addition, the normal convention that arguments passed to a subroutine are available as the value of the special array @_ inside the subroutine, does not hold if the subroutine is used for sorting. If the subroutine is used for sorting, there are only two scalar arguments, and they must be called $a and $b, in order. $a and $b must not be declared using my inside the subroutine.
The subroutine also must not perform any assignments to $a and $b. This is because the arguments $a and $b are passed to the subroutine from the calling program, by using so-called reference. That is, references to their addresses are passed; the literal values are not passed to the subroutine by the calling program. If assignments are done to $a or $b inside the subroutine, the original two elements of the array being compared will undergo change. In other words, sort will destroy the original array.
Finally, the array is supposed to return a negative number if the first number is less than the second number, and a positive number if the first number is larger than the second number. Therefore, although we have returned -1.1 and 1.2, respectively, in these two situations, any other values will also work.
The subroutine we have written for sorting numbers in ascending order is so commonly used that Perl provides a built-in operator <=> for a comparison of two numbers that returns -1, 0 and 1 respectively, in the three situations discussed. Therefore, the previous program can be written in the following manner also.
Program 3.18
#!/usr/bin/perl
use strict vars;
$" = ", ";
sub inAscendingOrder{
$a <=> $b;
}
my @numbers = (1, 2, 11, 12, 121);
@numbers = sort inAscendingOrder @numbers;
print "@numbers\n";
Perl, actually, allows us to write the body of the sorting subroutine as a block of statements instead of a subroutine. Therefore, the following also works.
Program 3.19
#!/usr/bin/perl
use strict vars;
$" = ", ";
my @numbers = (1, 2, 11, 12, 121);
@numbers = sort {$a <=> $b;} @numbers;
print "@numbers\n";
It makes sense to write the subroutine "in-line" as a block of code only if it is short. If it is big, stylistically, it is better if we write it as a separate subroutine.
If we want sort the numbers in descending order, we simply need to change the subroutine such that the order of usage of $a and $b is reversed in the comparisons. One of the following subroutines will work.
sub inDescendingOrder{
if ($a > $b) {return -1;}
elsif ($a == $b) {return 0;}
else {return 1}
}
sub inDescendingOrder{
$b <=> $a;
}
We can also write the subroutine "in-line" if we want.
If we are trying to sort a list of strings, we know that the string ascending order is used automatically. In other words, no subroutine argument needs to be passed to sort. However, if we want to write a subroutine, we can do so to mimic the default comparison function.
Program 3.20
#!/usr/bin/perl
use strict vars;
$" = ", ";
#Subroutine for comparing two strings
sub inAscendingOrder{
$a cmp $b;
}
my @friends = ('Rene Myers', 'Brooke Peterson', 'Sharon Heatherington');
@friends = sort inAscendingOrder @friends;
print "@friends\n";
cmp is a built-in function that is used to compare two strings. It is similar to <=> in what it returns. Instead of using cmp, we can spell out the details of the comparison as the following subroutine.
sub inAscendingOrder{
if ($a lt $b) {return -1;}
elsif ($a eq $b) {return 0;}
else {return 1;}
}
If we have a list that has a mix of numbers and strings, we can write a comparison subroutine that uses both cmp and <=>. Let us look at the following program and discuss what happens.
Program 3.21
#!/usr/bin/perl
use strict vars;
$" = ", ";
#Subroutine for comparing two strings
sub inAscendingOrder{
($a <=> $b) || ($a cmp $b);
}
my @list = (1, 'Rene Myers', 11, 'Brooke Peterson', 21,
'Sharon Heatherington', 111);
@list = sort inAscendingOrder @list;
print "@list\n";
The result printed by this program is the following.
Brooke Peterson, Rene Myers, Sharon Heatherington, 1, 11, 21, 111
Let us try to understand why this happens.
When the comparison subroutine inAscendingOrder compares two numbers, say 1 and 121, it returns -1 because 1 is less than 121. Therefore, 1 appears before 121 in the sorted list. Since -1 is considered true in Perl, $a cmp $b is not executed. ÑÑ is the so-called short-circuit or operator which stops evaluating the boolean conditions as soon as it gets a true value. If the two numbers compared are equal, 0 is returned by <=>; in such a case, $a cmp $b is called. Two equal numbers when compared as string are also equal. Therefore, 0 is returned by the subroutine. Two equal numbers can occur in any order in the resulting list.
If we are comparing two strings, say ÕaÕ and ÕabcÕ, the first comparison $a <=> $b compares them as numbers. Perl tries to salvage a number from both the strings, but because they do not contain any embedded digits, the salvaged numbers are both 0. As a result the number comparison returns 0. In such a case, $a cmp $b is used to compare and the proper order results in the sorted list.
If one argument to the comparison subroutine is a number, and the other is a string, say 121 and ÕabcÕ, the first comparison using <=> actually compares 121 and 0 (the number salvaged from ÕabcÕ) and returns 1. This causes the numbers to float to the end of the sorted list.
If in the program given above, we write the comparison function as
sub inAscendingOrder{
($a cmp $b) || ($a <=> $b);
}
the sorted list will be printed as
1, 11, 111, 21, Brooke Peterson, Rene Myers, Sharon Heatherington
because the cmp comparison will be done first. The cmp comparison will treat numbers also as strings. Therefore, the <=> comparison will never be done.
