On Expressions, Statements and Functions

Writing a program is like writing a recipe for a dish or directions to a destination. A program is a sequence of well-thought-out and precise steps to achieve a goal. When we write a recipe for an interesting dish or directions to a chosen location in English, there are two significant aspects to the writing: syntax and semantics. What we write must be syntactically correct and must also make sense. The syntax may be proper English, an abbreviated version of it, or a dialect of it. Words are the basic components of the text we compose, syntactically as well as semantically. A sequence of words makes a phrase, and phrases make a sentence. If we have several sentences, they may be structured in the form of a coherent paragraph. We may have several paragraphs even in a simple recipe or set of directions. Thus, words, phrases, sentences and paragraphs constitute increasingly complex components of the text. At each stage, we choose the components carefully so that they are syntactically well-formed, they make sense, and they are useful for expressing the objective at hand. Quite similarly, a program is composed of its constituents. The lowest-level constituents, which correspond to the words of English, are the literals, the variables, and the operators. They are used to build expressions. One or more expressions make a statement. One or more expressions and statements make a compound statement.

A programming language’s syntax and semantics are complex topics. In particular, among the many issues we need to understand when learning any programming language are the following.

  • The data types the language supports and the values the data types can take. Examples of data types are numbers, characters, strings of characters, lists, etc. Depending on the data type, there are restrictions on the values. For example, a number can usually have digits and at most one decimal point in it. It is not usually correct to write a number to contain an alphabetic character such as a or u.

  • The variables, how the names of variables are constructed, and where their values can be seen and used. Variables are like placeholders. They are names of data elements. A variable may have been given a value by an assignment statement or by other means, or it may be unassigned at a certain point in time. Variables of different types have restrictions on how their names are spelled. For example, the name of a list variable in Perl always starts with the @ symbol. A variable can be global, that is, seen and usable throughout a program, or it may be confined to a region of the program.

  • The operators and expressions the language supports. An operator specifies a simple and common computation performed on one or more parameters or arguments. For example, + is the addition operator that takes two arguments that it adds. An expression is an operator with its arguments. For example, 2 + 2 is an expression.

  • The statements that the programming language allows, and how control flows among the statements comprising a program as it is executed. A statement corresponds to either a simple or a complex sentence in English. A simple statement is composed of expressions and operators. A complex or compound statement corresponds to a full paragraph or even several contiguous paragraphs. Flow of control signifies the order in which the statements of a program are executed.

  • The type of subroutines the programming language allows, how parameters are passed to subroutines and how values are returned. A subroutine is a group of statements that are usually given a name so that they can be talked about by using the name. A subroutine is an abstraction that can stand for a simple or complex task.

  • The subroutines, the variables and the constants the language has been endowed with from the start to provide a basic vocabulary. Any self-respecting language provides some variable names, and possibly named constants, that have been set aside by its designers. A modern programming language also has a stash of pre-defined subroutines; these are called built-in functions.

  • The modules or packages that are available to a programmer. A language such as Perl has hundreds of useful modules that have been contributed by developers from around the world. Perl is a language that is freely available. Its developers give the language away. Many benefactors have written extremely useful and complex subroutines and packaged them so that they are available as a group. These are called packages or modules. One of Perl’s spectacular characteristics is the large number of valuable modules available for free downloading from the Internet.
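The last three items in the list above can be sketched in a few lines. The subroutine name average and the variable names below are made up for illustration; length is one of Perl's built-in functions, and List::Util is a module that ships with standard Perl distributions. This is only a minimal sketch of how a subroutine receives parameters and returns a value, not a recommended style.

```perl
use strict;
use warnings;
use List::Util qw(max);            # a module distributed with Perl

# Hypothetical subroutine: returns the average of its arguments.
sub average {
    my @numbers = @_;              # parameters arrive in the special array @_
    return 0 unless @numbers;      # avoid dividing by zero for an empty list
    my $total = 0;
    $total += $_ for @numbers;     # add up every argument
    return $total / @numbers;      # @numbers in numeric context is its length
}

my $mean    = average(2, 4, 6);    # call the subroutine by its name
my $len     = length("Hello!");    # built-in function
my $largest = max(2, 4, 6);        # function imported from the module

print "mean = $mean, length = $len, largest = $largest\n";
```

When run, this sketch should print mean = 4, length = 6, largest = 6.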

Perl supports three primary data types: scalars, lists and hashes. More complex data types such as list of lists, hash of hashes, or list of list of hashes, can be created fairly easily. In this chapter, we mostly deal with scalars although lists and hashes are also gently introduced. The scalar data type in Perl refers to a data type that has only one component. This is in contrast to a multi-component data structure such as a list. The two primary scalar data types are numbers and strings. In Perl, the name of a variable indicates whether it refers to a scalar, a list or a hash. The name of a scalar variable starts with a $ symbol. The name of a list variable starts with an @ symbol. A hash is also called an associative array since it associates one or more keywords with corresponding values. The name of a hash variable in Perl starts with the % symbol. Thus, $a is the name of a scalar, @a is the name of a list variable, and %a is the name of a hash variable. These are three distinct variable names.
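The point that the same base name can denote three distinct variables can be sketched as follows. The name x and the values are chosen purely for illustration.

```perl
use strict;
use warnings;

my $x = 20;                        # scalar variable
my @x = (1, 2, 3);                 # list variable, unrelated to $x
my %x = (one => 1, two => 2);      # hash variable, unrelated to both

my $second = $x[1];                # one element of the list @x
my $two    = $x{two};              # the value stored under the key "two" in %x

print "scalar: $x\n";
print "list: @x\n";
print "second element: $second\n";
print "hash lookup: $two\n";
```

Note that a single element of a list or a hash is itself a scalar, which is why $x[1] and $x{two} are written with the $ symbol.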

In Perl, we do not have to state the name of a variable before we use it. Stating the name of a variable before use is called declaring a variable. This is an absolute necessity in many programming languages, including Java and C. In this laxity, Perl resembles Lisp. However, we can force a discipline on ourselves if we so desire, asking Perl to make certain that all variable names are declared before first use. This is a good habit to cultivate because it helps in writing programs that can be easily debugged, maintained and reused. In Perl, a variable name that has not been declared is usually global and is available in the whole program. An explicitly declared variable can be made available only within a specific area of the program. The region where a variable is visible, and thus usable, is called the scope of the variable.
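The difference between a global variable and one confined to a region of the program can be sketched as below. The variable names are hypothetical. The sketch assumes the strict pragma is in force, so the global variable is declared with our rather than left undeclared; my declares a variable that is visible only inside the enclosing block.

```perl
use strict;
use warnings;

our $greeting = "global";          # package (global) variable
my $seen_inside;                   # will record what the block sees

{
    my $greeting = "local";        # visible only inside this block
    $seen_inside = $greeting;      # refers to the inner variable
}

print "inside the block:  $seen_inside\n";
print "outside the block: $greeting\n";   # the global one again
```

Inside the braces, the name $greeting refers to the inner variable; once the block ends, the same name refers to the global variable again.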

Perl has a large number of operators defined for its data types. An expression is usually composed of one or two parameters along with an operator. The parameters are called operands. An operator produces a result. Based on the kinds of operands an operator takes, and the value it returns, we have two important types of expressions, among others: arithmetic expressions and logical expressions. A statement is an expression with a side effect. A side effect may alter the value of one or more variables, or produce an output, or read an input value, among other possibilities. In this chapter, we discuss expressions and statements in Perl in great detail.
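As a small sketch of these ideas, the lines below build an arithmetic expression and two logical expressions, and then turn expressions into statements through the side effects of assignment and printing. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $sum    = 2 + 3 * 4;                      # arithmetic expression; * binds tighter than +
my $is_big = $sum > 10;                      # logical expression built from a comparison
my $in_range = ($sum > 10) && ($sum < 20);   # logical expression combining two comparisons

# Assignment and print are side effects: they change a variable or produce output.
print "sum = $sum\n";
print "sum is greater than 10\n" if $is_big;
print "sum is between 10 and 20\n" if $in_range;
```

Here 2 and 3 * 4 are the operands of +, and the comparison $sum > 10 is an operand of &&.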

Let us consider a very simple program in Perl and illustrate the components.
Program 2.1

#!/usr/bin/perl 
#file simplest.pl

use strict;
my ($a, $b, @c);
$a = 20;
$b = "Hello!";
@c = (1, 2, 3, 4);
print '$a = ', $a, "\n";
print "\$b = ", "$b\n";
print "\@c = @c\n";

Here, each non-blank line of the program is either a comment or a simple statement. There are no compound statements in this program. The first line is required on a Unix system, and specifies where the Perl interpreter is located. It must be the first line of the file and must start in the first column. To Perl itself it is a comment, since it starts with #. The second line is also a comment. Unlike the first comment, it is not required. The statement starting with use is called a pragma. It asks Perl to use definitions from a pre-defined module called strict.pm that the system knows where to find. Among other checks, this module ensures that variables are declared before first use. Two scalar variables, $a and $b, and a list variable, @c, are used in the program. my declares them to be available within the current program’s file. The statement

$a = 20;

is an assignment statement. $a, the scalar variable name, and 20, a numeric literal, are simple expressions. = is the assignment operator. $a = 20 is an expression that has the side effect of giving a value to $a. The use of ; at the end makes it a statement. The second assignment statement uses the string literal "Hello!", which is double-quoted. The assignment statement

@c = (1, 2, 3, 4);

has the list variable @c on the left-hand side of the assignment operator, and a list literal on the right-hand side. The list literal is (1, 2, 3, 4) and is composed of a sequence of numeric literals separated from each other by commas. The parentheses group them together into a whole.

The print statement takes one or more arguments. Each argument is a string. If it is not, it is converted, or coerced, into one. For example, in the first print statement, $a is a number with the value 20. It is coerced into the string "20" before printing. In the second print statement, there is a string argument "$b\n". Here, \n is the escaped newline character. $b is a scalar variable written inside a double-quoted string, and hence its value is put in its place before printing. In other words, $b’s value is interpolated. The last print statement has one argument string: "\@c = @c\n". This string contains @c, a list variable whose value is interpolated. The individual elements are printed separated by the default list element separator, a single space.

The whole program is an implicit block delimited by the boundaries of the physical file simplest.pl in which it resides. The output of the program is given below.

$a = 20
$b = Hello!
@c = 1 2 3 4
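The interpolation rules used in Program 2.1 can be explored a little further with the sketch below. It assumes the special variable $", which holds the list element separator used when a list is interpolated into a double-quoted string; the other variable names are made up for illustration.

```perl
use strict;
use warnings;

my @c = (1, 2, 3, 4);

my $single = '@c = @c';            # single quotes: nothing is interpolated
my $double = "\@c = @c";           # double quotes: the unescaped @c is interpolated
my $custom;
{
    local $" = ', ';               # temporarily change the list element separator
    $custom = "\@c = @c";
}

print "$single\n";
print "$double\n";
print "$custom\n";
```

The three lines printed should be @c = @c, then @c = 1 2 3 4, and finally @c = 1, 2, 3, 4. The change to $" is confined to the braces, so later interpolations would use a single space again.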