Appendix A: CGI Programming Techniques in Perl

Table of Contents

Overview

Perl (Practical Extraction and Report Language) is not a CGI-specific programming language. In fact, it is a powerful language with many applications far beyond the needs of CGI. Thus, as a CGI programmer, your mastery of Perl need only extend initially to a small subset of the Perl universe.

In this appendix, we will try to identify the most commonly appearing Perl functions used with the CGI applications in this book to give the beginner a quick and dirty, but by no means all-inclusive, introduction to Perl. If you are a beginner to Perl as well as to CGI, this appendix should give you the very basic foundation which you will need in order to understand the scripts in this book. However, intermediate and advanced readers should only "selectively" browse this appendix as needed. Most of the information here should already be familiar to you.

If you would like more than a cheat-sheet, we strongly recommend that you go out and buy "Learning Perl" by Randall Schwartz and "Programming Perl" by Randall Schwartz and Larry Wall which are published by O' Reilly and Associates Inc. Both of these books outline Perl completely and, as such, are invaluable resources for the CGI programmer.

In the meantime, let this appendix be your guide.

Sending text to the Web browser

Every CGI application must output information. For example, both the HTTP header and the HTML code necessary to generate whatever graphical user interface (GUI) the client will be using to navigate must be sent to the Web browser.

Using the print function

The most basic method for sending text to the Web browser is the "print" function in Perl. The print function uses the following syntax:

print "[string to print]";

By default, the print function outputs data to standard output "<STDOUT>" which in the case of a CGI application, is the Web browser. Thus, whatever you tell Perl to print will be sent to the Web browser to be displayed.

For example, the following line sends the phrase, "Hello Universe" to the Web browser:

print "Hello Universe";

[Note] Of course, in order to comply with HTTP protocol, you must first send the HTTP header when communicating with a browser using the following syntax:

print "Content-type: text/html\n\n";

However, print does have some limitations. For example, the print function is limited in its ability to handle Perl special characters within an output string. For example, suppose we want to print the HTML code:

<A HREF = "mailto:selena@foobar.com">selena@foobar.com</A>

You might extrapolate from the syntax above, that you would use the following Perl code to display the hyperlink:

print "<A HREF = "mailto:selena@foobar.com">selena@foobar.com</A>";

Unfortunately, this would yield a syntax error. Additionally, because this is a very common line of HTML, it is a common source of Perl CGI customization errors. The problem lies in the incorporation of the at sign (@) and double-quote (") characters within the code.

As it so happens, these characters are "special" Perl characters. In other words, they each have special meaning to Perl and, when displaying them, you must take precautions so that Perl understands what you are asking for. For example, consider the double quote marks in the "mailto" hyperlink. How would Perl know that the double quote marks in the "mailto" hyperlink are supposed to be part of the string to be printed and not actually the end of the string to be printed? Recall that we use the double quote marks to delineate the beginning and the ending of a text string to be printed. Similarly, the at sign (@) is used by Perl to name list arrays.

[Note] Many other "special" characters exist and are discussed in other Perl references.

One solution to this problem is to escape the Perl special characters with a backslash (\). The backslash character tells Perl that whatever character follows should be considered a part of the string and not a special character. Thus, the correct syntax for the mailto hyperlink would be

print "<A HREF = \"mailto:selena\@foobar.com\">selena\@foobar.com</A>";

Using "here documents"

Unfortunately, much of what your CGI applications will be sending to the Web browser will include the double- quote mark. It becomes tedious, especially for long blocks of HTML code, to make print statements for every line of HTML and to escape every occurrence of a double-quote with a backslash. Consider the following table definition:

print "<TABLE BORDER = \"1\" CELLPADDING = \"2\" CELLSPACING = \"2\">";
print "<TR>";
print "<TD ALIGN = \"center\">Email</TD>";
print "<TD ALIGN = \"center\"> <A HREF = \"mailto:selena\@foobar.com\">selena\@foobar.com</A></TD>";
print "</TR>";
print "</TABLE>";

If any one of those backslashes are missing, the whole script breaks down. And this is a very small block of code!

One solution to the sending of large blocks of HTML code which incorporate the double-quote is to use the "here document" method of printing. The "here document" method tells Perl to print everything within a certain block of boundaried code. The "here document" uses the generic format:

print <<[TEXT_BOUNDARY_MARKER];
[Text to be printed]
[TEXT_BOUNDARY_MARKER]

For example, this code will print out the basic HTML header:

print <<end_of_html_header;
<HTML>
<HEAD>
<TITLE>My Title</TITLE>
</HEAD>
<BODY>
end_of_html_header

In short, the "here document" method of printing tells the Perl interpreter to print out everything it sees (print <<) from the print line until it finds the text boundary marker specified in the print line (end_of_html_header). The text boundary marker can be anything you like of course, but it is useful to make the flag descriptive.

Further, the ending flag must be "exactly" the same as the flag definition. Thus, the following code will fail because the final flag is not indented correctly:

print <<"    end_of_header";

<HTML><HEAD><TITLE>$title</TITLE></HEAD><BODY>
end_of_header

The final "end_of_header" tag should have been indented four spaces, but it was only indented two. In HTML this does not come across very well since the browser will disregard spaces...but, oh well, buy the book :)

[Note] Though the "here document" method of printing does avoid having to escape backslashes within the block to print, the at sign (@) and other special Perl characters still need escaping.

Using qq

"qq" is another Perl trick which helps a programmer solve the double-quote problem by allowing her to change the double-quote delimiter in a print statement.

Normally, as we said, double-quotes (") are used to delimit the characters in a print statement. However, by replacing the first quote with two q's followed by another character, that final character becomes the new print statement delimiter. Thus, by using "qq!", we tell Perl to use the bang (!) character to delimit the string instead of the double quotes.

For example, without using qq, a print statement that outputs, 'She said, "hi"'. would be written as

print "She said, \"hi\".";

But with the qq making bang (!) the new delimiter, the same statement can be written as

print qq!She said, "hi"!;

Why would we do this? Readability. If the print statement was surrounded with the normal double-quotes, then every double-quote would have to be escaped with a backslash whenever it was used within a string. The backslashes clutter the readability of the string. Thus, we choose a different character to delimit the string in the print statement so that we do not have to escape the double-quotes with backslashes.

Using the printf and sprintf functions

The Perl printf is much like the printf function in C and awk in that it takes a string to be formatted and a list of format arguments, applies the formatting to the string, and then typically prints the formatted string to standard output, which in our case, is the Web browser.

The printf syntax uses a double quoted string which includes special format markers followed by a comma- delimited list of arguments to be applied to those markers. The format markers are typically in the form of a percent sign followed by a control character.

For example, the generic format of printf might look like the following code:

printf ("[some text] %[format] [other text]", [argument to be formatted]);

In usage, we might use the %s formatting argument specifying a string and the %d formatting argument specifying a digit using the following syntax:

$name = "Selena Sol";
$age = 27;
printf ("My name is %s and my age is %d.\n", $name, $age);

The code above would produce the following output in the Web browser window:

My name is Selena Sol and my age is 27.

In reality, the printf function is rarely used in Perl CGI since, unlike C which almost demands the use of printf, Perl has much easier ways of printing. However, the printf routines are essential for another, more useful (to CGI developers) function, sprintf.

Unlike printf, sprintf takes the formatted output and assigns it to a variable rather than outputting it to standard output (<STDOUT>), using the following generic syntax:

$variable_name = sprintf ("[some text] %[format] [other text]", [string to be formatted]);

A good example of using sprintf comes from Chapter Seventeen, the HTML shopping cart script. In this script, we need to format subtotals and grand totals to two decimal places so that prices come out to numbers like "$99.00" or "$98.99" rather than "99" or "98.99876453782". Below is a snippet of code from that chapter which uses sprintf to format the price string to two decimal places.

$option_grand_total = sprintf ("%.2f\n", $unformatted_option_grand_total);

In this example, the variable, $unformatted_option_grand_total is formatted using the "%.2f" argument which formats (%) the string to two decimal places (.2f).

There are a multitude of formatting arguments besides "%s", "%d" and "%f", however. Table A-1 lists several useful ones.

Table A-1 printf and sprintf Formats

Format character Description
c Character
s String
d Decimal Number
x Hexadecimal Number
o Octal Number
f Floating Point Number

Formatting the output

Finally, we close this section with a note about formatting the outputs of your CGI applications so that your HTML is legible when "viewing the source". When reading an HTML document, a Web browser really does not care how the code is formatted. Since it ignores whitespace and newline characters anyway, a Web browser would be just as happy receiving one huge line of HTML code all smushed together as it would receiving a neatly formatted and human-legible HTML document. However, human readers (especially you when debugging) need to have HTML code in a format which helps you read lines and quickly analyze the output generated by your scripts.

Thus, it is very useful when printing with Perl, to use the newline character "\n". This character will introduce a newline into your output much like a <BR> does in HTML so that the text sent out by your CGI application will be formatted for easy reading.

For example, the following HTML could be displayed in two ways. First, you could type:

print "<TABLE>";
print "<TR>";
print "<TD>First Name</TD>";
print "<TD>Selena</TD>";
print "</TR><TR>";
print "<TD>Last Name</TD>";
print "<TD>Sol</TD>";
print "</TR></TABLE>";

This might seem pretty legible as Perl code, but you would receive the following HTML source code, compressed into one line:

<TABLE><TR><TD>First Name</TD><TD>Selena</TD></TR><TR><TD>Last Name</TD><TD>Sol</TD></TR></TABLE>

This code would be difficult to read, especially if an entire HTML page was formatted that way. On the other hand, you could use the following code:

print "<TABLE>\n";
print "<TR>\n";
print "<TD>First Name</TD>\n";
print "<TD>Selena</TD>\n";
print "</TR>\n<TR>\n";
print "<TD>Last Name</TD>\n";
print "<TD>Sol</TD>\n";
print "</TR>\n</TABLE>";

This time, when viewing the source, you would see the following HTML code neatly formatted:

<TABLE>
<TR>
<TD>First Name</TD>
<TD>Selena</TD>
</TR>
<TR>
<TD>Last Name</TD>
<TD>Sol</TD>
</TR>
</TABLE>

There are many other formatting constructs that can be included within a double-quote print or variable assignment, of course. Table A-2 outlines several important ones.

Table A-2 Formatting Constructs

Construct Description
\n Newline
\r Return
\t Tab
\b Backspace
\v Vertical Tab
\e Escape
\\ Backslash
\" Double Quote
\l Make next character lowercase
\L Lowercase every character until \E
\u Uppercase the next character
\U Uppercase every character until \E
\E Terminate \L or \U

It is not essential for you to keep formatting in mind, but it will make debugging much easier if it involves investigating the HTML code. Conscientious formatting is also considered good style in general.

[Note] Another benefit of using the "here document" method is that since the Perl prints out the text within the marker field exactly as you type it, you need not use the \n for newlines, because they are already incorporated.

Scalar variables, list arrays and associative arrays

What is a scalar variable?

You can think of a variable as a "place holder", or a "name" that represents one or more values. The generic syntax for defining scalar variables (also known as variables for short) is as follows:

$variable_name = value;

Thus, for example, we might assign the value of twenty-seven to the scalar variable named "age" with the syntax:

$age = 27;

The dollar sign ($) is used to let Perl know that we are talking about a scalar variable. From then on, unless we change the value of $age, the script will translate it to twenty-seven.

So if we then say:

print "$age\n";

Perl will send the value "27" to standard output, which in our case, will be the Web browser.

If we are assigning a word or a series of words to a scalar variable rather than just a number, we must mark the boundary of the value with single or double quotes so that Perl will know "exactly" what should be assigned to the scalar variable.

We use single quotes to mark the boundary of a plain text string and we use double quotes to mark the boundary of a text string which can include scalar variables to be "interpolated". For example, we might have the following lines:

$age = 27;
$first_name = 'Selena';
$last_name = 'Sol';
$sentence = "$first_name $last_name is $age";
print "$sentence\n";

The routine would print the following line to standard output:

Selena Sol is 27

Notice that the scalar variable $sentence is assigned the actual values of $first_name and $last_name. This is because they were "interpolated" since we included them within double-quotes in the definition of $sentence. There is no interpolation inside single quotes. Thus, if we had defined $sentence using single quotes as follows:

$sentence = '$first_name $last_name is $age';

Perl would print the following to standard output:

$first_name $last_name is $age

Using scalar variables

The benefit of substituting a scalar variable name for a value is that we can then manipulate its value. For example, you can "auto-increment" a scalar variable using the "++" operator:

$number = 1;
print "$number\n";
$number++;
print "$number\n";

Perl would send the following to standard output

1
2

You can also perform arithmetic such as:

$item_subtotal = $item_price * $quantity;
$shipping_price = 39.99 * $quantity;
$grand_total = $item_subtotal + $shipping_price;

Scalar variables are the meat and potatoes of CGI. After all, translating between the client and the Web server is essentially the formatting and the reformatting of variables. Be prepared to see them used a lot.

Using the "." operator

Another cool Perl trick is the use of the "." operator which "appends" a value to an already existing scalar variable. Thus, the following code would print out "Selena Sol":

$name = "Selena" . " Sol";
print "$name";

An alternative shorthand for appending to scalar variables is using the ".=" operator. for example, the following code does the same thing as the code above.

$name = "Selena";
$name .= " Sol";
print "$name\n";

Cropping scalar variables with the chop function

Sometimes, you do not want the entire value that has been assigned to a scalar variable. For example, it is often the case that the lines you retrieve from a data file will incorporate a newline character at the end of the line. In this book, data files often take advantage of the newline character as a "database row delimiter". That is, every line in a database file is a new database item. For example, here is a snippet from an address book data file:

Sol|Selena|sol@foobar.com|456-7890
Birznieks|Gunther|gunther@foobar.com|456-7899

When the script reads each line, it also reads in the newline information. Thus, the first line is actually represented as:

Sol|Selena|sol@foobar.com|456-7890\n

The final "\n" is a new line. Since we do not actually want the "\n" character included with the last database field, we use the chop function. The chop function chops off the very last character of a scalar variable using the syntax:

chop ($variable_name);

Thus, we would take off the final newline character as follows:

$database_row = "Sol|Selena|sol@foobar.com|456-7890\n";
chop ($database_row);

Finding the length of a scalar variable with the length function

Finding the length of a scalar variable is incredibly easy using the length function. The syntax of length is as follows:

length ([$variable_name]);

Thus, if the scalar variable $name equals "Selena", then the scalar variable $length_of_name will be assigned the value of six in the following line:

$length_of_name = length ($name);

Manipulating substrings with the substr function

Sometimes, you want to work with just part of a string that has been assigned. In WebBBS, the script uses the "substr" technique to get the message id number portion from the entire message filename. The substr function follows the syntax:

$substring = substr([string you want to extract from], [beginning point of extraction], [length of the extracted value]);

For instance, to assign "Sol" to the scalar variable $last_name you would use the following code:

$name = "Selena Sol";
$last_name = substr ($name, 7, 3);

The substr function takes the scalar variable $name, and extracts three characters beginning with the seventh.

[Note] Warning: as in array indexing, the substr function counts from zero, not from one. Thus, in the string "Gunther", the letter "t" is actually referenced as "3" not "4".

[Note] The final number (length of extracted value) is not necessary when you want to grab everything "after" the beginning character. Thus, the following code will do just what the previous did since we are extracting the entire end of the variable $name:

$last_name = substr ($name, 7);

Scalar variable naming conventions

Finally, you might notice that in these examples, we choose very descriptive names. Rather than saying for example:

$x = 27;
$y = "Selena Sol";

we say something like the following:

$age = 27;
$full_name = "Selena Sol";

Though it is not necessary that you make your scalar variable names descriptive (sometimes it can mean a lot more typing), we recommend that you do your best to choose names which will clearly communicate the function of the variable to you and to others a month or a year down the line.

List Arrays

What is a list array?

List arrays (also known simply as "arrays" for short) take the concept of scalar variables to the next level. Whereas scalar variables associate one value with one variable name, list arrays associate one array name with a "list" of values.

A list array is defined with the following syntax:

@array_name = ("element_1", "element_2"..."element_n");

For example, consider the following list array definition:

@available_colors = ("red", "green", "blue", "brown");

[Note] As you might have guessed, the at sign (@) is used to communicate to Perl that a list array is being named much like the dollar sign ($) is used to denote a scalar variable name.

In this example, the list array @available_colors is filled with four color "elements" in the specific order: red, green, blue, brown. It is important to see that the colors are not simply dumped into the list array at random. Each list element is placed in the specific order in which the list array was defined. Thus list arrays are also considered to be "ordered".

Using a list array

The benefit of ordering the elements in a list array is that we can easily grab one value out of the list on demand. To do this, we use Perl's subscripting operator using the format:

$array_name[list_element_number]

When pulling an element out of a list array, we create a scalar variable with the same name as the array, prefixed with the usual dollar sign which denotes scalar variables.

For example, the first element of the array @available_colors is accessed as $available_colors[0].

Notice that the first element is accessed with a zero. This is important. List arrays begin counting at zero, not one. Thus, $available_colors[0] is a variable place holder for the word "red". Likewise, $available_colors[1] equals "green" and $available_colors[2] equals "blue".

Figuring out how many element are in an array

Fortunately, Perl provides an easy way to determine how many elements are contained in an array. When used as a scalar, the list array name will be equal to the number of elements it contains. Thus, if the list array @available_colors contains the elements: red, green, blue and brown, then the following line would set $number_of_colors equal to four.

$number_of_colors = @available_colors;

[Note] Be careful when using this value in your logic. The number of elements in an array is a number counting from one. But when accessing an array, you must access starting from zero. Thus, the last element in the array @available_colors is not $available_colors[@available_colors] but rather $available_colors[@available_colors - 1].

Adding elements to a list array

Likewise, you can add to or modify the values of an existing array by simply referencing the array by number. For example, to add an element to @available_colors, you might use the following line:

$available_colors[4] = "orange";

Thus, @available_colors would include the elements: red, green, blue, brown, and orange.

You can also use this method to overwrite an element in a list array. To change a value in @available_colors, you might use the syntax:

$available_colors[0] = "yellow";

Now, the elements of @available_colors would be: yellow, green, blue, brown, orange.

Deleting and replacing list elements with the splice function

The splice function is used to remove or replace elements in an array and uses the following syntax:

splice ([array to modify], [offset], [length], [list of new elements]);

The array argument is the array to be manipulated. offset is the starting point where elements are to be removed. length is the number of elements from the offset number to be removed. The list argument consists of an ordered list of values to replace the removed elements with. Of course, if the list argument is null, the elements accessed will be removed rather than replaced.

Thus, for example, the following code will modify the @numbers list array to include the elements, ("1" , "2", "three", "four", "5").

@numbers = ("1", "2", "3", "4", "5");
splice (@numbers, 2, 2, "three", "four");

A more common usage of the splice is simply to remove list elements by not specifying a replacement list. For example, we might modify @numbers to include only the elements "1", "2" and "5" by using the following code:

splice (@numbers, 2, 2);

Advanced list array manipulation with the push, pop, shift, and unshift functions

Of course, once we have created a list array, we can do much more than just access the elements. We can also manipulate the elements in many ways. Throughout this book, list arrays are most often manipulated using the operators push, pop, shift and unshift.

Push is used to add a new element on the right hand side of a list array. Thus, the following code would create a list array of ("red", "green", "blue")

@colors = ("red", "green");
push (@colors, "blue");

In other words, the push operator, adds an element to the end of an existing list.

Pop does the exact same thing as push, but in reverse. Pop extracts the right side element of a list array using the following syntax:

$popped_variable_name = pop (@array_name);

Thus, we might pop out the value blue from @colors with the following syntax:

$last_color_in_list = pop (@colors);

Thus, the @colors array now contains only "red" and "green" and the variable $last_color_in_list is equal to "blue".

Unshift does the exact same thing as push, but it performs the addition to the left side of the list array instead of to the right. Thus, we would create the list ("blue", "red", "green") with the following syntax:

@colors = ("red", "green");
unshift (@colors, "blue");

Similarly, shift works the same as pop, but to the left side of the list array. Thus, we reduce @colors to just "red" and "green" by shifting the first element blue with the following syntax:

$first_color_in_list = shift(@colors);

Thus, @colors again contains only "red" and "green" and $first_color_in_list equals blue.

Though push, pop, shift, and unshift are the most common list array manipulation functions used in this book, there are many others covered in more complete references. Table A-3 Summarizes some of the common array manipulating operators.

Table A-3 Array Manipulation Operators

Operator Description
shift(@array) Removes the first element in @array
unshift (@array, $element) Adds $element to the beginning of @array
pop (@array) Removes the last element of @array
push (@array, $element) Adds $element to the end of @array
sort (@array) Sorts the elements in @array
reverse(@array) Reverses the order of the elements in @array
chop (@array) chops off the last character of every element in @array
split (/delimiter/, string) Creates an array by splitting a string
join (delimiter, @array) Creates a scalar of every element in @array joined by the delimiter.

Associative Arrays

What is an associative array?

Associative Arrays add the final degree of complexity allowing ordered lists to be associated with other values. Unlike list arrays, associative arrays have index values which are not numbers. You do not reference an associative array as $associative_array_name[0] as you did for the list array. Instead, associative arrays are indexed with arbitrary scalar variables. Consider the following associative array definition:

%CLIENT_ARRAY = ('full_name', 'Selena Sol', 'phone', '213-456-7890', 'age', '27');

In this example, we have defined the associative array %CLIENT_ARRAY to have three sets of associations.

[Note] the percent sign (%) denotes the associative array name just as the dollar sign ($) did for variables and the at sign (@) did for list arrays.

Thus, "full_name" is associated with "Selena Sol" as "age" is associated with "27". This association is discussed in terms of "keys" and "values". Each key is associated with one value. Thus, we say that the key "full_name" is associated with the value "Selena Sol".

Accessing an associative array

If we want to extract a value from the associative array, we reference it with the following syntax:

$variable_equal_to_value = $ASSOCIATIVE_ARRAY_NAME{'[key]'};

Thus, to pull out the value of the "name" key from %CLIENT_ARRAY, we use the following syntax:

$full_name = $CLIENT_ARRAY{'full_name'}

The variable $full_name would then be equal to "Selena Sol". Think of it as using a "key" to unlock a "value".

[Note] When accessing an associative array using a scalar variable as a key, you should not surround the key with single quotes because the scalar variable will not be interpolated. For example, the following syntax generates the value for the age key

$key_name = "age";
$age = $CLIENT_ARRAY{$key_name};

Accessing an associative array is one of the most basic CGI functions and is at the heart of the ReadParse routine in cgi-lib.pl which creates an associative array from the incoming form data. By accessing this associative array (usually referred to in this book as %in or %form_data), your CGI script will be able to determine what it is that the client has asked of it since HTML form variables are formed in terms of administratively-defined NAMES and client-defined VALUES using syntax such as the following:

<INPUT TYPE = "text" NAME = "full_name" SIZE = "40">

The "key" of the associative array generated by ReadParse will be "full_name" and the "value" will be whatever the client typed into the text box.

Using the keys and values functions

Perl also provides a convenient way to get a list of all the keys or of all the values in an associative array if you are interested in more than just one key/value pair. keys and values are accessed with the keys and values functions using the following formats:

@associative_array_keys = keys (%ASSOCIATIVE_ARRAY_NAME);

and

@associative_array_values = values (%ASSOCIATIVE_ARRAY_NAME);

Thus, the keys and values list of the associative array %CLIENT_ARRAY defined above can be generated with the following syntax:

@client_array_keys = keys (%CLIENT_ARRAY);

@client_array_values = values (%CLIENT_ARRAY);

In this example @client_array_keys would look like ("full_name", "phone", "age") and @client_array_values would look like ("Selena Sol", "213-456-7890", "27").

Adding to and deleting from an associative array

Like list arrays, associative arrays can be internally modified. The most common function, other than defining an associative array, is adding to it. Adding to an associative array simply involves telling Perl which key and value to add using the format:

$ARRAY_NAME{'key'} = "value";

or, using our example above:

$CLIENT_ARRAY{'favorite_candy'} = "Hershey's with Almonds";

%CLIENT_ARRAY now includes full_name, phone, age and favorite_candy along with their associated values.

Similarly, you can easily use the delete function to delete a key/value pair in an associative array. The delete function follows the syntax:

delete ($ASSOCIATIVE_ARRAY_NAME{'key'});

or for our %CLIENT_ARRAY example:

delete ($CLIENT_ARRAY{'age'});

Thus, %CLIENT_ARRAY would contain only full_name, phone, and favorite_candy.

Manipulating strings

Another important function provided by CGI is the manipulation of strings of data. Whether called upon to display or manipulate the contents of a data file, to reformat some text for Web-display, or simply to use in some logical routine or external program, Perl has a diverse array of string modification functions at its disposal.

Equality operators

One of the most important string manipulation functions is that of matching or testing of equality. It is an important tool because you can use it as the basis of complex logical comparisons necessary for the intelligence demanded of a CGI application.

For example, most of the applications in this book use one of the most basic methods of pattern matching, the "ne" operator, as the basis of their decision making process using the following logic:

if (the user has hit a specific submit button)
{
execute a specific routine.
}

Consider this code snippet:

if ($display_frontpage_submit_button ne "")
  {
  &display_frontpage;
  }

[Note] If you are confused about the usage of the "if" test, it is explained in greater detail in the "Control Structures" section later in this appendix.

The "ne" operator asks if the value of the variable $display_frontpage_submit_button is not equal to an empty string. This logic takes advantage of the fact that the HTTP protocol specifies that if a FORM submit button is pressed, its NAME is set equal to the VALUE specified in the HTML code. For example, the submit button may have been coded using the following HTML:

<INPUT TYPE = "submit" NAME = "display_frontpage_submit_button" VALUE = "Return to the Frontpage">

Thus, if the NAME in the associative array has a VALUE, the script knows that the client pushed the associated button. The script determines which routines it should execute by following the logic of these pattern matches.

Similarly, you can test for equality using the "eq" operator. An example of the "eq" operator in use is shown below:

if ($name eq "Selena")
  {
  print "Hi, Selena\n";
  }

When comparing numbers instead of strings however, Perl uses a second set of operators. For example, to test for equality, you use the double equal (==) operator as follows:

if ($number == 11)
  {
  print "You typed in 11\n";
  }

[Note] Warning: Never use the single equal sign (=) for comparison. Perl interprets the equal sign in terms of assignment rather than comparison. Thus the line:

$number = 11;

actually assigns the value of eleven to $number rather than comparing $number to eleven.

There are many other types of comparison operators, but they are better researched in more comprehensive texts. However, we do include several important ones in Table A-5

Table A-5 Numeric and String Comparison Operators

Numeric Op. String Op Description
== eq Equal
!= ne Not equal
< lt Less than
> gt Greater than
<= le Less than or equal to
>= ge Greater than or equal to

Regular expressions

Regular expressions are one of the most powerful, and hence, the most complicated tools for matching strings. You can think of a regular expression as a "pattern" which can be used to match against some string. Regular expressions are far more versatile than the simple "eq" and "ne" operators and include a wide variety of modifiers and tricks. Other books have detailed chapters focusing on the use of regular expressions, so we will only touch upon a few common uses of regular expressions found this book.

Pattern matching with //

Perl invokes a powerful tool for pattern matching which gives the program great flexibility in controlling matches. In Perl, a string is matched by placing it between two slashes as follows:

/[pattern_to_match]/

Thus, /eric/ matches for the string "eric". You may also match according to whole classes of characters using the square brackets ([]). The pattern match will then match against any of the characters in the class. For example, to match for any single even numbered digit, you could use the following match:

/[02468]/

For classes including an entire range of characters, you may use the dash (-) to represent the list. Thus, the following matches any single lower case letter in the alphabet:

/[a-z]/

Likewise, you may use the caret (^) character within the square brackets to match every character which is "not" in the class. The following matches any single character which is not a digit.

/[^0-9]/

Matching Operators

Further, the "//" operator can be modified to include complex pattern matching routines. For example, the period (.) matching operator is used to stand for "any" character. Thus, "/eri./" would match any occurrences of "eric" as well as "erik".

Another commonly used matching operator is the asterisk (*). The asterisk matches zero or more occurrences of the character preceding it. Thus, "/e*ric/" matches occurrences of "eeeeeric" as well as "eric".

Table A-6 includes a list of useful matching operators.

Table A-6 Commonly Used Matching Operators

Operator Description
\n Newline
\r Carriage Return
\t Tab
\d Digit (same as [0-9])
\D Any non-digit (same as [^0-9])
\w A Word Character (same as [0-9a-zA-Z_])
\W A Non-word character
\s Any whitespace character (\t, \n, \r, or \f)
\S A non-whitespace character
* Zero or more occurrences of the preceding character
+ One or more occurrences of the preceding character
. Any character
? Zero or one occurrences of the preceding character

Anchors

Regular expressions also take advantage of anchoring patterns which help match the string in relationship to the rest of the line. For example, the "\b" anchor is used to specify a word boundary. That is, "/\beric\b/" matches "eric", but it does not match "generic".

Similarly, the caret (^) anchor will match a string to the beginning of the line. Thus, "/^eric/" will match the following line

eric is my name

but it will not match

my name is eric

[Note] Warning: the caret (^) can be confusing since it is used as an anchor when included "outside" of the square brackets ([]) but is used as the "not" operator for a class when used "within".

Table A-7 summarizes a few of the most common anchors.

Table A-7 Common anchors

Anchor Description
^ Matches the beginning of the string
$ Matches the end of the string
\b Matches a word boundary (between \w and \W)
\B Matches on non-word boundary

String Modifiers

Finally, pattern matching can be used to modify strings of text. One of the most common methods of modification is substitution. Substitution is performed using the format

s/[pattern_to_find]/[pattern_to_replace_with]/

Thus, for example, the line:

s/eric/selena/

would change the line

eric is my name

to

selena is my name

The substitution function is modified most commonly with the /i and the /g arguments. The /i argument specifies that matching should be done with case insensitivity and the /g specifies that the match should occur globally for the entire string of text rather than just for the first occurrence.

Thus, the line

s/eric/selena/gi

would change the line:

I am Eric, eric I am

to

I am selena, selena I am

without the /i, you would get

I am Eric, selena I am

and without /g but with the /i, you would get

I am selena, eric I am

There are many, many different kinds of matching operators, anchors, and string modifiers. If you want a more detailed explanation we recommend that you find a good reference source on Regular Expressions. Otherwise, the above discussion should explain how we use operators and anchors in this book.

The =~ operator

Pattern matching can also be used to manipulate variables. In particular, the scripts in this book take advantage of the "=~" operator in conjunction with the substitution operator using the format

$variable_name =~ s/[string_to_remove]/[string_to_add]/gi;

For example, if we want to censor every occurrence of the word "Frack" from the client-defined input field "comment", we might use the line

$form_data{'comments'} =~ s/frack/censored/gi;

Using the split and join functions

Finally, regular expressions can be used to split a string into separate fields. To do so, we use the "split" function with the format:

@split_array = split (/[pattern_to_split_on]/, [string_to_split]);

For example, the applications in this book often use the split function to read the fields of database rows. Consider the following code snippet:

$database_row = "Selena Sol|213-456-7890|27";
@database_fields = split (/|/, $database_row);

Now @database_fields will include the elements "Selena Sol", "213-456-7890" and "27". Each of these fields can then be processed separately if need be.

The reverse operation is performed with the "join" function which uses the following format:

$joined_string = join ("[pattern_to_join_on]", [list_to_join]);

Thus, we might recreate the original database row using

$new_database_row = join ("\|", @database_fields);

[Note] Notice that in the above line, the pipe (|) symbol must be escaped with a backslash (\) because the pipe is a special Perl character.

Control structures

Some of the most powerful tools of Perl programming are control structures. Control structures are used to create the basic logic which drives many of the routines used in CGI applications. These control structures use Boolean logic to imbue your script with the intelligence necessary to manage the diverse needs of the clients with the abilities and requirements of the server.

Statement blocks

All control structures are divided into the control statement (which we will explain below) and the statement block. The statement block is simply a group of commands that are executed together. This block is grouped by enclosing the commands within curly braces ({}). For example, the following is a simple statement block.

{
  statement one
  statement two
  statement three
}

Perl will execute each statement in a statement block from beginning to end as a group. When, how, or if the script will execute the commands however, is determined by the control statement.

Using the if, elsif, else and unless control statements

The most common control statement used throughout the scripts in this book is the "if" test. The if test checks to see if some expression is true, and if so, executes the routines in the statement block. Perl uses a simple binary comparison as a test of truth. If the result of some operation is true, the operation returns a one and the statement block is executed. If the result is false, it returns a zero, and the statement block is not executed. For example, consider the following code:

if ($name eq "Selena Sol")
  {
  print "Hello Selena.\n";
  }

In this example, Perl checks to see if the scalar variable $name has the value of "Selena Sol". If the patterns match, the matching operation will return true and the script will execute the print statement within the statement block. If Perl discovers that $name is not equal to "Selena Sol" however, the print will not be executed.

[Note] Be careful with your usage of "eq" versus "=". Within an "if" test, if you write $name = "Selena Sol", you will actually be assigning "Selena Sol" to the variable $name rather than comparing it to the value "Selena Sol". Since this action will be performed successfully, the if test will always test to true and the statement block will always be performed even if $name did not initially equal "Selena Sol".

The if test also provides for alternatives: the "else" and the "elsif" control statements. The elsif alternative adds a second check for truth and the else alternative defines a final course of action for every case of failed if or elsif tests. The following code snippet demonstrates the usage of if, elsif, and else.

if ($name eq "Selena Sol")
  {
  print "Hi, Selena.\n";
  }
elsif ($name eq "Gunther Birznieks")
  {
  print "Hi, Gunther\n";
  }
else
  {
  print "Who are you?\n";
  }

Obviously, the else need not perform a match since it is a catch-all control statement.

The "unless" control statement works like an inverse "if" control statement. Essentially it says, "execute some statement block unless some condition is true". The unless control statement is exemplified in the code below:

unless ($name eq "Selena")
  {
  print "You are NOT Selena!\n";
  }
foreach

Another very useful control statement is the foreach loop. The foreach loop iterates through some list and execute a statement block for each iteration. In this book, the foreach loop is most commonly used to iterate through a list array. For example, the following code snippet will print out the value of every element in the list array @names.

foreach $name (@names)
  {
  print "$name\n";
  }

while

The while loop also performs iteration and is used in this book primarily for reading lines in a file. The while loop can be used to read and print out every line of a file with the following syntax:

open ([FILE_HANDLE_NAME], "[filename]");
while (<[FILE_HANDLE_NAME]>)
  {
  print "$_";
  }
close ([FILE_HANDLE_NAME]);

The script would print out every line in the file "filename" because the "$_", the Perl "default" variable, represents "the current line" in this case.

[Note] The process of opening and closing files is covered in the "File Management" section later in this appendix.

for loops

The for loops is another excellent control statement tool. The basic syntax of a for loop follows:

for ([initial condition]; [test]; [incrementation])
  {
  [action to perform]
  }

The "initial condition" defines where the loop should begin. The "test" defines the logic of the loop by letting the script know the conditions which determine the scripts actions. The "incrementation" defines how the script should perform the loop. For example, we might produce a visible countdown with the following for loop:

for ($number = 10; $number >= 0; $number--)
  {
  print "$number\n";
  }

The script would initially assign "10" to the scalar variables $number. It would then test to see if $number was greater than or equal to zero. Since ten is greater than zero, the script would decrement $number by subtracting one from the value of $number.

[Note] To decrement, you use $variable_name--. To increment, you use $variable_name++.

Executing the statement block, the script would then print out the number nine. Then, it would go back through the loop again and again, printing each decremented numbers until $number was less than zero. At that point, the test would fail and the for loop would exit.

Using logical operators (&& and ||)

Control statements can also be modified with a variety of logical operators which extend the breadth of the control statement truth test using the following syntax:

[control statement] (([first condition]) [logical operator] 
                    ([second condition]))
  {
  [action to be performed]
  }

For example, the "&&" operator can be translated to "and". In usage, it takes the format used in the following example:

if (($first_name eq "Selena") && ($last_name eq "Sol"))
  { 
  print "Hello Selena Sol";
  }

Translating the logic goes something like this: if the first name is Selena AND the last name is Sol, then print "Hello Selena Sol". Thus, if $first_name was equal to "Selena" but $last_name was equal to "Flintstone", the control statement would test as false and the statement block would not be executed.

Notice that we use parentheses to denote conditions. Perl evaluates each expression inside the parentheses independently and then evaluates the results for the entire group of conditions. If either returns false, the entire test returns false. The use of parentheses are used to determine precedence. With more complex comparisons, in which there are multiple logical operators, the parentheses help to determine the order of evaluation.

Similarly, you may wish to test using the double pipe (||) operator. This operator is used to denote an "or". Thus, the following code would execute the statement block if $first_name was Selena OR Gunther.

if (($first_name eq "Selena") || ($first_name eq "Gunther"))
  {
  print "Hello humble CGI book author!";  
  }

Formatting control structures

As a final note, it should be said that different programmers have different styles of representing statement blocks. Most programmers prefer to include the first curly brace ({) on the same line as the control statement using the following syntax:

if ($number = 1) {
  print "The number is one";
  }

Others prefer to include the curly brace ({) on the second line indented with the rest of the statement block as used throughout this section. To a certain degree, this is simply a matter of style. Perl does not care how you write your code so long as it is syntactically correct. Some say that it is easier to read the code if the curly braces are on their own lines. Others, especially those using the text editor EMACS, say that it is more efficient to include the first curly brace on the same line as the control statement. The debate will go on forever since there is no right answer.

File Management

Opening and closing files

One of the main resources that your server provides is a file management system. The scripts in this book, for example, use a multitude of supporting files in the server's file system such as temporary files, counter files, user files, data files, setup files, and libraries. Perl includes several excellent tools for working with these files.

First, Perl gives your scripts the ability to open files using the open function. The open function allows you to create a "filehandle" with which to manipulate a file. A filehandle is another name for a connection between the script and the server. Often, filehandles manage connections between the script and standard input, output, or error, however, in the case of open, any file can be read into a filehandle using the syntax:

open ([FILE_HANDLE_NAME], "[filename]");

For example, we might open a data file for reading using

open (DATA_FILE, "inventory.dat");

In this case, all of the lines of inventory.dat will be read into the filehandle "DATA_FILE" which Perl can then use within the program. However, you must also close a file once you are done with it. The syntax for closing a file is as follows:

close ([FILE_HANDLE_NAME]);

Finally, Perl gives you the ability to execute an error routine if there is a problem opening a file. The "or" logical operator is sometimes discussed in terms of a short circuit. For instance, the logic of the "or" operator is such that if the first expression evaluates to true, there is no need to evaluate the next. On the other hand, if the first expression evaluates to false, the second expression is executed. Thus, using the double pipe (||) operator, you can specify the default action to perform if an "open" fails. In CGI applications, the alternate action executed is usually something like the subroutine, CgiDie located in cgi-lib.pl. For example, the following routine would execute the CgiDie subroutine if there was a problem opening "address.dat".

open (ADDRESS_BOOK, "address.dat") || &CgiDie("Cannot open address.dat");

Thus, if the script has a problem opening a needed file, the double pipe (||) operator provides a convenient and elegant way to quit the program and report the problem.

Reading a file line by line

An often used technique in this book for the manipulation of files is the reading of each line of a file. Perhaps we want to check each line for a keyword, or find every occurrence of some marker tag on a line and replace it with some other string. This process is done using a while loop as discussed previously. Consider this routine which will print out every line in a address file:

open (ADDRESSES, "address.dat") || &CgiDie ("Cannot open address.dat");
while ()
  {
  print "$_";
  }
close (ADDRESSES);

Thus, the script would print out every line in the file address.dat because "$_" is Perl's special name for "the current line" in this case.

[Note] You can also manipulate the "$_" variable in other ways such as applying pattern matching on it or adding it to an array.

Writing and appending to files

You can do more than just read a file of course. You can also open a filehandle for writing with the greater than sign (>) using the syntax:

open ([FILE_HANDLE_NAME], ">[filename]");

or for appending using the double-greater-than symbol (>>) with the syntax:

open ([FILE_HANDLE_NAME], ">>[filename]");

The difference between appending and writing is that when you write to a file, you erase whatever was previously there whereas when you append to a file, you simply add the new information to the end of whatever text was already there.

[Note] If the file which Perl is asked to write or append to does not already exist, Perl will create the file for you.

Typically, when writing to a file, you use the print function. However, instead of printing to standard output, you would specify the filename to print to. Consider the following example:

open (TEMP_FILE, ">temp.file") || &CgiDie ("Cannot open temp.file");
print TEMP_FILE "hello there\n";
close (TEMP_FILE);

The file "temp.file" will now have the solitary line:

hello there

Deleting, renaming and changing the permissions of files

Perl also provides you with all of the file management functions typically offered by your operating system. In our experience, the three most utilized functions in CGI scripts are unlink, rename and chmod. unlink is Perl's function for deleting a file from the file system. The syntax is pretty straight forward.

unlink ("[filename]");

This line of Perl code will delete the file called filename provided that the script has permissions to delete the file.

Your Perl script can also rename a file using the rename function:

rename ("[old_filename]", "[new_filename]");

In this case, the file's name will be replaced with the new filename specified.

Finally, Perl gives you the ability to affect the permissions of files in the file system using the chmod function. The syntax is also fairly straight forward as follows:

chmod (0666, "filename");

In this case, "filename" will be made readable and writable by user, group and world.

File tests

Finally, Perl provides many methods for determining information about files on the file system using "File tests". For the purposes of this appendix, there are too many types of file tests to cover them all in depth. Further, they are covered extensively elsewhere. However, we will note the most frequent syntax of file tests used in this book which follow the form:

if ([filetest] [filename] && [other filetest] [filename])
  {
  do something
  }

Consider the following example which checks to see if a file exists (-e) and is writable (-w) by us, and if so deletes it:

if ((-e "temp.file") && (-w "temp.file"))
  {
  unlink ("temp.file");
  }

Table A-8 lists several common file tests.

Table A-8 Common File Tests

Test Description
-r File or directory is readable
-w File or directory is writable
-x File or directory is executable
-o File or directory is owned by user
-e File or directory exists
-z File exists and has zero size
-s File or directory exists and has non-zero size
-f Entry is a plain file
-d Entry is a directory
-T File is text
-B File is binary
-M Modification age in days
-A Access age in days

Getting information about a file with stat

The stat function produces useful information about files that you can use in your file management functions. The stat function returns a thirteen-element array of file information using the syntax:

open ([FILE_HANDLE_NAME], "[filename]")|| &CgiDie ("Can't open file");
($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,   
       $atime, $mtime, $ctime, $blksize, $blocks) =   
       stat([FILE_HANDLE_NAME]);
close ([FILE_HANDLE_NAME]);

Table A-9 Describes the elements returned by stat

Table A-9 Stat Information

Variable Description
$dev The device that the file resides on
$ino The inode for this file
$mode The permissions for the file
$nlink The number of hard links to the file
$uid The numerical user ID for the file owner
$gid The numerical group ID for the file owner
$rdev The device type if the file is a device
$size The size of the file in bytes
$atime When the file was last accessed
$mtime When the file was last modified
$ctime When the file status was last changed
$blksize The optimal block size for i/o operations on the file system containing the file
$blocks The number of clocks allocated to the file

For the most part, CGI scripts will need to take advantage only of $atime, $mtime, $ctime, $mode and $size. $size and $mode are fairly straight forward in usage. However, the usage of the "time" variables is a bit subtle.

The time values returned by stat are formatted in terms of the number of non-leap seconds since January 1, 1970, UTC. Thus, the stat function might yield a result such as $mtime is equal to "838128443". Likewise, the time function returns the current time in the same format. Thus, the scalar variable $current_time is assigned the current time with the following syntax:

$current_time = time;

Once you have both the age of the file and the current time, you can use arithmetic to compare them for various operations such as the pruning of a Session Files directory after a certain amount of time.

For example, the following code snippet can be used to prune the file "289576893.dat" if it is older than an administratively-defined amount of time.

$seconds_to_save = 3600;
$age_of_file = $current_time - $mtime ;
if ($age_of_file > $seconds_to_save)
  {
  unlink ("289576893.dat");
  }

[Note] If you are interested in what the actual day is, and not the number of seconds since 1970, you must use the localtime function to convert the value to a more human-recognizable form using the syntax:

($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime (time);

The following code gets the same information for an $mtime value extracted from stat:

($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime ($mtime);

Opening, reading and closing directories

Just like files, Perl gives you the ability manage directories. Specifically, Perl allows you to open a directory as a directory handle, read in the current contents of the directory and then close it again .

To open a directory, you use the following syntax:

opendir ([FILE_HANDLE_NAME], "[directory_location]") || &CgiDie ("Can't open [directory_location]");

Thus, for example, you might open the directory "/usr/local/etc/www/" with the syntax:

opendir (WWW, "/usr/local/etc/www/") || &CgiDie ("Can't open www");

As you can see, like opening files, Perl allows the program to die elegantly in case there is a problem opening the directory. Also, as with file manipulation, you must close a directory after you are done with it using the syntax:

closedir ([FILE_HANDLE_NAME]);

For example, to close the directory opened above, you use the command:

closedir(WWW);

Once you have opened a directory, you can also read the contents of the directory with the readdir function. For example, the following code snippet assigns all of the filenames in the www directory to @filenames:

opendir (WWW, "/usr/local/etc/www/") || &CgiDie ("Can't open www");
@filenames = readdir (WWW);
closedir (WWW);

[Note] If you want to avoid including the "." (current directory) and ".." (root directory) files you can use the grep function to avoid including them in the readdir function using the syntax:

@filenames = grep (!/^\.\.?$/, readdir (FILE_HANDLE_NAME);