Introduction to Perl
Satheesh Babu
A brief introduction to getting started with Perl. This is not aimed at getting you proficient in writing CGI scripts, but to help you decide on when and how (not) to use Perl. Working knowledge of any programming language is assumed.
1. Introduction
    1.1 Short History
    1.2 Evolution
    1.3 Relevance
    1.4 Installation
    1.5 Similarities to common languages/tools
2. Tutorial
    2.1 First Step
    2.2 Running Perl
    2.3 Scalars
    2.4 Lists (Arrays)
    2.5 Hashes (Associative Arrays)
    2.6 Control Structures
    2.7 File operations
    2.8 String Processing
    2.9 Subroutines
    2.10 More information
3. Useful Examples
    3.1 Processing colon delimited Excel files
    3.2 Processing fixed format text files
    3.3 Report Generation and Formatting
    3.4 DBM Databases
    3.5 Exercise
4. Modules
    4.1 Where to get them?
    4.2 Modules vs. coding
    4.3 Well known modules
5. Other choices
    5.1 Python - OOP
    5.2 TCL - GUI, Expect, Commercial support
    5.3 PHP - Web Scripting
6. References
    6.1 Books
    6.2 WWW
    6.3 Scripts' Archives
1. Introduction
Perl was invented by Larry Wall. He called it Practical Extraction and Report Language (he also calls it Pathologically Eclectic Rubbish Lister). What started as an exercise in unifying the multiple tools a system administrator uses to script routine tasks evolved into a powerful scripting language with lots of followers. In all fairness, Perl (always written as Perl and not as PERL) is now treated as a general-purpose programming language, though its early beginnings as a melting pot of multiple computing paradigms still make it possible to write indecipherable programs! We will try to get an introduction to Perl and its prowess as a text manipulation language without trying to write cryptic programs. According to Larry Wall, the parents of Perl are computer science, linguistics, art and common sense.
So, Perl is a computer language that helps to implement some common sense with help from the principles of computer science in an artistic way using common linguistic constructs.
1.2 Evolution
From a quick hack by one system administrator, Perl has grown into a full-fledged language. It is being developed and enhanced continuously by hundreds of programmers around the world. One big step in earning recognition was the addition of a regular expression engine. Now, the regular expression capabilities of Perl are so well known (especially since version 5.0) that they are being used in other languages like Python as Perl5 regexes.
The growth of the Internet also complemented Perl. The initial attempt at providing dynamic content was through CGI (even now CGI is used extensively), and Perl's remarkable text handling features made it a quick fit. CGI programming is now synonymous with Perl programming.
CPAN, the Comprehensive Perl Archive Network, was set up to share Perl code. Perl supports modules and chances are that for 99% of the programming requirements, there is already a tested module in CPAN (for the remaining 1%, write modules and contribute to CPAN!). Using modules masks the complexities of adhering to predefined standards and frees you to concentrate on your tasks; there is no point in reinventing the wheel. Now, you have modules which handle graphics, CGI etc.
You can also embed Perl code in your C/C++ programs. A very popular embedded Perl architecture is mod_perl for the Apache web server. JPL (Java Perl Lingo) is a project to get Java and Perl working together.
1.3 Relevance
Data manipulation
Perl can handle strings, dates, binary data, database connectivity, streams, sockets and many more. This ability to manipulate multiple data types helps immensely in data conversion (and by the way, it is much faster than PL/SQL!). Perl also has provision for lists (or arrays) and for hashes (associative arrays). Perl also supports references, which are similar to the pointers in C. Lists, hashes and references together make it possible to define and manipulate powerful custom-defined datatypes.
Glue language
Perl does not differentiate between files and pipes. So, it makes it very easy to use Perl as a glue language. Suppose you have a sed script, the output of which is to be given to a Perl script. You can do this the UNIX way,
sedscript | perlscript
This really helps when people want to migrate from traditional UNIX tools like Awk, sed, grep etc. You can use these tools straightaway instead of worrying about how to do the same thing entirely in Perl. In this respect, Perl is just like the shell. However, we must also consider the other features of Perl, which the shell simply cannot provide easily.
CGI
CGI.pm. Period. Almost all CGI programs written today are using the CGI.pm module from CPAN. Even before this was written, people used to use Perl extensively for CGI programming. CGI.pm made the process streamlined and easy, even for beginners. The graphics library GD is used extensively in producing dynamic web charts.
Quick coding
The ease with which Perl can be employed to write programs quickly cannot be overstressed. A disturbing fact about this is that such quick code can tend to be dirty and quickly get out of hand if you keep extending it! Most of the time, you must control your urges to overextend short programs! But, as a prototyping tool, or as a fast reporting/text-processing tool, Perl is immensely helpful. Two very good tools worth mentioning in this context are s2p and a2p, which come with the Perl distribution. s2p converts a sed script to a Perl script and a2p does the same for Awk scripts. These two help a lot in extending sed and awk scripts.
Portability
Most Perl code will run without any change on Unix, Windows or Macintosh. Typical changes you might have to make involve file paths and the use of low-level OS-specific functions.
1.4 Installation
Just go to http://www.perl.com and download the source or precompiled binaries. Installation typically includes extracting the binary and then changing your PATH variable to reflect where Perl executable resides. Even when you want to compile Perl from scratch, it is a simple job.
1.5 Similarities to common languages/tools
C
99% of Perl code looks like C code. So, it is very easy for C programmers to switch to Perl. And believe me, the code-as-you-go philosophy of Perl really makes C programmers happy, especially for small programs. Many of the C functions that are available through the standard libraries are available with little or no change in Perl.
AWK SED
The string processing strategy of Perl is very similar to that of Awk and sed, making it easy to migrate.
Shell
Again, the commenting scheme, variable naming scheme etc of Perl look similar to that of Shell. Many shell utilities like grep, tr etc are available as functions within Perl.
2. Tutorial
The majority of the content of this tutorial section was written by Nik Silver, at the School of Computer Studies, University of Leeds, UK. Assuming working knowledge of any programming language, we will now try to see what Perl programs look like.
Hello World!
Here is the basic perl program that we'll use to get started.
#!/usr/local/bin/perl
#
# prints a greeting.
#
print 'Hello world.';     # Print a message
Comments
A common Perl pitfall is to write cryptic code. To help with that, Perl does provide comments, albeit not very flexible ones. Perl treats anything from a hash # to the end of the line as a comment. Block comments are not possible. So, if you want to have a block of comments, you must ensure that each line starts with #.
Statements
Everything other than a comment is a Perl statement, which must end with a semicolon, like the last line above. You need not put a wrapping character \ at the end of each line of a long statement; the statement simply runs on until its semicolon.
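2.2 Running Perl
Type the program into a file with a text editor and make the file executable with the command
chmod u+x progname
at the UNIX prompt, where progname is the filename of the program. Now, to run the program, just type any of the following at the prompt.
perl progname
./progname
progname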
If something goes wrong then you may get error messages, or you may get nothing. You can always run the program with warnings using the command
perl -w progname
at the prompt. This will display warnings and other (hopefully) helpful messages before it tries to execute the program. To run the program with a debugger use the command
perl -d progname
When the file is executed Perl first compiles it and then executes that compiled version. Unlike many other interpreted languages, Perl scripts are compiled first, which helps you to catch most errors before the program actually starts executing. In this context, the -w switch is very helpful. It will warn you about unused variables, suspicious statements etc.
2.3 Scalars
Perl supports 3 basic types of variables, viz., scalars, lists and hashes. We will explore each of these a little more. The most basic kind of variable in Perl is the scalar variable. Scalar variables hold both strings and numbers, and are remarkable in that strings and numbers are completely interchangeable. For example, the statement
$age = 27;
sets the scalar variable $age to 27, but you can also assign a string to exactly the same variable:
$age = 'Twenty Seven';
and Perl can still cope with arithmetic and other operations quite happily. However, please note that the following code is a bit too much to ask for!
$age = 'Twenty Seven';
$age = $age + 10;
For the curious, the above code will set $age to 10. Think why. In general variable names consist of numbers, letters and underscores, but they should not start with a number, and the variable $_ is special, as we'll see later. Also, Perl is case sensitive, so $a and $A are different.
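The string-to-number behaviour described above can be tried out with a minimal sketch like the one below. The helper name plus_ten is made up for this illustration, and the sketch switches off the "isn't numeric" warning locally so that it runs quietly under warnings.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): add 10 to whatever scalar
# it is given. The numeric warning is silenced only inside this sub.
sub plus_ten {
    my ($value) = @_;
    no warnings 'numeric';
    return $value + 10;
}

print plus_ten(27), "\n";              # a number
print plus_ten('27'), "\n";            # a string that looks like a number
print plus_ten('Twenty Seven'), "\n";  # no leading digits, so it counts as 0
print plus_ten('3 apples'), "\n";      # only the leading digits are used
```

A string with no leading digits is treated as 0 in arithmetic, which is where the 10 in the example above comes from.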
$a = 7 / 8;     # Divide 7 by 8 to give 0.875
$a = 9 ** 10;   # Nine to the power of 10
$a = 5 % 2;     # Remainder of 5 divided by 2
++$a;           # Increment $a and then return it
$a++;           # Return $a and then increment it
--$a;           # Decrement $a and then return it
$a--;           # Return $a and then decrement it
Note that when Perl assigns a value with $a = $b it makes a copy of $b and then assigns that to $a. Therefore the next time you change $b it will not alter $a. Other operators can be found on the perlop manual page. Type man perlop at the prompt.
Interpolation
The following code prints apples and pears using concatenation:
$a = 'apples';
$b = 'pears';
print $a.' and '.$b;
It would be nicer to include only one string in the final print statement, but the line
print '$a and $b';
prints literally $a and $b which isn't very helpful. Instead we can use the double quotes in place of the single quotes:
print "$a and $b";
The double quotes force interpolation of any codes, including interpreting variables. This is much nicer than our original statement. Other codes that are interpolated include special characters such as newline and tab. The code \n is a newline and \t is a tab.
Exercise
This exercise is to rewrite the Hello world program so that (a) the string is assigned to a variable and (b) this variable is then printed with a newline character. Use the double quotes and don't use the concatenation operator.
2.4 Lists (Arrays)
In the examples that follow, @food is an array variable holding a three element list whose last element is "eels", and @music is an array variable holding the two element list ("whistle", "flute"). An array is accessed by using indices starting from 0, and square brackets are used to specify the index. The expression
$food[2]
returns eels. Notice that the @ has changed to a $ because eels is a scalar.
Array assignments
As in all of Perl, the same expression in a different context can produce a different result. The first assignment below explodes the @music variable so that it is equivalent to the second assignment.
@moremusic = ("organ", @music, "harp");
@moremusic = ("organ", "whistle", "flute", "harp");
This should suggest a way of adding elements to an array. A neater way of adding elements is to use the statement
push(@food, "eggs");
which pushes eggs onto the end of the array @food. To push two or more items onto the array use one of the following forms:
push(@food, "eggs", "lard");
push(@food, ("eggs", "lard"));
push(@food, @morefood);
The push function returns the length of the new list. The expression $#food, by contrast, gives the index of the last element, which is one less than the length. To remove the last item from a list and return it use the pop function. From our original list the pop function returns eels and @food now has two elements:
$grub = pop(@food); # Now $grub = "eels"
It is also possible to assign an array to a scalar variable. As usual context is important. The line
$f = @food;
assigns the length of @food, but
$f = "@food";
turns the list into a string with a space between each element. This space can be replaced by any other string by changing the value of the special $" variable. This variable is just one of Perl's many special variables, most of which have odd names. When you get overloaded with oddity, use the English module, which lets you name these variables in a more user-friendly (i.e. to English-speaking people) way. Arrays can also be used to make multiple assignments to scalar variables:
($a, $b) = ($c, $d);        # Same as $a=$c; $b=$d;
($a, $b) = @food;           # $a and $b are the first two
                            # items of @food.
($a, @somefood) = @food;    # $a is the first item of @food
                            # @somefood is a list of the
                            # others.
(@somefood, $a) = @food;    # @somefood is @food and
                            # $a is undefined.
The last assignment occurs because arrays are greedy, and @somefood will swallow up as much of @food as it can. Therefore that form is best avoided. Finally, you may want to find the index of the last element of a list. To do this for the @food array use the expression
$#food
Displaying arrays
Since context is important, it shouldn't be too surprising that the following all produce different results:
print @food;        # By itself
print "@food";      # Embedded in double quotes
print @food."";     # In a scalar context
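The effect of context on arrays, together with push, pop, $#food and the $" variable, can be seen in one place with a short sketch. The first two elements of @food are assumed here; only "eels" is certain from the text.

```perl
use strict;
use warnings;

# Assumed contents: the text only fixes "eels" as the third element.
my @food = ("apples", "pears", "eels");

my $count  = @food;       # scalar context: the number of elements
my $last   = $#food;      # index of the last element (length minus one)
my $joined = "@food";     # interpolation: elements separated by $"

my $listed;
{
    local $" = ", ";      # change the separator inside this block only
    $listed = "@food";
}

my $new_length = push(@food, "eggs");   # push returns the new length
my $grub       = pop(@food);            # pop removes and returns the last item

print "count=$count last=$last\n";
print "joined=$joined\n";
print "listed=$listed\n";
print "new_length=$new_length grub=$grub\n";
```

Using local for $" keeps the change from leaking into the rest of the program.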
2.5 Hashes (Associative Arrays)
With an associative array %ages that maps each person's name to an age, we can find the age of people with the following expressions
$ages{"Michael Caine"};       # Returns 39
$ages{"Dirty Den"};           # Returns 34
$ages{"Angie"};               # Returns 27
$ages{"Willy"};               # Returns "21 in dog years"
$ages{"The Queen Mother"};    # Returns 108
Notice that like list arrays each % sign has changed to a $ to access an individual element because that element is a scalar. Unlike list arrays the index (in this case the person's name) is enclosed in curly braces, the idea being that associative arrays are fancier than list arrays. An associative array can be converted back into a list array just by assigning it to a list array variable. A list array can be converted into an associative array by assigning it to an associative array variable. Ideally the list array will have an even number of elements:
@info = %ages;        # @info is a list array. It
                      # now has 10 elements
$info[5];             # Returns a value from the list array @info
                      # (27 in this example, though the order in
                      # which a hash is flattened is not guaranteed)
%moreages = @info;    # %moreages is an associative
                      # array. It is the same as %ages
Operators
Associative arrays do not have any order to their elements (they are just like hash tables) but it is possible to access all the elements in turn using the keys function and the values function:
foreach $person (keys %ages)
{
    print "I know the age of $person\n";
}
foreach $age (values %ages)
{
    print "Somebody is $age\n";
}
When keys is called it returns a list of the keys (indices) of the associative array. When values is called it returns a list of the values of the array. These functions return their lists in the same order, but this order has nothing to do with the order in which the elements have been entered. When keys and values are called in a scalar context they return the number of key/value pairs in the associative array. There is also a function each which returns a two element list of a key and its value. Every time each is called it returns another key/value pair:
while (($person, $age) = each(%ages))
{
    print "$person is $age\n";
}
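Putting the pieces together, here is a small sketch of building a hash, flattening it to a list and back, and walking it in a predictable order. The definition of %ages is an assumption pieced together from the lookups shown earlier; sorting the keys is one way around the arbitrary ordering.

```perl
use strict;
use warnings;

# Assumed definition of %ages, reconstructed from the lookups in the text.
my %ages = (
    "Michael Caine",    39,
    "Dirty Den",        34,
    "Angie",            27,
    "Willy",            "21 in dog years",
    "The Queen Mother", 108,
);

my @info     = %ages;    # key, value, key, value, ... in no fixed order
my %moreages = @info;    # rebuild an equivalent hash from the list

my $pairs = keys %ages;  # keys in scalar context: number of pairs

# sort gives a repeatable order, whatever order keys returns.
foreach my $person (sort keys %ages) {
    print "$person is $ages{$person}\n";
}
print "The list has ", scalar(@info), " elements for $pairs pairs\n";
```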
Environment variables
When you run a perl program, or any script in UNIX, there will be certain environment variables set. These will be things like USER which contains your username and DISPLAY which specifies which screen your graphics will go to. When you run a perl CGI script on the World Wide Web there are environment variables which hold other useful information. All these variables and their values are stored in the associative %ENV array in which the keys are the variable names. Try the following in a perl program:
print "You are called $ENV{'USER'} and you are ";
print "using display $ENV{'DISPLAY'}\n";
2.6 Control Structures
foreach
To go through each line of an array or other list-like structure (such as lines in a file) Perl uses the foreach structure. This has the form
foreach $morsel (@food)     # Visit each item in turn
                            # and call it $morsel
{
    print "$morsel\n";      # Print the item
    print "Yum yum\n";      # That was nice
}
The actions to be performed each time are enclosed in a block of curly braces. The first time through the block $morsel is assigned the value of the first item in the array @food. Next time it is assigned the value of the second item, and so on until the end. If @food is empty to start with then the block of statements is never executed.
Testing
The next few structures rely on a test being true or false. In Perl any nonzero number and nonempty string is counted as true. The number zero, zero by itself in a string, and the empty string are counted as false. Here are some tests on numbers and strings.
$a == $b    # Is $a numerically equal to $b?
            # Beware: Don't use the = operator.
$a != $b    # Is $a numerically unequal to $b?
$a eq $b    # Is $a string-equal to $b?
$a ne $b    # Is $a string-unequal to $b?
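The rules for truth, and the difference between numeric and string comparison, are easy to check with a tiny sketch. The helper truth is a made-up name for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: say how Perl treats a value when it is tested.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

print "0     is ", truth(0),     "\n";   # false
print "'0'   is ", truth('0'),   "\n";   # false: zero by itself in a string
print "''    is ", truth(''),    "\n";   # false: the empty string
print "'00'  is ", truth('00'),  "\n";   # true: not exactly the string 0
print "'0.0' is ", truth('0.0'), "\n";   # true, for the same reason

print "1 == 1.0     is ", truth(1 == 1.0),       "\n";   # true
print "'1' eq '1.0' is ", truth('1' eq '1.0'),   "\n";   # false
```

The last two lines show why the choice between == and eq matters: the same two values are numerically equal but differ as strings.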
for
Perl has a for structure that mimics that of C. It has the form
for (initialise; test; inc)
{
    first_action;
    second_action;
    etc
}
First of all the statement initialise is executed. Then while test is true the block of actions is executed. After each time the block is executed inc takes place. Here is an example for loop to print out the numbers 0 to 9.
for ($i = 0; $i < 10; ++$i)     # Start with $i = 0
                                # Do it while $i < 10
                                # Increment $i before repeating
{
    print "$i\n";
}
while and until
In a password-checking program built around a while loop, the curly-braced block of code is executed while the input does not equal the password. The while structure should be fairly clear, but this is the opportunity to notice several things. First, we can read from the standard input (the keyboard) without opening the file first. Second, when the password is entered $a is given that value including the newline character at the end. The chop function removes the last character of a string which in this case is the newline.
To test the opposite thing we can use the until statement in just the same way. This executes the block repeatedly until the expression is true, not while it is true.
Another useful technique is putting the while or until check at the end of the statement block rather than at the beginning. This will require the presence of the do operator to mark the beginning of the block and the test at the end. If we forgo the "sorry. Again" message of such a password program then it could be written like this.
#!/usr/local/bin/perl
do
{
    print "Password? ";     # Ask for input
    $a = <STDIN>;           # Get input
    chop $a;                # Chop off newline
}
while ($a ne "fred");       # Redo while wrong input
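For comparison, here is a sketch of the same password check with the test at the top of the loop. It is only one possible arrangement: the helper name ask_password is made up, the input comes from a filehandle passed in as a parameter (so the sketch can be tried on a string instead of the keyboard), and chomp is used in place of chop because it removes only a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper: read attempts from a filehandle until the
# password "fred" is seen, and return the number of attempts made.
sub ask_password {
    my ($in) = @_;
    my $attempts = 0;
    while (defined(my $guess = <$in>)) {
        chomp $guess;                  # remove the trailing newline, if any
        $attempts++;
        last if $guess eq "fred";      # right password: leave the loop
        print "sorry. Again? ";
    }
    return $attempts;
}

# Simulate a user typing three lines by reading from a string.
my $typed = "wilma\nbarney\nfred\n";
open(my $in, '<', \$typed) or die "Cannot open in-memory file: $!";
my $tries = ask_password($in);
close($in);
print "\nAttempts: $tries\n";
```

Calling ask_password(\*STDIN) would read from the keyboard instead.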
Exercise
Modify the program from the previous exercise so that each line of the file is read in one by one and is output with a line number at the beginning. You should get something like:
1 root:oYpYXm/qRO6N2:0:0:SuperUser:/:/bin/csh
2 sysadm:*:0:0:System V Administration:/usr/admin:/bin/sh
3 diag:*:0:996:Hardware Diagnostics:/usr/diags:/bin/csh
etc
When you have done this see if you can alter it so that line numbers are printed as 001, 002, ..., 009, 010, 011, 012, etc. To do this you should only need to change one line by inserting an extra four characters. Perl's clever like that.
if-else
Of course Perl also allows if/then/else statements. These are of the following form:
if ($a)
{
    print "The string is not empty\n";
}
else
{
    print "The string is empty\n";
}
For this, remember that an empty string is considered to be false. It will also give an "empty" result if $a is the string 0. It is also possible to include more alternatives in a conditional statement:
if (!$a)                        # The ! is the not operator
{
    print "The string is empty\n";
}
elsif (length($a) == 1)         # If above fails, try this
{
    print "The string has one character\n";
}
elsif (length($a) == 2)         # If that fails, try this
{
    print "The string has two characters\n";
}
else                            # Now, everything has failed
{
    print "The string has lots of characters\n";
}
In this, it is important to notice that the elsif statement really does have an "e" missing. Sometimes, it is more readable to use unless instead of if (!...). The switch-case statement familiar to C programmers is not available in Perl. You can simulate it in other ways. See the manual pages.
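One common way of simulating a switch is a hash lookup with a fallback, sketched below together with unless. This is only one of several possible approaches, and the command letters and the name describe are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical example: turn a one-letter command into a description,
# much as a C switch statement with a default case might.
sub describe {
    my ($command) = @_;
    my %action = (
        a => "add",
        d => "delete",
        l => "list",
    );
    # unless reads more naturally here than if (!exists ...)
    return "unknown" unless exists $action{$command};
    return $action{$command};
}

print describe("d"), "\n";
print describe("x"), "\n";
```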
Exercise
From the previous exercise you should have a program which prints out the password file with line numbers. Change it so that it works with the text file. Now alter the program so that line numbers aren't printed or counted for blank lines, but every line is still printed, including the blank ones. Remember that when a line of the file is read in it will still include its newline character at the end.
2.7 File operations
The open function opens a file for input (i.e. for reading). The first parameter is the filehandle which allows Perl to refer to the file in future. The second parameter is an expression denoting the filename. If the filename was given in quotes then it is taken literally without shell expansion. So the expression '~/notes/todolist' will not be interpreted successfully. If you want to force shell expansion then use angled brackets: that is, use <~/notes/todolist> instead. The close function tells Perl to finish with that file.
There are a few useful points to add to this discussion on filehandling. First, the open statement can also specify a file for output and for appending as well as for input. To do this, prefix the filename with a > for output and a >> for appending:
open(INFO, $file);      # Open for input
open(INFO, ">$file");   # Open for output
open(INFO, ">>$file");  # Open for appending
open(INFO, "<$file");   # Also open for input
Second, if you want to print something to a file you've already opened for output then you can use the print statement with an extra parameter. To print a string to the file with the INFO filehandle use
print INFO "This line goes to the file.\n";
Third, you can use the following to open the standard input (usually the keyboard) and standard output (usually the screen) respectively:
open(INFO, '-');    # Open standard input
open(INFO, '>-');   # Open standard output
In the above program the information is read from a file. The file is the INFO file and to read from it Perl uses angled brackets. So the statement
@lines = <INFO>;
reads the file denoted by the filehandle into the array @lines. Note that the <INFO> expression reads in the file entirely in one go. This is because the reading takes place in the context of an array variable. If @lines is replaced by the scalar $lines then only the next one line would be read in. In either case each line is stored complete with its newline character at the end.
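The difference between reading in scalar and list context can be seen with a scratch file, as in the sketch below. The file is created with the standard File::Temp module so that the example does not depend on any real file, and the three-argument form of open is used, which keeps the mode separate from the filename.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file; it is removed automatically when the program exits.
my ($out, $file) = tempfile(UNLINK => 1);
print $out "first line\nsecond line\nthird line\n";
close($out);

open(INFO, '<', $file) or die "Cannot open $file: $!";
my $first = <INFO>;     # scalar context: only the next line
my @rest  = <INFO>;     # list context: all the remaining lines
close(INFO);

print "First: $first";                          # newline is still attached
print "Remaining lines: ", scalar(@rest), "\n";
```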
Exercise
Modify the above program so that the entire file is printed with a # symbol at the beginning of each line. You should only have to add one line and modify another. Use the $" variable. Unexpected things can happen with files, so you may find it helpful to use the -w option.
Extending pipes
You can very easily substitute reading a pipe for reading a file. The following example shows reading the output of the ps command.
open(PS,"ps -aef|") or die "Cannot open ps \n";
while(<PS>){
    print ;
}
close(PS);
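The same idea can be wrapped in a subroutine, as in this sketch. It assumes a Unix-like system. The helper name read_command is made up, and the running perl itself ($^X) stands in for ps so that the sketch does not depend on any particular ps options.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and return its output lines.
# The list form of open with '-|' starts the command without a shell.
sub read_command {
    my @command = @_;
    open(my $pipe, '-|', @command) or die "Cannot run $command[0]: $!";
    my @lines = <$pipe>;          # list context: read everything
    close($pipe);
    return @lines;
}

# $^X is the path of the perl that is running this program.
my @output = read_command($^X, '-e', 'print "one\ntwo\n"');
print "Read ", scalar(@output), " lines\n";
print @output;
```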
Regular expressions
A regular expression is contained in slashes, and matching occurs with the =~ operator. The following expression is true if the string the appears in variable $sentence.
$sentence =~ /the/
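The RE is case sensitive, so if the word appears in $sentence only as a capitalised The then the above match will be false. The operator !~ is used for spotting a non-match. In that case
$sentence !~ /the/
is true, because the string the does not appear in $sentence.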
A match such as $sentence =~ /under/ can be used as the test in a conditional. But it's often much easier if we assign the sentence to the special variable $_ which is of course a scalar. If we do this then we can avoid using the match and non-match operators and the test can be written simply as
if (/under/)
{
    print "We're talking about rugby\n";
}
The $_ variable is the default for many Perl operations and tends to be used very heavily.
More on REs
In an RE there are plenty of special characters, and it is these that both give them their power and make them appear very complicated. It's best to build up your use of REs slowly; their creation can be something of an art form. Here are some special RE characters and their meaning
.   # Any single character except a newline
^   # The beginning of the line or string
$   # The end of the line or string
*   # Zero or more of the last character
+   # One or more of the last character
?   # Zero or one of the last character
and here are some example matches. Remember that they should be enclosed in /.../ slashes to be used.
t.e     # t followed by anything followed by e
        # This will match the
        #                 tre
        #                 tle
        # but not te
        #         tale
^f      # f at the beginning of a line
^ftp    # ftp at the beginning of a line
e$      # e at the end of a line
tle$    # tle at the end of a line
und*    # un followed by zero or more d characters
        # This will match un
        #                 und
        #                 undd
        #                 unddd (etc)
.*      # Any string without a newline. This is because
        # the . matches anything except a newline and
        # the * means zero or more of these.
^$      # A line with nothing in it.
There are even more options. Square brackets are used to match any one of the characters inside them. Inside square brackets a - indicates "between" and a ^ at the beginning means "not":
[qjk]       # Either q or j or k
[^qjk]      # Neither q nor j nor k
[a-z]       # Anything from a to z inclusive
[^a-z]      # No lower case letters
[a-zA-Z]    # Any letter
[a-z]+      # Any non-zero sequence of lower case letters
At this point you can probably skip to the end and do at least most of the exercise. The rest is mostly just for reference. A vertical bar | represents an "or" and parentheses (...) can be used to group things together:
jelly|cream     # Either jelly or cream
(eg|le)gs       # Either eggs or legs
(da)+           # Either da or dada or dadada or...
Clearly characters like $, |, [, ), \, / and so on are peculiar cases in regular expressions. If you want to match one of those then you have to precede it with a backslash. So:
\|      # Vertical bar
\[      # An open square bracket
\)      # A closing parenthesis
\*      # An asterisk
\^      # A caret symbol
\/      # A slash
\\      # A backslash
and so on.
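A quick way to get a feel for these special characters is to try patterns against sample strings. The sketch below does that with a made-up helper called matches; the sample strings are chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if $string matches $pattern, else 0.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? 1 : 0;
}

print matches('t.e',       'the'),      "\n";   # 1
print matches('t.e',       'tale'),     "\n";   # 0: two characters between t and e
print matches('^ftp',      'ftp site'), "\n";   # 1
print matches('und*',      'un'),       "\n";   # 1: zero d characters is allowed
print matches('^$',        ''),         "\n";   # 1: nothing in the line
print matches('(eg|le)gs', 'legs'),     "\n";   # 1
print matches('^[a-z]+$',  'Perl'),     "\n";   # 0: the capital P is not in a-z
print matches('\|',        'a|b'),      "\n";   # 1: an escaped vertical bar
```

The patterns are given in single quotes so that backslashes reach the RE unchanged.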
Exercise
Previously your program counted non-empty lines. Alter it so that instead of counting non-empty lines it counts only lines with
- the letter x
- the string the
- the string the which may or may not have a capital t
- the word the with or without a capital. Use \b to detect word boundaries.
In each case the program should print out every line, but it should only number those specified. Try to use the $_ variable to avoid using the =~ match operator explicitly.
Substitution and Translation
Just like the sed and tr utilities in Unix, you have s/// and tr/// in Perl. The former is for substitution and the latter is for translation.
$bar =~ s/this/that/g;                      # change this to that in $bar
$path =~ s|/usr/bin|/usr/local/bin|;
s/\bgreen\b/mauve/g;                        # don't change wintergreen
s/Login: $foo/Login: $bar/;                 # runtime pattern
$count = ($paragraph =~ s/Mister\b/Mr./g);  # get change-count
$program =~ s {
    /\*     # Match the opening delimiter.
    .*?     # Match a minimal number of characters.
    \*/     # Match the closing delimiter.
} []gsx;    # Delete (most) C comments.
s/^\s*(.*?)\s*$/$1/;        # trim white space in $_, expensively
for ($variable) {           # trim white space in $variable, cheap
    s/^\s+//;
    s/\s+$//;
}
s/([^ ]*) *([^ ]*)/$2 $1/;  # swap the first two fields
# Note the use of $ instead of \ in the last example. Unlike sed,
# we use the \<digit> form in only the left hand side.
# Anywhere else it's $<digit>.
$myname = "BABU";
$myname =~ tr/A-Z/a-z/;     # yields babu
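As a small worked sketch, the following combines a trim, a whole-word substitution and two uses of tr. The helper name trim and the sample text are made up. Note that tr with an empty replacement list leaves the string alone and simply counts the matching characters.

```perl
use strict;
use warnings;

# Hypothetical helper: trim leading and trailing white space from a copy.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    return $text;
}

my $line = trim("  the green wintergreen hills  ");

# Copy $line, then substitute in the copy; \b limits the match to whole words.
(my $changed = $line) =~ s/\bgreen\b/mauve/g;

# An empty replacement list makes tr count characters without changing them.
my $count = ($line =~ tr/a-z//);

(my $lower = "BABU") =~ tr/A-Z/a-z/;

print "$changed\n";
print "$count lower case letters in the original\n";
print "$lower\n";
```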
Splitting
Perl provides a split function to split strings, based on REs. The syntax is
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split
If EXPR is omitted, $_ is used. If PATTERN is also omitted, splits on whitespaces, after skipping leading whitespaces. LIMIT sets the maximum fields returned so this can be used to split partially. Some examples are given below:
# process the password file
open(PASSWD, '/etc/passwd');
while (<PASSWD>) {
    ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split(/:/);
    # note that $shell still has a new line.
    # use chop or chomp to remove the newline
    #...
    ($login, $passwd, $remainder) = split(/:/, $_, 3);
    # here we use LIMIT to set the number of fields
}
We also have join which is the opposite of split. For fixed length strings, we have unpack and pack functions.
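Here is a minimal sketch of split with and without LIMIT, of join undoing a split, and of pack and unpack on a fixed length record. The sample record and the field widths are invented for the illustration.

```perl
use strict;
use warnings;

# A made-up record in the style of the password file.
my $record = "root:x:0:0:SuperUser:/:/bin/csh\n";
chomp $record;                                # drop the newline first

my @fields = split /:/, $record;              # every field
my ($login, $rest) = split /:/, $record, 2;   # LIMIT: at most two pieces
my $rejoined = join(":", @fields);            # join is the opposite of split

# Fixed length fields: A5 is 5 characters padded with spaces, A3 is 3.
my $fixed = pack("A5 A3", "perl", "v5");
my ($name, $version) = unpack("A5 A3", $fixed);   # padding is stripped again

print scalar(@fields), " fields, login is $login\n";
print "rest is $rest\n";
print "fixed record is '$fixed' (", length($fixed), " characters)\n";
print "name=$name version=$version\n";
```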
2.9 Subroutines
Like any good programming language Perl allows the user to define their own functions, called subroutines. They may be placed anywhere in your program but it's probably best to put them all at the beginning or all at the end. A subroutine has the form
sub mysubroutine
{
    print "Not a very interesting routine\n";
    print "This does the same thing every time\n";
}
This subroutine behaves the same way regardless of any parameters that we may want to pass to it. All of the following will work to call this subroutine. Notice that a subroutine is called with an & character in front of the name:
&mysubroutine;              # Call the subroutine
&mysubroutine($_);          # Call it with a parameter
&mysubroutine(1+2, $_);     # Call it with two parameters
Parameters
In the above case the parameters are acceptable but ignored. When the subroutine is called any parameters are passed as a list in the special @_ list array variable. This variable has absolutely nothing to do with the $_ scalar variable. The following subroutine merely prints out the list that it was called with. It is followed by a couple of examples of its use.
sub printargs
{
    print "@_\n";
}
&printargs("perly", "king");            # Example prints "perly king"
&printargs("frog", "and", "toad");      # Prints "frog and toad"
Just like any other list array the individual elements of @_ can be accessed with the square bracket notation:
sub printfirsttwo
{
    print "Your first argument was $_[0]\n";
    print "and $_[1] was your second\n";
}
Again it should be stressed that the indexed scalars $_[0] and $_[1] and so on have nothing to do with the scalar $_ which can also be used without fear of a clash.
Returning values
The result of a subroutine is always the last thing evaluated. This subroutine returns the maximum of two input parameters. An example of its use follows.
sub maximum
{
    if ($_[0] > $_[1])
    {
        $_[0];
    }
    else
    {
        $_[1];
    }
}
$biggest = &maximum(37, 24);    # Now $biggest is 37
The printfirsttwo subroutine above also returns a value, in this case 1. This is because the last thing that subroutine did was a print statement and the result of a successful print statement is always 1.
Local variables
The @_ variable is local to the current subroutine, and so of course are $_[0], $_[1], $_[2], and so on. Other variables can be made local too, and this is useful if we want to start altering the input parameters. The following subroutine tests to see if one string is inside another, spaces notwithstanding. An example follows.
sub inside
{
    local($a, $b);                  # Make local variables
    ($a, $b) = ($_[0], $_[1]);      # Assign values
    $a =~ s/ //g;                   # Strip spaces from
    $b =~ s/ //g;                   #   local variables
    ($a =~ /$b/ || $b =~ /$a/);     # Is $b inside $a
                                    #   or $a inside $b?
}
&inside("lemon", "dole money");     # true
In fact, it can even be tidied up by replacing the first two lines with
local($a, $b) = ($_[0], $_[1]);
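In newer Perl code the same subroutines are often written with my variables and an explicit return. The sketch below is one such variant, not the only way to do it. The parameter names are made up, and \Q...\E is added so that characters such as . or * in the arguments are matched literally.

```perl
use strict;
use warnings;

# The maximum subroutine again, with lexical variables and a return.
sub maximum {
    my ($first, $second) = @_;
    return $first > $second ? $first : $second;
}

# The inside test again. The copies in $one and $two are private to
# this subroutine, so the caller's strings are left untouched.
sub inside {
    my ($one, $two) = @_;
    s/ //g for ($one, $two);        # strip spaces from both copies
    return ($one =~ /\Q$two\E/ || $two =~ /\Q$one\E/) ? 1 : 0;
}

print maximum(37, 24), "\n";
print inside("lemon", "dole money"), "\n";
print inside("lemon", "lime"), "\n";
```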
# so keep a count
$tableno=$colno=0;
# similarly, we will store the comments in hashes
%colcmnts=%tblcmnts=();
LINE: while(<>){
    chomp;                      # remove the new line character
    next LINE if (/^$/);        # ignore null lines
    next LINE if (/^\s*#/);     # ignore comment lines
    ($column_name, $data_type, $precision, $nullable, $comment) = split /:/;
    $column_name =~ tr/A-Z/a-z/;    # lowercase column name
    if ($data_type eq ""){      # no datatype ? then this is a table
        $table_name = $column_name;
        $colno=0;
        print ");\n\n" unless ($tableno==0);
        print "CREATE TABLE $table_name AS (\n";
        $tableno++;
        # store table comment if there is a comment
        $tblcmnts{$table_name} = $comment unless ($comment eq "");
        next LINE;
    }
    # column comment
    $colcmnts{"$table_name.$column_name"} = $comment unless ($comment eq "");
    $colno++;
    # if precision is specified, we need to put it inside parentheses
    $data_type .="($precision)" unless ($precision eq "");
    if ($colno==1) {
        $column_name = " $column_name";
    }
    else {
        $column_name = ",$column_name";
    }
    $data_type =~ tr/a-z/A-Z/;  # uppercase datatype
    $nullable =~ tr/a-z/A-Z/;   # upper case "not null"
    # print to the format defined before
    write ;
}
print ");\n\n" ;
# print table comments
foreach my $key (sort keys(%tblcmnts)) {
    $comment = "'$tblcmnts{$key}'";
    print "COMMENT ON TABLE $key IS $comment;\n";
}
print "\n\n" ;
# print column comments
foreach my $key (sort keys(%colcmnts)) {
    $comment = "'$colcmnts{$key}'";
    print "COMMENT ON COLUMN $key IS $comment;\n";
}
print "\n\n" ;
__END__
# from here is the sample input file
# columns are
# column(table):data type:precision:nullable:comment
# for table name, data type is null
BARS_BATCH::::Batch Header Table
batch_number:number:5:not null:The batch number(sequential)
deposit_date:date::not null:Deposit date of the batch
payments:number:3:not null:Number of payments in the batch
payment_amount:number:11,2:not null:Dollar amount of the batch
pieces:number:3:not null:Number of cheques in the batch
payment_method:varchar2:2:not null:Method of payment(Cheque,Cash...)
clerk:varchar2:10:not null:Who entered the batch
origin:varchar2:2:not null:Where the batch originated (Field, HO...)
dirty:varchar2:1::Is the batch marked as dirty(1,0)
actual_payments:number:3::Number of payments actually entered
actual_amount:number:11,2::Amount actually entered
creadt:date::not null:Date batch was created
modidt:date:::Date batch was last modified
sts:varchar2:1:not null:Status of the batch
errcode:varchar2:16::Errors in the batch
BARS_GIFTS::::Batch Detail Table
batch_number:number:5:not null:The batch number(sequential)
doc_number:number:2:not null:The gift doc number(sequential within batch)
page:number:2:not null:Page number
account_id:number:8:not null:Active/new account id for the member
source:varchar2:14:not null:Active source
fund:varchar2:16::Active fund
gift_type:varchar2:2::Active gift type
credit_account:varchar2:4:not null:Active credit account (check the size!)
handle_flag:varchar2:1::Special handling flag
payment_amount:number:11,2:not null:Gift amount
total_payment_amount:number:11,2:not null:Cheque amount
creadt:date::not null:Date gift was created
modidt:date:::Date gift was last modified
errcode:varchar2:16::Errors in the gift
BARS_ACCOUNTS::::New members
account_id:number:8:not null:New account id for the member
title:varchar2:8::Title for the new member
first_name:varchar2:20::First name
middle_name:varchar2:20::Middle name
last_name:varchar2:40:not null:Last name(In TA, can be null)
suffix:varchar2:8::Suffix
phone_number:varchar2:15::Phone number
street_number:varchar2:8::Street number
street_name:varchar2:30::Street name
apt_no:varchar2:8::Apartment number
zipcode:varchar2:5::Zipcode
zipcode_ext:varchar2:5::Zipcode extension
city:varchar2:30::City
state:varchar2:2::State
freeline:varchar2:50::Free comments
extraline:varchar2:50::Free comments 2
BARS_CODES::::Default codes for LOVs
code_type:varchar2:2:not null:Code type
code:varchar2:20:not null:Code
codelb:varchar2:40:not null:Description
INTO TABLE ACQUIRED_DATA
WHEN record_type = 'T'
(
  record_type        POSITION(001:001) CHAR "DECODE (:record_type, 'T', 'BT', 'X', 'FT', 'D', 'D', 'O')",
  ocr_batch_number   POSITION(002:010) CHAR,
  ocr_gift_date      POSITION(011:021) CHAR,
  ocr_deposit_date   POSITION(011:021) CHAR,
  target_payment_num POSITION(027:029) INTEGER EXTERNAL,
  target_payment_amt POSITION(030:040) DECIMAL EXTERNAL
)
Here is our Perl code to read all the T records, split the record into corresponding variables and then print the batch number, payment number and the amount.
#! /usr/local/bin/perl -w
# read standard input
LINE: while(<>) {
    # ignore records other than batch headers
    next LINE unless /^T/;
    # remove the new line character
    chomp;
    # split the record!
    ($rec_type, $ocr_batch_number, $ocr_gift_date, $filler,
     $target_payment_num, $target_payment_amount)
        = unpack("A1 A9 A11 A5 A3 A11",$_);
    # convert the number fields from scalar string to scalar number!
    $target_payment_num += 0;
    $target_payment_amount += 0;
    # voila! print it
    print "$ocr_batch_number, $target_payment_num, $target_payment_amount \n";
}
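To try the unpack template without a real upload file, the following sketch builds one made-up T record with pack and splits it again. The subroutine name parse_batch_header and the sample values are hypothetical; only the template string comes from the program above.

```perl
#!/usr/bin/perl -w
use strict;

# Field widths taken from the unpack template in the text:
# record type (1), batch number (9), gift date (11), filler (5),
# number of payments (3), payment amount (11).
my $TEMPLATE = "A1 A9 A11 A5 A3 A11";

# Split one fixed-format line; returns an empty list for non-T records.
sub parse_batch_header {
    my ($line) = @_;
    return () unless $line =~ /^T/;
    chomp $line;
    my ($rec_type, $batch, $gift_date, $filler, $num, $amount)
        = unpack($TEMPLATE, $line);
    # adding 0 turns "012" into 12 and "00001234.50" into 1234.5
    return ($batch, $gift_date, $num + 0, $amount + 0);
}

# Build a made-up record with pack so the column widths are exact.
my $sample = pack($TEMPLATE, "T", "000012345", "01-JAN-1999", "",
                  "012", "00001234.50") . "\n";

my @fields = parse_batch_header($sample);
print join(", ", @fields), "\n";   # prints: 000012345, 01-JAN-1999, 12, 1234.5
```

The "A" format pads with spaces when packing and strips trailing spaces when unpacking, which is why the empty filler field comes back as an empty string.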
24
Introduction to Perl
    return reverse $input;
}
# define the page header
## $% is the page number
format STDOUT_TOP=
THE NATURE CONSERVANCY
@<<<<<<<<<<<<< Upload File Batch Report Page : @>>>
$today,$%
Batch Number Gift Date Payments Amount
.
# define the page
format STDOUT=
@<<<<<<<<<<<<<<<< @<<<<<<<<<< @>>>>>> @>>>>>>>>>>>>>>>>>>>>>
$ocr_batch_number,$ocr_gift_date,$target_payment_num,$target_payment_amount
.
# $= is the lines per page . Normal printers have this as 59
$= = 59;
# initialize the variables that hold report totals
$sum_num = $sum_amount = 0;
# read standard input
LINE: while(<>) {
    # ignore records other than batch headers
    next LINE unless /^T/;
    # remove the new line character
    chomp;
    # split the record!
    ($rec_type, $ocr_batch_number, $ocr_gift_date, $filler,
     $target_payment_num, $target_payment_amount)
        = unpack("A1 A9 A11 A5 A3 A11",$_);
    # convert the number fields from scalar string to scalar number!
    $target_payment_num += 0;
    $target_payment_amount += 0;
    # add to the totals
    $sum_num += $target_payment_num;
    $sum_amount += $target_payment_amount;
    # add commas to the number
    $target_payment_num = &commify($target_payment_num);
    # dollar amount should have 2 decimal places
    $target_payment_amount = "\$".&commify(sprintf("%.2f",$target_payment_amount));
    # voila! print it
    #print "$ocr_batch_number, $target_payment_num, $target_payment_amount \n";
    write;
}
##
# print a line before printing totals
#
$ocr_batch_number = "";
$ocr_gift_date = "";
$target_payment_num = "";
$target_payment_amount = "";
write;
##
# print totals
#
$ocr_batch_number = "TOTAL";
$ocr_gift_date = "";
$target_payment_num = &commify($sum_num);
$target_payment_amount = "\$".&commify(sprintf("%.2f",$sum_amount));
write;
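The report program relies on a commify subroutine of which only the final line appears here. The sketch below uses a widely circulated Perl idiom for inserting thousands separators; it is an assumption about what such a routine could look like, not necessarily the author's version. It calls "scalar reverse" so that the result is the same in list and scalar context.

```perl
#!/usr/bin/perl -w
use strict;

# Insert commas between groups of three digits in the integer part.
# Works on the reversed string so that groups are counted from the right.
sub commify {
    my $input = reverse shift;
    # three digits followed by another digit, but not inside the
    # fractional part (which, reversed, is followed by the ".")
    $input =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $input;
}

print commify(1234567), "\n";                        # prints: 1,234,567
print "\$" . commify(sprintf("%.2f", 1234.5)), "\n"; # prints: $1,234.50
```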
    # batch headers
    # remove the new line character
    chomp;
    # split the record!
    ($rec_type, $ocr_batch_number, $ocr_gift_date, $filler,
     $target_payment_num, $target_payment_amount)
        = unpack("A1 A9 A11 A5 A3 A11",$_);
    # convert the number fields from scalar string to scalar number!
    $target_payment_num += 0;
    $target_payment_amount += 0;
    # key is the batch number
    # value is batch date + payments + amount
    # all joined by :
    ($Key,$Value) = ($ocr_batch_number,"$ocr_gift_date:$target_payment_num:$t
    ##
    # check whether this batch is already loaded
    if ( defined($HASH{$Key}) ) {
        # if so, print an error
        ($b_date,$b_payments,$b_amount)=split(/:/,$HASH{$Key});
        print "Error: The batch $Key ($b_payments for \$$b_amount) is alr
    } else {
        # else, add to the batch database
        $HASH{$Key} = $Value;
    }
}
dbmclose %HASH;
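Since the listing above is incomplete, here is a small self-contained sketch of the same duplicate check. It assumes the hash is bound with dbmopen, as the closing dbmclose suggests. The subroutine name load_batch, the file name "batches", the temporary directory and the wording of the error message are all made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

our %HASH;                              # bound to the DBM file below

# Record a batch unless its number is already in the database.
# Returns 1 if the batch was added, 0 if it was a duplicate.
sub load_batch {
    my ($key, $value) = @_;
    if (defined $HASH{$key}) {
        my ($b_date, $b_payments, $b_amount) = split(/:/, $HASH{$key});
        print "Error: batch $key ($b_payments for \$$b_amount) already loaded\n";
        return 0;
    }
    $HASH{$key} = $value;
    return 1;
}

# Hypothetical location: a throwaway directory, removed when the script exits.
my $dir = tempdir(CLEANUP => 1);
dbmopen(%HASH, "$dir/batches", 0644) || die "cannot open DBM file: $!";

load_batch("000012345", "01-JAN-1999:12:1234.50");   # added
load_batch("000012345", "01-JAN-1999:12:1234.50");   # reported as a duplicate

dbmclose(%HASH);
```

In a real script the DBM file would live in a permanent location, so that batches loaded in one run are still known in the next.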
3.5 Exercise
Using the examples above, write a program to read all batch records from an input file, verify against a DBM database and print a formatted report. Duplicate batches should also be indicated in the report. Try to split the tasks (verifying against the database, reporting etc) into individual subroutines. Also add another routine to generate an Excel CSV file report, in addition to the normal report.
4. Modules
(The following section is borrowed directly from Tim Bunce's modules file, available at your nearest CPAN site.) Perl implements a class using a package, but the presence of a package doesn't imply the presence of a class. A package is just a namespace. A class is a package that provides subroutines that can be used as methods. A method is just a subroutine that expects, as its first argument, either the name of a package (for "static" methods), or a reference to something (for "virtual" methods). A module is a file that (by convention) provides a class of the same name (sans the .pm), plus an import method in that class that can be called to fetch exported symbols. This module may implement some of its methods by loading dynamic C or C++ objects, but that should be totally transparent to the user of the module. Likewise, the module might set up an AUTOLOAD function to slurp in subroutine definitions on demand, but this is also transparent. Only the .pm file is required to exist.
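The package, class and method terminology can be illustrated with a short sketch. The Counter package and its methods are hypothetical. To keep the example in a single file it declares the package inline instead of in a separate Counter.pm, and it uses bless, the built-in that associates a reference with a package.

```perl
#!/usr/bin/perl -w
use strict;

package Counter;            # a package is just a namespace

# "static" method: the first argument is the package name
sub new {
    my ($class, $start) = @_;
    my $self = { count => $start };
    return bless $self, $class;     # associate the reference with the package
}

# "virtual" method: the first argument is a reference (the object)
sub increment {
    my ($self) = @_;
    return ++$self->{count};
}

package main;

my $c = Counter->new(5);    # same as Counter::new("Counter", 5)
$c->increment;
print "count is ", $c->increment, "\n";    # prints: count is 7
```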
CPAN provides guidelines on writing modules. So, if you think you have some code that nobody else has written (rare chance!) and can be modularized, do so by all means and submit to CPAN.
In simple language, the DBI interface allows users to access multiple database types transparently. So, whether you are connecting to an Oracle, Informix, mSQL, Sybase or any other database, you don't need to know the underlying mechanics of the 3GL layer. The API defined by DBI will work on all these database types. A similar benefit is the ability to connect to two databases from different vendors within one Perl script; for example, you can read data from an Oracle database and insert it into an Informix database, all within one program. The DBI layer allows you to do this simply and powerfully. The DBI requires one or more driver modules to talk to databases. Oracle, Access and ODBC drivers might be of interest to us. Please note that DBI only standardizes the database interaction process. If you use the Oracle driver and write SQL specific to Oracle, don't expect to port your project smoothly to Informix just by changing the driver!
Python is also a cleaner language in that it does not generally allow you to be adventurous with data types. Consequently, Python code is much easier to maintain as it gets bigger. Python is designed to be extensible, so enforcing standards and extending the capabilities of the language are easier. With the advent of Java and increased OOP awareness, Python is very popular these days. Python has modules for GUI programming in Unix and Windows environments, and that is one area where it is catching up.
It is a matter of personal preference to choose between Python and Perl. Generally, people who prefer C to C++ opt for Perl, and C++ lovers go for Python. VB and ASP programmers can also migrate to Python smoothly, which might not be the case with Perl. Personally, for all the elegance of Python, I do not program in it for one reason. In Python, blocks are specified by indentation and not by any construct like {} or BEGIN-END. This can be really irritating if your FTP client or editor decides to translate the file! An easy way out is to add {} as comments (like #{ ... #} !).
A good feature of Python is that the interpreter can be used interactively, like a shell. This enables one to test each line of code thoroughly. A similar application called perlshell is also available for Perl. A 100% pure Java implementation of Python called JPython is the latest news on Python. This enables one to create Java bytecode out of Python code.
complexity site of dynamic content. My personal opinion is that, if you want to create an interactive/dynamic web site without learning the complexities of CGI, PHP is the way to go, especially if you don't want to spend thousands to purchase ColdFusion or ASP. PHP can also be installed as a separate interpreter program like Perl or Awk. Given the ease with which you can write powerful programs in PHP and its easy integration with databases (Oracle/Sybase/Informix/ODBC/PostgreSQL/MySQL...), graphics libraries, network functions, email, etc., it is a viable, and even better, alternative to Pro*C!
6. References
There is a lot of material about Perl available on the Internet. Of course, the first stop is the Perl homepage. You can also subscribe to The Perl Journal, which is slightly advanced.
6.1 Books
Programming Perl by Larry Wall is the standard book on Perl. Also known as the camel book, it is not easy reading. However, if you are considering serious Perl programming, it is a must. Published by O'Reilly and Associates, it is available for around $30-$40.
Learning Perl by Randal L Schwartz is a very good book to get you started. It is much easier to read than the camel book. Again from O'Reilly and Associates, it is available for around $20-$30.
Perl Cookbook by Tom Christiansen is a very good book to look for quick solutions. If you have an idea of Perl, this is the book to buy for very well explained real life examples. This book illustrates the Perl motto "There Is More Than One Way To Do It" to the core! From O'Reilly and Associates, it is available for around $30-$40.
6.2 WWW
http://www.perl.com
http://www.developer.com