
PERL

Variables and data structures


Andrew Emerson, High Performance Systems, CINECA
The Hello World program
Consider the following:
#
# Hello World
#
$message="Ciao, Mondo";
print "$message \n";
exit;
Perl Variables
$message is called a variable, something with a
name used to hold one or more pieces of
information.

All computer languages have the ability to create
variables to store and manipulate data.

Perl differs from many other languages because you do
not specify the type (integer, real, character,
etc.), only the complexity of the data.
Perl Variables
Perl has 3 ways of storing data:

1. Scalar
For single data items, like numbers or strings.
2. Arrays
For ordered lists of scalars. Scalars indexed by
numbers.
3. Associative arrays or hashes
Like arrays, but uses keys to identify the scalars.
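
As a minimal sketch of the three kinds of variable side by side (the names and values are illustrative only, and `use strict` with `my` declarations is an addition that these slides do not use):

```perl
use strict;
use warnings;

# Scalar: a single value (number or string)
my $gene = 'BRCA1';

# Array: an ordered list of scalars, indexed by number
my @exon_lengths = (100, 250, 75);

# Hash: scalars identified by keys instead of numbers
my %base_name = (A => 'adenine', G => 'guanine');

print "gene: $gene\n";
print "second exon length: $exon_lengths[1]\n";
print "A stands for $base_name{A}\n";
```

The prefix character ($, @ or %) tells you which of the three you are looking at.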

Scalar Variables
Examples
#
$no_of_chrs=24; # integer
$per_cent_identity=0; # also integer
$per_cent_identity=99.50; # redefined as real
$pi = 3.1415926535; # floating point (real)
$e_value=1e-40; # using scientific notation
$dna="GCCTACCGTTCCACCAAAAAAAA"; # string - double quotes
$dna='GCCTACCGTTCCACCAAAAAAAA'; # string - single quotes

Scalar Variables
CASE is important: $DNA ≠ $dna
(true for all variables)

Scalars must be prefixed with a $ whenever they are used (is
there a $? Yes: it is a scalar). The next character should
be a letter or underscore, not a digit (true for all variables).

Scalars can be happily redefined at any time (e.g. integer →
real → string):
# unlikely example
$dna = 0; # integer
$dna = "GGCCTCGAACGTCCAGAAA"; # now it's a string
Doing things with scalars..
#
$a =1.5;
$b =2.0; $c=3;
$sum = $a+$b*$c; # multiply $b by $c, then add $a
#
$j=0;
while ($j<100) {
    $j++; # means $j=$j+1, i.e. add 1 to $j
    print "$j\n";
}
#
$dna1="GCCTAAACGTC";
$polyA="AAAAAAAAAAAAAAAA";
$dna1 .= $polyA; # add one string to another
# (equiv. $dna1 = $dna1.$polyA)
$no_of_bases = length($dna1); # length of a scalar
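
The arithmetic and string operations above can be checked with a short sketch. The variable names $x, $y and $z are an assumption made here: they stand in for $a, $b and $c, because Perl reserves $a and $b for use by sort.

```perl
use strict;
use warnings;

my ($x, $y, $z) = (1.5, 2.0, 3);

# Multiplication binds tighter than addition: 1.5 + (2.0 * 3)
my $sum = $x + $y * $z;
print "sum = $sum\n";                    # prints "sum = 7.5"

# .= appends to the string already held in the scalar
my $dna = 'GCCTAAACGTC';
$dna .= 'AAAA';
my $no_of_bases = length($dna);
print "$dna has $no_of_bases bases\n";   # prints "GCCTAAACGTCAAAA has 15 bases"
```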
More about strings..
There is a difference between strings with single quotes and
strings with double quotes:
#
$nchr = 24;
$message="chromosomes in human cell =$nchr"; # double quotes
print $message;
$message='chromosomes in human cell =$nchr'; # single quotes
print $message;
exit;

OUTPUT
chromosomes in human cell =24 (double quotes)
chromosomes in human cell =$nchr (single quotes)

More about strings
Double quotes interpret variables, single quotes
do not:
$dna="GTTTCGGA";
print "sequence=$dna";
print 'sequence=$dna';
OUTPUT
sequence=GTTTCGGA
sequence=$dna
Normally you would want double quotes
when using print.
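
A small sketch of the same rule, which also shows that escapes such as \n are only interpreted inside double quotes. The `${dna}` form and the `_reversed` suffix are illustrative additions, not something taken from the slides.

```perl
use strict;
use warnings;

my $dna = 'GTTTCGGA';

my $double = "sequence=$dna\n";   # $dna and \n are both interpreted
my $single = 'sequence=$dna\n';   # kept literally, character for character

print $double;
print $single, "\n";

# Braces mark where the variable name ends inside a double-quoted string;
# without them Perl would look for a variable called $dna_reversed.
my $tagged = "${dna}_reversed";
print "$tagged\n";
```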
Arrays
Collections of numbers, strings etc. can be stored in arrays.
In Perl, arrays are defined as ordered lists of scalars and
are represented with the @ character.
Initializing arrays with lists:
@days_in_month=(31,28,31,30,31,30,31,31,30,31,30,31);
@days_of_the_week=('mon','tue','wed','thu','fri','sat','sun');
@bases=('adenine','guanine','thymine','cytosine','uracil');
@GenBank_fields=('LOCUS',
                 'DEFINITION',
                 'ACCESSION',
                 ...
                );
Arrays - elements
To access the individual array elements you use [ and ]:
@poly_peptide=('gly','ser','gly','pro','pro','lys','ser','phe');
# now mutate the peptide
$poly_peptide[0]='val'; # 0 is the array index
$i=0;
# print out what we have
while ($i<8) {
    print "$poly_peptide[$i] ";
    $i++;
}
The numbers used to identify the elements are called indices.
Arrays - elements
When accessing array elements you use $. Why?
Because array elements are scalars and scalars must
have $:
@poly_peptide=(..);
$poly_peptide[0] = 'val';
This means that you can have a separate variable
called $poly_peptide, because $poly_peptide[0] is part
of @poly_peptide, NOT $poly_peptide.
"This may seem a bit weird, but that's
okay, because it is weird." (Unix Perl Manual)
Array elements
Array indices start from 0, not 1:
$poly_peptide[0]='val';
$poly_peptide[1]='ser';
$poly_peptide[7]='phe';
The last index of the array can be found from
$#name_of_array, e.g. $#poly_peptide. You can
also use negative indices, which count back from
the end of the array. Therefore $poly_peptide[-1],
$poly_peptide[$#poly_peptide] and $poly_peptide[7]
all refer to the same element.
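
The indexing rules above can be tried out with a sketch like the following, which uses the eight-residue peptide from the earlier slide (with `my` declarations added):

```perl
use strict;
use warnings;

my @poly_peptide = ('val', 'ser', 'gly', 'pro', 'pro', 'lys', 'ser', 'phe');

my $last_index   = $#poly_peptide;               # 7 for an 8-element array
my $by_negative  = $poly_peptide[-1];            # counts back from the end
my $by_last      = $poly_peptide[$last_index];   # same element
my $next_to_last = $poly_peptide[-2];

print "last index: $last_index\n";       # prints "last index: 7"
print "first: $poly_peptide[0]\n";       # prints "first: val"
print "last: $by_negative $by_last\n";   # prints "last: phe phe"
print "next to last: $next_to_last\n";   # prints "next to last: ser"
```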
Array properties
Length of an array:
$len = $#poly_peptide+1;
The size of the array does not need to be defined; it can grow
dynamically:
# begin program
$i=0;
while ($i<100) {
    $polyA[$i]='A';
    $i++;
}
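
To see the array growing, a sketch along these lines prints the length at each stage. It uses a shorter loop than the slide, and the assignment to index 9 is an extra illustration of what happens when you skip past the end.

```perl
use strict;
use warnings;

my @polyA;                                           # starts empty
print "length at start: ", $#polyA + 1, "\n";        # 0

for my $i (0 .. 4) {
    $polyA[$i] = 'A';                                # each assignment extends the array
}
print "length after loop: ", $#polyA + 1, "\n";      # 5

# Assigning past the end also grows the array; the skipped
# elements (indices 5 to 8) exist but are undefined.
$polyA[9] = 'A';
print "length after jump: ", $#polyA + 1, "\n";      # 10
```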
Useful Array functions
PUSH and POP
Functions commonly used for manipulating a stack
(F.I.L.O. = First In, Last Out): push adds an item to the
end of the array and pop takes the last item off again.
Stacks are very common in computer programs.
Array functions PUSH and POP
# part of a program that reads a database into an array
# open database etc first..
@dblines=(); # resets @dblines
while ($line=<DB>) {
    push @dblines,$line; # push $line onto array
}
...
while (@dblines) {
    $record = pop @dblines; # pop line off and use it
    # .... do something
}
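
One consequence worth seeing in action is that pop hands items back in the reverse of the order in which they were pushed. A minimal sketch, with made-up items in place of database lines:

```perl
use strict;
use warnings;

my @stack = ();
push @stack, 'first';
push @stack, 'second';
push @stack, 'third';

my @popped;
while (@stack) {
    my $item = pop @stack;    # removes and returns the LAST element
    push @popped, $item;
}
print "@popped\n";            # prints "third second first"
```

If the lines need to be processed in the order they were read, shift (which removes the first element) can be used in place of pop.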
Scalar Contexts
If you provide an expression (e.g. an array) when Perl
expects a scalar, Perl attempts to evaluate the expression
in a scalar context. For an array this is the length of the
array:
$length=@poly_peptide;
This is equivalent to
$length=$#poly_peptide+1;
Hence:
while (@dblines) { # array in scalar context = length of array
..
}
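
A short sketch contrasting scalar and list context for the same array. The list-context assignment and the scalar() call are additions for illustration.

```perl
use strict;
use warnings;

my @poly_peptide = ('gly', 'ser', 'gly', 'pro', 'pro', 'lys', 'ser', 'phe');

my $length  = @poly_peptide;     # scalar context: number of elements (8)
my ($first) = @poly_peptide;     # list context: first element ('gly')

print "length: $length\n";
print "first:  $first\n";

# In a condition an array is also evaluated in scalar context,
# so an empty array is false and a non-empty one is true.
my @empty = ();
print "poly_peptide has elements\n" if @poly_peptide;
print "empty has no elements\n" unless @empty;

# scalar() forces scalar context where a list would otherwise be expected
print "count: ", scalar(@poly_peptide), "\n";
```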
Special variables
Perl defines some variables for special purposes,
including:
$_
Set in many situations, such as reading from a file or in a foreach
loop.
$0
Name of the file currently being executed.
$]
Version of Perl being used.
@_
Contains the parameters passed to a subroutine.
@ARGV
Contains the command line arguments passed to the program.
Some are read-only and cannot be changed: see man
perlvar for more details.
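
A few of these can be seen at work in the following sketch. The subroutine name describe and its arguments are hypothetical, chosen only to show @_ being unpacked.

```perl
use strict;
use warnings;

# $_ is the default loop variable in foreach
my @bases = ('A', 'C', 'G', 'T');
my $joined = '';
foreach (@bases) {
    $joined .= $_;
}
print "bases: $joined\n";                    # prints "bases: ACGT"

# @_ holds the arguments passed to a subroutine
sub describe {
    my ($name, $len) = @_;
    return "$name has $len bases";
}
print describe('seq1', 8), "\n";             # prints "seq1 has 8 bases"

# $0 is the script name, @ARGV the command-line arguments, $] the Perl version
print "script: $0\n";
print "number of arguments: ", scalar(@ARGV), "\n";
print "perl version: $]\n";
```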
Associative Arrays (Hashes)
Similar to normal arrays but the elements are identified by
keys and not indices. The keys can be more complicated,
such as strings of characters.

Hashes are indicated by % and can be initialized with lists
like arrays:
%hash = (key1,val1,key2,val2,key3,val3..);
Associative Arrays (Hashes)
Examples
%months=('jan',31,'feb',28,'mar',31,'apr',30);
Alternatively,
%months=(jan=> 31,
feb=> 28,
mar=> 31,
apr=> 30);
=> is a synonym for , that also quotes a bare word on its
left. Each pair reads key => value.
Associative Arrays (Hashes)
Further examples
#
%classification = (dog => 'mammal', robin => 'bird',
                   snake => 'reptile');
%genetic_code = (
    TCA => 'ser',
    TTC => 'phe',
    TTA => 'leu',
    TGA => 'STOP',
    CCC => 'pro',
...
);
Associative Arrays (Hashes) - elements
The elements of a hash are accessed using curly
brackets, { and }:
$genetic_code{TCA} = 'ser';
$genetic_code{CCC} = 'pro';
$genetic_code{TGA} = 'STOP';
Note the $ sign: the elements are scalars
and so must be preceded by $, even
though they belong to a % (just as for
arrays).
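
As a sketch of reading and changing hash elements, here is the months example from earlier (the leap-year change and the added key are illustrative):

```perl
use strict;
use warnings;

my %months = (jan => 31, feb => 28, mar => 31, apr => 30);

print "jan has $months{jan} days\n";     # prints "jan has 31 days"

$months{feb} = 29;                       # assigning to an existing key overwrites it
$months{may} = 31;                       # assigning to a new key adds it

my $key = 'feb';                         # the key can come from a variable
print "$key has $months{$key} days\n";   # prints "feb has 29 days"

my $n_months = keys %months;             # keys in scalar context: number of keys
print "months stored: $n_months\n";      # prints "months stored: 5"
```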
Associative Arrays (Hashes) useful
functions
exists
indicates whether a key exists in the hash

if (exists $genetic_code{$codon}) {
    ...
} else {
    print "Bad codon $codon\n";
    exit;
}
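
It can help to see why exists is used rather than simply testing the value: a key can be present while holding a value that counts as false. A minimal sketch with a hypothetical %base_count hash:

```perl
use strict;
use warnings;

my %base_count = (A => 0, C => 3);

my $a_exists = exists $base_count{A} ? 1 : 0;   # 1: the key is there
my $a_true   = $base_count{A}        ? 1 : 0;   # 0: but its value, 0, is false
my $g_exists = exists $base_count{G} ? 1 : 0;   # 0: no such key

print "A exists: $a_exists, A is true: $a_true, G exists: $g_exists\n";

# Testing with exists does not add the key to the hash
my $n_keys = keys %base_count;
print "keys stored: $n_keys\n";                 # prints "keys stored: 2"
```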
Associative Arrays (Hashes) useful
functions
keys and values
makes arrays from the keys and values of a
hash.
@codons = keys %genetic_code;
@amino_acids = values %genetic_code;
Often you will see code like the following:
foreach $codon (keys %genetic_code) {
    if ($genetic_code{$codon} eq 'STOP') {
        last; # i.e. stop translating
    } else {
        $protein .= $genetic_code{$codon};
    }
}
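
Note that keys returns the codons in no particular order. The sketch below sorts them for a stable listing, and then walks a DNA string three bases at a time so that translation follows the order of the sequence. The input sequence and the use of substr are assumptions made for illustration, and the code table is only the five entries shown in these slides.

```perl
use strict;
use warnings;

my %genetic_code = (
    TCA => 'ser', TTC => 'phe', TTA => 'leu', CCC => 'pro', TGA => 'STOP',
);

# Sort the keys so the listing comes out in the same order every run
foreach my $codon (sort keys %genetic_code) {
    print "$codon => $genetic_code{$codon}\n";
}

# Hypothetical input sequence, read three bases at a time
my $dna     = 'TTCCCCTCATGATTA';
my $protein = '';
for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
    my $codon = substr($dna, $i, 3);
    last unless exists $genetic_code{$codon};   # unknown codon: give up
    last if $genetic_code{$codon} eq 'STOP';    # stop translating
    $protein .= $genetic_code{$codon};
}
print "protein: $protein\n";                    # prints "protein: pheproser"
```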
