
Basic Script

#############################################################################
#
# name: main
# purpose: entry point of the script
#############################################################################
#

# @ARGV holds all script command line arguments (pos 0 is not prog-name)
# $0 holds script filename
print "hello world\n";

Data Types
#############################################################################
#
# name: main
# purpose: show the basic datatypes
#############################################################################
#

### scalars (ints, floats, strings)


$float = 3.14; # can hold real / whole numbers
$false = 0; # 0 counts as FALSE, non-zero is TRUE
$str = "hello"; # can hold strings
$false2 = ""; # an empty string is FALSE of course
$false3 = '0'; # "0" also counts as FALSE, all other strings are TRUE
undef $num; # similar to NULL, counts as FALSE
$line = "hello\n"; # $line holds 'hello' and then a new-line (LF) - 6 chars
$firm = 'hello\n'; # $firm holds the text 'hello\n' - 7 chars
$ten_a = 'a' x 10; # $ten_a holds 'aaaaaaaaaa'
$long1 = <<"END1"; # long text with a few lines - as a "" string
This is long text,
With $float lines.
END1
$long2 = <<'END2'; # long text with a few lines - as a '' string
This can hold \n.
END2

# scalar operations
$num = 12; # give $num a defined value again
$num = $num*2 + 3 - $float; # $num is 23.86
$num = 2**4 % 5; # $num is 1 - exp then modulus
$num++; # $num is 2 - inc after eval
$ms = (1<< 3)&0xff|0x03^0x01; # $ms is 0x0a
print ++($foo = '99'); # prints '100' - inc before eval
$new = $str." world"; # $new is "hello world"

### arrays (lists of scalars)


@nums = (1,2,3);
@strings = ("one",$str); # @strings is ("one","hello")
@mixed = ("three",3.13); # any scalar can be placed in a list
@empty = (); # empty list counts as FALSE, non-empty is TRUE
($one,$two) = (1,2); # $one is 1, $two is 2
@to_ten = (1,2,3..10); # 3..10 is a list of all nums from 3 to 10

# array operations
$first = $nums[0]; # $first is 1
$strings[1] = "neo"; # @strings is ("one","neo")
$mixed[2] = 37; # @mixed is ("three",3.13,37) - grows automatically
@joined = (@mixed,8); # @joined is ("three",3.13,37,8)
@sl = @nums[0,-1,1]; # @sl is (1,3,2) - array slice (specific indices)
@sl = @nums[0..2]; # @sl is (1,2,3) - array slice (span)
$len = scalar(@nums); # an array in scalar context is the list length (3)
$last_index = $#nums; # $last_index is 2 (the last index in the list)
$#nums = -1; # @nums is () - empty

### hashes (maps of keys and values)


%ages = ("jim"=>18,"ted"=>21); # the key "jim" has a value of 18
%same = ("jim",18,"ted",21); # => is exactly like ,
%mix_hash = (1=>"bla","hi"=>22.1); # any scalar can be a key or value
%empty_hash = (); # empty hash counts as FALSE, non-empty is TRUE

# hash operations
$jims_age = $ages{"jim"}; # $jims_age is 18
$ages{"jim"}++; # key "jim" has value of 19 in %ages
$ages{"ron"} = 24; # key "ron" with value 24 added to %ages
@sl = @ages{"ted","ron"}; # @sl is (21,24) - hash slice
$stats = scalar(%ages); # key count (3) on perl 5.26+; older perls return a string such as "3/8" (used/allocated buckets)

### references (scalar that holds a pointer to another type)


$scalarref = \$num;
$arrayref = \@mixed;
$hashref = \%ages;

# reference operations
$num_copy = $$scalarref; # dereference using {type}$reference
@mixed_copy = @$arrayref;
$value = $$hashref{"jim"};
$value = $arrayref->[0]; # or dereference using $reference->
$value = $hashref->{"jim"};
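The reference operations above can be combined into nested data structures. The following is a minimal sketch, assuming a made-up %scores hash and a helper named total() (neither appears in the original listing); it uses `my` and `use strict`, which the cheat sheet itself only introduces later.

```perl
use strict;
use warnings;

# Hypothetical sample data: a hash whose values are array references.
my %scores = (
    jim => [18, 21, 24],
    ted => [30, 12],
);

# Return the sum of the numbers in the array that $list_ref points to.
sub total {
    my ($list_ref) = @_;
    my $sum = 0;
    $sum += $_ for @$list_ref;              # @$ref dereferences the whole array
    return $sum;
}

my $scores_ref = \%scores;                  # reference to the hash itself
my $jims_first = $scores_ref->{jim}->[0];   # arrow syntax, one level at a time
my $teds_last  = $scores{ted}[-1];          # arrows between subscripts are optional
push @{ $scores{ted} }, 5;                  # modifies the array stored in the hash

print "jim: first=$jims_first total=", total($scores{jim}), "\n";
print "ted: last=$teds_last total=", total($scores{ted}), "\n";
```

Because the hash stores references rather than copies, the push is visible to every later lookup of $scores{ted}.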

Conditionals
#############################################################################
#
# name: main
# purpose: show the basic conditionals
#############################################################################
#
# regular c style if statement, must use blocks
if (defined($value) && ($value == 1)) # defined() tests for undef
{
print "value equals 1\n";
}

# if-else, must use blocks


if (($job eq "millionaire") || ($state ne "dead")) # eq,ne are used for strings
{
print "a suitable husband found\n";
}
else
{
print "not suitable\n";
}

# unless is the opposite of if, must use blocks


unless (($age >= 18) and ($age < 80)) # and,or,not are also ok
{
print "age is out of range\n";
}

# short forms (no blocks needed if a single statement comes before)


print "ok" if $ok;
print "ok" unless not $ok;

# and the true perl way


open(FILE, "<", $filename) or die "can't open $filename: $!";

Loops and Iterations


#############################################################################
#
# name: main
# purpose: show the flow blocks
#############################################################################
#

# for (regular c style), must use a block


for ($i=0; $i<10; $i++)
{
print "iteration number $i\n";
}

# foreach (iterate on lists), must use blocks


foreach $num (@numbers) # $num holds one member in each iteration
{
print "$num";
}
foreach (@numbers) # if the loop variable is omitted, the member is stored in $_
{
print; # by default, $_ is printed
}
for (1..10) { print "a"; } # 'foreach' is actually a synonym for 'for'

# while / until statements, must use blocks


$i = 0;
while ($i < 10) # enter block if condition is TRUE
{
print "iteration number $i\n";
$i++;
}
until ($i == 0) # enter block if condition is FALSE
{
$i--;
print "back to $i\n";
}

# do while / until just like in c, must use blocks


$i = 0;
do
{
print "this will print\n"; # enter block once before evaluating
} while ($i != 0);
do
{
print "this too\n";
} until ($i == 0);

# short forms (no blocks needed if a single statement comes before)


print "a" for (1..10);
read_next_line() while not end_of_file();
read_next_line() until end_of_file();

# next and last statements are similar to c continue and break


for ($i=0; $i<10; $i++)
{
next if ($i == 3); # skip printing 3 (go to next iteration)
last if ($i == 5); # exit the loop before printing 5
print $i; # will print 0124
}

Functions
#############################################################################
#
# name: main
# purpose: show function and subroutine syntax
#############################################################################
#

# return values
sub seventeen1 # the return keyword indicates the return value
{
return 17;
}
sub seventeen2 # if no return exists, retval is the last expression
{
17;
}
$num = seventeen1() + seventeen2() + 53;
sub retlist # all datatypes can be returned
{
return (1,2,3);
}
($one,$two,$thr) = retlist; # () are optional (even when we have args)

# arguments
sub has_args
{
@func_arguments = @_; # all arguments are members of the list @_
$first_arg = $_[0]; # returns undef if no arg given
($arg1,$arg2,$arg3) = @_; # the common perl way to handle function arguments
}
has_args($num,@l1,22,@l2); # all arguments are flattened into one list
sub takes_two_lists # to pass several lists / hashes, use references
{
($l1ref,$l2ref) = @_;
@list1 = @$l1ref;
}
takes_two_lists(\@a,\@b);

# prototypes (limited compile-time argument checking)


sub two_scalars($$) { }; # two_scalars(12,"hello");
sub scalar_n_list($@) { }; # scalar_n_list("scalar",1,2,3);
sub array_ref(\@) { }; # array_ref(@array);
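The difference between flattened arguments and references is easiest to see side by side. This is a small sketch under assumed names (count_args() and add_lists() are illustrative helpers, not part of the listing above), written with `my` and `use strict`.

```perl
use strict;
use warnings;

# All arguments arrive as one flat list in @_.
sub count_args {
    my @args = @_;
    return scalar @args;
}

# Takes two array references and returns the list of pairwise sums.
# Assumes both arrays have the same length.
sub add_lists {
    my ($left_ref, $right_ref) = @_;
    my @sums;
    for my $i (0 .. $#$left_ref) {
        push @sums, $left_ref->[$i] + $right_ref->[$i];
    }
    return @sums;
}

my @a = (1, 2, 3);
my @b = (10, 20, 30);

my $flat_count = count_args(@a, @b);      # 6: the two arrays are merged
my $ref_count  = count_args(\@a, \@b);    # 2: one reference per array
my @sums       = add_lists(\@a, \@b);     # (11, 22, 33)

print "flattened: $flat_count, by reference: $ref_count\n";
print "sums: @sums\n";
```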

Regular Expressions
#############################################################################
#
# name: main
# purpose: show regular expression usage
#############################################################################
#

# matching
$call911 = 'Someone, call 911.'; # the string we want to match upon
$found = ($call911 =~ /call/); # $found is TRUE, matched 'call'
@res = ($call911 =~ /Some(...)/); # @res is ('one'), matched 'Someone'
$entire_res = $&; # $entire_res is 'Someone'
$brack1_res = $1; # $brack1_res is 'one', $+ for last brackets
($entire_pos,$brack1_pos) = @-; # $entire_pos is 0, $brack1_pos is 4
($entire_end,$brack1_end) = @+; # $entire_end is 7, $brack1_end is 7
# global matching (get all found)
$call911 =~ /(.o.)/g; # g is global-match; in scalar context each call finds the next match, $1 is 'Som'
pos($call911) = 0; # restart the search from the beginning of the string
@res = ($call911 =~ /(.o.)/g); # in list context all matches are returned, @res is ('Som','eon'), $& is 'eon'

# substituting
$greeting = "hello world"; # the string we want to replace in
$greeting =~ s/hello/goodbye/; # $greeting is 'goodbye world'

# splitting
@l = split(/\W+/,$call911); # @l is ('Someone','call','911')
@l = split(/(\W+)/,$call911); # @l is ('Someone',', ','call',' ','911','.')

# pattern syntax
$call911 =~ /c.ll/; # . is anything but \n, $& is 'call'
$call911 =~ /c.ll/s; # s is single-line, . will include \n, $& is 'call'
$call911 =~ /911\./; # \ escapes metachars {}[]()^$.|*+?\, $& is '911.'
$call911 =~ /o../; # matches earliest, $& is 'ome'
$call911 =~ /g?one/; # ? is 0 or 1 times, $& is 'one'
$call911 =~ /cal+/; # + is 1 or more times, $& is 'call', * for 0 or more
$call911 =~ /cal{2}/; # {2} is exactly 2 times, $& is 'call'
$call911 =~ /cal{0,3}/; # {0,3} is 0 to 3 times, $& is 'call', {2,} for >= 2
$call911 =~ /S.*o/; # matches are greedy, $& is 'Someo'
$call911 =~ /S.*?o/; # ? makes match non-greedy, $& is 'So'
$call911 =~ /^.o/; # ^ must match beginning of line, $& is 'So'
$call911 =~ /....$/; # $ must match end of line, $& is '911.'
$call911 =~ /9[012-9a-z]/;# one of the letters in [...], $& is '91'
$call911 =~ /.o[^m]/; # none of the letters in [^...], $& is 'eon'
$call911 =~ /\d+/; # \d is digit, $& is '911' (\d* would match the empty string at position 0)
$call911 =~ /S\w*/; # \w is word [a-zA-Z0-9_], $& is 'Someone'
$call911 =~ /..e\b/; # \b is word boundary, $& is 'one', \B for non-boundary
$call911 =~ / \D.../; # \D is non-digit, $& is ' call', \W for non-word
$call911 =~ /\s.*\s/; # \s is whitespace char [ \t\n\r\f], $& is ' call '
$call911 =~ /\x39\x31+/; # \x is hex byte, $& is '911'
$call911 =~ /Some(.*),/; # (...) extracts, $1 is 'one', $& is 'Someone,'
$call911 =~ /e(one|two)/; # | means or, $& is 'eone'
$call911 =~ /e(?:one|tw)/;# (?:...) does not extract, $& is 'eone', $1 is undef
$call911 =~ /(.)..\1/; # \1 is memory of first brackets, $& is 'omeo'
$call911 =~ /some/i; # i is case-insensitive, $& is 'Some'
$call911 =~ /^Some/m; # m is multi-line, ^ also matches at the start of every line, $& is 'Some'
$call911 =~ m!call!; # use ! instead of /, no need for \/, $& is 'call'
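The pieces above (captures, global matching in list context, non-greedy quantifiers, substitution) often appear together. Here is a short sketch on a made-up log line; the input string and the variable names are assumptions chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical log line used only for this illustration.
my $line = 'user=neo id=42 status=ok';

# List-context //g with two groups returns key, value, key, value, ...
my %fields = ($line =~ /(\w+)=(\w+)/g);

# A single match with a capture; the result is copied out right away.
my ($id) = ($line =~ /\bid=(\d+)/);

# Non-greedy match: stop at the first '=' rather than the last one.
my ($first_key) = ($line =~ /^(.*?)=/);

# Substitution on a copy, using a capture in the replacement.
(my $masked = $line) =~ s/user=(\w)\w*/user=$1***/;

print join(', ', map { "$_ => $fields{$_}" } sort keys %fields), "\n";
print "id is $id, first key is $first_key\n";
print "$masked\n";
```

Copying the captures into named variables straight away avoids relying on $1 or $&, which are overwritten by the next successful match.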

Special Variables
#############################################################################
#
# name: main
# purpose: show some special internal variables
#############################################################################
#

# $_ - default input
print for (1..10); # in many places, no var will cause work on $_
print $_ for (1..10); # same as above

# $. - current line in last file handle


while (defined($line = <IN>)) { last if $line =~ /error/i; } # stop at the first match or at end of file
print "first error on line $.\n" if defined($line);

# $/ - input record separator (default is "\n")


undef $/;
$entire = <IN>; # read entire file all at once
$/ = \512; # a reference to an integer selects fixed-size records
$chunk = <IN>; # read a chunk of 512 bytes

# $\ - output record separator (default is undef)


$\ = "\n"; # auto \n after print
print 'no need for LF';

# $! - errno / a string description of error


open(FILE) or die "error: $!";

# $@ - errors from last eval


eval $cmd;
print "eval successful" if not $@;
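These variables can be tried without touching the disk by opening a filehandle on a string. The sketch below assumes a made-up $text buffer and uses `local` so that changes to $/ stay confined to one block; it is an illustration, not part of the original listing.

```perl
use strict;
use warnings;

my $text = "ok line\nsecond ok\nERROR: disk full\nlast line\n";

# Find the first matching line; $. holds the number of the last line read.
open(my $in, '<', \$text) or die "can't open in-memory file: $!";
my $error_line;
while (defined(my $line = <$in>)) {
    if ($line =~ /error/i) {
        $error_line = $.;
        last;
    }
}
close($in);

# Slurp the whole input by undefining $/ for the enclosing block only.
my $entire = do {
    open(my $fh, '<', \$text) or die "can't open in-memory file: $!";
    local $/;                       # undef: no record separator
    <$fh>;
};

# Fixed-size records: a reference to an integer sets the record length.
my @chunks = do {
    open(my $fh, '<', \$text) or die "can't open in-memory file: $!";
    local $/ = \8;                  # 8-byte records
    <$fh>;
};

# eval traps a runtime error; $@ holds the message afterwards.
my $ok = eval { die "boom\n"; 1 };
my $eval_error = $@;

print "first error on line $error_line\n";
print "slurped ", length($entire), " bytes in ", scalar(@chunks), " chunks\n";
print "eval failed: $eval_error" unless $ok;
```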

Standard IO
#############################################################################
#
# name: main
# purpose: show some basic IO and file handling
#############################################################################
#

# open a file a la shell


open(IN, "<", "input.txt") or die "can't open input file: $!";
open(OUT, ">>", "output.txt") or die "can't open output file: $!";
# binmode(IN) to change IN from txt mode to binary mode

# read records from a file (according to $/)


while ($line = <IN>) # <IN> returns next line, or FALSE if none left
{
# write data to a file
print OUT $line;
}

# cleanup
close(IN);
close(OUT);

# check if file exists


print "$filename exists" if (-e $filename);

# check the file size


print "$filename file size is ".(stat $filename)[7];

# get all the txt files in current directory


@txtfiles = <*.txt>; # perl globbing
@txtfiles = `dir /b *.txt`; # or use the shell (slower), needs chomping
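The same steps can be written with lexical filehandles and the three-argument form of open, which avoids global handle names. This is a sketch under assumptions: the file name is invented and the file lives in a temporary directory created with the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory so the sketch does not touch real files.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/output.txt";      # hypothetical file name

# Write two lines, then append a third.
open(my $out, '>', $path) or die "can't open $path for writing: $!";
print {$out} "first\n", "second\n";
close($out) or die "can't close $path: $!";

open($out, '>>', $path) or die "can't open $path for appending: $!";
print {$out} "third\n";
close($out) or die "can't close $path: $!";

# Read the records back, stripping the trailing newline from each.
open(my $in, '<', $path) or die "can't open $path for reading: $!";
my @lines;
while (defined(my $line = <$in>)) {
    chomp $line;
    push @lines, $line;
}
close($in);

my $size = -s $path;               # file size in bytes, same as (stat $path)[7]
print "$path exists\n" if -e $path;
print "read ", scalar(@lines), " lines, $size bytes\n";
```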

Useful Functions and Keywords


#############################################################################
#
# name: main
# purpose: show some basic functions and keywords of perl
#############################################################################
#

# scalar / string functions


foreach (`dir /b`) { chomp; print; } # chomp removes \n tail (according to $/)
$ext = chop($file).$ext for (1..3); # chop removes last char and returns it
print 'a is '.chr(ord('a')); # ord converts chr to num, chr is opposite
print lc("Hello"), uc(" World"); # prints 'hello WORLD'
print length("hello"); # prints '5'
$three_a = sprintf("%08x",58); # just like regular c sprintf
print($type) if ($type = ref $ref); # prints 'SCALAR'/'ARRAY'/'HASH'/'REF'

# regexps and pattern matching functions


print quotemeta('[.]'); # prints '\[\.\]'
@words = split(/\W+/,$sentence); # splits a string according to a regexp

# array / list functions


@three_two_one = reverse(1,2,3); # returns a list in reverse
push(@arr,'at end'); print pop(@arr); # prints 'at end', no change to @arr
unshift(@arr,'at start'); print shift(@arr); # prints 'at start', no change to @arr
@after = grep(!/^\s*#/, @before); # weed out full comment lines
$sentence = join(' ',@words); # turns lists into strings with a delim
print sort <*.*>; # sort string lists in alphabetical order
splice(@arr,3,3); # removes the elements at indices 3,4,5 from @arr
print "length is ".scalar @arr; # scalar evaluates expressions as scalars

# hash related functions


delete @hash{"key1","key2"}; # deletes these keys from the hash
print $hash{$_} foreach (keys %hash); # prints all hash values by checking keys
print values(%hash); # prints the same values without going through the keys

# misc functions and keywords


sleep(10); # causes the script to sleep for 10 secs
exit(0) if $should_quit; # exits the script with a return value
use warnings; use strict; # loads pragmas or modules at compile time
no warnings; no strict; # switches them off again for the current scope
my $var; # declare a lexical (local) variable, required under strict
undef($null) if defined($null); # check if a variable is defined
eval '$pn = $0;'; print $pn; # interpret new perl code in runtime
system("del $filename"); # run commands in the shell (blocking)
system("start calc.exe"); # run commands in the shell (nonblocking)
@files = `dir /b`; # run & get output of shell commands (backticks)

This module provides syntax highlighting for Perl code. The design bias is roughly line-oriented
and streamed (ie, processing a file line-by-line in a single pass). Provisions may be made in the
future for tasks related to "back-tracking" (ie, re-doing a single line in the middle of a stream)
such as speeding up state copying.

Constructors
The only constructor provided is new(). When called on an existing object, new() will create a
new copy of that object. Otherwise, new() creates a new copy of the (internal) Default Object.
Note that the use of the procedural syntax modifies the Default Object and that those changes
will be reflected in any subsequent new() calls.

Formatting
Formatting is done using the format_string() method. Call format_string() with one or
more strings to format, or it will default to using $_.

Setting and Getting Formats


You can set the text used for formatting a syntax element using set_format() (or set the start
and end format individually using set_start_format() and set_end_format(), respectively).

You can also retrieve the text used for formatting for an element via get_start_format() or
get_end_format(). Bulk retrieval of the names or values of defined formats is possible via
get_format_names_list() (names), get_start_format_values_list() and
get_end_format_values_list().

See "FORMAT TYPES" later in this document for information on what format elements can be
used.

Checking and Setting the State


You can check certain aspects of the state of the formatter via the methods: in_heredoc(),
in_string(), in_pod(), was_pod(), in_data(), and line_count().

You can reset all of the above states (and a few other internal ones) using reset().

Stable and Unstable Formatting Modes


You can set or check the stability of formatting via unstable().

In unstable (TRUE) mode, formatting is not considered to be persistent with nested formats. Or,
put another way, when unstable, the formatter can only "remember" one format at a time and
must reinstate formatting for each token. An example of unstable formatting is using ANSI color
escape sequences in a terminal.

In stable (FALSE) mode (the default), formatting is considered persistent within arbitrarily
nested formats. Even in stable mode, however, formatting is never allowed to span multiple
lines; it is always fully closed at the end of the line and reinstated at the beginning of a new line,
if necessary. This is to ensure properly balanced tags when only formatting a partial code
snippet. An example of stable formatting is HTML.

Substitutions
Using define_substitution(), you can have the formatter substitute certain strings with
others, after the original string has been parsed (but before formatting is applied). This is useful
for escaping characters special to the output mode (eg, > and < in HTML) without them affecting
the way the code is parsed.

You can retrieve the current substitutions (as a hash-ref) via substitutions().
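Putting the methods described so far together, HTML output might be produced roughly as follows. This is only a sketch: highlight_html() is a hypothetical wrapper, the HTML markup is arbitrary, and the calls follow the descriptions in this document (a list of key/value pairs for define_substitution(), array references of start and end strings for set_format()). The require is wrapped in eval so the script still runs where the module is not installed.

```perl
use strict;
use warnings;

# Returns the HTML-formatted version of $code, or undef (in scalar
# context) when Syntax::Highlight::Perl is not installed.
sub highlight_html {
    my ($code) = @_;
    return unless eval { require Syntax::Highlight::Perl; 1 };

    my $formatter = Syntax::Highlight::Perl->new();

    # Escape HTML-special characters after parsing, before formatting.
    $formatter->define_substitution(
        '<' => '&lt;',
        '>' => '&gt;',
        '&' => '&amp;',
    );

    # Each value is [start string, end string]. The markup is arbitrary.
    $formatter->set_format(
        'Comment_Normal' => ['<i>', '</i>'],
        'Keyword'        => ['<b>', '</b>'],
        'String'         => ['<span class="str">', '</span>'],
    );

    # Feed the code line by line, in keeping with the line-oriented design.
    my @formatted = map { $formatter->format_string($_) } split /^/, $code;
    return join '', @formatted;
}

my $html = highlight_html(qq{my \$n = 1 < 2; # compare\nprint "done\\n";\n});
print defined $html ? $html : "Syntax::Highlight::Perl is not installed\n";
```

Using a separate object per output target keeps these format definitions away from the Default Object.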

FORMAT TYPES

The Syntax::Highlight::Perl formatter recognizes and differentiates between many Perl
syntactical elements. Each type of syntactical element has a Format Type associated with it.
There is also a 'DEFAULT' type that is applied to any element whose Format Type does not have
a value.

Several of the Format Types have underscores in their name. This underscore is special, and
indicates that the Format Type can be "generalized." This means that you can assign a value to
just the first part of the Format Type name (the part before the underscore) and that value will be
applied to all Format Types with the same first part. For example, the Format Types for all types
of variables begin with "Variable_". Thus, if you assign a value to the Format Type "Variable", it
will be applied to any type of variable. Generalized Format Types take precedence over
non-generalized Format Types. So the value assigned to "Variable" would be applied to
"Variable_Scalar", even if "Variable_Scalar" had a value explicitly assigned to it.

You can also define a "short-cut" name for each Format Type that can be generalized. The
short-cut name would be the part of the Format Type name after the underscore. For example, the
short-cut for "Variable_Scalar" would be "Scalar". Short-cut names have the least precedence
and are only assigned if neither the generalized Type name nor the full Type name has a value.

Following is a list of all the syntactical elements that Syntax::Highlight::Perl currently
recognizes, along with a short description of what each would be applied to.

Comment_Normal

A normal Perl comment. Starts with '#' and goes until the end of the line.

Comment_POD

Inline documentation. Starts with a line beginning with an equal sign ('=') followed by a
word (eg: '=pod') and continuing until a line beginning with '=cut'.

Directive

Either the "she-bang" line at the beginning of the file, or a line directive altering what the
compiler thinks the current line and file is.

Label

A loop or statement label (to be the target of a goto, next, last or redo).

Quote

Any string or character that begins or ends a String. Including, but not necessarily limited
to: quote-like regular expression operators (m//, s///, tr///, etc), a Here-Document
terminating line, the lone period terminating a format, and, of course, normal quotes (', ",
`, q{}, qq{}, qr{}, qx{}).

String

Any text within quotes, formats, Here-Documents, Regular Expressions, and the like.

Subroutine

The identifier used to define, identify, or call a subroutine (or method). Note that
Syntax::Highlight::Perl cannot recognize a subroutine if it is called without using
parentheses or an ampersand, or methods called using the indirect object syntax. It
formats those as barewords.

Variable_Scalar
A scalar variable.

Note that (theoretically) this format is not applied to non-scalar variables that are being
used as scalars (ie: array or hash lookups, nor references to anything other than scalars).
Syntax::Highlight::Perl figures out (or at least tries to) the actual type of the variable
being used (by looking at how you're subscripting it) and formats it accordingly. The first
character of the variable (ie, the $, @, %, or *) tells you the type of value being used, and
the color (hopefully) tells you the type of variable being used to get that value.

(See "KNOWN ISSUES" for information about when this doesn't work quite right.)

Variable_Array

An array variable (but not usually a slice; see above).

Variable_Hash

A hash variable.

Variable_Typeglob

A typeglob. Note that typeglobs not beginning with an asterisk (*) (eg: filehandles) are
formatted as barewords. This is because, well, they are.

Whitespace
Whitespace. Not usually formatted but it can be.

Character

A special, or backslash-escaped, character. For example: \n (newline), or \d (digits).

Only occurs within strings or regular expressions.

Keyword

A Perl keyword. Some examples include: my, local, sub, next.

Note that Perl does not make any distinction between keywords and built-in functions (at
least not in the documentation). Thus I had to make a subjective call as to what would be
considered keywords and what would be built-in functions.

The list of keywords can be found (and overloaded) in the variable
$Syntax::Highlight::Perl::keyword_list_re as a pre-compiled regular expression.

Builtin_Function

A Perl built-in function, called as a function (ie, using parentheses).

The list of built-in functions can be found (and overloaded) in the variable
$Syntax::Highlight::Perl::builtin_list_re as a pre-compiled regular expression.

Builtin_Operator

A Perl built-in function, called as a list or unary operator (ie, without using parentheses).

The list of built-in functions can be found (and overloaded) in the variable
$Syntax::Highlight::Perl::builtin_list_re as a pre-compiled regular expression.

Operator

A Perl operator.

The list of operators can be found (and overloaded) in the variable
$Syntax::Highlight::Perl::operator_list_re as a pre-compiled regular expression.

Bareword

A bareword. This can be a user-defined subroutine called without parentheses, a typeglob
used without an asterisk (*), or just a plain old bareword.

Package

The name of a package or pragmatic module.

Note that this does not apply to the package portion of a fully qualified variable name.

Number

A numeric literal.

Symbol

A symbol (ie, non-operator punctuation).

CodeTerm

The special tokens that signal the end of executable code and the beginning of the DATA
section. Specifically, '__END__' and '__DATA__'.

DATA

Anything in the DATA section (see CodeTerm).

PROCEDURAL vs. OBJECT ORIENTED

Syntax::Highlight::Perl uses OO method-calls internally (and actually defines a Default Object
that is used when the functions are invoked procedurally) so you will not gain anything
(efficiency-wise) by using the procedural interface. It is just a matter of style.

It is actually recommended that you use the OO interface, as this allows you to instantiate
multiple, concurrent-yet-separate formatters. Though I cannot think of why you would need
multiple formatters instantiated. :-)

One point to note: the new() method uses the Default Object to initialize new objects. This
means that any changes to the state of the Default Object (including Format definitions) made by
using the procedural interface will be reflected in any subsequently created objects. This can be
useful in some cases (eg, call set_format() procedurally just before creating a batch of new
objects to define default Formats for them all) but will most likely lead to trouble.

METHODS
new PACKAGE

new OBJECT
Creates a new object. If called on an existing object, creates a new copy of that object
(which is thenceforth totally separate from the original).

reset

Resets the object's internal state. This breaks out of strings and here-docs, ends PODs,
resets the line-count, and otherwise gets the object back into a "normal" state to begin
processing a new stream.

Note that this does not reset any user options (including formats and format stability).

unstable EXPR

unstable

Returns true if the formatter is in unstable mode.

If called with a non-zero number, puts the formatter into unstable formatting mode.

In unstable mode, it is assumed that formatting is not persistent one token to the next and
that each token must be explicitly formatted.

in_heredoc

Returns true if the next string to be formatted will be inside a Here-Document.

in_string

Returns true if the next string to be formatted will be inside a multi-line string.

in_pod

Returns true if the formatter would consider the next string passed to it as being within a
POD structure. This is false immediately before any POD instigators (=pod, =head1,
=item, etc), true immediately after an instigator, throughout the POD and immediately
before the POD terminator (=cut), and false immediately after the POD terminator.

was_pod

Returns true if the last line of the string just formatted was part of a POD structure. This
includes the /^=\w+/ POD instigators and terminators.

in_data
Returns true if the next string to be formatted will be inside the DATA section (ie,
follows a __DATA__ or __END__ tag).

line_count

Returns the number of lines processed by the formatter.

substitutions

Returns a reference to the substitution table used. The substitution table is a hash whose
keys are the strings to be replaced, and whose values are what to replace them with.

define_substitution HASH_REF

define_substitution LIST

Allows user to define certain characters that will be substituted before formatting is done
(but after they have been processed for meaning).

If the first parameter is a reference to a hash, the formatter will replace its own hash with
the given one, and subsequent changes to the hash outside the formatter will be reflected.

Otherwise, it will copy the arguments passed into its own hash, and any substitutions
already defined (but not in the parameter list) will be preserved. (ie, the new substitutions
will be added, without destroying what was there already.)

set_start_format HASH_REF

set_start_format LIST

Given either a list of keys/values, or a reference to a hash of keys/values, copy them into
the object's Formats list.

set_end_format HASH_REF

set_end_format LIST

Given either a list of keys/values, or a reference to a hash of keys/values, copy them into
the object's Formats list.

set_format LIST

Sets the formatting string for one or more formats.


You should pass a list of keys/values where the keys are the format names and the values
are references to arrays containing the starting and ending formatting strings (in that
order) for that format.

get_start_format LIST

Retrieve the string that is inserted to begin a given format type (starting format string).

The names are looked for in the following order:

First: Prefer the names joined by underscore, from most general to least. For example,
given ("Variable", "Scalar"): "Variable" then "Variable_Scalar".

Second: Then try each name singly, in reverse order. For example, "Scalar" then
"Variable".

See "FORMAT TYPES" for more information.
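The lookup order can be pictured with a few lines of plain Perl. This is not the module's own code, only a sketch of the order described above; candidate_names() and lookup_format() are hypothetical helpers and the format values are invented.

```perl
use strict;
use warnings;

# Build the candidate names for a list of name parts, in lookup order:
# first the underscore-joined names from most general to least,
# then each single name in reverse order (duplicates skipped).
sub candidate_names {
    my @parts = @_;
    my (@candidates, %seen);

    my $joined = '';
    for my $part (@parts) {
        $joined = $joined eq '' ? $part : "${joined}_$part";
        push @candidates, $joined unless $seen{$joined}++;
    }
    for my $part (reverse @parts) {
        push @candidates, $part unless $seen{$part}++;
    }
    return @candidates;
}

# Return the first defined value among the candidates (nothing if none).
sub lookup_format {
    my ($formats, @parts) = @_;
    for my $name (candidate_names(@parts)) {
        return $formats->{$name} if defined $formats->{$name};
    }
    return;
}

my %formats = (
    Variable_Scalar => '<span class="scalar">',
    Scalar          => '<span class="short-cut">',
);
print lookup_format(\%formats, 'Variable', 'Scalar'), "\n";  # full name beats the short-cut
$formats{Variable} = '<span class="var">';
print lookup_format(\%formats, 'Variable', 'Scalar'), "\n";  # generalized name now wins
```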

get_end_format LIST

Retrieve the string that is inserted to end a given format type (ending format string).

get_format_names_list

Returns a list of the names of all the Formats defined.

get_start_format_values_list

Returns a list of the values of all the start Formats defined (in the same order as the
names returned by get_format_names_list()).

get_end_format_values_list

Returns a list of the values of all the end Formats defined (in the same order as the names
returned by get_format_names_list()).

format_string LIST

Formats one or more strings of Perl code. If no strings are specified, defaults to $_.
Returns the list of formatted strings (or the first string formatted if called in scalar
context).

Note: The end of the string is considered to be the end of a line, regardless of whether or
not there is a trailing line-break (but trailing line-breaks will not cause an extra, empty
line).
Another Note: The function actually uses $/ to determine line-breaks, unless $/ is set to
\n (newline). If $/ is \n, then it looks for the first match of m/\r?\n|\n?\r/ in the string
and uses that to determine line-breaks. This is to make it easy to handle non-unix text.
Whatever characters it ends up using as line-breaks are preserved.
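The line-break rule in the second note can be sketched in plain Perl. Again this is an illustration rather than the module's implementation: detect_line_break() and split_lines() are hypothetical names, and falling back to "\n" when no line-break is found is an assumption.

```perl
use strict;
use warnings;

# Guess the line-break sequence used in $text. When $/ is "\n", the first
# match of /\r?\n|\n?\r/ decides; otherwise a plain-string $/ is used as is.
sub detect_line_break {
    my ($text) = @_;
    return $/ if defined $/ && !ref $/ && $/ ne "\n";
    return $1 if $text =~ /(\r?\n|\n?\r)/;
    return "\n";                    # assumption: default when none is found
}

# Split $text into lines, keeping the line-break at the end of each line.
sub split_lines {
    my ($text) = @_;
    my $break = quotemeta detect_line_break($text);
    return split /(?<=$break)/, $text;
}

my @dos_lines = split_lines("one\r\ntwo\r\nthree");
print scalar(@dos_lines), " lines\n";
```

Splitting on a look-behind leaves the line-break attached to each line, so joining the pieces reproduces the input unchanged.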

format_token TOKEN, LIST

Returns TOKEN wrapped in the start and end Formats corresponding to LIST (as would
be returned by get_start_format( LIST ) and get_end_format( LIST ),
respectively).

No syntax checking is done on TOKEN but substitutions defined with
define_substitution() are performed.

KNOWN ISSUES or LIMITATIONS

* Barewords used as keys to a hash are formatted as strings. This is Good. They should not be,
however, if they are not the only thing within the curly braces. That can be fixed.

* This version does not handle formats (see perlform(1)) very well. It treats them as
Here-Documents and ignores the rules for comment lines, as well as the fact that picture lines are not
supposed to be interpolated. Thus, your picture lines will look strange with the '@'s being
formatted as array variables (albeit, invalid ones). Ideally, it would also treat value lines as
normal Perl code and format accordingly. I think I'll get to the comment lines and
non-interpolating picture lines first. If/When I do get this fixed, I will most likely add a format type of
'Format' or something, so that they can be formatted differently, if so desired.

* This version does not handle Regular Expression significant characters. It simply treats Regular
Expressions as interpolated strings.

* User-defined subroutines, called without parentheses, are formatted as barewords. This is
because there is no way to tell them apart from barewords without parsing the code, and would
require us to go as far as perl does when doing the -c check (ie, executing BEGIN and END
blocks and the like). That's not going to happen.

* If you are indexing (subscripting) an array or hash, the formatter tries to figure out the "real"
variable class by looking at how you index the variable. However, if you do something funky (but
legal in Perl) and put line-breaks or comments between the variable class
character ($) and your identifier, the formatter will get confused and treat your variable as a
scalar until it finds the index character. Then it will format the scalar class character ($) as a
scalar and your identifier as the "correct" class.

* If you put a line-break between your variable identifier and its indexing character (see above),
which is also legal in Perl, the formatter will never find it and will treat your variable as a scalar.

* If you put a line-break between a bareword hash-subscript and the hash variable, or between a
bareword and its associated => operator, the bareword will not be formatted correctly (as a
string). (Noticing a pattern here?)

AUTHOR

Cory Johns darkness@yossman.net

Copyright (c) 2001 Cory Johns. This library is free software; you can redistribute and/or modify
it under the same conditions as Perl itself.

TO DO

1. Improve handling of regular expressions. Add support for regexp-special characters. Recognize
the /e option to the substitution operator (maybe).
2. Improve handling of formats. Don't treat format definitions as interpolating. Handle
format-comments. Possibly format value lines as normal Perl code.
3. Create in-memory deep-copy routine to replace eval(Data::Dumper) deep-copy.
4. Generalize state transitions (reset() and, in the future, copy_state()) to use
non-hard-coded keys and values for state variables. Probably will extrapolate them into an overloadable
hash, and use the aforementioned deep-copy to assign them.
5. Create a method to save or copy states between objects (copy_state()). Would be useful for
using this module in an editor.
6. Add support for greater-than-one length special characters. Specifically, octal, hexadecimal, and
control character codes. For example, \644, \x1a4 or \c[.

REVISIONS

04-04-2001 Cory Johns

* Fixed problem with special characters not formatting inside of Here-Documents.
* Fixed bug causing hash variables to format inside of Here-Documents.

03-30-2001 Cory Johns

* Fixed bug where quote-terminators were checked for inside of Here-Documents.
