Shlomo Yona <yona@cs.technion.ac.il> http://yeda.cs.technion.ac.il/~yona/
Today's lecture will also cover a lot of small details.
We will run through many examples.
Instead of trying to remember all the small details, try to focus on the big picture, as you can always check out the details on the freely available perldocs.
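The quantifier metacharacters "?", "*", "+", and "{}" allow us to determine the number of repeats of a portion of a regex we consider to be a match. Quantifiers are put immediately after the character, character class, or grouping that we want to specify. They have the following meanings:
a? = match 'a' 1 or 0 times
a* = match 'a' 0 or more times, i.e., any number of times
a+ = match 'a' 1 or more times, i.e., at least once
a{n,m} = match at least n times, but not more than m times
a{n,} = match at least n or more times
a{n} = match exactly n times
Here are some examples: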
/[a-z]+\s+\d*/; # match a lowercase word, at least some space, and
# any number of digits
/(\w+)\s+\1/; # match doubled words of arbitrary length
$year =~ /\d{2,4}/; # make sure year is at least 2 but not more
# than 4 digits
$year =~ /\d{4}|\d{2}/; # better match; throw out 3 digit dates
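Note that the year patterns above are not anchored, so they succeed as soon as any run of digits inside the string fits. A minimal sketch of a stricter check, assuming we want the whole string to be a 2- or 4-digit year (the helper name "looks_like_year" is made up for this illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: true only if the WHOLE string is 2 or 4 digits.
sub looks_like_year {
    my ($year) = @_;
    return $year =~ /^(?:\d{4}|\d{2})$/ ? 1 : 0;
}

for my $candidate ('99', '1999', '199', '19999') {
    printf "%-6s %s\n", $candidate,
        looks_like_year($candidate) ? 'ok' : 'rejected';
}

# Without anchors, a 5-digit string still matches:
print "unanchored match\n" if '19999' =~ /\d{2,4}/;
```

The "^" and "$" anchors are what turn "contains such a run of digits" into "consists of exactly such a run of digits".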
These quantifiers will try to match as much of the string as possible, while still allowing the regex to match. So we have
$x = 'the cat in the hat';
$x =~ /^(.*)(at)(.*)$/; # matches,
# $1 = 'the cat in the h'
# $2 = 'at'
# $3 = '' (0 matches)
The first quantifier ".*" grabs as much of the string as possible while still having the regex match. The second quantifier ".*" has no string left to it, so it matches 0 times.
By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match.
"12345" =~ /(\d+)(\d)/; # $1 = '1234'
# $2 = '5'
If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don't change, just the "greediness":
"12345" =~ /(\d+?)(\d)/; # $1 = '1'
# $2 = '2'
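A typical place where greediness matters is pulling out delimited text. The following is a small sketch (the sample string and variable names are made up for illustration) that compares ".*" with ".*?" between double quotes:

```perl
use strict;
use warnings;

my $text = 'say "one" and "two"';

# Greedy: .* runs on to the LAST closing quote it can reach.
my ($greedy) = $text =~ /"(.*)"/;

# Non-greedy: .*? stops at the FIRST closing quote.
my ($lazy) = $text =~ /"(.*?)"/;

print "greedy: $greedy\n";   # greedy: one" and "two
print "lazy:   $lazy\n";     # lazy:   one
```

Both patterns start matching at the same place; only the amount of text the quantifier takes differs.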
Search and replace is performed using "s/regex/replacement/modifiers". The "replacement" is a Perl double-quoted string that replaces in the string whatever is matched with the "regex". The operator "=~" is also used here to associate a string with "s///". If matching against "$_", the "$_ =~" can be dropped. If there is a match, "s///" returns the number of substitutions made, otherwise it returns false.
$x = "Time to feed the cat!";
$x =~ s/cat/hacker/; # $x contains "Time to feed the hacker!"
$y = "'quoted words'";
$y =~ s/^'(.*)'$/$1/; # strip single quotes,
# $y contains "quoted words"
With the "s///" operator, the matched variables "$1", "$2", etc. are immediately available for use in the replacement expression. With the global modifier, "s///g" will search and replace all occurrences of the regex in the string:
$x = "I batted 4 for 4";
$x =~ s/4/four/; # $x contains "I batted four for 4"
$x = "I batted 4 for 4";
$x =~ s/4/four/g; # $x contains "I batted four for four"
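Since "s///" returns the number of substitutions made, the return value can be used both as a count and as a success test. A short sketch (variable names are illustrative only):

```perl
use strict;
use warnings;

my $line = "I batted 4 for 4";

# With /g the return value counts every replacement made.
my $count = ($line =~ s/4/four/g);

# No match: the string is untouched and a false value comes back.
my $missed = ($line =~ s/5/five/g);

print "replaced $count time(s): $line\n";
print "no 5 found\n" unless $missed;
```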
The evaluation modifier "s///e" wraps an "eval{...}" around the replacement string and the evaluated result is substituted for the matched substring.
# reverse all the words in a string
$x = "the cat in the hat";
$x =~ s/(\w+)/reverse $1/ge; # $x contains "eht tac ni eht tah"
# convert percentage to decimal
$x = "A 39% hit rate";
$x =~ s!(\d+)%!$1/100!e; # $x contains "A 0.39 hit rate"
The last example shows that "s///" can use other delimiters, such as "s!!!" and "s{}{}", and even "s{}//". If single quotes are used "s'''", then the regex and replacement are treated as single quoted strings.
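As a sketch of how alternative delimiters combine with the "e" and "g" modifiers, the following doubles every number in a string using "s{}{}". The data and variable names are made up for illustration; copying into a new variable first is just one way to keep the original intact.

```perl
use strict;
use warnings;

my $order = "apples 3, pears 12";

# Copy $order into $doubled, then run the substitution on the copy.
# With /e the replacement part is Perl code, not a string.
(my $doubled = $order) =~ s{(\d+)}{$1 * 2}ge;

print "$order\n";     # apples 3, pears 12
print "$doubled\n";   # apples 6, pears 24
```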
"split /regex/, string" splits "string" into a list of substrings and returns that list. The regex determines the character sequence that "string" is split with respect to. For example, to split a string into words, use
$x = "Calvin and Hobbes";
@word = split /\s+/, $x; # $word[0] = 'Calvin'
# $word[1] = 'and'
# $word[2] = 'Hobbes'
To extract a comma-delimited list of numbers, use
$x = "1.618,2.718, 3.142";
@const = split /,\s*/, $x; # $const[0] = '1.618'
# $const[1] = '2.718'
# $const[2] = '3.142'
If the empty regex "//" is used, the string is split into individual characters. If the regex has groupings, then the list produced contains the matched substrings from the groupings as well:
$x = "/usr/bin";
@parts = split m!(/)!, $x; # $parts[0] = ''
# $parts[1] = '/'
# $parts[2] = 'usr'
# $parts[3] = '/'
# $parts[4] = 'bin'
Since the first character of $x matched the regex, "split" prepended an empty initial element to the list.
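The two special cases just described can be tried side by side. This is a small sketch with illustrative variable names; it also shows that, in this example, keeping the separators via a grouping lets us rebuild the original string with "join".

```perl
use strict;
use warnings;

# Empty regex: one list element per character.
my @chars = split //, "Perl";

# Grouping in the regex: the separators are kept in the list.
my @with_sep = split m!(/)!, "/usr/bin";

# No grouping: the separators are dropped.
my @without_sep = split m!/!, "/usr/bin";

print join('|', @chars), "\n";         # P|e|r|l
print scalar(@with_sep), "\n";         # 5
print scalar(@without_sep), "\n";      # 3
print join('', @with_sep), "\n";       # /usr/bin
```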
sub NAME; # A "forward" declaration.
sub NAME(PROTO); # ditto, but with prototypes
sub NAME BLOCK # A declaration and a definition.
sub NAME(PROTO) BLOCK # ditto, but with prototypes
$subref = sub BLOCK; # no proto
$subref = sub (PROTO) BLOCK; # with proto
NAME(LIST); # & is optional with parentheses.
NAME LIST; # Parentheses optional if predeclared/imported.
&NAME(LIST); # Circumvent prototypes.
&NAME; # Makes current @_ visible to called subroutine.
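The difference between "NAME()" and "&NAME;" is easy to miss, so here is a minimal sketch of it. All three subroutine names are invented for this illustration.

```perl
use strict;
use warnings;

# Reports how many arguments it received.
sub show_args { return scalar(@_) . " args: @_" }

# NAME() passes an explicit, empty argument list.
sub outer_explicit { return show_args() }

# &NAME with no parentheses shares the caller's current @_.
sub outer_implicit { return &show_args }

print outer_explicit(1, 2, 3), "\n";   # 0 args: 
print outer_implicit(1, 2, 3), "\n";   # 3 args: 1 2 3
```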
Functions whose names are in all upper case are reserved to the Perl core, as are modules whose names are in all lower case.
A function in all capitals is a loosely-held convention meaning it will be called indirectly by the run-time system itself, usually due to a triggered event.
Functions that do special, pre-defined things include "BEGIN", "CHECK", "INIT", "END", "AUTOLOAD", and "DESTROY"--plus all functions mentioned in the perltie manpage.
my $foo; # declare $foo lexically local
my (@wid, %get); # declare list of variables local
my $foo = "flurp"; # declare $foo lexical, and init it
my @oof = @bar; # declare @oof lexical, and init it
The "my" operator declares the listed variables to be lexically confined to the enclosing block, conditional ("if/unless/elsif/else"), loop ("for/foreach/while/until/continue"), subroutine, "eval", or "do/require/use"'d file.
If more than one value is listed, the list must be placed in parentheses.
All listed elements must be legal lvalues.
Only alphanumeric identifiers may be lexically scoped--magical built-ins like "$/" must currently be localized with "local" instead.
Unlike dynamic variables created by the "local" operator, lexical variables declared with "my" are totally hidden from the outside world, including any called subroutines. This is true if it's the same subroutine called from itself or elsewhere--every call gets its own copy.
This doesn't mean that a "my" variable declared in a statically enclosing lexical scope would be invisible. Only dynamic scopes are cut off.
For example, the "bumpx()" function below has access to the lexical $x variable because both the "my" and the "sub" occurred at the same scope, presumably file scope.
my $x = 10;
sub bumpx { $x++ }
my $foo, $bar = 1; # WRONG: only $foo is declared with "my"
That has the same effect as
my $foo;
$bar = 1;
The declared variable is not introduced (is not visible) until after the current statement. Thus,
my $x = $x;
can be used to initialize a new $x with the value of the old $x, and the expression
my $x = 123 and $x == 123
is false unless the old $x happened to have the value "123".
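Both points can be checked with a few lines. The sketch below (variable names are illustrative) declares two lexicals correctly with parentheses, and then uses the "not visible until after the current statement" rule to build an inner $x from the outer one:

```perl
use strict;
use warnings;

# Parentheses make BOTH variables lexical; $foo stays undefined.
my ($foo, $bar) = (undef, 1);

my $x = 10;
my @seen;
{
    # On the right-hand side, $x is still the outer variable.
    my $x = $x + 1;
    push @seen, $x;    # inner $x
}
push @seen, $x;        # outer $x, unchanged

print "@seen\n";       # 11 10
```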
use strict 'vars';
Forces you to declare variables either by 'use vars' or by 'our' or by 'my'.
use strict;
Enables all three restrictions ('vars', 'refs', and 'subs'); see 'perldoc strict' for more information.
Just because a lexical variable is lexically (also called statically) scoped to its enclosing block, this doesn't mean that within a function it works like a C static.
It normally works more like a C auto, but with implicit garbage collection.
Unlike local variables in C or C++, Perl's lexical variables don't necessarily get recycled just because their scope has exited.
If something more permanent is still aware of the lexical, it will stick around.
So long as something else references a lexical, that lexical won't be freed--which is as it should be.
You wouldn't want memory being freed until you were done using it, or kept around once you were done.
Automatic garbage collection takes care of this for you.
This means that you can pass back or save away references to lexical variables, whereas to return a pointer to a C auto is a grave error.
It also gives us a way to simulate C's function statics.
Here's a mechanism for giving a function private variables with both lexical scoping and a static lifetime.
If you do want to create something like C's static variables, just enclose the whole function in an extra block, and put the static variable outside the function but in the block.
{
my $secret_val = 0;
sub gimme_another {
return ++$secret_val;
}
}
# $secret_val now becomes unreachable by the outside
# world, but retains its value between calls to gimme_another
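The same idea also works with anonymous subroutines. The following sketch assumes a made-up "make_counter" function: every call creates a fresh lexical $count, and that lexical stays alive for as long as the returned subroutine references it.

```perl
use strict;
use warnings;

# Hypothetical counter factory, for illustration only.
sub make_counter {
    my ($count) = @_;
    $count = 0 unless defined $count;   # a fresh lexical on every call
    return sub { return ++$count };     # keeps $count alive
}

my $c1 = make_counter();
my $c2 = make_counter(10);

$c1->();                 # 1   ($c1->() is the same as &$c1())
$c1->();                 # 2
print $c1->(), "\n";     # 3
print $c2->(), "\n";     # 11, independent of $c1
```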
$a = 3.1416;
{
local $a = 2.7183;
print "$a\n"; # 2.7183
}
print "$a\n"; # 3.1416
Although this looks like it does the same thing 'my' would in terms of output, behind the scenes something completely different happens.
In the case of 'my' Perl creates a separate variable that cannot be accessed by name at run time. In other words, it never appears in a package symbol table. During the execution of the inner block, the global $a on the outside continues to exist, with its value of 3.1416, in the symbol table.
In the case of 'local', Perl saves the current contents of $a on a run-time stack. The contents of $a are then REPLACED by the new value. When the program exits the enclosing block, the values saved by 'local' are restored. There is only one variable named $a in existence throughout the entire example.
See the 'Temporary Values via local()' entry in the perlsub manpage for details and also the 'When to Still Use local()' entry in the perlsub manpage.
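The difference becomes visible as soon as another subroutine is called from inside the block. A minimal sketch, with invented names, where "report" reads a package variable:

```perl
use strict;
use warnings;

our $level = 'global';

# Reads the package variable $level.
sub report { return $level }

# local: the package variable itself is temporarily replaced,
# so the called subroutine sees the new value.
sub with_local { local $level = 'localized'; return report() }

# my: a separate lexical variable that report() cannot see.
sub with_my { my $level = 'lexical'; return report() }

print with_local(), "\n";   # localized
print with_my(),    "\n";   # global
print report(),     "\n";   # global (the old value was restored)
```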
A detailed discussion and overview of references will be given in the next lecture.
For now, we will just see one way of referencing and de-referencing things in Perl.
$scalarref = \$foo;
$arrayref = \@ARGV;
$hashref = \%ENV;
$coderef = \&handler;
$bar = $$scalarref;
push(@$arrayref, $filename);
$$arrayref[0] = "January";
$$hashref{"KEY"} = "VALUE";
&$coderef(1,2,3);
There are many more ways of referencing things in Perl and also dereferencing them, but all we need for now is the terminology and some very basic understanding, so we can get on with this lecture's material - we will understand more in this evening's lecture.
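To try the notation shown above, here is a self-contained sketch. The data and the "handler" subroutine are made up for illustration; only the referencing and dereferencing syntax comes from the lines above.

```perl
use strict;
use warnings;

my @months = ('Jan', 'Feb');
my %config = (mode => 'test');
sub handler { return "got @_" }

# Take references with the backslash operator.
my $arrayref = \@months;
my $hashref  = \%config;
my $coderef  = \&handler;

# Dereference by putting the reference where the name would go.
push @$arrayref, 'Mar';
$$arrayref[0]   = 'January';
$$hashref{mode} = 'live';
my $reply = &$coderef(1, 2, 3);

print "@months\n";          # January Feb Mar
print "$config{mode}\n";    # live
print "$reply\n";           # got 1 2 3
```

Changes made through a reference show up in the original variable, because both refer to the same data.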
If you want to pass more than one array or hash into a function--or return them from it--and have them maintain their integrity, then you're going to have to use an explicit pass-by-reference.
Here are a few simple examples. First, let's pass in several arrays to a function and have it "pop" all of them, returning a new list of all their former last elements:
@tailings = popmany ( \@a, \@b, \@c, \@d );
sub popmany {
my $aref;
my @retlist = ();
foreach $aref ( @_ ) {
push @retlist, pop @$aref;
}
return @retlist;
}
Here's how you might write a function that returns a list of keys occurring in all the hashes passed to it:
@common = inter( \%foo, \%bar, \%joe );
sub inter {
my ($k, $href, %seen); # locals
foreach $href (@_) {
while ( $k = each %$href ) {
$seen{$k}++;
}
}
return grep { $seen{$_} == @_ } keys %seen;
}
So far, we're using just the normal list return mechanism.
What happens if you want to pass or return a hash? Well, if you're using only one of them, or you don't mind them concatenating, then the normal calling convention is ok, although a little expensive.
Where people get into trouble is here:
(@a, @b) = func(@c, @d); # MISTAKE!
or
(%a, %b) = func(%c, %d); # MISTAKE!
That syntax simply won't work. It sets just "@a" or "%a" and clears "@b" or "%b". Moreover, the function never received two separate arrays or hashes: it got one long list in "@_", as always.
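One way around this is to pass references in and hand references back. The following is a sketch only; "longer_first" is a hypothetical function that returns its two array arguments with the longer one first.

```perl
use strict;
use warnings;

# Hypothetical example: takes two array references and
# returns two array references, longer array first.
sub longer_first {
    my ($first, $second) = @_;
    return @$first >= @$second ? ($first, $second)
                               : ($second, $first);
}

my @c = (1, 2);
my @d = (3, 4, 5);

# Two scalars receive the two references; nothing is flattened.
my ($long, $short) = longer_first(\@c, \@d);

my @a = @$long;    # (3, 4, 5)
my @b = @$short;   # (1, 2)

print "@a\n";      # 3 4 5
print "@b\n";      # 1 2
```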
Perl supports a very limited kind of compile-time argument checking using function prototyping.
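As a small illustration of what a prototype does, here is a sketch with a made-up "mypop" function. The "(\@)" prototype says the caller must supply a real array, which Perl then passes as a reference.

```perl
use strict;
use warnings;

# Hypothetical function: the (\@) prototype turns the array
# argument into a reference to that array.
sub mypop(\@) {
    my ($aref) = @_;
    return pop @$aref;
}

my @stack = (1, 2, 3);

my $top = mypop @stack;         # Perl passes \@stack for us

# Calling with & circumvents the prototype, so we must
# pass the reference ourselves.
my $next = &mypop(\@stack);

print "$top $next\n";           # 3 2
print "@stack\n";               # 1
```

The prototype only takes effect for calls compiled after the declaration, which is why the sub is defined before it is used here.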
You probably figured out that subroutines can return either scalar or list values.
You probably understand the significance of scalar and list context in Perl.
You should consider using 'wantarray' in subroutines which return lists.
Returns true if the context of the currently executing subroutine is looking for a list value.
Returns false if the context is looking for a scalar.
Returns the undefined value if the context is looking for no value (void context).
return unless defined wantarray; # don't bother doing more
my @a = complex_calculation();
return wantarray ? @a : "@a";
This function should have been named wantlist() instead.
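Here is the same pattern as a complete, runnable sketch. The "squares" function is invented for this illustration; it returns a list in list context, a joined string in scalar context, and skips the work entirely in void context.

```perl
use strict;
use warnings;

# Hypothetical example function.
sub squares {
    return unless defined wantarray;          # void context: do nothing
    my @result = map { $_ * $_ } @_;
    return wantarray ? @result : "@result";   # list, or one string
}

my @list = squares(1, 2, 3);    # list context
my $text = squares(1, 2, 3);    # scalar context
squares(1, 2, 3);               # void context

print scalar(@list), "\n";      # 3
print "$text\n";                # 1 4 9
```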
Please see your handouts and read this offline.
We will not have time to present this in class properly in this lecture, but you might want to be sure you get the idea - so you can figure out how Perl decides if and when your variable is valid or not.