Perl literacy course

There's more than one way to do it

lecture #4

Shlomo Yona <yona@cs.technion.ac.il> http://yeda.cs.technion.ac.il/~yona/


Today

  1. References in Perl
  2. Complex data structures


References - outline

  1. Making References
  2. Using References


Syntax

There are just two ways to make a reference, and just two ways to use it once you have it.


Making References -- Rule 1

If you put a "\" in front of a variable, you get a reference to that variable.


$aref = \@array;         # $aref now holds a reference to @array
$href = \%hash;          # $href now holds a reference to %hash


Making References -- Rule 1 (cont.)

Once the reference is stored in a variable like $aref or $href, you can copy it or store it just the same as any other scalar value:


$xy = $aref;             # $xy now holds a reference to @array
$p[3] = $href;           # $p[3] now holds a reference to %hash
$z = $p[3];              # $z now holds a reference to %hash
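A minimal sketch of the copies above, run under "use strict". The sample contents of @array and %hash are assumptions made for illustration. It uses ref() to show what kind of thing each copy refers to, and a numeric comparison to show that copying the scalar copies the reference, not the array or hash behind it.

```perl
use strict;
use warnings;

my @array = (10, 20, 30);           # sample data, assumed for illustration
my %hash  = (APR => 4, AUG => 8);

my $aref = \@array;                 # Make Rule 1: backslash in front of a variable
my $href = \%hash;

my @p;
my $xy = $aref;                     # copy the reference
$p[3]  = $href;                     # store a reference in an array element
my $z  = $p[3];                     # and copy it back out

my $xy_kind = ref($xy);             # 'ARRAY'
my $z_kind  = ref($z);              # 'HASH'

# Two references are numerically equal when they refer to the same thing.
my $same_array = ($xy == $aref)  ? "same array" : "different arrays";
my $same_hash  = ($z  == \%hash) ? "same hash"  : "different hashes";

print "$xy_kind, $same_array\n";    # prints ARRAY, same array
print "$z_kind, $same_hash\n";      # prints HASH, same hash
```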


Making References -- Rule 1 (cont.)

Sometimes you want to make an array or a hash that doesn't have a name.

This is analogous to the way you like to be able to use the string "\n" or the number 80 without having to store it in a named variable first.


Making References -- Make Rule 2

"[ ITEMS ]" makes a new, anonymous array, and returns a reference to that array.

"{ ITEMS }" makes a new, anonymous hash, and returns a reference to that hash.


$aref = [ 1, "foo", undef, 13 ];
# $aref now holds a reference to an array

$href = { APR => 4, AUG => 8 };
# $href now holds a reference to a hash


Making References -- Make Rule 2 (cont.)

The references you get from rule 2 are the same kind of references that you get from rule 1:


# This:
$aref = [ 1, 2, 3 ];

# Does the same as this:
@array = (1, 2, 3);
$aref = \@array;

The first line is an abbreviation for the following two lines, except that it doesn't create the superfluous array variable "@array".
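As a small sketch of Make Rule 2, the following shows that each use of "[ ITEMS ]" builds a separate anonymous array, even when the items are identical. The variable names $first and $second are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $first  = [ 1, 2, 3 ];           # a new anonymous array
my $second = [ 1, 2, 3 ];           # another new array with equal contents
my $href   = { APR => 4, AUG => 8 };

my $first_kind = ref($first);       # 'ARRAY'
my $href_kind  = ref($href);        # 'HASH'

# Equal contents do not make equal references: these are two distinct arrays.
my $distinct = ($first == $second) ? "one array" : "two arrays";

print "$first_kind $href_kind $distinct\n";   # prints ARRAY HASH two arrays
```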


Using References

What can you do with a reference once you have it?

It's a scalar value, and we've seen that you can store it as a scalar and get it back again just like any scalar.

There are just two more ways to use it.


Using References -- Use Rule 1

If "$aref" contains a reference to an array, then you can put "{$aref}" anywhere you would normally put the name of an array.

For example, "@{$aref}" instead of "@array".


Using References -- Use Rule 1 (cont.)

Arrays:


@a              @{$aref}                An array
reverse @a      reverse @{$aref}        Reverse the array
$a[3]           ${$aref}[3]             An element of the array
$a[3] = 17;     ${$aref}[3] = 17        Assigning an element

On each line are two expressions that do the same thing.

The left-hand versions operate on the array "@a", and the right-hand versions operate on the array that is referred to by "$aref", but once they find the array they're operating on, they do the same things to the arrays.


Using References -- Use Rule 1 (cont.)

Using a hash reference is exactly the same:


%h              %{$href}              A hash
keys %h         keys %{$href}         Get the keys from the hash
$h{'red'}       ${$href}{'red'}       An element of the hash
$h{'red'} = 17  ${$href}{'red'} = 17  Assigning an element
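The two tables above can be tried out with a short script. This is only a sketch: the contents of @a and %h are assumed sample data. Every change made through the reference is visible through the variable's own name, because both name the same array or hash.

```perl
use strict;
use warnings;

my @a = (5, 6, 7, 8);               # sample data, assumed for illustration
my %h = (red => 1, green => 2);

my $aref = \@a;
my $href = \%h;

${$aref}[3]     = 17;               # same effect as $a[3] = 17
${$href}{'red'} = 17;               # same effect as $h{'red'} = 17

my $reversed = join ' ', reverse @{$aref};     # same as reverse @a
my $keys     = join ' ', sort keys %{$href};   # same as keys %h

print "$a[3]\n";                    # prints 17
print "$h{'red'}\n";                # prints 17
print "$reversed\n";                # prints 17 7 6 5
print "$keys\n";                    # prints green red
```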


Using References -- Use Rule 2

An alternative syntax that reads better:

"${$aref}[3]" is too hard to read, so you can write "$aref->[3]" instead.

"${$href}{red}" is too hard to read, so you can write "$href->{red}" instead.

Most often, when you have an array or a hash, you want to get or set a single element from it.

"${$aref}[3]" and "${$href}{'red'}" have too much punctuation, and Perl lets you abbreviate.


Using References -- Use Rule 2 (cont.)

Let's make sure we understand and remember the basics!

If "$aref" holds a reference to an array, then "$aref->[3]" is the fourth element of the array.

Don't confuse this with "$aref[3]", which is the fourth element of a totally different array, one deceptively named "@aref".

"$aref" and "@aref" are unrelated the same way that "$item" and "@item" are.


Using References -- Use Rule 2 (cont.)

Let's make sure we understand and remember the basics!

Similarly, "$href->{'red'}" is part of the hash referred to by the scalar variable "$href", perhaps even one with no name.

"$href{'red'}" is part of the deceptively named "%href" hash.

It's easy to forget the "->", and if you do, you'll get bizarre results when your program gets array and hash elements out of totally unexpected hashes and arrays that weren't the ones you wanted to use.
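To make the difference concrete, here is a sketch that deliberately declares both a scalar $aref and an unrelated array @aref. The element values are assumptions chosen so the two are easy to tell apart.

```perl
use strict;
use warnings;

my $aref = [ 'a', 'b', 'c', 'd' ];  # a reference to an anonymous array
my @aref = ( 'w', 'x', 'y', 'z' );  # a totally different array, deceptively named

my $with_arrow    = $aref->[3];     # fourth element of the referred-to array
my $long_form     = ${$aref}[3];    # Use Rule 1 spelling of the same thing
my $without_arrow = $aref[3];       # fourth element of @aref

print "$with_arrow $long_form $without_arrow\n";   # prints d d z
```

If @aref had not been declared, "use strict" would reject "$aref[3]" at compile time instead of silently reading from the wrong array.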


An Example

Let's see a quick example of how all this is useful.

First, remember that "[1, 2, 3]" makes an anonymous array containing "(1, 2, 3)", and gives you a reference to that array.

Now think about


@a = (	[1, 2, 3],
	[4, 5, 6],
	[7, 8, 9]
);

@a is an array with three elements, and each one is a reference to another array.


An Example (cont.)


@a = (	[1, 2, 3],
	[4, 5, 6],
	[7, 8, 9]
);

"$a[1]" is one of these references.

It refers to an array, the array containing "(4, 5, 6)", and because it is a reference to an array, USE RULE 2 says that we can write "$a[1]->[2]" to get the third element from that array.

"$a[1]->[2]" is the 6. Similarly, "$a[0]->[1]" is the 2.

What we have here is like a two-dimensional array; you can write "$a[ROW]->[COLUMN]" to get or set the element in any row and any column of the array.


Yet another abbreviation

The notation still looks a little cumbersome, so there's one more abbreviation:


Arrow Rule

In between two subscripts, the arrow is optional.

Instead of "$a[1]->[2]", we can write "$a[1][2]"; it means the same thing. Instead of "$a[0]->[1]", we can write "$a[0][1]"; it means the same thing.

Now it really looks like two-dimensional arrays!

You can see why the arrows are important. Without them, we would have had to write "${$a[1]}[2]" instead of "$a[1][2]".

For three-dimensional arrays, they let us write "$x[2][3][5]" instead of the unreadable "${${$x[2]}[3]}[5]".
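A short sketch of the Arrow Rule on the 3x3 example from earlier. It checks that the three spellings reach the same element, then sets one cell and walks the rows and columns by subscript. The value 70 is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

my @a = ( [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9] );

# Three spellings of the same element.
my $explicit = ${ $a[1] }[2];       # Use Rule 1
my $arrow    = $a[1]->[2];          # Use Rule 2
my $short    = $a[1][2];            # Arrow Rule: arrow omitted between subscripts

print "$explicit $arrow $short\n";  # prints 6 6 6

$a[2][0] = 70;                      # set row 2, column 0

my @lines;
for my $row (0 .. $#a) {
    # $#{ $a[$row] } is the last index of the row's array
    my @cells = map { $a[$row][$_] } 0 .. $#{ $a[$row] };
    push @lines, join ' ', @cells;
}
print "$_\n" for @lines;            # prints 1 2 3, then 4 5 6, then 70 8 9
```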


An example problem

You have a file of city and country names, like this:


Chicago, USA
Frankfurt, Germany
Berlin, Germany
Washington, USA
Helsinki, Finland
New York, USA

and you want to produce an output like this, with each country mentioned once, and then an alphabetical list of the cities in that country:


Finland: Helsinki.
Germany: Berlin, Frankfurt.
USA: Chicago, New York, Washington.


An example problem -- Code


	1   while (<>) {
	2     chomp;
	3     my ($city, $country) = split /, /;
	4     push @{$table{$country}}, $city;
	5   }
	6
	7   foreach $country (sort keys %table) {
	8     print "$country: ";
	9     my @cities = @{$table{$country}};
	10     print join ', ', sort @cities;
	11     print ".\n";
	12   }

The program has two pieces: Lines 1--5 read the input and build a data structure, and lines 7--12 analyze the data and print out the report.
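For experimenting without an input file, here is a variant of the same two pieces that runs under "use strict". It is a sketch, not the program above: it reads from an in-memory list @lines (an assumption made so the example is self-contained) instead of "<>", and it collects the report in $report before printing it.

```perl
use strict;
use warnings;

# Sample input, taken from the file shown earlier.
my @lines = (
    "Chicago, USA",
    "Frankfurt, Germany",
    "Berlin, Germany",
    "Washington, USA",
    "Helsinki, Finland",
    "New York, USA",
);

# Piece 1: build a hash whose values are references to arrays of cities.
my %table;
for my $line (@lines) {
    my ($city, $country) = split /, /, $line;
    push @{ $table{$country} }, $city;
}

# Piece 2: one line per country, with its cities in alphabetical order.
my $report = '';
for my $country (sort keys %table) {
    my @cities = @{ $table{$country} };
    $report .= "$country: " . join(', ', sort @cities) . ".\n";
}

print $report;
# prints:
# Finland: Helsinki.
# Germany: Berlin, Frankfurt.
# USA: Chicago, New York, Washington.
```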


Explaining the code


	4     push @{$table{$country}}, $city;

We're going to have a hash, "%table", whose keys are country names, and whose values are (references to) arrays of city names.

After acquiring a city and country name, the program looks up "$table{$country}", which holds (a reference to) the list of cities seen in that country so far.

Line 4 is totally analogous to


push @array, $city;

except that the name "array" has been replaced by the reference "{$table{$country}}".

The "push" adds a city name to the end of the referred-to array.


Explaining the code (cont.)


	9     my @cities = @{$table{$country}};

Again, "$table{$country}" is (a reference to) the list of cities in the country, so we can recover the original list, and copy it into the array "@cities", by using "@{$table{$country}}".

Line 9 is totally analogous to


@cities = @array;

except that the name "array" has been replaced by the reference "{$table{$country}}".

The "@" tells Perl to get the entire array.


Explaining the code (cont.)

The rest of the program is just familiar uses of "chomp", "split", "sort", "print", and doesn't involve references at all.

For information about these functions look them up using 'perldoc -f FUNCNAME', e.g. 'perldoc -f chomp'.


Explaining the code (cont.)

There's one fine point that we have skipped.

Suppose the program has just read the first line in its input that happens to mention Greece.

Control is at line 4, "$country" is "'Greece'", and "$city" is "'Athens'".

Since this is the first city in Greece, "$table{$country}" is undefined---in fact there isn't a "'Greece'" key in "%table" at all.


Explaining the code (cont.)


4      push @{$table{$country}}, $city;

This is Perl, so it does the exact right thing.

It sees that you want to push "Athens" onto an array that doesn't exist, so it helpfully makes a new, empty, anonymous array for you, installs it in the table, and then pushes "Athens" onto it.

This is called `autovivification'.
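Autovivification can be observed directly. The sketch below starts with an empty %table, checks that the 'Greece' key is absent, performs the push from line 4, and then looks at what Perl created. The variables $before and $kind are assumptions introduced for illustration.

```perl
use strict;
use warnings;

my %table;
my ($country, $city) = ('Greece', 'Athens');

my $before = exists $table{$country} ? "present" : "absent";

push @{ $table{$country} }, $city;  # the array does not exist yet; Perl creates it

my $kind = ref $table{$country};    # what got installed in the table

print "$before\n";                  # prints absent
print "$kind\n";                    # prints ARRAY
print "@{ $table{$country} }\n";    # prints Athens
```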


So, now we know Perl references?

We have covered material worth 90% of the benefit with 10% of the details, and that means I left out 90% of the details.

Now that you have an overview of the important parts, it should be easier to read the 'perlref' manual page (perldoc perlref), which discusses 100% of the details.


Some of the highlights of the perlref manpage:


Further reading

You might prefer to go on to the 'perllol' (perldoc perllol) manpage instead of the 'perlref' (perldoc perlref) manpage; it discusses lists of lists and multidimensional arrays in detail.

After that, you should move on to the 'perldsc' (perldoc perldsc) manpage; it's a Data Structure Cookbook that shows recipes for using and printing out arrays of hashes, hashes of arrays, and other kinds of data.


Complex Data Structures

OK, so we learned a bit about references, and suddenly it seems simple to create complex data structures (e.g. lists of lists), as long as you remember the making rules and using rules of references.

Here are examples of using four basic complex structures:


$array[7][12]                       # array of arrays
$array[7]{string}                   # array of hashes
$hash{string}[7]                    # hash of arrays
$hash{string}{'another string'}     # hash of hashes
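As a sketch of how each of the four shapes is built and then read back, here is one small instance of each. The variable names (@AoA, @AoH, %HoA, %HoH) and all the data are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my @AoA = ( [1, 2], [3, 4] );                            # array of arrays
my @AoH = ( { name => 'fred' }, { name => 'barney' } );  # array of hashes
my %HoA = ( flintstones => [ 'fred', 'barney' ] );       # hash of arrays
my %HoH = ( flintstones => { husband => 'fred',
                             pal     => 'barney' } );    # hash of hashes

my $from_aoa = $AoA[1][0];
my $from_aoh = $AoH[1]{name};
my $from_hoa = $HoA{flintstones}[1];
my $from_hoh = $HoH{flintstones}{husband};

print "$from_aoa\n";   # prints 3
print "$from_aoh\n";   # prints barney
print "$from_hoa\n";   # prints barney
print "$from_hoh\n";   # prints fred
```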


So what are the complications?

Consider some complex data structure (even some multidimensional array):

And surely more questions can be asked... I'll try to clear some of the fog.


What about printing?

Since the top level of a complex data structure contains only references, if you try to print out your array with a simple print() call, you'll get something that doesn't look very nice.

Let's print just one element:


@AoA = ( [2, 3], [4, 5, 7], [0] );
print $AoA[1][2];

7

Seems OK, now let's try to have the whole list of lists printed out:


print @AoA;

ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)


What about printing? (cont.)

That's because Perl doesn't (ever) implicitly dereference your variables. If you want to get at the thing a reference is referring to, then you have to do this yourself using either prefix typing indicators, like "${$blah}", "@{$blah}", "@{$blah[$i]}", or else postfix pointer arrows, like "$a->[3]", "$h->{fred}", or even "$ob->method()->[3]".
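One way to do the dereferencing yourself, sketched on the same @AoA: loop over the top-level array, and turn each row reference back into a list with "@{...}" before printing it. The @rows array is an assumption introduced so the result can be inspected.

```perl
use strict;
use warnings;

my @AoA = ( [2, 3], [4, 5, 7], [0] );

my @rows;
for my $row (@AoA) {                # $row is a reference to one inner array
    push @rows, join ' ', @{$row};  # dereference it explicitly
}

print "$_\n" for @rows;
# prints:
# 2 3
# 4 5 7
# 0
```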


Common mistakes -- Wrong assignment


$AoA[$i] = @array;      # WRONG!

That's just the simple case of assigning an array to a scalar and getting its element count. If that's what you really and truly want, then you might do well to consider being a tad more explicit about it, like this:


$counts[$i] = scalar @array;


Common mistakes -- accidentally reaccessing same location

Here's the case of taking a reference to the same memory location again and again:


for $i (1..10) {
	@array = somefunc($i);
	$AoA[$i] = \@array;     # WRONG!
}

So, what's the big problem with that? It looks right, doesn't it? After all, I just told you that you need an array of references.

Unfortunately, while this is true, it's still broken. All the references in @AoA refer to the very same place, and they will therefore all hold whatever was last in @array!
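The problem is easy to reproduce. In this sketch, somefunc() is a hypothetical stand-in that returns a short list depending on its argument, and the loop is shortened to three iterations. Note that @array is declared once, outside the loop, which is what makes every slot share it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for somefunc(): returns ($n, 2 * $n).
sub somefunc {
    my ($n) = @_;
    return ($n, $n * 2);
}

my (@array, @AoA);                  # one @array, shared by every iteration

for my $i (1 .. 3) {
    @array = somefunc($i);
    $AoA[$i] = \@array;             # WRONG: always a reference to the same @array
}

my $first = "@{ $AoA[1] }";
my $last  = "@{ $AoA[3] }";
my $same  = ($AoA[1] == $AoA[3]) ? "same array" : "different arrays";

print "$first\n";                   # prints 3 6 (not 1 2)
print "$last\n";                    # prints 3 6
print "$same\n";                    # prints same array
```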


Common mistakes -- accidentally reaccessing same location (cont.)

Solution:

In Perl, you'll want to use the array constructor "[]" or the hash constructor "{}" instead.

Here's the right way to write the preceding broken code fragment:


for $i (1..10) {
	@array = somefunc($i);
	$AoA[$i] = [ @array ];
}

The square brackets make a reference to a new array with a copy of what's in @array at the time of the assignment. This is what you want.


Common mistakes -- accidentally reaccessing same location (cont.)

Note that this will produce something similar, but it's much harder to read:


for $i (1..10) {
	@array = 0 .. $i;
	@{$AoA[$i]} = @array;
}


Common mistakes -- accidentally reaccessing same location (cont.)

Is it the same? Well, maybe so--and maybe not. The subtle difference is that when you assign something in square brackets, you know for sure it's always a brand new reference with a new copy of the data.

Something else could be going on in this new case with the "@{$AoA[$i]}" dereference on the left-hand side of the assignment.

It all depends on whether "$AoA[$i]" had been undefined to start with, or whether it already contained a reference.


Common mistakes -- accidentally reaccessing same location (cont.)

If you had already populated @AoA with references, as in


$AoA[3] = \@another_array; 

Then the assignment with the indirection on the left-hand-side would use the existing reference that was already there:


@{$AoA[3]} = @array; 

Of course, this would have the "interesting" effect of clobbering @another_array.
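The clobbering can be seen in a few lines. This sketch uses assumed sample contents for @another_array and @array, and contrasts the assignment through the existing reference with the "[ @array ]" constructor, which leaves @another_array alone.

```perl
use strict;
use warnings;

my @another_array = (1, 2, 3);      # sample data, assumed for illustration
my @array         = ('x', 'y');
my @AoA;

$AoA[3] = \@another_array;          # slot 3 already holds a reference

@{ $AoA[3] } = @array;              # assigns THROUGH the existing reference
my $clobbered = "@another_array";

@another_array = (1, 2, 3);         # restore it, then try the constructor
$AoA[3] = [ @array ];               # slot 3 now refers to a brand new array
my $untouched = "@another_array";

print "$clobbered\n";               # prints x y
print "$untouched\n";               # prints 1 2 3
print "@{ $AoA[3] }\n";             # prints x y
```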


Common mistakes -- accidentally reaccessing same location (cont.)

So just remember always to use the array or hash constructors with "[]" or "{}", and you'll be fine, although it's not always optimally efficient.


Common mistakes -- accidentally reaccessing same location (cont.)

Surprisingly, the following dangerous-looking construct will actually work out fine:


for $i (1..10) { 
	my @array = somefunc($i); 
	$AoA[$i] = \@array; 
} 

This works because the my() variable is remade afresh each time through the loop.

So even though it looks as though you stored the same variable reference each time, you actually did not! This is a subtle distinction that can produce more efficient code at the risk of misleading all but the most experienced of programmers.
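A sketch that checks this claim, again with a hypothetical somefunc() that returns ($n, 2 * $n) and a loop shortened to three iterations. Because @array is declared with my() inside the loop body, each iteration gets its own array.

```perl
use strict;
use warnings;

# Hypothetical stand-in for somefunc(): returns ($n, 2 * $n).
sub somefunc {
    my ($n) = @_;
    return ($n, $n * 2);
}

my @AoA;

for my $i (1 .. 3) {
    my @array = somefunc($i);       # a fresh @array on every iteration
    $AoA[$i] = \@array;             # so each reference points somewhere new
}

my $first = "@{ $AoA[1] }";
my $last  = "@{ $AoA[3] }";
my $same  = ($AoA[1] == $AoA[3]) ? "same array" : "different arrays";

print "$first\n";                   # prints 1 2
print "$last\n";                    # prints 3 6
print "$same\n";                    # prints different arrays
```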


Common mistakes -- accidentally reaccessing same location (cont.)

Beginners might try to use the much more easily understood constructors "[]" and "{}" instead of relying upon lexical (or dynamic) scoping and hidden reference-counting to do the right thing behind the scenes.


Common mistakes -- In summary


$AoA[$i] = [ @array ];      # usually best
$AoA[$i] = \@array;         # perilous; just how my() was that array?
@{ $AoA[$i] } = @array;     # way too tricky for most programmers


Caveat on precedence

Speaking of things like "@{$AoA[$i]}", the following are actually the same thing:


$aref->[2][2]       # clear
$$aref[2][2]        # confusing

That's because Perl's precedence rules on its five prefix dereferencers (which look like someone swearing: "$ @ * % &") make them bind more tightly than the postfix subscripting brackets or braces!

This will no doubt come as a great shock to the C or C++ programmer, who is quite accustomed to using "*a[i]" to mean what's pointed to by the i'th element of "a". That is, they first take the subscript, and only then dereference the thing at that subscript. That's fine in C, but this isn't C.


Caveat on precedence (cont.)

The seemingly equivalent construct in Perl, "$$aref[$i]", first dereferences $aref, treating it as a reference to an array, and only then gives you the i'th element of the array that $aref refers to. If you wanted the C notion, you'd have to write "${$AoA[$i]}" to force "$AoA[$i]" to be evaluated first, before the leading "$" dereferencer is applied.
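Both readings can be checked in a short sketch. The first half shows that "$$aref[2][2]" and "$aref->[2][2]" reach the same element. The second half shows the C-style reading, subscript first and then dereference, on an array of scalar references; the names @refs, $x and $y are assumptions introduced for illustration.

```perl
use strict;
use warnings;

my $aref = [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ];

my $clear     = $aref->[2][2];
my $confusing = $$aref[2][2];       # parsed as ${$aref}[2][2], not ${ $aref[2] }[2]

print "$clear $confusing\n";        # prints 9 9

# The C notion: take the subscript first, then dereference what is stored there.
my ($x, $y) = (10, 20);
my @refs = (\$x, \$y);              # an array of references to scalars
my $pointed_to = ${ $refs[1] };     # braces force $refs[1] to be evaluated first

print "$pointed_to\n";              # prints 20
```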


Why you should always "use strict"

If this is starting to sound scarier than it's worth, relax.

Perl has some features to help you avoid its most common pitfalls. The best way to avoid getting confused is to start every program like this:


#!/usr/bin/perl -w
use strict;

This way, you'll be forced to declare all your variables with my(), and accidental "symbolic dereferencing" will be disallowed.


Why you should always "use strict" (cont.)

Therefore if you'd done this:


my $aref = [
	[ "fred", "barney", "pebbles", "bambam", "dino", ],
	[ "homer", "bart", "marge", "maggie", ],
	[ "george", "jane", "elroy", "judy", ],
];

print $aref[2][2];

The compiler would immediately flag that as an error at compile time, because you were accidentally accessing "@aref", an undeclared variable, and it would thereby remind you to write instead:


print $aref->[2][2];


Debugging -- the debugger

Before version 5.002, the standard Perl debugger didn't do a very nice job of printing out complex data structures. With 5.002 or above, the debugger includes several new features, including command line editing as well as the "x" command to dump out complex data structures. For example, given the same nested array as above, stored here in a variable named $AoA, here's the debugger output:


DB<1> x $AoA
$AoA = ARRAY(0x13b5a0)
	0  ARRAY(0x1f0a24)
		0  'fred'
		1  'barney'
		2  'pebbles'
		3  'bambam'
		4  'dino'
	1  ARRAY(0x13b558)
		0  'homer'
		1  'bart'
		2  'marge'
		3  'maggie'
	2  ARRAY(0x13b540)
		0  'george'
		1  'jane'
		2  'elroy'
		3  'judy'


Debugging -- the debugger (cont.)

For more on debugger features, see:

'perldoc perldebtut' (Perl debugging tutorial)

and

'perldoc perldebug' (Perl debugging).


Debugging -- Data::Dumper


use Data::Dumper;

# simple procedural interface
print Dumper($aref);


$VAR1 = [
          [
            'fred',
            'barney',
            'pebbles',
            'bambam',
            'dino'
          ],
          [
            'homer',
            'bart',
            'marge',
            'maggie'
          ],
          [
            'george',
            'jane',
            'elroy',
            'judy'
          ]
        ];
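Dumper() takes references, so a hash is passed as "\%table". The sketch below dumps a small hash of arrays; the data is assumed for illustration. It also sets two of the module's configuration variables: $Data::Dumper::Sortkeys, so hash keys come out in sorted order, and $Data::Dumper::Indent, for a more compact layout. The exact whitespace of the output depends on these settings, so it is not reproduced here.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample hash of arrays, assumed for illustration.
my %table = (
    USA     => [ 'Chicago', 'New York' ],
    Finland => [ 'Helsinki' ],
);

local $Data::Dumper::Sortkeys = 1;  # dump hash keys in sorted order
local $Data::Dumper::Indent   = 1;  # fixed, compact indentation

my $dump = Dumper(\%table);         # pass a reference, not the hash itself
print $dump;
```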


Debugging -- Data::Dumper (cont.)

For more on Data::Dumper, see 'perldoc Data::Dumper'.


Code samples

For more code samples see:

'perldoc perllol' (Perl data structures: arrays of arrays) and 'perldoc perldsc' (Perl data structures intro).


Thank you
