Shlomo Yona <yona@cs.technion.ac.il> http://yeda.cs.technion.ac.il/~yona/
If you put a "\" in front of a variable, you get a reference to that variable.
$aref = \@array; # $aref now holds a reference to @array
$href = \%hash; # $href now holds a reference to %hash
Once the reference is stored in a variable like $aref or $href, you can copy it or store it just the same as any other scalar value:
$xy = $aref; # $xy now holds a reference to @array
$p[3] = $href; # $p[3] now holds a reference to %hash
$z = $p[3]; # $z now holds a reference to %hash
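A minimal sketch of the rule above, with a small @array invented purely for illustration. It shows that copying a reference does not copy the array it points to.

```perl
use strict;
use warnings;

my @array = (10, 20, 30);      # illustrative data, not from the text
my $aref  = \@array;           # $aref holds a reference to @array
my $xy    = $aref;             # copying the reference does not copy the array

push @{$xy}, 40;               # modify the array through the copy

print scalar(@array), "\n";    # prints 4: @array itself grew
print "same array\n" if $aref == $xy;   # both references point at one array
```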
Sometimes you want to make an array or a hash that doesn't have a name.
This is analogous to the way you like to be able to use the string "\n" or the number 80 without having to store it in a named variable first.
"[ ITEMS ]" makes a new, anonymous array, and returns a reference to that array.
"{ ITEMS }" makes a new, anonymous hash, and returns a reference to that hash.
$aref = [ 1, "foo", undef, 13 ];
# $aref now holds a reference to an array
$href = { APR => 4, AUG => 8 };
# $href now holds a reference to a hash
The references you get from rule 2 are the same kind of references that you get from rule 1:
# This:
$aref = [ 1, 2, 3 ];
# Does the same as this:
@array = (1, 2, 3);
$aref = \@array;
The first line is an abbreviation for the following two lines, except that it doesn't create the superfluous array variable "@array".
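As a small sketch of the anonymous constructors, the following checks what kind of reference each one returns. The variables $first and $second are made up for illustration; they show that every use of "[ ]" builds a separate array.

```perl
use strict;
use warnings;

my $aref = [ 1, "foo", undef, 13 ];   # anonymous array
my $href = { APR => 4, AUG => 8 };    # anonymous hash

print ref($aref), "\n";               # prints ARRAY
print ref($href), "\n";               # prints HASH

# Each use of [ ] builds a distinct array, even with identical contents.
my $first  = [ 1, 2, 3 ];
my $second = [ 1, 2, 3 ];
print "different arrays\n" if $first != $second;
```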
What can you do with a reference once you have it?
It's a scalar value, and we've seen that you can store it as a scalar and get it back again just like any scalar.
There are just two more ways to use it.
If "$aref" contains a reference to an array, then you can put "{$aref}" anywhere you would normally put the name of an array.
For example, "@{$aref}" instead of "@array".
    @a              @{$aref}                An array
    reverse @a      reverse @{$aref}        Reverse the array
    $a[3]           ${$aref}[3]             An element of the array
    $a[3] = 17;     ${$aref}[3] = 17        Assigning an element
On each line are two expressions that do the same thing.
The left-hand versions operate on the array "@a", and the right-hand versions operate on the array that is referred to by "$aref", but once they find the array they're operating on, they do the same things to the arrays.
    %h              %{$href}                A hash
    keys %h         keys %{$href}           Get the keys from the hash
    $h{'red'}       ${$href}{'red'}         An element of the hash
    $h{'red'} = 17  ${$href}{'red'} = 17    Assigning an element
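The equivalences listed above can be checked with a short sketch. The contents of @a and %h are assumptions chosen for illustration; the point is that writing through the reference changes the named variable.

```perl
use strict;
use warnings;

my @a    = (1, 2, 3, 4);
my $aref = \@a;
my %h    = (red => 1, blue => 2);
my $href = \%h;

${$aref}[3]     = 17;                         # same effect as $a[3] = 17
${$href}{'red'} = 17;                         # same effect as $h{'red'} = 17

print "$a[3]\n";                              # prints 17
print "$h{'red'}\n";                          # prints 17
print join(' ', reverse @{$aref}), "\n";      # prints 17 3 2 1
print join(' ', sort keys %{$href}), "\n";    # prints blue red
```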
"${$aref}[3]" is too hard to read, so you can write "$aref->[3]" instead.
"${$href}{red}" is too hard to read, so you can write "$href->{red}" instead.
Most often, when you have an array or a hash, you want to get or set a single element from it.
"${$aref}[3]" and "${$href}{'red'}" have too much punctuation, and Perl lets you abbreviate.
If "$aref" holds a reference to an array, then "$aref->[3]" is the fourth element of the array.
Don't confuse this with "$aref[3]", which is the fourth element of a totally different array, one deceptively named "@aref".
"$aref" and "@aref" are unrelated the same way that "$item" and "@item" are.
Similarly, "$href->{'red'}" is part of the hash referred to by the scalar variable "$href", perhaps even one with no name.
"$href{'red'}" is part of the deceptively named "%href" hash.
It's easy to leave out the "->" by mistake, and if you do, you'll get bizarre results when your program gets array and hash elements out of totally unexpected hashes and arrays that weren't the ones you wanted to use.
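Here is a small sketch of that confusion. Both variables are invented for illustration: the scalar $aref holds a reference to one array, while @aref is a separate array that only happens to share the name.

```perl
use strict;
use warnings;

my $aref = [ 'w', 'x', 'y', 'z' ];     # array reached only through $aref
my @aref = ( 'A', 'B', 'C', 'D' );     # unrelated array that merely shares the name

my $with_arrow    = $aref->[3];        # element of the referenced array
my $without_arrow = $aref[3];          # element of @aref

print "$with_arrow\n";                 # prints z
print "$without_arrow\n";              # prints D
```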
Let's see a quick example of how all this is useful.
First, remember that "[1, 2, 3]" makes an anonymous array containing "(1, 2, 3)", and gives you a reference to that array.
Now think about
@a = ( [1, 2, 3],
[4, 5, 6],
[7, 8, 9]
);
@a is an array with three elements, and each one is a reference to another array.
"$a[1]" is one of these references.
It refers to an array, the array containing "(4, 5, 6)", and because it is a reference to an array, USE RULE 2 says that we can write "$a[1]->[2]" to get the third element from that array.
"$a[1]->[2]" is the 6. Similarly, "$a[0]->[1]" is the 2.
What we have here is like a two-dimensional array; you can write "$a[ROW]->[COLUMN]" to get or set the element in any row and any column of the array.
The notation still looks a little cumbersome, so there's one more abbreviation:
Instead of "$a[1]->[2]", we can write "$a[1][2]"; it means the same thing. Instead of "$a[0]->[1]", we can write "$a[0][1]"; it means the same thing.
Now it really looks like two-dimensional arrays!
You can see why the arrows are important. Without them, we would have had to write "${$a[1]}[2]" instead of "$a[1][2]".
For three-dimensional arrays, they let us write "$x[2][3][5]" instead of the unreadable "${${$x[2]}[3]}[5]".
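The three notations can be compared directly on the small grid from the example. This is only a sketch; the final assignment is an addition for illustration, showing that the short form also works for setting an element.

```perl
use strict;
use warnings;

my @a = ( [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9] );

my $long   = ${$a[1]}[2];    # fully explicit dereference
my $arrow  = $a[1]->[2];     # arrow notation
my $short  = $a[1][2];       # arrow omitted between subscripts

print "$long $arrow $short\n";    # prints 6 6 6

$a[2][0] = 70;                    # set row 2, column 0
print "@{$a[2]}\n";               # prints 70 8 9
```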
You have a file of city and country names, like this:
and you want to produce an output like this, with each country mentioned once, and then an alphabetical list of the cities in that country:
 1   while (<>) {
 2     chomp;
 3     my ($city, $country) = split /, /;
 4     push @{$table{$country}}, $city;
 5   }
 6
 7   foreach $country (sort keys %table) {
 8     print "$country: ";
 9     my @cities = @{$table{$country}};
10     print join ', ', sort @cities;
11     print ".\n";
12   }
The program has two pieces: Lines 1--5 read the input and build a data structure, and lines 7--12 analyze the data and print out the report.
4 push @{$table{$country}}, $city;
We're going to have a hash, "%table", whose keys are country names, and whose values are (references to) arrays of city names.
After acquiring a city and country name, the program looks up "$table{$country}", which holds (a reference to) the list of cities seen in that country so far.
Line 4 is totally analogous to
push @array, $city;
except that the name "array" has been replaced by the reference "{$table{$country}}".
The "push" adds a city name to the end of the referred-to array.
9 my @cities = @{$table{$country}};
Again, "$table{$country}" is (a reference to) the list of cities in the country, so we can recover the original list, and copy it into the array "@cities", by using "@{$table{$country}}".
Line 9 is totally analogous to
@cities = @array;
except that the name "array" has been replaced by the reference "{$table{$country}}".
The "@" tells Perl to get the entire array.
The rest of the program is just familiar uses of "chomp", "split", "sort", "print", and doesn't involve references at all.
For information about these functions look them up using 'perldoc -f FUNCNAME', e.g. 'perldoc -f chomp'.
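To try the program without preparing an input file, here is a self-contained sketch of the same logic. The city and country names in $input are assumptions made up for this example, and the input is read from an in-memory filehandle instead of "<>"; the report is collected in $report before printing.

```perl
use strict;
use warnings;

# Illustrative input; these city and country names are assumptions.
my $input = <<'END';
Chicago, USA
Frankfurt, Germany
Berlin, Germany
Washington, USA
Helsinki, Finland
New York, USA
END

my %table;
open my $in, '<', \$input or die "cannot read sample input: $!";
while (<$in>) {
    chomp;
    my ($city, $country) = split /, /;
    push @{ $table{$country} }, $city;     # build the hash of arrays
}
close $in;

my $report = '';
foreach my $country (sort keys %table) {
    my @cities = sort @{ $table{$country} };
    $report .= "$country: " . join(', ', @cities) . ".\n";
}
print $report;
```

With this sample input, each country appears once, followed by its cities in alphabetical order.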
There's one fine point that we have skipped.
Suppose the program has just read the first line in its input that happens to mention Greece.
Control is at line 4, "$country" is "'Greece'", and "$city" is "'Athens'".
Since this is the first city in Greece, "$table{$country}" is undefined---in fact there isn't a "'Greece'" key in "%table" at all.
4 push @{$table{$country}}, $city;
This is Perl, so it does the exact right thing.
It sees that you want to push "Athens" onto an array that doesn't exist, so it helpfully makes a new, empty, anonymous array for you, installs it in the table, and then pushes "Athens" onto it.
This is called `autovivification'.
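A minimal sketch of autovivification, using an empty %table and the Greece example from the text. The variable $before is an illustrative addition that records whether the key existed before the push.

```perl
use strict;
use warnings;

my %table;
my $before = exists $table{Greece} ? 'present' : 'absent';
print "$before\n";                      # prints absent

push @{ $table{Greece} }, 'Athens';     # Perl creates the array on demand

print ref($table{Greece}), "\n";        # prints ARRAY
print "@{ $table{Greece} }\n";          # prints Athens
```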
We have covered material worth 90% of the benefit with 10% of the details, and that means I left out 90% of the details.
Now that you have an overview of the important parts, it should be easier to read the 'perlref' manual page (perldoc perlref), which discusses 100% of the details.
You might prefer to go on to the 'perllol' (perldoc perllol) manpage instead of the 'perlref' (perldoc perlref) manpage; it discusses lists of lists and multidimensional arrays in detail.
After that, you should move on to the 'perldsc' (perldoc perldsc) manpage; it's a Data Structure Cookbook that shows recipes for using and printing out arrays of hashes, hashes of arrays, and other kinds of data.
OK, so we learned a bit about references, and suddenly it seems simple to create complex data structures (e.g. lists of lists), as long as you remember the making rules and using rules of references.
$array[7][12] # array of arrays
$array[7]{string} # array of hashes
$hash{string}[7] # hash of arrays
$hash{string}{'another string'} # hash of hashes
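The four shapes above can be built by plain assignment, relying on autovivification. In this sketch the indices, keys and values are assumptions for illustration. Different slots are used for the array of arrays and the array of hashes, because a single element cannot hold both kinds of reference.

```perl
use strict;
use warnings;

my (@array, %hash);

# Different slots are used on purpose: one element cannot be both an
# array reference and a hash reference.
$array[7][12]                  = 'AoA';   # array of arrays
$array[3]{string}              = 'AoH';   # array of hashes
$hash{string}[7]               = 'HoA';   # hash of arrays
$hash{other}{'another string'} = 'HoH';   # hash of hashes

print ref($array[7]), ' ', ref($array[3]), "\n";          # prints ARRAY HASH
print ref($hash{string}), ' ', ref($hash{other}), "\n";   # prints ARRAY HASH
```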
Consider some complex data structure (even some multidimensional array):
And surely more questions can be asked... I'll try to clear some of the fog.
Since a complex data structure has the top level containing only references, if you try to print out your array with a simple print() function, you'll get something that doesn't look very nice.
Let's print just one element:
@AoA = ( [2, 3], [4, 5, 7], [0] );
print $AoA[1][2];
7
Seems OK, now let's try to have the whole list of lists printed out:
print @AoA;
ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)
That's because Perl doesn't (ever) implicitly dereference your variables. If you want to get at the thing a reference is referring to, then you have to do this yourself using either prefix typing indicators, like "${$blah}", "@{$blah}", "@{$blah[$i]}", or else postfix pointer arrows, like "$a->[3]", "$h->{fred}", or even "$ob->method()->[3]".
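One way to print the whole list of lists is to dereference each row yourself. This is only a sketch; the helper array @lines is an illustrative name, used to collect one output line per row.

```perl
use strict;
use warnings;

my @AoA = ( [2, 3], [4, 5, 7], [0] );

# Dereference each row explicitly and join its elements with spaces.
my @lines = map { join ' ', @{$_} } @AoA;
print "$_\n" for @lines;
```

This prints the three rows "2 3", "4 5 7" and "0" on separate lines instead of the ARRAY(0x...) addresses.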
$AoA[$i] = @array; # WRONG!
That's just the simple case of assigning an array to a scalar and getting its element count. If that's what you really and truly want, then you might do well to consider being a tad more explicit about it, like this:
$counts[$i] = scalar @array;
Here's the case of taking a reference to the same memory location again and again:
for $i (1..10) {
@array = somefunc($i);
$AoA[$i] = \@array; # WRONG!
}
So, what's the big problem with that? It looks right, doesn't it? After all, I just told you that you need an array of references.
Unfortunately, while this is true, it's still broken. All the references in @AoA refer to the very same place, and they will therefore all hold whatever was last in @array!
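The problem is easy to reproduce. In this sketch somefunc() is a hypothetical stand-in that returns a different list on each call, and the loop is shortened to three iterations.

```perl
use strict;
use warnings;

# somefunc() is a hypothetical stand-in, not a function from the text.
sub somefunc { my ($n) = @_; return ($n, $n * 10) }

my (@array, @AoA);
for my $i (1 .. 3) {
    @array   = somefunc($i);
    $AoA[$i] = \@array;          # every slot gets a reference to the same @array
}

print "@{ $AoA[1] }\n";          # prints 3 30, not 1 10
print "@{ $AoA[3] }\n";          # prints 3 30
print "all the same\n" if $AoA[1] == $AoA[3];
```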
In Perl, you'll want to use the array constructor "[]" or the hash constructor "{}" instead.
Here's the right way to do the preceding broken code fragments:
for $i (1..10) {
@array = somefunc($i);
$AoA[$i] = [ @array ];
}
The square brackets make a reference to a new array with a copy of what's in @array at the time of the assignment. This is what you want.
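A short sketch of the copying behaviour, again with a hypothetical somefunc() and three iterations. Emptying @array afterwards leaves the stored rows untouched, because each row is its own copy.

```perl
use strict;
use warnings;

# somefunc() is a hypothetical stand-in, not a function from the text.
sub somefunc { my ($n) = @_; return ($n, $n * 10) }

my (@array, @AoA);
for my $i (1 .. 3) {
    @array   = somefunc($i);
    $AoA[$i] = [ @array ];       # new anonymous array holding a copy
}
@array = ();                     # later changes to @array do not matter

print "@{ $AoA[1] }\n";          # prints 1 10
print "@{ $AoA[3] }\n";          # prints 3 30
```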
Note that this will produce something similar, but it's much harder to read:
for $i (1..10) {
@array = 0 .. $i;
@{$AoA[$i]} = @array;
}
Is it the same? Well, maybe so--and maybe not. The subtle difference is that when you assign something in square brackets, you know for sure it's always a brand new reference with a new copy of the data.
Something else could be going on in this new case with the "@{$AoA[$i]}" dereference on the left-hand-side of the assignment.
It all depends on whether "$AoA[$i]" had been undefined to start with, or whether it already contained a reference.
If you had already populated @AoA with references, as in
$AoA[3] = \@another_array;
Then the assignment with the indirection on the left-hand-side would use the existing reference that was already there:
@{$AoA[3]} = @array;
Of course, this would have the "interesting" effect of clobbering @another_array.
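The clobbering can be seen in a few lines. The contents of @another_array and @array here are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my @another_array = ('keep', 'me');
my @array         = (1, 2, 3);
my @AoA;

$AoA[3]      = \@another_array;   # the slot already holds a reference
@{ $AoA[3] } = @array;            # assigns through that existing reference

print "@another_array\n";         # prints 1 2 3: the old contents are gone
```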
So just remember always to use the array or hash constructors with "[]" or "{}", and you'll be fine, although it's not always optimally efficient.
Surprisingly, the following dangerous-looking construct will actually work out fine:
for $i (1..10) {
my @array = somefunc($i);
$AoA[$i] = \@array;
}
This works because the my() variable is remade afresh each time through the loop.
So even though it looks as though you stored the same variable reference each time, you actually did not! This is a subtle distinction that can produce more efficient code at the risk of misleading all but the most experienced of programmers.
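A sketch that checks this claim, with a hypothetical somefunc() and a three-iteration loop. Each pass gets its own @array, so the stored references stay distinct.

```perl
use strict;
use warnings;

# somefunc() is a hypothetical stand-in, not a function from the text.
sub somefunc { my ($n) = @_; return ($n, $n * 10) }

my @AoA;
for my $i (1 .. 3) {
    my @array = somefunc($i);    # a fresh @array on every iteration
    $AoA[$i]  = \@array;         # so each reference points somewhere new
}

print "@{ $AoA[1] }\n";          # prints 1 10
print "@{ $AoA[3] }\n";          # prints 3 30
print "distinct arrays\n" if $AoA[1] != $AoA[3];
```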
Beginners might try to use the much more easily understood constructors "[]" and "{}" instead of relying upon lexical (or dynamic) scoping and hidden reference-counting to do the right thing behind the scenes.
$AoA[$i] = [ @array ]; # usually best
$AoA[$i] = \@array; # perilous; just how my() was that array?
@{ $AoA[$i] } = @array; # way too tricky for most programmers
Speaking of things like "@{$AoA[$i]}", the following are actually the same thing:
$aref->[2][2] # clear
$$aref[2][2] # confusing
That's because Perl's precedence rules on its five prefix dereferencers (which look like someone swearing: "$ @ * % &") make them bind more tightly than the postfix subscripting brackets or braces!
This will no doubt come as a great shock to the C or C++ programmer, who is quite accustomed to using "*a[i]" to mean what's pointed to by the i'th element of "a". That is, they first take the subscript, and only then dereference the thing at that subscript. That's fine in C, but this isn't C.
The seemingly equivalent construct in Perl, "$$aref[$i]", first dereferences $aref, treating it as a reference to an array, and then gives you the i'th element of the array that $aref points to. If you wanted the C notion, you'd have to write "${$AoA[$i]}" to force "$AoA[$i]" to be evaluated first, before the leading "$" dereferencer.
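The two orders of evaluation can be put side by side. In this sketch @sref is an illustrative array of scalar references and $aref is a reference to an array; neither comes from the text.

```perl
use strict;
use warnings;

my $aref    = [ 'p', 'q', 'r' ];   # reference to an array
my ($x, $y) = ('x', 'y');
my @sref    = ( \$x, \$y );        # array whose elements are scalar references

my $second  = $$aref[1];           # dereference $aref first, then subscript
my $pointed = ${ $sref[1] };       # subscript @sref first, then dereference

print "$second $pointed\n";        # prints q y
```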
If this is starting to sound scarier than it's worth, relax.
Perl has some features to help you avoid its most common pitfalls. The best way to avoid getting confused is to start every program like this:
#!/usr/bin/perl -w
use strict;
This way, you'll be forced to declare all your variables with my(), and accidental "symbolic dereferencing" will be disallowed.
Therefore if you'd done this:
my $aref = [
[ "fred", "barney", "pebbles", "bambam", "dino", ],
[ "homer", "bart", "marge", "maggie", ],
[ "george", "jane", "elroy", "judy", ],
];
print $aref[2][2];
The compiler would immediately flag that as an error at compile time, because you were accidentally accessing "@aref", an undeclared variable, and it would thereby remind you to write instead:
print $aref->[2][2]
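To watch the compiler catch the mistake without aborting the whole script, the faulty line can be compiled inside a string eval. This is only a sketch: the variable names and the shortened data are illustrative, and the error text is captured in $error.

```perl
use strict;
use warnings;

# The faulty code is kept in a string so that this sketch can report
# the compile-time error instead of dying from it.
my $code = <<'END';
use strict;
my $aref = [ [ "fred", "barney" ], [ "homer", "bart" ] ];
my $name = $aref[1][1];    # missing arrow: refers to the undeclared @aref
1;
END

my $ok    = eval $code;
my $error = $@;

print "compiled fine\n" if $ok;
print "strict refused it: $error" unless $ok;
```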
Before version 5.002, the standard Perl debugger didn't do a very nice job of printing out complex data structures. With 5.002 or above, the debugger includes several new features, including command line editing as well as the "x" command to dump out complex data structures. For example, given the assignment to $AoA above, here's the debugger output:
For more on debugger features, see 'perldoc perldebtut' (Perl debugging tutorial) and 'perldoc perldebug' (Perl debugging).
use Data::Dumper;
# simple procedural interface
print Dumper($aref);
For more on Data::Dumper, see 'perldoc Data::Dumper'.
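A complete sketch of the procedural interface, reusing the nested array from the "use strict" example. Setting $Data::Dumper::Indent is optional and is shown only as one way to get more compact output.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent = 1;     # optional: more compact nesting than the default

my $aref = [
    [ "fred", "barney", "pebbles", "bambam", "dino" ],
    [ "homer", "bart", "marge", "maggie" ],
    [ "george", "jane", "elroy", "judy" ],
];

my $dump = Dumper($aref);      # returns the structure as Perl source text
print $dump;
```

The output starts with "$VAR1 = [" and lists every nested array, which makes it easy to check that a structure has the shape you intended.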
For more code samples, see 'perldoc perllol' (Perl data structures: arrays of arrays) and 'perldoc perldsc' (Perl data structures intro).