How to deal with more elaborate records


To store several values under a single hash key, make the value associated with that key a reference to an array. The example below shows how you can do this:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my %object = (wide => 30, height => 25, shape => 'circle');
my @colors = qw(brown red yellow blue);
 
push @{$object{colors}}, $_ foreach (@colors);
 
use Data::Dumper;
print Dumper(\%object);

I populated the %object hash with three keys (wide, height and shape), and I want to add the colors key, whose associated value is a reference to a list of colors. I stored the colors in the @colors array.

The Perl foreach loop iterates through the @colors array, and at each iteration step the current color is stored in $_. The value stored in $_ is then appended to the array referenced by the colors hash key. Perl creates that array automatically the first time push is called.

Instead of the Perl push, you can use the following assignment statement:

$object{colors} = [qw(brown red yellow blue)];

To see the result, I printed the hash using the Data::Dumper module. The output is as follows:

$VAR1 = {
          'shape' => 'circle',
          'wide' => 30,
          'height' => 25,
          'colors' => [
                        'brown',
                        'red',
                        'yellow',
                        'blue'
                      ]
        };

If you want to print the list associated with the colors key in your own code, you can do something similar to the next code snippet:

while (my ($k, $v) = each(%object)){
  next unless $k eq 'colors';
  print "$_ " foreach (@{$v});
  last;
}
print "\n";

The while loop iterates through the %object hash step by step using the each function. If the key is not equal to colors, the loop continues with the next iteration.

Once the colors key is found, the list associated with it is printed on a single line, with the elements separated by spaces, and the loop ends. Because the Perl foreach statement takes a list as its argument and not a reference to an array, we need to dereference $v by using the @{$v} notation.

You’ll get as output:

    brown red yellow blue
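Since the key is already known here, looping over the whole hash is not strictly necessary. The following is only a minimal sketch, assuming the same %object hash as above, that dereferences the value directly (the variables $list and $count are introduced just for this illustration):

```perl
use strict;
use warnings;

my %object = (wide => 30, height => 25, shape => 'circle');
$object{colors} = [qw(brown red yellow blue)];

# dereference the array stored under the colors key directly
my $list  = join ' ', @{ $object{colors} };
my $count = scalar @{ $object{colors} };

print "$list\n";                # all colors, separated by spaces
print "$count colors\n";        # number of elements in the list
print "$object{colors}[1]\n";   # a single element of the list
```

Direct access also avoids leaving the each iterator of %object half-way through the hash, which is what happens when a while/each loop is left early with last.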

Hashes are internally one-dimensional and they can store only scalar values, meaning strings, numbers and references.

They cannot contain other arrays or hashes, but their values can contain references to other arrays or hashes. Using the mechanism of references, you can easily emulate multidimensional structures in Perl.                       

The following example shows you how to use three nested Perl foreach loops to traverse a hash of hashes of hashes in order to print it.

To access a hash through a reference, you must first dereference the hash reference by putting a % symbol in front of the reference enclosed in curly braces: %{$v1} (for a simple scalar variable you can omit the braces: %$v1).

Enclosing a list of key/value pairs in curly braces creates a new anonymous hash and returns a reference to it. For example, {Country=>"England",City=>"London"} returns a reference to an anonymous hash containing (Country=>"England",City=>"London").
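As a small illustration of the two points above, here is a sketch that builds one anonymous hash and reads it back through its reference. The variable names $address and @keys are chosen only for this example:

```perl
use strict;
use warnings;

# braces build an anonymous hash and return a reference to it
my $address = { Country => "England", City => "London" };

my $type = ref $address;              # 'HASH'
print "$type\n";

# three equivalent ways to read one element through the reference
print ${$address}{City}, "\n";
print $$address{City}, "\n";
print $address->{City}, "\n";

# dereference the whole hash to get at its keys
my @keys = sort keys %{$address};     # or: keys %$address
print "@keys\n";
```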

Now please follow me with the next code snippet:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
# create a hash of hashes of hashes
my %HoHoH = (
 John => { Age => "22", Address =>
    {Country =>"England", City => "London"}},
 Paul => { Age => "25", Address =>
    {Country =>"France", City => "Paris"}},
 Mary => { Age => "20", Address =>
    {Country =>"Romania", City => "Bucharest"}}
);
 
# print the hash of hashes of hashes
 
foreach my $k1 (keys %HoHoH){
  my $v1 = $HoHoH{$k1};
  if(ref $v1 eq 'HASH') {
    print "$k1 =>\n";
    foreach my $k2 (keys %$v1){
      my $v2 = $$v1{$k2};
      if(ref $v2 eq 'HASH') {
        print "\t$k2 =>\n";
        foreach my $k3 (keys %$v2){
          print "\t\t$k3 => $$v2{$k3}\n"; 
        }
      } else {
        print "\t$k2 => $v2\n"; 
      }
    }
  } else {
      print "$k1 =>$v1\n";
  }
}

In the above example I used the Perl ref function to check whether a hash value is a reference to another hash.

To get the value of a hash element when the hash is given by a reference, I used the notation $$hashRef{key} (you can also write $hashRef->{key}).
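When you already know the path to a value, you do not need the intermediate variables at all. The sketch below, which assumes a shortened version of the %HoHoH hash, shows that the arrow may be omitted between adjacent subscripts, and sorts the keys because Perl does not guarantee any particular key order (so the order of the names in the output may differ from run to run):

```perl
use strict;
use warnings;

my %HoHoH = (
  John => { Age => "22", Address => { Country => "England", City => "London" } },
  Paul => { Age => "25", Address => { Country => "France",  City => "Paris"  } },
);

# the arrow is optional between adjacent subscripts
my $city_long  = $HoHoH{John}->{Address}->{City};
my $city_short = $HoHoH{John}{Address}{City};
print "$city_long, $city_short\n";

# sort the keys when a stable order matters
my @names = sort keys %HoHoH;
print "@names\n";
```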

And the output:

John =>
        Age => 22
        Address =>
                Country => England
                City => London
Mary =>
        Age => 20
        Address =>
                Country => Romania
                City => Bucharest
Paul =>
        Age => 25
        Address =>
                Country => France
                City => Paris
 
To print this hash, we used three nested foreach loops. The same thing can be done with three nested Perl while loops:
 
while (my ($k1, $v1) = each %HoHoH)
{
  if(ref $v1 eq 'HASH') {
    print "$k1 =>\n";
    while (my ($k2, $v2) = each %$v1) {
      if(ref $v2 eq 'HASH') {
        print "\t$k2 =>\n";
        while (my ($k3, $v3) = each %$v2) {
          print "\t\t$k3 => $v3\n"; 
        }
      } else {
        print "\t$k2 => $v2\n"; 
      }
    }
  } else  {
      print "$k1 =>$v1\n";
  }
}

Note: Our %HoHoH is a particular kind of hash of hashes of hashes: not every value of the first inner hash is a reference to another hash (the value associated with the Age key is a simple string).

The next example shows you how to deal with "multidimensional" structures using a recursive subroutine. It also shows the Perl chomp function in use.

Please take a first look at it, and then follow me step by step through the code:

#!/usr/local/bin/perl
 
use strict;
use warnings;
use Data::Dumper;
 
# define and initialize an array of arrays
my @AoA = (
  ["a11\n", "a12\n"],
  ["a21\n", "a22\n"],
);
 
# define and initialize a hash of hashes
my %HoH = (
 John => {Age => "22\n", Sex => "M\n"},
 Paul => {Age => "25\n", Sex => "M\n"},
 Mary => {Age => "20\n", Sex => "F\n"},
);
 
sub AHchomp{
  my $x = shift;
  if( ref $x eq 'HASH' ) {
    # use foreach to iterate through the hash values
    # %$x is used to dereference the hash $x reference
    foreach( values %$x ) {
      AHchomp($_);   # invoke the AHchomp subroutine
    }
  } elsif( ref $x eq 'ARRAY' ) {
    # use foreach to iterate through the array elements
    # @$x is used to dereference the array $x reference
    foreach( @$x ) {
      AHchomp($_);   # invoke the AHchomp subroutine
    }
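  } else
  {
    # $_ is still aliased to the current element by the
    # caller's foreach, so chomp changes that element in place
    chomp;
  }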
}
 
print "\n";
 
# invoke the AHchomp subroutine with a reference to
# @AoA as its argument
 
AHchomp(\@AoA);
print "\@AoA content:\n\n";
print Dumper(\@AoA);
print "\n";
 
# invoke the AHchomp subroutine with a reference to
# %HoH as its argument
 
AHchomp(\%HoH);
print "\%HoH content:\n\n";
print Dumper(\%HoH);

At the beginning of the script sample, we define and initialize two structures:

  • @AoA – an array of arrays

  • %HoH – a hash of hashes

Next, we define the AHchomp subroutine, which goes through our complex data structures recursively. We assume that the elements of the structure we intend to loop through can be numeric or string scalars, or references to arrays and hashes. We use the ref function to check the type of each element: for a hash or array reference the subroutine calls itself on every contained value, and for a plain scalar it calls chomp.

The above code produces the following output:

@AoA content:
 
$VAR1 = [
          [
            'a11',
            'a12'
          ],
          [
            'a21',
            'a22'
          ]
        ];
 
%HoH content:
 
$VAR1 = {
          'John' => {
                      'Age' => '22',
                      'Sex' => 'M'
                    },
          'Mary' => {
                      'Age' => '20',
                      'Sex' => 'F'
                    },
          'Paul' => {
                      'Age' => '25',
                      'Sex' => 'M'
                    }
        };
 
Obviously, you can use this subroutine to perform whatever you want instead of Perl chomp by modifying the code appropriately.
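One possible way to make the traversal reusable is to pass the action in as a code reference. The sketch below is not part of the original script: the subroutine name walk and its second parameter are assumptions made for this illustration. It relies on the fact that the elements of @_ are aliases to the caller's arguments, so the callback can change each leaf in place through $_:

```perl
use strict;
use warnings;

# walk($data, $code): call $code for every non-reference leaf of $data.
# Inside $code, $_ is aliased to the leaf, so changes happen in place.
sub walk {
  my ($x, $code) = @_;
  if (ref $x eq 'HASH') {
    walk($_, $code) foreach values %$x;
  } elsif (ref $x eq 'ARRAY') {
    walk($_, $code) foreach @$x;
  } else {
    # $_[0] is an alias to the caller's element; alias $_ to it
    $code->() for $_[0];
  }
}

my %HoH = (John => {Age => "22\n", Sex => "M\n"});
my @AoA = (["a11", "a12"], ["a21", "a22"]);

walk(\%HoH, sub { chomp });       # same effect as AHchomp
walk(\@AoA, sub { $_ = uc });     # a different action, same traversal

print "$HoH{John}{Age} $HoH{John}{Sex}\n";
print "$AoA[0][1] $AoA[1][0]\n";
```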

A problem with the above subroutine appears when your complex structure contains circular references, that is, when it refers to itself directly or through other references: the subroutine will then recurse forever and your script will never end (well, almost never). We need to improve our subroutine in order to avoid this infinite loop. I give you one solution here, but you can imagine other ways to do this.

See the following code snippet:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
# get a reference to an anonymous array
my $refa = ["one", "two", "three", "four"];
 
# get a reference to an anonymous hash
my $refh = {1 => "one", 2 => "two"};
 
# define an array with references
my @array = ();
@array = ($refa, $refh, \@array);
# please note that @array contains a reference
# to itself as its last element
 
# %Ref will hold every reference already visited;
# AHchomp adds each one the first time it meets it
my %Ref = ();
 
sub AHchomp{
  my $x = shift;
  if(ref $x eq 'HASH') {
    if(exists $Ref{$x}) {
      print "Infinite loop: $x\n";
      return;
    }
    # add the $x hash reference as a key to %Ref
    $Ref{$x} = 0;
   
    AHchomp($_) foreach(values %$x);
 
  } elsif(ref $x eq 'ARRAY') {
    if(exists $Ref{$x}) {
      print "Infinite loop: $x\n"; 
      return;
    }
   
    # add the $x array reference as a key to %Ref
    $Ref{$x} = 0;
   
    AHchomp($_) foreach(@$x);
  } else
  {
    chomp;
  }
}
 
# invoke the AHchomp subroutine with a reference to
# @array as its argument
AHchomp(\@array);

In this code snippet I use the %Ref hash, which will contain as keys all the references found in your complex structure.

If a reference appears more than once in your structure (a circular reference is one such case), the subroutine prints a message naming that reference and then returns without descending into it again.

I used the Perl exists function to check whether a key exists in %Ref. If the reference is not found, it is added to %Ref as a key.

You’ll get as output something like this:

    Infinite loop: ARRAY(0x1839ce0)
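As mentioned, other solutions are possible. The following sketch is one variation, not the method used above: it keeps the record of visited references in a hash that is private to each top-level call, so the subroutine can be called more than once on the same structure, and it uses refaddr from the core Scalar::Util module as the hash key. The subroutine name count_leaves is an assumption made for this illustration; it simply counts the plain scalars it finds instead of chomping them:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# count_leaves($data): count the non-reference leaves of $data,
# entering each array or hash only once.
# $seen is created on the top-level call and passed down.
sub count_leaves {
  my ($x, $seen) = @_;
  $seen ||= {};
  return 1 unless ref $x;                   # plain scalar: one leaf
  return 0 if $seen->{ refaddr($x) }++;     # already visited: skip it
  my @children = ref $x eq 'HASH'  ? values %$x
               : ref $x eq 'ARRAY' ? @$x
               :                     ();    # other reference types: ignored
  my $total = 0;
  $total += count_leaves($_, $seen) foreach @children;
  return $total;
}

my @array = (["one", "two"], {1 => "one"});
push @array, \@array;          # @array now contains a reference to itself

my $first  = count_leaves(\@array);
my $second = count_leaves(\@array);   # works again: fresh record of visits
print "$first $second\n";
```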

Let’s say we have a complex data structure (a hash) named %earth. To copy a complex data structure into a new one there are two ways:

A shallow copy copies only the top level of the aggregate structure into a new one. You can do this with a simple assignment, as shown below:

my %newEarth = %earth;

I’ll explain what that means by using the %earth hash as an example.

This method copies just the (key, value) pairs from %earth into %newEarth.

The two hashes share every item held by reference, so if you change the content of such an item, both %earth and %newEarth are affected, because they both point to the same item.
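The sharing can be observed with a short sketch. It assumes a reduced %earth hash; the Planet key is not part of the original example and is added only to contrast a plain scalar value with a value held by reference:

```perl
use strict;
use warnings;

# reduced %earth; the Planet key is added only for this illustration
my %earth = (
  Planet => 'Earth',
  Oceans => ['Indian', 'Pacific', 'Arctic'],
);

my %newEarth = %earth;        # shallow copy

$earth{Planet} = 'Terra';     # plain scalar value: only %earth changes
pop @{ $earth{Oceans} };      # shared array: visible through both hashes

my $ocean_count = @{ $newEarth{Oceans} };
my $same_array  = $earth{Oceans} == $newEarth{Oceans};   # compares addresses
my $verdict     = $same_array ? 'same array' : 'different arrays';

print "$newEarth{Planet}\n";   # the copy kept its own scalar
print "$ocean_count\n";        # but lost an ocean along with %earth
print "$verdict\n";
```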

A deep copy copies the items of the aggregate structure and also the data behind each reference, continuing until no more references are found. This is easiest to accomplish with a recursive subroutine.

The following example shows you how to use a recursive subroutine to copy all the data contained in the %earth hash:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my %earth = (
 Continents => {Europe => ['France', 'Germany'],
                Asia => {India => 'Delhi',
                         China => 'Beijing'} },
 Oceans => ['Indian', 'Pacific', 'Arctic']
);
 
# invoke the clone subroutine
my %newEarth = clone(%earth);
 
# alter a few items from %earth
$earth{Continents}{Europe}[2] = 'Italy';
$earth{Continents}{Asia}{Thailand} = 'Bangkok';
pop @{$earth{Oceans}};  # delete the Arctic ocean
 
sub clone {
  map { ! ref() ? $_ :
        ref eq 'ARRAY' ? [clone(@$_)] :
        ref eq 'HASH' ? {clone(%$_)} :
        die "$_ - reference not supported" } @_;
}
 
use Data::Dumper;
print Dumper \%newEarth;
print Dumper \%earth;

The output is presented below.

The %newEarth hash (identical to the initial %earth):

$VAR1 = {
          'Continents' => {
                            'Europe' => [
                                          'France',
                                          'Germany'
                                        ],
                            'Asia' => {
                                        'India' => 'Delhi',
                                        'China' => 'Beijing'
                                      }
                          },
          'Oceans' => [
                        'Indian',
                        'Pacific',
                        'Arctic'
                      ]
        };

the %earth hash (after a few items were changed):

$VAR1 = {
          'Continents' => {
                            'Europe' => [
                                          'France',
                                          'Germany',
                                          'Italy'
                                        ],
                            'Asia' => {
                                        'India' => 'Delhi',
                                        'Thailand' => 'Bangkok',
                                        'China' => 'Beijing'
                                      }
                          },
          'Oceans' => [
                        'Indian',
                        'Pacific'
                      ]
        };

As you can see from the output, the contents of the two hashes are different: the %newEarth hash has not been affected by the changes made to %earth. To find out more about this clone() subroutine, you might have a look at the similar topics described for array of arrays, array of hashes, hash of arrays or hash of hashes.

For more complicated structures you can use the Storable module, which provides the dclone function for making recursive copies (see perlfaq4).
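A minimal sketch of the dclone approach, assuming a reduced version of the %earth hash. Note that dclone takes a reference and returns a reference, so the result is dereferenced before it is assigned to a hash:

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %earth = (
  Continents => { Europe => ['France', 'Germany'] },
  Oceans     => ['Indian', 'Pacific', 'Arctic'],
);

# dclone takes a reference and returns a reference to the deep copy
my %newEarth = %{ dclone(\%earth) };

# alter the original after the copy was made
push @{ $earth{Continents}{Europe} }, 'Italy';
pop  @{ $earth{Oceans} };

my $europe_count = @{ $newEarth{Continents}{Europe} };
my $ocean_count  = @{ $newEarth{Oceans} };
print "$europe_count $ocean_count\n";   # the copy is unchanged
```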

In a complex data structure you can use the Perl exists function to avoid autovivification when you don’t intend to use it.

See the following example:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my %earth = (
 Continents => {Europe => ['France', 'Germany'],
                Asia => {India => 'Delhi',
                           China => 'Beijing'}},
 Oceans => ['Indian', 'Pacific', 'Arctic']
);
 
# autovivification
$earth{Continents}{Europe}[2][3] = 'Spain';
$earth{Continents}{Asia}{Thailand}[2] = 'Bangkok';
defined $earth{Oceans}[3]{Tahiti} || print "not found\n";
 
use Data::Dumper;
print Dumper \%earth;

First we populate a complex structure with a few entries.

Now let’s pay a bit of attention to this code.

The first assignment statement:

$earth{Continents}{Europe}[2][3] = 'Spain';

adds an entry to our complex data structure.

Because $earth{Continents}{Europe}[2] doesn’t exist, Perl creates it with an appropriate value (here a reference to a new array), so you don’t need to create the inner array yourself ($earth{Continents}{Europe}[2] = []).

This process is called autovivification, and it is very useful for this kind of assignment. The expression can be arbitrarily complicated and Perl will create all the structures it needs to make the assignment.

But if you look at the statement:

defined $earth{Oceans}[3]{Tahiti} || print "not found\n";

Perl first evaluates the expression defined $earth{Oceans}[3]{Tahiti}, and because the result is false, the Perl print function is executed.

To evaluate that expression, however, Perl has to create $earth{Oceans}[3] as a reference to an empty hash, and that entry remains in our complex data structure.

This time autovivification enlarged our structure with an unnecessary entry, as the output shows.

Please note the use of the || short-circuit operator, which evaluates its second operand only if the first operand is false.

To see what is happening, I printed the hash using the Data::Dumper module. The output of this script is as follows:

not found
$VAR1 = {
          'Continents' => {
                            'Europe' => [
                                          'France',
                                          'Germany',
                                          [
                                            undef,
                                            undef,
                                            undef,
                                            'Spain'
                                          ]
                                        ],
                            'Asia' => {
                                        'India' => 'Delhi',
                                        'Thailand' => [
                                                        undef,
                                                        undef,
                                                        'Bangkok'
                                                      ],
                                        'China' => 'Beijing'
                                      }
                          },
          'Oceans' => [
                        'Indian',
                        'Pacific',
                        'Arctic',
                        {}
                      ]
        };

As I mentioned at the beginning of this section, to avoid autovivification in this last case, you can use the Perl exists function:

if(exists $earth{Oceans}[3] && defined $earth{Oceans}[3]{Tahiti})
{
   print "found\n"; 
}

Here we use the && short-circuit operator: first we test whether $earth{Oceans}[3] exists, and only then do we check whether $earth{Oceans}[3]{Tahiti} is defined. This time the value of the Oceans key is not altered with extra entries:

          'Oceans' => [
                        'Indian',
                        'Pacific',
                        'Arctic'
                      ]

Please note the use of the && short-circuit operator, which evaluates its second operand only if the first operand is true.
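With deeper structures, writing one exists test per level quickly becomes tedious. The sketch below shows one possible helper that follows a path step by step and stops at the first missing level, so nothing is autovivified. The subroutine name dig is an assumption made for this illustration. For array levels it compares the index with the array length rather than calling exists, since perlfunc discourages using exists on array elements:

```perl
use strict;
use warnings;

# dig($data, @path): follow hash keys and array indexes one step at a
# time; return the value found, or undef as soon as a step is missing.
# Nothing is created along the way.
sub dig {
  my ($node, @path) = @_;
  foreach my $step (@path) {
    if (ref $node eq 'HASH') {
      return unless exists $node->{$step};
      $node = $node->{$step};
    } elsif (ref $node eq 'ARRAY') {
      return unless $step =~ /^\d+$/ && $step < @$node;
      $node = $node->[$step];
    } else {
      return;          # cannot descend into a plain scalar
    }
  }
  return $node;
}

my %earth = (Oceans => ['Indian', 'Pacific', 'Arctic']);

my $tahiti = dig(\%earth, 'Oceans', 3, 'Tahiti');
my $second = dig(\%earth, 'Oceans', 1);
my $size   = @{ $earth{Oceans} };

print defined $tahiti ? "found\n" : "not found\n";
print "$second\n";
print "$size\n";       # the Oceans array was not enlarged
```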

The following example shows you how to populate a hash with functions or commands that you can call later, while the script is running.

See the following code snippet:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my %operators = (
 add => '+',
 subtract => '-',
 multiply => '*',
 divide => '/',
 power => '**');
 
my ($p1, $p2) = (10,2);
 
# each subroutine copies its two operands from @_ into
# private variables, leaving the outer $p1 and $p2 untouched
my %HOF = (
  add      => sub { my ($x, $y) = @_; $x + $y },
  subtract => sub { my ($x, $y) = @_; $x - $y },
  multiply => sub { my ($x, $y) = @_; $x * $y },
  divide   => sub { my ($x, $y) = @_; $x / $y },
  power    => sub { my ($x, $y) = @_; $x ** $y },
);
 
while (my($k, $v) = each %operators){
  print "$p1 $operators{$k} $p2 = ",
        $HOF{$k}->($p1, $p2), "\n";
}

I want to say a few words about this code. The %operators hash was populated with the names of some arithmetic operators and their associated symbols. In the %HOF hash we defined the same keys as in the %operators hash, but to each key we assigned a specific subroutine as its value. $p1 and $p2 are the operands used by each binary operator.

The Perl while loop and the each function were used to step through the %operators hash. We used the notation $HOF{$k}->($p1,$p2) to invoke the appropriate operator by dereferencing the hash value as a function and passing that function the ($p1,$p2) argument list. The result of the operation is printed. Because each returns the keys in no particular order, the lines of output may appear in a different order on your system.

You get as output:

10 + 2 = 12
10 - 2 = 8
10 ** 2 = 100
10 / 2 = 5
10 * 2 = 20

If your functions need more lines of code, you can define them as named subroutines and store references to them instead. The above example can then be rewritten as follows:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my ($p1, $p2) = (10,2);
 
# each subroutine copies its two operands from @_
sub add      { my ($x, $y) = @_; $x + $y }
sub subtract { my ($x, $y) = @_; $x - $y }
sub multiply { my ($x, $y) = @_; $x * $y }
sub divide   { my ($x, $y) = @_; $x / $y }
sub power    { my ($x, $y) = @_; $x ** $y }
 
my %operators = (add => '+', subtract => '-',
                 multiply => '*', divide => '/',
                 power => '**');
 
my %HoF = (add => \&add, subtract => \&subtract,
           multiply => \&multiply, divide => \&divide,
           power => \&power,);
 
while (my($k, $v) = each %operators){
  print "$p1 $operators{$k} $p2 = ",
        $HoF{$k}->($p1, $p2), "\n";
}

The output is the same as before.

The same technique works for any set of commands: store a code reference for each command name in the %HoF hash and invoke the one you need through its key.
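When the command name comes from outside the script (user input, a configuration file), it is worth checking that the key is present before calling through it. The following is a minimal sketch under that assumption; the subroutine name run and the command name modulo are hypothetical and serve only this illustration:

```perl
use strict;
use warnings;

my %HoF = (
  add      => sub { my ($x, $y) = @_; $x + $y },
  subtract => sub { my ($x, $y) = @_; $x - $y },
);

# run($name, @args): call the named command, or die if it is unknown
sub run {
  my ($name, @args) = @_;
  my $code = $HoF{$name}
    or die "unknown command: $name\n";
  return $code->(@args);
}

my $sum  = run('add', 10, 2);
my $diff = run('subtract', 10, 2);
print "$sum $diff\n";

# an unknown name is reported instead of causing a run-time failure
my $ok    = eval { run('modulo', 10, 2); 1 };
my $error = $ok ? '' : $@;
print "error: $error" unless $ok;
```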