How to deal with hashes of hashes - Part 2

We have a Perl hash of hashes named %HoH. We want to extract from its inner hashes the unique values associated with a particular key and store these values in an array. At the same time, we’ll count how many times each value occurs.

See the code:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
### initialize a hash of hashes
my %HoH = (
 1 => { name => 'John', age => 20 },
 2 => { name => 'Marry', age => 25 },
 3 => { name => 'Patricia', age => 30 },
 4 => { name => 'John', age => 20 },
 5 => { name => 'John', age => 20 },
 6 => { name => 'Patricia', age => 30 }
);
 
my %count;
my @uniqueValues =
  grep !$count{ $HoH{$_}{name} }++, keys(%HoH);
 
print "$HoH{$_}{name} " foreach (@uniqueValues);
print "\n";
print "$_ = $count{$_} " foreach keys(%count);
print "\n";
In the above example, the %HoH hash is populated with 6 entries whose keys are 1, 2, 3, 4, 5 and 6. The value associated with each key is a reference to an anonymous hash, created with the {} constructor.

Each inner hash has 2 keys: name and age. We want to find the unique values associated with the name key in these inner hashes, using the @uniqueValues array to keep track of them.

This can be done with an additional hash, %count. The Perl grep function loops through the keys of the %HoH hash and returns the outer keys whose name value is seen for the first time. So @uniqueValues holds one outer key per unique name, and the print statement looks up the name through each of these keys.

The %count hash has as keys the unique values associated with the name key and as values the number of times each value was found. For example, the value 'John' associated with the name key was found three times.

The grep function keeps the current outer key only if the corresponding name is not yet present as a key in the %count hash. The post-increment operator (++) then adds 1 to the count for that name, so later occurrences are filtered out but still counted.

To see how it works, the output shows the names reached through @uniqueValues and the content of %count (the order may differ from run to run, because hash order is not predictable):

Patricia John Marry
Marry = 1 Patricia = 2 John = 3
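
The snippet above keeps outer keys in @uniqueValues and looks the names up when printing. If you would rather store the names themselves, a minimal sketch could look like the one below. The @uniqueNames array and the smaller data set are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;

my %HoH = (
    1 => { name => 'John',     age => 20 },
    2 => { name => 'Marry',    age => 25 },
    3 => { name => 'Patricia', age => 30 },
    4 => { name => 'John',     age => 20 },
);

my %count;

# map extracts every name, grep keeps only the first occurrence of each
# one while %count records how many times it was seen
my @uniqueNames = grep { !$count{$_}++ } map { $_->{name} } values %HoH;

# sort only to get a stable display order
print join( ' ', sort @uniqueNames ), "\n";
print "$_ = $count{$_}\n" for sort keys %count;
```

Here the array really contains the strings John, Marry and Patricia, so it can be used without going back to %HoH.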

A Perl hash of hashes (%HoH) is a hash whose values are references to other hashes. In the example below we’ll get and sort the values associated with a specific key in the inner hashes.

Let’s start by looking at the following code:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
### initialize a hash of hashes
my %HoH = (
 1 => {color => 'yellow', width => 250, height => 120},
 2 => {color => 'blue', width => 125, height => 320},
 3 => {color => 'green', width => 25, height => 420},
 4 => {color => 'red', width => 75, height => 12},
);
 
my @colors = sort { $a cmp $b }
             map $_->{color},
             values %HoH;
print "@colors\n";
 
my @widths = sort { $a <=> $b }
             map $_->{width},
             values %HoH;
print "@widths\n";
This code produces the following output:
 
blue green red yellow
25 75 125 250

The Perl map function will loop through the values of the %HoH hash, assigning each value in turn to $_. In the $_ special variable we have a reference to the current inner hash and we need to dereference it by using the arrow operator (->).

The first map will return a list with the values associated with the color key in all inner hashes of the %HoH hash and the second map will return a list with the values mapping the width key in all inner hashes of the %HoH hash.

The sort function sorts the list returned by map in ascending order.

Please note that to sort the numbers I used the <=> operator and to sort the strings, the cmp operator.
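
As a small variation on the same idea, the sketch below sorts the heights in descending order and also orders the outer keys by the width stored in each inner hash. The names @heights and @keysByWidth are made up for this illustration.

```perl
use strict;
use warnings;

my %HoH = (
    1 => { color => 'yellow', width => 250, height => 120 },
    2 => { color => 'blue',   width => 125, height => 320 },
    3 => { color => 'green',  width => 25,  height => 420 },
    4 => { color => 'red',    width => 75,  height => 12 },
);

# descending numeric sort: $b is placed on the left side of <=>
my @heights = sort { $b <=> $a } map { $_->{height} } values %HoH;
print "@heights\n";        # 420 320 120 12

# outer keys ordered by the width found in the corresponding inner hash
my @keysByWidth = sort { $HoH{$a}{width} <=> $HoH{$b}{width} } keys %HoH;
print "@keysByWidth\n";    # 3 4 2 1
```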

Let’s say you have in stock a few hard drives with the following characteristics:

Item    Mfr       Cap (GB)   Speed (RPM)
Item1   Toshiba   100        4200
Item2   Maxtor    100        5400
Item3   Maxtor    100        7200
Item4   Seagate   100        7200
Item5   Quantum   10.2       4500
Item6   Seagate   160        5400
Item7   Hitachi   250        7200
Item8   Toshiba   60         4200

We’ll turn this table into a Perl hash of hashes (%HoH) where the keys of the hash are the item codes and the corresponding values are references to other hashes.

The keys of the inner hash are mfr, cap and sp and their values represent the manufacturer, the capacity and the speed of the hard disk.

After the initialization, we will sort the keys of the hash of hashes by the values of the inner hashes:

  • alphabetically ascending by the manufacturer value, next
  • numerically descending by the speed value and last
  • numerically descending by the capacity value.

Please note that we can’t sort the hash itself, but rather return a list with the keys ordered by specific criteria.

The order of the entries in a hash is not predictable, so you can’t rely on a specific order in a hash.

See the code:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my %HoH = (
 'Item1'=>{mfr=>'Toshiba', cap=>100,  sp=>4200},
 'Item2'=>{mfr=>'Maxtor',  cap=>100,  sp=>5400},
 'Item3'=>{mfr=>'Maxtor',  cap=>100,  sp=>7200},
 'Item4'=>{mfr=>'Seagate', cap=>100,  sp=>7200},
 'Item5'=>{mfr=>'Quantum', cap=>10.2, sp=>4500},
 'Item6'=>{mfr=>'Seagate', cap=>160,  sp=>5400},
 'Item7'=>{mfr=>'Hitachi', cap=>250,  sp=>7200},
 'Item8'=>{mfr=>'Toshiba', cap=>60,   sp=>4200}
);
 
  print
  map { "$HoH{$_}{mfr}\t$HoH{$_}{cap}\t$HoH{$_}{sp} \n"}
  sort {
           $HoH{$a}{mfr} cmp $HoH{$b}{mfr} ||
           $HoH{$b}{sp} <=> $HoH{$a}{sp} ||
           $HoH{$b}{cap} <=> $HoH{$a}{cap}
       } keys %HoH;
This code produces the following output:
 
Hitachi 250     7200
Maxtor  100     7200
Maxtor  100     5400
Quantum 10.2    4500
Seagate 100     7200
Seagate 160     5400
Toshiba 100     4200
Toshiba 60      4200
 
The Perl sort function has as argument the list of keys of the %HoH hash.

$HoH{$a} is the inner hash reference associated with the key $a of the %HoH hash, $HoH{$a}{mfr} is the value corresponding to the mfr key of this inner hash. Here <=> means the numerical comparison operator and cmp the string comparison operator.

If $a is positioned at the left side of the comparison operator this gives an ascending order and at the right side of the comparison operator it gives a descending order. I used the || operator to indicate from left to right the priority of the columns in the sort processing.
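
The reason || works for chaining sort criteria is that cmp and <=> return 0 when the two operands are equal, and 0 is false, so the next comparison is tried. The short sketch below shows this with plain values; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $sameMfr  = 'Maxtor' cmp 'Maxtor';    # 0: the strings are equal
my $speedCmp = 7200 <=> 5400;            # 1: the left number is larger

# equal manufacturers: || falls through to the speed comparison
my $tieBroken = $sameMfr || $speedCmp;

# different manufacturers: the first comparison already decides
my $decided = ( 'Hitachi' cmp 'Maxtor' ) || $speedCmp;

print "$sameMfr $speedCmp $tieBroken $decided\n";    # 0 1 1 -1
```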

The Perl map function has as argument the ordered list of the %HoH hash keys returned by the sort function and will return to the print function a list of strings to be printed.

In practice, for each key of the %HoH hash the values of the corresponding inner hash are printed, one inner hash per line.

Let’s populate a Perl hash of hashes (%HoH) with a few entries:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my %HoH = (
  a => {aa => 1, ab => 2, ac => 3},
  b => {ba => 1},
  c => {ca => 1, cb => 2}
);   
 
# delete a particular key from an inner hash
delete $HoH{'a'}{'ab'};
 
# print the %HoH hash of hashes
use Data::Dumper;
print Dumper(\%HoH);
In the above example we use the delete function. Here 'a' is the key of the outer hash and 'ab' is the key of the corresponding inner hash.

The output obtained using Data::Dumper module is shown here:

$VAR1 = {
          'c' => {
                   'ca' => 1,
                   'cb' => 2
                 },
          'a' => {
                   'ac' => 3,
                   'aa' => 1
                 },
          'b' => {
                   'ba' => 1
                 }
        };
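
The delete function also returns the value it removed, and exists can confirm that the key is gone. A minimal sketch, using a reduced version of the hash above (the $removed and $state variables are illustrative):

```perl
use strict;
use warnings;

my %HoH = ( a => { aa => 1, ab => 2, ac => 3 } );

# delete returns the value that was stored under the removed key
my $removed = delete $HoH{a}{ab};
print "removed value: $removed\n";    # removed value: 2

# exists tells us whether the inner key is still present
my $state = exists $HoH{a}{ab} ? 'still there' : 'gone';
print "ab is $state\n";               # ab is gone

print join( ' ', sort keys %{ $HoH{a} } ), "\n";    # aa ac
```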

In the following example I’ll show you how to delete a specific key of the Perl hash of hashes or empty the hash of hashes (%HoH) using the delete function.

Here is the snippet:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my %HoH = (
  a => {aa => 1, ab => 2, ac => 3},
  b => {ba => 1},
  c => {ca => 1, cb => 2}
);   
 
# clear a particular inner hash
#%{$HoH{'c'}} = ();
$HoH{'c'} = {};
 
# delete the outer hash entry
delete $HoH{'c'};
 
# print the %HoH hash of hashes
use Data::Dumper;
print Dumper(\%HoH);
First, I populated the %HoH hash of hashes with a few entries. This outer hash has three keys: a, b, c, each key being associated with a reference to an anonymous hash. To create references to anonymous hashes, the {} hash constructor was used.

To delete a particular key of the outer hash, calling delete on that key is enough. In our example we intend to delete the key c.

The value associated with this key is a reference to an anonymous hash. Perl uses reference counting, so the inner hash is freed automatically once the last reference to it goes away; emptying it first is not required. If you still want to empty the inner hash before removing the entry, as the snippet does, you can use either:

  • %{$HoH{c}} = () where an empty list is assigned to the inner hash (here %{} is used to dereference the hash reference), which empties the hash in place, or
  • $HoH{c} = {} where the outer entry is pointed at a new, empty anonymous hash

Next, you can delete the outer hash entry associated with the c key.

To print the resulting hash, the Data::Dumper module is used:

$VAR1 = {
          'a' => {
                   'ab' => 2,
                   'ac' => 3,
                   'aa' => 1
                 },
          'b' => {
                   'ba' => 1
                 }
        };
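
The two ways of emptying an inner hash only differ when something else still holds a reference to it. The sketch below illustrates this under the assumption that two extra references, here called $aliasB and $aliasC, were taken beforehand; these names are not part of the original snippet.

```perl
use strict;
use warnings;

my %HoH = (
    b => { ba => 1 },
    c => { ca => 1, cb => 2 },
);

# extra references to the inner hashes (illustrative)
my $aliasB = $HoH{b};
my $aliasC = $HoH{c};

%{ $HoH{b} } = ();    # empties the inner hash in place
$HoH{c} = {};         # points the outer entry at a new empty hash

my $countB = keys %$aliasB;    # the shared hash is now empty
my $countC = keys %$aliasC;    # the old hash still has its two keys
print "b: $countB, c: $countC\n";    # b: 0, c: 2

# removing the outer entries needs nothing more than delete
delete $HoH{b};
delete $HoH{c};
my $left = keys %HoH;
print "outer keys left: $left\n";    # outer keys left: 0
```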

Here you can see an example of how to use delete to clear a Perl hash of hashes (%HoH).

The keys function returns a list with the outer hash keys and the foreach loop is used to traverse this list.

# clear the HoH
foreach (keys %HoH) {
  %{$HoH{$_}} = ();
  delete $HoH{$_};  
}
Inside the foreach loop, each key of the outer hash is assigned in turn to $_ and:
 
  • the inner hash corresponding to the current key is cleared
  • the corresponding outer hash entry is deleted
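
If the goal is simply to end up with an empty hash of hashes, assigning an empty list to the outer hash should give the same result as the loop, since inner hashes that are no longer referenced from anywhere are released by Perl. A minimal sketch (the $remaining variable is illustrative):

```perl
use strict;
use warnings;

my %HoH = (
    a => { aa => 1, ab => 2, ac => 3 },
    b => { ba => 1 },
    c => { ca => 1, cb => 2 },
);

# remove every outer entry in one step
%HoH = ();

my $remaining = keys %HoH;
print "outer keys left: $remaining\n";    # outer keys left: 0
```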

A hash of hashes (%HoH) is a hash whose values are references to other hashes.

There are two ways to copy a hash of hashes into a new one:

A shallow copy: only the top-level content of the hash of hashes is copied into the new one. You can do this with a simple assignment, as shown below:

my %newHoH = %HoH;
Please note that by using this method you just copy the (key, val) pair elements from %HoH into %newHoH. The two hashes will share the inner hashes, in such a way that if you change the content of an inner hash, both %HoH and %newHoH are changed as they both point to the same anonymous hash.
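
The sharing can be observed directly with a short sketch like the following; the one-entry hash and the $shared and $copied variables are illustrative.

```perl
use strict;
use warnings;

my %HoH = ( 1 => { name => 'John', age => 20 } );

# shallow copy: only the (key, reference) pairs are copied
my %newHoH = %HoH;

# a change made through %HoH is visible through %newHoH as well
$HoH{1}{age} = 40;
print "$newHoH{1}{age}\n";    # 40

# both outer hashes hold a reference to the very same inner hash
my $shared = $HoH{1} == $newHoH{1} ? 'shared' : 'separate';
print "$shared\n";            # shared

# the outer hashes themselves are independent of each other
$HoH{2} = { name => 'Marry', age => 25 };
my $copied = exists $newHoH{2} ? 'yes' : 'no';
print "key 2 in copy: $copied\n";    # key 2 in copy: no
```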

A deep copy: both the pair elements of the hash of hashes and the content of the inner hashes are copied. In this case the hash references will point to different memory locations.

The following example shows how to use a recursive subroutine to copy all the data contained in the hash of hashes:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my %HoH = (
 1 => {name => 'John', age => 20},
 2 => {name => 'Mary', age => 25},
);
 
my %newHoH = clone(%HoH);
 
# for the hash having the key equal with 1, in the inner hash
# referenced by the value associated with this key, we alter
# the value corresponding to the age key from 20 to 40
 
$HoH{1}{age} = 40;
 
# print the %HoH
print "\%HoH:\n";
printHoH(\%HoH);
 
# print the %newHoH
print "\n\%newHoH:\n";
printHoH(\%newHoH);
 
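# Recursively copy a list of values: plain scalars are returned as they
# are, and each hash reference is replaced by a reference to a new hash
# holding a cloned copy of its content (only hash references are handled)
sub clone {
  map { ! ref() ? $_ : {clone(%$_)} } @_;
}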
 
sub printHoH {
  my %HoH = %{shift()};
  foreach my $oKey( keys %HoH ) {
    print "$oKey: { ";
    foreach my $iKey ( keys %{ $HoH{$oKey} } ) {
      print "$iKey=$HoH{$oKey}{$iKey} ";
    }
    print "}\n";
  }
}
The output is as follows:
 
%HoH:
1: { name=John age=40 }
2: { name=Mary age=25 }
 
%newHoH:
1: { name=John age=20 }
2: { name=Mary age=25 }

In this example we use the clone() subroutine to copy all the elements of our hash of hashes. This subroutine has as arguments the elements of the %HoH hash.

Inside the body of the subroutine we use the Perl map function, which loops through @_ (the special @_ array holds the values passed to the subroutine).

At each iteration step the current element of the @_ array is assigned to $_.

Inside the map block the ? ternary operator and the Perl ref function are used to test if an element of @_ array is a reference.

The map function will return:

  • the value stored in $_ if this value is not a reference, otherwise
  • a reference to a new independent anonymous hash created by the {} hash constructor; at the same time the subroutine calls itself again to copy the elements of the hash referenced by $_

After the hash was duplicated, we altered the %HoH element whose key is 1: in the anonymous hash referenced by the value associated with this key, we changed the value corresponding to the age key from 20 to 40.

As you can see from the output, the contents of the two hashes are different: the %newHoH hash has not been affected by this change.

To print a hash of hashes, we used the printHoH subroutine and two nested foreach. This subroutine has as parameter a reference to a hash of hashes.

To get the subroutine parameter we wrote shift() with parentheses so that Perl treats shift as a function call; a bare %{shift} would be interpreted as the hash variable %shift.

The % symbol was used to dereference the reference returned by shift. Please note that inside the body of the printHoH subroutine %HoH is a local variable that we use to store the hash of hashes we intend to print.

For more complicated structures you can use the Storable module, which provides the dclone function for making recursive copies (see perlfaq4).
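
A minimal sketch of the Storable approach is shown below. dclone takes a reference and returns a reference to the copy, so the result is dereferenced to fill %newHoH; the data is the same illustrative hash used earlier.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %HoH = (
    1 => { name => 'John',  age => 20 },
    2 => { name => 'Marry', age => 25 },
);

# dclone returns a reference to a deep copy of the structure
my %newHoH = %{ dclone( \%HoH ) };

# changing the original leaves the copy untouched
$HoH{1}{age} = 40;
print "original: $HoH{1}{age}, copy: $newHoH{1}{age}\n";
# original: 40, copy: 20
```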

Let’s say you have a Perl hash of hashes (%HoH) and you want to use it within a subroutine. A common way to do this is by passing the Perl hash of hashes by reference.

See a simple example below:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my %HoH = (
  1 => {name => 'John', age => 20},
  2 => {name => 'Marry', age => 25},
);
 
# invoke the subroutine
myPrint(\%HoH);
 
sub myPrint{
  my $hashRef = shift;
  foreach my $oKey ( sort keys %$hashRef ) {
    print "$oKey => {";
    foreach my $iKey ( keys %{$hashRef->{$oKey}} ) {
      print " $iKey => $hashRef->{$oKey}{$iKey}, ";
    }
    print "}\n"
  }
}

This script produces the following output:

1 => { name => John,  age => 20, }
2 => { name => Marry,  age => 25, }

First we populate the %HoH hash of hashes with a few entries.

The {} is the hash constructor and returns a reference to an anonymous hash whose elements are included between braces.

The myPrint subroutine is used to print the hash of hashes. It takes a reference to a hash of hashes as its argument.

Inside the subroutine body we use the shift function to retrieve the argument, assigning it to the $hashRef scalar variable. So in $hashRef we have a reference to our hash of hashes. To dereference the hash references, we prefix them with a % sign.

To print the Perl hash of hashes we used two nested foreach loops and the keys function. The loops iterate through the hashes by using two iterators:

  • $oKey for the outer hash
  • $iKey for the inner hashes

In a similar way you can adapt the subroutine to perform whatever processing you need on the hash of hashes.
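
Because the subroutine receives a reference, it works on the caller's data and not on a copy, so it can also modify the hash of hashes. The sketch below illustrates this with a hypothetical addYear subroutine that is not part of the original script.

```perl
use strict;
use warnings;

my %HoH = (
    1 => { name => 'John',  age => 20 },
    2 => { name => 'Marry', age => 25 },
);

# hypothetical helper: adds one year to every age, in place
sub addYear {
    my $hashRef = shift;
    $_->{age}++ for values %$hashRef;
    return;
}

addYear( \%HoH );

print "$HoH{$_}{name} is $HoH{$_}{age}\n" for sort keys %HoH;
# John is 21
# Marry is 26
```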

In a Perl hash of hashes (%HoH), you can use the exists function to avoid autovivification when you don’t intend to use it.

See the following example:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
# initialize a hash of hashes
my %HoH = (
 item1 => {color => 'blue', height => 123},
 item2 => {color => 'yellow', height => 43}
);
 
# autovivification
$HoH{item3}{color} = 'red'; 
$HoH{item3}{height} = 100;
defined $HoH{item4}{color} || print "color not found\n";
 
use Data::Dumper;
print Dumper \%HoH;
First we populate a hash of hashes with a few entries. A hash of hashes is a hash whose values are references to other hashes. To get references to other hashes, the {} hash constructor was used.

Now let’s pay a bit of attention to this code.

The first assignment statement:

$HoH{item3}{color} = 'red'; 

adds an entry to our hash of hashes. Because $HoH{item3} doesn’t exist, it will be created with an appropriate value, so you don’t need to create the inner hash yourself ($HoH{item3} = {}). This process is called autovivification and it is very useful for this kind of assignment. The expression can be arbitrarily complicated and Perl will create all the structures it needs to make the assignment.

But if you look at the following statement:

defined $HoH{item4}{color} || print "color not found\n";

you will see that Perl first evaluates the defined $HoH{item4}{color} expression and, because the result is false, the print function is executed. But in order to look up the color key, Perl has to create the item4 entry, which then remains as a key in the %HoH hash of hashes. This time autovivification enlarged our %HoH structure with an unnecessary entry.

Please note the use of the || short-circuit operator, which evaluates its second operand only if the first operand is false.

To see what is happening, I printed the hash using the Data::Dumper module. The output of this script is as follows:

color not found
$VAR1 = {
          'item3' => {
                       'color' => 'red',
                       'height' => 100
                     },
          'item1' => {
                       'color' => 'blue',
                       'height' => 123
                     },
          'item2' => {
                       'color' => 'yellow',
                       'height' => 43
                     },
          'item4' => {}
        };
 
As you can see, the item4 key now has an empty hash reference as its value.

As I mentioned at the beginning of this section, to avoid autovivification in this last case, you can use the Perl exists function:

if(exists $HoH{item4} && defined $HoH{item4}{color}) {
 print "color found\n"; 
};

Here we first test whether $HoH{item4} exists, and only afterwards check whether the $HoH{item4}{color} expression is defined. This works because && is a short-circuit operator: it evaluates its second operand only if the first operand is true.
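
To check that the guarded test really leaves the structure alone, you can look at the outer hash afterwards. A minimal sketch, with illustrative $created and $total variables:

```perl
use strict;
use warnings;

my %HoH = (
    item1 => { color => 'blue',   height => 123 },
    item2 => { color => 'yellow', height => 43 },
);

# exists on the outer key stops the test before the inner lookup
if ( exists $HoH{item4} && defined $HoH{item4}{color} ) {
    print "color found\n";
}
else {
    print "color not found\n";
}

# the item4 entry was not created by the test
my $created = exists $HoH{item4} ? 'yes' : 'no';
print "item4 created: $created\n";    # item4 created: no

my $total = keys %HoH;
print "outer keys: $total\n";         # outer keys: 2
```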