Perl unpack Function

The unpack function expands a binary string into a list of values by using a template format.

The syntax form of the unpack function is as follows:

LIST = unpack TEMPLATE, EXPR
 
The TEMPLATE consists of a sequence of characters, each describing one field of the data (the most frequently used ones are listed below). One or more modifiers may follow some letters in the template: for instance, each letter may be followed by a number giving a repeat count, or by a * meaning to use however many items are left.

The EXPR is a string expression holding the structure to be expanded into a list. If EXPR is omitted, unpack works on the $_ special variable.
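A minimal sketch of both call forms, assuming a reasonably recent Perl (the $_ default is not available in very old releases); the string 'Hi' is just sample data:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

# explicit EXPR: unpack the string 'Hi' into its character codes
my @codes = unpack 'C*', 'Hi';

# EXPR omitted: unpack works on $_, which the foreach loop sets to 'Hi'
my @fromTopic;
foreach ('Hi') {
    @fromTopic = unpack 'C*';
}

print "@codes | @fromTopic\n";
# it prints: 72 105 | 72 105
```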

Here are the most frequent template characters for pack and unpack: a, A, b, B, c, C, d, f, h, H, i, I, l, L, n, N, s, S, U, v, V, x, X.

The following example shows you how to use the Perl unpack function with the 'a' template. Used with the Perl pack function, this template lets you pad a binary string with nulls. Used with the unpack function, the 'a' template returns the full field exactly as it is.

See the following code:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $str = pack 'a8', '32ab';     # "32ab\0\0\0\0"
 
foreach(unpack("(a1)*", $str)) {
    print sprintf("%x", ord), " ";
}
print "\n";
# it prints: 33 32 61 62 0 0 0 0
 
$str = pack 'A8', '32ab';     # "32ab    "
foreach(unpack("(a1)*", $str)) {
    print sprintf("%x", ord), " ";
}
print "\n";
# it prints: 33 32 61 62 20 20 20 20
In the first example the 'a' template pads the string with nulls, and in the second example the 'A' template pads it with spaces. In both cases unpack is called with the 'a' template, and, as you can see, we get back the string exactly as pack produced it.

A few words about the pack call in the first example: 'a8' is the template, '32ab' is the string to be converted, and the result is assigned to $str. The 8 in the template is a repeat count: pack appends null bytes until the result is 8 characters long (a longer string would be truncated to 8 characters).

The template for the Perl unpack function uses parentheses to group things (as regular expressions do), and the group is followed by a * repeat count, which repeats the group for as many items as are left.

The ord function returns the numeric code of each character and sprintf formats that code in hexadecimal, so you can see the value of every byte. ord is called without arguments, which means it works on the $_ special variable: $_ is the default iterator of the foreach statement, and at each iteration step it holds the current element of the list returned by unpack.

The 'A' template is similar to the 'a' template, except that it pads with spaces instead of nulls. Used with the pack function, this template lets you pad an ASCII string with spaces. Used with the Perl unpack function, the 'A' template strips trailing spaces and nulls from the field.

To illustrate this, I’ll rewrite the example supplied for the 'a' template:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
# the string is padded with spaces
my $str = pack 'A8', '32ab';     # "32ab    "
 
foreach(unpack("(A1)*", $str)) {
    print sprintf("%x", ord), " ";
}
print "\n";
# it prints: 33 32 61 62 0 0 0 0
 
# the string is padded with nulls
$str = pack 'a8', '32ab';     # "32ab\0\0\0\0"
foreach(unpack("(A1)*", $str)) {
    print sprintf("%x", ord), " ";
}
print "\n";
# it prints: 33 32 61 62 0 0 0 0
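As you can see, with the 'A' template every single-byte field that holds only a space or a null is stripped to an empty string, and ord of an empty string is 0. That is why both loops print 0 for the four padding bytes: unpack with 'A' removes trailing spaces and trailing nulls alike.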
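The stripping is easier to see when the whole 8-byte field is unpacked at once instead of one byte at a time. A small sketch that compares the lengths of the results:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my $spaced = pack 'A8', '32ab';    # "32ab    "
my $nulled = pack 'a8', '32ab';    # "32ab\0\0\0\0"

# 'a' returns the field untouched; 'A' strips trailing spaces and nulls
my $rawSpaced  = unpack 'a8', $spaced;
my $trimSpaced = unpack 'A8', $spaced;
my $trimNulled = unpack 'A8', $nulled;

printf "%d %d %d\n", length $rawSpaced, length $trimSpaced, length $trimNulled;
# it prints: 8 4 4
```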

The 'b' and 'B' formats of the Perl pack function pack strings of 0 and 1 characters into bytes. The Perl unpack function gets the list of 0’s and 1’s back from the bit string.

A byte consists of a group of 8 bits as in the following figure:

    1 0 1 1 0 0 1 0

   MSB           LSB

LSB stands for the least significant bit, sometimes referred to as the rightmost bit. MSB is the most significant bit, sometimes referred to as the leftmost bit. In the above example, MSB = 1 and LSB = 0.

For example, let’s say you want to represent the decimal number 178 as a string of bits. You can write this number either as '10110010', starting with the MSB, or as '01001101', starting with the LSB. Perl has a template for each of these representations: 'B' and 'b' respectively.

The following example shows you how to pack the above number using the two templates:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
# starting with MSB
my $nr = ord pack ('B8', '10110010');
print "$nr\n";
# it prints 178 (128 + 32 + 16 + 2)
 
# starting with LSB
$nr = ord pack ('b8', '01001101');
print "$nr\n";
# it prints 178 (2 + 16 + 32 + 128)
In this representation, the count refers to the number of bits to be packed - in the above example the count is 8.
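Unpacking works the same way in reverse. A short sketch that reads the single byte 178 (built here with chr) with both templates:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my $byte = chr 178;                  # the single byte 0xB2

my $msbFirst = unpack 'B8', $byte;   # bits from the MSB down to the LSB
my $lsbFirst = unpack 'b8', $byte;   # bits from the LSB up to the MSB

print "$msbFirst $lsbFirst\n";
# it prints: 10110010 01001101
```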

You can use the pack function with the 'b*' format to translate a string of 0’s and 1’s into a bit string, and the Perl unpack function to get the list of 0’s and 1’s back from the bit string (the '*' is like a wildcard for more of the same). It is important to use the same format ('b' or 'B') for both pack and unpack (i.e. if you packed a number with the 'b' template, you must use the same template to unpack the bit string).

Here’s an example for the 'b' format:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my @bitArray = qw(1 0 0 0 1 1 1 1 0 0 1 1);
my $bitString = pack 'b*', join('', @bitArray);
 
@bitArray = split(//, unpack('b*', $bitString));
print "@bitArray\n";
# it prints:      1 0 0 0 1 1 1 1 0 0 1 1 0 0 0 0
The join function turns @bitArray into a string of '0' and '1' characters, which pack then converts into a bit string. Please note that our initial array had only 12 elements, so pack filled the last 4 bits of $bitString with 0 to complete the second byte.

You can rewrite the previous example by using both pack and Perl unpack functions with the 'B*' format:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my @bitArray = qw(1 0 0 0 1 1 1 1 0 0 1 1);
my $bitString = pack 'B*', join('', @bitArray);
@bitArray = split(//, unpack('B*', $bitString));
print "@bitArray\n";
# it prints:      1 0 0 0 1 1 1 1 0 0 1 1 0 0 0 0
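In both examples unpack returns 16 bits, because the 4 padding bits became part of $bitString. If you know how many bits were packed, a sketch like this one passes that number as the repeat count so that only the original bits come back (shown for 'b'; 'B' works the same way):

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my @bitArray  = qw(1 0 0 0 1 1 1 1 0 0 1 1);
my $bitString = pack 'b*', join('', @bitArray);

# pass the original bit count (12) so the 4 padding bits are ignored
my $count = scalar @bitArray;
my @back  = split //, unpack("b$count", $bitString);

print "@back\n";
# it prints: 1 0 0 0 1 1 1 1 0 0 1 1
```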

The 'c' template format is for a signed char (8-bit) value and the 'C' template format is for an unsigned char (octet) value.

Here're a few examples for the 'c' template:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $str = pack 'cccc', 97, -98, 99, -100;
my @ints = unpack 'cccc', $str;
# or my @ints = unpack 'c4', $str;
# or my @ints = unpack 'c' x 4, $str;
# or my @ints = unpack 'c*', $str;
print "@ints\n";
# it prints 97 -98 99 -100

If you use the 'c*' template you don’t need to count the elements of the list argument.

You can use the 'C' template in a similar way:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my @ints = unpack 'C*', 'white';
print "@ints\n";
# it prints 119 104 105 116 101
Here 119, 104, … are the numeric values of the ASCII 'w', 'h', … characters.
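The difference between 'c' and 'C' only shows up for byte values above 127. A small sketch with the sample byte 0xFF:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my $byte = "\xFF";

my $signed   = unpack 'c', $byte;   # read as a signed 8-bit value
my $unsigned = unpack 'C', $byte;   # read as an unsigned octet

print "$signed $unsigned\n";
# it prints: -1 255
```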

You can use the Perl unpack function to compute checksums, by preceding the 'c' or 'C' specifier with a percent sign and a number indicating how many bits of checksum are desired.

See the following code for an example:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $str = pack 'c*', 10, 20, 30;
my $int = unpack '%3c*', $str;
print "$int\n";
# it prints 4
The extracted items are summed together into a single item. In our example 10 + 20 + 30 = 60, which is 111100 in binary. A 3-bit checksum keeps only the low three bits, 100, which is 4 in decimal (60 modulo 8); the result does not depend on the operating system.

Notice that if unpack is used to unpack a single item as in the above example, you can store the item either in an array variable or in a scalar variable. If you use an array variable to store the result, the resulting array consists of a single element.
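An N-bit checksum is the sum of the values modulo 2**N. The sketch below checks this for a few bit widths, reusing the values 10, 20 and 30 from the example above:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my @values = (10, 20, 30);
my $str    = pack 'c*', @values;

my $total = 0;
$total += $_ foreach @values;        # 60

foreach my $bits (3, 4, 8) {
    my $checksum = unpack "%${bits}c*", $str;   # e.g. '%3c*'
    my $expected = $total % (2 ** $bits);
    print "$bits bits: $checksum (expected $expected)\n";
}
# it prints:
# 3 bits: 4 (expected 4)
# 4 bits: 12 (expected 12)
# 8 bits: 60 (expected 60)
```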

The 'd' format is for a double-precision (usually 64-bit) floating-point number in native machine format. Floats and doubles are stored in the native machine format, which means that a float or double packed on one machine may not be readable on another.

See an example here:

#!/usr/bin/perl
 
use warnings;
use strict;
 
print unpack 'd', pack ('d', -573.56782345612);
print "\n";
 
# it prints -573.56782345612
Because Perl uses doubles internally for numeric calculations, a round trip through pack and unpack with 'd' generally gives you back the number you packed without losing precision.

The 'f' format is for a single-precision (32-bit) floating-point number in native machine format. Because of the variety of floating-point formats around, floating-point data written on one machine may not be readable on another, for instance when the two machines have different endianness.

You can use pack to pack the floating point numbers and Perl unpack to get them back.

You can use this format like in the following lines of code:

#!/usr/bin/perl
 
use warnings;
use strict;
 
my $str = pack 'f', 123.13421;
my $float = unpack 'f', $str;
print "$float\n";
 
# it prints: 123.134208679199
Because Perl uses doubles internally for numeric calculations, converting a double into a float and back into a double results in a loss of precision. You can see in the above example that the number retrieved by unpack differs slightly from the number packed with pack.
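The precision difference between 'd' and 'f' is easy to check. A sketch, assuming perl was built with ordinary IEEE 754 doubles as its numeric type (the usual default), which makes 'd' 8 bytes and 'f' 4 bytes; 0.1 is chosen because it has no exact binary representation:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my $x = 0.1;

# round trip through a double and through a single-precision float
my $viaDouble = unpack 'd', pack('d', $x);
my $viaFloat  = unpack 'f', pack('f', $x);

printf "d: %d bytes, round trip %s\n",
    length pack('d', $x), ($viaDouble == $x ? 'exact' : 'lossy');
printf "f: %d bytes, round trip %s\n",
    length pack('f', $x), ($viaFloat == $x ? 'exact' : 'lossy');
# on a typical machine it prints:
# d: 8 bytes, round trip exact
# f: 4 bytes, round trip lossy
```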

If you have more single-precision float numbers to pack, you can use the '*' repeat pack-format that will pack all the available float numbers from the list:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my @floatArray = (23.13421, 112.78, 77.896);
@floatArray = unpack ('f*', pack('f*', @floatArray));
print "@floatArray\n";
# it displays: 23.1342105865479 112.779998779297 77.8960037231445
Here the pack function will return a string with 3 single-precision float numbers packed into the specific native machine format. The Perl unpack function will unpack the 3 numbers from the packed resulting string into an array.

Finally, the array with the result will be printed.

The 'h' template format packs a hex string with the low nibble first, while the 'H' template format packs it with the high nibble first (a nibble is 4 bits, half an octet, and corresponds to a single hexadecimal digit).

To get back the unaltered string, use the Perl unpack function with the same template format you used for pack. If you pack a string with the 'H' template format and unpack it with the 'h' format, you get the bytes in the same order but with their nibbles swapped, as you can see in the next snippet:

#!/usr/bin/perl
 
use warnings;
use strict;
 
my $str = pack 'H*', '6162636465';   # "abcde"
print unpack ('H*', $str), "\n";  # it prints: 6162636465
print unpack ('h*', $str), "\n";  # it prints: 1626364656
Here I put a * character inside the template, to avoid counting the hex characters of the string argument.
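Because 'H*' turns every byte into two hex digits, unpack with this template also makes a quick hex dump, and pack with the same template reverses it. A sketch using 'Perl' as sample data:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my $hex = unpack 'H*', 'Perl';    # two hex digits per byte, high nibble first
print "$hex\n";
# it prints: 5065726c

my $back = pack 'H*', $hex;       # and back to the original bytes
print "$back\n";
# it prints: Perl
```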

The 'i' template format is used for signed integers, while the 'I' template format is for unsigned integers. You can use the Perl pack function to convert one or more integers into a string and the unpack function to get back the list with the packed integers.

Bear in mind that the 'i' and 'I' formats are machine dependent: if you pack a list of integers or unsigned integers into a string on one machine and unpack the string on another, you may get back unexpected values.

Now let’s see a short example about how to use these templates:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my @integerArray = (150, 160, 170, 180, 190);
@integerArray = unpack 'i*', pack('i*', @integerArray);
 
print "@integerArray\n";
 
# it displays: 150 160 170 180 190
If you have many integers to pack/unpack, you can use the '*' repeat pack/unpack format that will pack/unpack all the integers available in the list.

Here the pack function returns a string with the 5 integers packed in your machine’s native integer format.

The Perl unpack function unpacks the 5 integers from that string into an array.

Finally, the array is printed. As you can see, its content is identical to the content of the initial array.

If you need to deal with unsigned integers, the usage is similar.
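To see which templates depend on the machine, a sketch can pack a single 0 with each one and print the number of bytes produced; 'l!' (a native long) is included for comparison:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

# pack a single 0 with each template and count the resulting bytes
foreach my $template (qw(s l n N v V i l!)) {
    printf "%-3s %d bytes\n", $template, length pack($template, 0);
}
# s, n and v always give 2 bytes and l, N and V always give 4 bytes;
# the sizes printed for i and l! depend on the machine and the perl build
```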

The 'l' format is for a signed long value, while the 'L' format is for an unsigned long value. Both always produce a four-byte (32-bit) number, whatever the size of the platform’s native long.

The order of those four bytes depends on whether the machine is little- or big-endian (endianness refers to the way a number is stored in memory; for example, on x86 machines running Windows the hex word 0x1234 is stored in memory as 0x34 0x12, so the little end is stored first).

See the following lines of code for a short example:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $str = pack 'l', 0x61626364;
print "$str\n";
# it prints dcba (on Intel)
 
print unpack 'l', $str;
# it prints 1633837924
This pack call returns a four-byte string: dcba if the machine is little-endian (such as an x86 machine running Windows) or abcd if the machine is big-endian.

Here 61, 62, 63 and 64 are the hex ASCII codes of the a, b, c and d characters.

The Perl unpack function returns a list consisting of one element only: the numeric value of the 0x61626364 hex value.

The print function will print the long number returned by unpack.

For unsigned long values the usage is similar; have a look at the following snippet:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $str = pack 'L*', 0x61626364, 0x65666768, 0x69707172;
my @longsArray = unpack 'L*', $str;
print "@longsArray\n";
# it prints 1633837924 1701209960 1768976754
 
my $lastLong = unpack '@* X4 L', $str;
print "$lastLong\n";
# it prints 1768976754
The 'L*' format means that we’ll pack as many unsigned long values as are available and the result will be stored in the $str variable. Next the unpack function is used to get back the values and store them in @longsArray. After that the print function is used to see the values.

If you want to get the last unsigned long value packed in $str, you can use the '@* X4 L' template format for the unpack function. In this template we used the following specifiers:

  • @* – to jump to the end of the string stored in $str
  • X4 – to back up four bytes
  • L – to unpack the last four bytes as a long unsigned integer

Please note that with the 'l' and 'L' template formats, pack and unpack may not behave the same way on different machines (depending on endianness).
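If a file or protocol fixes the byte order, the '<' (little-endian) and '>' (big-endian) modifiers, available since Perl 5.10, make 'l' and 'L' behave the same on every machine. A sketch reusing the 0x61626364 value from above:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my $big    = pack 'l>', 0x61626364;   # most significant byte first
my $little = pack 'l<', 0x61626364;   # least significant byte first

print "$big $little\n";
# it prints: abcd dcba (on any machine)

print unpack('l>', $big), " ", unpack('l<', $little), "\n";
# it prints: 1633837924 1633837924
```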

The 'n' format is for a 16-bit unsigned short in network byte order (big-endian) and the 'N' format is for a 32-bit unsigned long in network byte order. Network protocols such as TCP/IP transmit multi-byte numbers in this order, so you need these formats when you build or parse such data.

Endianness refers to the way a number is stored in memory: on a little-endian machine, such as an x86 machine running Windows, the hex word 0x1234 is stored as 0x34 0x12 (the little end first), while on a big-endian machine it is stored as 0x12 0x34 (the most significant byte first). Many network protocols can be regarded as big-endian because they send the most significant byte first.

The following code snippet shows two simple examples about how you can use these templates in your scripts:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $buff = pack 'n', 1234, 235;
my @array = unpack 'n*', $buff;
 
print "@array\n";
# it prints: 1234
 
$buff = pack 'N*', 45320..45325;
@array = unpack 'N*', $buff;
 
print "@array\n";
# it displays: 45320 45321 45322 45323 45324 45325
In the first example, because we did not provide a repeat count in the template, pack packs only the first number into $buff. The second number (235) is ignored.

In the second example, the '*' repeat pack-format was used so you don’t need to provide the count of the numbers you intend to pack. The Perl unpack function was used to extract the numbers from the packed $buff string and populate an array with them.
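The byte layouts behind these templates can be made visible with unpack and the 'H*' template. A sketch that packs the sample value 0x1234 in network order ('n', 'N') and in little-endian "VAX" order ('v', 'V'):

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

foreach my $template (qw(n v N V)) {
    my $bytes = pack $template, 0x1234;
    printf "%s: %s\n", $template, unpack('H*', $bytes);
}
# it prints:
# n: 1234
# v: 3412
# N: 00001234
# V: 34120000
```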

If your protocol needs to send a message prefixed with its length, you can do something similar to the following example:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $msg = "The message text";
my $buff = pack ('NA*', length($msg), $msg);
 
my $length;
($length, $msg) = unpack 'NA*', $buff;
 
print "length=$length, Message=\"$msg\"\n";
 
# it prints: length=16, Message="The message text"
The $msg variable holds the message to send. The $buff variable receives the packed length of the message followed by the message itself.

To get back the message and its length, the unpack function is used. Both pack and unpack functions were used with the 'NA*' template format where:

  • N means an unsigned long in a network byte order
  • A* is used to pack/unpack the message (note that on unpack, 'A' also strips any trailing spaces from the message)
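Two refinements are worth knowing here: the '/' construct lets pack write the length for you, and 'a*' (unlike 'A*') keeps trailing spaces on unpack. A sketch with a sample message that ends in two spaces:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my $msg = "The message text  ";       # note the two trailing spaces

# 'N/a*' packs the length as an 'N' value, followed by the bytes themselves
my $buff = pack 'N/a*', $msg;
my $back = unpack 'N/a*', $buff;      # reads the length, then that many bytes

# for comparison: 'NA*' strips the trailing spaces when unpacking
my (undef, $stripped) = unpack 'NA*', pack('NA*', length($msg), $msg);

print "[$back] [$stripped]\n";
# it prints: [The message text  ] [The message text]
```

With 'N/a*' the receiver does not need to know the length in advance, and binary payloads that end in spaces or nulls survive the round trip.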

The 's' and 'S' formats are for signed and unsigned short numbers. If you transfer data across the network or onto the disk of another computer, you must consider the endianness of both computers, because integers and floating-point numbers may be stored in memory in different byte orders. Take this into consideration when you use the 's' or 'S' formats.

The 's' format is for a signed 16-bit value, while the 'S' format is for an unsigned 16-bit value. A short example of how to use them:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $i16 = pack 's*', 21, -77, 100, -256;
my @array = unpack 's*', $i16;
print "@array\n";
# it prints: 21 -77 100 -256
 
$i16 = pack 'S*', 21, 77, 100, 256;
@array = unpack 'S*', $i16;
print "@array\n";
# it prints: 21 77 100 256
In this example the 's' and 'S' formats are combined with the '*' repeat count, which lets pack and unpack process as many short integers as needed.

You can determine the endianness of your system by using the 's' format, as you can see in the example below:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $v = unpack("h*", pack("s", 1));
if($v =~ /^1/) {
  print "Little endian system\n";
} elsif ($v =~ /01/) {
  print "Big endian system\n";
} else {
  print "Unknown endian format\n";
}
print "$v\n";  
# on my Windows system it displays: 1000
On my local Windows computer (which is little-endian), this code printed the message 'Little endian system'. The Perl unpack function with the 'h*' template shows the bytes of the packed number in hex, low nibble first, so the output starts with 1 when the least significant byte is stored first.
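A shorter check compares the native 's' layout with the always-little-endian 'v' layout. The sketch below also prints $Config{byteorder} from the core Config module (for example 1234 or 12345678 on little-endian builds) as a cross-check:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;
use Config;

# 's' uses the native byte order, while 'v' is always little-endian,
# so the two byte strings are equal only on a little-endian machine
my $isLittle = pack('s', 1) eq pack('v', 1);
my $kind     = $isLittle ? 'Little' : 'Big';

print "$kind endian system\n";
print "Config byteorder: $Config{byteorder}\n";
```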

The 'U' template format of the pack function lets you pack a Unicode number (code point) into its UTF-8 representation. The Unicode character set associates characters with integers, and converting the characters to UTF-8 lets you store only the bytes that are needed.

In UTF-8, characters with small code points need only one or two bytes, while characters with larger code points need three or four. For instance, the next example converts the smiley face Unicode character U+263A, which needs three bytes, into UTF-8:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $utfSmiley = pack 'U', 0x263A;
print "length of \$utfSmiley = ", length($utfSmiley),
      ", length of 0x263A = ", length(0x263A), "\n";
# it prints: length of $utfSmiley = 1, length of 0x263A = 4
 
my $uniSmiley = unpack 'U', $utfSmiley;
printf "%x\n", $uniSmiley;
# it prints 263a
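Note that length counts characters, not bytes: $utfSmiley holds a single character, while length(0x263A) returns 4 because the number 0x263A is stringified as the decimal "9786". To get back the code point, the Perl unpack function was used with the same 'U' template.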
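Since length reports characters rather than bytes, a different tool is needed to see the UTF-8 bytes themselves. A sketch that encodes the smiley with the core Encode module and then dumps the bytes with unpack 'C*':

```perl
#!/usr/local/bin/perl

use strict;
use warnings;
use Encode qw(encode);

my $smiley = pack 'U', 0x263A;           # one character, code point U+263A
my $bytes  = encode('UTF-8', $smiley);   # the same character as UTF-8 bytes

printf "%d character(s), %d byte(s): %s\n",
    length $smiley, length $bytes,
    join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
# it prints: 1 character(s), 3 byte(s): E2 98 BA
```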

Because of the endianness of a system, the integers and floating-point numbers are stored in a different order, so if you move binary data across the network, you could expect to meet some format issues.

One way to avoid this is to use 'U', the Unicode character number. You can use the pack function to encode a sequence of numbers as UTF-8 characters on one computer and the Perl unpack function to decode them on another. See the following example, where pack encodes a few integers as UTF-8 characters:

my @integers = (1234, 23, 456, 789);
my $utfIntegers = pack 'U*', @integers;
 
@integers = unpack 'U*', $utfIntegers;
print "@integers\n";
# it displays: 1234 23 456 789
You can use the 'U' format to encode the Unicode characters of an alphabet. For instance, the Unicode Hebrew alphabet ranges from 0x0590 to 0x05ff. The following example shows you how to pack and unpack the Hebrew Unicode alphabet:
 
my $utfHebr = pack 'U*', 0x0590..0x05ff;
my @UniHebr = unpack 'U*', $utfHebr;
 

The 'v' format is for 16-bit unsigned short numbers in "VAX" (little-endian) order. It is similar to the 'n' format, which uses "network" (big-endian) order. When you need to pack unsigned short numbers in little-endian order, you should use this format.

The 'V' template format is for unsigned long (32-bit) numbers in "VAX" (little-endian) order. It is similar to the 'N' format, which uses "network" (big-endian) order.

To get back the numbers, you can use the Perl unpack function.

The following example shows you how to use both these templates:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $str = pack 'vV', 25, 3167;
my @array = unpack 'vV', $str;
print "@array\n";
# it prints: 25 3167
The pack function is used to pack little-endian 16- and 32-bit unsigned integers. To get back the numbers, the unpack function is used.
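Little-endian binary file formats are a typical place for 'v' and 'V'. As an illustration, the sketch below builds and parses a 14-byte header laid out like the BMP file header (a 2-byte 'BM' signature, a 32-bit file size, two 16-bit reserved fields and a 32-bit pixel-data offset); the size and offset values are made up:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my $template = 'a2 V v v V';    # signature, file size, reserved, reserved, offset

# build a sample header (70 and 54 are just sample numbers)
my $header = pack $template, 'BM', 70, 0, 0, 54;

my ($signature, $fileSize, $reserved1, $reserved2, $offset) =
    unpack $template, $header;

print length($header), " bytes: $signature size=$fileSize offset=$offset\n";
# it prints: 14 bytes: BM size=70 offset=54
```

Because 'v' and 'V' fix the byte order, the same template reads the header correctly on both little- and big-endian machines.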

Use the unpack function with the 'x' format to skip forward one byte and with the 'X' format to skip backward one byte. Usually you need to skip several bytes, so these templates are typically followed by a count.

The following snippet shows you a sample about how you can use them in your scripts:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $str = "These flowers are beautiful";
# $first and $second are used instead of $a and $b, which are special variables used by sort
my ($first, $second) = unpack 'x6 A7 @* X9 A*', $str;
 
print "$first $second\n";
# it prints: flowers beautiful
In the above example, the Perl unpack function uses the following formats:
 
  • 'x6' to skip 6 bytes
  • 'A7' to grab the next 7 bytes and store them in $first
  • '@*' to jump to the end of $str
  • 'X9' to go backward 9 bytes
  • 'A*' to grab all the remaining bytes and store them in $second
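The '@' template moves to an absolute offset instead of a relative one, so the same two words can also be picked out by position. A sketch that assumes the offsets 6 and 18 have been worked out for this particular sentence:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my $str = "These flowers are beautiful";

# '@6' moves to byte 6 and '@18' to byte 18, both counted from the start
my ($first, $second) = unpack '@6 A7 @18 A*', $str;

print "$first $second\n";
# it prints: flowers beautiful
```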

To reverse the bits in each character of a string, you can use the split function to turn the string into an array of characters. Then you can use a foreach loop to iterate through this array.

Inside the foreach loop, calling unpack with 'B*' and then pack with 'b*' on each character does the job: the bits are read MSB first and written back LSB first.

For more details, see the code:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
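my $str = "abcdefghi";
my @chars = split //, $str;
 
foreach my $ch (@chars) {
  # reverse bits in each character: read MSB first, write back LSB first
  $ch = pack "b*", unpack "B*", $ch;
}
 
# join the modified characters back into a string and show it in hex
my $reversed = join '', @chars;
print unpack('H*', $reversed), "\n";
# it prints: 8646c626a666e61696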
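Since 'B*' and 'b*' both work byte by byte, the loop is not strictly needed: applying the same pack/unpack pair to the whole string reverses the bits of every character while keeping the characters in order. A sketch that prints the result in hex:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

my $str = "abcdefghi";

# read every byte MSB first, then write every byte back LSB first
my $reversed = pack 'b*', unpack 'B*', $str;

print unpack('H*', $reversed), "\n";
# it prints: 8646c626a666e61696
```

Applying the same line twice gives back the original string, which is a handy self-check.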

If you need to work with the ASCII values of a string’s characters, you can use the Perl unpack function with the 'C*' template. The 'C' format converts a character into its ASCII value and the '*' repeat count processes all the characters in the string. To process the values one by one, you can put the processing in a while loop.

You have a short sample here:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
my $str = "This is a script";
 
my @array = unpack 'C*', $str;
 
my $sum = 0;
while (@array) {
  my $val = shift @array;
  $sum += $val;
  printf ('%x ', $val);
}
print"\n";
# it prints: 54 68 69 73 20 69 73 20 61 20 73 63 72 69 70 74
print "\$sum = $sum\n";
# it prints: $sum = 1482
The ASCII values of the string characters were printed in hexadecimal using the '%x' format of the printf function. The shift function removes and returns the first element of @array. The while loop stops when @array has no more elements.

In the above example, the $sum scalar variable accumulates the sum of the character values. You can compute the same 32-bit checksum much faster with the following line of code:

$sum = unpack '%32C*', $str;

This section shows a few examples of how you can use the Perl unpack function to read fixed-length records from text files or binary files. Our mini application works with meteorological data: precipitation, pressure, temperature, humidity, wind speed and wind direction.

I created a sample text file, 'Meteo.txt', with the following structure (field widths in bytes) and content:

Field       Prec  Press  Temp  Humid  Wind Speed  Wind Dir
Bytes          5      6     6      5           6         6
Record 1    0.00  750.8  11.3   54.1       112.7     237.0
Record 2    0.03  750.7  11.4   52.7         2.8     238.0
Record 3    0.07  750.4   8.2   63.7         6.7     232.0
Record 4    0.01  747.8   6.1   76.1         5.8      95.4

The content of the file is as follows:

 0.00750.80 11.3054.10112.70237.00
 0.03750.70 11.4052.70  2.80238.00
 0.07750.40  8.2063.70  6.70232.00
 0.01747.80  6.1076.10  5.80 95.40

A field shorter than its width is preceded by one or more spaces (for example, 0.00 is preceded by one space because the precipitation field is 5 bytes wide). Our sample file has only 4 records.
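Because the unpack template just mirrors the field widths, it can be generated from them. A sketch, assuming the widths 5, 6, 6, 5, 6 and 6 that match the sample records, with the first record of the file as sample input:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

# field widths of Prec, Press, Temp, Humid, Wind Speed and Wind Dir
my @widths   = (5, 6, 6, 5, 6, 6);
my $template = join ' ', map { 'A' . $_ } @widths;   # 'A5 A6 A6 A5 A6 A6'

my $record = ' 0.00750.80 11.3054.10112.70237.00';   # first record of Meteo.txt

# 'A' keeps leading spaces, so add 0 to turn each field into a number
my @numbers = map { $_ + 0 } unpack $template, $record;

print "$template\n@numbers\n";
# it prints:
# A5 A6 A6 A5 A6 A6
# 0 750.8 11.3 54.1 112.7 237
```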

 

 How to use unpack to read and print a text file 

 

The following code snippet reads the above text file and gives you access to each field of a record. To keep things simple, let’s assume that we don’t have to deal with Unicode characters.

Please have a look at this snippet:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
open my $txtFile, '<', 'Meteo.txt' or die "Couldn't open file: $!\n";
my @fields;
my $template = 'A5 A6 A6 A5 A6 A6';
 
print "\n Prec   Press   Temp Humidity Wind Speed Wind Dir\n"; 
print " ----   -----   ---- -------- ---------- --------\n"; 
 
while (<$txtFile>) {
  chomp;
  @fields = unpack $template, $_;
  my ($prec, $press, $temp, $humidity,
      $windSpeed, $windDir) = @fields;
 
  # convert the scalar strings to scalar numbers
  $prec += 0; $press += 0; $temp +=0;
  $humidity += 0; $windSpeed += 0; $windDir += 0;
 
  my $str = sprintf '%5.2f%8.2f%7.2f%9.2f%11.2f%9.2f',
               $prec, $press, $temp, $humidity,
               $windSpeed, $windDir;
  print "$str\n";
}
The output is as follows:
 
 Prec   Press   Temp Humidity Wind Speed Wind Dir
 ----   -----   ---- -------- ---------- --------
 0.00  750.80  11.30    54.10     112.70   237.00
 0.03  750.70  11.40    52.70       2.80   238.00
 0.07  750.40   8.20    63.70       6.70   232.00
 0.01  747.80   6.10    76.10       5.80    95.40
 
Perl’s built-in unpack function is well suited to parsing structured data. Let’s take a look at the above example and see how it works.

Generally speaking, to read a text file whose records have a fixed format, you need:

  • $txtFile – a file handle
  • $recordSize – the length of the record in bytes
  • $template – the unpack template for the record
  • @fields – an array having the record fields as elements

In our example we need to read a text file and we don’t need the length of the record because the file records are delimited by newlines.

After opening the file and initializing the $template variable, all the work is done within a while loop. In the unpack template, the 'A' format returns the field with any trailing spaces and nulls removed (leading spaces are kept). For example, 'A5' is the template format for the first field of the record, Prec; the number 5 is the width of the field.

As I mentioned before, the records of the file are read inside a while loop. At each iteration step:

  • the current line of the file is assigned to the $_ special variable
  • the chomp function is called to remove the trailing newline from $_
  • the Perl unpack function uses the $template to decode the fields of record and store them in the @fields array
  • the elements of the @fields array are stored in individual variables
  • each individual variable ($prec, …) is converted to a scalar number by adding 0 to it
  • the sprintf function is used to format the field values into the $str variable; for example, for the $prec variable the %5.2f format was used (%f means a floating-point number in fixed decimal notation, 5 is the minimum field width and 2 specifies how many digits to show after the decimal point)
  • to show the $str variable, the print function was used

 

 How to use unpack to create a binary file from a text file 

 

First, let’s start by showing you the code:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
open my $txtFile, '<', 'Meteo.txt' or die "Couldn't open file: $!\n";
open my $binaryFile, '>', 'Meteo.bin' or die "Couldn't open file: $!\n";
binmode $binaryFile;
 
my @fields;
my ($txtTemplate, $binTemplate) = ('A5 A6 A6 A5 A6 A6', 'd*');
 
while (<$txtFile>) {
  chomp;
  @fields = unpack $txtTemplate, $_;
 
  # convert the scalar strings to scalar numbers
  $_ += 0 foreach (@fields);
 
  # write the record
  print $binaryFile pack($binTemplate, @fields);
}
 
close $binaryFile or die "error closing file: $!\n";
The input is the Meteo.txt text file and the output is the Meteo.bin binary file. The binmode function puts the Meteo.bin filehandle into binary mode (not all operating systems require binmode, but it is always safe to use it).

The text file is opened in read mode and the binary file in write mode (using the '>' mode). The unpack function expands the field values read from the text file into an array, and the pack function converts these values for writing to the binary file. Both functions use specific template formats:

  • $txtTemplate - used by unpack and having as template 'A5 A6 A6 A5 A6 A6'
  • $binTemplate – used by pack, with the template 'd*' (the 'd' character specifies a double-precision float in the native format and '*' packs all the available numbers from the list; we used the 'd' format in order not to lose precision, because Perl uses doubles internally for all numeric calculations and generally unpack("f", pack("f", $str)) does not equal $str)

Everything happens inside a while loop. At each iteration step:

  • a line is read from the text file and its content is stored in $_
  • the chomp function is used to delete the trailing newline from $_
  • the Perl unpack function expands the field values from $_ using the $txtTemplate template format and stores them in the @fields array
  • the foreach loop is used to convert the elements of the @fields array into scalar numbers
  • the pack function uses the $binTemplate template format to convert the elements of the @fields array into a binary string
  • the string returned by pack is written in the binary file using the print function

Finally, the binary file is closed.

The following example shows you how to read this binary file with the Perl unpack function, using the same template format that was used to create the file.

 

How to use unpack to read and print a binary file 

 

Let’s begin by showing the script code first:

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
open my $binaryFile, '<', 'Meteo.bin' or die "Couldn't open file: $!\n";
binmode $binaryFile;
 
my @fields = ();
my $record;
my $binTemplate = 'd*';
my $recordSize = length pack 'd6',();
 
print "\n Prec   Press   Temp Humidity Wind Speed Wind Dir\n"; 
print " ----   -----   ---- -------- ---------- --------\n"; 
 
while ( my $bytesRead = read( $binaryFile, $record, $recordSize ) ) { 
  last if($bytesRead != $recordSize);
  @fields = unpack $binTemplate, $record;
  my $prec = $fields[0];       my $press = $fields[1]; 
  my $temp = $fields[2];       my $humidity = $fields[3]; 
  my $windSpeed = $fields[4];  my $windDir = $fields[5];   
 
  my $str = sprintf '%5.2f%8.2f%7.2f%9.2f%11.2f%9.2f',
               $prec, $press, $temp, $humidity,
               $windSpeed, $windDir;
  print "$str\n"; 
}
 
close $binaryFile or die "error closing file: $!\n";
The output of this script is as follows:
 
 Prec   Press   Temp Humidity Wind Speed Wind Dir
 ----   -----   ---- -------- ---------- --------
 0.00  750.80  11.30    54.10     112.70   237.00
 0.03  750.70  11.40    52.70       2.80   238.00
 0.07  750.40   8.20    63.70       6.70   232.00
 0.01  747.80   6.10    76.10       5.80    95.40
 
To read a binary file whose records have a fixed format, we use a while loop, a read function to get the data and the Perl unpack function to retrieve the values. To accomplish this, we need a few variables:
 
  • $binaryFile – a file handle
  • $record – a buffer to place the data read from the file
  • $recordSize – the length of the record in bytes
  • $binTemplate – the unpack template for the record
  • @fields – an array having the record fields as elements

The binmode function tells Perl that we have to deal with a file in binary format.

Again we use the unpack function to decode the records. The template format is the same we used when we created the file, i.e. 'd*' (see the previous example).

We know the binary layout of a record, so to get the record size for this template we can call the pack function with an empty list:

my $recordSize = length pack 'd6',();
The 'd6' template specifies six double-precision float numbers. The length function returns the length of the string returned by pack, and this value is stored in the $recordSize variable. On my Windows machine the length of the record is 48 (6 * 8).

The file is read with the read function within a while loop. On success, read returns the number of bytes read (0 at end of file, which ends the loop), and it is safer to check that you got back the number of bytes you asked for. This is done with the last operator, which leaves the loop if fewer bytes were read than expected.

The data read from the file is placed in the $record variable. The Perl unpack function decodes the values from the $record variable using the template format given by the $binTemplate variable and stores these values in the @fields array.

From this array, the values are stored in individual variables and next these values are formatted for printing using the sprintf function.

Finally, the binary file is closed.
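To try the read loop without creating Meteo.bin, a sketch can open a filehandle on an in-memory string (open with a scalar reference) that holds two records packed with 'd*'; the record values are taken from the sample data, and the record size assumes 8-byte doubles:

```perl
#!/usr/local/bin/perl

use strict;
use warnings;

# two records packed the same way as in the text-to-binary script
my $data = pack('d*', 0.00, 750.8, 11.3, 54.1, 112.7, 237.0)
         . pack('d*', 0.03, 750.7, 11.4, 52.7, 2.8, 238.0);

open my $binaryFile, '<', \$data or die "Couldn't open in-memory file: $!\n";
binmode $binaryFile;

my $recordSize = length(pack('d', 0)) * 6;   # six doubles per record
my $record;
my @records;

while (my $bytesRead = read($binaryFile, $record, $recordSize)) {
    last if $bytesRead != $recordSize;       # ignore an incomplete trailing record
    push @records, [ unpack 'd*', $record ];
}
close $binaryFile;

printf "%d records, first pressure %.2f\n", scalar @records, $records[0][1];
# it prints: 2 records, first pressure 750.80
```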