Perl substr Function
The substr function extracts and returns a substring from a string.
It is one of the most important string functions in the Perl language, but it does more than extraction: it can also replace part of a string, either through an optional REPLACEMENT argument or when it is used as an lvalue.
You can use it to manipulate strings, either on its own or together with other string functions such as index or length.
Much string manipulation can be done with the power of regular expressions, but in many cases the built-in string functions are more straightforward and often run faster.
The syntax forms of this function are as follows:
substr EXPR, OFFSET, LENGTH, REPLACEMENT
substr EXPR, OFFSET, LENGTH
substr EXPR, OFFSET
- EXPR is a string expression from which the substring will be extracted
- OFFSET is an index from where the substring to be extracted starts
- LENGTH is the length of the substring to extract
- REPLACEMENT is a string that will replace the substring
As with other Perl built-in functions, the parentheses around the arguments are optional.
As the syntax forms show, some arguments are mandatory and others are optional.
You must specify at least the string expression (EXPR) and the position (OFFSET) where the substring starts; LENGTH and REPLACEMENT may be omitted.
Before reviewing the Perl substr function parameters, I want to remind you that in Perl the first character of a string has the index 0, the second 1, and so on.
Older versions of Perl let you change this through the special variable $[, which holds the index of the first character of a string and is 0 by default. Changing it was never a good idea: $[ was deprecated in Perl 5.12, and since Perl 5.30 assigning it any value other than 0 is a fatal error.
And now let’s go back to our parameters.
OFFSET can be:
- positive – the substring starts that many characters from the beginning of the string
- negative – the substring starts that many characters from the end of the string
- 0 – the substring starts at the first character of the string
LENGTH can be:
- omitted – the function returns all the characters from the OFFSET position to the end of the string
- positive – the function returns at most LENGTH characters, starting at the OFFSET position
- negative – the function returns the substring starting at the OFFSET position, but leaves that many characters off the end of the string
- 0 – the returned substring is empty, and no warning is issued
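The OFFSET and LENGTH rules above can be checked with a short script. This is a minimal sketch; the sample string "Perl substr" is an arbitrary choice, not taken from the original article.

```perl
use strict;
use warnings;

my $s = "Perl substr";             # 11 characters, indexes 0..10

my $from5   = substr($s, 5);       # positive OFFSET, LENGTH omitted: "substr"
my $last6   = substr($s, -6);      # negative OFFSET, 6 characters from the end: "substr"
my $first4  = substr($s, 0, 4);    # OFFSET 0, positive LENGTH: "Perl"
my $capped  = substr($s, 5, 100);  # LENGTH past the end: only the available characters, "substr"
my $trimmed = substr($s, 0, -7);   # negative LENGTH leaves 7 characters off the end: "Perl"
my $empty   = substr($s, 2, 0);    # LENGTH 0: empty string, no warning

print "[$_]\n" for $from5, $last6, $first4, $capped, $trimmed, $empty;
```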
You can use the substr function to extract a substring that starts at a given index and has a given length. See the following code snippet:
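A minimal sketch of such an extraction follows. The original snippet is not reproduced here, so the value of $names below is a made-up sample.

```perl
use strict;
use warnings;

my $names = "John, Clark, Johnny";   # hypothetical sample value

# Extract 5 characters starting at index 6
my $second = substr($names, 6, 5);

print "$second\n";   # Clark
print "$names\n";    # John, Clark, Johnny (unchanged)
```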
Note that the value of $names does not change after substr is called: the function returns the extracted part and leaves the original string intact.
You can use the substr function in comparisons, or as an lvalue, that is, on the left side of an assignment. In the latter case, the original string is modified.
See the next block of code for this:
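A minimal sketch of both uses, with an arbitrary sample string. It also shows the four-argument form, which performs the same replacement and returns the part that was replaced.

```perl
use strict;
use warnings;

my $str = "Hello World";

# substr in a comparison: $str is only read
print "starts with Hello\n" if substr($str, 0, 5) eq "Hello";

# substr as an lvalue: the assignment modifies $str itself
substr($str, 0, 5) = "Howdy";
print "$str\n";                  # Howdy World

# The four-argument form replaces in place and returns the old text
my $old = substr($str, 6, 5, "Perl");
print "$old -> $str\n";          # World -> Howdy Perl
```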
The following example shows how to replace a substring (if it exists) with a different substring, using the index, length and substr built-in functions.
See the following code snippet:
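A minimal sketch of the technique, assuming only the first occurrence should be replaced; the variable names $str, $find and $replace and their values are illustrative, not taken from the original snippet.

```perl
use strict;
use warnings;

my $str     = "The quick brown fox";
my $find    = "brown";
my $replace = "red";

my $pos = index($str, $find);   # -1 if $find does not occur in $str
if ($pos != -1) {
    # Replace length($find) characters starting at $pos
    substr($str, $pos, length($find)) = $replace;
}
print "$str\n";                 # The quick red fox
```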
- the index function returns the position of the first occurrence of the substring, or -1 if the substring is not found
- the length function returns the number of characters/bytes of an expression
- the Perl substr function is used as an lvalue, with the syntax substr(EXPR, OFFSET, LENGTH) = REPLACEMENT, where:
o EXPR is a string expression from which the substring will be extracted
o OFFSET is an index from where the substring to be extracted starts
o LENGTH is the length of the substring to extract
o REPLACEMENT is a string that will replace the substring
Another approach is to use the regular-expression substitution operator, s///. You can rewrite the above example as follows:
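A minimal sketch with the same sample values as a self-contained script. \Q...\E escapes any regex metacharacters in the search string. Without the /g modifier only the first match is replaced, which mirrors the index-based version.

```perl
use strict;
use warnings;

my $str     = "The quick brown fox";
my $find    = "brown";
my $replace = "red";

# Replace the first occurrence; add /g to replace every occurrence
$str =~ s/\Q$find\E/$replace/;
print "$str\n";   # The quick red fox
```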
A flat file database consists of a number of records delimited by a separator, which in most cases is the newline ("\n") character; in that case each record occupies a single line. Each record consists of one or more fields, either of fixed width or delimited by a special character such as whitespace or a comma.
For instance, suppose that each record of the file customers.txt includes the fields Name, Phone and ZipCode, and that the file has only three records, as in the following table:

Name            | Phone        | ZipCode
John Abbot      | 872-321-1212 | 55416
Clark Eliot     | 205-321-1200 | 20037
Johnny Randolph | 345-767-3476 | 33702
Fixed-width columns
First, we'll examine the case where the fields have a fixed width: Name – 20, Phone – 12 and ZipCode – 5 characters. If we print the file, we get something like this:
John Abbot          872-321-121255416
Clark Eliot         205-321-120020037
Johnny Randolph     345-767-347633702
The following block of code reads the file line by line using the while loop:
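A minimal sketch of such a loop. It uses substr to cut each line at the fixed column positions: Name at 0, Phone at 20, ZipCode at 32. So that it runs on its own, it reads from an in-memory filehandle built with sprintf. A real script would open customers.txt instead.

```perl
use strict;
use warnings;

# Build fixed-width records: Name (20), Phone (12), ZipCode (5)
my @rows = (
    ["John Abbot",      "872-321-1212", "55416"],
    ["Clark Eliot",     "205-321-1200", "20037"],
    ["Johnny Randolph", "345-767-3476", "33702"],
);
my $data = join "", map { sprintf("%-20s%-12s%-5s\n", @$_) } @rows;

# In a real script: open(my $fh, '<', 'customers.txt') or die ...
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my @records;
while (my $line = <$fh>) {
    chomp $line;
    my $name  = substr($line, 0, 20);
    my $phone = substr($line, 20, 12);
    my $zip   = substr($line, 32, 5);
    $name =~ s/\s+$//;   # drop the trailing padding blanks
    push @records, [$name, $phone, $zip];
    print "$name | $phone | $zip\n";
}
close($fh);
```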
Clark Eliot,2053211200,20037
Johnny Randolph,3457673476,33702
The next example illustrates the case where the fields are delimited by a separator character such as a comma. In this case the content of our file is:
John Abbot,872-321-1212,55416
Clark Eliot,205-321-1200,20037
Johnny Randolph,345-767-3476,33702
The next sample shows how you could implement it:
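One possible implementation is sketched below with index and substr, to stay on topic. In practice split(/,/, $line) is the usual choice. parse_line is a hypothetical helper name, and the script reads an in-memory copy of the file rather than customers.txt.

```perl
use strict;
use warnings;

# Split one line into fields at each comma, using index and substr
sub parse_line {
    my ($line) = @_;
    my @fields;
    my $start = 0;
    while ((my $pos = index($line, ',', $start)) != -1) {
        push @fields, substr($line, $start, $pos - $start);
        $start = $pos + 1;
    }
    push @fields, substr($line, $start);   # the last field runs to the end of the line
    return @fields;
}

my $data = "John Abbot,872-321-1212,55416\n"
         . "Clark Eliot,205-321-1200,20037\n"
         . "Johnny Randolph,345-767-3476,33702\n";

# In a real script: open(my $fh, '<', 'customers.txt') or die ...
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @records;
while (my $line = <$fh>) {
    chomp $line;
    my ($name, $phone, $zip) = parse_line($line);
    push @records, [$name, $phone, $zip];
    print "$name | $phone | $zip\n";
}
close($fh);
```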
You can use the Perl sprintf function to pad left and right with blanks or zeroes. If you need to pad with a character other than blank or zero, you can use the substr and length functions.
Have a look at the following code snippet:
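A minimal sketch using the variable names described next ($text, $padLen, $padChar). Wrapping them in the subroutines pad_left and pad_right is an illustrative choice. The guard skips strings that are already long enough, so nothing is truncated. The last line shows the sprintf alternative for zeros and blanks.

```perl
use strict;
use warnings;

# Insert padding at the start of $text until it is $padLen characters long
sub pad_left {
    my ($text, $padLen, $padChar) = @_;
    my $missing = $padLen - length($text);
    substr($text, 0, 0) = $padChar x $missing if $missing > 0;
    return $text;
}

# Append padding at the end (offset length($text) is the end of the string)
sub pad_right {
    my ($text, $padLen, $padChar) = @_;
    my $missing = $padLen - length($text);
    substr($text, length($text)) = $padChar x $missing if $missing > 0;
    return $text;
}

print pad_left("42", 6, "*"), "\n";           # ****42
print pad_right("42", 6, "."), "\n";          # 42....
print sprintf("%06d|%-6s|", 42, "42"), "\n";  # 000042|42    |
```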
Here $padLen is the length to which you wish to pad the string, $text contains the string to be padded and $padChar contains the padding character.
The substr function is used here as an lvalue, modifying $text directly. The x operator repeats the padding character as many times as needed to fill the remaining positions up to $padLen. This method doesn't truncate $text.
The Perl substr function can be used together with other functions, such as pack and unpack, to convert between number representations. Here substr is used to left-pad a character string with zeros.
The two examples that follow show how to convert from hexadecimal and binary formats to decimal.
The first example is about the conversion from hexadecimal to decimal. See the following code:
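A minimal sketch of the zero-padding technique. substr keeps the last 8 characters of the hex string after 8 zeros have been prefixed. pack 'H8' then turns those characters into 4 bytes, and unpack 'N' reads the bytes as a 32-bit big-endian unsigned integer. As a result it handles at most 8 hex digits. hex2dec is a hypothetical name, and Perl's built-in hex() does the same job directly.

```perl
use strict;
use warnings;

sub hex2dec {
    my ($hex) = @_;
    # "0" x 8 . $hex left-pads with zeros; substr(..., -8) keeps exactly 8 digits
    return unpack("N", pack("H8", substr("0" x 8 . $hex, -8)));
}

print hex2dec("FF"), "\n";     # 255
print hex2dec("1A2B"), "\n";   # 6699
```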
The next example shows you how to convert from binary to decimal. This example is suitable for larger strings of bit characters:
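The same padding technique works for bit strings. This minimal sketch pads to 32 characters and uses pack 'B32', which reads bits in descending order within each byte. It therefore handles at most 32 bits. bin2dec is a hypothetical name, and oct("0b...") is a built-in alternative.

```perl
use strict;
use warnings;

sub bin2dec {
    my ($bin) = @_;
    # Left-pad with zeros to 32 bits, pack into 4 bytes, read as big-endian unsigned
    return unpack("N", pack("B32", substr("0" x 32 . $bin, -32)));
}

print bin2dec("1010"), "\n";       # 10
print bin2dec("11111111"), "\n";   # 255
```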
The following example shows you how to use the Perl substr function to split a string into an array of strings, where each string of the array has a given length.
See the following code snippet:
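A minimal sketch, assuming a chunk length of 10 and an arbitrary sample string. Each pass of the loop takes the next 10 characters, and the last piece may be shorter.

```perl
use strict;
use warnings;

my $str = "abcdefghijklmnopqrstuvwxyz";
my $len = 10;
my @array;

for (my $pos = 0; $pos < length($str); $pos += $len) {
    push @array, substr($str, $pos, $len);   # next chunk of at most $len characters
}

print "@array\n";   # abcdefghij klmnopqrst uvwxyz
```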
Finally, the array is printed, the elements of the array being separated by space.
If you run this code, you'll get the following output:
The same result can be obtained with a regular expression. Because the match result is assigned to @array, it is evaluated in list context, so each substring captured in $1 is appended to @array. The . (dot) matches any single character except a newline, and the notation .{1,10} means that . (i.e. any character) matches at least once but no more than 10 times.
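The regular-expression version could look like this sketch, which uses the same assumed sample string. The /g modifier makes the match return every capture. To let . match a newline as well, add /s.

```perl
use strict;
use warnings;

my $str = "abcdefghijklmnopqrstuvwxyz";

# List context plus /g: each $1 captured by (.{1,10}) becomes an element
my @array = ($str =~ /(.{1,10})/g);

print "@array\n";   # abcdefghij klmnopqrst uvwxyz
```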
The output is the same as before.
If you need to extract a substring delimited by two other substrings from a string, you can use both the index and substr built-in functions, as you can see in the following example:
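A minimal sketch of the steps listed below. The URL and the delimiters $str1 and $str2 are made-up sample values. The two cuts use the four-argument form of substr.

```perl
use strict;
use warnings;

my $url  = 'HTTP://www.example.com/path/page.html';   # hypothetical input
my $str1 = 'http://';
my $str2 = '/';

# lc on both sides makes the search case-insensitive
my $pos = index(lc($url), lc($str1));
if ($pos >= 0) {
    # Cut everything up to and including $str1
    substr($url, 0, $pos + length($str1), '');
    # Cut from the first $str2 to the end of the string
    $pos = index(lc($url), lc($str2));
    substr($url, $pos, length($url) - $pos, '') if $pos >= 0;
}
print "$url\n";   # www.example.com
```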
- the first index function returns in the $pos variable the position of the $str1 in the $url string.
- the first substr function will cut from $url the $str1 substring
- the second index function returns in the $pos variable the position of the $str2 in the new $url string
- the second substr function will cut from $url the portion beginning with the $pos until the end of the $url string
- finally, the $url is printed
To make index search case-insensitively, the lc function is used to convert the strings to lowercase before searching.
String functions are often faster than regular expressions, because they have no metacharacters to interpret and they don't set any of the match variables such as $1.
Sometimes, however, you can combine regular expressions with string functions to add extra functionality to your code.
The following example shows you a way to use the Perl substr function with the =~ binding operator:
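A minimal sketch, assuming a sample $str whose last 17 characters are "cat or dog or owl". Binding s/// to an lvalue substr edits only that region of $str. The \b word boundaries are an addition of this sketch so that "or" inside words such as "for" is left alone.

```perl
use strict;
use warnings;

my $str = "tea or coffee, cat or dog or owl";

# Replace every "or" with "and", but only inside the last 17 characters
substr($str, -17) =~ s/\bor\b/and/g;

print "$str\n";   # tea or coffee, cat and dog and owl
```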
This code replaces the text or with the text and wherever possible, but only within the last 17 characters of the $str string.
The output is as follows: