July 5, 2019
By: Jeff Terrell

Using Perl on the Command Line

  1. Foreword
  2. Abstract
  3. Audience
  4. The -e switch: inline scripting
  5. The -n switch: implicit loops
  6. The -l switch: automatic newline handling
  7. The -p switch: implicit looping and printing
  8. BEGIN/END Blocks
  9. The -a switch: automatic field splitting
  10. "Chunk" Processing: changing the input record separator
  11. Conclusion

Foreword

I wrote this article circa 2006 on a personal website that is now defunct. I still use Perl on the command line frequently, so despite the fact that Perl (5) is passé these days, I think this content is worthwhile enough to republish.

Abstract

Perl has become a very popular scripting and text-processing language, yet relatively few programmers and command-line geeks are aware of the usefulness of Perl from within an interactive, command-line interface. In this tutorial, I introduce several powerful command-line switches, including -e, -n, -p, -l, -a, and -F. I also demonstrate BEGIN{} and END{} blocks. I illustrate all new concepts with a variety of examples.

Audience

You are expected to be familiar with Perl. I will explain some of the more esoteric language features used below, but you should already be pretty savvy with the Perl language. If you want to learn Perl, I recommend O'Reilly's excellent books, Learning Perl and Programming Perl. Also, most of the information in these books is available from the perl manpages—and, furthermore, it's almost as digestible as the books are. Start with perl(1) and go from there.

The -e switch: inline scripting

Perl's -e switch enables inline or literal scripting. With -e, the script which perl executes is actually on the command-line itself, rather than being contained in a file. So we can say things like this:

$ perl -e 'print 1+1 . "\n"'

(Note: the initial $ character signifies the shell prompt, and is not part of the command. This is true for all the examples in this tutorial.)

Of course, if your perl script is that simple, you can just use bash operators (assuming you're using bash or zsh—and if you're not, why not?):

$ echo $((1+1))

Then again, perl is a little more useful for simple arithmetic operations, because bash doesn't support floating-point numbers:

$ echo $((22/7))
3
$ perl -e 'print 22/7 . "\n"'
3.14285714285714

Nevertheless, -e by itself is not very useful. It is, however, an essential part of more advanced command-line tricks.

The -n switch: implicit loops

Perl's -n switch implicitly wraps the literal script (i.e. the script specified on the command-line itself) in a while (<>) { } loop. In other words, -n tells perl: "do the following for every line of input".

For example, I have a few shell scripts for a project I worked on recently. Some of these shell scripts call perl:

$ grep perl *.sh
arrange.conns.sh:    perl -pe 's/ [<>?] / /;s/ t \d+ \d+ -?\d+//' |
arrange.conns.sh:    perl -ne '
arrange.conns.sh:    perl -e 'while (<>) {
arrange.conns.sh:  perl -lne '
arrange.conns.sh:    perl -e '
cvec.norm.sh:perl -pe 's/(^DIR1|^DIR2|^CONC|^SEQ)/\n$1/' |
cvec.norm.sh:  perl -ne 'BEGIN{$/="\n\n"} print if /^SEQ|^CONC/' |
cvec.norm.sh:  perl -lne 'BEGIN{$/="\n\n"} @a=(split(/\n/,$_)); print if scalar(@a) > 1'

Let's say we wanted to count the number of characters on each of these lines. First, we get rid of the file name. We could do this with cut -d: -f2-, but we'll simply use GNU grep's -h option instead. Note how -ne scripts often make heavy use of the $_ variable, which contains each line of input.

$ grep -h perl *.sh | perl -ne 'print length() . "\n"'
51
15
26
14
14
46
54
76

The -l switch: automatic newline handling

As it turns out, there's a slightly simpler way to do the above:

$ grep -h perl * | perl -lne 'print length'
50
14
25
13
13
45
53
75

The -l switch silently removes the newline character (or whatever $/ is set to, as we will see below) from the end of each line of input, and silently adds a newline character to the end of each line of output. Note that the answers are all reduced by one because each line is shorter by one character: the newline at the end.

The -p switch: implicit looping and printing

Perl's -p switch is the same thing as the -n switch with an additional shortcut: a 'print' statement is implied, which prints the $_ variable. I find this particularly useful for simply applying a Perl regex style substitution to each line. For example, say I have this (sanitized) output from tcpdump:

$ cat tcpdump.txt
1177350516.293578 64.233.100.100.4663 > 152.2.100.100.25: .
1177350516.293598 152.23.100.100.3398 > 64.81.100.100.80: .
1177350516.293590 208.111.100.100.80 > 152.23.100.100.2706: .

And say I want to modify the local IP address (the one starting with 152.2 or 152.23) so that only the /24 subnet is printed. (This might seem contrived, but I have actually done things like this in my research.)

$ perl -pe 's/ (152\.23?\.\d+)\.\d+\.\d+\b/ $1/' tcpdump.txt
1177350516.293578 64.233.100.100.4663 > 152.2.100: .
1177350516.293598 152.23.100 > 64.81.100.100.80: .
1177350516.293590 208.111.100.100.80 > 152.23.100: .

Or, say I want to separate the port number (the 5th dot-separated field) from the IP address:

$ perl -pe 's/ (\d+\.\d+\.\d+\.\d+)\.(\d+)\b/ $1 $2/g' tcpdump.txt
1177350516.293578 64.233.100.100 4663 > 152.2.100.100 25: .
1177350516.293598 152.23.100.100 3398 > 64.81.100.100 80: .
1177350516.293590 208.111.100.100 80 > 152.23.100.100 2706: .

BEGIN/END Blocks

What if we have some code that we want to execute after (or before) the implicit while loop in a -ne or -pe script? I bet you're not surprised that Perl gives us a way to do exactly this: BEGIN{} and END{} blocks.

The most common use for this feature (at least that I've found) is summing a bunch of numbers in a file. First, let's create such a file:

$ perl -e 'print "$_\n" for (1..1000)' > nums.txt

Now let's sum them up. If you're as clever as Carl Friedrich Gauss was in first grade, you'll know what the answer should be. But for the rest of us, we'll accept some help from Perl:

$ perl -lne '$c += $_; END{ print $c; }' nums.txt
500500

Let's have some more fun. Let's first generate a file with 1000 pseudorandom numbers from 0 to 100:

$ perl -e 'print(rand(100) . "\n") for (1..1000)' > rand.txt

Now let's average all of these numbers. If the pseudorandom number generator does a good job of picking numbers from a uniform distribution, we would expect the average to be near 50, right? (This is statistics stuff; don't worry if you don't understand it.) Well, let's see how well it does:

$ perl -lne '$c += $_; END{ print $c/$.; }' rand.txt
50.1172772862758

Not bad. If you're feeling ambitious, try doing the same thing with more and more samples and see what happens.

The -a switch: automatic field splitting

Perl's -a switch stands for "autosplit". When -a is specified, Perl implicitly splits the input by whitespace, and stores the results in @F, as if you had typed @F=(split); immediately after getting input. This behavior is sort of like the standard cut utility (part of the GNU coreutils with two exceptions. First, cut can only split on single characters, whereas Perl's autosplit switch can split on any regular expression. Second, cut merely prints the columns, whereas Perl can do more complex processing.

For example, let's count the total number of bytes used in a directory:

$ ll
total 80
-rw-r--r--   1 jsterrel  jsterrel   2948 Oct 31 13:37 j1
-rw-r--r--   1 jsterrel  jsterrel   2948 Oct 31 13:35 lorem.txt
-rw-r--r--   1 jsterrel  jsterrel   3893 Oct 31 15:53 nums.txt
-rw-r--r--   1 jsterrel  jsterrel   2180 Oct 31 16:02 perl-cli.txt
-rw-r--r--   1 jsterrel  jsterrel  16909 Oct 31 15:59 rand.txt
-rw-r--r--   1 jsterrel  jsterrel    434 Oct 31 15:41 tcpdump.txt
$ ll | perl -lane '$c += $F[4]; END{ print $c; }'
29312

You know how the ps auxwww command often prints a lot of columns you don't care about? Don't you hate that? Yeah, me too. Let's cut it down to the essentials.

$ ps auxwww | head -n3
USER       PID %CPU %MEM      VSZ    RSS  TT  STAT STARTED      TIME COMMAND
jsterrel   428   4.3  1.4   381096  14288  ??  S    10Oct07  13:10.01 /Applications/Utilities/Terminal.app/Contents/MacOS/Terminal -psn_0_3145729
jsterrel 14973   2.5 27.1  2327152 284428  ??  S    20Oct07 227:02.88 /Applications/Safari.app/Contents/MacOS/Safari -psn_0_16646145
$ ps auxwww | head -n3 | perl -lape '$_ = join(" ", @F[ 1,10..$#F ])'
PID COMMAND
14973 /Applications/Safari.app/Contents/MacOS/Safari -psn_0_16646145
4009 /Applications/Adium.app/Contents/MacOS/Adium -psn_0_11927553

The $#F in the above command is a Perl shortcut meaning "the index of the last element of the @F array". The full expression inside the brackets of @F[ ... ] specifies an array slice: multiple elements of the array, forming an array in themselves. Thus, the larger @F expression means "the array consisting of the second element of @F as well as all fields from the 11th to the last".

Lastly, note that you don't have to split just on whitespace. You can specify any regular expression to the -a option by the -F option. See the perlrun(1) for more details.

"Chunk" Processing: changing the input record separator

Often, in a Unix-style environment, you're dealing with data in which the records are separated by newline characters. In other words, one line is one record. Sometimes, however, it is more convenient to deal with records stored in more general chunks of data. For example, I often deal with chunks of lines, where chunks are separated by a blank line (or "\n\n"). Thankfully, Perl can operate on such data with relatively minor tweaks. Specifically, we must change the $/ variable, or the input record separator.

Let's go through an extended example with a file of junk "Lorem Ipsum" text, generated from lipsum.com. The text is arranged in paragraphs. There are 5 paragraphs, each of which is on a single line, and paragraphs are separated by a blank line. First things first, let's split the paragraphs by sentence:

$ perl -pe 's/\. /.\n/g' lorem.txt > j1

Now, let's count the number of sentences per paragraph:

$ perl -lpe 'BEGIN{ $/ = "\n\n"; } @c = (split /\n/); $_ = @c' j1
11
16
11
15
16

Note that there is an implicit scalar() around the last @c. (We actually could have done it just as easily without splitting the paragraphs into sentences. I'll leave that as an exercise to the reader.) Now let's calculate the average sentence length, in characters, over the entire text:

$ grep -v '^$' j1 | perl -lne '$c += length; END{ print $c/$.; }'
41.6666666666667

Nice. One more thing: let's calculate the average sentence length (in characters) per paragraph:

$ perl -lpe 'BEGIN{$/ = "\n\n"} $c = 0; @s = (split /\n/); $c += length foreach @s; $_ = $c/@s' j1
30.0909090909091
52.125
33.6363636363636
46.6666666666667
40

Very cool. If you're feeling ambitious, calculate the average number of words per sentence, per paragraph.

Conclusion

Perl can be a very valuable multi-purpose tool to add to your command-line toolbox. In this tutorial, we covered the essential -e switch, the implicit line loopers -n and -p, the auto-field-splitting -a switch, BEGIN{} and END{} blocks, and chunk-style processing by changing the input-record-separator variable.

Tags: perl cli