Recurring errors when dealing with Shell

This document collects recurring programming errors and stupidities found in a lot of code.

Using unsafe program output

Filtering the output of commands that were only designed to provide human-readable forms is very problematic. The output may change due to a variety of triggers, for example locale settings or other environmental variables. When a program translates strings into the desired language, strings may expand or shrink, add whitespace or other characters that can create confusion with the parser on the other end of the pipe.

Consider `ls`. The specification of time for example can change with the locale:

$ LC_TIME=de_DE ls -dl .
drwx--x--x 36 jengelh users 4096 15. Mai 21:59 .
$ LC_TIME=POSIX ls -dl .
drwx--x--x 36 jengelh users 4096 May 15 21:59 .
$ LC_TIME=en_US ls -dl .
drwx--x--x 36 jengelh users 4096 2008-05-15 21:59 .

This rules out the use of `cut` in either fixed form (-b option) or field form (-f option).

Blatant use of unescaped filenames in loops

When embedding a list of filenames without proper handling, problems will arise.

# Assuming `ls -1` is safe, which of course it is not
x=`ls -1`;
for i in $x; do
        ls -l -- "$x";
done;

There are at least two errors here. First, filenames with newlines—as rare as they may be—will get split at the newline. So if you had two files, “abc\ndef” and “ghi”, $x will contain “abc\ndef\nghi\n”. If you iterate over that with for in the way shown, you run through the loop thrice, not twice!—and that is wrong when there are just two files.

Second, consider files with spaces—not so uncommon—like “hello world.txt” and “goodbye world.txt”. $x will contain “hello world.txt\ngoodbye world.txt\n”. While no newlines interfere, the spaces do. The loop is executed four times instead of twice! Quoting does not help, you have to mess with the IFS shell variable and set it to the newline, but since the newline is a valid character for a filename, setting IFS is a wrong solution.

Note: Using globs in a for loop directly, i.e. `for i in *.txt` is without problems, even if they do contain spaces and newlines.

The correct solution is to use find and xargs with NUL-delimiting mode. You cannot directly use it to loop over a list of files, but safety comes first. Then comes a program that deals with the arguments.

find . -iname "*.txt" -print0 | xargs -0r ls -1

Slowpath

While we are at xargs, there is one thing to say. It can pass multiple arguments to a single invocation of the target program (in the last code block this was ls), only calling it once. The following may work, but it is slower because it spawns one ls instance for each file that was found, and you want your programs to go fast, don't you?:

Not good:
find . -iname "*.txt" -exec ls -dl '{}' \;

Better:
find . -iname "*.txt" -print0 | xargs -0 ls -dl

Redundant and useless use of constructs

Programs that have an internal way to process input or output should be used that way. Various redundant invocations of commands and pipes can be done away with.

Worse: cat /proc/cpuinfo | grep lm;
grep lm </proc/cpuinfo
Better: grep lm /proc/cpuinfo

Reason: Just do away with the pointless invocation of cat here (when passing multiple files or an unknown number of files to cat, use of cat is ok!). Or do away with the pointless redirection.
Worse: for i in `ls a*`
Better: for i in a*
Reason: Using `ls a*` in a for itself is a bad idea, see above. And you save spawning a process.
Worse: for i in `seq 1 10`
Better: for ((i = 1; i <= 10; ++i))
Reason: Obviously, you save spawning another process.
Worse: rpm -qa | grep foobar
Better: rpm -qa "*foobar*"
Reason: You save spawning another process. Only works as long as the rpm's built-in wildcard matching suits the needs. (Once you need `grep -P`-like semantics, you will only have grep or perl to pipe to.)
Can do: echo "$x" | grep this
Can do: grep this <<< "$x"
Reason: The triple less-than operator is bash-specific, so using echo is not really wrong. Also, echo is a shell builtin at least for bash, so it comes quite cheap anyway.