Shell Scripting Survival Guide

By Jan Matějka, June 04, 2019

Introduction

Shell scripting can be a very effective and efficient tool in your toolbox, saving you time, as exemplified by the famous case of the most-frequently-used-words problem solved by D. E. Knuth and M. D. McIlroy.

As powerful as the shell is, it is equally difficult to figure out how to utilize it properly. It is so difficult that it is not uncommon for programmers to conclude that shell scripting is unfeasible for anything but minuscule and elementary scripts. This conclusion inevitably leads to preferring some general purpose language even when a shell script would be significantly simpler and shorter.

This document aims to acquaint you with techniques that empower you to write readable, succinct, and reliable shell scripts, suitable for CLI prototypes and often even for the end products.

Shell Choice

Shell choice is an important first decision you need to make.

If you need your shell script to be portable or acceptable as a system component, you are limited to the POSIX shell. Writing POSIX compatible shell scripts is a pain, but for non-system programs there are other, superior options.

You might be inclined to just dive into Bash as the de facto standard shell on Linux. However, Bash is not much of an improvement. Zsh is a widely available shell that is, compared to Bash, delightful to use 2.

Most (all?) of the techniques presented here will be applicable to any POSIX compatible shell, but for practical purposes I will focus on Bash. Additional techniques specific to Zsh will be presented in the Zsh Scripting Guide, but this document is still required reading.

Pre-Requisite Knowledge

In general, only elementary knowledge of shell scripting is assumed. Some techniques may require elementary knowledge of related topics:

  • basic scripting (simple commands, flow control, syntax & semantics)

  • filesystem model (cwd, pathnames, basic file operations, file descriptors)

  • Linux Filesystem Hierarchy

  • process model (environment, hierarchy, exit codes, signals 21, fork 6 & exec 32)

  • important variables like PATH

  • manual pages 26

  • GNU make 27

Deeper knowledge of these topics will also explain the internal working of these techniques and why some things are the way they are.

Conventions

Applicability

Techniques presented here assume they are to be used on optionally installed software, as opposed to basic system software, which brings its own unique set of challenges with each system and to which the techniques here may or may not be applicable.

To reduce duplication with the Zsh Scripting Guide, references will cover both Bash and Zsh, and some examples may show Zsh in addition to Bash.

Filesystem Structure

Filesystem structure is shown as displayed by the tree -F program 9.

Terminal Session Examples

$ foo
out
[1]
$

The $ indicates a shell prompt. A % may be used instead of $, which implies a non-standard shell. In this document, % implies Zsh.

foo

execution of command named foo

out

combined stdout and stderr of the command unless qualified otherwise.

[1]

foo exited with exit code = 1.

Manual Pages Section References

When referring to sections of manual pages, a form like foo > bar in man 1 tree may be used, where foo and bar are sections, > signifies that bar is hierarchically under foo, and the manual page is that of the program tree in manual pages section 1.

Simple Techniques

This section deals with techniques that are implementation details, generally applicable regardless of your code structure.

Shebang

Start your scripts with

#!/usr/bin/env bash

To make your scripts executable by ./foo-cmd instead of bash foo-cmd, you need to include a shebang in your script. That’s the #!... part.

You want to leverage PATH lookup for portability as different operating systems may install Bash into different paths. That is the /usr/bin/env part 28.

If your target system is Linux, or a specific set of Linux distributions, you may get away with #!/bin/bash, as Bash is usually the system shell or at least considered basic system software. I do not know if this is universal across the whole Linux ecosystem though.

SELF

Right after the shebang, start your script with the SELF definition.

SELF=${0##*/}

SELF is the filename of your executable and will be useful later on at Error Message Printing and Prelude.

The trick here consists of using the zeroth argv element, which is the path to the file being executed 32, and then using parameter expansion 34 to get only the base filename.

Note the SELF definition cannot be put into a function or a Prelude file, as in those places $0 will refer to the function name or the sourced file, respectively.

Error Message Printing

Always lead the message with SELF and write to standard error.

printf >&2 "%s: %s\n" "$SELF" "error message"

Writing to standard error (stderr, that is the >&2 part 29) allows the user to suppress the standard output (stdout) by redirecting it to /dev/null without suppressing the error output; even when the user is not interested in the standard output, they will be interested in standard error if something goes wrong.
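
For example, a user can silence the regular output and still see the failure (foo here is a stand-in command):

$ foo >/dev/null
foo: error message
[1]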

Leading with SELF is convention based on the assumption that any program may be used as part of another script. Then when something goes wrong, the user knows which program is responsible for the error message.

Error Induced Exits

Program termination due to an error must result in a non-zero exit code.

foo || { printf >&2 "%s: %s\n" "$SELF" "foo failed"; exit 1; }

The exit code is the fundamental way programs signal whether something went wrong or everything is ok. You have probably already depended on this behavior; make sure your programs set their exit code properly as well.

Write Programs Not Functions

As your script grows, you will need to structure it into subroutines. Naturally, your first instinct may be to use functions, but functions are problematic:

  1. Functions are difficult to write as their behavior is dependent on shell options and global variables.

  2. Functions are difficult to test for the same reasons they are difficult to write.

  3. Functions complicate Errexit usage.

  4. Functions are not usable with xargs (though there is zargs 35 in Zsh), as demonstrated below.
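
For example, xargs executes programs, so it cannot see shell functions at all (the exact error wording may differ between xargs implementations; 127 is the GNU xargs exit code for a command that was not found):

$ myfn() { echo "hi"; }
$ echo x | xargs myfn
xargs: myfn: No such file or directory
[127]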

Instead you want to write subroutines as standalone programs (processes):

/bin/
├── foo
└── foo-subroutine

Clear Input/Output Definition

By making a subroutine a program (process), you have clearly defined inputs:

  • environment

  • argv

and outputs:

  • exit code

  • stdout

  • stderr

This is the basic set of things you need to worry about here. Depending on what your program does, there may be more, like stdin, filesystem effects, and signals.

That is already plenty of things to worry about. We do not need to add global variables and shell options to the list.

Functionally (Almost) Equivalent

Writing a subroutine as a subprogram will give you all you need from the subroutine, and you will use it much the same way as you would use a function, only simpler, as you can run it like any other program. Sometimes you will need to write a function though. More about that later at Prelude.

Now You Have to Handle Installation

By separating a script into multiple files you may need to worry about installation, but with the right tools it is not much of an issue. More about that later at Installation.

Avoid Directory Changes

Most often, cd is used only to construct a path or do other simple things which can be done trivially by string manipulation or with the dirname and realpath commands.

Code which needlessly changes directories is hard to follow and is prone to breakage on refactorings.

If you do need to change directories, the builtin commands pushd and popd may be preferable to cd, as pushd also pushes the current working directory onto a stack so you can get back with a popd call.

Another valid strategy is isolating the directory change in a subshell:

$(cd foodir && cwd-sensitive-command)

The code within command substitution ($( ... )) is executed in a subshell, so the directory change does not affect the surrounding code.
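
If you do not need to capture the output, plain parentheses give you the same subshell isolation without the command substitution:

(cd foodir && cwd-sensitive-command)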

xargs

Prefer xargs 11 to for loops or command substitutions 36:

docker ps -q | xargs -r docker kill

It is usually easier to follow once you learn to recognize the pattern, as it is more succinct and removes potentially needless state (compared to for i in ...).

Furthermore, xargs will automatically scale the command argv according to the system limit (compared to docker kill $(docker ps -q)) and is trivial to parallelize via -P $(nproc).

The -r option, available with GNU xargs, prevents running the command if the input is empty, which is usually the behavior you want.
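
Putting these together, a batched and parallelized bulk operation stays a one-liner (assuming GNU findutils and coreutils; the *.log pattern and gzip are illustrative):

find . -name '*.log' -print0 | xargs -0 -r -P "$(nproc)" gzip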

Boolean Values

To represent boolean values, use true and false:

foo=true

if $foo ; then
   ...
fi

The trick here is that both true and false are no-ops, either builtins or /bin executables, with the appropriate exit code 30, and that the general syntax for the if keyword is if <command>; then.

You have probably seen variations like if [[ ... ]]; or if test -e ...; but these are also just commands, and you can use any command you like, as the if's truth value is determined by the command's exit code.

However, if you accept such boolean values as inputs, you need to consider the risk of users injecting malicious commands, depending on your use case.

This convention is motivated entirely by aesthetics and succinctness. What can usually be seen in the wild is something like:

foo=yes

if [[ $foo = "yes" ]]; then
   ...
fi

Null Globs

You will probably be globbing a lot. Mostly you will glob files that can have 1..N occurrences, but occasionally you will want to glob a path that may occur 0..N times.

Globbing a path that does not exist will normally yield an error.

Depending on your shell options, the error may be produced either by the glob itself:

% printf "%s" nonexistent*
zsh: no matches found: nonexistent*
[1]

or deferred to the command the glob expands to, which receives the unmatched pattern literally:

$ printf "%s\n" nonexistent*
nonexistent*
[0]

For these occasions, there are null glob options which will make the globs expand to nothing:

$ shopt -s nullglob
$ printf "%s" nonexistent*
[0]

In Zsh:

% set -G
% printf "%s" nonexistent*
[0]

This is useful in cases where 0 occurrences is a valid expansion and the command can handle the null expansion correctly. As a counterexample, using a null glob with cat may be even worse, as cat may instead just hang waiting for input on stdin:

% set -G
% cat nonexistent*
^C
[130]

Note, the ^C here indicates the command has been terminated by Ctrl-C.

Sequence Expressions

$ echo {0..5}
0 1 2 3 4 5

Usable in for loops or printfable into xargs.
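
For example (xargs -n1 runs the command once per argument):

$ printf "%s\n" {1..3} | xargs -n1 echo id
id 1
id 2
id 3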

Unfortunately, this works only for static numbers in Bash:

$ x=5
$ for i in {0..$x}; do echo $i; done
{0..5}

But seq can be used for dynamic bounds:

$ seq 3
1
2
3
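
So the dynamic version of the loop above becomes:

$ x=2
$ for i in $(seq 0 "$x"); do echo $i; done
0
1
2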

Use Errexit Judiciously

set -o errexit

It looks like a good idea until you find out how broken it is 3 4 5. Generally, I recommend avoiding it completely unless you know very well what you are doing.
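
One classic surprise: errexit is suppressed inside an if condition, including within any function called from it. Consider this script (named gotcha here for illustration):

gotcha code:

#!/usr/bin/env bash

set -o errexit
check() { false; echo "survived"; }
if check; then echo "ok"; fi

results:

$ gotcha
survived
ok
[0]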

We will discuss safe use of errexit later after learning about Architectural Techniques.

Resource Cleanup

It is all too common to see shell code like:

acquire-resource

do-things-with-resource

resource-cleanup

The issue is that the script may terminate before it gets to executing resource-cleanup.

You would not do this in a general purpose language and you should not in shell either. General purpose languages have various ways to deal with the problem, such as the try…finally construct, context managers, RAII, defer statements, etc.

In shell, this can be achieved with builtin trap command 20:

trap 'resource-cleanup' EXIT
acquire-resource

do-things-with-resource
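
A typical concrete instance is temporary directory cleanup (do-things-with is a stand-in):

tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

do-things-with "$tmpdir"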

Only one trap per signal can be registered at a time, so it gets hairy with longer scripts. But this issue will go away once you apply the techniques presented at Architectural Techniques.

Architectural Techniques

In this section we introduce techniques that impose overall structure on your code and deal with general topics that almost every program needs to deal with.

Prelude

We already covered that you do not want to write functions at Write Programs Not Functions. But sometimes you will have to. The first thing you may want is a basic “standard library” for your scripts; that is your prelude.

If you do a lot of error handling, you may want to use

foo || fatal "foo failed"

instead of the lengthy error handling from Error Induced Exits.

fatal has to be a function in order to apply the exit 1 to the right process.

This is where a prelude file comes in with structure like:

/bin/
├── foo
├── foo_prelude
└── foo-subroutine

prelude code:

#!/bin/false

function fatal {
   printf >&2 "%s: %s\n" "$SELF" "$1"
   exit 1
}

foo code:

#!/usr/bin/env bash

SELF=${0##*/}
. foo_prelude

foo-subroutine || fatal "subroutine failed"

results:

$ foo
foo: subroutine failed
[1]

The trick here is that you will abuse $PATH by adding your prelude there as well. That allows you to do a simple . foo_prelude without worrying where the prelude is actually located.

Since the prelude is intended to be sourced, not executed, it is a bit different. First, it has a special shebang, #!/bin/false, which ensures it will be a no-op exiting with a non-zero exit code if someone tries to execute it. Second, its filename uses an underscore instead of a dash. More about that later at Command Dispatch.

Command Dispatch

Eventually, you will need to add subcommands like foo-cmd1 to your program:

/bin/
├── foo
├── foo_prelude
├── foo_dispatch
├── foo-cmd1
└── foo-cmd2

foo_dispatch code:

#! /usr/bin/env bash

SELF="${0##*/}"
. foo_prelude

# bail out with an error if the dispatcher name ($1)
# or the subcommand name ($2) is missing
: ${1:?}
: ${2:?}

# construct the subcommand executable name, e.g. foo-cmd1,
# and execute it with the remaining arguments
cmd=$1-$2
shift 2

$cmd "$@"

foo code:

#! /usr/bin/env bash

SELF="${0##*/}"
. foo_prelude

foo_dispatch "$SELF" "$@"

Now we can just add foo-cmd1 and foo-cmd2 files and we have subcommands that are executable as foo cmd1 and foo cmd2.

This is very useful, as it is generally more user friendly, and the foo executable may perform initialization like preparing environment variables common to all the subcommands.

Note that the prelude and dispatch are named with an underscore instead of a dash, so they are not subcommands, as foo_dispatch constructs the subcommand executables with dashes.

Further note that this construction with SELF passing allows foo_dispatch to be used for arbitrary subcommand nesting without any issue.

Even further, if the main entry point is not doing anything special, the next level of dispatch may be achieved just by symlinking to the main entry point, because that will cause SELF to be assigned the name of the symlink and not of the actual executable file:

/bin/
├── foo
├── foo_prelude
├── foo_dispatch
├── foo-cmd1
├── foo-cmd2
├── foo-bar-qux
└── foo-bar -> foo

Still further, this approach (which is also used by git, for example) lends itself to modularization by 3rd parties simply dropping foo-3rd into your PATH.

Argument Parsing

You may forego argument parsing by taking only fixed arguments or environment variables, but that will quickly result in a poor user experience until the interface becomes completely unusable and even user hostile. You will need to do argument parsing.

Your options are getopt 17 and getopts 18: getopts is limited to shortopts, and getopt may have its own warts. So far I have successfully avoided this problem by using zparseopts 19. You might also be interested in haveopt 31. Or you might as well just write a custom parser; it will not be much different from the way you would write the getopts usage.

A custom parser would look something like:

#!/usr/bin/env zsh

SELF="${0##*/}"
. foo_prelude

# set default values
o_x=false
o_opt=

# repeat while argv has elements
while (( $# > 0 )); do
   case $1 in
   -x)
      # parsing a boolean, set the value and
      # consume one element from argv
      o_x=true
      shift
      ;;
   --opt)
      # parsing a parameter option
      # set the value and consume two elements from argv
      o_opt=${2:?Missing --opt value}
      shift 2
      ;;
   *)
      # parameter not recognized, we either reached
      # positional arguments or user entered invalid
      # flag. Might want to check for "-" prefix or something.
      break
      ;;
   esac
done

The custom parser method adds only one extra line per option, and the line count is asymptotically the same as with the getopt/getopts approaches. The only real thing you give up is argument bundling (-xyz being equivalent to -x -y -z), which can often be sacrificed.

You should really prefer Zsh to Bash anyway. The Zsh solution is a bit more cryptic but much simpler; it is discussed in the Zsh Scripting Guide.

Debugging

The simplest way to debug a script is enabling XTRACE 10:

set -x

We will also need to conveniently propagate it to the subcommands:

foo code:

#!/usr/bin/env zsh

SELF="${0##*/}"
. foo_prelude

while (( $# > 0 )); do
   case $1 in
   -x)
      export FOO_XTRACE=true
      shift
      ;;
   *)
      break
      ;;
   esac
done

foo_dispatch $SELF "$@"

prelude code:

#!/bin/false

# prelude functions ...

${FOO_XTRACE:-false} && set -x

Now foo -x will activate xtrace by parsing -x from argv in the foo entrypoint and exporting the environment variable FOO_XTRACE=true. As each subsequent command starts by sourcing the prelude, the ${FOO_XTRACE:-false} at the end of the prelude will evaluate to true and set -x will enable xtrace for it.

It is also possible to just export FOO_XTRACE=true directly instead of using the -x argument.

Note we are using the custom argument parser as demonstrated in Argument Parsing, and the Boolean Values technique for the environment flag.

The xtrace output in Bash is not very convenient, but it is simple and gets the job done. This will be more comfortable in the Zsh Scripting Guide.

Configuration

Just cat files for as long as you can get away with.

Assuming your program is named foo and you need a path to bar, a simple:

bar_path=$(cat ~/.config/foo/bar_path)

will do. If you need another configurable option, add another file.
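
Writing a value is just as simple, which also makes this style of configuration trivially scriptable (the /opt/bar value is illustrative):

$ mkdir -p ~/.config/foo
$ printf "%s\n" /opt/bar > ~/.config/foo/bar_path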

It is easy to read and easy to write. This topic is further discussed at yaml sucks^Wdoes not rock.

If you want to use a single configuration file, you will need to structure it and provide a command like foo-config for correctly setting and reading configuration values.

This is the approach git takes. The git-config is what makes tutorials including commands like git config --global user.name "Jerry Mouse" possible.

Depending on your audience, or other factors, using a single structured configuration file may be the right choice, but it is more work and non-essential in the early stages. Using a file per option lets you focus on the core problem, and you can redo configuration once you have something solid.

By using files, you can also plug envdir 16 into the main entrypoint and get the configuration as environment variables for free.
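
A sketch of that plug in the foo entrypoint, assuming the file names in ~/.config/foo match the environment variable names you want (envdir sets one variable per file and execs into its child):

exec envdir ~/.config/foo foo_dispatch "$SELF" "$@"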

If you want to support the XDG Basedir Spec 23, you may additionally plug xdgenv 22 into the main entrypoint and have it for free, similar to using envdir.

Installation

You want a simple, standard way to build and install your program regardless of its size or number of files. You may get away without it if your program is a single executable file, but as the program grows to multiple files this becomes a necessity.

Use GNU make. Refer to GNU Make Coding Guide.

Source Structure

To simplify the installation process and command dispatch we need some conventions for the source and installation structure.

To recap, this is the structure we install into:

/bin/
├── foo
├── foo_prelude
└── foo-subroutine

Source structure:

foo/
└── src/
    ├── foo.bash
    ├── foo_prelude.bash
    └── foo-subroutine.bash

Testing

Use cram 8. It has issues, but it is the best tool for testing command line interfaces I know of. It is simple to write test cases and to interpret failures.

Extend your makefile so tests can be run with make check:

cram_opts ?= --shell=/usr/bin/bash
cram_root ?= cram
cram_path ?= $(cram_root)

check_path = $(CURDIR)/$(build_dir)/fakeroot/usr/local/bin:/bin:/usr/bin:/usr/local/bin

.PHONY: clean
clean:
        $(RM) -r $(build_dir) $(cram_root)/*.t.err

.PHONY: check
check: build
        mkdir -p $(build_dir)/fakeroot
        DESTDIR=$(build_dir)/fakeroot $(MAKE) install
        env -i PATH=$(check_path) cram $(cram_opts) $(cram_path)

  • We make install our code into a fakeroot to make sure that if our tests pass, the code was not just built correctly but installs correctly as well.

  • We override the PATH variable to make sure we do not rely on non-standard executables.

  • We run cram within env -i to ensure the tests do not depend on our custom / development environment variables.

  • And finally we extend the clean target to clean up cram artefacts if there are any.

  • To fake commands, you may simply generate their fake versions into $(build_dir)/fakeroot/usr/local/bin, either by printfing a fake shell script or with fake 12.

Documentation

Write man pages. If you are not comfortable with {g,t,n,}roff, you may use man page generators. I personally use rst2man 13, as I generally consider reStructuredText 7 the sweet spot between power and complexity.

See rst2man.txt for an example man page written in rst.

To incorporate documentation, we need to update our source structure:

foo
├── Documentation/
│   └── man1/
│       ├── foo-cmd.rst
│       └── foo.rst
└── src/
    ├── foo-cmd.zsh
    ├── foo_dispatch.zsh
    ├── foo_prelude.zsh
    └── foo.zsh

and makefile:

## installation targets
i_bin_dir     = $(DESTDIR)$(PREFIX)/bin
i_man_dir     = $(DESTDIR)$(PREFIX)/man/man1

## build targets
b_bin_dir     = $(build_dir)/bin
b_man_dir     = $(build_dir)/man/man1

cmds      = $(patsubst $(src_dir)/%.zsh,%,$(wildcard $(src_dir)/*.zsh))
mans      = $(patsubst Documentation/man1/%.rst,%.1,$(wildcard Documentation/man1/*.rst))

dirs      =
dirs     += $(b_bin_dir) $(i_bin_dir)
dirs     += $(b_man_dir) $(i_man_dir)

## build dependencies
b_deps    =
b_deps   += $(b_bin_dir)
b_deps   += $(b_man_dir)
b_deps   += $(addprefix $(b_bin_dir)/,$(cmds))
b_deps   += $(addprefix $(b_man_dir)/,$(mans))

## install dependencies
i_deps    =
i_deps   += $(i_bin_dir)
i_deps   += $(i_man_dir)
i_deps   += $(addprefix $(i_bin_dir)/,$(cmds))
i_deps   += $(addprefix $(i_man_dir)/,$(mans))


# build man pages
$(b_man_dir)/%.1: Documentation/man1/%.rst
        rst2man $< $@

# install man pages
$(i_man_dir)/%: $(b_man_dir)/%
        $(install_data) $< $@

Nothing much new is going on here. We just:

  • extended the directories we need to build/install with the man page directories

  • read the man page files into the mans variable the same way we do with commands

  • extended the build/install dependencies with the man pages

  • and finally added targets to build and install the man pages.

This makefile is limited to section 1 man pages but should be trivial to extend to more sections if needed.

Help output

To support a foo -h argument, the simplest solution is to exec man foo 14.
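
A minimal sketch, slotting into the entrypoint's argument loop from Argument Parsing; exec replaces the shell process with man, so no explicit exit is needed:

   -h)
      exec man "$SELF"
      ;;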

Code Style

This section deals with techniques that could technically fall under code style but have functional effects. It will not deal with subjective things like indent length, which have no effect on function.

Breaking long lines

Conditionals can be broken simply after the logical operators without the need for a line ending escape:

foo ||
   bar ||
   baz

Argument lists can be broken via a helper array:

args=(
   foo
   bar
)

cmd "${args[@]}"

This has the advantage that:

  • you do not need the line ending escape again

  • you may experiment with different argument combinations simply by commenting lines out

  • it is easily extendable with conditionals or pass-through options, as sketched below
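
For example (use_force, cmd, and the options are stand-ins):

args=(
   --verbose
   # --dry-run    # toggled off while experimenting
)

$use_force && args+=( --force )   # extend on a condition
args+=( "$@" )                    # pass remaining arguments through

cmd "${args[@]}"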

Path Definitions

When writing path literals, get into the habit of not ending them with a trailing slash. Ever. When reading file paths, normalize them to not end with a trailing slash as well.

some_dir=/foo/bar

echo $some_dir/qux
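
When reading a path from elsewhere, normalizing it is a one-liner with parameter expansion (config_file is a stand-in):

read -r some_dir < config_file
some_dir=${some_dir%/}    # strip the trailing slash, if any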

The main reason is that in some commands the trailing slash implies different semantics. This is the case with rsync, some invocations of cp, and probably others.

A secondary reason is that not all paths are file paths, and commands using them will not normalize double slashes into a single one. For example, the abstract unix socket paths foo/bar and foo//bar refer to different sockets. URL paths should normalize a double slash into a single one, but that is handled by the HTTP server 1.

Adhering to “definitions are without trailing slash” makes this a non-issue, as each call site may decide whether it needs to add a slash or not.

Imagine what your options are when a path is defined with a trailing slash, or possibly either way:

  • $some_dir/qux looks ok. But you have a double slash problem.

  • ${some_dir}qux forces braces. Lacks taste. And now you have to wonder “How exactly is some_dir defined?” each time you want to use its value.

Conclusion

At this point we are mostly done with what can be achieved in POSIX shell or Bash, and we got surprisingly far.

Unfortunately, many solutions are too clunky (as is the case with Debugging, Argument Parsing, and more that has not been discussed yet). It is necessary to pick up a more powerful shell to continue our quest for readable and succinct code. The quest continues at Zsh Scripting Guide 24.

Acknowledgements

Thanks to Roman Neuhauser 33, who I learned much from.

Backlog

https://stackoverflow.com/questions/72889857/why-does-bash-treat-undefined-variables-as-true-in-an-if-statement

What exactly was the point of [ "x$var" = "xval" ]?

References

1

I even think the double slash is not permitted by the HTTP protocol. The normalization may be courtesy of the server implementors.

2

Other shells may be just as fine or an even better choice than Zsh, but I am not familiar enough with them. Particularly, https://www.skarnet.org/software/execline/ is something quite different with interesting properties.

3

https://mywiki.wooledge.org/BashFAQ/105

4

SHELL BUILTIN COMMANDS > set > -e in man 1 bash

ERR_EXIT in man 1 zshoptions https://linux.die.net/man/1/zshoptions

5

https://github.com/jan-matejka/errexit-considered-harmful

6

man 2 fork https://linux.die.net/man/2/fork

7

http://docutils.sourceforge.net/rst.html

8

https://github.com/brodie/cram

9

http://mama.indstate.edu/users/ice/tree/

https://linux.die.net/man/1/tree

10

SHELL BUILTIN COMMANDS > set > -x in man 1 bash https://linux.die.net/man/1/bash

XTRACE in man 1 zshoptions https://linux.die.net/man/1/zshoptions

11

man 1 xargs https://linux.die.net/man/1/xargs

12

https://github.com/roman-neuhauser/fake

13

python-docutils http://docutils.sourceforge.net/

14

Also an approach taken by git --help. Though git distinguishes the short opt (-h) and long opt (--help) semantically.

16

http://cr.yp.to/daemontools/envdir.html

http://thedjbway.b0llix.net/daemontools/envdir.html

17

man 1 getopt https://linux.die.net/man/1/getopt

18

SHELL BUILTIN COMMANDS > getopts in man 1 bash https://linux.die.net/man/1/bash

19

THE ZSH/ZUTIL MODULE > zparseopts in man 1 zshmodules https://linux.die.net/man/1/zshmodules

20

SHELL BUILTIN COMMANDS > trap in man 1 bash

SHELL BUILTIN COMMANDS > trap in man 1 zshbuiltins https://linux.die.net/man/1/zshbuiltins

21

man 7 signal https://linux.die.net/man/7/signal

22

https://github.com/jan-matejka/xdgenv

23

https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html

24

https://www.matejka.ninja/software/lang/zsh.html

26

man 1 man https://linux.die.net/man/1/man

27

https://www.gnu.org/software/make/

man 1 make https://linux.die.net/man/1/make

28

Recently I learned that a shebang in the form #!bash is also possible. I do not know what additional assumptions (if any) this makes about the target system.

29

REDIRECTION in man 1 bash https://linux.die.net/man/1/bash

REDIRECTION in man 1 zshmisc https://linux.die.net/man/1/zshmisc

30

Shell builtins in case of Zsh

31

https://github.com/roman-neuhauser/haveopt

32

man 3 exec https://linux.die.net/man/3/exec

33

https://github.com/roman-neuhauser

http://rants.sigpipe.cz/

34

EXPANSION > Parameter Expansion in man 1 bash

35

man 1 zshcontrib https://linux.die.net/man/1/zshcontrib

36

EXPANSION > Command Substitution in man 1 bash