Shell Scripting Survival Guide¶
By Jan Matějka, June 04, 2019
Shell scripting can be a very effective and efficient tool in your toolbox that saves you time, as exemplified by the famous case of the most frequently used words problem solved by D. E. Knuth and M. D. McIlroy.
As powerful as the shell is, it is just as difficult to figure out how to use it properly. It is so difficult that it is not uncommon for programmers to conclude that shell scripting is unfeasible for anything but minuscule and elementary scripts. This conclusion inevitably leads to preferring some general purpose language even when shell scripts would be significantly simpler and shorter.
This document aims to acquaint you with shell scripting techniques that empower you to write readable, succinct, and reliable shell scripts, suitable for writing CLI prototypes and often even for the end products.
Shell Choice¶
Shell choice is an important first decision you need to make.
If you need your shell script to be portable or acceptable as a system component, you are limited to the POSIX shell. Writing POSIX compatible shell scripts is a pain, but for non-system programs there are other, superior options.
You might be inclined to just dive into Bash as the de facto standard shell on Linux. However, Bash is not much of an improvement. Zsh is a widely available shell that is, compared to Bash, delightful to use 2.
Most (all?) of the techniques presented here are applicable to any POSIX compatible shell but for practical purposes I will focus on Bash here. Additional techniques specific to Zsh are presented in Zsh Scripting Guide, but this document is still required reading.
Pre-Requisite Knowledge¶
In general only an elementary knowledge of shell scripting is assumed. Some techniques may require elementary knowledge of related topics:
basic scripting (simple commands, flow control, syntax & semantics)
filesystem model (cwd, pathnames, basic file operations, file descriptors)
process model (environment, hierarchy, exit codes, signals 21, fork 6 & exec 32)
important variables like PATH
manual pages 26
GNU make 27
Deeper knowledge of these topics will also explain the internal working of these techniques and why some things are the way they are.
Techniques presented here assume they are to be used on optionally installed software, as opposed to basic system software, which brings its own unique set of challenges with each system and where the techniques here may or may not be applicable.
To reduce duplication with Zsh Scripting Guide, references will include both Bash and Zsh references and some examples may also show Zsh in addition to Bash examples.
Filesystem Structure¶
Is shown as displayed by the tree -F program 9.
Terminal Session Examples¶
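$ foo
$ indicates a shell prompt. A % may be used instead of $, which implies a non-standard shell. In this document, % indicates Zsh.
foo is the execution of a command named foo.
Any lines printed after the command are the combined stdout and stderr of the command unless qualified otherwise.
Where an exit code is shown, it is the exit code foo exited with.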
Manual Pages Section References¶
When referring to sections of manual pages, a form like foo > bar in man 1 tree may be used, where foo and bar are sections, with > signifying bar being hierarchically under foo, in the manual page for program tree listed in manual pages section 1.
Simple Techniques¶
This section will deal with techniques that are implementation details, generally applicable regardless of your code structure.
Start your scripts with
#!/usr/bin/env bash
To make your scripts executable by ./foo-cmd instead of bash foo-cmd, you need to include a shebang in your script. That is the #!... part.
You want to leverage PATH lookup for portability as different operating systems may install Bash into different paths. That is the /usr/bin/env part 28.
If your target system is Linux, or a specific set of Linux distributions, you may get away with #!/bin/bash as Bash is usually the system shell or at least considered basic system software. I do not know if this is universal across the whole Linux ecosystem though.
Right after the shebang, start your script with the SELF definition.
SELF is the filename of your executable and will be useful later on at Error Message Printing and Prelude.
The trick here consists of using the zeroth argv element, which is the path to the file being executed 32, and then using parameter expansion 34 to get only the base filename.
Note that the SELF definition cannot be put into a function or a Prelude file, as in those places $0 may refer to the function name or the sourced file, respectively. Zsh does this by default; Bash leaves $0 unchanged, but keeping the definition in the script itself works in both.
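The definition itself fits on one line. The following is a minimal sketch, assuming the common `${0##*/}` form of parameter expansion (remove the longest prefix ending in a slash); the exact form the author uses is not shown in this text.

```shell
#!/usr/bin/env bash
# SELF: base filename of the running script, e.g. "foo" for /usr/local/bin/foo.
# ${0##*/} removes the longest leading match of the pattern "*/" from $0.
SELF="${0##*/}"

printf '%s\n' "$SELF"
```

Because the expansion is done by the shell itself, no external command such as basename is needed.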
Error Message Printing¶
Always lead the message with SELF and write to standard error.
printf >&2 "%s: %s\n" "$SELF" "error message"
Writing to standard error (stderr, that is the >&2 part 29) allows the user to suppress the standard output (stdout) by redirecting it to /dev/null without suppressing the error output. Even when the user is not interested in the standard output, they will be interested in standard error if something goes wrong.
Leading with SELF is a convention based on the assumption that any program may be used as part of another script. Then when something goes wrong, the user knows which program is responsible for the error message.
Error Induced Exits¶
Program termination due to an error must result in a non-zero exit code.
foo || { printf >&2 "%s: %s\n" "$SELF" "foo failed"; exit 1; }
The exit code is the fundamental way programs signal whether something went wrong or everything is ok. You have probably already depended on this behavior, so make sure your programs set the exit code properly as well.
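To see both conventions from the caller's side, here is a small sketch. The function failing_step is a hypothetical stand-in for any program that prints some output, reports an error, and exits non-zero; the exit code 3 is arbitrary.

```shell
#!/usr/bin/env bash
SELF="${0##*/}"

# failing_step is a hypothetical stand-in for a failing program.
# The ( ... ) body runs in a subshell so that exit does not end this demo.
failing_step() (
    printf 'partial output\n'
    printf >&2 '%s: %s\n' "$SELF" "something went wrong"
    exit 3
)

# The caller discards stdout but still receives the error message
# and the exit code.
err=$(failing_step 2>&1 >/dev/null)
status=$?

printf 'stderr: %s\n' "$err"
printf 'exit code: %s\n' "$status"
```

The order of the redirections matters: stderr is first pointed at the capture, and only then is stdout sent to /dev/null.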
Write Programs Not Functions¶
As your script grows, you will need to structure it into subroutines. Naturally, your first instinct may be to use functions, but functions are problematic:
Functions are difficult to write as their behavior is dependent on shell options and global variables.
Functions are difficult to test for the same reasons they are difficult to write.
Functions complicate Errexit usage.
Functions are not usable with xargs (though there is zargs 35 in Zsh)
Instead you want to write subroutines as standalone programs (processes):
├── foo
└── foo-subroutine
Clear Input/Output Definition¶
By making the subroutine a program (process) you have clearly defined inputs, typically argv and the environment, and outputs, typically stdout, stderr, and the exit code.
This is the basic set of things you need to worry about here. Depending on what your program does, there may be more like stdin, filesystem effects, signals, and maybe more.
That is already plenty of things to worry about. We do not need to add global variables and shell options to the list.
Functionally (Almost) Equivalent¶
Writing a subroutine as a subprogram gives you all you need from the subroutine, and you use it much the same way as you would use a function, only simpler, as you can run it like any other program. Sometimes you will need to make a function though. More about that later at Prelude.
Now You Have to Handle Installation¶
By separating a script into multiple files you may need to worry about installation, but with the right tools it is not much of an issue. More about that later at Installation.
Avoid Directory Changes¶
Most often cd is used only to construct a path or for other simple things which can be done trivially by string manipulation or with dirname and realpath.
Code which needlessly changes directories is hard to follow and is prone to breakage on refactorings.
If you do need to change directories, the builtin commands pushd and popd may be preferred to cd, as pushd also pushes the current working directory onto a stack so you can get back with a popd.
Another valid strategy is isolating the directory change in a subshell:
$(cd foodir && cwd-sensitive-command)
The code within the command substitution ($( ... )) is executed in a subshell and the directory change does not affect the surrounding code. A plain subshell, ( ... ), isolates the change the same way when the output does not need to be captured.
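Both ideas can be sketched briefly. The path /opt/foo/bin/foo below is a hypothetical location chosen for illustration; the second half shows that a directory change inside a subshell leaves the caller's working directory alone.

```shell
#!/usr/bin/env bash
# Build a sibling path without cd: dirname strips the last path component.
script_path=/opt/foo/bin/foo            # hypothetical location
share_dir="$(dirname -- "$script_path")/../share"
printf '%s\n' "$share_dir"

# When a command really depends on the working directory, confine the
# change to a subshell; the caller's directory is untouched afterwards.
before=$PWD
inside=$(cd / && pwd)
after=$PWD
printf 'inside: %s, unchanged outside: %s\n' "$inside" "$after"
```

Where the path must be canonical, realpath can be applied to the result instead of changing into the directory and calling pwd.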
Prefer xargs 11 to for loops or command substitutions 36:
docker ps -q | xargs -r docker kill
It is usually easier to follow once you learn to recognize the pattern as it is more succinct and removes potentially needless state (compared to for i in ...). Furthermore, xargs will automatically split the command argv according to the system limit (compared to docker kill $(docker ps -q)) and is trivial to parallelize via -P.
The -r option, available on systems with GNU xargs, prevents running the command if the input is empty, which is usually the behavior you want.
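The same pattern can be tried without Docker. This sketch uses echo as a stand-in command and assumes an xargs that accepts -r, as GNU xargs does.

```shell
#!/usr/bin/env bash
# One item per line on stdin; xargs packs them into as few invocations as fit.
out=$(printf '%s\n' a b c | xargs echo item:)

# -n 1 forces one invocation per item (adding -P N would run N in parallel).
out_n=$(printf '%s\n' a b c | xargs -n 1 echo item:)

# -r: with empty input the command is not run at all.
out_empty=$(printf '' | xargs -r echo item:)

printf '%s\n' "$out" "$out_n" "[$out_empty]"
```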
Boolean Values¶
To represent boolean values use true and false:
if $foo ; then
The trick here is that both true and false are no-op commands, either builtins or /bin executables, with the appropriate exit code 30, and the general syntax for the if keyword is if <command>;.
You have probably seen variations like if [[ ... ]]; or if test -e ...; but these are also just commands. You can use any command, as the truth value of if is determined by the command’s exit code.
However, if you accept these as inputs you need to consider the risk of users injecting malicious commands depending on your use case.
This convention is motivated entirely by aesthetics and succinctness. What is usually seen in the wild is something like:
if [[ $foo = "yes" ]]; then
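A short sketch of the convention, including one way to guard against the injection risk mentioned above. The variable names verbose, dry_run, and untrusted are hypothetical.

```shell
#!/usr/bin/env bash
verbose=true      # hypothetical flags holding the command names true/false
dry_run=false

if $verbose; then msg="verbose on"; else msg="verbose off"; fi
if $dry_run; then mode="dry run"; else mode="real run"; fi

# A value that comes from outside must be validated before it is executed.
untrusted='rm -rf /tmp/x'          # hypothetical external input
case $untrusted in
    true|false) safe=$untrusted ;;
    *)          safe=false ;;
esac

printf '%s, %s, safe=%s\n' "$msg" "$mode" "$safe"
```

The case statement accepts only the two literal values, so anything else is never run as a command.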
Null Globs¶
You will probably be globbing a lot. When globbing, you will mostly glob files that can have 1..N occurrences. But occasionally you will want to glob a path that may occur 0..N times.
Globbing a path that does not exist will normally yield an error.
Depending on your shell options, the error may be produced either by the glob itself:
% printf "%s" nonexistent*
zsh: no matches found: nonexistent*
or, when the shell passes the unmatched pattern through literally, by the command that receives it:
$ printf "%s\n" nonexistent*
For these occasions, there are null glob options which will make the globs expand to nothing:
$ shopt -s nullglob
$ printf "%s" nonexistent*
In Zsh:
% set -G
% printf "%s" nonexistent*
This is useful in cases where 0 occurrences is a valid expansion and the command can handle the null expansion correctly. As a counterexample, using null glob with cat may be even worse, as it may instead just hang waiting for input on stdin:
% set -G
% cat nonexistent*
^C
Note, the ^C here indicates the command has been terminated by Ctrl-C.
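A case where the null expansion is handled correctly is a for loop or an array assignment. This sketch works in a scratch directory created with mktemp; the .conf file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

shopt -s nullglob

# No matching files yet: the loop body runs zero times.
count=0
for f in "$tmp"/*.conf; do count=$((count + 1)); done

# With matches present the glob expands as usual.
touch "$tmp/a.conf" "$tmp/b.conf"
matches=("$tmp"/*.conf)

shopt -u nullglob
printf 'before: %d, after: %d\n' "$count" "${#matches[@]}"
```

Turning the option off again right after use keeps the rest of the script on the default behavior.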
Sequence Expressions¶
$ echo {0..5}
0 1 2 3 4 5
Usable in for loops or printfable into xargs.
Unfortunately, this works only for static numbers in Bash:
$ x=5
$ for i in {0..$x}; do echo $i; done
{0..5}
But seq can be used for the dynamic purpose:
$ seq 3
1
2
3
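With a variable bound, the two usual options look like this. The sketch collects the numbers into strings only so that both results can be compared.

```shell
#!/usr/bin/env bash
x=5

# seq receives its arguments after variable expansion, so a variable works.
from_seq=$(seq 0 "$x" | tr '\n' ' ')

# Bash's arithmetic for loop is an alternative without an external command.
from_loop=
for (( i = 0; i <= x; i++ )); do from_loop+="$i "; done

printf '%s\n%s\n' "$from_seq" "$from_loop"
```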
Use Errexit Judiciously¶
set -o errexit
Looks like a good idea until you find out how broken it is 3 4 5. Generally, I recommend avoiding it completely unless you know very well what you are doing.
We will discuss safe use of errexit later, after learning about Architectural Techniques.
Resource Cleanup¶
It is all too common to see shell code like:
The issue is that the script may terminate before it gets to executing the resource-cleanup.
You would not do this in a general purpose language and you should not do it in shell either. General purpose languages have various ways to deal with the problem such as the try…finally construct, context managers, RAII, defer statements, etc.
In shell, this can be achieved with the builtin trap command 20:
trap 'resource-cleanup' EXIT
Only one EXIT trap can be registered at a time, as a new trap replaces the previous one, so it gets hairy with longer scripts. But this issue will go away once you apply the techniques presented at Architectural Techniques.
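As an illustration, here is a sketch in which the resource is a temporary directory. The pattern runs inside a command substitution only so that the demo can check afterwards that the cleanup really happened.

```shell
#!/usr/bin/env bash
# The subshell allocates a directory, registers the cleanup, reports the
# directory name, and then fails early.
marker=$(
    workdir=$(mktemp -d)
    trap 'rm -rf "$workdir"' EXIT
    printf '%s\n' "$workdir"
    false || exit 1            # early exit; the EXIT trap still runs
    printf 'unreachable\n'
)

if [ -e "$marker" ]; then
    printf 'leaked: %s\n' "$marker"
else
    printf 'cleaned up\n'
fi
```

Registering the trap immediately after the allocation leaves no window in which the script can exit with the resource still held.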
Architectural Techniques¶
In this section we introduce techniques that impose overall structure on your code and deal with general topics that almost every program needs to deal with.
Prelude¶
We already covered at Write Programs Not Functions that you do not want to write functions. But sometimes you will have to. The first thing you may want is a basic “standard library” for your script, that is your prelude.
If you do a lot of error handling, you may want to use
foo || fatal "foo failed"
instead of the lengthy error handling from Error Induced Exits.
fatal has to be a function in order to apply the exit 1 to the right process.
This is where a prelude file comes in with structure like:
├── foo
├── foo_prelude
└── foo-subroutine
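prelude code:
#!/bin/false
# print "<program>: <message>" to stderr and terminate the calling script
function fatal {
    printf >&2 "%s: %s\n" "$SELF" "$1"
    exit 1
}
foo code:
#!/usr/bin/env bash
SELF="${0##*/}"
. foo_prelude
foo-subroutine || fatal "subroutine failed"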
$ foo
foo: subroutine failed
The trick here is that you will abuse $PATH by adding your prelude in there as well. That will allow you to do a simple . foo_prelude without worrying where it is actually located.
Since the prelude is intended to be sourced, not executed, it is a bit different.
First, it has a special shebang #!/bin/false which ensures it will be a no-op and exit with a non-zero exit code if someone tries to execute it. Second, its filename uses an underscore instead of a dash. More about that later at Command Dispatch.
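The whole arrangement can be tried out in a scratch directory. The sketch below writes the three files with here-documents and prepends the directory to PATH; the failing foo-subroutine body is made up for the demonstration, and it assumes the temporary directory allows executing files.

```shell
#!/usr/bin/env bash
bindir=$(mktemp -d)
trap 'rm -rf "$bindir"' EXIT

# The prelude is only sourced, so it does not need to be executable.
cat > "$bindir/foo_prelude" <<'EOF'
#!/bin/false
function fatal {
    printf >&2 '%s: %s\n' "$SELF" "$1"
    exit 1
}
EOF

# A subroutine that always fails (hypothetical body).
cat > "$bindir/foo-subroutine" <<'EOF'
#!/usr/bin/env bash
exit 1
EOF

cat > "$bindir/foo" <<'EOF'
#!/usr/bin/env bash
SELF="${0##*/}"
. foo_prelude
foo-subroutine || fatal "subroutine failed"
EOF

chmod +x "$bindir/foo" "$bindir/foo-subroutine"

# Run foo with the scratch directory on PATH and capture what it reports.
message=$(PATH="$bindir:$PATH"; foo 2>&1)
status=$?
printf '%s (exit code %s)\n' "$message" "$status"
```

Bash looks up the argument of . on PATH when it contains no slash, which is what makes the bare `. foo_prelude` work.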
Command Dispatch¶
Eventually, you will need to add subcommands like foo-cmd1 to your program:
├── foo
├── foo_prelude
├── foo_dispatch
├── foo-cmd1
└── foo-cmd2
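foo_dispatch code:
#! /usr/bin/env bash
. foo_prelude
# $1 is the name of the calling program, $2 the requested subcommand
: "${1:?}"
: "${2:?}"
# the subcommand executable is named <program>-<subcommand>
cmd="$1-$2"
shift 2
"$cmd" "$@"
foo code:
#! /usr/bin/env bash
SELF="${0##*/}"
. foo_prelude
foo_dispatch "$SELF" "$@"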
Now we can just add foo-cmd1 and foo-cmd2 files and we have subcommands that are executable as foo cmd1 and foo cmd2.
This is very useful as it is generally more user friendly and the foo executable may perform initialization like preparing environment variables common to all the subcommands.
Note that prelude and dispatch are named with an underscore instead of a dash, so these are not subcommands, as foo_dispatch constructs the subcommand executables with dashes.
Further note that this construction with SELF passing allows foo_dispatch to be used for arbitrary subcommand nesting without any issue.
Even further, if the main entry point is not doing anything special, the next level of dispatch may be achieved just by symlinking to the main entry point because that will cause SELF to be assigned the name of the symlink and not the actual executable file:
├── foo
├── foo_prelude
├── foo_dispatch
├── foo-cmd1
├── foo-cmd2
├── foo-bar-qux
└── foo-bar -> foo
Still even further, an advantage of this approach (which is also used by git, for example) is that it lends itself to modularization by 3rd parties, who can simply drop foo-3rd in your PATH.
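The dispatch chain, including the symlink trick, can be exercised in a scratch directory. This is a sketch: the subcommand bodies are made up for the demonstration, and the prelude is left out to keep it short.

```shell
#!/usr/bin/env bash
bindir=$(mktemp -d)
trap 'rm -rf "$bindir"' EXIT

cat > "$bindir/foo_dispatch" <<'EOF'
#!/usr/bin/env bash
: "${1:?}" "${2:?}"
cmd="$1-$2"
shift 2
exec "$cmd" "$@"
EOF

cat > "$bindir/foo" <<'EOF'
#!/usr/bin/env bash
SELF="${0##*/}"
foo_dispatch "$SELF" "$@"
EOF

# Hypothetical subcommands that just report what they received.
cat > "$bindir/foo-cmd1" <<'EOF'
#!/usr/bin/env bash
printf 'cmd1 got: %s\n' "$*"
EOF

cat > "$bindir/foo-bar-qux" <<'EOF'
#!/usr/bin/env bash
printf 'qux\n'
EOF

chmod +x "$bindir"/foo*
ln -s foo "$bindir/foo-bar"     # nested level: SELF becomes "foo-bar"

out1=$(PATH="$bindir:$PATH"; foo cmd1 a b)
out2=$(PATH="$bindir:$PATH"; foo bar qux)
printf '%s\n%s\n' "$out1" "$out2"
```

In the second call foo dispatches to foo-bar, which is the same script under another name, and that in turn dispatches to foo-bar-qux.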
Argument Parsing¶
You may forego argument parsing and take only fixed arguments or environment variables, but that quickly results in a poor user experience and eventually in a completely unusable, even user-hostile, interface. You will need to do argument parsing.
Your options are getopt 17 and getopts 18. getopts is limited to shortopts and getopt may have its own warts. So far I have successfully avoided this problem by using zparseopts 19. You might also be interested in haveopt 31. Or you might as well just write a custom parser. It will not be much different from what you would write with getopts.
A custom parser would look something like:
#!/usr/bin/env zsh
. foo_prelude
# set default values (--flag and o_flag are placeholder names)
o_flag=false
o_opt=
# repeat while argv has elements
while (( $# > 0 )); do
    case $1 in
        # parsing a boolean, set the value and
        # consume one element from argv
        -f|--flag) o_flag=true; shift ;;
        # parsing a parameter option
        # set the value and consume two elements from argv
        -o|--opt)
            o_opt=${2:?Missing --opt value}
            shift 2
            ;;
        # parameter not recognized, we either reached
        # positional arguments or user entered invalid
        # flag. Might want to check for "-" prefix or something.
        *) break ;;
    esac
done
The custom parser method adds only one more line per option, so the line count is asymptotically the same as with the getopt/getopts approaches. The only real thing you give up is argument bundling (-xyz being equivalent to -x -y -z), which can often be sacrificed.
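For comparison, this is roughly what the same two options look like with the getopts builtin. It is a sketch: the option letters and the variables o_flag, o_opt, and rest are hypothetical, and the loop is wrapped in a function only so that it can be called more than once here.

```shell
#!/usr/bin/env bash
parse() {
    local OPTIND=1 opt
    o_flag=false
    o_opt=
    # "fo:" declares -f as a boolean and -o as taking a value.
    while getopts 'fo:' opt; do
        case $opt in
            f) o_flag=true ;;
            o) o_opt=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    # Drop the parsed options; what remains are positional arguments.
    shift $((OPTIND - 1))
    rest=("$@")
}

parse -f -o value pos1 pos2
printf 'flag=%s opt=%s rest=%s\n' "$o_flag" "$o_opt" "${rest[*]}"

# Bundling comes for free with getopts: -fo is the same as -f -o.
parse -fo bundled
printf 'flag=%s opt=%s\n' "$o_flag" "$o_opt"
```

Long options such as --opt are not available this way, which is the limitation mentioned above.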
You should really prefer Zsh to Bash anyway. The Zsh solution is a bit more cryptic but much simpler and discussed in Zsh Scripting Guide.
Debugging¶
The simplest way to debug a script is enabling XTRACE 10:
set -x
We will also need to conveniently propagate it to the subcommands:
foo code:
#!/usr/bin/env zsh
SELF="${0##*/}"
. foo_prelude
while (( $# > 0 )); do
    case $1 in
        # -x: request tracing in every subcommand via the environment
        -x) export FOO_XTRACE=true; shift ;;
        *) break ;;
    esac
done
foo_dispatch "$SELF" "$@"
prelude code:
# prelude functions ...
${FOO_XTRACE:-false} && set -x
Now foo -x will activate xtrace by parsing it from argv in the foo entrypoint and then exporting the environment variable FOO_XTRACE=true. As the next command executed will source the prelude, at the end of the prelude FOO_XTRACE will evaluate to true and set -x will enable xtrace for it.
It is also possible to just export FOO_XTRACE=true directly instead of using -x.
Note we are using a custom argument parser as demonstrated in Argument Parsing and the Boolean Values technique for the environment flag.
The xtrace output in Bash is not very convenient but it is simple and gets the job done. This will be more comfortable in Zsh Scripting Guide.
Configuration¶
Just cat files for as long as you can get away with.
Assuming your program is named foo and you need a path to bar, a simple:
bar_path=$(cat ~/.config/foo/bar_path)
will do. If you need another configurable option, add another file.
It is easy to read and easy to write. This topic is further discussed at yaml sucks^Wdoes not rock.
If you want to use a single configuration file, you will need to structure it and provide a command like foo-config for correctly setting and reading configuration values.
This is the approach git takes. The git-config command is what makes it possible for tutorials to include commands like git config --global user.name "Jerry Mouse".
Depending on your audience, or other factors, using a single structured configuration file may be the right choice, but it is more work and that work is non-essential in the early stages. Using a file per option lets you focus on the core problem and you can redo the configuration once you have something solid.
By using files you may also plug envdir 16 into the main entrypoint and get the configuration as environment variables for free.
In case you want to support the XDG Basedir Spec 23, you may additionally plug xdgenv 22 into the main entrypoint and have it for free, similar to using envdir.
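The file-per-option approach extends naturally to defaults. In this sketch a scratch directory stands in for ~/.config/foo, and the option names and default values are made up for illustration.

```shell
#!/usr/bin/env bash
# config_dir would normally be ~/.config/foo; a scratch directory is used here.
config_dir=$(mktemp -d)
trap 'rm -rf "$config_dir"' EXIT
printf '%s\n' /srv/bar > "$config_dir/bar_path"

# One file per option; fall back to a default when the file is absent.
bar_path=$(cat "$config_dir/bar_path" 2>/dev/null || printf '%s' /default/bar)
baz_path=$(cat "$config_dir/baz_path" 2>/dev/null || printf '%s' /default/baz)

printf 'bar_path=%s\nbaz_path=%s\n' "$bar_path" "$baz_path"
```

Setting an option is then just writing a file, so no dedicated configuration command is needed at this stage.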
Installation¶
You want to have a simple, standard way to build and install your program regardless of its size or number of files. You may get away without it if your program is a single executable file, but as the program grows to multiple files this becomes a necessity.
Use GNU make. Refer to GNU Make Coding Guide.
Source Structure¶
To simplify the installation process and command dispatch we need some conventions for the source and installation structure.
To recap, this is the structure we install into:
├── foo
├── foo_prelude
└── foo-subroutine
Source structure:
└── src/
├── foo.bash
├── foo_prelude.bash
└── foo-subroutine.bash
Testing¶
Use cram 8. It has issues but it is the best tool for testing command line interfaces I know of. It is simple to write test cases and interpret failures.
Extend your makefile so tests can be run with make check:
cram_opts ?= --shell=/usr/bin/bash
cram_root ?= cram
cram_path ?= $(cram_root)
# CURDIR is GNU make's built-in absolute path of the working directory
check_path = $(CURDIR)/$(build_dir)/fakeroot/usr/local/bin:/bin:/usr/bin:/usr/local/bin

.PHONY: clean
clean:
	$(RM) -r $(build_dir) $(cram_root)/*.t.err

.PHONY: check
check: build
	mkdir -p $(build_dir)/fakeroot
	DESTDIR=$(build_dir)/fakeroot $(MAKE) install
	env -i PATH=$(check_path) cram $(cram_opts) $(cram_path)
Here:
We make install our code into a fakeroot to make sure that if our tests pass, the code was not just built correctly but installed as well.
We override the PATH variable to make sure we do not rely on non-standard executables.
We run cram within env -i to ensure the tests do not depend on our custom / development environment variables.
And finally we extend the clean target to clean up cram artefacts if there are any.
To fake commands you may simply generate their fake versions into a directory on the test PATH, either by printfing a fake shell script or with fake.
Documentation¶
Write man pages. If you are not comfortable with {g,t,n,}roff, you may use man page generators. I personally use rst2man 13 as I generally consider reStructuredText 7 the sweet spot between power and complexity.
See rst2man.txt for an example man page written in rst.
To incorporate documentation, we need to update our source structure:
├── Documentation/
│ └── man1/
│ ├── foo-cmd.rst
│ └── foo.rst
└── src/
├── foo-cmd.zsh
├── foo_dispatch.zsh
├── foo_prelude.zsh
└── foo.zsh
and makefile:
## installation targets
i_bin_dir = $(DESTDIR)$(PREFIX)/bin
i_man_dir = $(DESTDIR)$(PREFIX)/man/man1
## build targets
b_bin_dir = $(build_dir)/bin
b_man_dir = $(build_dir)/man/man1
cmds = $(patsubst $(src_dir)/%.zsh,%,$(wildcard $(src_dir)/*.zsh))
mans = $(patsubst Documentation/man1/%.rst,%.1,$(wildcard Documentation/man1/*.rst))
dirs =
dirs += $(b_bin_dir) $(i_bin_dir)
dirs += $(b_man_dir) $(i_man_dir)
## build dependencies
b_deps =
b_deps += $(b_bin_dir)
b_deps += $(b_man_dir)
b_deps += $(addprefix $(b_bin_dir)/,$(cmds))
b_deps += $(addprefix $(b_man_dir)/,$(mans))
## install dependencies
i_deps =
i_deps += $(i_bin_dir)
i_deps += $(i_man_dir)
i_deps += $(addprefix $(i_bin_dir)/,$(cmds))
i_deps += $(addprefix $(i_man_dir)/,$(mans))
# build man pages
$(b_man_dir)/%.1: Documentation/man1/%.rst
	rst2man $< $@
# install man pages
$(i_man_dir)/%: $(b_man_dir)/%
	$(install_data) $< $@
Nothing much new is going on here. We just:
extended the directories we need to build/install with man page directories,
read the man page files into the mans variable the same way we do with commands,
extended build/install dependencies with man pages,
and finally added targets to build and install the man pages.
This makefile is limited to section 1 man pages but should be trivial to extend to more sections if needed.
Help output¶
To support foo -h arguments, the simplest solution is to exec man foo.
Code Style¶
This section deals with techniques that could technically fall under code style but have functional effects. It will not deal with subjective things like indent length, which have no effect on function.
Breaking long lines¶
Conditionals can be broken simply after the logical operators without the need for a line ending escape:
foo ||
bar ||
baz
Argument lists can be broken via a helper array:
cmd "${args[@]}"
This has the advantages that:
you do not need the line ending escape again,
you may experiment with different argument combinations simply by commenting lines out,
it is easily extendable with conditionals or passthru options.
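A sketch of the helper array in use. The options, the verbose switch, and the passthru array are hypothetical, and printf stands in for the real command so that the final argument list is visible.

```shell
#!/usr/bin/env bash
verbose=true                 # hypothetical switch
passthru=(/tmp)              # hypothetical arguments received from the caller

args=(
    -l
    # -a                     # commented out while experimenting
    -d
)
# Extend the list conditionally and with passthru arguments.
if $verbose; then args+=(-F); fi
args+=("${passthru[@]}")

# A real script would run: cmd "${args[@]}"
printf '%s\n' "${args[@]}"
```

Quoting the expansion as "${args[@]}" keeps every element a separate word, even when it contains spaces.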
Path Definitions¶
When writing path literals, get into the habit of not ending them with a trailing slash. Ever. When reading file paths, normalize them to not end with a trailing slash as well.
echo $some_dir/qux
The main reason is that in some commands the trailing slash implies different semantics. This is the case with rsync, some instances of cp, and probably others.
A secondary reason is that not all paths are file paths and commands using them will not normalize double slashes into a single one. For example, the abstract unix socket paths foo/bar and foo//bar refer to different sockets. URL paths should normalize a double slash into a single one but that is handled by the HTTP server 1.
Adherence to “definitions are without trailing slash” makes this a non-issue as each call site may decide whether it needs to add a slash or not.
Imagine what your options are when a path is defined with a trailing slash, or possibly either way:
$some_dir/qux looks ok. But you have the double slash problem.
${some_dir}qux forces braces. Lacks taste. And now you have to wonder “How exactly is some_dir defined?” each time you want to use its value.
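Normalizing a path that was read from outside takes a single parameter expansion. A minimal sketch, in which input_dir is a hypothetical user-supplied value:

```shell
#!/usr/bin/env bash
input_dir=/var/lib/foo/          # hypothetical user-supplied value
some_dir=${input_dir%/}          # strip one trailing slash, if present
path=$some_dir/qux

# A value that is already normalized passes through unchanged.
clean_dir=/var/lib/foo
same=${clean_dir%/}

printf '%s\n%s\n' "$path" "$same"
```

Note that the root directory / would become an empty string this way, so it needs separate handling if it is a valid input.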
At this point we are mostly done with what can be achieved in POSIX shell or Bash, and we got surprisingly far.
Unfortunately, many solutions are too clunky (as is the case with Debugging, Argument Parsing, and more that has not been discussed yet). It is necessary to pick up a more powerful shell to continue our quest for readable and succinct code. This quest continues at Zsh Scripting Guide 24.
Thanks to Roman Neuhauser 33, from whom I learned much.
What exactly was the point of [ “x$var” = “xval” ]?
- 1
I even think the double slash is not permitted by the HTTP protocol. The normalization may be courtesy of the server implementors.
- 2
Other shells may be just as fine or an even better choice than Zsh but I am not familiar enough with other shells. Particularly is something quite different with interesting properties.
- 3
- 4
man 1 bash
man 1 zshoptions
- 5
- 6
man 2 fork
- 7
- 8
- 9
- 10
man 1 bash
man 1 zshoptions
- 11
man 1 xargs
- 12
- 13
- 14
Also an approach taken by git --help. Though git distinguishes short opt (-h) and long opt (--help) semantically.
- 16
- 17
man 1 getopt
- 18
man 1 bash
- 19
THE ZSH/ZUTIL MODULE > zparseopts in man 1 zshmodules
- 20
man 1 bash
man 1 zshbuiltins
- 21
man 7 signal
- 22
- 23
- 24
- 26
man 1 man
- 27
man 1 make
- 28
Recently I learned that a shebang in another form is also possible. I do not know what additional assumptions (if any) this makes about the target system.
- 29
man 1 bash
man 1 zshmisc
- 30
Shell builtins in case of Zsh
- 31
- 32(1,2)
man 3 exec
- 33
- 34
EXPANSION > Parameter Expansion in man 1 bash
- 35
man 1 zshcontrib
- 36
EXPANSION > Command Substitution in man 1 bash