The Ultimate Guide to Argv

What is argv?

Argv is short for arguments vector. Vector is a fancy way of saying n-tuple.

When executing a program, e.g. from the terminal, you may pass in the argv:

$ echo foo bar
  • echo is the program to execute.

  • (echo, foo, bar) is the argv given to the program.

  • (foo, bar) are referred to as arguments.

argv can also be referred to as command line or cmdline.

How is argv useful?

argv is one of several mechanisms for providing input to a program: either data it should work on, or configuration that modifies its standard behavior.

argv can also be used for process identification in tools like htop, ps, pgrep, or pkill.

argv is also available in the special proc(5) file system as /proc/[pid]/cmdline.
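
A minimal Python sketch of reading that file, assuming Linux with procfs mounted. The helper names parse_cmdline and read_cmdline are made up for illustration. As described in proc(5), the elements are separated by NUL bytes, with a further NUL byte after the last one.

```python
from pathlib import Path


def parse_cmdline(raw: bytes) -> list[str]:
    """Split the raw contents of a cmdline file into argv elements."""
    if not raw:
        # The file is empty e.g. for a zombie process.
        return []
    # Each element is terminated by a NUL byte, so drop the final
    # terminator before splitting to avoid a trailing empty string.
    if raw.endswith(b"\x00"):
        raw = raw[:-1]
    return [part.decode(errors="replace") for part in raw.split(b"\x00")]


def read_cmdline(pid: str = "self") -> list[str]:
    """Return the argv of a process (Linux only)."""
    return parse_cmdline(Path(f"/proc/{pid}/cmdline").read_bytes())
```

Calling read_cmdline() with no arguments reads /proc/self/cmdline, i.e. the argv of the calling process itself.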

How is argv represented?

This depends on your runtime system.

In Python, argv is a list available as sys.argv. Other high-level languages are similar.
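
For illustration, a minimal Python sketch that separates the program name from the arguments. split_argv is a hypothetical helper, not part of the standard library.

```python
import sys


def split_argv(argv: list[str]) -> tuple[str, list[str]]:
    """Return (program name, arguments) for a non-empty argv."""
    return argv[0], argv[1:]


# With argv = (echo, foo, bar) from the earlier example:
program, arguments = split_argv(["echo", "foo", "bar"])
print(program)    # echo
print(arguments)  # ['foo', 'bar']

# The argv of the running interpreter itself:
print(sys.argv)
```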

In C, argv is given to the program through its int main(int argc, char **argv) function: an integer parameter giving the length of the tuple and an array of pointers to the individual strings of the tuple [1] [2].

At the assembler level, the argv is present on top of the stack [3] when the program is started by the kernel. It is again laid out as an integer giving the length of the tuple, followed by the pointers to the individual character arrays of the tuple [4].

Argv[0]

The first element in the argv tuple contains the name of the program [1]. The argv[0] is defined by the parent process (execve(2)) and by convention it is the basename(1) of the executed file. If the file is referred to through a file system link, the argv[0] is determined by the link name.

The argv[0] can usually be ignored but you need to be aware of it. For example, if you want to dispatch the argv to another program, you usually want to pass only the elements 1..n, omitting the 0th element, instead of 0..n.

Sometimes it can also be useful. For example, BusyBox [5] implements functionality of several different programs and determines the program name from the argv[0] [6]. Thanks to this trick, it can be used as a drop-in replacement for these programs while also existing on the system as a single executable.
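
A rough Python sketch of this multi-call technique. The applets hello and goodbye and the dispatch function are hypothetical and only illustrate the idea. This is not how BusyBox itself is implemented.

```python
import os


def hello(args: list[str]) -> str:
    return "hello " + " ".join(args)


def goodbye(args: list[str]) -> str:
    return "goodbye " + " ".join(args)


# Hypothetical applets keyed by the name the executable was invoked under.
APPLETS = {"hello": hello, "goodbye": goodbye}


def dispatch(argv: list[str]) -> str:
    """Pick the applet from the basename of argv[0], pass it argv[1:]."""
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name)
    if applet is None:
        raise ValueError(f"{name}: applet not found")
    return applet(argv[1:])


print(dispatch(["/usr/bin/hello", "world"]))  # hello world
print(dispatch(["goodbye", "world"]))         # goodbye world
```

Installing the same file under both names, for example through file system links, would then select the applet without any extra argument.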

argv[0] can also be rewritten by the program itself, typically to present itself in a more useful manner but also to hide itself by appearing as something else [C17-5.1.2.2.1].

Interpreting argv

The semantics of the argv contents are defined by the program. However, there are some patterns and standards.

Conventions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1. SUSv4

Most software (on Linux) mostly conforms to the [SUSv4-2018] or [GNU-Coding-Standards-4.8] standards. We will start by examining the [SUSv4-2018], specifically the Utility Conventions section [XBD-Utility-Conventions]. I took some liberties in interpreting the standard so that this section can also serve as a description of best current practice for new programs.

You may use the [12.2-Utility-Syntax-Guidelines] as your only guidelines, but it is my hope that this document provides best current practices unconstrained by the historical considerations of [SUSv4-2018], and does so with more clarity, while still describing practices that are no longer best but still somewhat current.

Note

The SUSv4 standard refers to programs as “utilities” [12].

  1. The arguments given to a program can be categorized into disjoint sets of options, operands, and option-arguments.

2. Options and Operands

  1. All arguments that cannot be recognized as options or option-arguments SHALL be recognized as operands.

Caution

This definition violates the [SUSv4-2018] [12.2.9] but it is the current best practice. It has to be included in this section because it is foundational to the rest of the document. The violation is on purpose, so that options that follow operands are included. Also see 2. GNU Option Order.

Note

Operands are sometimes referred to as positional arguments and usually provide data.

  1. The arguments that consist of a <hyphen-minus> character followed by a single letter or digit SHALL be recognized as options [12.1.1] [12.2.3] [12.2.4] unless they are preceded in the argv by the 10. Options Terminator Operand.

Hint

These are also referred to as short options or shortopts. Long options will be discussed in the 2. GNU Coding Standards and further sections.

Caution

The definition of options is complemented by 6. Bundled Options and 1. Long Options.

Note

Options are sometimes also referred to as flags and usually modify the program behavior.

Example

Command line to execute program echo with argv (echo, -n) where -n is a short option.

$ echo -n
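
As a sketch, the rule for a single short option can be written as a small Python predicate. is_short_option is a hypothetical name. It deliberately ignores bundled options and the options terminator operand, which are covered in their own sections.

```python
import string

# Letters and digits from the portable character set.
OPTION_CHARS = set(string.ascii_letters + string.digits)


def is_short_option(arg: str) -> bool:
    """True for a <hyphen-minus> followed by exactly one letter or digit."""
    return len(arg) == 2 and arg[0] == "-" and arg[1] in OPTION_CHARS


print(is_short_option("-n"))   # True
print(is_short_option("foo"))  # False
print(is_short_option("-"))    # False
print(is_short_option("--"))   # False
```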

3. Option Arguments

  1. Options MAY require option-arguments [12.1.1]. Option-arguments are passed in as the argv element immediately following the option name argument.

Example

$ xargs -I %

The -I is an option and % is its option-argument. argv = (xargs, -I, %).

  1. Options that require an option-argument may also be referred to as argumented-options.

  2. Options that do not require an option-argument may also be referred to as non-argumented-options.

4. Mandatory Option Arguments

For historical reasons we need to distinguish between option-arguments that are optional and those that are mandatory.

  1. If an option accepts an option-argument, that option-argument SHALL be mandatory [12.2.7], i.e. not optional.

Example

$ xargs -I
xargs: option requires an argument -- 'I'
Try 'xargs --help' for more information.
[1]
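
The same behavior can be sketched with Python's standard getopt module, where a colon after the option letter declares a mandatory option-argument. The argv contents below are made up for illustration.

```python
import getopt

# "I:" declares -I as an option that requires an option-argument.
opts, operands = getopt.getopt(["-I", "%", "echo"], "I:")
print(opts)      # [('-I', '%')]
print(operands)  # ['echo']

# Omitting the mandatory option-argument is reported as an error.
try:
    getopt.getopt(["-I"], "I:")
except getopt.GetoptError as err:
    print("error:", err)
```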

Note

The [SUSv4-2018] recommends against optional option-arguments but ultimately permits them.

Question

The motivation is not entirely clear. It may include:

  • implementation simplicity

  • not providing significant benefit

  • future portability reasons (different implementation choosing different defaults)

5. Bundled Option-Argument

  1. Bundled option-argument refers to an option and its option-argument represented as a single argv element, with the option-argument immediately following the option.

  2. Bundled option-arguments SHALL NOT be recognized or accepted as options, except for historical compatibility reasons [12.2.6] [12.1.2] [12.1.2.a].

Example

The bundled form below would be equivalent to the unbundled form xargs -I % if permitted.

$ xargs -I%

Caution

Not to be confused with 6. Bundled Options, which are recommended.

Hint

Option-argument bundling is not permitted because it creates ambiguities with 6. Bundled Options and complicates the parser implementation for no significant benefit.

6. Bundled Options

Bundled options are one or more short options without option-arguments, followed by at most one option that takes an option-argument, grouped into a single argv element behind one - delimiter [12.2.5] [12.2.14].

Bundled options SHOULD be recognized as options.

Example

Unbundled form of options:

$ echo -n -e

is semantically equivalent to bundled form of options:

$ echo -ne

Note

Improves user experience when using the program manually and often.
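
A minimal sketch of expanding a bundle into its unbundled form, in Python. expand_bundle and its takes_argument parameter are hypothetical names. The check mirrors the definition above: an option that takes an option-argument may only come last in the bundle.

```python
def expand_bundle(arg: str, takes_argument: set[str]) -> list[str]:
    """Expand one bundled argv element such as "-ne" into ["-n", "-e"].

    takes_argument holds the option letters that require an
    option-argument; such an option may only come last in a bundle.
    """
    if len(arg) < 2 or arg[0] != "-" or arg.startswith("--"):
        raise ValueError(f"not a short option bundle: {arg!r}")
    letters = arg[1:]
    for letter in letters[:-1]:
        if letter in takes_argument:
            raise ValueError(f"-{letter} takes an argument and must come last")
    return ["-" + letter for letter in letters]


print(expand_bundle("-ne", set()))  # ['-n', '-e']
```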

7. Option Order

  1. Option order SHOULD NOT be semantically significant [12.2.11] [12.1.3].

Example

grep -ri and grep -ir being equivalent.

Caution

The standard excepts 9. Mutually Exclusive Options from this requirement.

Note

Improves user experience when using the program manually and often.

Option order may be semantically significant.

For some programs, the domain or purpose requires option order to be semantically significant in order to function correctly.

Example

find(1) can express propositional logic in argv, e.g.:

$ find '(' -name 'a' -or -name 'foo' ')' -and -not -type d

Note

find(1) is actually POSIX conforming because -name and the other such arguments are not options. They are operands.

  1. If a program chooses to treat option order as semantically significant, it MUST be documented in the OPTIONS section [12.1.3] [12.2.11].

8. Option Repetition

  1. Non-argumented-options may be repeated in the argv [12.1.3].

Note

The standard does not impose any requirements on program behavior “unless otherwise stated in the OPTIONS section” [12.1.3]. This likely refers to standard utilities described in other volumes of the standard.

  1. If a non-argumented-option is repeated in the argv and the program documentation does not explicitly specify this behavior, the program MUST terminate erroneously or accept the options as if they were not repeated.

    Note

    Repetition of non-argumented-options is sometimes used, e.g. to increase verbosity levels. Example lspci(8):

    $ lspci -vv
    

    Note

    Accepting the options as if not repeated is likely the default behavior of argument parsers that do not consider this use case.

    Note

    Erroneous termination is strictly saner behavior if the argument parsing does consider the use case and does not assign any significant semantics to it.

    Note

    Mandating this behavior only if not otherwise specified by the program’s documentation allows for some niche use cases, but it is probably advisable to consider other solutions, such as an option with a mandatory option-argument, before opting into this behavior.

  2. Argumented-options may be repeated [12.1.9].

    Hint

    These are sometimes also referred to as cumulative options or cumulative arguments.

    1. Interpretation of repeated argumented-options is determined on a program-specific basis.

    2. If the repetition is accepted, the options should be interpreted in the order specified in the argv [12.2.11].

      Example

      $ sed -e 'script-1' -e 'script-2'
      $ rsync -av --exclude foo --exclude bar 1 2
      

    Question

    • Should repetition where some instance of the option overrides the other be permitted?

    • It may be useful in niche cases when composing the argv by allowing the successive options to override the preceding ones but it feels wrong.

    • It should probably also be consistent with resolving mutually exclusive options, which has a precedent in the [12.2.11].

    • Also see 9. Mutually Exclusive Options.
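
Both kinds of documented repetition can be sketched with Python's argparse. The program name and the options -v and -e are chosen only for illustration: a counted non-argumented-option and a cumulative argumented-option whose values are kept in argv order.

```python
import argparse

parser = argparse.ArgumentParser(prog="example")
# A repeated non-argumented-option with documented meaning: each -v
# raises the verbosity by one.
parser.add_argument("-v", action="count", default=0)
# A cumulative argumented-option: values are collected in argv order.
parser.add_argument("-e", action="append", default=[])

args = parser.parse_args(["-vv", "-e", "script-1", "-e", "script-2"])
print(args.v)  # 2
print(args.e)  # ['script-1', 'script-2']
```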

9. Mutually Exclusive Options

  1. Programs may interpret options as mutually exclusive [12.1.3] [12.2.11].

  2. Multiple mutually exclusive options may be accepted in a single argv as long as such options are documented as mutually exclusive and are documented to override any incompatible options preceding them.

    Caution

    Single argv. Not single argv element.

    Note

    These options are exempted from the insignificant option order recommendation by the standard [12.2.11].

    Note

    When considering whether the guidelines should permit this behavior or not, it should be considered in the context of general option repetition. The standard does not seem to provide a guideline on this topic (i.e. it allows arbitrary repetition) except for this specific exception.
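
One way to get the "later option overrides the earlier one" behavior is to let the mutually exclusive options write to the same destination. A sketch with Python's argparse, where the options -q and -v and the destination name level are hypothetical:

```python
import argparse

parser = argparse.ArgumentParser(prog="example")
# Both options write to the same destination, so whichever appears
# later in the argv overrides the earlier one.
parser.add_argument("-q", dest="level", action="store_const", const="quiet")
parser.add_argument("-v", dest="level", action="store_const", const="verbose")
parser.set_defaults(level="normal")

print(parser.parse_args(["-q", "-v"]).level)  # verbose
print(parser.parse_args(["-v", "-q"]).level)  # quiet
print(parser.parse_args([]).level)            # normal
```

Here option order is significant by design, so it would have to be documented as such.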

10. Options Terminator Operand

  1. The first -- operand should be accepted as a delimiter indicating end of options [12.2.10].

    Note

    This exists to distinguish operands that would otherwise be recognized as options. Example argv = (/usr/bin/printf, --, --version):

    $ /usr/bin/printf -- --version
    --version
    

    vs:

    $ /usr/bin/printf --version
    printf (GNU coreutils) 8.32
    [...]
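
The same distinction can be reproduced with Python's standard getopt module. This is a sketch in which the argument lists stand in for argv[1:] and a single long option, version, is declared:

```python
import getopt

# Without the terminator, --version is parsed as an option.
print(getopt.getopt(["--version"], "", ["version"]))
# ([('--version', '')], [])

# After --, the same string is left alone as an operand.
print(getopt.getopt(["--", "--version"], "", ["version"]))
# ([], ['--version'])
```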
    

11. Standard Input/Output Operand

  1. The - operand may refer to standard input, standard output, or file named - [12.2.13].

    Question

    The motivation is unclear, other than being short. An occasional use case is composing utilities, where passing stdin/stdout the same way as a file name may be convenient to implement.
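
A minimal Python sketch of the convention for input operands. open_input is a hypothetical helper and the UTF-8 encoding is an assumption of this example.

```python
import sys
from typing import TextIO


def open_input(operand: str) -> TextIO:
    """Open an input operand, treating "-" as standard input."""
    if operand == "-":
        return sys.stdin
    return open(operand, encoding="utf-8")
```

With this convention, a file that is literally named - can still be passed as ./- instead.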

2. GNU Coding Standards

GNU Coding Standards for Command Line Interfaces [GNU-Coding-Standards-4.8] mostly extend the [SUSv4-2018] standard but occasionally violate it [9].

1. Long Options

  1. Programs SHOULD also accept options in the form of long options, also referred to as longopts. Longopts are signified by a prefix of two <hyphen-minus> characters, --.

    Example

    $ rsync --version
    
  2. Long options generally follow the same guidelines as short options as defined in 1. SUSv4 except for obvious incompatibilities such as option bundling.

  3. Programs should accept a long option version of each short option, in the hope of more user friendliness. E.g. rsync --verbose and rsync -v are equivalent.
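
In Python's argparse, a short option and its long version can be declared together as two spellings of one option. A sketch with a hypothetical program name:

```python
import argparse

parser = argparse.ArgumentParser(prog="example")
# One option, two spellings: the short -v and the long --verbose.
parser.add_argument("-v", "--verbose", action="store_true")

print(parser.parse_args(["-v"]).verbose)         # True
print(parser.parse_args(["--verbose"]).verbose)  # True
print(parser.parse_args([]).verbose)             # False
```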

2. GNU Option Order

  1. Programs SHOULD violate [SUSv4-2018] to accept options regardless of their relative position to operands if possible.

    Hint

    Options following operands may be referred to as tail options [14].
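
Python's standard getopt module offers both scanning modes, which makes the difference easy to see. The sketch below assumes the POSIXLY_CORRECT environment variable is unset, since gnu_getopt falls back to the POSIX behavior when it is set. The file name is made up.

```python
import getopt

argv_tail = ["input.txt", "-v"]

# POSIX-style scanning stops at the first operand, so -v stays an operand.
print(getopt.getopt(argv_tail, "v"))
# ([], ['input.txt', '-v'])

# GNU-style scanning keeps looking for options after operands.
print(getopt.gnu_getopt(argv_tail, "v"))
# ([('-v', '')], ['input.txt'])
```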

  1. Operands used as file name arguments should be used for input files only. Output files should be specified using options -o or --output.

  2. GNU Coding Standards also contain a table of recommended long option names and their semantics: https://www.gnu.org/prep/standards/html_node/Option-Table.html#Option-Table.

    The table interestingly specifies --quiet and --silent as synonyms. There is at least one common software that uses these differently. Can’t remember which right now.

  3. There is also a list of recommended short options in the [TaO-Command-Line-Options].

  1. Programs should support two standard options --version and --help.

CGI programs should accept these as command-line options as well as PATH_INFO; for example, http://example.org/p.cgi/--help should output the same information as invoking p.cgi --help on the command line.

Note

Well, this is interesting. This should probably be disregarded as best current practice. For one, CGI is practically non-existent nowadays. For another, it looks funky. But it's only --version and --help. Idk.

De Facto Standards

A de facto standard is a custom or convention that has achieved a dominant position by public acceptance or market forces [DeFacto].

  1. Option-argument bundling for long options is NOT RECOMMENDED.

    However, it is common to see in existing software. Usually with a = as separator.

    Example

    $ git --git-dir=foo
    

    Question

    • Why is this still a thing?

    • It seems like more work for no benefit compared to no bundling --git-dir foo.

    • At first I thought this is historical, possibly because shell scripts could just remove the prefix and eval the rest but no. This is still implemented by modern software.

  2. Options are often only boolean switches (aka flags). Normally, if unspecified, the option is off. When specified, the option is on. E.g.:

    $ grep -q
    
  3. If an option is by default on and when specified is off, it may be realized as long option with --no- prefix. E.g.:

    $ wget --no-verbose
    
  4. In addition to its meaning in 10. Options Terminator Operand, the -- operand is also commonly used to signify the end of the operands meant for the program in the argv[0] position and the start of an argv for another program to execute. This technique is known as Bernstein chaining [7].
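
The chaining in the last item can be sketched as splitting the arguments at the first -- operand. split_chain and the example arguments are hypothetical.

```python
def split_chain(args: list[str]) -> tuple[list[str], list[str]]:
    """Split arguments at the first "--" into (own, chained argv)."""
    if "--" not in args:
        return args, []
    index = args.index("--")
    return args[:index], args[index + 1:]


own, command = split_chain(["-n", "10", "--", "echo", "foo"])
print(own)      # ['-n', '10']
print(command)  # ['echo', 'foo']
```

A chaining program would then typically replace itself with the chained command, for example via os.execvp(command[0], command).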

Golang Standards

Golang kind of goes its own way. This style actually seems to originate from X toolkit [TaO-Command-Line-Options].

Historical Standards

All recommendations in this section SHOULD NOT be regarded as best current practice unless historical reasons are involved in which case this entire document is irrelevant.

First we will discuss patterns that are common and allowed by [SUSv4-2018].

  1. All options should precede the operands [12.2.9].

    Hint

    Disregard. See 2. Options and Operands and 2. GNU Option Order.

    Note

    This guideline seems to be strictly adhered to in the FreeBSD world and it is kind of annoying.

  2. Programs accepting option-arguments may accept multiple option-arguments bundled into a single argv element. In that case, the option-arguments should be separated by comma , or <blank> [11] characters [12.2.8].

    Hint

    Cumulative options should be used for this purpose.

    This bundled approach may be chosen due to compositional synergy with other utilities. But in that case it should be considered whether the other tools may be modified to also be synergistic with the cumulative approach.

  3. Optional option-arguments are permitted [12.1.2] [12.1.7].

    Example

    xargs -i and xargs -i{} being equivalent as per xargs(1).

    Hint

    Optional option-arguments are not recommended. See 4. Mandatory Option Arguments.

  4. Bundled option-arguments may be accepted [12.1.2] [12.1.2.a] [12.1.2.b].

    Hint

    Mostly only of historical significance. See 5. Bundled Option-Argument.

  5. Optional option-arguments MUST be option-argument bundled. [12.1.2.b]:

    $ xargs -i{}
    

    Hint

    This is required to distinguish option-arguments from operands. However, optional option-arguments are not recommended in the first place. See 4. Mandatory Option Arguments.

Now, we will look at some other practices that are relatively rare by now.

  1. Flags (as in boolean switch options) were sometimes signified by prefixes - and + to turn an option on and off respectively.

    Example

    $ setopt -x
    $ setopt +x
    

    Note

    This style seems to originate in the [X-Toolkit-Style].

  2. Flags may also be realized by switching the letter case with e.g. -d flag being on and -D flag being off [10].

    Note

    This seems to originate from UNIX style on ASR-33 teletypes [TaO-Command-Line-Options]:

    https://upload.wikimedia.org/wikipedia/commons/thumb/3/33/Teletype-IMG_7287.jpg/450px-Teletype-IMG_7287.jpg
  3. Some programs may also accept short opts without the - signifier. Example:

    $ ps eof
    

    Note

    The ps(1) indicates this style originates from some kind of BSD. Why this is distinct from UNIX eludes me.

Thanks, I Hate This

The [SUSv4-2018] seems well formed and relatively straightforward but it is not an easy read. It is actually pretty confusing in several places.

The important thing to realize when reading [SUSv4-2018] is that [12.1.2.a] refers to “standard utility” [13] which is quite easy to miss. Actually the entire [12.1.2] and related guidelines require careful deconstruction.

Thanks, I hate it and I hope to never see [XBD-Utility-Conventions] again.

Backlog

This document still suffers from:

  • missing guidelines on some edge cases

  • inconsistent phrasing

  • inconsistent use of admonitions

  • some terms may be over-specified

  • some terms may be under-specified

  • missing section on conformity status of existing argument parsing solutions

  • The 1. SUSv4 section mentions a relatively large number of best current practices that violate it. Maybe the structuring into sections by standards should be abandoned by now.

  • I know this document violates rfc2119 section 6 and I don’t care. I find it useful. This is not an internet standard. It’s not even a standard. It’s an

    old man yelling at cloud

Any contradictions, ambiguities, typographical issues, inconsistencies, or complaints shall be directed to /dev/null jan@matejka.ninja

References

[12.1.1]

Section 12.1 paragraph 1 in SUSv4-2018.

[12.1.2]

Section 12.1 paragraph 2 in SUSv4-2018.

[12.1.2.a]

Section 12.1 paragraph 2.a in SUSv4-2018.

[12.1.2.b]

Section 12.1 paragraph 2.b in SUSv4-2018.

[12.1.3]

Section 12.1 paragraph 3 in SUSv4-2018.

[12.1.7]

Section 12.1 paragraph 7 in SUSv4-2018.

[12.1.9]

Section 12.1 paragraph 9 in SUSv4-2018.

[12.2-Utility-Syntax-Guidelines]

Section 12.2 in SUSv4-2018.

[12.2.3]

Section 12.2 Guideline 3 in SUSv4-2018.

[12.2.4]

Section 12.2 Guideline 4 in SUSv4-2018.

[12.2.5]

Section 12.2 Guideline 5 in SUSv4-2018.

[12.2.6]

Section 12.2 Guideline 6 in SUSv4-2018.

[12.2.7]

Section 12.2 Guideline 7 in SUSv4-2018.

[12.2.8]

Section 12.2 Guideline 8 in SUSv4-2018.

[12.2.9]

Section 12.2 Guideline 9 in SUSv4-2018.

[12.2.10]

Section 12.2 Guideline 10 in SUSv4-2018.

[12.2.11]

Section 12.2 Guideline 11 in SUSv4-2018.

[12.2.13]

Section 12.2 Guideline 13 in SUSv4-2018.

[12.2.14]

Section 12.2 Guideline 14 in SUSv4-2018.

[C17]

ISO/IEC 9899:2017

[C17-5.1.2.2.1]

[C17] § 5.1.2.2.1

[DeFacto]
[SUSv4-2018]
[XBD-Utility-Conventions]