There are many ways to make programs that use settings to
customise their behavior. In this post, we provide an overview of
these methods and some best practices.
Different approaches to passing settings
Settings as global state versus passing settings as an argument
The first distinction to make is between passing settings as an
argument to the operating part of your program and making settings
part of the global state that is available to the entire
program.
In pseudocode, the difference looks like this:
% Passing settings as an argument
main () {
  settings =: getSettings(getArgs())
  myMain(settings)
}
myMain (settings) {
  if (settings.shouldIDoSomething) {
    doSomething()
  }
}
versus:
% Settings in the global state
global settings =: getSettings(getArgs())
main () {
  myMain()
}
myMain () {
  if (settings.shouldIDoSomething) {
    doSomething()
  }
}
The Commons CLI (Java) and Optparse Applicative (Haskell) libraries
are examples of the former. The gflags (C++) library is an example
of the latter.
The advantage of using settings as global state is that any part
of your program has access to them. The disadvantage of passing
settings as arguments is that, should you wish to add some
customization, you may have to refactor your program to give the
appropriate part access to the settings.
The disadvantages of using settings as global state are
numerous:
- The size of the relevant state is increased globally as you
make more settings that can be configured.
- Code that reads global settings is not testable without setting
the global variables before running a test.
- You cannot run the same program twice with different arguments
in an automated fashion without setting global variables in between
the runs.
- The settings become available to all parts of your program,
even the parts that should be parametric in the settings.
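To make the testability point concrete, here is a minimal Python sketch of the argument-passing style. The names Settings, my_main and should_do_something are hypothetical and simply mirror the pseudocode above; this is one possible shape, not a prescribed one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Hypothetical setting, mirroring shouldIDoSomething in the pseudocode.
    should_do_something: bool


def my_main(settings: Settings) -> str:
    # The operating part only sees the settings it is handed.
    if settings.should_do_something:
        return "did something"
    return "did nothing"


# Two runs with different settings, with no global state to reset in between.
first = my_main(Settings(should_do_something=True))
second = my_main(Settings(should_do_something=False))
```

Because my_main takes its settings as a value, a test can call it as often as it likes with different settings, which is exactly what the global-state variant makes awkward.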
Mutable versus immutable settings
A second distinction is between allowing or disallowing the
mutation of settings after building them. If mutating settings is
not allowed, we call the settings immutable.
In pseudocode, the question is whether this should be
allowed:
settings.poolSize += 1
The Commons CLI (Java) and Optparse Applicative (Haskell) libraries
are examples of libraries that treat settings as immutable objects.
On the other hand, the optparse (Python) library is an example of a
library that provides mutable settings.
Why are mutable settings a bad idea?
- You cannot assume that settings do not change throughout
execution.
- If settings are a mutable resource, they have to be locked to
prevent race conditions.
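As an illustration, immutable settings can be sketched in Python with a frozen dataclass. The Settings type and its pool_size field are hypothetical, chosen to mirror poolSize above.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Settings:
    pool_size: int  # hypothetical field, mirrors poolSize above


settings = Settings(pool_size=4)

try:
    settings.pool_size += 1  # rejected: the dataclass is frozen
    mutated = True
except FrozenInstanceError:
    mutated = False

# A changed value means building a new object; the original is untouched.
bigger = replace(settings, pool_size=settings.pool_size + 1)
```

Any code that holds a reference to settings can now rely on it staying the same for the whole run, and no locking is needed to share it between threads.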
Purely functional versus impure argument parsing
The next distinction describes whether the argument parsing
operates on a given list of strings, or gathers the program
arguments from global state itself.
% Parsing given arguments:
settings =: parseArgs(getArgs())
versus:
% Letting the argument parsing get the arguments from global state:
settings =: parseArgs()
parseArgs () {
  args =: getArgs()
  [...]
}
Why is impure parsing a bad idea?
- You can never assume that the parser does not access other global
state, such as the environment variables.
- Testing becomes harder because you have to set the program
arguments from within the test instead of just passing a list of
strings to the parser.
- Because the program arguments are a global resource, parsing
cannot be concurrent (also relevant for testing).
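A rough Python sketch of the first style, using the standard argparse module: the parser is handed a list of strings and never reads sys.argv itself. The parse_args wrapper and the --verbose switch are hypothetical. Note that argparse can still print a message and exit the process on invalid input, so this is not purely functional in the strict sense.

```python
import argparse
from typing import Sequence


def parse_args(args: Sequence[str]) -> argparse.Namespace:
    # Hypothetical parser with a single --verbose switch.
    parser = argparse.ArgumentParser(prog="my-program")
    parser.add_argument("--verbose", action="store_true")
    # Passing the list explicitly keeps the parser away from sys.argv.
    return parser.parse_args(args)


# In the real entry point this would be parse_args(sys.argv[1:]).
quiet = parse_args([])
loud = parse_args(["--verbose"])
```

A test can now exercise the parser with any list of strings, without touching the arguments of the test process itself.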
Passing settings as-is versus pre-processing settings
Command-line arguments are usually not the only way a user would
want to customize the behaviour of your program. A user may also
want to use the process environment and configuration files.
In this case, the actual settings that a program will use will
depend on multiple pieces of information.
The difference here, in pseudocode, looks as follows:
% Pre-processing arguments
arguments =: parseArgs(getArgs())
settings =: gatherSettings(arguments)
myMain(settings)
gatherSettings (arguments) {
  s =: settings.new()
  environment =: getEnvironment()
  s.doSomething =: arguments.doSomething
                   || environment.get("DO_SOMETHING")
  return s
}
myMain (settings) {
  if (settings.doSomething) {
    doSomething()
  }
}
versus:
% Using arguments as-is
arguments =: parseArgs(getArgs())
myMain(arguments)
myMain (arguments) {
  environment =: getEnvironment()
  if (arguments.doSomething || environment.get("DO_SOMETHING")) {
    doSomething()
  }
}
Why is 'passing settings as-is' a bad idea?
- Either there is no flexibility in how settings depend on each
other, or the operating code is polluted with sources (arguments,
environment) that are supposed to be irrelevant to it.
- No separation of concerns between 'deciding what the
settings should be' and 'using the settings'.
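The pre-processing variant can be sketched in Python as follows. Arguments, Settings and gather_settings are hypothetical names that mirror the pseudocode above, and the environment is passed in as a mapping rather than read from global state.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Arguments:
    # What the command line said.
    do_something: bool


@dataclass(frozen=True)
class Settings:
    # What the program will actually use.
    do_something: bool


def gather_settings(arguments: Arguments, environment: Mapping[str, str]) -> Settings:
    # Decide once, up front, what the settings are.
    return Settings(
        do_something=arguments.do_something or "DO_SOMETHING" in environment
    )


def my_main(settings: Settings) -> str:
    # The operating part no longer knows where a setting came from.
    return "did something" if settings.do_something else "did nothing"
```

In a real entry point, the call would look something like my_main(gather_settings(arguments, os.environ)), which keeps the decision and the use of the settings in separate places.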
Standardised meaning of some words:
Because the naming of some relevant terms can be confusing, here
are some proposed standard definitions:
- Real constant: A fixed constant in the program that is
universal across programs e.g. 'decimalBase = 10',
'multiplicativeIdentityForNumbers = 1'
- Configuration Constant: A fixed constant in the program that
dictates functionality e.g. approximationIterations = 6
- Program name: The name of the executable being called. This may
be relevant to functionality. e.g. 'git'
- Command-line Arguments: Anything passed on the command-line as
a list of strings
- Command: The specific action indicated by the arguments
e.g. find, together with its specific arguments and options, like
the query. Note that not every program uses (or needs to use)
commands.
- Options: Any optional argument. They mostly start with -- or -
and are followed by the argument e.g. --message='I made a git
commit, yay!'
- Flags: Usually only binary options, but could also be any
option or even everything except the command; the argument values
that are common to all commands and/or relevant in further option
parsing e.g. --verbose
- Environment variable: A single variable in the environment that
is available to a process. e.g. DATABASE_SECRET
- Environment: The mapping of environment variables e.g.
[PORT=8000, DATABASE_SECRET=hunter2]
- Configuration: The total of all file system state that
configures your program: mostly files e.g. a file config.yaml, its
existence, and its contents: 'exclude-extensions: .hi'
- Settings: The values that the program actually uses to decide
what it will do. In certain contexts, this can also mean the
non-action-specific settings, i.e. global settings, e.g. a boolean
representing --verbose
- Dispatch: The description of the chosen action and
action-specific settings e.g. a value that represents the intention
to run the 'find' part of the program and all the relevant
action-specific settings
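To show how the last two definitions fit together, here is a small Python sketch of a dispatch value next to global settings. All of the type names (Settings, DispatchFind, DispatchVersion) and the run function are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Settings:
    verbose: bool  # global, non-action-specific setting


@dataclass(frozen=True)
class DispatchFind:
    query: str  # action-specific setting for 'find'


@dataclass(frozen=True)
class DispatchVersion:
    pass  # this action needs no settings of its own


Dispatch = Union[DispatchFind, DispatchVersion]


def run(dispatch: Dispatch, settings: Settings) -> str:
    # Choose the action from the dispatch; global settings apply to every action.
    prefix = "[verbose] " if settings.verbose else ""
    if isinstance(dispatch, DispatchFind):
        return f"{prefix}finding {dispatch.query}"
    return f"{prefix}my-grep 0.0.0"
```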
General tips:
General
Ideally, anything configurable should be configurable in the
configuration file, the environment variables and command-line
options. This allows users to choose the way they configure the
program.
Command-line options should override environment variables, which
in turn should override config files. The reasoning is that
the ease of overriding a source should be proportional to its
ephemerality, such that settings are always chosen on purpose.
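The precedence rule can be written down as a small Python helper. The resolve function and its parameter names are hypothetical; the sketch assumes every source has already been read into plain values and mappings.

```python
from typing import Mapping, Optional


def resolve(
    flag: Optional[str],
    env: Mapping[str, str],
    config: Mapping[str, str],
    env_name: str,
    config_key: str,
    default: str,
) -> str:
    # Most ephemeral source first: command line, then environment, then config file.
    if flag is not None:
        return flag
    if env_name in env:
        return env[env_name]
    if config_key in config:
        return config[config_key]
    return default


port = resolve(
    flag=None,
    env={"MY_PROGRAM_PORT": "8000"},
    config={"port": "9000"},
    env_name="MY_PROGRAM_PORT",
    config_key="port",
    default="80",
)
```

Here no flag was given, so the environment variable wins over the config file and port is "8000".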
Make all data involved in the option parsing process printable (i.e.
do not store functions instead of data). This ensures that you can
write property tests for anything involving that data.
Constants
Wherever possible, use real constants defined by a library
instead of defining them yourself, e.g. SECONDS_IN_AN_HOUR. This
turns the library into a single source of truth.
Do not define a value as a constant if it's not really a
constant. You probably want to be able to configure it, e.g.
NB_DB_CONNECTIONS.
Conversely: Do not make real constants configurable. E.g. do not
make --decimal-base=INT an option. You will save yourself a world
of headaches.
Leave magic numbers in place if they're part of a formula and a
name would just refer back to the formula, e.g.
discriminant = b ^ 2 - 4 * a * c instead of D = b ^
EXPONENT_OF_B_IN_DISCRIMINANT_FORMULA -
FACTOR_OF_SECOND_TERM_IN_DISCRIMINANT_FORMULA * a * c.
Arguments and Options
Use kebab-case for option names. It integrates well with the
dashes in front of them.
Use the standard format for arguments:
- Use a single dash - for short (one-character) options.
- Use a double dash -- for long options. Use kebab-case names
that look-like-this for long options.
- Do not use a single dash for long options. E.g. do not use
-force; use --force or -f instead.
Do not use - in front of commands. I.e. my-grep find instead of
my-grep --find. (GPG famously does this wrong.) There are exactly
two exceptions to this rule: --help and --version. In a perfect
world, we would have my-grep help instead of my-grep --help, but
these two have become such standard practice that they cannot be
ignored. Going against this convention will only cause
headaches.
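These conventions can be sketched with Python's standard argparse module. The my-grep program, its find command, the query argument and the version string are hypothetical; argparse adds --help on its own.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="my-grep")
    # --version is one of the two accepted exceptions to "no dashes for commands".
    parser.add_argument("--version", action="version", version="my-grep 0.0.0")
    # Short option with one dash, long option with two.
    parser.add_argument("-v", "--verbose", action="store_true")
    # Commands are plain words without dashes.
    commands = parser.add_subparsers(dest="command", required=True)
    find = commands.add_parser("find")
    find.add_argument("query")
    # A kebab-case long option; argparse exposes it as files_with_matches.
    find.add_argument("--files-with-matches", action="store_true")
    return parser


parsed = build_parser().parse_args(
    ["-v", "find", "needle", "--files-with-matches"]
)
```

The result carries the global flag (parsed.verbose), the chosen command (parsed.command) and the command-specific values (parsed.query, parsed.files_with_matches).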
Do not make arguments that look like options required. I.e. in
greet hello --name Richard, the --name option should not be
mandatory. The - in front of an option is a great
way to distinguish between optional and required arguments.
Do not use short flags if they're not obvious. I.e.: -f for
--force, but not -l for --files-with-matches (actual example from
grep). Short flags are annoying enough to use as-is; their mnemonic
should at least make sense.
Environment variables
Use UPPER_CASE names for your environment variables. Some
programmers even think that you cannot use lower-case names for
environment variables. Let us go along with this assumption to
prevent headaches.
Because the environment has just one global namespace, you
should prefix your environment variables with the name of your
program, e.g. LD_LIBRARY_PATH. This way there can never be
confusion as to which program the variable is for.
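A short Python sketch of reading only the prefixed variables. The own_variables helper and the MY_PROGRAM_ prefix are hypothetical, and the environment is passed in as a mapping so that the function stays easy to test.

```python
from typing import Dict, Mapping


def own_variables(environment: Mapping[str, str], prefix: str) -> Dict[str, str]:
    # Keep only the variables that carry our prefix, and strip it off.
    return {
        name[len(prefix):]: value
        for name, value in environment.items()
        if name.startswith(prefix)
    }


env = {"MY_PROGRAM_PORT": "8000", "PORT": "9000", "HOME": "/home/user"}
mine = own_variables(env, "MY_PROGRAM_")
```

The unprefixed PORT, which may be meant for some other program, is ignored, and mine only contains the PORT entry that came from MY_PROGRAM_PORT.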
Configuration
Make sure config files are human-readable. A binary config file
is not a config file, it is a data file. Config files are made for
humans to edit, so make them readable for humans.
Make sure config files are modular. Sharing parts of your config
can be a great way to reduce the total amount of configuration that
a user has to manage.
Put config files in a considerate place, e.g.
~/.config/my-program.cfg instead of ~/.my-programrc.cfg. There are
dedicated libraries in most languages that will help you to
decide.
Make the location of your config file override-able with a flag
(i.e. --config-file). A user should not have to replace a file to
change the configuration. Instead, they should be able to choose a
different config file on a granular basis.
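Both tips can be combined in a small Python sketch. The config_path function and the my-program.cfg file name are hypothetical. The fallback follows the XDG Base Directory convention ($XDG_CONFIG_HOME if set and non-empty, otherwise ~/.config), but does not validate that the variable holds an absolute path; a dedicated library would be the more robust choice.

```python
from pathlib import Path
from typing import Mapping, Optional


def config_path(
    flag: Optional[str], environment: Mapping[str, str], home: Path
) -> Path:
    # An explicit --config-file value always wins.
    if flag is not None:
        return Path(flag)
    # Otherwise use $XDG_CONFIG_HOME if set and non-empty, else ~/.config.
    base = environment.get("XDG_CONFIG_HOME") or str(home / ".config")
    return Path(base) / "my-program.cfg"
```

In a real entry point this might be called as config_path(parsed.config_file, os.environ, Path.home()), assuming the parser defines a --config-file option.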
Consider looking for configuration files in more than one
(sensible) location. This can be great for the user experience. See,
for example, stack, which looks recursively upwards through parent
directories, so that a user does not have to think about where they
run the command.
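An upward search of this kind can be sketched in a few lines of Python. The find_config_upwards function is a hypothetical illustration of the idea, not a description of how stack implements it.

```python
from pathlib import Path
from typing import Optional


def find_config_upwards(start: Path, name: str) -> Optional[Path]:
    # Check the starting directory, then each parent up to the filesystem root.
    for directory in [start, *start.parents]:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

Called with the current working directory (for example Path.cwd()) and a file name, it returns the nearest matching file, or None if no directory on the way up contains one.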
Stick with standard configuration formats: YAML, JSON, INI.
Refrain from inventing your own format. This will make third party
tooling a lot easier to build.