There are many ways to make programs that use settings to customise their behavior. In this post, we provide an overview of these methods and some best practices.

Different approaches to passing settings

Settings as global state versus passing settings as an argument

The first distinction to make is between passing settings as an argument to the operating part of your program and making settings part of the global state that is available to the entire program.

In pseudocode, the difference looks like this:

% Passing settings as an argument
main () {
  settings =: getSettings(getArgs())
  myMain(settings)
}

myMain (settings) {
  if (settings.shouldIDoSomething) {
    doSomething()
  }
}

versus:

% Settings in the global state
global settings =: getSettings(getArgs())

main () {
  myMain()
}

myMain () {
  if (settings.shouldIDoSomething) {
    doSomething()
  }
}

The Commons CLI (Java) and Optparse Applicative (Haskell) libraries are examples of the former. The gflags (C++) library is an example of the latter.

The advantage of using settings as global state is that any part of your program has access to them. The disadvantage of passing settings as arguments is that you may have to refactor your program, should you wish to add some customization, to give the appropriate part access to the settings.
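As an illustration only, the first pseudocode variant might look as follows in Python. The Settings class, the --do-something flag and the returned strings are hypothetical names invented for this sketch; they do not come from any of the libraries mentioned above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Settings:
    should_do_something: bool


def get_settings(args: Sequence[str]) -> Settings:
    # Hypothetical flag name, chosen only for this illustration.
    return Settings(should_do_something="--do-something" in args)


def my_main(settings: Settings) -> str:
    # Everything this function depends on is visible in its signature.
    if settings.should_do_something:
        return "did something"
    return "did nothing"


def main(argv: Sequence[str]) -> str:
    # The settings are built once and handed down explicitly.
    return my_main(get_settings(argv))
```

Because my_main receives its settings as an argument, it can be called with any Settings value, for instance from a test, without touching global state.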

The disadvantages of using settings as global state are numerous:

Mutable versus immutable settings

A second distinction is between allowing or disallowing the mutation of settings after building them. If mutating settings is not allowed, we call the settings immutable.

In pseudocode, the question is whether this should be allowed:

settings.poolSize += 1

The Commons CLI (Java) and Optparse Applicative (Haskell) libraries are examples of libraries that treat settings as immutable objects. On the other hand, the optparse (Python) library is an example of a library that provides mutable settings.
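One way to get immutable settings in Python is a frozen dataclass. This is a minimal sketch, not the approach of any library named above; the Settings class and its pool_size field are assumptions that mirror the poolSize example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Settings:
    pool_size: int


settings = Settings(pool_size=4)

try:
    # The equivalent of `settings.poolSize += 1` is rejected at runtime.
    settings.pool_size += 1
    mutated = True
except FrozenInstanceError:
    mutated = False

# A changed copy can still be built explicitly; the original is untouched.
bigger = replace(settings, pool_size=settings.pool_size + 1)
```

Any code that needs different settings has to build a new value, so every part of the program that holds the original keeps seeing the same values.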

Why are mutable settings a bad idea?


Purely functional versus impure argument parsing

The next distinction is whether the argument parsing operates on a given list of strings, or gathers the program arguments from global state itself.

% Parsing given arguments:
settings =: parseArgs(getArgs())

versus:

% Letting the argument parsing get the arguments from global state:
settings =: parseArgs()

parseArgs () {
  args = getArgs()
  [...]
}
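In Python, the first variant can be sketched with the standard argparse module, whose parse_args method accepts an explicit list of strings. The program name and the --pool-size option are hypothetical and exist only for this example.

```python
import argparse
import sys
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical program name and option, for illustration only.
    parser = argparse.ArgumentParser(prog="my-program")
    parser.add_argument("--pool-size", type=int, default=4)
    return parser


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    # The parser only sees the list it is given; it never reads sys.argv.
    return build_parser().parse_args(argv)


def main() -> None:
    # The single place where the real process arguments are read.
    print(parse_args(sys.argv[1:]))
```

Since parse_args takes its input as an argument, it can be exercised with arbitrary argument lists, such as parse_args(["--pool-size", "8"]), without starting a new process.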

Why is impure parsing a bad idea?

Passing settings as-is versus pre-processing settings

Command-line arguments are usually not the only way a user would want to customize the behaviour of your program. A user may also want to use the process environment and configuration files. In this case, the actual settings that a program uses will depend on multiple pieces of information.

The difference here, in pseudocode, looks as follows:

% Pre-processing arguments
arguments =: parseArgs(getArgs())
settings =: gatherSettings(arguments)
myMain(settings)

gatherSettings (arguments) {
  s =: settings.new()
  environment =: getEnvironment()
  s.doSomething =: arguments.doSomething
                   || environment.get("DO_SOMETHING")
  return s
}

myMain (settings) {
  if (settings.doSomething) {
    doSomething()
  }
}

versus:

% Using arguments as-is
arguments =: parseArgs(getArgs())
myMain(arguments)

myMain (arguments) {
  environment =: getEnvironment()
  if (arguments.doSomething || environment.get("DO_SOMETHING")) {
    doSomething()
  }
}
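The pre-processing variant could be sketched in Python as below. The Settings class, the --do-something flag and the returned strings are hypothetical, and treating any non-empty DO_SOMETHING value as "on" is an assumption of this sketch rather than an established convention.

```python
import argparse
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Settings:
    do_something: bool


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="my-program")
    parser.add_argument("--do-something", action="store_true")
    return parser.parse_args(argv)


def gather_settings(
    arguments: argparse.Namespace, environment: Mapping[str, str]
) -> Settings:
    # Assumption: any non-empty value of DO_SOMETHING switches the feature on.
    from_environment = bool(environment.get("DO_SOMETHING"))
    return Settings(do_something=arguments.do_something or from_environment)


def my_main(settings: Settings) -> str:
    # No environment lookups here: the decision was made in gather_settings.
    return "did something" if settings.do_something else "did nothing"
```

In a real program the environment argument would be os.environ; passing it in as a mapping keeps gather_settings easy to test.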

Why is 'passing settings as-is' a bad idea?

Standardised meaning of some words:

Because the naming of some relevant terms can be confusing, here are some proposed standard definitions:

General tips:

General

Ideally, anything configurable should be configurable through the configuration file, environment variables and command-line options. This allows users to choose the way they configure the program.

Command-line options should override environment variables, which in turn should override config files. The reasoning is that the ease of overriding should be proportional to its ephemerality, such that settings are always chosen on purpose.

Make all data involved in the option-parsing process printable (i.e. do not store functions instead of data). This ensures that you can write property tests for anything involving that data.
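This precedence can be sketched with collections.ChainMap from the Python standard library, which returns the value from the first mapping that contains a key. The combine function and the setting names used with it are hypothetical; each layer is assumed to be a plain dictionary in which None means "not set".

```python
from collections import ChainMap
from typing import Any, Mapping


def combine(
    flags: Mapping[str, Any],
    environment: Mapping[str, Any],
    config: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Mapping[str, Any]:
    def present(layer: Mapping[str, Any]) -> dict:
        # Drop unset values so that they do not shadow lower layers.
        return {key: value for key, value in layer.items() if value is not None}

    # Lookups try the mappings from left to right: flags win over the
    # environment, which wins over the config file, which wins over defaults.
    return ChainMap(
        present(flags), present(environment), present(config), dict(defaults)
    )
```

For example, combine({"pool_size": None}, {"pool_size": 8}, {"pool_size": 2}, {"pool_size": 1})["pool_size"] takes the value from the environment layer, because the flag was not given.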

Constants

Wherever possible, use real constants defined by a library instead of defining them yourself, e.g. SECONDS_IN_AN_HOUR. This turns the library into a single source of truth.

Do not define a value as a constant if it is not really constant, e.g. NB_DB_CONNECTIONS. You probably want to be able to configure such values.

Conversely: do not make real constants configurable. E.g. do not make --decimal-base=INT an option. You will save yourself a world of headaches.

Leave magic numbers in place if they are part of a formula and a name would just refer back to the formula, e.g. discriminant = b ^ 2 - 4 * a * c instead of D = b ^ EXPONENT_OF_B_IN_DISCRIMINANT_FORMULA - FACTOR_OF_SECOND_TERM_IN_DISCRIMINANT_FORMULA * a * c.

Arguments and Options

Use kebab-case for option names. It integrates well with the dashes in front of them.

Use the standard format for arguments:

Do not use - in front of commands, i.e. my-grep find instead of my-grep --find. (GPG famously does this wrong.) There are exactly two exceptions to this rule: --help and --version. In a perfect world, we would have my-grep help instead of my-grep --help, but these two have become such standard practice that they cannot be ignored. Going against this convention will only cause headaches.

Do not make arguments that look like options required. I.e. greet hello --name Richard. The - in front of an option is a great way to distinguish between optional and required arguments.

Do not use short flags if they're not obvious, i.e. -f for --force, but not -l for --files-with-matches (an actual example from grep). Short flags are annoying enough to use as-is; their mnemonic should at least make sense.
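A possible layout for such a command in Python uses argparse subparsers. The my-grep program, its find command, the pattern argument, the --ignore-case option and the version string are all hypothetical and serve only as an illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # argparse adds --help on its own; --version is the other dashed exception.
    parser = argparse.ArgumentParser(prog="my-grep")
    parser.add_argument("--version", action="version", version="my-grep 0.1.0")

    # Commands are plain words without dashes: `my-grep find ...`.
    commands = parser.add_subparsers(dest="command", required=True)

    find = commands.add_parser("find", help="search for a pattern")
    find.add_argument("pattern")  # required, so no dashes
    find.add_argument("--ignore-case", action="store_true")  # optional, kebab-case
    return parser
```

With this parser, build_parser().parse_args(["find", "needle", "--ignore-case"]) yields a namespace whose command is "find" and whose ignore_case is True.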

Environment variables

Use UPPER_CASE names for your environment variables. Some programmers even think that you cannot use lower-case names for environment variables. Let us go along with this assumption to prevent headaches.

Because the environment has just one global namespace, you should prefix your environment variables with the name of your program, as in LD_LIBRARY_PATH. This way there can never be confusion as to which program the variable is for.

Configuration

Make sure config files are human-readable. A binary config file is not a config file, it is a data file. Config files are made for humans to edit, so make them readable for humans.

Make sure config files are modular. Sharing parts of your config can be a great way to reduce the total amount of configuration that a user has to manage.

Put config files in a considerate place: ~/.config/my-program.cfg instead of ~/.my-programrc.cfg. There are dedicated libraries in most languages that will help you to decide.

Make the location of your config file overridable with a flag (e.g. --config-file). A user should not have to replace a file to change the configuration. Instead, they should be able to choose a different config file on a granular basis.

Consider looking for configuration files in more than one (sensible) location. This can be great for the user experience. See stack, which looks recursively upwards through parent directories, so that a user does not have to think about where they run the command.
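For illustration, a hand-rolled version of that decision could follow the XDG convention of honouring XDG_CONFIG_HOME and falling back to ~/.config. The default_config_path function is hypothetical, it skips the validation that a dedicated library would do, and the file name simply mirrors the example above.

```python
from pathlib import Path
from typing import Mapping


def default_config_path(
    environment: Mapping[str, str], home: Path, program: str = "my-program"
) -> Path:
    # Use XDG_CONFIG_HOME when it is set and non-empty, else ~/.config.
    configured = environment.get("XDG_CONFIG_HOME")
    config_home = Path(configured) if configured else home / ".config"
    return config_home / f"{program}.cfg"
```

A real program would call this with os.environ and Path.home(); taking both as arguments keeps the function free of global state.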
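An upward search of this kind can be sketched with pathlib. This is not how stack implements it; the find_config function and the my-program.yaml file name are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path, filename: str = "my-program.yaml") -> Optional[Path]:
    # Check the starting directory first, then each parent up to the root.
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    # Nothing found anywhere between the start and the filesystem root.
    return None
```

Calling find_config(Path.cwd()) then returns the nearest config file at or above the current directory, or None so that the caller can fall back to another location.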

Stick with standard configuration formats: YAML, JSON, INI. Refrain from inventing your own format. This will make third-party tooling a lot easier to build.
