Controlling access to Nomad clusters

In this blog post, we will learn how to control access to a Nomad cluster.

Introduction

Nomad is an application scheduler that helps you schedule application processes efficiently across multiple servers and keep your infrastructure costs low. It can schedule containers, virtual machines, and isolated forked processes.

Other schedulers are available, such as Kubernetes, Mesos or Docker Swarm, and each has its own mechanisms for securing access. By following this post, you will understand the main components involved in securing a Nomad cluster; the overall idea carries over to the other schedulers as well.

One of Nomad's selling points, and a reason to consider it over tools like Kubernetes, is that it can schedule not only containers but also QEMU images, LXC, isolated fork/exec processes, and even Java applications in a chroot(!). All you need is a driver implemented for Nomad. On the other hand, its community is smaller than Kubernetes', so the tradeoffs have to be weighed on a project-by-project basis.

We will start by deploying a test cluster and configuring access control lists (ACLs).

Overview

Nomad uses tokens to authenticate client requests.
Each token is associated with policies.
Policies are a collection of rules to allow or deny operations on resources.
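
As a mental model only, the relationship between tokens, policies and rules can be sketched in Python. This is not Nomad's internal representation; the class and function names below are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """A named collection of rules (illustrative model, not Nomad's own type)."""
    name: str
    # Maps a resource kind such as "namespace", "agent" or "node"
    # to the access level the policy declares for it.
    rules: dict = field(default_factory=dict)


@dataclass
class Token:
    """A credential that carries zero or more policies."""
    name: str
    policies: list = field(default_factory=list)


def declared_levels(token, resource):
    """Return the level each attached policy declares for `resource`."""
    return [p.rules[resource] for p in token.policies if resource in p.rules]


infrastructure = Policy("infrastructure", {"agent": "write", "node": "write"})
devops = Token("devops-team", [infrastructure])

print(declared_levels(devops, "node"))       # ['write']
print(declared_levels(devops, "namespace"))  # [] (no rule, so nothing is granted)
```

A token with no matching rule for a resource ends up with an empty list, which corresponds to the request being denied.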

In this tutorial, we will:

Set up an environment that runs Nomad inside a Vagrant virtual machine, for running experiments
Generate a root/admin token (usually known as the "management" token) and activate ACLs
Use the management token to add a new "non-admin" policy and create a token associated with this new policy
Use the "non-admin" token to demonstrate access control

Set up the environment

Prerequisites:

POSIX shell, such as GNU Bash
Vagrant > 2.0.1
Nomad demo Vagrantfile

We will run everything from within a virtual machine that comes with all the necessary configuration and applications. Execute the following commands in your shell:

$ cd $(mktemp --directory)
$ curl -LO https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/Vagrantfile
$ vagrant up
    ...
    lines and lines of Vagrant output
    this might take a while
    ...
$ vagrant ssh
    ...
    Message of the day greeting from VM
    Anything after this point is being executed inside the virtual machine
    ...
vagrant@nomad:~$ nomad version
Nomad vX.X.X
vagrant@nomad:~$ uname -n
nomad

Depending on your system and the version of Vagrantfile used, the prompt may be different.

Set up Nomad

For convenience, we configure a single Nomad agent to run as both server and client. This differs from a production environment, where the servers are remote and a client runs locally on each machine or node. Create a file named nomad-agent.conf with the following contents:

bind_addr = "0.0.0.0"
data_dir = "/var/lib/nomad"
region = "global"
acl {
  enabled = true
}
server {
  enabled              = true
  bootstrap_expect     = 1
  authoritative_region = "global"
}
client {
  enabled = true
}

Then, execute:

vagrant@nomad:~$ sudo nomad agent -config=nomad-agent.conf # sudo is needed to run as a client

You should see output indicating that Nomad is running.

Clients need root access to be able to execute processes, while servers only communicate to synchronize state.

ACL Bootstrap

In another terminal, after running vagrant ssh from our temporary working directory, run the following command:

vagrant@nomad:~$ nomad acl bootstrap

Accessor ID  = 2f34299b-0403-074d-83e2-60511341a54c
Secret ID    = 9fff6a06-b991-22db-7fed-55f17918e846
Name         = Bootstrap Token
Type         = management
Global       = true
Policies     = n/a
Create Time  = 2018-02-14 19:09:23.424119008 +0000 UTC
Create Index = 13
Modify Index = 13

This Secret ID is our management (admin) token. This token is valid globally and all operations are permitted. No policies are necessary while authenticating with the management token, and so, none are configured by default.

It is important to copy the Accessor ID and Secret ID to some file, for safekeeping, as we will need these values later. For a production environment, it is safest to store these in a separate vault permanently.
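
One possible way to capture these values is sketched below in Python, using only the standard library. It assumes the `Key = Value` layout shown in the output above; the function names and the idea of a plain text file are illustrative assumptions, not part of Nomad.

```python
import os


def parse_token_output(text):
    """Parse the `Key = Value` lines printed by `nomad acl bootstrap`."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank lines and anything without a "="
            fields[key.strip()] = value.strip()
    return fields


def save_private(path, fields):
    """Write the two IDs to `path`, creating it readable by the owner only."""
    # The 0o600 mode only applies when the file is newly created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        for key in ("Accessor ID", "Secret ID"):
            handle.write(f"{key} = {fields[key]}\n")


sample = """\
Accessor ID  = 2f34299b-0403-074d-83e2-60511341a54c
Secret ID    = 9fff6a06-b991-22db-7fed-55f17918e846
Name         = Bootstrap Token
Type         = management
"""
token = parse_token_output(sample)
print(token["Type"])  # management
```

Calling `save_private("bootstrap-token.txt", token)` (a hypothetical file name) would then store both IDs in a file that other users on the machine cannot read.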

Once ACLs are on, all operations are denied unless a valid token is provided with each request, and the operation we want is allowed by a policy associated with the provided token.

vagrant@nomad:~$ nomad node-status
Error querying node status: Unexpected response code: 403 (Permission denied)

vagrant@nomad:~$ export NOMAD_TOKEN='9fff6a06-b991-22db-7fed-55f17918e846' # Secret ID, above
vagrant@nomad:~$ nomad node-status

ID        DC   Name   Class   Drain  Status
1f638a17  dc1  nomad  <none>  false  ready

Designing policies

Policies should ideally map to non-overlapping roles, each granting access to a different set of operations. The table below shows typical roles for users of a Nomad cluster.

Role	Namespace	Agent	Node	Remarks
Anonymous	`deny`	`deny`	`deny`	Unnecessary, as token-less requests are denied all operations.
Developer	`write`	`deny`	`read`	Developers are permitted to debug their applications, but not to perform cluster management
Logger	`list-jobs`, `read-logs`	`deny`	`read`	Automated log aggregators or analyzers that need read access to logs
Job requester	`submit-job`	`deny`	`deny`	CI systems create new jobs, but don't interact with running jobs.
Infrastructure	`read`	`write`	`write`	DevOps teams perform cluster management but seldom need to interact with running jobs.

For namespace access, read is equivalent to [read-job, list-jobs]. write is equivalent to [list-jobs, read-job, submit-job, read-logs, read-fs, dispatch-job].
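
The shorthand expansion described above can be sketched as a small lookup in Python. The capability lists are taken from this paragraph; the names `NAMESPACE_SHORTHAND` and `expand_capabilities` are hypothetical.

```python
# Shorthand levels for namespace rules, as listed in the paragraph above.
NAMESPACE_SHORTHAND = {
    "read": ["read-job", "list-jobs"],
    "write": [
        "list-jobs", "read-job", "submit-job",
        "read-logs", "read-fs", "dispatch-job",
    ],
}


def expand_capabilities(entries):
    """Expand shorthand levels into individual capabilities.

    Names that are not shorthand are assumed to already be capabilities
    and are passed through unchanged.
    """
    expanded = set()
    for entry in entries:
        expanded.update(NAMESPACE_SHORTHAND.get(entry, [entry]))
    return sorted(expanded)


print(expand_capabilities(["read"]))
# ['list-jobs', 'read-job']
print(expand_capabilities(["list-jobs", "read-logs"]))  # the Logger role
# ['list-jobs', 'read-logs']
```

Seen this way, the Logger role is a strict subset of what `write` grants, which is why it can be handed to automated systems with less risk.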

In the event that operators do need to have access to namespaces, one can always create a token that has both Developer and Infrastructure policies attached. For day-to-day operations this comes close to having a management token, although only a management token can manage ACL policies and tokens.

We have left out multi-region and multi-namespace setups here. We have assumed everything to be running under the default namespace. It should be noted that on production deployments, with much larger needs, the policies could be designed per-namespace, and tracked between regions.

Policy specification

Policies are expressed as a combination of rules. Note that a deny rule takes precedence over any conflicting capability.
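
To illustrate what precedence means when several policies apply to the same resource, here is a minimal Python sketch. Only the fact that deny wins comes from the text above; ranking write above read, and falling back to deny when nothing is declared, are assumptions made for this example, and the names are hypothetical.

```python
# Assumed ordering, strongest first. That "deny" wins is stated in the text;
# ranking "write" above "read" is an assumption made for this sketch.
PRECEDENCE = ["deny", "write", "read"]


def resolve_level(levels):
    """Pick the effective level from the levels declared by several policies."""
    for candidate in PRECEDENCE:
        if candidate in levels:
            return candidate
    return "deny"  # nothing declared: fall back to denying access


print(resolve_level(["read", "write"]))          # write
print(resolve_level(["write", "deny", "read"]))  # deny
print(resolve_level([]))                         # deny
```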

Nomad accepts a JSON payload with the name and description of a policy, along with a quoted JSON or HCL document with rules, like the following.

{
  "Description": "Agent and node management",
  "Name": "infrastructure",
  "Rules": "{\"agent\":{\"policy\":\"write\"},\"node\":{\"policy\":\"write\"}}"
}

This policy matches what we have in the table above. Create an infrastructure.json with the content above for use in the next step.

TIP:

To avoid error-prone quoting, one could write the policies in YAML:
Name: infrastructure
Description: Agent and node management
Rules:
  agent:
    policy: write
  node:
    policy: write
And then, convert them to JSON with the necessary quoting, by:
$ yaml2json < infrastructure.yaml | jq '.Rules = (.Rules | @text)' > infrastructure.json
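
If yaml2json and jq are not at hand, the same quoting can be produced with Python's standard json module. This is only a sketch; the function name `policy_payload` is hypothetical.

```python
import json


def policy_payload(name, description, rules):
    """Build the JSON document, with `rules` embedded as a JSON string."""
    document = {
        "Name": name,
        "Description": description,
        # The inner dumps() produces the quoted string that Rules expects.
        "Rules": json.dumps(rules, separators=(",", ":")),
    }
    return json.dumps(document, indent=2)


payload = policy_payload(
    "infrastructure",
    "Agent and node management",
    {"agent": {"policy": "write"}, "node": {"policy": "write"}},
)
print(payload)
```

Saved as a script, its output can be redirected to infrastructure.json. The outer `json.dumps` takes care of escaping the inner quotes, so the rules never have to be quoted by hand.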

Adding a policy

To add the policy, simply make an HTTP POST request to the server. The NOMAD_TOKEN below is the "management" token that we first created.

vagrant@nomad:~$ curl \
    --request POST \
    --data @infrastructure.json \
    --header "X-Nomad-Token: ${NOMAD_TOKEN}" \
    http://127.0.0.1:4646/v1/acl/policy/infrastructure

vagrant@nomad:~$ nomad acl policy list
Name            Description
infrastructure  Agent and node management

vagrant@nomad:~$ nomad acl policy info infrastructure
Name        = infrastructure
Description = Agent and node management
Rules       = {"agent":{"policy":"write"},"node":{"policy":"write"}}
CreateIndex = 425
ModifyIndex = 425
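
The same request can be prepared from Python with the standard library, which may be handy when policies are added from a script. The sketch below assumes the agent from this tutorial, listening without TLS on 127.0.0.1:4646; the function name and the placeholder values are hypothetical.

```python
import urllib.request


def build_policy_request(name, payload, token, address="http://127.0.0.1:4646"):
    """Prepare (but do not send) the POST that registers a policy."""
    return urllib.request.Request(
        url=f"{address}/v1/acl/policy/{name}",
        data=payload.encode("utf-8"),
        headers={"X-Nomad-Token": token},
        method="POST",
    )


request = build_policy_request(
    name="infrastructure",
    # Placeholder body; in practice, the contents of infrastructure.json.
    payload='{"Name": "infrastructure", "Rules": "{}"}',
    # Placeholder; in practice, the Secret ID of the management token.
    token="management-secret-id-goes-here",
)
print(request.get_method(), request.full_url)
# POST http://127.0.0.1:4646/v1/acl/policy/infrastructure

# To send it against a running agent:
#     with urllib.request.urlopen(request) as response:
#         print(response.status)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before anything reaches the cluster.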

Creating a token for a policy

We now create a token for the infrastructure policy, and attempt a few operations with it:

vagrant@nomad:~$ nomad acl token create \
    -name='devops-team' \
    -type='client' \
    -global='true' \
    -policy='infrastructure'

Accessor ID  = 927ea7a4-e689-037f-be89-54a2cdbd338c
Secret ID    = 26832c8d-9315-c1ef-aabf-2058c8632da8
Name         = devops-team
Type         = client
Global       = true
Policies     = [infrastructure]
Create Time  = 2018-02-15 19:53:59.97900843 +0000 UTC
Create Index = 432
Modify Index = 432

vagrant@nomad:~$ export NOMAD_TOKEN='26832c8d-9315-c1ef-aabf-2058c8632da8' # change the token to the new one with the "infrastructure" policy attached
vagrant@nomad:~$ nomad status
Error querying jobs: Unexpected response code: 403 (Permission denied)

vagrant@nomad:~$ nomad node-status
ID        DC   Name   Class   Drain  Status
1f638a17  dc1  nomad  <none>  false  ready

As you can see, anyone with the devops-team token is allowed to run operations on nodes, but not on jobs (that is, on namespace resources).

Where to go next

The example above demonstrates adding one of the policies from the table in the "Designing policies" section. Adding the rest of them and trying different commands would be a good exercise.

As a reference, the FP Complete team maintains a repository with policies ready for use.
