TL;DR: if you just want to get started, use stack's Docker support; see the Docker page on the stack wiki. The rest of this post gives background on the benefits, the implementation, and the reasons for our choices.

A brief history

Using LXC for containerization is an integral component of the FP Complete Haskell Center and School of Haskell, so lightweight virtualization was not new to us. We started tentative experiments using Docker for command-line development about a year ago, and it quickly became an indispensable part of our development tool chain. We soon wrote a wrapper script that did user ID mapping and volume mounting, so that developers could prefix their usual cabal or build-script commands with the wrapper and have them automagically run in a container, without needing to adjust their usual workflow for Docker. The wrapper's functionality was integrated into an internal build tool and formed the core of its sandboxing approach. That internal build tool then became stack, which got its own non-Docker-based sandboxing approach. But the basic core of that original wrapper script lives on in stack's Docker integration, and there are significant benefits to teams in using it.

Benefits

The primary pain point we are solving with our use of Docker is ensuring that all developers are using a consistent environment for building and testing code.

Before Docker, our approach involved having all developers run the same Linux distribution version, install the same additional OS packages, and use hsenv sandboxes (and, as they stabilized, Cabal sandboxes) for Haskell package sandboxing. However, this approach proved deficient in several ways.

In the process of solving those main problems, we also pursued some additional goals.

Approach

When Docker is enabled in stack.yaml, every invocation of stack (with the exception of certain sub-commands) transparently re-invokes itself in an ephemeral Docker container which has the project root directory and the stack home (~/.stack) bind-mounted. The container exists only to provide the environment in which the build runs; nothing is actually written to the container's file-system (any writes happen in the bind-mounted directories), and the container is destroyed immediately after stack exits (using docker run --rm). This makes upgrading to a new image easy, since it is just a matter of creating ephemeral containers from the new image. The directories are bind-mounted to the same file-system locations in the container, which makes it possible to switch between building with and without Docker and still have everything work.
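
For example, enabling the integration is a small addition to the project's stack.yaml (the image field is optional, and the tag shown here is illustrative):

```yaml
# stack.yaml
docker:
  enable: true
  # Optionally pin a specific image instead of the default
  # chosen from the resolver (tag shown is illustrative):
  image: fpco/stack-build:lts-3.0
```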

Docker runs processes in containers as root by default, which would result in files all over our project and stack home being owned by root when they should be owned by the host OS user. There is the docker run --user option to specify a different user ID to run the process as, but it works best if that user already exists in the Docker image, and in our case we don't know the developer's user ID at image-creation time. We work around this by using docker run --env to pass in the host user's UID and GID, and adding an ENTRYPOINT which, inside the container, creates a matching user and then uses sudo -u to run the build command as that user.
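
Here is a simplified sketch of such an entrypoint, assuming the host's IDs arrive in hypothetical HOST_UID and HOST_GID environment variables (this is not stack's actual script):

```sh
#!/bin/sh
# Entrypoint sketch: create a user inside the container matching the
# host user's IDs, then run the requested command as that user.
# HOST_UID/HOST_GID are hypothetical names for values passed in with
# `docker run --env`.
set -e

groupadd --gid "$HOST_GID" hostgroup
useradd --uid "$HOST_UID" --gid "$HOST_GID" --create-home hostuser

# Run the build command as the new user, preserving the environment.
exec sudo -E -u hostuser "$@"
```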

In addition, stack and the entrypoint make a number of further adjustments so that the environment inside the container matches the host environment as closely as possible.

Images

For each GHC version + Stackage LTS snapshot combination, we tag several images (which layer on top of each other):

- run: a minimal image with the system libraries needed to run a built Haskell application
- build: adds GHC and the tools needed to build projects
- full: adds the rest of a complete development environment on top of build

Most of a developer's work is done using a build or full image, and they can test using a run image. The actual production environment for a server can be built on run.
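
As an illustration, a server's production image can be derived from the run image with a minimal Dockerfile (the image name, tag, and executable here are hypothetical):

```dockerfile
# Hypothetical production image layered on a "run" image.
FROM fpco/stack-run:lts-3.0
COPY my-server /usr/local/bin/my-server
CMD ["/usr/local/bin/my-server"]
```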

In addition, there are variants of these images that include GHCJS (ghcjs-build and ghcjs-full), plus additional private variants for internal and clients' use that include proprietary extensions.

We create and push these images using a Shake script on the host, and Propellor in the container to control what is in the image. This provides far more flexibility than plain Dockerfiles, and is why we can easily mix-and-match and patch images. Our image build process also allows us to provide custom images for clients, which might include additional tools or proprietary libraries specific to a customer's requirements. We intend to open-source the image build tool, but it currently contains proprietary information and needs to be refactored so that we can extract those parts and release the rest.
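
As a rough sketch of the shape of this setup (not our actual tool), a Shake script on the host can drive docker build and docker push for each image variant, with the Dockerfile in each build context invoking Propellor to configure the image's contents. All names, tags, and paths below are illustrative:

```haskell
-- Sketch of a Shake script that builds and pushes image variants.
-- Image names, tags, and directory layout are illustrative.
import Control.Monad (forM_)
import Development.Shake

variants :: [String]
variants = ["run", "build", "full"]

main :: IO ()
main = shakeArgs shakeOptions $
  forM_ variants $ \v ->
    phony ("image-" ++ v) $ do
      let tag = "example/stack-" ++ v ++ ":lts-3.0"
      -- Each variant directory contains a Dockerfile whose build
      -- steps run Propellor to configure the image contents.
      cmd_ "docker build" ["--tag", tag, v]
      cmd_ "docker push" [tag]
```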

Challenges

Nothing is perfect, and we have run into some challenges with Docker, most notably the awkwardness of using containers on non-Linux operating systems, where Docker requires a virtual machine (more on this under Future below).

Alternative approaches

There are many other ways to use Docker, but we didn't find that the "obvious" ones met our goals.

The Official Haskell image (which didn't exist when we started using Docker) approach of iteratively developing using docker build and a Dockerfile has some disadvantages, chief among them that docker build cannot bind-mount the project directory into the build, so intermediate build artifacts are not easily preserved between iterations.

The Vagrant-style approach of having a persistent container with the project directory bind-mounted into it, while much better, has other disadvantages: the container accumulates state that is not reflected in the image, which makes upgrading to a new image harder (precisely what the ephemeral-container approach makes easy).

Future

There are plenty of directions to take Docker support in stack as the container ecosystem evolves. There is work in progress to have stack automatically create new Docker images containing the executables it builds, and this works even if you perform the builds without Docker. Moving toward more general opencontainers.org support is another direction we are considering. A better solution for using containers on non-Linux operating systems is also desirable. And as stack's editor integration via ide-backend improves, it will apply equally well when Docker is in use.
