The most common pattern for using Docker to build and deploy
software in an image uses a single Dockerfile to build the software
and produce the image that gets deployed. The basic pattern
goes:
FROM base-image
RUN install-some-extra-build-tools
COPY . /build-directory
RUN /build-directory/build-my-software
CMD /run/my/software
This works, but you end up with a great deal of unnecessary cruft
in the image that gets deployed. Does your software need its own
source code and the tools used to build itself in order to run?
Unless you're using an interpreted language like Ruby or Python, it
probably doesn't, so why does it have to be in the deployed image?
Disadvantages of this approach include:
- Compilers are often huge (likely to be much bigger than your own
  software), which means the majority of the deployed image's
  contents are not used. That's a lot of pointless overhead.
- Since your source code and vendor libraries are in the image, a
  security hole in your software could leak proprietary
  information.
- Every extra component introduces a potential attack vector.
Instead, we should separate concerns: use one Dockerfile to
create a build environment and use that to build our software, and
another to create the deployed runtime image using the artifacts
generated by the build. What follows is a simple example using a
one-line "Hello, world" program written in Haskell (our preferred
language, of course, but also illustrative since the compiler is
not generally considered small). The full example is available
on GitHub.
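The program itself is not reproduced in this post. As a rough sketch, a one-line Main.hs that prints the output shown later could be created as follows. The write_hello helper and its directory argument are illustrative assumptions, not part of the original example.

```shell
# Hypothetical helper: write a one-line Haskell "Hello, world" program
# into the given source directory (defaults to ./src).
write_hello() {
    src_dir=${1:-src}
    mkdir -p "$src_dir" &&
    printf '%s\n' 'main = putStrLn "Hello, world"' > "$src_dir/Main.hs"
}
```

Calling write_hello src produces src/Main.hs, the file name that the ghc commands in this post compile.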
The conventional approach
What do we need to build this program? Let's just do the
obvious: use the official haskell image. We'll start
with the "conventional" approach, and work toward something better.
Here's the Dockerfile:
FROM haskell:7.10.2
# [insert additional build and runtime requirements here]
RUN mkdir /artifacts
COPY src /src/
RUN ghc -o /artifacts/hello /src/Main.hs
CMD /artifacts/hello
We build the image using docker build -t haskell-hello . and then run it:
$ docker run --rm haskell-hello
Hello, world
Great, all done and ready to deploy! So how big is the
image?
$ docker inspect -f '' haskell-hello
715052740
It's ~700 MB, just to run a "Hello, world" program! There must
be a better way.
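Raw byte counts are hard to read at a glance. A small sketch for converting them to decimal megabytes, assuming awk is available; the bytes_to_mb name is made up for this illustration:

```shell
# Hypothetical helper: convert a size in bytes to decimal megabytes
# (1 MB = 1,000,000 bytes), rounded to one decimal place.
bytes_to_mb() {
    LC_ALL=C awk -v bytes="$1" 'BEGIN { printf "%.1f MB\n", bytes / 1000000 }'
}

bytes_to_mb 715052740   # size reported above for the conventional image
```

For the figure above this prints 715.1 MB, which is where the "~700 MB" comes from.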
The split-image approach
What do we need in the image to actually run this tiny program?
Not very much at all; just a minimal Linux system with the libgmp
shared library (which all programs compiled with GHC need unless
special options are used). Conveniently, there is the ~4 MB
haskell-scratch image for that (see our Haskell Web Server in a
5MB Docker Image blog post, but note that it's too minimal for
most real-world Haskell programs; we suggest using something like
ubuntu-with-libgmp instead). Here's the Dockerfile (in the run/
subdirectory) for the runtime image that we'll deploy:
FROM fpco/haskell-scratch:integer-gmp
# [insert additional runtime requirements here]
COPY artifacts /artifacts/
CMD /artifacts/hello
Where do the contents of the artifacts directory come from?
That's the job of the build image's Dockerfile (in the build/
subdirectory):
FROM haskell:7.10.2
# [insert additional build requirements here]
VOLUME /artifacts
VOLUME /src
CMD ghc -o /artifacts/hello /src/Main.hs
This uses the same official Haskell image and compilation
command as our original Dockerfile, but it declares VOLUMEs and uses
the CMD instruction instead of COPY and RUN. That means the source
code is compiled when you docker run the image, not when you
docker build it. That, in turn, allows us to use volume mounts
(which cannot be used with docker build) to expose the host's
run/artifacts directory to the build, so that it puts the artifacts
where the runtime image's Dockerfile looks for them.
To put it all together, run these commands:
$ docker build -t build_haskell-hello build/
$ docker run --rm \
--volume="$PWD/build/src:/src" \
--volume="$PWD/run/artifacts:/artifacts" \
build_haskell-hello
$ docker build -t haskell-hello run/
Notice that we also mounted the source code from the host. While
we could have continued COPYing the source code into the image,
mounting it has some advantages. You don't end up with another
large build image full of intermediate files every time you
change the code (images that you would have to remember to clean
up), and you can do incremental builds since intermediate files are
preserved on the host.
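The three commands above can be wrapped in a small script so that they always run in the right order. This is only a sketch: the split_build function, its optional root-directory argument, and the DOCKER override variable are assumptions introduced here for illustration, while the image names and mount paths are the ones used above.

```shell
# Hypothetical wrapper around the three commands shown above.
# $1     : project root containing build/ and run/ (defaults to $PWD)
# DOCKER : command to invoke instead of docker (assumed override,
#          e.g. DOCKER=echo to preview the commands)
split_build() {
    root=${1:-$PWD}
    docker_cmd=${DOCKER:-docker}
    # Create the artifacts directory up front so that it belongs to the
    # current user rather than being created by the Docker daemon.
    mkdir -p "$root/run/artifacts" &&
    # 1. Build the image that contains the compiler.
    "$docker_cmd" build -t build_haskell-hello "$root/build/" &&
    # 2. Compile: sources and artifacts are mounted from the host.
    "$docker_cmd" run --rm \
        --volume="$root/build/src:/src" \
        --volume="$root/run/artifacts:/artifacts" \
        build_haskell-hello &&
    # 3. Build the small runtime image from the generated artifacts.
    "$docker_cmd" build -t haskell-hello "$root/run/"
}
```

Running DOCKER=echo split_build prints the three commands instead of executing them, which is a cheap way to check the paths before a real build.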
That was more complicated, but did it make a difference?
Well...
$ docker inspect -f '' haskell-hello
5466526
Down to ~5.5 MB, over two orders of magnitude better. I'd say
that was worth it!
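The "two orders of magnitude" claim can be checked directly from the two byte counts reported above. A minimal sketch, assuming awk is available; size_ratio is a name invented for this example:

```shell
# Hypothetical helper: how many times larger is the first size than the second?
size_ratio() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", before / after }'
}

size_ratio 715052740 5466526   # conventional image vs. split runtime image
```

This prints 130.8, so the runtime image is roughly 130 times smaller than the conventional one.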
Of course, everyone's projects are different, and require
different trade-offs, so while the above is illustrative of the
approach, you will tweak it as you see fit. You may prefer to COPY
the source code into the image to minimize risk of leakage between
iterations (at the expense of time and disk space). You may want to
clear the artifacts directory between builds for the same reason.
For more complex projects, there will be OS requirements shared
between the build and runtime images, so it often makes sense to
derive both from a common parent.
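Clearing the artifacts directory between builds can be as simple as removing and recreating it. A minimal sketch, where the reset_artifacts name and its default path are assumptions based on the run/artifacts layout used above:

```shell
# Hypothetical helper: empty the artifacts directory before a build so
# that stale outputs from an earlier build cannot end up in the runtime image.
reset_artifacts() {
    dir=${1:-run/artifacts}
    rm -rf -- "$dir" && mkdir -p -- "$dir"
}
```

It would be called just before running the build container, for example reset_artifacts "$PWD/run/artifacts".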
Stack support
At FP Complete, we use and recommend this approach for deploying
production software with Docker, but without easy-to-use tool
support it is a bit cumbersome. Unsurprisingly, Stack has excellent support for
this approach. The stack image container command will
create a runtime image from artifacts generated during the build
(optionally using Docker for the build as well). See
Yesod hosting with Docker and Kubernetes for an example, and the
Docker section of the user's guide for more details.