Haskell Web Server in a 5MB Docker Image

The Problem

Recently we needed to redirect all Amazon Elastic Load Balancer (ELB) HTTP traffic to HTTPS. AWS ELB doesn't provide this automatic redirection as a service. ELB will, however, let you map multiple ports from the ELB into the auto-scaling cluster of nodes attached to that ELB.

People usually just point both port 80 & 443 to a webserver that is configured to redirect traffic through the secure port. The question of how to configure your webserver for this task is asked over & over again on the internet. People have to go find a config snippet on the internet & paste it into their webserver's configuration files. The webserver for your new project might not be the one you used for your last, so the snippet may not carry over.

Lifting this configuration into place also takes some dev-ops work (Chef, Puppet, etc.) & testing to make sure it works. If you have to mix redirect-to-HTTPS configuration with your other webserver configuration, it takes even more care & testing. Wouldn't it be nicer to have a microservice for this that redirects out of the box without any configuration needed?

We could map port 80 (HTTP) to our own fast webserver to do the job of redirecting to HTTPS (TLS). The requirements are just that it always redirects to HTTPS & doesn't need configuration to do so (at least in its default mode).

The Solution

I wrote a Haskell service using the fast webserver library/server combo of Wai & Warp. It took about an hour to get the basic service from a blank file to ready for deployment, and that hour solved the problem of forcing HTTPS on AWS ELB for the foreseeable future. It does the job well & logs in Apache webserver format. We had it deployed the same day.

The project is open source & can be found on github.com.

Why Haskell?

Haskell can be a great tool for solving systems/dev-ops problems. Its performance can compete with other popular natively compiled systems languages like Go, Rust or even (hand-written) C.

In addition to great performance, Haskell helps you to communicate your intent in code with precision. Mistakes are often caught at compile time instead of runtime. You often hear Haskellers talk about having their code just work after they write it & it compiles.

After installing the GHC compiler and the `cabal-install` build tool, compiling a native executable of the webserver is as simple as these 3 commands in the project root.

cabal update
cabal sandbox init
cabal install

After installation you will have a single binary in $PROJECT/.cabal-sandbox/bin/rdr2tls.

Deployment

What gets installed is a native executable with just a few dynamic links (because of LGPL licensing). Since we have a nice self-contained native executable, we have a multitude of options for deployment. We could create a Debian package. We could package things up as an RPM. We could deliver the code as a Docker container.

We chose to deploy our first run of the project as a Docker container. The first deploy was 200MB (because we based the deployment on the Ubuntu Docker image). This is not a huge image, but we wanted to see if we could shrink it.

What if we could take everything out of the image that wasn't necessary for running our webserver code? There isn't a whole lot needed to create a working Docker image from an executable. If you run ldd <binary> on the native executable you'll see the following.

tim@kaku:~/src/github.com/dysinger/rdr2tls% ldd .cabal-sandbox/bin/rdr2tls
linux-vdso.so.1 =>  (0x00007ffef0fa8000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb3fc3e2000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb3fc1de000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb3fbfbf000)
libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007fb3fbd3f000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb3fba37000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb3fb66c000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb3fc608000)

If we package up just the libraries that are linked, is that enough? No. It didn't work. Michael Snoyman did some digging around & found we also need some gconv UTF libraries. I also found we needed /bin/sh for Docker to be happy. We created a small project for building a base docker image with these things in place. It's just a few megabytes!
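The first step, collecting the libraries that ldd reports, can be sketched in shell. This is a minimal sketch rather than the script behind the base image: the function names `ldd_paths` and `stage_libs` and the staging-directory layout are assumptions made for illustration, and it does not cover the gconv modules or /bin/sh mentioned above.

```shell
#!/bin/sh
# List the absolute paths of shared libraries found in `ldd` output on stdin.
# Handles "name => /path (addr)" lines and bare "/path (addr)" lines (the
# dynamic loader). linux-vdso has no file on disk, so it is skipped.
ldd_paths() {
  awk '
    $2 == "=>" && $3 ~ /^\// { print $3; next }
    $1 ~ /^\//               { print $1 }
  '
}

# Copy every library a binary links against into a staging directory,
# keeping the original directory layout so the loader can still find them.
# $1: path to the executable, $2: staging directory (hypothetical name)
stage_libs() {
  binary=$1
  rootfs=$2
  ldd "$binary" | ldd_paths | while read -r lib; do
    mkdir -p "$rootfs$(dirname "$lib")"
    cp -L "$lib" "$rootfs$lib"   # -L copies the file a symlink points to
  done
}
```

Under these assumptions, `stage_libs .cabal-sandbox/bin/rdr2tls rootfs` would copy the seven library files from the ldd listing above into a `rootfs` directory.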

When we inject our webserver into the base image we get a complete Docker image for our webserver in less than 20MB. That's not bad!
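Injecting the executable into the base image might look something like the following. It is only a sketch: the helper name `write_dockerfile` and the install path /usr/local/bin are assumptions, and the base image name is taken from the size listing later in this post, not from the project's actual build files.

```shell
#!/bin/sh
# Print a Dockerfile that layers one executable on top of a base image.
# $1: base image name (e.g. haskell-scratch:integer-gmp)
# $2: name of the executable in the build context
# The /usr/local/bin install path is an assumption for illustration.
write_dockerfile() {
  base=$1
  exe=$2
  cat <<EOF
FROM $base
COPY $exe /usr/local/bin/$exe
ENTRYPOINT ["/usr/local/bin/$exe"]
EOF
}
```

With the binary copied next to it, `write_dockerfile haskell-scratch:integer-gmp rdr2tls > Dockerfile` followed by `docker build -t rdr2tls .` would produce the image.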

Into the Rabbit Hole

We went from nearly 200MB to 20MB. Can we do any better? How deep does the rabbit hole go? Luckily I had the weekend so I could really geek out on it.

GHC can be configured with a number of options when it is compiled. We can matrix on the following options:

GHC Version: 7.8 or 7.10 (the last two stable)
GHC Build Flavour: (e.g., quick, perf & perf-llvm)
GHC Integer Library: libgmp-based or 'simple'
LLVM Version: 3.4 or 3.5 (the last two stable)
Split Objects: not recommended in the GHC manual (so we didn't)

In addition to tweaking GHC compiler options while installing GHC, we can tell GHC to compile the code with different backends:

GHC Backend: asm or llvm

I used a script to run through and compile all the different combinations of GHC. I ended up with many, many versions of GHC installed (11GB of them actually). I wanted to see what difference it would make in the size of the webserver executable.
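The original script isn't reproduced here, but enumerating the matrix over the options listed above can be sketched as nested loops. The function name `matrix_tags` is made up for this example. The tag layout follows the image tags shown later in the post, and the full product is 48 combinations, more than the appendix tables contain, so not every combination appears in the data.

```shell
#!/bin/sh
# Print one tag per combination of GHC version, build flavour, LLVM version,
# integer library and backend, e.g. 7.8.4-perf_llvm-llvm_3_4-integer_gmp-llvm.
matrix_tags() {
  for ghc in 7.8.4 7.10.1; do
    for flavour in quick perf perf_llvm; do
      for llvm in llvm_3_4 llvm_3_5; do
        for integer in integer_gmp integer_simple; do
          for backend in asm llvm; do
            echo "$ghc-$flavour-$llvm-$integer-$backend"
          done
        done
      done
    done
  done
}
```

A driver script could read these tags one per line and build GHC & the webserver for each.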

After compiling the webserver a couple dozen times we see that flags & options make a difference. Sizes for the stripped native executable ranged from 13879600 bytes (13.88MB) down to 5963632 bytes (5.96MB) depending on options. No doubt there are trade-offs between size & performance. We are just looking at size for the moment.

If we add UPX in the mix, we can further shrink the executable to the range of 3022828 bytes (3.02MB) to 1224368 bytes (1.22MB!).
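The strip & compress step can be sketched as a small shell function. The post doesn't say which UPX options were used, so `--best` here is an assumption, as is the function name `shrink`.

```shell
#!/bin/sh
# Strip symbols from an executable, compress it in place with UPX,
# then print the resulting size in bytes.
# $1: path to the executable (it is modified in place)
shrink() {
  binary=$1
  strip "$binary"        # drop symbol tables & debug info
  upx --best "$binary"   # assumption: the compression level is not given in the post
  wc -c < "$binary"      # final size in bytes
}
```

For example, `shrink .cabal-sandbox/bin/rdr2tls` would report a byte count comparable to the compressed sizes in the appendix.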

Our 'scratch' base Docker image is 3.67MB (w/o libgmp) and 4.19MB (w/ libgmp) currently. If we add a stripped & compressed executable weighing in at 1.22MB to 3.67MB we should get something around 5MB. Not too shabby for a complete running Docker image!

REPOSITORY	TAG	SIZE
rdr2tls	7.8.4-perf_llvm-llvm_3_4-integer_gmp-llvm	7.21MB
rdr2tls	7.8.4-perf_llvm-llvm_3_4-integer_gmp-asm	7.11MB
rdr2tls	7.8.4-perf_llvm-llvm_3_4-integer_simple-llvm	6.69MB
rdr2tls	7.8.4-perf_llvm-llvm_3_4-integer_simple-asm	6.59MB
rdr2tls	7.8.4-perf-llvm_3_4-integer_gmp-llvm	5.70MB
rdr2tls	7.8.4-perf-llvm_3_5-integer_gmp-asm	5.60MB
rdr2tls	7.8.4-perf-llvm_3_4-integer_gmp-asm	5.60MB
rdr2tls	7.8.4-perf-llvm_3_4-integer_simple-llvm	5.18MB
rdr2tls	7.8.4-perf-llvm_3_5-integer_simple-asm	5.08MB
rdr2tls	7.8.4-perf-llvm_3_4-integer_simple-asm	5.08MB
haskell-scratch	integer-gmp	4.19MB
haskell-scratch	integer-simple	3.66MB
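As a rough cross-check of the listing above, an image size can be estimated as the base image size plus the compressed executable size from the appendix. The helper name `image_mb` is an assumption for this sketch, and MB is taken as 10^6 bytes, matching the byte-to-MB conversions used earlier in the post.

```shell
#!/bin/sh
# Estimate an image size in MB (10^6 bytes).
# $1: base image size in MB, $2: executable size in bytes
image_mb() {
  awk -v base="$1" -v exe="$2" 'BEGIN { printf "%.2f\n", base + exe / 1000000 }'
}

image_mb 3.67 1224440   # smallest integer_simple executable on the libgmp-free base
image_mb 4.19 1409684   # 7.8.4 perf, llvm_3_4, integer_gmp, asm on the libgmp base
```

The first call prints 4.89, the "around 5MB" estimate from the text. The second prints 5.60, which agrees with the 5.60MB listed for the 7.8.4-perf-llvm_3_4-integer_gmp-asm image.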

The 7MB LLVM-backend-compiled version is now pushed to Docker Hub.

Appendix: The Data

Stripped Executable Size (bytes)

Version	Build Flavour	LLVM	Integer Library	Backend	Size
7.8.4	perf_llvm	llvm_3_4	integer_simple	llvm	13879600
7.8.4	perf_llvm	llvm_3_4	integer_gmp	llvm	13875952
7.8.4	perf_llvm	llvm_3_4	integer_simple	asm	13768888
7.8.4	perf_llvm	llvm_3_4	integer_gmp	asm	13763704
7.8.4	quick	llvm_3_4	integer_gmp	llvm	11854264
7.8.4	quick	llvm_3_4	integer_simple	llvm	11841336
7.8.4	quick	llvm_3_4	integer_gmp	asm	11640248
7.8.4	quick	llvm_3_5	integer_gmp	asm	11640248
7.8.4	quick	llvm_3_4	integer_simple	asm	11624760
7.8.4	quick	llvm_3_5	integer_simple	asm	11624760
7.8.4	perf	llvm_3_4	integer_simple	llvm	6570680
7.8.4	perf	llvm_3_4	integer_gmp	llvm	6568888
7.8.4	perf	llvm_3_4	integer_gmp	asm	6456632
7.8.4	perf	llvm_3_5	integer_gmp	asm	6456632
7.8.4	perf	llvm_3_4	integer_simple	asm	6455864
7.8.4	perf	llvm_3_5	integer_simple	asm	6455864
7.10.1	perf	llvm_3_5	integer_gmp	llvm	6267568
7.8.4	perf_llvm	llvm_3_5	integer_gmp	llvm	6267568
7.8.4	perf_llvm	llvm_3_5	integer_simple	llvm	6267568
7.10.1	perf_llvm	llvm_3_5	integer_gmp	llvm	6267568
7.10.1	quick	llvm_3_5	integer_gmp	llvm	6267568
7.10.1	perf	llvm_3_4	integer_gmp	llvm	6259376
7.10.1	perf_llvm	llvm_3_4	integer_gmp	llvm	6259376
7.10.1	perf_llvm	llvm_3_4	integer_simple	llvm	6259376
7.10.1	quick	llvm_3_4	integer_gmp	llvm	6259376
7.10.1	perf	llvm_3_4	integer_gmp	asm	5963632
7.10.1	perf	llvm_3_5	integer_gmp	asm	5963632
7.8.4	perf_llvm	llvm_3_5	integer_gmp	asm	5963632
7.8.4	perf_llvm	llvm_3_5	integer_simple	asm	5963632
7.10.1	perf_llvm	llvm_3_4	integer_gmp	asm	5963632
7.10.1	perf_llvm	llvm_3_4	integer_simple	asm	5963632
7.10.1	perf_llvm	llvm_3_5	integer_gmp	asm	5963632
7.10.1	quick	llvm_3_4	integer_gmp	asm	5963632
7.10.1	quick	llvm_3_5	integer_gmp	asm	5963632

Compressed Executable Size (bytes)

Version	Build Flavour	LLVM	Integer Library	Backend	Size
7.8.4	perf_llvm	llvm_3_4	integer_simple	llvm	3022828
7.8.4	perf_llvm	llvm_3_4	integer_gmp	llvm	3022228
7.8.4	perf_llvm	llvm_3_4	integer_simple	asm	2924580
7.8.4	perf_llvm	llvm_3_4	integer_gmp	asm	2924084
7.8.4	quick	llvm_3_4	integer_gmp	llvm	2526344
7.8.4	quick	llvm_3_4	integer_simple	llvm	2523524
7.8.4	quick	llvm_3_4	integer_gmp	asm	2415588
7.8.4	quick	llvm_3_5	integer_gmp	asm	2415588
7.8.4	quick	llvm_3_4	integer_simple	asm	2412936
7.8.4	quick	llvm_3_5	integer_simple	asm	2412936
7.8.4	perf	llvm_3_4	integer_simple	llvm	1516816
7.8.4	perf	llvm_3_4	integer_gmp	llvm	1513672
7.8.4	perf	llvm_3_4	integer_simple	asm	1412060
7.8.4	perf	llvm_3_5	integer_simple	asm	1412060
7.8.4	perf	llvm_3_4	integer_gmp	asm	1409684
7.8.4	perf	llvm_3_5	integer_gmp	asm	1409684
7.8.4	perf_llvm	llvm_3_5	integer_simple	llvm	1339448
7.10.1	perf	llvm_3_5	integer_gmp	llvm	1339192
7.8.4	perf_llvm	llvm_3_5	integer_gmp	llvm	1339192
7.10.1	perf_llvm	llvm_3_5	integer_gmp	llvm	1339192
7.10.1	quick	llvm_3_5	integer_gmp	llvm	1339192
7.10.1	perf	llvm_3_4	integer_gmp	llvm	1338580
7.10.1	perf_llvm	llvm_3_4	integer_gmp	llvm	1338572
7.10.1	quick	llvm_3_4	integer_gmp	llvm	1338572
7.10.1	perf_llvm	llvm_3_4	integer_simple	llvm	1338540
7.8.4	perf_llvm	llvm_3_5	integer_simple	asm	1224440
7.10.1	perf_llvm	llvm_3_4	integer_simple	asm	1224440
7.10.1	perf	llvm_3_4	integer_gmp	asm	1224368
7.10.1	perf	llvm_3_5	integer_gmp	asm	1224368
7.8.4	perf_llvm	llvm_3_5	integer_gmp	asm	1224368
7.10.1	perf_llvm	llvm_3_4	integer_gmp	asm	1224368
7.10.1	perf_llvm	llvm_3_5	integer_gmp	asm	1224368
7.10.1	quick	llvm_3_4	integer_gmp	asm	1224368
7.10.1	quick	llvm_3_5	integer_gmp	asm	1224368

GHC Compiler Size

Version	Build Flavour	LLVM	Integer Library	Size
7.8.4	quick	llvm_3_4	integer_gmp	272M
7.8.4	quick	llvm_3_5	integer_gmp	272M
7.8.4	quick	llvm_3_4	integer_simple	273M
7.8.4	quick	llvm_3_5	integer_simple	273M
7.10.1	quick	llvm_3_4	integer_simple	332M
7.10.1	quick	llvm_3_5	integer_simple	332M
7.8.4	perf_llvm	llvm_3_4	integer_gmp	912M
7.8.4	perf_llvm	llvm_3_4	integer_simple	913M
7.8.4	perf	llvm_3_4	integer_gmp	927M
7.8.4	perf	llvm_3_5	integer_gmp	927M
7.8.4	perf	llvm_3_4	integer_simple	928M
7.8.4	perf	llvm_3_5	integer_simple	928M
7.10.1	perf	llvm_3_4	integer_simple	1.1G
7.10.1	perf	llvm_3_5	integer_simple	1.1G
7.10.1	perf_llvm	llvm_3_5	integer_simple	1.1G
