Many common programming languages today eschew manual memory management in favor of garbage collection. While manual management certainly has its place in some use cases, I believe the majority of application development today occurs in garbage collected languages, and for good reason. Garbage collection moves responsibility for many memory issues from the developer into the language's runtime, mostly eliminating entire classes of bugs while reducing cognitive overhead. In other words, it separates out a concern. That's not to say that garbage collection is perfect or appropriate in all cases, but in many common cases it greatly simplifies code.

Languages like Haskell, Erlang, and Go provide a similar separation of concerns in the form of green threads. Instead of requiring the developer to manually deal with asynchronous (or non-blocking) I/O calls, the runtime system takes responsibility for this. Like garbage collection, it's not appropriate for all use cases, but for many common ones it's a great fit and reduces code complexity. This post will discuss what green threads are, the problems they solve, some cases where they aren't the best fit, and how to get started with green-thread-based concurrency easily.

If you want to jump to that last point, you can download the Stack build tool and start with the async library tutorial.

Blocking and non-blocking I/O

Suppose you're writing a web server. A naive approach would be to write something like the following pseudo-code:

function webServer(port) {
  var listenSocket = bindSocket(port);
  while(true) {
    var socket = accept(listenSocket);
    forkThread(handleConnection(socket));
  }
}

function handleConnection(socket) {
  while(true) {
    var bytes = read(socket);
    var request = parseRequest(bytes);
    var response = getResponse(request);
    write(socket, renderResponse(response));
  }
}

The read and write calls appear to perform blocking I/O, which means that the entire system thread running them will be blocked on the kernel until data is available. Depending on what our forkThread call did, this could mean one of two things:

  - If forkThread spawns a new operating system thread for each connection, every connection costs us a full OS thread, with its attendant memory footprint (a complete stack) and kernel scheduling overhead. This limits how many concurrent connections we can realistically handle.
  - If forkThread instead creates a lightweight thread (a fiber) and multiplexes many fibers onto a single OS thread, then whenever one connection blocks on read or write, every other fiber on that thread is stuck waiting behind it.

Neither of these approaches is very attractive for writing robust concurrent applications (though the former is certainly better than the latter). Another approach is to use non-blocking I/O. In this case, instead of making a call to read or write that blocks until data is available, you make the call and provide a callback function or continuation that specifies what to do with the result data. Let's see what our web server looks like in this style:

function webServer(port) {
  var listenSocket = bindSocket(port);
  listenLoop(listenSocket);
}

function listenLoop(listenSocket) {
  acceptAsync(listenSocket, function(socket) {
    handleConnection(socket);
    listenLoop(listenSocket);
  });
}

function handleConnection(socket) {
  readAsync(socket, function(bytes) {
    var request = parseRequest(bytes);
    var response = getResponse(request);
    writeAsync(socket, renderResponse(response), function() {
      handleConnection(socket);
    });
  });
}

Let's note some differences:

  - The while(true) loops are gone. Instead, each function arranges for its own next iteration: listenLoop recursively calls itself from inside its callback, and handleConnection does the same.
  - Results no longer arrive as return values from blocking calls; they are delivered as arguments to callback functions.
  - There's no forkThread call anymore. Concurrency comes from the event loop invoking callbacks as I/O completes, all on a single thread.

This approach solves the problems with blocking I/O listed above: no performance overhead of spawning threads or fibers, and multiple requests can be processed concurrently without being blocked by each other's I/O calls.

The downsides of non-blocking I/O

Unfortunately, this new style does not get away scot-free:

  - The code is considerably harder to read and write: straightforward loops have been turned inside out into chains of recursive callbacks (often dubbed "callback hell").
  - Error handling becomes more complicated. An exception thrown inside a callback can't propagate to the code that initiated the I/O, so errors have to be threaded explicitly through yet more callbacks.
  - A long-running CPU computation inside any callback blocks the entire event loop, stalling every other connection until it completes.
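
To make the error-handling point concrete, here's a sketch in Haskell of what a callback-based API might force on us. readAsync and writeAsync are invented stand-ins for a hypothetical event-loop library, not a real API:

import Control.Exception (IOException)
import Data.ByteString (ByteString)

data Socket -- placeholder; a real library would provide this

-- Hypothetical primitives: each takes a callback receiving an Either,
-- because exceptions can't propagate across the event loop.
readAsync :: Socket -> (Either IOException ByteString -> IO ()) -> IO ()
readAsync = error "hypothetical"

writeAsync :: Socket -> ByteString -> (Either IOException () -> IO ()) -> IO ()
writeAsync = error "hypothetical"

handleConnection :: Socket -> IO ()
handleConnection socket =
    readAsync socket $ \result -> case result of
        Left err    -> putStrLn ("read failed: " ++ show err)   -- manual plumbing...
        Right bytes ->
            writeAsync socket bytes $ \result' -> case result' of
                Left err -> putStrLn ("write failed: " ++ show err) -- ...at every step
                Right () -> handleConnection socket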

For those interested, we had a previous blog post on Concurrency and Node which delved more deeply into these points.

Making non-blocking a runtime system concern

Let's deliver on our promise from the introduction: turning non-blocking I/O into a runtime system concern. The theory behind this is:

  - The developer writes simple blocking-style code, like our first web server example.
  - The runtime system converts the seemingly blocking calls into non-blocking ones under the surface, putting lightweight threads to sleep and waking them when their data is ready.
  - We keep the readability and robustness of the threaded code while getting the performance characteristics of the callback code.

That may sound like a lot to deliver on, but green threads are up to the challenge. They are very similar to the fibers that we described above, with one major difference: seemingly blocking I/O calls actually use non-blocking I/O under the surface. Let's see how this would work with our web server example from above (copied here for convenience):

function webServer(port) {
  var listenSocket = bindSocket(port);
  while(true) {
    var socket = accept(listenSocket);
    forkThread(handleConnection(socket));
  }
}

function handleConnection(socket) {
  while(true) {
    var bytes = read(socket);
    var request = parseRequest(bytes);
    var response = getResponse(request);
    write(socket, renderResponse(response));
  }
}

Starting from after the bindSocket call:

  1. Our main green thread calls accept. The runtime system essentially rewrites this to an acceptAsync call like in our callback version, puts the main green thread to sleep, and has the runtime system's event loop schedule a wakeup when a new connection is available on the listenSocket.
  2. When a new connection is available, the runtime system wakes up the main green thread with the new socket value filled in. Our thread then forks a new green thread (let's call it worker 1) running the handleConnection call.
  3. Inside worker 1, our read call is similarly rewritten to readAsync with a callback, and the runtime system puts worker 1 to sleep and schedules a wakeup when data is available on socket.
  4. The runtime system then goes back to the list of green threads that have work to do and finds the main thread. The main thread continues from the forkThread call, and iterates on the while loop. It arrives back at the accept call and, like before, the runtime system puts the main thread to sleep and schedules a wakeup when there's a connection available on listenSocket.
  5. Importantly, at this point, the entire application is able to simply wait for some data. We're waiting for the operating system to tell us that either listenSocket or socket has data available. When the operating system tells us (via a system call like epoll or select) that data is available, the runtime system can wake the relevant thread up and let it do some more work, until the next I/O call puts it to sleep.
  6. Since the runtime system is handling the scheduling of threads, it is free to determine that a thread has been active for too long and pause execution in favor of a different thread, solving the long CPU processing problem mentioned above. (This is also the difference between cooperative and preemptive multithreading.)
  7. Instead of needing a separate error handling mechanism for callbacks, the runtime system can reuse existing exception throwing mechanisms from the language. For example, if an error occurs during the read call, the runtime system can throw an exception from that point in the code.

I'd argue that this green thread approach gives us, for the most part, the best of both worlds: the high-level, easy-to-read-and-write, robust code that comes from using threads, with the high performance of the non-blocking callback code.
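
This is exactly what GHC Haskell does: forkIO creates a green thread, and the blocking-style functions from the network package are multiplexed over non-blocking I/O by the runtime's I/O manager. Here's a minimal sketch of the pseudo-code above as a real (echo-only) server; the echo logic stands in for the parseRequest/getResponse steps, which the pseudo-code leaves abstract:

import Control.Concurrent (forkIO)
import Control.Monad (forever)
import Network.Socket
import Network.Socket.ByteString (recv, sendAll)
import qualified Data.ByteString as B

main :: IO ()
main = do
    -- Bind a listening socket on port 4500
    addr:_ <- getAddrInfo (Just defaultHints { addrFlags = [AI_PASSIVE] })
                          Nothing (Just "4500")
    listenSocket <- socket (addrFamily addr) Stream defaultProtocol
    bind listenSocket (addrAddress addr)
    listen listenSocket 10
    forever $ do
        -- Looks blocking, but the runtime parks this green thread and
        -- wakes it when a connection arrives
        (sock, _peer) <- accept listenSocket
        -- One green thread per connection: cheap enough to do per client
        forkIO (handleConnection sock)

handleConnection :: Socket -> IO ()
handleConnection sock = do
    bytes <- recv sock 4096  -- the green thread sleeps here, not an OS thread
    if B.null bytes
        then close sock      -- client disconnected
        else do
            sendAll sock bytes
            handleConnection sock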

Downsides of green threads

Like garbage collection, there are downsides to green threads as well. While not a comprehensive list, here are some such downsides I'm aware of:

  - There is still some overhead relative to hand-tuned callback code: each green thread needs its own (small, growable) stack, and the runtime's scheduler itself consumes cycles.
  - You give up fine-grained control over scheduling, which can be an impediment in the most latency-sensitive applications.
  - Calls that escape the runtime's control, such as blocking foreign function calls, can still tie up an entire OS thread.

As you can see, like garbage collection, the main downside is that for specific performance cases, green threads may be an impediment. But also like garbage collection, there is a wide array of cases where the gains in code simplicity and lower bug rates more than pay for the slight performance overhead.

Experiment with green threads

Let's go ahead and get started with a short example right now. The only tools you'll need are the Stack build tool and a text editor. If you're on a Mac or Linux system, you can get Stack by running curl -sSL https://get.haskellstack.org/ | sh. On Windows, you probably want the 64-bit installer.

Once you have Stack installed, save the following code into a file called echo.hs:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
import Data.Conduit
import Data.Conduit.Network
import qualified Data.ByteString.Char8 as B8

main = do
    putStrLn "OK, I'm running!"

    -- This automatically binds a new listening socket and forks a new
    -- green thread for each incoming connection.
    runTCPServer settings app
  where
    -- Listen on all interfaces on port 4500
    settings = serverSettings 4500 "*"

-- Create a simple pipeline connecting the input from the network
-- (appSource) to our echo program to the output to the network
-- (appSink).
app appData = runConduit (appSource appData .| echo .| appSink appData)

-- awaitForever keeps running the inner function as long as new data
-- is available from the client
echo = awaitForever (\bytes -> do
    -- Echo back the raw message we received
    yield bytes
    -- And now send back the Fibonacci value at the length of the
    -- input. We need to use B8.pack to convert our String into a
    -- binary format we can send over the network.
    yield (B8.pack (show (fib (B8.length bytes)) ++ "\n")))

-- Written badly for high CPU usage!
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

Now you can run this program with stack echo.hs. The first run will take a bit of time as it downloads a compiler and a number of libraries. This will only happen on the first run; subsequent runs will reuse the previously downloaded and installed tools and libraries. Once you've got it running, you can connect and play with it:

$ telnet localhost 4500

Go ahead and send it some short lines, and confirm that it responds. For example, here's a sample session:

$ telnet localhost 4500
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Hello world!
Hello world!
610
Bye!
Bye!
13
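
(In case the numbers look surprising: telnet sends each line with a trailing carriage return and newline, so "Hello world!" arrives as 14 bytes, and with the fib definition in echo.hs, fib 14 = 610. Likewise "Bye!" arrives as 6 bytes, and fib 6 = 13.)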

Now, to prove our claim that the system remains responsive under heavy CPU load, try entering a long string, which will require a lot of CPU time to calculate the Fibonacci value, e.g.:

This is a fairly long string which will take a bit of time unless you have a supercomputer.
This is a fairly long string which will take a bit of time unless you have a supercomputer.

As you might expect, further interactions on this connection will have no effect as it is computing its response. But go ahead and open up a new telnet session in a different terminal. You should be able to continue interacting with the echo server and get results, thanks to the scheduling behavior of the runtime system. Notice how we get this behavior without any explicit work on our part to break up the expensive CPU operation into smaller bits!

EDIT As pointed out on lobste.rs, the above will not scale to multiple cores, since GHC's interpreter mode only uses a single core. To see it take advantage of additional CPU cores for additional requests, first compile the executable with:

stack ghc --resolver lts-7.14 --install-ghc --package conduit-extra -- --make -threaded echo.hs

And then run it with:

./echo +RTS -N4

The -N4 tells the GHC runtime to use 4 cores. (Passing -N with no number tells it to use all the cores the machine has.)

Learn more

There are great libraries in Haskell that take advantage of green threads for easy and beautiful concurrency. My all-time favorite is the async package. Paired with powerful abstractions like Software Transactional Memory, you can quickly whip up high-performance, concurrent network services while avoiding common pitfalls like deadlocks and race conditions.
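
As a small taste, here's a self-contained sketch (my own example, not from the async tutorial) using both: two green threads hammer a shared counter, concurrently waits for both, and STM guarantees the updates never race:

import Control.Concurrent.Async (concurrently)
import Control.Concurrent.STM
import Control.Monad (replicateM_)

main :: IO ()
main = do
    counter <- newTVarIO (0 :: Int)
    let increment = atomically (modifyTVar' counter (+ 1))
    -- Run both loops on separate green threads; concurrently waits for
    -- both to finish and rethrows an exception if either thread fails
    _ <- concurrently (replicateM_ 1000 increment)
                      (replicateM_ 1000 increment)
    -- Always prints 2000: atomically makes each update indivisible
    readTVarIO counter >>= print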

If you or your team are interested in learning more about functional programming and Haskell, check out our training options and our Haskell syllabus. If you'd like to learn how FP Complete can help with your server application needs, contact us about our consulting.
