Many common programming languages today eschew manual memory
management in favor of garbage collection. While manual management
certainly has its place, I believe the majority of application
development today occurs in garbage collected languages, and for
good reason. Garbage collection moves responsibility for many
memory issues from the developer into the language's runtime,
mostly removing the possibility of entire classes of bugs while
reducing cognitive overhead. In other words, it separates out a
concern. That's not to say that garbage collection is perfect or
appropriate in all cases, but in many common cases it greatly
simplifies code.
Languages like Haskell, Erlang, and Go provide a similar
separation of concerns in the form of green threads. Instead
of requiring the developer to manually deal with asynchronous (or
non-blocking) I/O calls, the runtime system takes responsibility
for this. Like garbage collection, it's not appropriate for all use
cases, but for many common cases it's a great fit and reduces code
complexity. This post will discuss what green threads are, the
problems they solve, some cases where they aren't the best fit, and
how to get started with green thread based concurrency easily.
If you want to jump to that last point, you can download the Stack build
tool and start with the async library
tutorial.
Blocking and non-blocking I/O
Suppose you're writing a web server. A naive approach would be
to write something like the following pseudo-code:
function webServer(port) {
  var listenSocket = bindSocket(port);
  while(true) {
    var socket = accept(listenSocket);
    forkThread(handleConnection(socket));
  }
}

function handleConnection(socket) {
  while(true) {
    var bytes = read(socket);
    var request = parseRequest(bytes);
    var response = getResponse(request);
    write(socket, renderResponse(response));
  }
}
The read and write calls appear to perform blocking I/O, which
means that the entire system thread running them will be blocked
on the kernel until data is available. Depending on what our
forkThread call does, this could mean one of two things:
- If forkThread forks a new system thread, then performing
blocking I/O isn't a problem: that thread has nothing to do until
read and write complete. However, forking a new system thread for
each connection is expensive, and does not scale well to hundreds
of thousands of concurrent requests.
- If forkThread forks a new thread within your runtime (sometimes
called a fiber), then multiple fibers will all be running on a
single system thread, and each time you make a blocking I/O call,
the entire thread will be blocked, preventing any progress on
other connections. You've essentially reduced your application to
handling one connection at a time.
Neither of these approaches is very attractive for writing
robust concurrent applications (though the former is certainly
better than the latter). Another approach is to use non-blocking
I/O. In this case, instead of making a call to read or write
which blocks until data is available, you make a call and provide
a callback function or continuation to handle what to do with the
result data. Let's see what our web server above will look like:
function webServer(port) {
  var listenSocket = bindSocket(port);
  listenLoop(listenSocket);
}

function listenLoop(listenSocket) {
  acceptAsync(listenSocket, function(socket) {
    handleConnection(socket);
    listenLoop(listenSocket);
  });
}

function handleConnection(socket) {
  readAsync(socket, function(bytes) {
    var request = parseRequest(bytes);
    var response = getResponse(request);
    writeAsync(socket, renderResponse(response), function() {
      handleConnection(socket);
    });
  });
}
Let's note some differences:
- We're no longer performing any forking. All actions occur in a
single thread, removing the possibility of overhead from spawning
a system thread (or even a fiber).
- Instead of capturing the output of read in a variable bytes, we
provide readAsync a callback function, and that callback function
is provided the bytes value when available. We sometimes call
these callbacks continuations, since they tell us how to continue
processing from where we left off.
- The loops are gone. Instead, our callbacks recursively call
functions to create the necessary infinite looping, while allowing
for proper asynchronous behavior.
This approach solves the problems listed above with blocking
I/O: no performance overhead of spawning threads or fibers, and
multiple requests can be processed concurrently without being
blocked by each other's I/O calls.
The downsides of non-blocking I/O
Unfortunately, this new style does not get away scot-free.
- Subjectively, the callback-style of coding is not as elegant as
the blocking style. There are workarounds for this with techniques
like promises.
- Error/exception handling isn't as straightforward in the
callback setup as with the blocking code. It's certainly a
solvable problem, but often involves techniques like passing in
an extra callback function for the error case. Many languages out
there today use runtime exceptions, and they don't translate too
well to callbacks (see the sketch after this list).
- Our code is limited to running on one CPU core, at least in the
simplest case. You can work around this with techniques like
prefork, but it's not automatically handled by the callback
approach. If our goal is to maximize requests per second, using
every available CPU core for processing is definitely desired.
- It's still possible for handling of multiple requests to cause
blockage for each other. For example, if our parseRequest or
renderResponse functions perform any blocking I/O, or use a
significant amount of CPU time, other requests will need to wait
until that processing finishes before they can resume their
processing.
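To make the error-handling point concrete, here's a minimal,
self-contained Haskell sketch of the callback style. The names
here (readAsync, the fake socket ID) are hypothetical, invented
purely for illustration: each operation takes a success
continuation and an error continuation instead of returning a
value or throwing.

import System.IO.Error (userError)

-- A fake asynchronous read: calls the success continuation for a
-- positive "socket ID", and the error continuation otherwise.
readAsync :: Int -> (String -> IO ()) -> (IOError -> IO ()) -> IO ()
readAsync sockId onSuccess onError
  | sockId > 0 = onSuccess ("bytes from socket " ++ show sockId)
  | otherwise  = onError (userError "connection reset")

main :: IO ()
main = do
  -- Every call site must thread both continuations through by
  -- hand, rather than relying on the language's exceptions.
  readAsync 1    putStrLn (\e -> putStrLn ("error: " ++ show e))
  readAsync (-1) putStrLn (\e -> putStrLn ("error: " ++ show e))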
For those interested, we had a previous blog post on
Concurrency and Node which delved more deeply into these
points.
Making non-blocking a runtime system concern
Let's deliver on our promise from the introduction: turning
non-blocking I/O into a runtime system concern. The theory behind
this is:
- Blocking I/O calls are conceptually easier to think about and
work with
- While spawning lightweight threads/fibers does entail some
overhead, it's lightweight enough to be generally acceptable
- If we can limit the amount of CPU time spent in running each
fiber, we won't need to worry about high-CPU processing blocking
other requests
- The runtime system can handle the scheduling of lightweight
threads onto separate CPU cores (via separate system threads),
which will result in the ability to saturate the entire CPU without
techniques like prefork
That may sound like a lot to deliver on, but green threads are
up to the challenge. They are very similar to the fibers that we
described above, with one major difference: seemingly blocking
I/O calls actually use non-blocking I/O under the surface.
Let's see how this would work with our web server example from
above (copied here for convenience):
function webServer(port) {
  var listenSocket = bindSocket(port);
  while(true) {
    var socket = accept(listenSocket);
    forkThread(handleConnection(socket));
  }
}

function handleConnection(socket) {
  while(true) {
    var bytes = read(socket);
    var request = parseRequest(bytes);
    var response = getResponse(request);
    write(socket, renderResponse(response));
  }
}
Starting from after the bindSocket call:
- Our main green thread calls accept. The runtime system
essentially rewrites this to an acceptAsync call like our
callback version, puts the main green thread to sleep, and has
the runtime system's event loop schedule a wakeup when new data
is available on the listenSocket.
- When a new connection is available, the runtime system wakes up
the main green thread with the new socket value filled in. Our
thread then forks a new green thread (let's call it worker 1)
running the handleConnection call.
- Inside worker 1, our read call is similarly rewritten to
readAsync with a callback, and the runtime system puts worker 1
to sleep and schedules a wakeup when data is available on socket.
- The runtime system then goes back to the list of green threads
that have work to do and finds the main thread. The main thread
continues from the forkThread call, and iterates on the while
loop. It arrives back at the accept call and, like before, the
runtime system puts the main thread to sleep and schedules a
wakeup when there's a connection available on listenSocket.
- Importantly, at this point, the entire application is able to
simply wait for some data. We're waiting for the operating system
to tell us that either listenSocket or socket has data available.
When the operating system tells us (via a system call like epoll
or select) that data is available, the runtime system can wake
the relevant thread up and do some more work until the next I/O
call puts it to sleep.
- Since the runtime system is handling the scheduling of threads,
it is free to determine that a thread has been active for too
long and pause execution in favor of a different thread, solving
the long CPU processing problem mentioned above. (This is also
the difference between cooperative and preemptive multithreading;
see the sketch after this list.)
- Instead of needing a separate error handling mechanism for
callbacks, the runtime system can reuse existing exception
throwing mechanisms from the language. For example, if an error
occurs during the read call, the runtime system can throw an
exception from that point in the code.
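Here's a small, self-contained GHC sketch of that preemptive
behavior (my own illustration, not from any particular library):
one green thread grinds away at a naive Fibonacci computation
while another keeps printing, and the runtime's scheduler keeps
the second thread responsive. One caveat worth flagging: GHC
preempts at allocation points, so a tight loop optimized into
allocation-free code may never yield; the naive fib below does
allocate when run unoptimized.

import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)

-- Deliberately slow, naive Fibonacci to hog a core.
fib :: Integer -> Integer
fib n | n < 2     = 1
      | otherwise = fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  -- A CPU-bound green thread...
  _ <- forkIO (print (fib 35))
  -- ...does not starve this one: the runtime preempts it.
  forever $ do
    putStrLn "still responsive"
    threadDelay 500000  -- half a second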
I'd argue that this green thread approach gives us, for the most
part, the best of both worlds: the high-level, easy to read and
write, robust code that comes from using threads, with the high
performance of the non-blocking callback code.
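To tie this back to the pseudo-code, here's roughly what the
blocking-style server looks like as a self-contained Haskell
sketch, assuming the network package; the echoing
handleConnection stands in for the parse/respond logic, and error
handling and socket cleanup are elided. The recv and sendAll
calls look blocking, but GHC's I/O manager performs non-blocking
I/O underneath, exactly as the walkthrough above describes.

import Control.Concurrent (forkIO)
import Control.Monad (forever, unless)
import qualified Data.ByteString as B
import Network.Socket
import Network.Socket.ByteString (recv, sendAll)

main :: IO ()
main = do
  -- bindSocket(port) from the pseudo-code:
  addr:_ <- getAddrInfo (Just defaultHints { aiFlags = [AI_PASSIVE]
                                           , aiSocketType = Stream })
                        Nothing (Just "4500")
  listenSocket <- socket (addrFamily addr) Stream defaultProtocol
  bind listenSocket (addrAddress addr)
  listen listenSocket 10
  -- accept in a loop, forking a cheap green thread per connection:
  forever $ do
    (sock, _peer) <- accept listenSocket
    forkIO (handleConnection sock)

handleConnection :: Socket -> IO ()
handleConnection sock = do
  bytes <- recv sock 4096  -- looks blocking; the OS thread is not
  unless (B.null bytes) $ do
    sendAll sock bytes     -- echo the bytes back
    handleConnection sock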
Downsides of green threads
Like garbage collection, there are downsides to green threads as
well. While not a comprehensive list, here are some such downsides
I'm aware of:
- By passing control of scheduling to the runtime system, we lose
control of when exactly a context switch will occur, which
can result in performance degradation. (Note that this applies
only relative to the async approach; the other thread-based
approaches have the same context switch issue.) In my experience,
this is rarely a problem, though certain performance-sensitive
code bases may be affected by it. In a distributed computation
project at FP Complete, for example, we ultimately went the route
of creating our own event loop.
- As mentioned, spawning a green thread is cheap, but not free.
An event loop can bypass this overhead. Again, with most green
thread systems, the overhead is small enough so as not to be
prohibitive, but if the highest performance possible is your goal,
green threads may ultimately be your bottleneck.
As you can see, like garbage collection, the main downside is
that for specific performance cases, green threads may be an
impediment. But also like garbage collection, there is a wide array
of cases where the gains in code simplicity and lower bug rates
more than pay for the slight performance overhead.
Experiment with green threads
Let's go ahead and get started with a short example right now.
The only tools you'll need are the Stack build tool and a text
editor. If you're on a Mac or Linux system, you can get Stack by
running curl -sSL https://get.haskellstack.org/ | sh. On Windows,
you probably want the 64-bit installer.
Once you have Stack installed, save the following code into a
file called echo.hs:
#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
import           Data.Conduit
import           Data.Conduit.Network
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
    putStrLn "OK, I'm running!"
    runTCPServer settings app
  where
    -- listen on port 4500, on all network interfaces
    settings = serverSettings 4500 "*"
    app appData = runConduit (appSource appData .| echo .| appSink appData)
    -- echo each chunk back, then a deliberately slow Fibonacci of its length
    echo = awaitForever (\bytes -> do
        yield bytes
        yield (B8.pack (show (fib (B8.length bytes)) ++ "\n")))

fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
Now you can run this program with stack echo.hs. The first run
will take a bit of time as it downloads a compiler and a number
of libraries. This will only happen on the first run; subsequent
runs will reuse the previously downloaded and installed tools and
libraries. Once you've got this running, you can connect to it
and play around:
$ telnet localhost 4500
Go ahead and try it out with some short lines, and confirm that
it responds. For example, here's a sample session:
$ telnet localhost 4500
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Hello world!
Hello world!
610
Bye!
Bye!
13
Now, to prove our claim that the system remains responsive in the
presence of high CPU usage: try entering a long string, which
will require a lot of CPU time to calculate the Fibonacci value,
e.g.:
This is a fairly long string which will take a bit of time unless you have a supercomputer.
This is a fairly long string which will take a bit of time unless you have a supercomputer.
As you might expect, further interactions on this connection
will have no effect as it is computing its response. But go ahead
and open up a new telnet session in a different terminal. You
should be able to continue interacting with the echo server and get
results, thanks to the scheduling behavior of the runtime system.
Notice how we get this behavior without any explicit work on our
part to break up the expensive CPU operation into smaller bits!
EDIT As pointed out on lobste.rs, the above will not scale to
multiple cores, since GHC's interpreter mode only uses a single
core. In order to see this take advantage of additional CPU cores
for additional requests, first compile the executable with:
stack ghc --resolver lts-7.14 --install-ghc --package conduit-extra -- --make -threaded echo.hs
And then run it with:
./echo +RTS -N4
The -N4 tells the GHC runtime to use 4 cores.
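If you'd like to confirm how many cores the runtime was actually
given, the base library exposes this directly; here's a tiny
standalone check:

import Control.Concurrent (getNumCapabilities)

main :: IO ()
main = do
  -- reports the -N value the RTS was started with
  n <- getNumCapabilities
  putStrLn ("Runtime is using " ++ show n ++ " capabilities")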
Learn more
There are great libraries in Haskell to take advantage of green
threads for easy and beautiful concurrency. My all-time favorite
is the async package. Paired with powerful abstractions like
Software Transactional Memory, you can quickly whip up
high-performance, concurrent network services while avoiding
common pitfalls like deadlocks and race conditions.
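As a taste of what that looks like, here's a small sketch of my
own combining async with STM: two pretend network tasks run on
their own green threads, record their completion in a shared
TVar, and concurrently collects both results (cancelling the
other task and rethrowing if either fails).

import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (concurrently)
import Control.Concurrent.STM

main :: IO ()
main = do
  counter <- newTVarIO (0 :: Int)
  -- a pretend network task: wait, then record completion in STM
  let task label delay = do
        threadDelay delay
        atomically (modifyTVar' counter (+ 1))
        pure label
  (a, b) <- concurrently (task "first" 100000) (task "second" 200000)
  done <- readTVarIO counter
  putStrLn (show (a, b) ++ "; tasks completed: " ++ show done)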
If you or your team are interested in learning more about
functional programming and Haskell, check out our training
options and our Haskell syllabus. If you'd like to learn about
how FP Complete can help you with your server application needs,
contact us about our consulting.