Example code can be found here.
When Node.JS first came onto the scene, it successfully
popularized the event-loop. Ryan Dahl correctly identified a
serious problem with the way that I/O is generally handled in
concurrent environments. Many web servers, for example, achieve
concurrency by creating a new thread for every connection. On most
platforms, this comes at a substantial cost. The default stack size
in Java is 512KB, which means that if you have 1000 concurrent
connections, your program will consume half a gigabyte of memory
just for stack space. Additionally, forking threads on most systems
costs an enormous amount of time, as does performing a context
switch between two threads.
To address these issues, Node.JS uses a single thread with an
event-loop. In this way, Node can handle thousands of concurrent
connections without any of the traditional detriments associated
with threads. There is essentially no memory overhead per
connection, and there is no context switching.
When compared to traditional threading architectures, this is
exactly the right approach to handling concurrency. In fact,
Erlang's OTP libraries function in a very similar way. Each actor
has a queue of messages, and only processes a single message at a
time. Only after fully processing one message does it move on to
the next.
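To make the analogy concrete, here is a small sketch (not part of the example project) of that actor-style discipline in Haskell: a single worker thread pulls messages off a Chan, its mailbox, and fully processes one message before looking at the next.

import Control.Concurrent (forkIO, newChan, readChan, threadDelay, writeChan)
import Control.Monad (forever)

main :: IO ()
main = do
  mailbox <- newChan
  -- The "actor": a single thread that fully processes one message
  -- before moving on to the next.
  _ <- forkIO $ forever $ do
    msg <- readChan mailbox
    putStrLn ("processing: " ++ msg)
  writeChan mailbox "first message"
  writeChan mailbox "second message"
  threadDelay 100000  -- give the worker a moment before the program exits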
However, this programming model imposes some extreme costs when
it comes to the legibility of code. Consider the following two
examples.
var request = require('request');

request('https://example.com/random-number', function (error, response, body) {
  if (!error && response.statusCode === 200) {
    console.log(JSON.parse(body));
  }
});
var request = require('request');

var response = request('https://example.com/random-number');
if (response.statusCode === 200) {
  console.log(JSON.parse(response.body));
}
In the first example we see what an HTTP request would typically
look like in Node. In the second we see what it could be like if
Node had threads.
As a direct result of the event-loop, we have lost the ability
to express a program as a linear series of steps. This becomes
even more important when we need to make many I/O calls in the
same function.
request('https://example.com/random-number', function (error, response1, body) {
  request('https://example.com/random-number', function (error, response2, body) {
    request('https://example.com/random-number', function (error, response3, body) {
      ...
    });
  });
});
versus:
var response1 = request('https://example.com/random-number');
var response2 = request('https://example.com/random-number');
var response3 = request('https://example.com/random-number');
This leads to the 'callback hell' that we all know and hate.
Part of the bill-of-goods we accept when using Node is that in
exchange for better time and space characteristics, we lose the
thread as an abstraction.
What is the value of a thread?
It is important to keep in mind that a thread isn't just a
primitive for dealing with concurrency. It is also a powerful
abstraction to make the existence of latency invisible to the
developer.
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import Network.HTTP.Simple
import Network.HTTP.Types.Status

main :: IO ()
main = do
  res <- httpJSON "https://httpbin.org/get"
  if getResponseStatus res == status200
    then print (getResponseBody res :: Value)
    else error $ show $ getResponseStatus res
Looking at the code above, we notice a characteristic difference
from the asynchronous JavaScript examples above: there are no
callbacks. When we say res <- httpJSON ..., the program will halt
until we get a response back from the server. We have written a
piece of code that is linear, from step to step, and the underlying
system knows that the server on the other end will take time to
respond and handles it accordingly.
This is possible because a thread keeps track of the state
belonging to an ongoing computation. By tracking that state, a
thread can halt and resume a computation arbitrarily. In the event
of I/O, a thread will halt a computation while the I/O is
occurring, and only resume it when a response comes back, giving
your program the appearance of having no delay.
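To tie this back to the nested-callback example from earlier, here is a hedged sketch (using the same http-conduit API as the example above, and reusing the placeholder URL) of three requests written as a plain linear series of steps. Each <- blocks only the current Haskell thread; the run-time system takes care of the waiting.

{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value)
import Network.HTTP.Simple (getResponseBody, httpJSON)

main :: IO ()
main = do
  -- Each request halts this thread until its response arrives; no callbacks.
  res1 <- httpJSON "https://example.com/random-number"
  res2 <- httpJSON "https://example.com/random-number"
  res3 <- httpJSON "https://example.com/random-number"
  print (getResponseBody res1 :: Value)
  print (getResponseBody res2 :: Value)
  print (getResponseBody res3 :: Value)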
A thread, therefore, is both a way of running multiple
computations concurrently and a way of accounting for the
asynchronous nature of I/O. By not exposing this abstraction to the
developer, Node suffers from two huge drawbacks as a
concurrency-oriented platform.
In Node.JS concurrency can only occur when adjacent to I/O
Node.JS only exposes one primary thread to a program. While it
may spawn new threads under the hood to deal with input buffers,
you don't have any control over them. As a result, you cannot write
a program that takes advantage of multiple cores in a machine, and
your program will only be able to perform two actions concurrently
if at least one of them is bound by I/O.
To demonstrate, we have set up an example that you can download
alongside this article; it is linked at the top of the article.
Once you have downloaded the example, look in the README.md for the
"Local vs. Remote Test" instructions.
We start by defining a web server that supports a slow
operation: in our case, generating Fibonacci numbers by the most
naive implementation. Then we define two tests in Node.JS and two
in Haskell, using each language's native async libraries to attempt
to acquire multiple Fibonacci numbers concurrently. In each
language, one test times how long it takes to concurrently request
two Fibonacci numbers from our web server, and the other times how
long it takes to do the same operation locally. All tests are also
run without the async library for comparison.
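For reference, the "most naive implementation" is presumably the textbook doubly-recursive definition, along the lines of the sketch below (the actual function in the example project may differ in name and details):

-- Deliberately slow: exponential time in n.
getFibs :: Integer -> Integer
getFibs n
  | n < 2     = n
  | otherwise = getFibs (n - 1) + getFibs (n - 2)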
Test Name        | Without Async Time | With Async Time | Async / No Async
Node - Local     | 6.439s             | 6.306s          | 0.979
Node - Remote    | 4.391s             | 2.500s          | 0.569
Haskell - Local  | 3.849s             | 2.487s          | 0.646
Haskell - Remote | 4.117s             | 2.521s          | 0.612
Taking a look at the first row of our table, we find that
Node.JS, when attempting to concurrently compute two Fibonacci
numbers, is unable to give us any time savings, in spite of the
tests being run on a multi-core machine.
return async.parallel([
  (callback) => {
    x = getFibs(43);
    callback();
  },
  (callback) => {
    y = getFibs(44);
    callback();
  }
]).then(() => {
Even when the functions that generate the numbers are run inside
async.parallel, they are unable to run concurrently.
This is because the Node.JS event-loop only allows one callback to
be executed at any given time. So instead of running the two
callbacks defined above in parallel, it runs them sequentially. If
we make I/O calls inside our callbacks, as we do in the Node -
Remote test, the system is able to issue both requests to the web
server concurrently.
return async.parallel([
  (callback) => {
    request('http://localhost:8080/api/fibs/43')
      .then((xStr) => {
        x = JSON.parse(xStr).fibResult;
        callback();
      });
  },
  (callback) => {
    request('http://localhost:8080/api/fibs/44')
      .then((yStr) => {
        y = JSON.parse(yStr).fibResult;
        callback();
      });
  }
]).then(() => {
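On the Haskell side, the equivalent of these tests can be written with the async package, where running two actions at the same time is a single call to concurrently. A hedged sketch of what the local test might look like (the code in the repository may differ in detail):

import Control.Concurrent.Async (concurrently)
import Control.Exception (evaluate)

-- The naive Fibonacci from earlier.
getFibs :: Integer -> Integer
getFibs n = if n < 2 then n else getFibs (n - 1) + getFibs (n - 2)

main :: IO ()
main = do
  -- With -threaded and +RTS -N, these two computations run on separate cores.
  (x, y) <- concurrently (evaluate (getFibs 43)) (evaluate (getFibs 44))
  print (x, y)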
In Node.JS concurrent computations starve each other of resources
This inability to handle multiple computations at the same time
hampers Node.JS even in its intended use case as a web server. The
phenomenon is referred to as "starvation" and we can demonstrate it
by reversing our previous tests.
To run the demonstration code, download the example project
linked at the top of the article. Then look in the README.md for
instructions on the "Starvation Test."
In this demonstration we set up a web server in each language.
Each web server contains two routes. The first route is a very fast
computation: calculating the square of the input number. The second
route is a very slow computation: calculating the Nth Fibonacci
number for the input. We then constructed a program that performs
two tests against each web server.
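As a rough illustration only (the repository may use a different web framework and different route names; /api/fibs and the fibResult field are taken from the Node test code above, while the square route is our guess), such a server could be sketched in Haskell with Scotty:

{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (object, (.=))
import Web.Scotty (get, json, param, scotty)

fib :: Integer -> Integer
fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

main :: IO ()
main = scotty 8080 $ do
  -- Fast route: square the input number.
  get "/api/square/:n" $ do
    n <- param "n"
    json $ object ["squareResult" .= (n * n :: Integer)]
  -- Slow route: naive Fibonacci of the input number.
  get "/api/fibs/:n" $ do
    n <- param "n"
    json $ object ["fibResult" .= fib n]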
In the first test, labeled "Fast Only," the test program counts
the number of requests each server can respond to in 5 seconds,
using only the fast route. In the second test, labeled "Fast with
Slow," the test program makes one request to the slow route, then
counts how many requests to the fast route the server can respond
to in 5 seconds.
To make this demonstration as compelling as possible, we have
also disabled threading in the Haskell run-time system on the
Haskell web server. The result is that the Haskell server will only
be able to use its internal thread model. It will not be able to
create additional operating system threads, and it will not be able
to take advantage of more than one core.
Test Name                | Request Throughput (higher is better)
Node - Fast Only         | 572
Node - Fast with Slow    | 114
Haskell - Fast Only      | 591
Haskell - Fast with Slow | 589
The results are quite striking. From Haskell's baseline of 591,
having a background computation only reduced the throughput by 2
requests, which is likely smaller than our margin of error. Node.JS,
on the other hand, was reduced from its baseline to one-fifth of
its capacity.
The difference here is stark because in Node.JS's execution
model, the moment it receives a request on the slow route it must
fully complete the computation for that route before it can begin
on a new request. There is no I/O action taking place which will
allow the server to jump to processing a new request.
In contrast, Haskell can preempt threads mid-execution. This
means that when the Haskell server is chugging along on the slow
request and a request on the fast route comes in, it can jump over
and process it, then return to the slow request while the network
is delivering the fast request's response back to the client.
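A tiny sketch (not from the example project) of that preemption: fork a thread that does nothing but CPU-bound work, with no I/O in it, and the other thread still keeps getting scheduled, because GHC can interrupt the busy thread at its allocation points.

import Control.Concurrent (forkIO, threadDelay)
import Control.Exception (evaluate)
import Control.Monad (forM_)

fib :: Integer -> Integer
fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

main :: IO ()
main = do
  -- A long-running, CPU-bound computation with no I/O in it.
  _ <- forkIO (evaluate (fib 40) >> putStrLn "slow work finished")
  -- The "fast" work keeps making progress anyway, even with +RTS -N1.
  forM_ [1 :: Int .. 10] $ \i -> do
    putStrLn ("fast response " ++ show i)
    threadDelay 100000  -- 0.1 seconds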
Haskell does not suffer from traditional threading overheads
When Ryan Dahl originally presented Node, his biggest concern
with threaded systems was their memory consumption. He pointed to
an example of Apache versus Nginx to show that Nginx had a
massively lighter footprint because of its event-loop architecture
(whereas Apache used one thread per connection). The difference in
memory usage was at least an order of magnitude as the connection
count increased.
To demonstrate that Haskell's threading architecture does not
suffer this problem, we have constructed one final example, again
available at the repository linked above. Look in the README.md for
"Thread Spawn".
In this example we create 100,000 threads, each with a 'start'
and a 'stop' MVar (an MVar is Haskell's equivalent of a variable
protected by a lock). We then issue a start command on each start
MVar, wait three seconds, and finally issue a stop command on each
stop MVar.
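In outline, the structure is roughly the sketch below (hedged; the real program in the repository may size and synchronize things differently):

import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar, threadDelay)
import Control.Monad (forM, forM_)

main :: IO ()
main = do
  pairs <- forM [1 .. 100000 :: Int] $ \_ -> do
    start <- newEmptyMVar
    stop  <- newEmptyMVar
    -- Each thread blocks on its start MVar, then on its stop MVar.
    _ <- forkIO (takeMVar start >> takeMVar stop)
    return (start, stop)
  putStrLn "Created 100000 threads."
  forM_ pairs $ \(start, _) -> putMVar start ()  -- issue the start commands
  threadDelay 3000000                            -- wait three seconds
  forM_ pairs $ \(_, stop) -> putMVar stop ()    -- issue the stop commands
  putStrLn "Destroyed 100000 threads."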
When we run the program, we do so with threading enabled and
with the -s option passed to the run-time system, which gives us
the following output.
$ stack exec -- haskell-thread-spawn +RTS -sout
Created 100000 threads.
Destroyed 100000 threads.
$ cat out
/home/andrew/src/fpcomplete/fpco-article-examples/concurrency-and-node/.stack-work/install/x86_64-linux-dkda49f7ca9b244180d3cfb1987cbc9743/lts-7.11/8.0.1/bin/haskell-thread-spawn +RTS -N -sout
137,593,048 bytes allocated in the heap
405,750,472 bytes copied during GC
77,103,392 bytes maximum residency (11 sample(s))
10,891,144 bytes maximum slop
165 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 104 colls, 104 par 0.332s 0.082s 0.0008s 0.0186s
Gen 1 11 colls, 10 par 0.620s 0.147s 0.0134s 0.0425s
Parallel GC work balance: 28.08% (serial 0%, perfect 100%)
TASKS: 18 (1 bound, 17 peak workers (17 total), using -N8)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.000s ( 0.001s elapsed)
MUT time 0.312s ( 3.110s elapsed)
GC time 0.952s ( 0.229s elapsed)
EXIT time 0.000s ( 0.003s elapsed)
Total time 1.316s ( 3.343s elapsed)
Alloc rate 441,003,358 bytes per MUT second
Productivity 27.7% of total user, 10.9% of total elapsed
gc_alloc_block_sync: 104599
whitehole_spin: 0
gen[0].sync: 504
gen[1].sync: 2465
Looking near the top of the output, we see that Haskell's
run-time system was able to create 100,000 threads while only using
165 megabytes of memory. We are roughly consuming 1.65 kilobytes
per thread. So an average web server, which might see 2,000
concurrent connections at the top end, would expect to use only 3
megabytes of memory for being multi-threaded.
Why Haskell
Node and Haskell both offer a solution to the traditional woes
of using threading in a web server. They both allow you to deal
with very high numbers of concurrent connections without a huge
memory overhead. However, Haskell doesn't impose an additional
burden on the design of your software to accomplish that goal. You
don't have to worry about routes that take varying amounts of time.
You don't have to dump long-running tasks off to an event queue.
You don't have to spawn multiple processes to utilize multiple
cores of a machine.
We've only just barely scratched the surface of what you can do
with Haskell. If you're interested in learning more, please
check out our Haskell Syllabus for
a recommended learning route. There's also lots of content on the
haskell-lang get
started page.
FP Complete also provides corporate and group webinar training
sessions. Please check out our training
page for more information, or see our consulting page for how we can help your team
succeed with devops and functional programming.