Haskell is—perhaps infamously—a lazy language. The basic
idea of laziness is pretty easy to sum up in one sentence: values
are only computed when they're needed. But the implications of this
are more subtle. In particular, it's important to understand some
crucial topics if you want to write memory- and time-efficient
code:
- Weak head normal form (WHNF) versus normal form (NF)
- How to use the
seq
and deepseq
functions (and related concepts)
- Strictness annotations on data types
- Bang patterns
- Strictness of data structures: lazy, spine-strict, and
value-strict
- Choosing the appropriate helper functions, especially with
folds
This blog post was inspired by some questions around writing
efficient conduit code, so
I'll try to address some of that directly at the end. The concepts,
though, are general, and will transfer to not only other streaming
libraries, but non-streaming data libraries too.
NOTE This blog post will mostly treat laziness
as a problem to be solved, as opposed to the reality: laziness is
sometimes an asset, and sometimes a liability. I'm focusing on the
negative exclusively, because our goal here is to understand the
rough edges and how to avoid them. There are many great things
about laziness that I'm not even hinting at. I trust my readers to
add some great links to articles speaking on the virtues of
laziness in the comments :)
Basics of laziness
Let's elaborate on my one liner above:
Values are only computed when they're needed
Let's explore this by comparison with a strict language: C.
#include <stdio.h>
int add(int x, int y) {
return x + y;
}
int main() {
int five = add(1 + 1, 1 + 2);
int seven = add(1 + 2, 1 + 3);
printf("Five: %d\n", five);
return 0;
}
Our function add
is strict in both of its
arguments. And its result is also strict. This means that:
- Before
add
is called the first time, we will
compute the result of both 1 + 1
and 1 +
2
.
- We will call the
add
function on 2
and 3
, get a result of 5
, and place that
value in memory pointed at by the variable five
.
- Then we'll do the same thing with
1 + 2
, 1 +
3
, and placing 7
in seven
.
- Then we'll call
printf
with our five
value, which is already fully computed.
Let's compare that to the equivalent Haskell code:
add :: Int -> Int -> Int
add x y = x + y
main :: IO ()
main = do
let five = add (1 + 1) (1 + 2)
seven = add (1 + 2) (1 + 3)
putStrLn $ "Five: " ++ show five
There's something called strictness analysis which will
result in something more efficient than what I'll describe here in
practice, but semantically, we'll end up with the following:
- Instead of immediately computing
1 + 1
and 1
+ 2
, the compiler will create a thunk (which you can
think of as a promise) for those computations, and pass
those thunks to the add
function.
- Except: we won't call the
add
function right away
either: five
will be a thunk representing the
application of the add
function to the thunk for
1 + 1
and 1 + 2
.
- We'll end up doing the same thing with
seven
: it
will be a thunk for applying add
to two other
thunks.
- When we finally try to print out the value
five
,
we need to know the actual number. This is called forcing
evaluation. We'll get into more detail on when and how this
happens below, but for now, suffice it to say that when
putStrLn
is executed, it forces evaluation of
five
, which forces evaluation of 1 + 1
and 1 + 2
, converting the thunks into real values
(2
, 3
, and ultimately
5
).
- Because
seven
is never used, it remains a thunk,
and we don't spend time evaluating it.
Compared to the C (strict) evaluation, there is one clear
benefit: we don't bother wasting time evaluating the
seven
value at all. That's three addition operations
bypassed, woohoo! And in a real world scenario, instead of being
three additions, that could be a seriously expensive operation.
However, it's not all rosey. Creating a thunk does not
come for free: we need to allocate space for the thunk, which costs
both allocation, and causes GC pressure for freeing them
afterwards. Perhaps most importantly: the thunked version of an
expression can be far more costly than the evaluated version.
Ignoring some confusing overhead from data constructors (which only
make the problem worse), let's compare our two representations of
five
. In C, five
takes up exactly one
machine word*. In Haskell, our five
thunk will take up
roughly:
* Or perhaps less, as int
is probably only 32 bits,
and you're probably on a 64 bit machine. But then you get into
alignment issues, and registers... so let's just say one machine
word.
- One machine word to say "I'm a thunk"
- Within that thunk, pointers to the
add
function,
and the 1 + 1
and 1 + 2
thunks (one
machine word each). So three machine words total.
- Within the
1 + 1
thunk, one machine word for the
thunk, and then again a pointer to the +
operator, and
the 1
values. (GHC has an optimization where it will
keep small int values in dedicated parts of memory, avoiding extra
overhead for the ints themselves. But you could theoretically add
in an extra machine word for each.) Again, conservatively: three
machine words.
- Same logic for the
1 + 2
thunk, so three more
machine words.
- For a whopping total of 10 machine words, or
10 times the memory usage as C!
Now in practice, it's not going to work out that way. I
mentioned the strictness analysis step, which will say "hey, wait a
second, it's totally better to just add two numbers than allocate a
thunk, I'mma do that now, kthxbye." But it's vital when writing
Haskell to understand all of these places where laziness and thunks
can creep in.
Bang!
Let's look at how we can force Haskell to be more strict in its
evaluation. Likely the easiest way to do this is with bang
patterns. Let's look at the code first:
add :: Int -> Int -> Int
add !x !y = x + y
main :: IO ()
main = do
let !five = add (1 + 1) (1 + 2)
!seven = add (1 + 2) (1 + 3)
putStrLn $ "Five: " ++ show five
This code now behaves exactly like the strict C code. Because
we've put a bang (!
) in front of the x
and y
in the add
function, GHC knows that
it must force evaluation of those values before evaluating it.
Similarly, by placing bangs on five
and
seven
, GHC must evaluate these immediately, before
getting to putStrLn
.
As with many things in Haskell, however, bang patterns are just
syntactic sugar for something else. And in this case, that
something else is the seq
function. This function
looks like:
seq :: a -> b -> b
You could implement this type signature yourself, of course, by
just ignoring the a
value:
badseq :: a -> b -> b
badseq a b = b
However, seq
uses primitive operations from GHC
itself to ensure that, when b
is evaluated,
a
is evaluated too. Let's rewrite our add
function to use seq
instead of bang patterns:
add :: Int -> Int -> Int
add x y =
let part1 = seq x part2
part2 = seq y answer
answer = x + y
in part1
add x y = x `seq` y `seq` x + y
What this is saying is this:
part1
is an expression which will tell you the
value of part2
, after it evaluates x
part2
is an expression which will tell you the
value of answer
, after it evaluates
y
answer
is just x + y
Of course, that's a long way to write this out, and the pattern
is common enough that people will usually just use seq
infix as demonstrated above.
EXERCISE What would happen if, instead of
in part1
, the code said in part2
? How
about in answer
?
There is always a straightforward translation from bang patterns
to usage of let
. We can do the same with the
main
function:
main :: IO ()
main = do
let five = add (1 + 1) (1 + 2)
seven = add (1 + 2) (1 + 3)
five `seq` seven `seq` putStrLn ("Five: " ++ show five)
It's vital to understand how seq
is working, but
there's no advantage to using it over bang patterns where the
latter are clear and easy to read. Choose whichever option makes
the code easiest to read, which will often be bang patterns.
Tracing evaluation
So far, you've just had to trust me about the evaluation of
thunks occurring. Let's see a method to more directly observe
evaluation. The trace
function from
Debug.Trace
will print a message when it is evaluated.
Take a guess at the output of these programs:
#!/usr/bin/env stack
import Debug.Trace
add :: Int -> Int -> Int
add x y = x + y
main :: IO ()
main = do
let five = trace "five" (add (1 + 1) (1 + 2))
seven = trace "seven" (add (1 + 2) (1 + 3))
putStrLn $ "Five: " ++ show five
Versus:
#!/usr/bin/env stack
import Debug.Trace
add :: Int -> Int -> Int
add x y = x + y
main :: IO ()
main = do
let !five = trace "five" (add (1 + 1) (1 + 2))
!seven = trace "seven" (add (1 + 2) (1 + 3))
putStrLn $ "Five: " ++ show five
Think about this before looking at the answer...
OK, hope you had a good think. Here's the answer:
- The first program will print both
five
and
Five: 5
. It will not bother printing
seven
, since that expression is never forced. (Due to
strangeness around output buffering, you may see interleaving of
these two output values.)
- The second will print both
five
and
seven
, because the bang patterns force their
evaluation. However, the order of their output may be
different than you expect. On my system, for example,
seven
prints before five
. That's because
GHC retains the right to rearrange order of evaluation in these
cases.
- By contrast, if you use
five `seq` seven `seq` putStrLn
("Five: " ++ show five)
, it comes out in the order you would
intuitively expect: first five, then seven, and then "Five: 5".
This gives a bit of a lie to my claim that bang patterns are always
a simple translation to seq
s. However, the fact is
that with an expression x `seq` y
, GHC is free to
choose whether it evaluates x
or y
first,
as long as it ensure that when that expression finishes evaluating,
both x
and y
are evaluated.
All that said: as long as your expressions are truly pure, you
will be unable to observe the difference between x
and
y
evaluating first. Only the fact that we used
trace
, which is an impure function, allowed us to
observe the order of evaluation.
QUESTION Does the result change at all if you
put bangs on the add
function? Why do bangs there
affect (or not affect) the output?
The value of bottom
This is all well and good, but the more standard way to
demonstrate evaluation order is to use bottom values, aka
undefined
. undefined
is special in that,
when it is evaluated, it throws a runtime exception. (The
error
function does the same thing, as do a few other
special functions and values.) To demonstrate the same thing about
seven
not being evaluated without the bangs, compare
these two programs:
#!/usr/bin/env stack
add :: Int -> Int -> Int
add x y = x + y
main :: IO ()
main = do
let five = add (1 + 1) (1 + 2)
seven = add (1 + 2) undefined
putStrLn $ "Five: " ++ show five
Versus:
#!/usr/bin/env stack
add :: Int -> Int -> Int
add x y = x + y
main :: IO ()
main = do
let five = add (1 + 1) (1 + 2)
!seven = add (1 + 2) undefined
putStrLn $ "Five: " ++ show five
The former completes without issue, since seven
is
never evaluated. However, in the latter, we have a bang pattern on
seven
. What GHC does here is:
- Evaluate the expression
add (1 + 2) undefined
- This reduces to
(1 + 2) + undefined
- But this is still an expression, not a value, so more
evaluation is necessary
- In order to evaluate the
+
operator, it needs
actual values for the two arguments, not just thunks. This can be
seen as if +
has bang patterns on its arguments. The
correct way to say this is "+
is strict in both of its
arguments."
- GHC is free to choose to either evaluate
1 + 2
or
undefined
first. Let's assume it does 1 +
2
first. It will come up with two evaluated values
(1
and 2
), pass them to +
,
and get back 3
. All good.
- However, it then tries to evaluate
undefined
,
which triggers a runtime exception to be thrown.
QUESTION Returning to the question above: does
it look like bang patterns inside the add
function
actually accomplish anything? Think about what the output of this
program will be:
#!/usr/bin/env stack
add :: Int -> Int -> Int
add !x !y = x + y
main :: IO ()
main = do
let five = add (1 + 1) (1 + 2)
seven = add (1 + 2) undefined
putStrLn $ "Five: " ++ show five
To compare this behavior to a strict language, we need a
language with something like runtime exceptions. I'll use Rust's
panics:
fn add(x: isize, y: isize) -> isize {
println!("adding: {} and {}", x, y);
x + y
}
fn main() {
let five = add(1 + 1, 1 + 2);
let seven = add(1 + 2, panic!());
println!("Five: {}", five);
}
Firstly, to Rust's credit: it gives me a bunch of warnings about
how this program is dumb. Fair enough, but I'm going to ignore
those warnings and charge ahead with it. This program will first
evaluate the add(1 + 1, 1 + 2)
expression (which we
can see in the output of adding: 2 and 3
). Then,
before it ever enters the add
function the second
time, it needs to evaluate both 1 + 2
and
panic!()
. The former works just fine, but the latter
results in a panic being generated and short-circuiting the rest of
our function.
If we want to regain Haskell's laziness properties, there's a
straightforward way to do it: use a closure. A closure is,
essentially, a thunk. The Rust syntax for creating a closure is
|args| body
. We can create closures with no arguments
to act like thunks, which gives us:
fn add<X, Y>(x: X, y: Y) -> isize
where X: FnOnce() -> isize,
Y: FnOnce() -> isize {
let x = x();
let y = y();
println!("adding: {} and {}", x, y);
x + y
}
fn main() {
let five = || add(|| 1 + 1, || 1 + 2);
let seven = || add(|| 1 + 2, || panic!());
println!("Five: {}", five());
}
Again, the Rust compiler complains about the unused
seven
, but this program succeeds in running, since we
never run the seven
closure.
Still not up to speed with Rust? Let's use everyone's favorite
language: Javascript:
function add(x, y) {
return x() + y();
}
function panic() {
throw "Panic!"
}
var five = ignored => add(ignored => 1 + 1, ignored => 1 + 2);
var seven = ignored => add(ignored => 1 + 2, panic);
console.log("Five: " + five());
Alright, to summarize until now:
- Haskell is lazy by default
- You can use bang patterns and
seq
to make things
strict
- By contrast, in strict languages, you can use closures to make
things lazy
- You can see if a function is strict in its arguments by passing
in bottom (
undefined
) and seeing if it explodes in
your face
- The
trace
function can help you see this as
well
This is all good, and make sure you have a solid grasp of these
concepts before continuing. Consider rereading the sections
above.
Average
Here's something we didn't address: what, exactly, does it mean
to evaluate or force a value? To demonstrate the problem, let's
implement an average function. We'll use a helper datatype, called
RunningTotal
, to capture both the cumulative sum and
the number of elements we've seen so far.
data RunningTotal = RunningTotal
{ sum :: Int
, count :: Int
}
printAverage :: RunningTotal -> IO ()
printAverage (RunningTotal sum count)
| count == 0 = error "Need at least one value!"
| otherwise = print (fromIntegral sum / fromIntegral count :: Double)
printListAverage :: [Int] -> IO ()
printListAverage =
go (RunningTotal 0 0)
where
go rt [] = printAverage rt
go (RunningTotal sum count) (x:xs) =
let rt = RunningTotal (sum + x) (count + 1)
in go rt xs
main :: IO ()
main = printListAverage [1..1000000]
We're going to run this with run time statistics turned on so we
can look at memory usage:
$ stack ghc average.hs && ./average +RTS -s
Lo and behold, our memory usage is through the roof!
[1 of 1] Compiling Main ( average.hs, average.o )
Linking average ...
500000.5
258,654,528 bytes allocated in the heap
339,889,944 bytes copied during GC
95,096,512 bytes maximum residency (9 sample(s))
1,148,312 bytes maximum slop
164 MB total memory in use (0 MB lost due to fragmentation)
We're allocating a total of 258MB, and keeping 95MB in memory at
once. For something that should just be a tight inner loop, that's
ridiculously large.
Bang!
You're probably thinking right now "shouldn't we use that
seq
stuff or those bang patterns?" Certainly that
makes sense. And in fact, it looks really trivial to solve this
problem with a single bang to force evaluation of the newly
constructed rt
before recursing back into
go
. For example, we can add {-# LANGUAGE
BangPatterns #-}
to the top of our file and then define
go
as:
go !rt [] = printAverage rt
go (RunningTotal sum count) (x:xs) =
let rt = RunningTotal (sum + x) (count + 1)
in go rt xs
Unfortunately, this results in exactly the same memory
usage as we had before. In order to understand why this is
happening, we need to look at something called weak head normal
form.
Note in advance that there's a great Stack
Overflow answer on this topic for further reading.
We've been talking about forcing values and evaluating
expressions, but what exactly that means hasn't been totally clear.
To start simple, what will the output of this program be?
main = putStrLn $ undefined `seq` "Hello World"
You'd probably guess that it will print an error about
undefined
, since it will try to evaluate
undefined
before it will evaluate "Hello
World"
, and because putStrLn
is strict in its
argument. And you'd be correct. But let's try something a little
bit different:
main = putStrLn $ Just undefined `seq` "Hello World"
If you assume that "evaluate" means "fully evaluate into
something with no thunks left," you'll say that this, too, prints
an undefined
error. But in fact, it happily prints out
"Hello World" with no exceptions. What gives?
It turns out that when we talk about forcing evaluation with
seq
, we're only talking about evaluating to weak
head normal form (WHNF). For most data types, this means
unwrapping one layer of constructor. In the case of Just
undefined
, it means that we unwrap the Just
data constructor, but don't touch the undefined
within
it. (We'll see a few ways to deal with this differently below.)
It turns out that, with a standard data constructor*, the impact
of using seq
is the same as pattern matching the
outermost constructor. If you want to monomorphise, for example,
you can implement a function of type seqMaybe :: Maybe a
-> b -> b
and use it in the main
example
above. Go ahead and give it a shot... answer below.
* Hold your horses, we'll talk about newtype
s later
and then you'll understand this weird phrasing.
seqMaybe :: Maybe a -> b -> b
seqMaybe Nothing b = b
seqMaybe (Just _) b = b
main :: IO ()
main = do
putStrLn $ Just undefined `seqMaybe` "Hello World"
putStrLn $ undefined `seqMaybe` "Goodbye!"
Let's up the ante again. What do you think this program will
print?
main = do
putStrLn $ error `seq` "Hello"
putStrLn $ (\x -> undefined) `seq` "World"
putStrLn $ error "foo" `seq` "Goodbye!"
You might think that error `seq` ...
would be a
problem. After all, isn't error
going to throw an
exception? However, error
is a function. There's no
exception getting thrown, or no bottom value being provided, until
error
is given its String
argument. As a
result, evaluating does not, in fact, generate an error. The rule
is: any function applied to too few values is automatically in
WHNF.
A similar logic applies to (\x -> undefined)
.
Although it's a lambda expression, its type is a function which has
not been applied to all arguments. And therefore, it will not throw
an exception when evaluated. In other words, it's already in
WHNF.
However, error "foo"
is a function fully applied to
its arguments. It's no longer a function, it's a value. And when we
try to evaluate it to WHNF, its exception blows up in our face.
EXERCISE Will the following throw exceptions
when evaluated?
(+) undefined
Just undefined
undefined 5
(error "foo" :: Int -> Double)
Fixing average
Having understood WHNF, let's return to our example and see why
our first bang pattern did nothing to help us:
go !rt [] = printAverage rt
go (RunningTotal sum count) (x:xs) =
let rt = RunningTotal (sum + x) (count + 1)
in go rt xs
In WHNF, forcing evaluation is the same as unwrapping the
constructor, which we are already doing in the second clause! The
problem is that the values contained inside the
RunningTotal
data constructor are not being evaluated,
and therefore are accumulating thunks. Let's see two ways to solve
this:
go rt [] = printAverage rt
go (RunningTotal !sum !count) (x:xs) =
let rt = RunningTotal (sum + x) (count + 1)
in go rt xs
Instead of putting the bangs on the RunningTotal
value, I'm putting them on the values within the
constructor, forcing them to be evaluated at each loop. We're no
longer accumulating a huge chain of thunks, and our maximum
residency drops to 44kb. (Total allocations, though, are still up
around 192mb. We need to play around with other optimizations
outside the scope of this post to deal with the total allocations,
so we're going to ignore this value for the rest of the examples.)
Another approach is:
go rt [] = printAverage rt
go (RunningTotal sum count) (x:xs) =
let !sum' = sum + x
!count' = count + 1
rt = RunningTotal sum' count'
in go rt xs
This one instead forces evaluation of the new sum and count
before constructing the new RunningTotal
value.
I like this version a bit more, as it's forcing evaluation at the
correct point: when creating the value, instead of on the next
iteration of the loop when destructing it.
Moral of the story: make sure you're evaluating the thing you
actually need to evaluate, not just its container!
deepseq
The fact that seq
only evaluates to weak head
normal form is annoying. There are lots of times when we would like
to fully evaluate down to normal form (NF), meaning all
thunks have been evaluated inside our values. While there is
nothing built into the language to handle this, there is a
semi-standard (meaning it ships with GHC) library to handle this:
deepseq
. It works by providing an NFData
type class the defines how to reduce a value to
normal form (via the
rnf
method).
import Control.DeepSeq
data RunningTotal = RunningTotal
{ sum :: Int
, count :: Int
}
instance NFData RunningTotal where
rnf (RunningTotal sum count) = sum `deepseq` count `deepseq` ()
printAverage :: RunningTotal -> IO ()
printAverage (RunningTotal sum count)
| count == 0 = error "Need at least one value!"
| otherwise = print (fromIntegral sum / fromIntegral count :: Double)
printListAverage :: [Int] -> IO ()
printListAverage =
go (RunningTotal 0 0)
where
go rt [] = printAverage rt
go (RunningTotal sum count) (x:xs) =
let rt = RunningTotal (sum + x) (count + 1)
in rt `deepseq` go rt xs
main :: IO ()
main = printListAverage [1..1000000]
This has a maximum residency, once again, of 44kb. We define our
NFData
instance, which includes an rnf
method. The approach of simply deepseq
ing all of the
values within a data constructor is almost always the approach to
take for NFData
instances. In fact, it's so common,
that you can get away with just using Generic
deriving
and have GHC do the work for you:
import GHC.Generics (Generic)
import Control.DeepSeq
data RunningTotal = RunningTotal
{ sum :: Int
, count :: Int
}
deriving Generic
instance NFData RunningTotal
The true beauty of having NFData
instances is the
ability to abstract over many different data types. We can use this
not only to avoid space leaks (as we're doing here), but also to
avoid accidentally including exceptions inside thunks within a
value. For an example of that, check out the
tryAnyDeep function from the safe-exceptions
library.
EXERCISE Define the deepseq
function yourself in terms of rnf
and
seq
.
Strict data
These approaches work, but they are not ideal. The problem lies
in our definition of RunningTotal
. What we want
to say is that, whenever you have a value of type
RunningTotal
, you in fact have two Int
s.
But because of laziness, what we're actually saying is that a
RunningTotal
value could contain two
Int
s, or it could contain thunks that will evaluate to
Int
s, or thunks that will throw exceptions.
Instead, we'd like to make it impossible to construct a
RunningTotal
value that has any laziness room left
over. And to do that, we can use strictness annotations in
our definition of the data type:
data RunningTotal = RunningTotal
{ sum :: !Int
, count :: !Int
}
deriving Generic
printAverage :: RunningTotal -> IO ()
printAverage (RunningTotal sum count)
| count == 0 = error "Need at least one value!"
| otherwise = print (fromIntegral sum / fromIntegral count :: Double)
printListAverage :: [Int] -> IO ()
printListAverage =
go (RunningTotal 0 0)
where
go rt [] = printAverage rt
go (RunningTotal sum count) (x:xs) =
let rt = RunningTotal (sum + x) (count + 1)
in go rt xs
main :: IO ()
main = printListAverage [1..1000000]
All we've done is put bangs in front of the Int
s in
the definition of RunningTotal
. We have no other
references to strictness or evaluation in our program. However, by
placing the strictness annotations on those fields, we're saying
something simple and yet profound:
Whenever you evaluate a value of type
RunningTotal
, you must also evaluate the two
Int
s it contains
As we mentioned above, our second go
clause forces
evaluation of the RunningTotal
value by taking apart
its constructor. This act now automatically forces evaluation of
sum
and count
, which we previously needed
to achieve via a bang pattern.
There's one other advantage to this, which is slightly out of
scope but worth mentioning. When dealing with small values like an
Int
, GHC will automatically unbox strict
fields. This means that, instead of keeping a pointer to an
Int
inside RunningTotal
, it will keep the
Int
itself. This can further reduce memory usage.
You're probably asking a pretty good question right now: "how do
I know if I should use a strictness annotation on my data fields?"
This answer is slightly controversial, but my advice and
recommended best practice: unless you know that you want laziness
for a field, make it strict. Making your fields strict helps in a
few ways:
- Avoids accidental space leaks, like we're doing here
- Avoids accidentally including bottom values
- When constructing a value with record syntax, GHC will give you
an error if you forget a strict field. It will only give you a
warning for non-strict fields.
The curious case of
newtype
Let's define three very similar data types:
data Foo = Foo Int
data Bar = Bar !Int
newtype Baz = Baz Int
Let's play a game, and guess the output of the following
potential bodies for main
. Try to work through each
case in your head before reading the explanation below.
case undefined of { Foo _ -> putStrLn "Still alive!"
}
case Foo undefined of { Foo _ -> putStrLn "Still
alive!" }
case undefined of { Bar _ -> putStrLn "Still alive!"
}
case Bar undefined of { Bar _ -> putStrLn "Still
alive!" }
case undefined of { Baz _ -> putStrLn "Still alive!"
}
case Baz undefined of { Baz _ -> putStrLn "Still
alive!" }
Case (1) is relatively straightforward: we try to unwrap one
layer of data constructor (the Foo
) and find a bottom
value. So this thing throws an exception. The same thing applies to
(3).
(2) does not throw an exception. We have a
Foo
data constructor in our expression, and it
contains a bottom value. However, since there is no strictness
annotation on the Int
in Foo
, uwnrapping
the Foo
does not force evaluation of the
Int
, and therefore no exception is thrown. By
contrast, in (4), we do have a strictness annotation, and
therefore case
ing on Bar
throws an
exception.
What about newtype
s? What we know about
newtype
s is that they have no runtime representation.
Therefore, it's impossible for the Baz
data
constructor to be hiding an extra layer of bottomness. In other
words, Baz undefined
and undefined
are
indistinguishable. That may sound like Bar
at first,
but interestingly it's not.
You see, unwrapping a Baz
constructor can have no
effect on runtime behavior, since it was never there in the first
place. The pattern match inside (5), therefore, does
nothing. It is equivalent to case undefined of { _
-> putStrLn "Still alive!" }
. And since we're not
inspecting the undefined
at all (because we're using a
wildcard pattern and not a data constructor), no exception is
thrown.
Similarly, in case (6), we've applied a Baz
constructor to undefined
, but since it has no runtime
representation, it may as well not be there. So once again, no
exception is thrown.
EXERCISE What is the output of the program
main = Baz undefined `seq` putStrLn "Still alive!"
?
Why?
Convenience operators
and functions
It can be inconvenient, as you may have noticed already, to use
seq
and deepseq
all over the place. Bang
patterns help, but there are other ways to force evaluation.
Perhaps the most common is the $!
operator, e.g.:
mysum :: [Int] -> Int
mysum list0 =
go list0 0
where
go [] total = total
go (x:xs) total = go xs $! total + x
main = print $ mysum [1..1000000]
This forces evaluation of total + x
before
recursing back into the go
function, avoiding a space
leak. (EXERCISE: do the same thing with a bang pattern, and with
the seq
function.)
The $!!
operator is the same, except instead of
working with seq
, it uses deepseq
and
therefore evaluates to normal form.
import Control.DeepSeq
average :: [Int] -> Double
average list0 =
go list0 (0, 0)
where
go [] (total, count) = fromIntegral total / count
go (x:xs) (total, count) = go xs $!! (total + x, count + 1)
main = print $ average [1..1000000]
Another nice helper function is
force
. What this does is makes it that, when the
expression you're looking at is evaluated to WHNF, it's
actually evaluated to NF. For example, we can rewrite the
go
function above as:
go [] (total, count) = fromIntegral total / count
go (x:xs) (total, count) = go xs $! force (total + x, count + 1)
EXERCISE Define these convenience functions and
operators yourself in terms of seq
and
deepseq
.
Data structures
Alright, I swear that's all of the really complicated stuff. If
you've absorbed all of those details, the rest of this just follows
naturally and introduces a little bit more terminology to help us
understand things.
Let's start off slowly: what's the output of this program:
data List a = Cons a (List a) | Nil
main = Cons undefined undefined `seq` putStrLn "Hello World"
Well, using our principles from above: Cons undefined
undefined
is already in WHNF, since we've got the outermost
constructor available. So this program prints "Hello World",
without any exceptions. Cool. Now let's realize that
Cons
is the same as the :
data
constructor for lists, and see that the above is identical to:
main = (undefined:undefined) `seq` putStrLn "Hello World"
This tells me that lists are a lazy data structure: I have a
bottom value for the first element, a bottom value for the rest of
the list, and yet this first cell is not bottom. Let's try
something a little bit different:
data List a = Cons a !(List a) | Nil
main = Cons undefined undefined `seq` putStrLn "Hello World"
This is going to explode in our faces! We are now strict in the
tail of the list. However, the following is fine:
data List a = Cons a !(List a) | Nil
main = Cons undefined (Cons undefined Nil) `seq` putStrLn "Hello World"
With this definition of a list, we need to know all the details
about the list itself, but the values can remain undefined. This is
called spine strict. By contrast, we can also be strict in
the values and be value strict:
data List a = Cons !a !(List a) | Nil
main = Cons undefined (Cons undefined Nil) `seq` putStrLn "Hello World"
This will explode in our faces, as we'd expect.
There's one final definition of list you may be expecting, one
strict in values but not in the tail:
data List a = Cons !a (List a) | Nil
In practice, I'm aware of no data structures in Haskell that
follow this pattern, and therefore it doesn't have a name. (If
there are such data structures, and this does have a name,
please let me know, I'd be curious about the use cases for it.)
So standard lists are lazy. Let's look at a few other data
types:
Vectors
The vectors in Data.Vector
(also known as boxed
vectors) are spine strict. Assuming an import of import
qualified Data.Vector as V
, what would be the results of the
following programs?
main = V.fromList [undefined] `seq` putStrLn "Hello
World"
main = V.fromList (undefined:undefined) `seq` putStrLn
"Hello World"
main = V.fromList undefined `seq` putStrLn "Hello
World"
The first succeeds: we have the full spine of the vector
defined. The fact that it contains a bottom value is irrelevant.
The second fails, since the spine of the tail of the list is
undefined, making the spine undefined. And finally the third (of
course) fails, since the entire list is undefined.
Now let's look at unboxed vectors. Because of inference
issues, we need to help out GHC a little bit more. So starting with
this head of a program:
import qualified Data.Vector.Unboxed as V
fromList :: [Int] -> V.Vector Int
fromList = V.fromList
What happens with the three cases above?
main = fromList [undefined] `seq` putStrLn "Hello
World"
main = fromList (undefined:undefined) `seq` putStrLn
"Hello World"
main = fromList undefined `seq` putStrLn "Hello
World"
As you'd expect, (2) and (3) have the same behavior as with
boxed vectors. However, (1) also throws an exception, since
unboxed vectors are value strict, not just spine strict. The same
applies to storable and primitive vectors.
Unfortunately, to my knowledge, there is no definition of a
strict, boxed vector in a public library. Such a data type would be
useful to help avoid space leaks (such as the original question
that triggered this blog post).
Sets and Maps
If you look at the containers and unordered-containers packages,
you may have noticed that the Map-like modules come in
Strict
and Lazy
variants (e.g.,
Data.HashMap.Strict
and
Data.HashMap.Lazy
) while the Set-like modules do not
(e.g., Data.IntSet
). This is because all of
these containers are spine strict, and therefore must be strict in
the keys. Since a set only has keys, no separate values, it must
also be value strict.
A map, by contrast, has both keys and values. The lazy variants
of the map-like modules are spine-strict, value-lazy, whereas the
strict variants are both spine and value strict.
EXERCISE Analyze the
Data.Sequence.Seq
data type and classify it as either
lazy, spine strict, or value strict.
Function arguments
A function is considered strict in one of its arguments if, when
the function is applied to a bottom value for that argument, the
result is bottom. As we saw way above, +
for
Int
is strict in both of its arguments, since:
undefined + x
is bottom, and x +
undefined
is bottom.
By contrast, the const
function, defined as
const a b = a
, is strict in its first argument and
lazy in its second argument.
The :
data constructor for lists is lazy in both
its first and second argument. But if you have data List =
Cons !a !(List a) | Nil
, Cons
is strict in both
its first and second argument.
Folds
A common place to end up getting tripped up by laziness is
dealing with folds. The most infamous example is the
foldl
function, which lulls you into a false sense of
safety only to dash your hopes and destroy your dreams:
mysum :: [Int] -> Int
mysum = foldl (+) 0
main :: IO ()
main = print $ mysum [1..1000000]
This is so close to correct, and yet uses 53mb of resident
memory! The solution is but a tick away, using the strict left fold
foldl'
function:
import Data.List (foldl')
mysum :: [Int] -> Int
mysum = foldl' (+) 0
main :: IO ()
main = print $ mysum [1..1000000]
Why does the Prelude
expose a function
(foldl
) which is almost always the wrong one to
use?
But the important thing to note about almost all functions that
claim to be strict is that they are only strict to weak head normal
form. Pulling up our average
example from before, this
still has a space leak:
import Data.List (foldl')
average :: [Int] -> Double
average =
divide . foldl' add (0, 0)
where
divide (total, count) = fromIntegral total / count
add (total, count) x = (total + x, count + 1)
main :: IO ()
main = print $ average [1..1000000]
My advice is to use a helper data type with strict fields. But
perhaps you don't want to do that, and you're frustrated that there
is no foldl'
that evaluates to normal form.
Fortunately for you, by just throwing in a call to
force
, you can easily upgrade a WHNF fold into a NF
fold:
import Data.List (foldl')
import Control.DeepSeq (force)
average :: [Int] -> Double
average =
divide . foldl' add (0, 0)
where
divide (total, count) = fromIntegral total / count
add (total, count) x = force (total + x, count + 1)
main :: IO ()
main = print $ average [1..1000000]
Like a good plumber, force
patches that leak right
up!
Streaming data
One of the claims of streaming data libraries (like conduit) is
that they promote constant memory usage. This may make you think
that you can get away without worrying about space leaks. However,
all of the comments about WHNF vs NF mentioned above apply. To
prove the point, let's do average badly with conduit:
import Conduit
average :: Monad m => ConduitM Int o m Double
average =
divide <$> foldlC add (0, 0)
where
divide (total, count) = fromIntegral total / count
add (total, count) x = (total + x, count + 1)
main :: IO ()
main = print $ runConduitPure $ enumFromToC 1 1000000 .| average
You can test the memory usage of this with:
$ stack --resolver lts-9.3 ghc --package conduit-combinators -- Main.hs -O2
$ ./Main +RTS -s
EXERCISE Make this program run in constant
resident memory, by using:
- The
force
function
- Bang patterns
- A custom data type with strict fields
Chain reaction
Look at this super strict program. It's got a special
value-strict list data type. I've liberally sprinkled bang patterns
and calls to seq
throughout. I've used
$!
. How much memory do you think it uses?
#!/usr/bin/env stack
data StrictList a = Cons !a !(StrictList a) | Nil
strictMap :: (a -> b) -> StrictList a -> StrictList b
strictMap _ Nil = Nil
strictMap f (Cons a list) =
let !b = f a
!list' = strictMap f list
in b `seq` list' `seq` Cons b list'
strictEnum :: Int -> Int -> StrictList Int
strictEnum low high =
go low
where
go !x
| x == high = Cons x Nil
| otherwise = Cons x (go $! x + 1)
double :: Int -> Int
double !x = x * 2
evens :: StrictList Int
evens = strictMap double $! strictEnum 1 1000000
main :: IO ()
main = do
let string = "Hello World"
string' = evens `seq` string
putStrLn string
Look carefully, read the code well, and make a guess. Ready?
Good.
It uses 44kb of memory. "What?!" you may exclaim. "But this
thing has to hold onto a million Int
s in a strict
linked list!" Ehh... almost. It's true, our program is going to do
a hell of a lot of evaluation as soon as we force the
evens
value. And as soon as we force the
string'
value in main
, we'll force
evens
.
However, our program never actually forces evaluation of either
of these! If you look carefully, the last line in the program uses
the string
value. It never looks at
string'
or evens
. When executing our
program, GHC is only interested in performing the IO
actions it is told to perform by the main
function.
And main
only says something about putStrLn
string
.
This is vital to understand. You can build up as many chains of
evaluation using seq
and deepseq
as you
want in your program. But ultimately, unless you force evaluation
via some IO
action of the value at the top of the
chain, it will all remain an unevaluated thunk.
EXERCISES
- Change
putStrLn string
to putStrLn
string'
and see what happens to memory usage. (Then undo
that change for the other exercises.)
- Use a bang pattern in
main
somewhere to get the
great memory usage.
- Add a
seq
somewhere in the putStrLn
string
line to force the greater memory usage.
Further Reading
Sebastian Graf has written a great
post called analyzing this blog post. It goes into much more
detail on how GHC analyzes and optimizes cases of strictness. Or to
quote the author himself, "In this post, I want to offer another,
more surgical approach to plugging space leaks that works hand in
hand with optimizations carried out by the compiler." If that's
something you're interested in, I recommend checking it out.
Subscribe to our blog via email
Email subscriptions come from our Atom feed and are handled by Blogtrottr. You will only receive notifications of blog posts, and can unsubscribe any time.
Do you like this blog post and need help with Next Generation Software Engineering, Platform Engineering or Blockchain & Smart Contracts? Contact us.