Often times I'll receive or read questions online about "design
patterns" in Haskell. A common response is that Haskell doesn't
have them. What many languages address via patterns, in Haskell we
address via language features (like built-in immutability, lambdas,
laziness, etc). However, I believe there is still room for some
high-level guidance on structuring programs, which I'll loosely
refer to as a Haskell design pattern.
The pattern I'm going to describe today is what I've been
referring to as the "ReaderT pattern" for a few years now, at least
in informal discussions. I use it as the basis for the design of
Yesod's Handler
type, it describes the majority of the
Stack code base, and I've recommended it to friends, colleagues,
and customers regularly.
That said, this pattern is not universally held in the Haskell
world, and plenty of people design their code differently. So
remember that, like other articles
I've written, this is highly opinionated, but represents my
personal and FP Complete's best practices recommendations.
Let's summarize the pattern, and then get into details and
exceptions:
- Your application should define a core data type (call it
Env
if you want).
- This data type will contain all runtime configuration and
global functions that may be mockable (like logging functions or
database access).
- If you must have some mutable state, put it in
Env
as a mutable reference (IORef
, TVar
,
etc).
- Your application code will, in general, live in
ReaderT
Env IO
. Define it as type App = ReaderT Env IO
if you wish, or use a newtype wrapper instead of
ReaderT
directly.
- You can use additional monad transformers on occassion, but
only for small subsets of your application, and it's best if those
subsets are pure code.
- Optional: instead of directly using the
App
datatype, write your functions in terms of mtl-style typeclasses
like MonadReader
and MonadIO
, which will
allow you to recover some of the purity you think I just told you
to throw away with IO
and mutable references.
That's a lot to absorb all at once, some of it (like the mtl
typeclasses) may be unclear, and other parts (especially that
mutable reference bit) probably seems completely wrong. Let me
motivate these statements and explain the nuances.
Better globals
Let's knock out the easy parts of this. Global variables are
bad, and mutable globals are far worse. Let's say you have an
application which wants to configure its logging level (e.g.,
should we print or swallow DEBUG level messages?). There are three
common ways you might do this in Haskell:
- Use a compile-time flag to control which logging code gets
included in your executable
- Define a global value which reads a config file (or environment
variables) with
unsafePerformIO
- Read the config file in your
main
function and
then pass the value (explicitly, or implicitly via
ReaderT
) to the rest of your code
(1) is tempting, but experience tells me it's a terrible
solution. Every time you have conditional compilation in your
codebase, you're adding in a fractal of possible build failures. If
you have 5 conditionals, you now have 32 (2^5) possible build
configurations. Are you sure you have the right set of import
statements for all of those 32 configurations? It's just a pain to
deal with. Moreover, do you really want to decide at compile
time that you're not going to need debug information? I'd much
rather be able to flip a false
to true
in
a config file and restart my app to get more information while
debugging.
(By the way, even better than this is the ability to signal the
process while it's running to change debug levels, but we'll
leave that out for now. The ReaderT
+mutable variable
pattern is one of the best ways to achieve this.)
OK, so you've agreed with me that you shouldn't conditionally
compile. Now that you've written your whole application, however,
you're probably hesitant to have to rewrite it to thread through
some configuration value everywhere. I get that, it's a pain. So
you decide you'll just use unsafePerformIO
, since the
file will be read once, it's a pure-ish value for the entire
runtime, and everything seems mostly safe. However:
- You now have a slight level of non-determinism of where
exceptions will occur. If you have a missing or invalid config
file, where will the exception get thrown from? I'd much rather
that occur immediately on app initialization.
- Suppose you want to run one small part of the application with
louder debug information (because you know it's more likely to fail
thant the rest of your code). You basically can't do this at
all.
- Just wait until you use STM inside your config file parsing for
some reason, and then your config value first gets evaluated inside
a different STM block. (And there's no way that
could ever happen.)
- Every time you use
unsafePerformIO
, a kitten
dies.
It's time to just bite the bullet, define some Env
data type, put the config file values in it, and thread it through
your application. If you design your application from the beginning
like this: great, no harm done. Doing it later in application
development is certainly a pain, but the pain can be mitigated by
some of what I'll say below. And it is absolutely better to
suck up a little pain with mechanical code rewriting than be faced
with race conditions around unsafePerformIO
. Remember,
this is Haskell: we'd rather face compile time rather than runtime
pain.
Once you've accepted your fate and gone all-in on (3), you're on
easy street:
- Have some new config value to pass around? Easy, just augment
the
Env
type.
- Want to temporarily bump the log level? Use
local
and you're golden
You're now far less likely to resort to ugly hacks like CPP code
(1) or global variables (2), because you've eliminated the
potential pain from doing it the Right Way.
Initializing resources
The case of reading a config value is nice, but even nicer is
initializing some resources. Suppose you want to initialize a
random number generator, open up a Handle
for sending
log messages to, set up a database pool, or create a temporary
directory to store files in. These are all far more logical to do
inside main
than from some global position.
One advantage of the global variable approach for these kinds of
initializations is that it can be defered until the first time the
value is used, which is nice if you think some resources may not
always be needed. But if that's what you want, you can use an
approach like
runOnce
.
Avoiding WriterT and
StateT
Why in the world would I ever recommend mutable references as
anything but a last resort? We all know that purity in Haskell is
paramount, and mutability is the devil. And besides, we have these
wonderful things called WriterT
and
StateT
if we have some values that need to change over
time in our application, so why not use them?
In fact, early versions of Yesod did just that: they used a
StateT
kind of approach within Handler
to
allow you to modify the user session values and set response
headers. However, I switched over to mutable references quite a
while ago, and here's why:
Exception-survival If you have a runtime exception, you
will lose your state in WriterT
and
StateT
. Not so with a mutable reference: you can read
the last available state before the runtime exception was thrown.
We use this to great benefit in Yesod, to be able to set response
headers even if the response fails with something like a
notFound
.
False purity We say WriterT
and
StateT
are pure, and technically they are. But let's
be honest: if you have an application which is entirely living
within a StateT
, you're not getting the benefits of
restrained mutation that you want from pure code. May as well call
a spade a spade, and accept that you have a mutable variable.
Concurrency What's the result of put 4 >>
concurrently (modify (+ 1)) (modify (+ 2)) >> get
? You
may want to say that it will be 7
, but it
definitely won't be. Your options, depending on how
concurrently
is implemented with regards to the state
provided by StateT
, are 4
,
5
, or 6
. Don't believe me? Play around
with:
#!/usr/bin/env stack
import Control.Concurrent.Async.Lifted
import Control.Monad.State.Strict
main :: IO ()
main = execStateT
(concurrently (modify (+ 1)) (modify (+ 2)))
4 >>= print
The issue is that we need to clone the state from the parent
thread into both child threads, and then arbitrarily pick
which child state will survive. Or, if we want to, we can just
throw both child states away and continue with the original parent
state. (By the way, if you think the fact that this code compiles
is a bad thing, I agree, and suggest you use
Control.Concurrent.Async.Lifted.Safe
.)
Dealing with mutable state between different threads is a hard
problem, but StateT
doesn't fix the problem, it
hides it. If you use a mutable variable, you'll be forced to
think about this. What semantics do we want? Should we use an
IORef
, and stick to atomicModifyIORef
?
Should we use a TVar
? These are fair questions, and
ones we'll be forced to examine. For a TVar
-like
approach:
#!/usr/bin/env stack
import Control.Concurrent.Async.Lifted.Safe
import Control.Monad.Reader
import Control.Concurrent.STM
modify :: (MonadReader (TVar Int) m, MonadIO m)
=> (Int -> Int)
-> m ()
modify f = do
ref <- ask
liftIO $ atomically $ modifyTVar' ref f
main :: IO ()
main = do
ref <- newTVarIO 4
runReaderT (concurrently (modify (+ 1)) (modify (+ 2))) ref
readTVarIO ref >>= print
And you can even get a little fancier with prebaked
transformers.
WriterT is broken Don't forget that, as Gabriel Gonzalez
has demonstrated,
even the strict WriterT
has a space leak.
Caveats I still do use StateT
and
WriterT
sometimes. One prime example is Yesod's
WidgetT
, which is essentially a WriterT
sitting on top of HandlerT
. It makes sense in that
context because:
- The mutable state is expected to be modified for a small subset
of the application
- Although we can perform side-effects while building up the
widget, the widget construction itself is a morally pure
activity
- We don't need to let state survive an exception: if something
goes wrong, we'll send back an error page instead
- There's no good reason to use concurrency when constructing a
widget.
- Despite my space leak concerns, I thoroughly benchmarked
WriterT
against alternatives, and found it to be the
fastest for this use case. (Numbers beat reasoning.)
The other great exception to this rule is pure code. If you have
some subset of your application which can perform no
IO
but needs some kind of mutable state, absolutely,
100%, please use StateT
.
Avoiding ExceptT
I'm already strongly on
record as saying that ExceptT
over IO
is a bad idea. To briefly repeat myself: the contract of
IO
is that any exception can be thrown at any time, so
ExceptT
doesn't actually document possible exceptions,
it misleads. You can see that blog post for more details.
I'm rehashing this here because some of the downsides of
StateT
and WriterT
apply to
ExceptT
as well. For example, how do you handle
concurrency in an ExceptT
? With runtime exceptions,
the behavior is clear: when using concurrently
, if any
child thread throws an exception, the other thread is killed and
the exception rethrown in the parent. What behavior do you want
with ExceptT
?
Again, you can use ExceptT
from pure code where a
runtime exception is not part of the contract, just like you should
use StateT
in pure code. But once we've eliminated
StateT
, WriterT
, and ExceptT
from our main application transformers, we're left with...
Just ReaderT
And now you know why I call this "the ReaderT design pattern."
ReaderT
has a huge advantage over the other three
transformers listed: it has no mutable state. It's simply a
convenient manner of passing an extra parameter to all functions.
And even if that parameter contains mutable references, that
parameter itself is fully immutable. Given that:
- We get to ignore all of the state-overwriting issues I
mentioned with concurrency. Notice how we were able to use the
.Safe
module in the example above. That's because it
is actually safe to do concurrency with a
ReaderT
.
- Similarly, you can use the
monad-unlift
library package
- Deep monad transformer stacks are confusing. Knocking it all
down to just one transformer reduces complexity,
significantly.
- It's not just simpler for you. It's simpler for GHC too, which
tends to have a much better time optimizing one-layer
ReaderT
code versus 5-transformer-deep code.
By the way, once you've bought into ReaderT
, you
can just throw it away entirely, and manually pass your
Env
around. Most of us don't do that, because it feels
masochistic (imagine having to tell every call to
logDebug
where to get the logging function). But
if you're trying to write a simpler codebase that doesn't require
understanding of transformers, it's now within your grasp.
Has typeclass approach
Let's say we're going to expand our mutable variable example
above to include a logging function. It may look something like
this:
#!/usr/bin/env stack
import Control.Concurrent.Async.Lifted.Safe
import Control.Monad.Reader
import Control.Concurrent.STM
import Say
data Env = Env
{ envLog :: !(String -> IO ())
, envBalance :: !(TVar Int)
}
modify :: (MonadReader Env m, MonadIO m)
=> (Int -> Int)
-> m ()
modify f = do
env <- ask
liftIO $ atomically $ modifyTVar' (envBalance env) f
logSomething :: (MonadReader Env m, MonadIO m)
=> String
-> m ()
logSomething msg = do
env <- ask
liftIO $ envLog env msg
main :: IO ()
main = do
ref <- newTVarIO 4
let env = Env
{ envLog = sayString
, envBalance = ref
}
runReaderT
(concurrently
(modify (+ 1))
(logSomething "Increasing account balance"))
env
balance <- readTVarIO ref
sayString $ "Final balance: " ++ show balance
Your first reaction to this is probably that defining this
Env
data type for your application looks like overhead
and boilerplate. You're right, it is. Like I said above though,
it's better to just suck up some of the pain initially to make a
better long-term application development practice. Now let me
double down on that...
There's a bigger problem with this code: it's too
coupled. Our modify
function takes in an entire
Env
value, even though it never uses the logging
function. And similarly, logSomething
never uses the
mutable variable it's provided. Exposing too much state to a
function is bad:
- We can't, from the type signature, get any idea about what the
code is doing
- It's more difficult to test. In order to see if
modify
is doing the right thing, we need to provide it
some garbage logging function.
So let's double down on that boilerplate, and use the
Has
typeclass trick. This composes well with
MonadReader
and other mtl classes like
MonadThrow
and MonadIO
to allow us to
state exactly what our function needs, at the cost of having
to define a lot of typeclasses upfront. Let's see how this
looks:
#!/usr/bin/env stack
import Control.Concurrent.Async.Lifted.Safe
import Control.Monad.Reader
import Control.Concurrent.STM
import Say
data Env = Env
{ envLog :: !(String -> IO ())
, envBalance :: !(TVar Int)
}
class HasLog a where
getLog :: a -> (String -> IO ())
instance HasLog (String -> IO ()) where
getLog = id
instance HasLog Env where
getLog = envLog
class HasBalance a where
getBalance :: a -> TVar Int
instance HasBalance (TVar Int) where
getBalance = id
instance HasBalance Env where
getBalance = envBalance
modify :: (MonadReader env m, HasBalance env, MonadIO m)
=> (Int -> Int)
-> m ()
modify f = do
env <- ask
liftIO $ atomically $ modifyTVar' (getBalance env) f
logSomething :: (MonadReader env m, HasLog env, MonadIO m)
=> String
-> m ()
logSomething msg = do
env <- ask
liftIO $ getLog env msg
main :: IO ()
main = do
ref <- newTVarIO 4
let env = Env
{ envLog = sayString
, envBalance = ref
}
runReaderT
(concurrently
(modify (+ 1))
(logSomething "Increasing account balance"))
env
balance <- readTVarIO ref
sayString $ "Final balance: " ++ show balance
Holy creeping boilerplate batman! Yes, type signatures get
longer, rote instances get written. But our type signatures are now
deeply informative, and we can test our functions with ease,
e.g.:
main :: IO ()
main = hspec $ do
describe "modify" $ do
it "works" $ do
var <- newTVarIO (1 :: Int)
runReaderT (modify (+ 2)) var
res <- readTVarIO var
res `shouldBe` 3
describe "logSomething" $ do
it "works" $ do
var <- newTVarIO ""
let logFunc msg = atomically $ modifyTVar var (++ msg)
msg1 = "Hello "
msg2 = "World\n"
runReaderT (logSomething msg1 >> logSomething msg2) logFunc
res <- readTVarIO var
res `shouldBe` (msg1 ++ msg2)
And if defining all of these classes manually bothers you, or
you're just a big fan of the library, you're free to use lens:
#!/usr/bin/env stack
import Control.Concurrent.Async.Lifted.Safe
import Control.Monad.Reader
import Control.Concurrent.STM
import Say
import Control.Lens
import Prelude hiding (log)
data Env = Env
{ envLog :: !(String -> IO ())
, envBalance :: !(TVar Int)
}
makeLensesWith camelCaseFields ''Env
modify :: (MonadReader env m, HasBalance env (TVar Int), MonadIO m)
=> (Int -> Int)
-> m ()
modify f = do
env <- ask
liftIO $ atomically $ modifyTVar' (env^.balance) f
logSomething :: (MonadReader env m, HasLog env (String -> IO ()), MonadIO m)
=> String
-> m ()
logSomething msg = do
env <- ask
liftIO $ (env^.log) msg
main :: IO ()
main = do
ref <- newTVarIO 4
let env = Env
{ envLog = sayString
, envBalance = ref
}
runReaderT
(concurrently
(modify (+ 1))
(logSomething "Increasing account balance"))
env
balance <- readTVarIO ref
sayString $ "Final balance: " ++ show balance
In our case, where Env
doesn't have any immutable
config-style data in it, the advantages of the lens approach aren't
as apparent. But if you have some deeply nested config value, and
especially want to play around with using local
to
tweak some values in it throughout your application, the lens
approach can pay off.
So to summarize: this approach really is about biting the bullet
and absorbing some initial pain and boilerplate. I argue that the
myriad benefits you get from it during app development are well
worth it. Remember: you'll pay that upfront cost once,
you'll reap its rewards daily.
Regain purity
It's unfortunate that our modify
function has a
MonadIO
constraint on it. Even though our real
implementation requires IO
to perform side-effects
(specifically, to read and write a TVar
), we've now
infected all callers of the function by saying "we have the right
to perform any side-effects, including launching the
missiles, or worse, throwing a runtime exception." Can we regain
some level of purity? The answer is yes, it just requires a bit
more boilerplate:
#!/usr/bin/env stack
import Control.Concurrent.Async.Lifted.Safe
import Control.Monad.Reader
import qualified Control.Monad.State.Strict as State
import Control.Concurrent.STM
import Say
import Test.Hspec
data Env = Env
{ envLog :: !(String -> IO ())
, envBalance :: !(TVar Int)
}
class HasLog a where
getLog :: a -> (String -> IO ())
instance HasLog (String -> IO ()) where
getLog = id
instance HasLog Env where
getLog = envLog
class HasBalance a where
getBalance :: a -> TVar Int
instance HasBalance (TVar Int) where
getBalance = id
instance HasBalance Env where
getBalance = envBalance
class Monad m => MonadBalance m where
modifyBalance :: (Int -> Int) -> m ()
instance (HasBalance env, MonadIO m) => MonadBalance (ReaderT env m) where
modifyBalance f = do
env <- ask
liftIO $ atomically $ modifyTVar' (getBalance env) f
instance Monad m => MonadBalance (State.StateT Int m) where
modifyBalance = State.modify
modify :: MonadBalance m => (Int -> Int) -> m ()
modify f = do
modifyBalance f
logSomething :: (MonadReader env m, HasLog env, MonadIO m)
=> String
-> m ()
logSomething msg = do
env <- ask
liftIO $ getLog env msg
main :: IO ()
main = hspec $ do
describe "modify" $ do
it "works, IO" $ do
var <- newTVarIO (1 :: Int)
runReaderT (modify (+ 2)) var
res <- readTVarIO var
res `shouldBe` 3
it "works, pure" $ do
let res = State.execState (modify (+ 2)) (1 :: Int)
res `shouldBe` 3
describe "logSomething" $ do
it "works" $ do
var <- newTVarIO ""
let logFunc msg = atomically $ modifyTVar var (++ msg)
msg1 = "Hello "
msg2 = "World\n"
runReaderT (logSomething msg1 >> logSomething msg2) logFunc
res <- readTVarIO var
res `shouldBe` (msg1 ++ msg2)
It's silly in an example this short, since the entirety of the
modify
function is now in a typeclass. But with larger
examples, you can see how we'd be able to specify that entire
portions of our logic perform no arbitrary side-effects, while
still using the ReaderT
pattern to its fullest.
To put this another way: the function foo :: Monad m =>
Int -> m Double
may appear to be impure, because it lives
in a Monad
. But this isn't true: by giving it a
constraint of "any arbitrary instance of Monad
", we're
stating "this has no real side-effects." After all, the type above
unifies with Identity
, which of course is pure.
This example may seem a bit funny, but what about parseInt
:: MonadThrow m => Text -> m Int
? You may think
"that's impure, it throws a runtime exception." But the type
unifies with parseInt :: Text -> Maybe Int
, which
of course is pure. We've gained a lot of knowledge about our
function and can feel safe calling it.
So the take-away here is: if you can generalize your functions
to mtl-style Monad
constraints, do it, you'll regain a
lot of the benfits you'd have with purity.
Analysis
While the technique here is certainly a bit heavy-handed, for
any large-scale application or library development that cost will
be amortized. I've found the benefits of working in this style to
far outweigh the costs in many real world projects.
There are other problems it causes, like more confusing error
messages and more cognitive overhead for someone joining the
project. But in my experience, once someone is onboarded to the
approach, it works out well.
In addition to the concrete benefits I listed above, using this
approach automatically navigates you around many common monad
transformer stack pain points that you'll see people experiencing
in the real world. I encourage others to share their real-world
examples of them. I personally haven't hit those problems in a long
while, since I've stuck to this approach.
Post-publish updates
June 15, 2017 The comment below from Ashley about
ImplicitParams spawned a
Reddit discussion about the problems with that extension.
Please do read the discussion yourself, but the takeaway for me is
that MonadReader
is a better choice.
Subscribe to our blog via email
Email subscriptions come from our Atom feed and are handled by Blogtrottr. You will only receive notifications of blog posts, and can unsubscribe any time.
Do you like this blog post and need help with Next Generation Software Engineering, Platform Engineering or Blockchain & Smart Contracts? Contact us.