This is a debugging story told completely out of order. In order
to understand the ultimate bug, why it seemed to occur arbitrarily,
and the ultimate resolution, there's lots of backstory to cover. If
you're already deeply familiar with the inner workings of the
monad-control package, you can probably look at
a demonstration of the bad instance and move on. Otherwise,
prepare for a fun ride!
As usual, if you want to play along, we're going to be using
Stack's script interpreter
feature. Just save the snippets contents to a file and run with
stack filename.hs
. (It works with any snippet that
begins with #!/usr/bin/env stack
.)
Oh, and also: the confusion that this blog post demonstrates is
one of the reasons why I strongly recommend sticking to a ReaderT
env IO
monad transformer stack.
Trying in StateT
Let's start with some broken code (my favorite kind). It uses
the StateT
transformer and a function which may throw
a runtime exception.
#!/usr/bin/env stack
import Control.Monad.State.Strict
import Control.Exception
import Data.Typeable
data OddException = OddException !Int
deriving (Show, Typeable)
instance Exception OddException
mayThrow :: StateT Int IO Int
mayThrow = do
x <- get
if odd x
then lift $ throwIO $ OddException x
else do
put $! x + 1
return $ x `div` 2
main :: IO ()
main = runStateT (replicateM 2 mayThrow) 0 >>= print
Our problem is that we'd like to be able to recover from a
thrown exception. Easy enough we think, we'll just use
Control.Exception.try
to attempt to run the
mayThrow
action. Unfortunately, if I wrap up
mayThrow
with a try
, I get this highly
informative error message:
Main.hs:21:19: error:
• Couldn't match type ‘IO’ with ‘StateT Integer IO’
Expected type: StateT Integer IO ()
Actual type: IO ()
• In the first argument of ‘runStateT’, namely
‘(replicateM 2 (try mayThrow))’
In the first argument of ‘(>>=)’, namely
‘runStateT (replicateM 2 (try mayThrow)) 0’
In the expression:
runStateT (replicateM 2 (try mayThrow)) 0 >>= print
Oh, that makes sense: try
is specialized to
IO
, and our function is StateT Int IO
.
Our first instinct is probably to keep throwing lift
calls into our program until it compiles, since lift
seems to always fix monad transformer compilation errors. However,
try as you might, you'll never succeed. To understand why, let's
look at the (slightly specialized) type signature for
try
:
try :: IO a -> IO (Either OddException a)
If I apply lift
to this, I could end up with:
try :: IO a -> StateT Int IO (Either OddException a)
But there's no way to use lift
to modify the type
of the IO a
input. This is generally the case with the
lift
and liftIO
functions: they can deal
with monad values that are the output of a function, but not
the input to the function. (More precisely: the functions
are covariant and work on values in positive positions. We'd need
something contravariant to work on vlaues in negative positions.
You can read more
on this nomenclature in another blog post.)
Huh, I guess we're stuck. But then I remember that
StateT
is just defined as newtype StateT s m a =
StateT { runStateT :: s -> m (a,s)}
. So maybe I can write
a version of try
that works for a StateT
using the internals of the type.
tryStateT :: StateT Int IO a -> StateT Int IO (Either OddException a)
tryStateT (StateT f) = StateT $ \s0 -> do
eres <- try (f s0)
return $ case eres of
Left e -> (Left e, s0)
Right (a, s1) -> (Right a, s1)
Go ahead and plug that into our previous example, and you should
get the desired output:
([Right 0,Left (OddException 1)],1)
Let's break down in nauseating detail what that
tryStateT
function did:
- Unwrap the
StateT
data constructor from the
provided action to get a function f :: Int -> IO (a,
Int)
- Construct a new
StateT
value on the right hand
side by using the StateT
data constructor, and
capturing the initial state in the value s0 ::
Int
.
- Pass
s0
to f
to get an action
IO :: (a, Int)
, which will give the result and the
new, updated state.
- Wrap
f s0
with try
to allow us to
detect and recover from a runtime exception.
eres
has type Either OddException (a,
Int)
, and we pattern match on it.
- If we receive a
Right
/success value, we simply
wrap up the a
value in a Right
constructor together with the updated state.
- If we receive a
Left
/exception value, we wrap it
up the exception with a Left
. However, we need to
return some new state. Since we have no such state available
to us from the action, we return the only thing we can: the initial
s0
state value.
Lesson learned We can use try
in a
StateT
with some difficulty, but we need to be aware
of what happens to our monadic state.
Catching in StateT
It turns out that it's trivial to implement the try
function in terms of catch
, and the catch
function in terms of try
, at least when sticking to
the IO
-specialized versions:
try' :: Exception e => IO a -> IO (Either e a)
try' action = (Right <$> action) `catch` (return . Left)
catch' :: Exception e => IO a -> (e -> IO a) -> IO a
catch' action onExc = do
eres <- try action
case eres of
Left e -> onExc e
Right a -> return a
It turns out that by just changing the type signatures and
replacing try
with tryStateT
, we can do
the same thing for StateT
:
catchStateT :: Exception e
=> StateT Int IO a
-> (e -> StateT Int IO a)
-> StateT Int IO a
catchStateT action onExc = do
eres <- tryStateT action
case eres of
Left e -> onExc e
Right a -> return a
NOTE Pay close attention to that type signature, and
think about how monadic state is being shuttled through this
function.
Well, if we can implement catchStateT
in terms of
tryStateT
, surely we can implement it directly as
well. Let's do the most straightforward thing I can think of (or at
least the thing that continues my narrative here):
catchStateT :: Exception e
=> StateT Int IO a
-> (e -> IO a)
-> StateT Int IO a
catchStateT (StateT action) onExc = StateT $ \s0 ->
action s0 `catch` \e -> do
a <- onExc e
return (a, s0)
Here, we're basing our implementation on top of the
catch
function instead of the try
function. We do the same unwrap-the-StateT, capture-the-s0 trick we
did before. Now, in the lambda we've created for the
catch
call, we pass the e
exception value
to the user-supplied onExc
function, and then like
tryStateT
wrap up the result in a tuple with the
initial s0
.
Who noticed the difference in type signature? Instead of e
-> StateT Int IO a
, our onExc
handler has
type e -> IO a
. I told you to pay attention to how
the monadic states were being shuttled around; let's analyze
it:
- In the first function, we use
tryStateT
, which as
we mentioned will reconstitute the original s0
state
when it returns. If the action succeeded, nothing else happens. But
in the exception case, that original s0
is now passed
into the onExc
function, and the final monadic state
returned will be the result of the onExc
function.
- In the second function, we never give the
onExc
function a chance to play with monadic state, since it just lives
in IO
. So we always return the original state at the
end if an exception occurred.
Which behavior is best? I think most people would argue that the
first function is better: it's more general in allowing
onExc
to access and modify the monadic state, and
there's not really any chance for confusion. Fair enough, I'll buy
that argument (that I just made on behalf of all of my
readers).
Bonus exercise Modify this implementation of
catchStateT
to have the same type signature as the
original one.
Finally
This is fun, let's keep reimplementing functions from
Control.Exception
! This time, let's do
finally
, which will ensure that some action (usually a
cleanup action) is run after an initial action, regardless of
whether an exception was thrown.
finallyStateT :: StateT Int IO a
-> IO b
-> StateT Int IO a
finallyStateT (StateT action) cleanup = StateT $ \s0 ->
action s0 `finally` cleanup
That was really easy. Ehh, but one problem: look at that type
signature! We just agreed (or I agreed for you) that in the case of
catch
, it was better to have the second argument
also live in StateT Int IO
. Here, our argument
lives in IO
. Let's fix that:
finallyStateT :: StateT Int IO a
-> StateT Int IO b
-> StateT Int IO a
finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 ->
action s0 `finally` cleanup s0
Huh, also pretty simple. Let's analyze the monadic state
behavior here: our cleanup action is given the initial state,
regardless of the result of action s0
. That means
that, even if the action succeeded, we'll ignore the updated state.
Furthermore, because finally
ignores the result of the
second argument, we will ignore any updated monadic state. Want to
see what I mean? Try this out:
#!/usr/bin/env stack
import Control.Exception
import Control.Monad.State.Strict
finallyStateT :: StateT Int IO a
-> StateT Int IO b
-> StateT Int IO a
finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 ->
action s0 `finally` cleanup s0
action :: StateT Int IO ()
action = modify (+ 1)
cleanup :: StateT Int IO ()
cleanup = do
get >>= lift . print
modify (+ 2)
main :: IO ()
main = execStateT (action `finallyStateT` cleanup) 0 >>= print
You may expect the output of this to be the numbers 1 and 3, but
in fact the output is 0 and 1: cleanup
looks at the
initial state value of 0, and its + 2
modification is
thrown away. So can we implement a version of our function that
keeps the state? Sure (slightly simplified to avoid async
exception/mask noise):
finallyStateT :: StateT Int IO a
-> StateT Int IO b
-> StateT Int IO a
finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 -> do
(a, s1) <- action s0 `onException` cleanup s0
(_b, s2) <- cleanup s1
return (a, s2)
This has the expected output of 1 and 3. Looking at how it
works: we follow our same tricks, and pass in s0
to
action
. If an exception is thrown there, we once again
pass in s0
to cleanup
and ignore its
updated state (since we have no choice). However, in the success
case, we now pass in the updated state (s1
) to
cleanup
. And finally, our resulting state is the
result of cleanup
(s2
) instead of the
s1
produced by action
.
We have three different implementations of
finallyStateT
and two different type signatures. Let's
compare them:
- The first one (the
IO
version) has the advantage
that its type tells us exactly what's happening: the cleanup
has no access to the state at all. However, you can argue like we
did with catchStateT
that this is limiting and not
what people would expect the type signature to be.
- The second one (use the initial state for
cleanup
and then throw away its modified state) has the advantage that it's
logically consistent: whether cleanup
is called from a
success or exception code path, it does the exact same thing. On
the other hand, you can argue that it is surprising behavior that
state updates that can be preserved are being thrown
away.
- The third one (keep the state) has the reversed arguments of
the second one.
So unlike catchStateT
, I would argue that there's
not nearly as clear a winner with finallyStateT
. Each
approach has its relative merits.
One final point that seems almost not worth mentioning (hint:
epic foreshadowment incoming). The first version (IO
specialized) has an additional benefit of being ever-so-slightly
more efficient than the other two, since it doesn't need to deal
with the additional monadic state in cleanup
. With a
simple monad transformer like StateT
this performance
difference is hardly even worth thinking about. However, if we were
in a tight inner loop, and our monad stack was significantly more
complicated, you could imagine a case where the performance
difference was significant.
It's great that we understand StateT
so well, but
can we do anything for other transformers? It turns out that, yes,
we can for many transformers. (An exception is continuation-based
transformers, which you can read a bit about in passing in my ResourceT blog post from
last week.) Let's look at a few other examples of
finally
:
import Control.Exception
import Control.Monad.Writer
import Control.Monad.Reader
import Control.Monad.Except
import Data.Monoid
finallyWriterT :: Monoid w
=> WriterT w IO a
-> WriterT w IO b
-> WriterT w IO a
finallyWriterT (WriterT action) (WriterT cleanup) = WriterT $ do
(a, w1) <- action `onException` cleanup
(_b, w2) <- cleanup
return (a, w1 <> w2)
finallyReaderT :: ReaderT r IO a
-> ReaderT r IO b
-> ReaderT r IO a
finallyReaderT (ReaderT action) (ReaderT cleanup) = ReaderT $ \r -> do
a <- action r `onException` cleanup r
_b <- cleanup r
return a
finallyExceptT :: ExceptT e IO a
-> ExceptT e IO b
-> ExceptT e IO a
finallyExceptT (ExceptT action) (ExceptT cleanup) = ExceptT $ do
ea <- action `onException` cleanup
eb <- cleanup
return $ case (ea, eb) of
(Left e, _) -> Left e
(Right _a, Left e) -> Left e
(Right a, Right _b) -> Right a
The WriterT
case is very similar to the
StateT
case, except (1) there's no initial state
s0
to contend with, and (2) instead of receiving an
updated s2
state from cleanup
, we need to
monoidally combine the w1
and w2
values.
The ReaderT
case is also very similar to
StateT
, but in the opposite way: we receive an
immutable environment r
which is passed into all
functions, but there is no updated state. To put this in other
words: WriterT
has no context but has
mutable monadic state, whereas ReaderT
has a
context but no mutable monadic state. StateT
, by
contrast, has both. (This is important to understand, so reread it
a few times to get comfortable with the concept.)
The ExceptT
case is interesting: it has no context
(like WriterT
), but it does have mutable
monadic state, just not like StateT
and
WriterT
. Instead of returning an extra value with each
result (as a product), ExceptT
returns either a result
value or an e
value (as a sum). The case
expression at the end of finallyExceptT
is very
informative: we need to figure out how to combine the various
monadic states together. Our implementation here says that if
action
returns e
, we take that result.
Otherwise, if cleanup
fails, we take that
value. And if they both return Right
values, then we
use action
's result. But there are at least two other
valid choices:
- Prefer
cleanup
's e
value to
action
's e
value, if both are
available.
- Completely ignore the
e
value returned by
cleanup
, and just use action
's
result.
There's also a fourth, invalid option: if action
returns a Left
, return that immediately and don't call
cleanup
. This has been a perenniel source of bugs in
many libraries dealing with exceptions in monad transformers like
ErrorT
, ExceptT
, and
EitherT
. This invalidates the contract of
finally
, namely that cleanup
will always
be run. I've seen some arguments for why this can make sense, but I
consider it nothing more than a buggy implementation.
And finally, like with StateT
, we could avoid all
of these questions for ExceptT
if we just modify our
type signature to use IO b
for
cleanup
:
finallyExceptT :: ExceptT e IO a
-> IO b
-> ExceptT e IO a
finallyExceptT (ExceptT action) cleanup = ExceptT $ do
ea <- action `onException` cleanup
_b <- cleanup
return ea
So our takeaway: we can implement finally
for
various monad transformers. In some cases this leads to questions
of semantics, just like with StateT
. And all of these
transformers fall into a pattern of optionally capturing some
initial context, and optionally shuttling around some monadic
state.
(And no, I haven't forgotten that the title of this blog post
talks about bracket
. We're getting there, ever so
slowly. I hope I've piqued your curiosity.)
Generalizing the pattern
It's wonderful that we can implement all of these functions that
take monad transformers as arguments. But do any of us actually
want to go off and implement catch
, try
,
finally
, forkIO
, timeout
,
and a dozen other functions for every possible monad transformer
stack imagineable? I doubt it. So just as we have
MonadTrans
and MonadIO
for dealing with
transformers in output/positive position, we can construct some
kind of typeclass that handles the two concepts we mentioned above:
capture the context, and deal with the monadic state.
Let's start by playing with this for just
StateT
.
#!/usr/bin/env stack
import Control.Exception
import Control.Monad.State.Strict
type Run s = forall b. StateT s IO b -> IO (b, s)
capture :: forall s a.
(Run s -> IO a)
-> StateT s IO a
capture withRun = StateT $ \s0 -> do
let run :: Run s
run (StateT f) = f s0
a <- withRun run
return (a, s0)
restoreState :: (a, s) -> StateT s IO a
restoreState stateAndResult = StateT $ \_s0 -> return stateAndResult
finally1 :: StateT s IO a
-> IO b
-> StateT s IO a
finally1 action cleanup = do
x <- capture $ \run -> run action `finally` cleanup
restoreState x
finally2 :: StateT s IO a
-> StateT s IO b
-> StateT s IO a
finally2 action cleanup = do
x <- capture $ \run -> run action `finally` run cleanup
restoreState x
finally3 :: StateT s IO a
-> StateT s IO b
-> StateT s IO a
finally3 action cleanup = do
x <- capture $ \run -> run action `onException` run cleanup
a <- restoreState x
_b <- cleanup
return a
main :: IO ()
main = do
flip evalStateT () $ lift (putStrLn "here1") `finally1`
putStrLn "here2"
flip evalStateT () $ lift (putStrLn "here3") `finally2`
lift (putStrLn "here4")
flip evalStateT () $ lift (putStrLn "here5") `finally2`
lift (putStrLn "here6")
That's a lot, let's step through it slowly:
type Run s = forall b. StateT s IO b -> IO (b, s)
This is a helper type to make the following bit simpler. It
represents the concept of capturing the initial state in a general
manner. Given an action living in our transformer, it turns an
action in our base monad, returning the entire monadic state with
the return value (i.e., (b, s)
instead of just
b
). This allows use to define our capture
function:
capture :: forall s a.
(Run s -> IO a)
-> StateT s IO a
capture withRun = StateT $ \s0 -> do
let run :: Run s
run (StateT f) = f s0
a <- withRun run
return (a, s0)
This function says "you give me some function that needs to be
able to run monadic actions with the initial context, and I'll give
it that initial context running function (Run s
)." The
implementation isn't too bad: we just capture the s0
,
create a run
function out of it, pass that into the
user-provided argument, and then return the result with the
original state.
Now we need some way to update the monadic state based on a
result value. We call it restoreState
:
restoreState :: (a, s) -> StateT s IO a
restoreState stateAndResult = StateT $ \_s0 -> return stateAndResult
Pretty simple too: we ignore our original monadic state and
replace it with the state contained in the argument. Next we use
these two functions to implement three versions of
finally
. The first two are able to reuse the
finally
from Control.Exception
. However,
both of them suffer from the inability to retain monadic state. Our
third implementation fixes that, at the cost of having to
reimplement the logic of finally
. And as my comment
there mentions, our implementation is not in fact async exception
safe.
So all of our original trade-offs apply from our initial
StateT
discussion, but now there's an additional
downside to option 3: it's significantly more complicated to
implement correctly.
The MonadIOControl type
class
Alright, we've established that it's possible to capture this
idea for StateT
. Let's generalize to a typeclass.
We'll need three components:
- A capture function. We'll call it
liftIOWith
, to
match nomenclature in monad-control.
- A restore function, which we'll call
restoreM
.
- An associated type (type family) to represent what the
monadic state for the given monad stack is.
We end up with:
type RunInIO m = forall b. m b -> IO (StM m b)
class MonadIO m => MonadIOControl m where
type StM m a
liftIOWith :: (RunInIO m -> IO a) -> m a
restoreM :: StM m a -> m a
Let's write an instance for IO
:
instance MonadIOControl IO where
type StM IO a = a
liftIOWith withRun = withRun id
restoreM = return
The type StM IO a = a
says that, for an
IO
action returning a
, the full monadic
state is just a
. In other words, there is no
additional monadic state hanging around. That's good, as we know
that there isn't. liftIOWith
is able to just use
id
as the RunInIO
function, since you can
run an IO
action in IO
directly. And
finally, since there is no monadic state to update,
restoreM
just wraps up the result value in
IO
via return
. (More foreshadowment: what
this instance is supposed to look like is actually at the core of
the bug this blog post will eventually talk about.)
Alright, let's implement this instance for StateT s
IO
:
instance MonadIOControl (StateT s IO) where
type StM (StateT s IO) a = (a, s)
liftIOWith withRun = StateT $ \s0 -> do
a <- withRun $ \(StateT f) -> f s0
return (a, s0)
restoreM stateAndResult = StateT $ \_s0 -> return stateAndResult
This is basically identical to the functions we defined above,
so I won't dwell on it here. But here's an interesting observation:
the same way we define MonadIO
instance as
instance MonadIO m => MonadIO (StateT s m)
, it
would be great to do the same thing for
MonadIOControl
. And, in fact, we can do just that!
instance MonadIOControl m => MonadIOControl (StateT s m) where
type StM (StateT s m) a = StM m (a, s)
liftIOWith withRun = StateT $ \s0 -> do
a <- liftIOWith $ \run -> withRun $ \(StateT f) -> run $ f s0
return (a, s0)
restoreM x = StateT $ \_s0 -> restoreM x
We use the underlying monad's liftIOWith
and
restoreM
functions within our own definitions, and
thereby get context and state passed up and down the stack as
needed. Alright, let's go ahead and do this for all of the
transformers we've been discussing:
#!/usr/bin/env stack
import Control.Exception
import Control.Monad.State.Strict
import Control.Monad.Writer
import Control.Monad.Reader
import Control.Monad.Except
import Data.Monoid
import Data.IORef
type RunInIO m = forall b. m b -> IO (StM m b)
class MonadIO m => MonadIOControl m where
type StM m a
liftIOWith :: (RunInIO m -> IO a) -> m a
restoreM :: StM m a -> m a
instance MonadIOControl IO where
type StM IO a = a
liftIOWith withRun = withRun id
restoreM = return
instance MonadIOControl m => MonadIOControl (StateT s m) where
type StM (StateT s m) a = StM m (a, s)
liftIOWith withRun = StateT $ \s0 -> do
a <- liftIOWith $ \run -> withRun $ \(StateT f) -> run $ f s0
return (a, s0)
restoreM x = StateT $ \_s0 -> restoreM x
instance (MonadIOControl m, Monoid w) => MonadIOControl (WriterT w m) where
type StM (WriterT w m) a = StM m (a, w)
liftIOWith withRun = WriterT $ do
a <- liftIOWith $ \run -> withRun $ \(WriterT f) -> run f
return (a, mempty)
restoreM x = WriterT $ restoreM x
instance MonadIOControl m => MonadIOControl (ReaderT r m) where
type StM (ReaderT r m) a = StM m a
liftIOWith withRun = ReaderT $ \r ->
liftIOWith $ \run -> withRun $ \(ReaderT f) -> run $ f r
restoreM x = ReaderT $ \r -> restoreM x
instance MonadIOControl m => MonadIOControl (ExceptT e m) where
type StM (ExceptT e m) a = StM m (Either e a)
liftIOWith withRun = ExceptT $ do
a <- liftIOWith $ \run -> withRun $ \(ExceptT f) -> run f
return $ Right a
restoreM x = ExceptT $ restoreM x
control :: MonadIOControl m => (RunInIO m -> IO (StM m a)) -> m a
control f = do
x <- liftIOWith f
restoreM x
checkControl :: MonadIOControl m => m ()
checkControl = control $ \run -> do
ref <- newIORef (0 :: Int)
let ensureIs :: MonadIO m => Int -> m ()
ensureIs expected = liftIO $ do
putStrLn $ "ensureIs " ++ show expected
curr <- atomicModifyIORef ref $ \curr -> (curr + 1, curr)
unless (curr == expected) $ error $ show ("curr /= expected", curr, expected)
ensureIs 0
Control.Exception.mask $ \restore -> do
ensureIs 1
res <- restore (ensureIs 2 >> run (ensureIs 3) `finally` ensureIs 4)
ensureIs 5
return res
main :: IO ()
main = do
checkControl
runStateT checkControl () >>= print
runWriterT checkControl >>= (print :: ((), ()) -> IO ())
runReaderT checkControl ()
runExceptT checkControl >>= (print :: Either () () -> IO ())
I encourage you to inspect each of the instances above and make
sure you're comfortable with their implementation. I've added a
function here, checkControl
, as a basic sanity check
of our implementation. We start with the control
helper function, which runs some action with a RunInIO
argument, and then restores the monadic state. Then we use this
function in checkControl
to ensure that a series of
actions are all run in the correct order. As you can see, all of
our test monads pass (again, foreshadowment).
The real monad-control package looks pretty similar to this,
except:
- Instead of
MonadIOControl
, which is hard-coded to
using IO
as a base monad, it provides a
MonadBaseControl
typeclass, which allows arbitrary
base monads (like ST
or STM
).
- Just as
MonadBaseControl
is an analogue of
MonadIO
, the package provides
MonadTransControl
as an analogue of
MonadTrans
, allowing you to unwrap one layer in a
monad stack.
With all of this exposition out of the way—likely the longest
exposition I've ever written in any blog post—we can start dealing
with the actual bug. I'll show you the full context eventually, but
I was asked to help debug a function that looked something like
this:
fileLen1 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
=> FilePath
-> m Int
fileLen1 fp = runResourceT
$ runConduit
$ sourceFile fp
.| lengthCE
This is fairly common in Conduit code. We're going to use
sourceFile
, which needs to allocate some resources.
Since we can't safely allocate resources from within a Conduit
pipeline, we start off with runResourceT
to allow
Conduit to register cleanup actions. (This combination is so common
that we have a helper function runConduitRes = runResourceT .
runConduit
.)
Unfortunately, this innocuous-looking like of code was
generating an error message:
Control.Monad.Trans.Resource.register': The mutable state is being accessed after cleanup. Please contact the maintainers.
The "Please contact the maintainers." line should probably be
removed from the resourcet package; it was from back in a time when
we thought this bug was most likely to indicate an implementation
bug within resourcet. That's no longer the case... which hopefully
this debugging adventure will help demonstrate.
Anyway, as last week's blog post on ResourceT explained,
runResourceT
creates a mutable variable to hold a list
of cleanup actions, allows the inner action to register cleanup
values into that mutable variable, and then when
runResourceT
is exiting, it calls all those cleanup
actions. And as a last sanity check, it replaces the value inside
that mutable variable with a special value indicating that the
state has already been closed, and it is therefore invalid to
register further cleanup actions.
In well-behaved code, the structure of our
runResourceT
function should prevent the mutable state
from being accessible after it's closed, though I mention some
cases last week that could cause that to happen (specifically,
misuse of concurrency and the transPipe
function).
However, after thoroughly exploring the codebase, I could find no
indication that either of these common bugs had occurred.
Internally, runResourceT
is essentially a
bracket
call, using the
createInternalState
function to allocate the mutable
variable, and closeInternalState
to clean it up. So I
figured I could get a bit more information about this bug by using
the bracket
function from
Control.Exception.Lifted
and implementing:
fileLen2 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
=> FilePath
-> m Int
fileLen2 fp = Lifted.bracket
createInternalState
closeInternalState
$ runInternalState
$ runConduit
$ sourceFile fp
.| lengthCE
Much to my chagrin, the bug disappeared! Suddenly the code
worked perfectly. Beginning to question my sanity, I decided to
look at the implementation of runResourceT
, and found
this:
runResourceT :: MonadBaseControl IO m => ResourceT m a -> m a
runResourceT (ResourceT r) = control $ \run -> do
istate <- createInternalState
E.mask $ \restore -> do
res <- restore (run (r istate)) `E.onException`
stateCleanup ReleaseException istate
stateCleanup ReleaseNormal istate
return res
Ignoring the fact that we differentiate between exception and
normal cleanup in the stateCleanup
function, I was
struck by one question: why did I decide to implement this with
control
in a manual, error-prone way instead of using
the bracket
function directly? I began to worry that
there was a bug in this implementation leading to all of the
problems.
However, after reading through this implementation many times, I
convinced myself that it was, in fact, correct. And then I realized
why I had done it this way. Both createInternalState
and stateCleanup
are functions that can live in
IO
directly, without any need of a monad transformer
state. The only function that needed the monad transformer logic
was that contained in the ResourceT
itself.
If you remember our discussion above, there were two major
advantages of the implementation of finally
which
relied upon IO
for the cleanup function instead of
using the monad transformer state:
- It was much more explicit about how monadic state was going to
be handled.
- It gave a slight performance advantage.
With the downside being that the type signature wasn't quite
what people normally expected. Well, that downside didn't apply in
my case: I was working on an internal function in a library, so I
was free to ignore what a user-friendly API would look like. The
advantage of explicitness around monadic state certainly appealed
in a library that was so sensitive to getting things right. And
given how widely used this function is, and the deep monadic stacks
it was sometimes used it, any performance advantage was worth
pursuing.
Alright, I felt good about the fact that
runResourceT
was implemented correctly. Just to make
sure I wasn't crazy, I reimplemented fileLen
to use an
explicit control
instead of
Lifted.bracket
, and the bug reappeared:
fileLen3 :: forall m.
(MonadThrow m, MonadBaseControl IO m, MonadIO m)
=> FilePath
-> m Int
fileLen3 fp = control $ \run -> do
istate <- createInternalState
res <- run (runInternalState inner istate)
`onException` closeInternalState istate
closeInternalState istate
return res
where
inner :: ResourceT m Int
inner = runConduit $ sourceFile fp .| lengthCE
And as one final sanity check, I implemented
fileLen4
to use the generalized style of
bracket
, where the allocation and cleanup functions
live in the monad stack instead of just IO
, and as
expected the bug disappeared again. (Actually, I didn't really do
this. I'm doing it now for the purpose of this blog post.)
fileLen4 :: forall m.
(MonadThrow m, MonadBaseControl IO m, MonadIO m)
=> FilePath
-> m Int
fileLen4 fp = control $ \run -> bracket
(run createInternalState)
(\st -> run $ restoreM st >>= closeInternalState)
(\st -> run $ restoreM st >>= runInternalState inner)
where
inner :: ResourceT m Int
inner = runConduit $ sourceFile fp .| lengthCE
Whew, OK! So it turns out that my blog post title was correct:
this is a tale of two brackets. And somehow, one of them
triggers a bug, and one of them doesn't. But I still didn't know
quite how that happened.
The culprit
Another member of the team tracked down the ultimate problem to
a datatype that looked like this (though not actually named
Bad
, that would have been too obvious):
newtype Bad a = Bad { runBad :: IO a }
deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Bad where
type StM Bad a = IO a
liftBaseWith withRun = Bad $ withRun $ return . runBad
restoreM = Bad
That's the kind of code that can easily pass a code review
without anyone noticing a thing. With all of the context from this
blog post, you may be able to understand why I've called this type
Bad
. Go ahead and give it a few moments to try and
figure it out.
OK, ready to see how this plays out? The StM Bad a
associated type is supposed to contain the result value of the
underlying monad, together with any state introduced by this monad.
Since we just have a newtype around IO
, there should
be no monadic state, and we should just have a
.
However, we've actually defined it as IO a
,
which means "my monadic state for a value a
is an
IO
action which will return an a
." The
implementation of liftBaseWith
and
restoreM
are simply in line with making the types work
out.
Let's look at fileLen3
understanding that this is
the instance in question. I'm also going to expand the
control
function to make it easier to see what's
happening.
res <- liftBaseWith $ \run -> do
istate <- createInternalState
res <- run (runInternalState inner istate)
`onException` closeInternalState istate
closeInternalState istate
return res
restoreM res
If we play it a little loose with newtype wrappers, we can
substitute in the implementations of liftBaseWith
and
restoreM
to get:
res <- Bad $ do
let run = return . runBad
istate <- createInternalState
res <- run (runInternalState inner istate)
`onException` closeInternalState istate
closeInternalState istate
return res
Bad res
Let's go ahead and substitute in our run
function
in the one place it's used:
res <- Bad $ do
istate <- createInternalState
res <- return (runBad (runInternalState inner istate))
`onException` closeInternalState istate
closeInternalState istate
return res
Bad res
If you look at the code return x `onException` foo
,
it's pretty easy to establish that return
itself will
never throw an exception in IO
, and therefore the
onException
it useless. In other words, the code is
equivalent to just return x
. So again
substituting:
res <- Bad $ do
istate <- createInternalState
res <- return (runBad (runInternalState inner istate))
closeInternalState istate
return res
Bad res
And since foo <- return x
is just let foo
= x
, we can turn this into:
res <- Bad $ do
istate <- createInternalState
closeInternalState istate
return (runBad (runInternalState inner istate))
Bad res
And then:
Bad $ do
istate <- createInternalState
closeInternalState istate
Bad (runBad (runInternalState inner istate))
And finally, just to drive the point home:
istate <- Bad createInternalState
Bad $ closeInternalState istate
runInternalState inner istate
So who wants to take a guess why the mutable variable was closed
before we ever tried to register? Because that's exactly what
our MonadBaseControl
instance said! The problem is
that instead of our monadic state just being some value, it was the
entire action we needed to run, which was now being deferred
until after we called closeInternalState
. Oops.
What about the other
bracket?
Now let's try to understand why fileLen4
worked,
despite the broken MonadBaseControl
instance. Again,
starting with the original code after replacing
control
with liftBaseWith
and
restoreM
:
res <- liftBaseWith $ \run -> bracket
(run createInternalState)
(\st -> run $ restoreM st >>= closeInternalState)
(\st -> run $ restoreM st >>= runInternalState inner)
restoreM res
This turns into:
res <- Bad $ bracket
(return $ runBad createInternalState)
(\st -> return $ runBad $ Bad st >>= closeInternalState)
(\st -> return $ runBad $ Bad st >>= runInternalState inner)
Bad res
Since this case is a bit more involved than the previous one,
let's strip off the noise of Bad
and
runBad
calls, since they're just wrapping/unwrapping a
newtype:
res <- bracket
(return createInternalState)
(\st -> return $ st >>= closeInternalState)
(\st -> return $ st >>= runInternalState inner)
res
To decompose this mess, let's look at the actual implementation
of bracket
from base
:
bracket before after thing =
mask $ \restore -> do
a <- before
r <- restore (thing a) `onException` after a
_ <- after a
return r
We're going to ignore async exceptions for now, and therefore
just mentally delete the mask $ \restore
bit. We end
up with:
res <- do
a <- return createInternalState
r <- return (a >>= runInternalState inner) `onException`
return (a >>= closeInternalState)
_ <- return (a >>= closeInternalState)
return r
res
As above, we know that our return x `onException`
foo
will never actually trigger the exception case. Also,
a <- return x
is the same as let a =
x
. So we can simplify to:
res <- do
let a = createInternalState
let r = a >>= runInternalState inner
_ <- return (a >>= closeInternalState)
return r
res
Also, _ <- return x
has absolutely no impact at
all, so we can delete that line (and any mention of
closeInternalState
):
res <- do
let a = createInternalState
let r = a >>= runInternalState inner
return r
res
And then with a few more simply conversions, we end up with:
createInternalState >>= runInternalState inner
No wonder this code "worked": it never bothered trying to clean
up! This could have easily led to complete leaking of resources in
the application. Only the fact that our runResourceT
function thankfully stressed the code in a different way did we
reveal the problem.
What's the right instance?
It's certainly possible to define a correct newtype wrapper
around IO
:
newtype Good a = Good { runGood :: IO a }
deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Good where
type StM Good a = a
liftBaseWith withRun = Good $ withRun runGood
restoreM = Good . return
Unfortunately we can't simply use
GeneralizedNewtypeDeriving
to make this instance due
to the associated type family. But the explicitness here helps us
understand what we did wrong before. Note that our type StM
Good a
is just a
, not IO a
. We
then implement the helper functions in terms of that. If you go
through the same substitution exercise I did above, you'll see
that—instead of passing around values which contain the actions to
actually perform—our fileLen3
and
fileLen4
functions will be performing the actions at
the appropriate time.
I'm including the full test program at the end of this post for
you to play with.
Takeaways
So that blog post was certainly all over the place. I hope the
primary thing you take away from it is a deeper understanding of
how monad transformer stacks interact with operations in the base
monad, and how monad-control works in general. In particular, next
time you call finally
on some five-layer-deep stack,
maybe you'll think twice about the implication of calling
modify
or tell
in your cleanup
function.
Another possible takeaway you may have is "Haskell's crazy
complicated, this bug could happen to anyone, and it's almost
undetectable." It turns out that there's a really simple workaround
for that: stick to standard monad transformers whenever possible.
monad-control is a phenomonal library, but I don't think most
people should ever have to interact with it directly. Like async
exceptions and unsafePerformIO
, there are parts of our
library ecosystem that require them, but you should stick to
higher-level libraries that hide that insanity from you, the same
way we use higher-level languages to avoid having to write
assembly.
Finally, having to think about all of the monadic state stuff in
my code gives me a headache. It's possible for us to have a library
like lifted-base
, but which constrains functions to
only taking one argument in the m
monad and the rest
in IO
to avoid the multiple-state stuff. However, my
preferred solution is to avoid wherever possible monad transformers
that introduce monadic state, and stick to ReaderT
like things for the majority of my application. (Yes, this is
another pitch for my ReaderT design
pattern.)
Full final source code
#!/usr/bin/env stack
import Control.Monad.Trans.Control
import Control.Monad.Trans.Resource
import Control.Exception.Safe
import qualified Control.Exception.Lifted as Lifted
import Conduit
newtype Bad a = Bad { runBad :: IO a }
deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Bad where
type StM Bad a = IO a
liftBaseWith withRun = Bad $ withRun $ return . runBad
restoreM = Bad
newtype Good a = Good { runGood :: IO a }
deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Good where
type StM Good a = a
liftBaseWith withRun = Good $ withRun runGood
restoreM = Good . return
fileLen1 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
=> FilePath
-> m Int
fileLen1 fp = runResourceT
$ runConduit
$ sourceFile fp
.| lengthCE
fileLen2 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
=> FilePath
-> m Int
fileLen2 fp = Lifted.bracket
createInternalState
closeInternalState
$ runInternalState
$ runConduit
$ sourceFile fp
.| lengthCE
fileLen3 :: forall m.
(MonadThrow m, MonadBaseControl IO m, MonadIO m)
=> FilePath
-> m Int
fileLen3 fp = control $ \run -> do
istate <- createInternalState
res <- run (runInternalState inner istate)
`onException` closeInternalState istate
closeInternalState istate
return res
where
inner :: ResourceT m Int
inner = runConduit $ sourceFile fp .| lengthCE
fileLen4 :: forall m.
(MonadThrow m, MonadBaseControl IO m, MonadIO m)
=> FilePath
-> m Int
fileLen4 fp = control $ \run -> bracket
(run createInternalState)
(\st -> run $ restoreM st >>= closeInternalState)
(\st -> run $ restoreM st >>= runInternalState inner)
where
inner :: ResourceT m Int
inner = runConduit $ sourceFile fp .| lengthCE
main :: IO ()
main = do
putStrLn "fileLen1"
tryAny (fileLen1 "/usr/share/dict/words") >>= print
tryAny (runBad (fileLen1 "/usr/share/dict/words")) >>= print
tryAny (runGood (fileLen1 "/usr/share/dict/words")) >>= print
putStrLn "fileLen2"
tryAny (fileLen2 "/usr/share/dict/words") >>= print
tryAny (runBad (fileLen2 "/usr/share/dict/words")) >>= print
tryAny (runGood (fileLen2 "/usr/share/dict/words")) >>= print
putStrLn "fileLen3"
tryAny (fileLen3 "/usr/share/dict/words") >>= print
tryAny (runBad (fileLen3 "/usr/share/dict/words")) >>= print
tryAny (runGood (fileLen3 "/usr/share/dict/words")) >>= print
putStrLn "fileLen4"
tryAny (fileLen4 "/usr/share/dict/words") >>= print
tryAny (runBad (fileLen4 "/usr/share/dict/words")) >>= print
tryAny (runGood (fileLen4 "/usr/share/dict/words")) >>= print
Bonus exercise Take the checkControl
function I provided above, and use it in the Good
and
Bad
monads. See what the result is, and if you can
understand why that's the case.
Subscribe to our blog via email
Email subscriptions come from our Atom feed and are handled by Blogtrottr. You will only receive notifications of blog posts, and can unsubscribe any time.
Do you like this blog post and need help with Next Generation Software Engineering, Platform Engineering or Blockchain & Smart Contracts? Contact us.