This blog post will use mostly Rust and Haskell code snippets to demonstrate its points. But I don't believe the core point is language-specific at all.

Here's a bit of Rust code to read the contents of input.txt and print it to stdout. What's wrong with it?

fn main() {
    let s = std::fs::read_to_string("input.txt").unwrap();
    println!("{}", s);
}

If you're Rust-fluent, that .unwrap() may stick out to you like a sore thumb. You know it means "convert any error that occurred into a panic." And panics are a Bad Thing. It's not correct error handling. Instead, something like this is "better":

fn main() {
    match std::fs::read_to_string("input.txt") {
        Ok(s) => println!("{}", s),
        Err(e) => eprintln!("Unable to read from input.txt: {:?}", e),
    }
}

The presence of enums in Rust makes it really easy to ensure you handle every failure case. The code above will not panic. If an I/O error occurs, such as file not found, permission denied, or a hardware failure, it will print an error message to stderr. But this still isn't good error handling, for two reasons:

  1. The exit code of the program doesn't indicate an error occurred. We'd need to use something like std::process::exit with a non-zero code to fix that, which isn't too hard. But it's something else to remember.
  2. This is very verbose! We've got a trivial little program here, and we're obscuring the actual behavior of the program with all of this line noise around matching different enum variants.
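To make both complaints concrete, here's a minimal sketch of what fixing the exit code does to the match version. It assumes std::process::exit is an acceptable way to end the program, and the read_or_message helper is a hypothetical name I made up for illustration:

```rust
use std::fs;
use std::process;

/// Hypothetical helper: returns the file contents, or the message to print.
fn read_or_message(path: &str) -> Result<String, String> {
    fs::read_to_string(path)
        .map_err(|e| format!("Unable to read from {}: {:?}", path, e))
}

fn main() {
    match read_or_message("input.txt") {
        Ok(s) => println!("{}", s),
        Err(msg) => {
            eprintln!("{}", msg);
            // A non-zero status lets shells and supervisors see the failure.
            process::exit(1);
        }
    }
}
```

The exit code is now right, but the actual behavior of the program (read a file, print it) is buried even deeper than before.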

Fortunately, the Rust language is benevolent, and it makes it possible to do things even better than before. The ? operator will try to do something, and automatically short-circuit if an error occurs. We now get to avoid those pesky panics without cluttering our code. And we get the proper exit code to boot!

fn main() -> Result<(), std::io::Error> {
    let s = std::fs::read_to_string("input.txt")?;
    println!("{}", s);
    Ok(())
}

All is good in the world, we can stop this post here and go home. The greatest marvel of error handling has arrived!

Look again

So it turns out I forgot to create my input.txt file. Let's see the beautiful error message generated by my program:

Error: Os { code: 2, kind: NotFound, message: "The system cannot find the file specified." }

Huh... that's thoroughly unhelpful. In my 5-line program, it's trivial enough to figure out which file doesn't exist. But imagine a 5,000 line program. Or if the code in question is in a dependency. Or if you're a member of the ops team, have never written a line of Rust in your life, don't have access to the codebase, the production server is down at 2am, and you see this error message in your logs.

Runtime exceptions to the rescue?

Well, obviously this is just because Rust uses error returns instead of Good Ol' Runtime Exceptions. Obviously something like Haskell solves this problem better, right? Well, sort of. With this program, and no input.txt:

main = do
  s <- readFile "input.txt"
  putStrLn s

I do in fact get a much nicer error message:

input.txt: openFile: does not exist (No such file or directory)

I didn't even need to include any error handling logic in the code; it's all implicit! But in reality, the clarity of this error message has little to do with exception handling semantics. It has to do with the construction of this specific error message. It contains enough information to help debug this.

But there are plenty of counterexamples in Haskell. Calling head on an empty list provides a line number these days, but you used to just get an error that "oops, tried to head an empty list, somewhere, in one of your libraries. Good luck!" Some low-level network functionality still gives vague error messages.

And even the glorious does not exist message above is only marginally useful. And that's because of...

Context!

In a trivial 2-line program, the reality is that "file not found" without any additional information is perfectly reasonable. That's because I know exactly the context in which the error occurred. It either occurred on line 1, or line 2. By contrast, in a 500k SLOC codebase, knowing that input.txt doesn't exist is probably not nearly enough to debug things.

Similarly, knowing that I can't connect to IP address 255.813.20.1 may be sufficient in a small network test. But in a reasonably complicated program, I'd much rather get the context that I'm trying to make an HTTPS request to example.com proxied through a server with IP address 255.813.20.1, which was specified via the HTTP_PROXY environment variable. That last bit of information may short-circuit days of debugging to point out "doh, I had a typo in my Kubernetes manifest file!"
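As a rough sketch of what carrying that context could look like in Rust, here's an error type whose fields mirror the pieces of information in the paragraph above. The ProxyConnectError type, its field names, and the message wording are all hypothetical; this isn't taken from any real HTTP library:

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type: every field is context a log reader may need.
#[derive(Debug)]
struct ProxyConnectError {
    /// What the program was ultimately trying to do.
    request: String,
    /// The address it actually tried to connect to.
    proxy_addr: String,
    /// Where that address came from (config file, environment variable, ...).
    proxy_source: String,
    /// The low-level failure, kept as plain text to keep the sketch short.
    cause: String,
}

impl fmt::Display for ProxyConnectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{} failed: could not connect to proxy {} (taken from {}): {}",
            self.request, self.proxy_addr, self.proxy_source, self.cause
        )
    }
}

impl Error for ProxyConnectError {}

fn main() {
    let err = ProxyConnectError {
        request: "HTTPS request to example.com".to_string(),
        proxy_addr: "255.813.20.1".to_string(),
        proxy_source: "the HTTP_PROXY environment variable".to_string(),
        cause: "invalid IP address".to_string(),
    };
    eprintln!("Error: {}", err);
}
```

Nothing here is clever. The point is only that someone had to decide, ahead of time, that the request, the address, and the origin of the address were all worth recording.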

Stack traces are often a huge help here. They tell you a lot of useful context. And both Rust and Haskell are particularly weak at providing this context in their error representations. But they're still not a panacea. The ugly reality is that...

There's an inherent trade-off

Like so many other things, error handling ultimately is a trade-off. When we're writing our initial code, we don't want to think about errors. We code to the happy path. How productive would you be if you had to derail every line of code with thought processes around the myriad ways your code could fail?

But then we're debugging a production issue, and we definitely want to think about errors. We curse our lazy selves for not handling an error case that obviously could have arisen. "Why did I decide to abort the process when the TCP connection failed? I should have retried! I should have logged the address I tried to connect to!"

Then we flood our code with log messages, and are frustrated when we can't see the important bits.

Finding the right balance is an art. And typically it's an art that we don't spend enough time thinking about. There are some well-established tools for this, like runtime-configurable log levels. That's a huge step in the right direction.

Rust is such a great example of this. Explicit matching on Result values really forces you to think through all of the different error cases and how to report them correctly. Complex custom enum error types allow you to define all of the different values you'd want reported. But all of this adds huge line noise compared to ?. So ? wins the day.
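To put a number on that line noise, here's a small sketch of the custom enum approach using only the standard library. The ConfigError type, the read_port and parse_port functions, and the port.txt file are hypothetical names invented for this example:

```rust
use std::fmt;
use std::fs;
use std::io;

/// Hypothetical error type for a program that reads a port number from a file.
#[derive(Debug)]
enum ConfigError {
    /// The file could not be read at all.
    Read { path: String, source: io::Error },
    /// The file was read, but its contents were not a valid port.
    Parse { path: String, contents: String },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Read { path, source } => {
                write!(f, "could not read {}: {}", path, source)
            }
            ConfigError::Parse { path, contents } => {
                write!(f, "{} should contain a number, but contains {:?}", path, contents)
            }
        }
    }
}

impl std::error::Error for ConfigError {}

/// Parses already-loaded contents, so it can be exercised without a file.
fn parse_port(path: &str, contents: &str) -> Result<u16, ConfigError> {
    contents.trim().parse().map_err(|_| ConfigError::Parse {
        path: path.to_string(),
        contents: contents.to_string(),
    })
}

fn read_port(path: &str) -> Result<u16, ConfigError> {
    // Each fallible step needs its own map_err to attach context before `?`.
    let contents = fs::read_to_string(path).map_err(|source| ConfigError::Read {
        path: path.to_string(),
        source,
    })?;
    parse_port(path, &contents)
}

fn main() {
    match read_port("port.txt") {
        Ok(port) => println!("port = {}", port),
        Err(e) => eprintln!("Error: {}", e),
    }
}
```

That's roughly fifty lines to read one number from one file. The error messages are far better than a bare "file not found", and it's easy to see why a bare ? is so tempting by comparison.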

The method is secondary

The Rust community accepts that panics are bad. The Haskell community constantly argues about whether runtime exceptions are a good or bad thing. Java is either loved or hated for checked exceptions. Golang is either lauded or mocked for if err != nil.

I'm not at all arguing that those discussions are irrelevant. There are significant trade-offs to these various approaches. They affect performance, trackability of errors, and more.

What I'm arguing here is that we spend a disproportionate amount of time on how we report and recover from errors, and far too little on discussing what a good error actually contains.

My ideal

These are evolving thoughts for me. So take them with a grain of salt. And I'm very interested to hear differing opinions.

I've long held that in Haskell, we should use runtime exceptions. This has been interpreted by many as my advocacy of runtime exceptions. Instead, I would advocate: use the language's native mechanism. I don't pine for exceptions when writing Rust. Quite the opposite in fact. I overall prefer explicit error handling. But it's not worth fighting the battle against runtime exceptions when they are already ubiquitous.

I think Rust and Haskell are both close to the sweet spot in error handling. There's relatively little verbosity around adding this handling. If you leverage libraries like anyhow in Rust, there's even less.

My biggest concern with a library like anyhow is how easy it becomes to do the wrong thing. Take our broken example from above: it's trivial to "upgrade" it to use anyhow:

fn main() -> anyhow::Result<()> {
    let s = std::fs::read_to_string("input.txt")?;
    println!("{}", s);
    Ok(())
}

However, this still produces essentially the same useless error message we started with. Instead, we need to be a bit more explicit with a context method call to get a nicer message:

use anyhow::Context;

fn main() -> anyhow::Result<()> {
    let s = std::fs::read_to_string("input.txt")
        .context("Failed to read input.txt")?;
    println!("{}", s);
    Ok(())
}

Now we get the much more helpful error message:

Error: Failed to read input.txt

Caused by:
    The system cannot find the file specified. (os error 2)

This is a good balance of concision and helpfulness. The downside is the lack of enforcement. Nothing forced me to add the .context call. I worry that in a large codebase, or under time pressure, people like me will end up forgetting to add the helpful context.

Could we design a modified anyhow that forces a context call? Certainly. But:

  1. It will lose out on the current simple ergonomics.
  2. No tool can force the "right" level of context. That requires human insight and thought. And those are quantities in short supply, and not usually interested in error messages.
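For the curious, here's one way such enforcement might be sketched with only the standard library. This isn't how anyhow works, and the ContextError type and RequireContext trait are hypothetical names of my own. The trick is that the error type deliberately has no From<io::Error> implementation, so a bare ? on an I/O result fails to compile:

```rust
use std::fmt;
use std::fs;
use std::io;
use std::process;

/// Hypothetical error type. It deliberately has no `From<io::Error>` impl,
/// so `?` cannot convert an `io::Error` into it implicitly.
#[derive(Debug)]
struct ContextError {
    context: String,
    source: io::Error,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}\n\nCaused by:\n    {}", self.context, self.source)
    }
}

impl std::error::Error for ContextError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.source)
    }
}

/// Hypothetical extension trait: the only route from an `io::Result`
/// to a `Result<_, ContextError>`.
trait RequireContext<T> {
    fn context(self, context: &str) -> Result<T, ContextError>;
}

impl<T> RequireContext<T> for io::Result<T> {
    fn context(self, context: &str) -> Result<T, ContextError> {
        self.map_err(|source| ContextError {
            context: context.to_string(),
            source,
        })
    }
}

fn read_input(path: &str) -> Result<String, ContextError> {
    // Dropping the `.context(...)` call here is a compile error.
    let s = fs::read_to_string(path).context(&format!("Failed to read {}", path))?;
    Ok(s)
}

fn main() {
    match read_input("input.txt") {
        Ok(s) => println!("{}", s),
        Err(e) => {
            eprintln!("Error: {}", e);
            process::exit(1);
        }
    }
}
```

Both objections above show up immediately. Every call site now pays for the extra method call, and nothing stops me from writing .context("oops") to make the compiler happy.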

Advice

I don't have an answer here. I would advise people to start by recognizing that good error handling is difficult. We like to think of it as a trivial but tedious task. It isn't. Doing this correctly requires real thought and design. We're too quick to sweep it under the rug as the unimportant parts of our code.

I'll continue with my general advice of using your language's preferred mechanisms for error handling. In Rust, that means using Result and avoiding panics. In Haskell, it means some mixture of explicit Either return values and runtime exceptions (the exact mixture very much up for debate). In Java, it's mostly checked exceptions, though there are plenty of unchecked exceptions added to gum up the works too.

But consider spending a bit more time on thinking through not just how to report/raise/throw an error/exception, but what exactly you're reporting/raising/throwing. Think of the poor ops guy drinking his 7th cup of coffee at 4am trying to figure out what part of the codebase needs input.txt, or why in the world the program is trying to connect to an invalid IP address.
