What pure functional programming is all about: Part 1

This is a technical post series about pure functional programming. The intended audience is general programmers who are familiar with closures and some functional programming.

We're going to be seeing how pure functional programming differs from regular "functional programming", in a significant way.

We're going to be looking at a little language theory, type theory, and implementation and practice of pure functional programming.

We'll look at correctness, architecture, and performance of pure programs.

We're going to look at code samples in C#, JavaScript and SQL. We're going to look at Haskell code. We're going to look at assembly language.

In closing we'll conclude that typed purely functional programming matters, and in a different way to what is meant by "functional programming".

Hold onto yer butts.

In this post

Firstly, we'll start with a pinch of theory, but not enough to bore you. We'll then look at how functional programming differs from "pure" functional programming. I'll establish what we mean by "side-effects". Finally, I'll try to motivate using pure languages from a correctness perspective.

Types and Functions

A function is a relation between terms. Every input term has exactly one output term. That's as simple as it gets.

In type theory, it's described in notation like this:

f : A → B

Inputs to f are the terms in type A, and outputs from f are the terms in type B. The type might be Integer and the terms might be all integers .., -2, -1, 0, 1, 2, ...

The type might be Character and the terms might be Homer, Marge, SideShowBob, FrankGrimes, InanimateCarbonRod, etc.

This is the core of functional programming, and without type theory, it's hard to even talk formally about the meaning of functional programs.

Get it? I thought so. We'll come back to this later.

Functional programming isn't

Functional programming as used generally by us as an industry tends to mean using first-class functions, closures that capture their environment properly, immutable data structures, trying to write code that doesn't have side-effects, and things like that.

That's perfectly reasonable, but it's distinct from pure functional programming. It's called "pure functional", because it has only functions as defined in the previous section. But why aren't there proper functions in the popular meaning of functional programming?

It's something of a misnomer that many popular languages use this term function. If you read the language specification for Scheme, you may notice that it uses the term procedure. Some procedures may implement functions like cosine, but not all procedures are functions, in fact most aren't. Any procedure can do cheeky arbitrary effects. But for the rest of popular languages, it's a misnomer we accept.

Let's look into why this is problematic in the next few sections.

Side-effects, mutation, etc.

What actually are side-effects, really? What's purity? I'll establish what I am going to mean in this series of blog posts. There are plenty of other working definitions.

These are some things you can do in source code:

Mutate variables.
Make side-effects (writing to standard output, connecting to a socket, etc.)
Get the current time.

These are all things you cannot do in a pure function, because they're not explicit inputs into or explicit outputs of the function.

In C#, the expression DateTime.Now.TimeOfDay has type TimeSpan, it's not a function with inputs. If you put it in your program, you'll get the time since midnight when evaluating it. Why?

In contrast, these are side-effects that pure code can cause:

Allocate lots of memory.
Take lots of time.
Loop forever.

But none of these things can be observed during evaluation of a pure functional language. There isn't a meaningful pure function that returns the current time, or the current memory use, and there usually isn't a function that asks if the program is currently in an infinite loop.

So I am going to use a meaning of purity like this: an effect is something implicit that you can observe during evaluation of your program.

Which languages are purely functional

So, a pure functional language is simply one in which a function cannot observe things besides its inputs. Given that, we can split the current crop of Pacman-complete languages up into pure and impure ones:

Pure languages

Impure languages

Haskell (we'll use Haskell in this series), PureScript and Idris are purely functional in this sense. You can't change variables. You can't observe time. You can't observe anything not constructed inside Haskell's language. Side effects like the machine getting hotter, swap space being used, etc. are not in Haskell's ability to care about. (But we'll explore how you can straight-forwardly interact with the real world to get this type of information safely in spite of this fact.)

We'll look at the theoretical and practical benefits of programming in a language that only has functions during this series.

Reasoning and function composition

In a pure language, every expression is pure. That means you can move things around without worrying about implicit dependencies. Function composition is a basic property that lets you take two functions and make a third out of them.

Calling back to the theory section above, you can readily understand what function composition is by its types. Lets say we have two functions, f : X → Y and g : Y → Z:

The functions f : X → Y and g : Y → Z can be composed, written f ∘ g, to yield a function which maps x in X to g(f(x)) in Z:

In JavaScript, f ∘ g is:

function(x){ return f(g(x)); }

In Haskell, f ∘ g is:

\x -> f (g x)

There's also an operator that provides this out of the box:

f . g

You can compare this notion of composition with shell scripting pipes:

sed 's/../../' | grep -v ^[0-9]+ | uniq | sort | ...

Each command takes an input and produces an output, and you compose them together with |.

We'll use this concept to study equational reasoning next.

A really simple example

Let's look at a really simple example of composition and reasoning about it in the following JavaScript code and its equivalent in Haskell:

JavaScript

> [1, 4, 9].map(Math.sqrt)
[1, 2, 3]
> [1, 4, 9].map(Math.sqrt).map(function(x){ return x + 1 });
[2, 3, 4]

Haskell

> map sqrt [1,4,9]
[1.0,2.0,3.0]
> map (\x -> x + 1) (map sqrt [1,4,9])
6.0

There's a pattern forming here. It looks like this:

xs.map(first).map(second)

map second (map first xs)

The f and g variables represent any function you might use in their place.

Let's do equational reasoning

We know that map id ≣ id, that is, applying the identity function across a list yields the original list's value. What does that tell us? Mapping preserves the structure of a list. In JavaScript, this implies that, given id

var id = function(x){ return x; }

Then xs.map(id) ≣ xs.

Map doesn't change the length or shape of the data structure.

By extension and careful reasoning, we can further observe that:

map second . map first   ≣ map (second . first)

In JavaScript, that's

xs.map(first).map(second) ≣ xs.map(function(x){ return second(first(x)); })

For example, applying a function that adds 1 across a list, and then applying a function that takes the square root, is equivalent to applying a function across a list that does both things in one, i.e. adding one and taking the square root.

So you can refactor your code into the latter!

Actually, whether the refactor was a good idea or a bad idea depends on the code in question. It might perform better, because you put the first and the second function in one loop, instead of two separate loops (and two separate lists). On the other hand, the original code is probably easier to read. "Map this, map that ..."

You want the freedom to choose between the two, and to make transformations like this freely you need to be able to reason about your code.

But it doesn't work in JavaScript

Turning back to the JavaScript code, is this transformation really valid? As an exercise, construct in your head a definition of first and second which would violate this assumption.

xs.map(first).map(second)
→
x.map(function(x){ return second(first(x)); })

The answer is: no. Because JavaScript's functions are not mathematical functions, you can do anything in them. Imagine your colleague writes something like this:

var state = 0;
function first(i){ return state += i; }
function second(i){ return i `mod` state; }

You come along and refactor x.map(first).map(second), only to find the following results:

> var state = 0;
  function first(i){ return state += i; }
  function second(i){ return i % state; }
  [1,2,3,4].map(first).map(second)
[1, 3, 6, 0]
> var state = 0;
  function first(i){ return state += i; }
  function second(i){ return i % state; }
  [1,2,3,4].map(function(x){ return second(first(x)); })
[0, 0, 0, 0]

Looking at a really simple example, we see that basic equational reasoning, a fundamental "functional" idea, is not valid in JavaScript. Even something simple like this!

So we return back to the original section "Functional programming isn't", and why the fact that some procedures are functions doesn't get you the same reliable reasoning as when you can only define functions.

A benefit like being able to transform and reason about your code translates to practical pay-off because most people are changing their code every day, and trying to understand what their colleagues have written. This example is both something you might do in a real codebase and a super reduced down version of what horrors can happen in the large.

Summary

In this post I've laid down some terms and setup for follow up posts.

We've looked at what purity is and what it isn't.
We've looked at functions and function composition.
We've looked at how reasoning is easier if you only have functions. We'll circle back to this in coming posts when we get to more detailed reasoning and optimizations.

In the next post, we'll explore:

SQL as a pure language, which is a familiar language to almost everybody.
How pure languages deal with the real-world and doing side-effects, which are obviously practical things to do.
Evaluation vs execution, the generalization of pure vs impure languages.

After that, we'll explore the performance and declarative-code-writing benefits of pure functional languages in particular detail, looking at generated code, assembly, performance, etc.

Subscribe to our blog via email
Email subscriptions come from our Atom feed and are handled by Blogtrottr. You will only receive notifications of blog posts, and can unsubscribe any time.

Do you like this blog post and need help with Next Generation Software Engineering, Platform Engineering or Blockchain & Smart Contracts? Contact us.