I have been working on property testing for years now. My
current conclusion is that property testing still has so much
uncovered potential that with just some more effort, it can become
an even greater tool for software development than it already is
today. The concept is relatively young and the design space
relatively unexplored. This post seeks to provide an overview of
the different approaches and to outline a comparison between the
most commonly used libraries for property-based testing in Haskell.
These include two popular libraries: QuickCheck and HedgeHog, and a
new approach called “Validity-based Testing”.
The problems with testing
There are many ways to try to make sure that code works. They
each have their own trade-offs and we should consider these
trade-offs carefully. The first step is to consider how expensive
breakage would be. The next step is to use the right tools. Use the
right programming language, build system, etc, so that code is safe
in certain ways automatically. Is it worth it to formally verify
all of your code? Maybe, probably not. When you have gone through
these steps, you turn to testing, and most likely: unit
testing.
By “unit test”, here, I mean a test that checks whether the
output of a function, for a single example, is equal to the
expected output. An example hspec
snippet in
Haskell:
myFunction exampleInput `shouldBe` expectedOutput
There are two standard problems with unit tests:
-
One has to write many examples to cover all possible code paths.
Even when we do write tests that cover all code paths, we may not
notice that we have forgotten to write code to deal with edge
cases. We call this problem the coverage problem.
-
Writing these tests is both cumbersome, and usually not
considered enjoyable. Unit testing is expensive from a
developer-time perspective. We call this problem the developer cost
problem.
Property testing as a solution to the coverage problem
Property testing is a different approach to testing. The idea is
that you can make a test parametric in a certain argument, and the
test should still pass for every argument that you pass in. The
next question is then: Where do you get these arguments? That
depends on the kind of property test:
- Exhaustive property test: Run the test for each element of the
argument type.
- Randomised property test: Run the test for N random elements
drawn from the entire argument type.
- Exhaustive specialised property test: Run the test for each
element of a subset of the argument type.
- Randomised specialised property test: Run the test for N random
elements drawn from a subset of the argument type.
An exhaustive property test would be the most comprehensive, but
for argument types inhabited by a very large or infinite number of
values, like Int64
or String
, this is
infeasible. To make the runtime of a property test reasonable, we
usually opt for a randomised property test. Randomised property
testing probabilistically solves the coverage problem of tests, but
it exacerbates the developer cost problem. It is generally harder
to come up with a general property of your code than an
input-output example. You could argue that one would never write as
many unit tests as property tests will generate for them, but the
point here is about the strict amount of thinking and typing that
the programmer has to do to get started. It is a trade-off.
In this post, we will use the following running example of a
property test:
(\str -> reverse (reverse str) == str) :: String -> Bool
This property states that if you reverse a string twice, then
you get the original string. (For the purposes of this posts, we
assume that data structures are finite. This is important because
testing frameworks require that data is Show
-able in
order to print counter examples.)
Generators
Randomised property testing requires a way to generate the
random values. These generators are responsible for generating the
arguments to the property tests. An appropriate generator can often
be the difference between finding a bug, and not finding a bug.
For example, when only generating small lists, the following
property test may seem to hold:
(\ls -> length ls < 100)
It is important to note that not all types allow easy sampling
of values. Generating a value of type Word8
is
relatively simple: We can just choose a value between
0x00
and 0xff
uniformly at random.
However, generating a value from an unbounded type like
String
is tricky. Indeed, there is no way to sample a
value uniformly from the type String
because there are
infinitely many Strings, and there isn’t one obvious way to sample
from an infinite set. To solve this problem, generators often use
some way to guide how large of a value to generate. This is usually
called the size parameter. This size parameter then allows the
generator for String
s to generate a
String
up to the given size.
Generators can usually also be adapted to the exact test at
hand. For example, combinators can usually be composed to build
bigger combinators, or filtered to only return the generated values
that satisfy a given predicate.
Shrinking
When a property test fails, it is likely that the value that
made the test fail is large. This makes it hard to reason about why
the test failed. A standard solution to this problem is to try and
shrink the value that makes the test fail. The test is re-run with
smaller values to make sure that the test still fails with the
smaller value. After shrinking, the resulting value should be of a
more manageable size.
An example of shrinking
Suppose we believe that the following property holds:
After lower-casing every character in a string, all the
characters in this string will be lower-case.
\str -> all isLower (map toLower str)
When we run the property test without shrinking, we may get a
counterexample like the following:
"As5Enu3auO4"
It is certainly not immediately obvious to me why this is a
counterexample for the above property. However, when we rerun the
property test with shrinking, the counterexample is shrunk
to the following:
"1"
Now it becomes rather obvious why the test failed, if we start
debugging as follows:
λ toLower '1'
'1'
λ isLower '1'
False
Indeed, applying toLower
to the character
'1'
returns '1'
, but it is also not
considered lower-case by isLower
. In this case, the
property that we thought would hold just does not hold. We could
try to specialise it so that we get a property that does hold, but
that is beyond the scope of this example.
Note that the shrunk counterexample: "1"
is not
necessarily part of the original counter example. This brings me to
my next point:
Shrinking with invariants
It is important that the shrunk value does not just fail the
test, it must also have been generatable by the generator that was
used. For example, consider the following property test that you
may mistakenly write:
For every number larger than six: the number is larger than five
and odd. (pseudocode)
∀ n. (n > 6) => (n > 5 && odd n)
This property is designed to be tested with values larger than
six. We can constrain the set to draw random values from to the set
of integers larger than six, using randomised specialised testing.
However, if the shrinking method is not adapted as well, to
preserve this constraint on input set, then something unexpected
might occur. At some point, the generator might generate
8
as its random integer larger than 6
,
and fail the property test. Before this value is reported, it is
shrunk to zero because this is the smallest integer, and shrinking
might as well try this first. (Note that shrinking is for humans,
which is why integers are shrunk to zero, (debatably) the simplest
integer, instead of minBound :: Int
, the smallest
integer.) Zero still fails the property test, because zero is not
greater than five, so we report that this property test fails
because zero fails to make the property evaluate to
True
. This will make the output look something like
the following:
failed: the following number larger than six does not pass the property test: 0
It is therefore important that the shrinking method preserves
the intended invariants of the subset that one wants to test. A
value should only ever be shrunk to a smaller value that is still
within this subset.
Quickcheck approach and problems
The QuickCheck library was the pioneer of randomised property
testing in Haskell. QuickCheck makes use of a type class called
Arbitrary
.
The Arbitrary
type class contains two
functions:
You can run a quickcheck by passing something
Testable
to the
quickCheck
function. Something
Testable
is usually of the form FooBar ->
Bool
where FooBar
is a member of the
Arbitrary
type class. Quickcheck then generates random
values of type FooBar
and evaluates the property with
it to check that the resulting boolean is True
.
The example property of the reverse
function is
written as follows, using QuickCheck:
λ> quickCheck $ \(str :: String) -> reverse (reverse str) == str
+++ OK, passed 100 tests.
The QuickCheck approach has certain trade-offs, some of which
we’ll discover and analyse below.
Property combinators for non-specialised randomised property
testing
To test whether a given binary relation <|*|>
is reflexive, we can write a property test as follows:
\x -> x <|*|> x
If we want to test reflexivity of many binary relations, we may
write a function that takes a binary relation and produces a
property. Such a function is called a property combinator:
reflexive :: Arbitrary a => (a -> a -> Bool) -> Property
reflexive rel = \x -> x `rel` x
Now we can use this property combinator to test reflexivity of
any binary relation for which the arguments are a member of
Arbitrary
.
Note that there is an important limitation to this approach.
This property combinator can only test reflexivity of a given
binary relation if the relation is reflexive for ALL possible
values that may be generated by the arbitrary
generator.
This is already a problem with Rational
(from
GHC.Real
) and Eq
, for example.
The Eq
instance for Ratio a
asssumes
that values are normalised:
> 2 :% 2 == 1 :% 1
False
If arbitrary :: Gen Rational
ever generates
un-normalised values, then we can no longer test properties which
require that (==)
works as if the values are actual
ratios, even if the property holds for all values of
Rational
except un-normalised values. This brings me
to the next point
No laws or semantics for the Arbitrary
type
class
To continue with the previous example of Rational
.
How should we implement the Arbitrary
instance for
Rational
:
- Should we ever let it generate
0 :% 0
?
- Should we ever let it generate
1 :% 0
?
- Should we ever let it generate
5 :% -1
? How about
-5 :% 1
?
- Should we take the size parameter into account?
There are some considerations here:
- If
arbitrary :: Gen Rational
never generates
un-normalised values, then property tests about functions that deal
with (==)
assuming that values are actual ratios will
never test what happens for this edge-case. Maybe they work, maybe
they do not.
- If
arbitrary :: Gen Rational
generates
un-normalised values, then we cannot test properties which require
that (==)
works as if the values are actual ratios,
even if the property holds for all values of Rational
except un-normalised values.
There is a solution for problems like this in
QuickCheck
: Using newtypes to guide generation. For
example, the Positive a
newtype wrapper has a
specialised generator to make sure that arbitrary :: Positive
Rational
only generates positive Rational
values. One could imagine a newtype wrapper like Normalised
a
that could have a specialised generator to make sure that
Normalised Rational
never generates un-normalised
values, if arbitrary :: Gen Rational
is implemented to
sometimes generate un-normalised values.
However, this newtype approach has an important limitation as
well. The property test to test for reflexivity would now look
something like this:
reflexive (\(Normalised nn1) (Normalised nn2) -> nn1 == nn2)
(Note that you cannot just supply a custom generator if you want
to use property combinators that assume an Arbitrary
constraint.) This is rather cumbersome. It is a direct result of
the fact that values like 5 :% -5
are not considered
valid values. Who decided that these values are invalid even though
they are certainly members of their type? It is the programmer who
wrote the GHC.Real
module and decided that
Ratio
values should satisfy the invariant that they
are normalised. These invariants are not statically checked.
QuickCheck’s Arbitrary
typeclass makes it cumbersome
to work with values for which not all values are considered valid
values.
Expensive generators and shrinking
The authors of QuickCheck chose to not have a default
implementation for Arbitrary
. This is because there
are certain problems with the automatic generation of
Arbitrary
instances.
The QuickCheck documentation mentions that:
There is no generic arbitrary implementation included because we
don’t know how to make a high-quality one. If you want one,
consider using the testing-feat or generic-random packages.
There are multiple seperate libraries to allow programmers to
have GHC implement Arbitrary
instances automatically
using Generic
instances. However, using one of these
is not common practice.
As a result, there is no default implemetation of
arbitrary
and the default implementation for
shrink
is const []
. This means that by
default, shrink
is never overridden with a more
sensible implementation and counter examples are never shrunk.
Because programmers have to implement generators and shrinking
themselves, using QuickCheck (and Arbitrary
in
particular) can be considered expensive (in developer-time) and
friction-full.
Orphan instances for Arbitrary
A minor problem, but an annoying one nonetheless, is the
question of where to write the Arbitrary
instances.
- If we write the
Arbitrary
instances next to the
data types, then the entire library that we are writing must depend
on QuickCheck
.
- If we write the
Arbitrary
instances in the test
suite, then they are orphan instances and we have to re-write them
for every test suite.
- If we write the
Arbitrary
instances in a seperate
library, then we can depend on this library for the
Arbitrary
instances, but this adds maintenance
overhead.
The most commonly chosen solution seems to be to put
Arbitrary
instances in the
quickcheck-instances
library, but this has another
problem: now the test suite depends on extra libraries for which
the instances are in quickcheck-instances
, that the
test suite may not be using at all.
Hedgehog as a solution to QuickCheck’s problems
Hedgehog is a newer way to do
property testing.
The example property of the reverse
function is
written as follows, using Hedgehog:
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
prop_reverse :: Property
prop_reverse =
property $ do
xs <- forAll $ Gen.list (Range.linear 0 100) Gen.alpha
reverse (reverse xs) === xs
Hedgehog generators
Hedgehog uses a different implementation of generators. In
particular, hedgehog generators have built-in shrinking, so that
every generated value can automatically be shrunk without any extra
developer effort to implement a shrinking function. This built-in
shrinking also automatically takes care to ensure that shrunk
values obey the same invariants as the generator.
Whereas QuickCheck declares “shrink” as a function, in Hedgehog
generating and shrinking are performed simultaneously. Instead of
generating a single value, we can generate a rose tree of them,
where the main output is at the root, and the other nodes carry
shrunk versions of it, with the smallest ones at the leaves.
Generators are defined more generally, to allow for monadic
effects during generation. As such, generators are not seperated
from properties entirely. Indeed, as you can see in the
prop_reverse
example above, the generation of data and
testing of properties can be interleaved monadically.
Hedgehog also has adapters such that a programmer can use
QuickCheck generators in HedgeHog properties.
Sub-optimal shrinking
Hedgehog explicitly makes a tradeoff when it comes to shrinking.
The Hedgehog authors chose to make shrinking sub-optimal in order
to build it into generators.
The following is an example of such a tradeoff. Here Hedgehog is
generating two characters that are equal, by generating them
independently and then filterig the equal ones. It is a contrived
example, admittedly, but it makes my point.
λ > let gen = Gen.enum 'a' 'b' in Gen.printTree (Gen.filter (uncurry (==)) ((,) <$> gen <*> gen))
=== Outcome ===
('b','b')
=== Shrinks ===
('a','a')
('a','a')
('a','a')
As you can see, Hedgehog correctly shrinks the generated value
('b', 'b')
to tuples with equal characters ('a',
'a')
, but it shrinks to the same value many times. This
problem grows as the generated datastructure grows in size, and as
the amount of options to try while shrinking grows. This is the
main drawback with integrating shrinking with generating: It
requires some duplicated work during shrinking.
Side note: QuickCheck handles this case wrong as well, but in a
different way:
λ filter (uncurry (==)) $ QC.shrink ('b','b')
[]
Spoiler (see the rest of the post): genvalidity
handles this case correctly.
λ filter (uncurry (==)) $ shrinkUnchecked ('b', 'b')
[('a','a')]
Even more expensive generators
Hedgehog does not use any type classes for generators like
QuickCheck does with Arbitrary
. Instead, every
generator must be a manually-written value.
This means that Hedgehog does not have any problems with orphan
instances or with a lack of laws or semantics. However, it also
means that writing a generator for a value in Hedgehog now requires
even more developer effort.
More cumbersome property combinators.
It is still possible to write property combinators for Hedgehog
properties, but now the programmer will have to explicitly pass in
a given generator:
associative :: Gen a -> (a -> a -> a) -> Property
This makes writing property combinators more cumbersome, and
tests using property combinators more verbose.
Validity-based Testing as a solution to the problems of both
QuickCheck and Hedgehog
Validity-based testing attempts to use the advantages from each
of both QuickCheck and Hedgehog and make a different tradeoff. It
focuses on making property testing cheap (in terms of developer
time) and removing as much friction as possible.
The goals are:
- Free generators
- Free shrinking
- Cheap properties
The example property of the reverse function is written as
follows, using the validity
-based libraries:
forAllUnchecked $ \str -> reverse (reverse str) == (str :: String)
The GenUnchecked
type-class in the
genvalidity
library for free generators and free
shrinking.
The
GenUnchecked
type class defines both a generator:
genUnchecked :: Gen a
and a shrinking function
shrinkUnchecked :: a -> [a]
, just like
QuickCheck does with Arbitrary
.
Unlike QuickCheck’s Arbitrary
,
GenUnchecked
has very clear semantics: Any value that
can possibly be represented by the type should be generatable
(excluding bottom values and infinite values). Because of these
very clear semantics, the implementations of both the
genUnchecked
and shrinkUnchecked
functions are generated via the Generic
instance by
default. This means that the programmer never has to write such a
generator themselves, and shrinking comes for free as well.
The validity
library to define invariants
explicitly
The next step for validity-based testing is to assert that types
with internal invariants are as natural as types themselves.
Indeed, types exist to constrain the possible values of a given
variable using compile-time checks whereas runtime invariants exist
to constraint the possible values of a given variable using
programming-time cleverness. This will be important later, to
further remove friction, but requires a bit of extra up-front
thought.
The validity
library allows programmers to
explicitly define, in code (instead of documentation),
what it means for a value of a given type to be valid. Types that
have an instance of the Validity
typeclass implement
the validate :: a -> Validation
to explain all the
reasons why a value may be invalid. The semantics of this type
class is that an invalid value of any type should never exist at
runtime, otherwise the programmer has made a mistake.
For example, the type Map k v
has an internal
invariant: it should always have
Data.Map.valid :: Ord k => Map k v -> Bool
evaluate to True
.
Its Validity
instance defines this explicitly:
-- | A 'Map' of things is valid if all the keys and values are valid and the 'Map' itself
-- is valid.
instance (Ord k, Validity k, Validity v) => Validity (Map k v) where
validate m =
mconcat
[ declare "The Map structure is valid." $ M.valid m
, delve "Map elements" $ M.toList m
]
This implementation checks that the internal map structure is
valid: M.valid
, and that the individual elements of
the map are valid values by themselves as well.
The validate
function does not just check whether a
value is valid, but also reports all the reasons why a value may
not be valid. This makes for very useful error messages. For
example:
newtype Prime = Prime { unPrime :: Int } -- INVARIANT: isPrime
instance Validity Prime where
validate (Prime p) = declare "The number is prime" $ isPrime p
λ let Left e = prettyValidate [Prime 2, Prime 3, Prime 5, Prime 6]
λ putStrLn e
The element at index 3 in the list
\ Violated: The number is prime -- Aha!
The default implementation of Validity
uses a
type’s Generic
instance to derive an implementation
that defines a value to be valid if all of its sub-parts are valid.
This makes it frictionless to instantiate Validity
for
types of which the validity of its values depends only on its
sub-parts.
The GenValid
type-class in the
genvalidity
library for free generators and free
shrinking for valid values.
To generate valid values, genvalidity
has the
GenValid
type class. This type class also has very
clear semantics.
- The
genValid :: Gen a
generator must only ever
generate valid values.
- The
shrinkValid :: a -> [a]
shrinking function
should must only ever shrink valid values to valid values..
- The
genValid
generator should be able to generate
all possible valid values that can exist at runtime.
These semantics also allow for default implementations:
genValid :: GenValid a => Gen a
genValid = genUnchecked `suchThat` isValid
shrinkValid :: GenValid a => a -> [a]
shrinkValid = filter isValid . shrinkUnchecked
This makes it relatively frictionless to generate valid values
as well.
Note that if there are not many valid values of a given type, or
if checking validity is expensive, the default
genValid
implementation using suchThat
may be too slow, and a programmer may have to override it with a
faster implementation. This is the tradeoff that
genvalidity
makes to remove friction, but there are
helper functions in genvalidity
to write faster
generators too.
As a result, the following is all that’s necessary for a
programmer to write a custom type with generators, and start
writing property tests:
{-# LANGUAGE DeriveGeneric #-}
module Example where
data MyType = MyType
{ myBool :: Bool
, myRational :: Rational
} deriving (Show, Eq, Generic)
instance Validity MyType
instance GenUnchecked MyType
instance GenValid MyType
Overriding genValid
to make it faster
While genValid
is always correct by default, it can
sometimes be too slow. In such cases we want to override
genValid
to make it faster, but then it is up to us to
make sure that it is still correct. When overriding
genValid
, it is advisable to add
a test that makes sure that genValid
still only
generates valid values It is also important that
genValid
can generate any value that would be
considered valid. We cannot test for this, so it is up to the
programmer to maintain this property of the generator.
As an example, consider a newtype Prime = Prime Int
with a Validity
instance that describes that the
integer inside is prime. The genUnchecked
generator is
generated using the Generic
instance, and it works
exactly as it should. The default implementation of
genValid
is equivalent to the following:
genValid :: Gen Prime
genValid = genUnchecked `suchThat` isValid
There are two ways to speed up a generator that uses
suchThat
:
-
Generate values that are automatically valid, and don’t filter
them using isValid
afterward. This is particularly
useful if the isValid
predicate is expensive to
compute. In the case of Prime
: primality testing is
relatively expensive.
-
Generate values that are more likely to be valid than the ones
generated by genUnchecked
and then filter them with
isValid
. This is particularly useful if there aren’t
many valid values. In the case of Prime
: the
likelihood for an arbitrary number goes down as the number grows
larger.
In the case of the Prime
example, we could speed up
genValid
by replacing genUnchecked
by a
generator that either generates 2
, or an odd number.
The result would look like this:
genValid :: Gen Prime
genValid = (oneof [pure 2, (\x -> 2*x + 1) <$> genUnchecked]) `suchThat` isValid
Cheap properties using property combinators from
genvalidity-property
Once these type classes have been implemented, a programmer can
use the many property combinators from
genvalidity-property
and even test suite combinators
from genvalidity-hspec
.
-- # Property combinators
-- Test whether two functions are inverses
inverseFunctions :: (Show a, Eq a, GenUnchecked a) => (a -> b) -> (b -> a) -> Property
-- Test whether a binary relation is symmetric
symmetry :: (Show a, GenUnchecked a) => (a -> a -> Bool) -> Property
-- Test whether a binary operator is commutative:
commutative :: (Show a, Eq a, GenUnchecked a) => (a -> a -> a) -> Property
-- Test whether a function maintains validity
producesValidsOnValids :: (Show a, Show b, GenValid a, Validity b) => (a -> b) -> Property
Each property combinator in genvalidity-property
comes in multiple variants:
symmetry
for unchecked values
symmetryOnValid
for values generated by
GenValid
symmetryOnArbitrary
for values generated by
Arbitrary
(for compatibility with QuickCheck)
symmetryOnGens
for values generated by a given
generator
Cheap properties using test suite combinators from
genvalidity-hspec
With the TypeApplications
extension that is new in
GHC 8.0, we can now write entire test suites that are parametric in
type argument (without the use of proxies). These so-called test
suite combinators are of the form testSuiteCombinator ::
forall a. Spec
.
The genvalidity-hspec
library and its companion
libraries already contain many test suite combinators.
-- Test whether a custom Eq instance of a type makes sense
eqSpec :: forall a. (Show a, Eq a, Typeable a, GenUnchecked a) => Spec
-- Test that a Monad doesn't violate the monad laws
monadSpec :: forall (f :: * -> *). (Monad f, Eq (f Int), Show (f Int), Typeable f, GenUnchecked (f Int)) => Spec
-- Test whether a lens doesn't violate the lens laws.
lensSpec :: forall s b. (Show b, Eq b, GenUnchecked b, Validity b, Show s, Eq s, GenUnchecked s, Validity s) => Lens' s b -> Spec
Each property combinator in genvalidity-hspec
comes
in multiple variants:
eqSpec
for unchecked values
eqSpecOnValid
for values generated by
GenValid
eqSpecOnArbitrary
for values generated by
Arbitrary
(for backward compatibility)
eqSpecOnGens
for values generated by a given
generator
Conclusion
Property testing can be a great tool in your quality arsenal.
This point has outlined the major similarities and differences
between property testing tools. You should now be more equipped to
make the necessary tradeoffs between these tools, but more
important than trying to optimise your usage is that you give
property testing a try.
Below are some exercises to get you started with Validity-based
testing. validity
and the related libraries can all be found on Stackage.
Exercise: Odd
Objective: Generate valid Odd
values using
genValid
by completing the following piece of
code.
newtype Odd = Odd Int -- INVARIANT: The Int must be odd
Solution:
instance Validity Int where
validate (Odd i) = declare "The Int is odd" $ odd i
instance GenUnchecked Odd
instance GenValid Odd
Exercise: JSON
Objective: Test whether the JSON instance in the following piece
of code works, using genvalidity-hspec-aeson
, by
completing the following piece of code:
data MyType = MyType
{ myBool :: Bool
, myRational :: Rational
} deriving (Show, Eq, Generic)
instance FromJSON MyType where
parseJSON = withObject "MyType" $ \o -> MyType <$> o .: "bool" <*> o .: "Rational"
instance ToJSON MyType where
toJSON mt = object ["bool" .= (myBool mt), "double" .= (myRational mt)]
Solution:
data MyType = MyType
{ myBool :: Bool
, myRational :: Rational
} deriving (Show, Eq, Generic)
instance FromJSON MyType where
parseJSON = withObject "MyType" $ \o -> MyType <$> o .: "bool" <*> o .: "Rational"
instance ToJSON MyType where
toJSON mt = object ["bool" .= (myBool mt), "double" .= (myRational mt)]
instance Validity MyType
instance GenUnchecked MyType
instance GenValid MyType
spec :: Spec
spec =
describe "MyType" $
jsonSpecOnValid @MyType
Subscribe to our blog via email
Email subscriptions come from our Atom feed and are handled by Blogtrottr. You will only receive notifications of blog posts, and can unsubscribe any time.
Do you like this blog post and need help with Next Generation Software Engineering, Platform Engineering or Blockchain & Smart Contracts? Contact us.