One of the greatest things about writing code in Haskell is the wonderful libraries (incidently, one of the worst things about writing code in Haskell is the libraries). In particular the libraries for assessing the quality of your own code. I’m especially found of:
-
QuickCheck for a quick testing of the correctness of my code, by letting me specify the properties I want to check, and taking care of generating random test-cases.
-
Criterion for checking the time performance of my code in a statistical sound manner. Taking care of dealing with the garbage collector and such like.
In this post I give a quick tour of how to use these libraries.
Getting the libraries
Our first step is to install QuickCheck and Criterion:
$ cabal install criterion quickcheck
Now we are ready to go.
Backstory
The standard list-based implementation of quicksort that you so often see, seems ripe for some optimisations:
quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (p:xs) = quicksort lesser ++ [p] ++ quicksort greater
where
lesser = [ x | x <- xs, x < p]
greater = [ x | x <- xs, x >= p]
One obvious optimisation could be to only traverse xs
once, when partitioning the elements into lesser and greater (and equal) than p
. Here is one implementation that uses foldl'
for maximum performance (we hope):
kquicksort :: Ord a => [a] -> [a]
kquicksort [] = []
kquicksort (p:xs) = kquicksort lesser ++ equal ++ kquicksort greater
where
(lesser,equal,greater) = foldl' part ([],[p],[]) xs
part (l,e,g) x =
case compare x p of
LT -> (x : l, e, g)
GT -> (l, e, x : g)
EQ -> (l, x : e, g)
Checking some properties
The basic usage of QuickCheck is to specify the properties you care about as ordinary Haskell functions, and then you use the quickCheck
function to check the properties. In the following I use QC
as the qualified name for values from the QuickCheck library.
The first property we want to check is that kquicksort
returns a sorted result. To check this property we also define a helper predicate sorted
that checks that a list is sorted:
sorted :: Ord a => [a] -> Bool
sorted (x1:x2:xs) = x1 <= x2 && (sorted $ x2:xs)
sorted _ = True
prop_sorted :: Ord a => [a] -> Bool
prop_sorted xs = sorted $ kquicksort xs
The next properties that we want to check could be that kquicksort
is idempotent and if it is given an ordered argument it doesn’t mess it up:
prop_idempotent :: Ord a => [a] -> Bool
prop_idempotent xs = kquicksort xs == (kquicksort $ kquicksort xs)
prop_ordered :: Ord a => QC.OrderedList a -> Bool
prop_ordered (QC.Ordered xs) = xs == kquicksort xs
For prop_ordered
we use the type class OrderedList
to specify that this property should only hold for arguments that are sorted.
Now we can use the quickCheck
function to check all our properties:
checkAll = do
QC.quickCheck (prop_sorted :: [Int] -> Bool)
QC.quickCheck (prop_idempotent :: [Int] -> Bool)
QC.quickCheck (prop_ordered :: QC.OrderedList Int -> Bool)
The type constraints on all the properties are necessary, because that is what makes overloading works, so that quickCheck
can generate random test-cases. Alternatively we can give our properties a less general non-polymorphic type. However, by keeping the more general type for our properties we can check them for other types as well. For instance, we might want to check it for a custom type, like the following type Colour
:
data Colour = Red | Green | Blue
deriving (Eq, Show, Ord)
Before we can use QuickCheck with the Colour
type, we need to specify how to generate random values of this type. That is, make it an instance of the Arbitrary
type class:
instance QC.Arbitrary Colour where
arbitrary = QC.elements [Red, Green, Blue]
Now, we can check the prop_sorted
property by adding a simple type constraint:
QC.quickCheck (prop_sorted :: [Colour] -> Bool)
Checking the performance
Once we have convinced our-self the code could be correct, we can start worrying about performance. Thus, we turn to Criterion. When doing performance testing there are all sorts of gnarly details that can go wrong and ruin our experiments: we forget to force the lazy evaluation and thus measure the wrong thing, the running time of function we try to measure is to short to get a meaningful reading of with clock resolution on the computer we use, the garbage collector might introduce noise from one sampling into another, we might be using the computer for something while we run the experiments which could introduce noise, and so on.
In the following I use C
as the qualified name for values from the Criterion library.
Criterion takes care of all these concerns, and just present us with a simple interface where the only thing we need to specify is which functions we want to run, on what data, and how much we want the result to be evaluated. Thus, to benchmark kquicksort
on the list [20, 10, 30]
and get a fully evaluated result, we first use the nf
(for normal form) function from Criterion to get something Benchmarkable
:
C.nf kquicksort [20, 10, 30]
When we have something benchmarkable, we use the bench
function to label it and turn it into a Benchmark
C.bench "kquickcheck on short list" $ C.nf kquicksort [20, 10, 30]
When we have a list of benchmark we can hand it over to defaultMain
which makes an excellent main
body for us that will allow us to configure our benchmark from the command-line without recompiling the program. Without further ado:
makeList :: Int -> [Int]
makeList n = QC.unGen (QC.vector n) (R.mkStdGen 42) 25
main :: IO()
main = do
checkAll
let sizes = [ 10000, 20000, 50000, 75000
, 100000, 250000, 500000, 1000000]
let inputs = [(n, makeList n) | n <- sizes]
let benchmarks = [ C.bench (name ++ show n) $ C.nf sort ns
| (n, ns) <- inputs,
(name, sort) <- [("quicksort ", quicksort),
("kquicksort ", kquicksort)]]
C.defaultMain benchmarks
In the helper function makeList
I have reused the generator framework from QuickCheck for generating some random test data for my benchmark. For simple integer lists as we have here it is a bit overkill, but for more complex input it can be nice.
The complete program
If you want to find out if kquicksort
really is faster than quicksort
is here the complete program.
import qualified Criterion.Main as C
import qualified Test.QuickCheck as QC
import qualified Test.QuickCheck.Gen as QC
import Data.List(foldl')
import qualified System.Random as R
quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (p:xs) = quicksort lesser ++ [p] ++ quicksort greater
where
lesser = [ x | x <- xs, x < p]
greater = [ x | x <- xs, x >= p]
kquicksort :: Ord a => [a] -> [a]
kquicksort [] = []
kquicksort (p:xs) = kquicksort lesser ++ equal ++ kquicksort greater
where
(lesser,equal,greater) = foldl' part ([],[p],[]) xs
part (l,e,g) x =
case compare x p of
LT -> (x : l, e, g)
GT -> (l, e, x : g)
EQ -> (l, x : e, g)
sorted :: Ord a => [a] -> Bool
sorted (x1:x2:xs) = x1 <= x2 && (sorted $ x2:xs)
sorted _ = True
prop_sorted :: Ord a => [a] -> Bool
prop_sorted xs = sorted $ kquicksort xs
prop_idempotent :: Ord a => [a] -> Bool
prop_idempotent xs = kquicksort xs == (kquicksort $ kquicksort xs)
prop_ordered :: Ord a => QC.OrderedList a -> Bool
prop_ordered (QC.Ordered xs) = xs == kquicksort xs
data Colour = Red | Green | Blue
deriving (Eq, Show, Ord)
instance QC.Arbitrary Colour where
arbitrary = QC.elements [Red, Green, Blue]
checkAll = do
QC.quickCheck (prop_sorted :: [Int] -> Bool)
QC.quickCheck (prop_sorted :: [Colour] -> Bool)
QC.quickCheck (prop_idempotent :: [Int] -> Bool)
QC.quickCheck (prop_ordered :: QC.OrderedList Int -> Bool)
makeList :: Int -> [Int]
makeList n = QC.unGen (QC.vector n) (R.mkStdGen 42) 25
main :: IO()
main = do
checkAll
let sizes = [ 10000, 20000, 50000, 75000
, 100000, 250000, 500000, 1000000]
let inputs = [(n, makeList n) | n <- sizes]
let benchmarks = [ C.bench (name ++ show n) $ C.nf sort ns
| (n, ns) <- inputs,
(name, sort) <- [("quicksort ", quicksort),
("kquicksort ", kquicksort)]]
C.defaultMain benchmarks
Compile it with the command-line
$ ghc -O3 -W --make Quicksort.hs -o Quicksort