In order to solve almost any problem, that requires manipulation of data, the
very first question should be: What is the most suitable data structure that we
can use for the problem at hand?
Preface
Haskell is a pure functional language, thus, by it's nature, most of the
available containers are immutable and, without a doubt, the most common one is
a list [a]
. Certainly, it is not efficiency that made it so popular, but
rather its simplicity, consequently, it is also the first data structure, that
you get introduced to while learning Haskell. Although, lists are perfectly
suitable for some problems, more often than not, we need something that is
tailored to how we are trying to use our data.
Here are some situations
that containers package can
be of help. It provides efficient implementation of some of the most commonly
used containers used in programming:
Data.Set
- you care about uniqueness and possibly the order of elements.
Data.Map
- you need a mapping from unique keys to values and operations you
perform can take advantage of ordering of keys.
Data.IntSet
and Data.IntMap
- just as above, but when elements and keys
respectively are Int
s.
Data.Sequence
- can be of use when a linear structure is required for a
finite number of elements with fast access from both of its sides and a fast
concatenation with other sequences.
Data.Tree
and Data.Graph
- for describing more complicated relations of
elements.
Map
Map
is one of the most interesting and commonly used abstractions from the
package, so most of the examples will be based on it. Moreover, interface
provided for Map
, IntMap
, Set
and IntSet
is very similar, so
analogous examples can be easily derived for all of the above.
Setup a problem.
One of the common mappings in real life, that we encounter, is a person's
identification number, that maps a unique number to an actual human
being. Social Security Number (SSN) is what normally used for that purpose in the
USA and despite that it is not totally unique, for demonstration purpose, we can
assume it actually is. Although it is a 9 digit number and using IntMap
would
be more efficient, it does have some structure to it and we will take advantage
of it, so we will use a custom data type SSN
as a key.
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.List as List
import Data.Monoid
import Text.Printf
-- | Social Security Number. Commonly used as a unique identification number of a
-- person.
data SSN = SSN
{ ssnPrefix :: Int
, ssnInfix :: Int
, ssnSuffix :: Int
} deriving (Eq, Ord)
instance Show SSN where
show (SSN p i s) = printf "%03d-%02d-%04d" p i s
data Gender = Male | Female deriving (Eq, Show)
data Person = Person
{ firstName :: String
, lastName :: String
, gender :: Gender
} deriving (Eq)
instance Show Person where
show (Person fName lName g) = fName ++ ' ':lName ++ " (" ++ show g ++ ")"
type Employees = Map.Map SSN Person
I would like to stress how important Eq
and Ord
instances of a data type
used as a key actually are. Because they are essential for underlying representation
of a Map
, providing incomplete or incorrect instances for these classes will
lead to some strange behavior of your data mappings, therefore, either make sure
you know what you are doing when creating custom instances, or use derived
instances, as they are always safe.
Because Social Security Numbers have a specific structure we would like to
enforce it by providing a constructor function that performs certain validations.
mkSSN :: Int -> Int -> Int -> SSN
mkSSN p i s
| p <= 0 || p == 666 || p >= 900 = error $ "Invalid SSN prefix: " ++ show p
| i <= 0 || i > 99 = error $ "Invalid SSN infix: " ++ show i
| s <= 0 || s > 9999 = error $ "Invalid SSN suffix: " ++ show s
| otherwise = SSN p i s
Converting maps
Let's go ahead and create our mapping of employees using Map.fromList
:
employees :: Employees
employees =
Map.fromList
[ (mkSSN 525 21 5423, Person "John" "Doe" Male)
, (mkSSN 521 01 8756, Person "Mary" "Jones" Female)
, (mkSSN 585 11 1234, Person "William" "Smith" Male)
, (mkSSN 525 15 5673, Person "Maria" "Gonzalez" Female)
, (mkSSN 524 34 1234, Person "Bob" "Jones" Male)
, (mkSSN 522 43 9862, Person "John" "Doe" Male)
, (mkSSN 527 75 1035, Person "Julia" "Bloom" Female)
]
As you can see above, there is no particular order to our data as we defined it,
which results in creation of a Map
in O(n*log n) time complexity, but, if we
were sure ahead of time, that our list was sorted and unique with respect to the
first element of a tuple, it would be more efficient to use Map.fromAscList
,
which would run in O(n) complexity instead.
Operate on data.
Now that we have our program properly set up, most of available functions will
correspond directly to functions that we might try to use on our data,
e.g.:
lookupEmployee :: SSN -> Employees -> Maybe Person
lookupEmployee = Map.lookup
which does exactly what is expected of it:
λ> lookupEmployee (mkSSN 524 34 1234) employees
Just Bob Jones (Male)
λ> lookupEmployee (mkSSN 555 12 3456) employees
Nothing
In order to refrain from redefining functions which trivially correspond to existing
ones, let's go through some of them:
- Checking presence of an employee by the social security number:
λ> mkSSN 585 11 1234 `Map.member` employees
True
λ> mkSSN 621 24 8736 `Map.member` employees
False
- Looking up an employee with a default name:
λ> Map.findWithDefault (Person "Bill" "Smith" Male) (mkSSN 585 11 1234) employees
William Smith (Male)
λ> Map.findWithDefault (Person "Anthony" "Richardson" Male) (mkSSN 621 24 8736) employees
Anthony Richardson (Male)
- Getting total number of employees.
λ> Map.size employees
7
λ> Map.size $ Map.delete (mkSSN 585 11 1234) employees
6
λ> Map.size $ Map.delete (mkSSN 621 24 8736) employees
7
λ> Map.size $ Map.insert (mkSSN 621 24 8736) (Person "Anthony" "Richardson" Male) employees
8
Folding
It would be useful to present our Employees
in a user friendly format, so
let's define a showMap
function by converting our Map
to a printable string:
showMap :: (Show k, Show v) => Map.Map k v -> String
showMap = List.intercalate "\n" . map show . Map.toList
Let's give it a try:
λ> putStrLn $ showMap employees
(521-01-8756,Mary Jones (Female))
(522-43-9862,John Doe (Male))
(524-34-1234,Bob Jones (Male))
(525-15-5673,Maria Gonzalez (Female))
(525-21-5423,John Doe (Male))
(527-75-1035,Julia Bloom (Female))
(585-11-1234,William Smith (Male))
As you can see, all employees are sorted by their SSN, so conversion to a list is
done in ascending order, but it is worth noting, that if there is a desire to
guarantee this behavior, Map.toAscList
should be used instead, or
Map.toDescList
to get it in a reverse order.
Conversion is nice an simple, but how about using folding in a way that is
native to a Map
? While we are at it, we should probably improve formatting as
well.
showEmployee :: (SSN, Person) -> String
showEmployee (social, person) =
concat [show social, ": ", show person]
showEmployees :: Employees -> String
showEmployees es
| Map.null es = ""
| otherwise = showE ssn0 person0 ++ Map.foldrWithKey prepender "" rest
where
showE = curry showEmployee
((ssn0, person0), rest) = Map.deleteFindMin es
prepender key person acc = '\n' : showE key person ++ acc
printEmployees :: Employees -> IO ()
printEmployees = putStrLn . showEmployees
Now that looks slightly better:
λ> printEmployees employees
521-01-8756: Mary Jones (Female)
522-43-9862: John Doe (Male)
524-34-1234: Bob Jones (Male)
525-15-5673: Maria Gonzalez (Female)
525-21-5423: John Doe (Male)
527-75-1035: Julia Bloom (Female)
585-11-1234: William Smith (Male)
Exercise: Using the fact that List is an instance of Monoid
and implement
showEmployees
with the help of Map.foldMapWithKey
.
Exercise: Implement showEmployeesReversed
using Map.foldlWithKey
(hint: use Map.deleteFindMax
).
Mapping
There is a collection of mapping functions available, which range from a simple
Map.map
to a more complex Map.mapAccumRWithKey
.
Because Map
is an instance of Functor
we can use fmap
for mapping a
function over it's values, but for this example we will use it's native equivalent
Map.map
to retain only last names of employees and Map.elems
to retrieve a list of
new elements:
λ> Map.elems $ Map.map lastName employees
["Jones","Doe","Jones","Gonzalez","Doe","Bloom","Smith"]
If for some reason, we would like to map a function over keys, we can use
Map.mapKeys
for this purpose, for instance getting a list of all SSN prefixes:
λ> Map.keys $ Map.mapKeys (show . ssnPrefix) employees
["521","522","524","525","527","585"]
We need to pay some extra attention to usage of Map.mapKeys
, because whenever
a function that is being mapped over is not 1-to-1, it is possible that some
values will be lost. Although, sometimes discarding some elements maybe desired
or simply irrelevant, here is an example of how we can mistakenly lose an
employee if we assume that last four numbers of a social are unique:
λ> putStrLn $ showMap $ Map.mapKeys ssnSuffix employees
(1035,Julia Bloom)
(1234,William Smith)
(5423,John Doe)
(5673,Maria Gonzalez)
(8756,Mary Jones)
(9862,John Doe)
If we are sure that our function is not only 1-to-1, but it is also monotonic
(i.e. it doesn't change the order of the resulting keys) we could use a more
efficient mapping function Map.mapKeysMonotonic
. Say a show
function on
SSN
would be safe to use, since ordering would be preserved. One of the
simplest examples of a non-monotonic function would be negate
and a
strictly-monotonic one: succ
.
Filtering
Let's start with a couple of simple examples:
λ> printEmployees (Map.filter (("Jones"==) . lastName) employees)
521-01-8756: Mary Jones (Female)
524-34-1234: Bob Jones (Male)
Partitioning by gender:
λ> let (men, women) = Map.partition ((Male==) . gender) employees
λ> printEmployees men
522-43-9862: John Doe (Male)
524-34-1234: Bob Jones (Male)
525-21-5423: John Doe (Male)
585-11-1234: William Smith (Male)
λ> printEmployees women
521-01-8756: Mary Jones (Female)
525-15-5673: Maria Gonzalez (Female)
527-75-1035: Julia Bloom (Female)
Prior to June 25th, 2011, Social
Security prefixes, were restricted to states where they were issued in. Let's
assume this is still the case and use this information to figure out which
states our employees received their Social Security Cards in.
First, we need to define a function, that retrieves employees within a
prefix range, so a naïve approach would be to use Map.filterWithKey
:
withinPrefixRangeNaive :: Int -> Int -> Employees -> Employees
withinPrefixRangeNaive prefixLow prefixHigh = Map.filterWithKey ssnInRange where
ssnInRange (SSN prefix _ _) _ = prefix >= prefixLow && prefix <= prefixHigh
which runs in O(n), but we can do better than that, simply by taking
advantage of ordering of keys:
withinPrefixRange :: Int -> Int -> Employees -> Employees
withinPrefixRange prefixLow prefixHigh =
fst . Map.split (SSN (prefixHigh + 1) 0 0) . snd . Map.split (SSN prefixLow 0 0)
employeesFromColorado :: Employees -> Employees
employeesFromColorado = withinPrefixRange 521 524
Naturally, this function will give us all employees that got their Social
Security Card in Colorado:
λ> printEmployees $ employeesFromColorado employees
521-01-8756: Mary Jones
522-43-9862: John Doe
524-34-1234: Bob Jones
That worked well for Colorado state, but some states have noncontiguous groups
of area numbers, which means we need to join together results from a couple of
ranges:
employeesFromNewMexico :: Employees -> Employees
employeesFromNewMexico es =
withinPrefixRange 525 525 es `Map.union` withinPrefixRange 585 585 es
Sets and Maps
Until previous example, we've looked at functions that deal only with a single
Map
, but most of the familiar functions from set theory are also made
available to us, so we can easily operate on more than one Map
/Set
. In order
to provide some meaningful examples let's define geographic regions of the USA
and their SSN prefix ranges:
Creation
data State =
Arizona | California | Colorado | NewMexico | Nevada | Oklahoma | Texas | Utah | ...
deriving (Show, Eq, Ord, Enum)
statePrefixRangeMap :: Map.Map State [(Int, Int)]
statePrefixRangeMap =
Map.fromList
[ (Arizona, [(526, 527)])
, (California, [(545, 573)])
, (Colorado, [(521, 524)])
, (NewMexico, [(525, 525), (585, 585)])
, (Nevada, [(530, 530), (680, 680)])
, (Oklahoma, [(440, 448)])
, (Texas, [(449, 467)])
, (Utah, [(528, 529)])
...
]
allStates :: Set.Set State
allStates = Set.fromDistinctAscList [toEnum 0 ..]
For compactness, only states that are considered the South West are included,
complete source code can be found as
a gist.
Note, that because we are using toEnum
and enumFrom
- (..
), we are guaranteed
that all States will be unique and in a proper ascending order, thus we are safe
to use Set.fromDistinctAscList
instead of a less efficient Set.fromList
or even
Set.fromAscList
.
Binary operations
Exclude employees from New Mexico:
λ> printEmployees (employees Map.\\ employeesFromNewMexico employees)
521-01-8756: Mary Jones
522-43-9862: John Doe
524-34-1234: Bob Jones
527-75-1035: Julia Bloom
Some common manipulations on Set
s:
λ> let fourCorners = Set.fromList [Arizona, NewMexico, Colorado, Utah]
λ> let borderStates = Set.fromList [California, Arizona, NewMexico, Texas]
λ> Set.union fourCorners borderStates
fromList [Arizona,California,Colorado,NewMexico,Texas,Utah]
λ> Set.intersection fourCorners borderStates
fromList [Arizona,NewMexico]
λ> Set.difference fourCorners borderStates
fromList [Colorado,Utah]
λ> Set.difference borderStates fourCorners
fromList [California,Texas]
λ> let symmetricDifference a b = Set.union a b Set.\\ Set.intersection a b
λ> symmetricDifference fourCorners borderStates
fromList [California,Colorado,Texas,Utah]
Conversion
Map a State to a set of SSN prefixes that correspond to it:
statePrefixMap :: Map.Map State (Set.Set Int)
statePrefixMap =
Map.fromSet (Set.fromList . concatMap (uncurry enumFromTo) . (statePrefixRangeMap Map.!)) allStates
Inverse of what we have above: Map from prefix to State
:
prefixStateMap :: Map.Map Int State
prefixStateMap = Map.foldlWithKey addPrefixes Map.empty statePrefixMap where
addPrefixes spm state = Map.union spm . Map.fromSet (const state)
Using above Map we can list all employees we have per state:
-- | Transform `Map` to another `Map`
statePersonsMap :: Employees -> Map.Map State [Person]
statePersonsMap = Map.foldlWithKey updateState Map.empty
where updateState sm ssn p =
case Map.lookup (ssnPrefix ssn) prefixStateMap of
Nothing -> sm
Just state -> Map.alter (consPerson p) state sm
consPerson p Nothing = Just [p]
consPerson p (Just ps) = Just (p:ps)
And create a Set
of all Social Security Numbers of our employees per State:
-- Transform `Map` to `Set` to `Map`
stateSocialsMap :: Employees -> Map.Map State (Set.Set SSN)
stateSocialsMap = Set.foldl updateState Map.empty . Map.keysSet
where updateState sm ssn =
case Map.lookup (ssnPrefix ssn) prefixStateMap of
Nothing -> sm
Just state -> Map.alter (addSSN ssn) state sm
addSSN ssn Nothing = Just $ Set.singleton ssn
addSSN ssn (Just ssnSet) = Just $ Set.insert ssn ssnSet
Here is a trial run on our employees
data base:
λ> putStrLn $ showMap $ statePersonsMap employees
(Arizona,[Julia Bloom (Female)])
(Colorado,[Mary Jones (Female),John Doe (Male),Bob Jones (Male)])
(NewMexico,[Maria Gonzalez (Female),John Doe (Male),William Smith (Male)])
λ> putStrLn $ showMap $ stateSocialsMap employees
(Arizona,fromList [527-75-1035])
(Colorado,fromList [521-01-8756,522-43-9862,524-34-1234])
(NewMexico,fromList [525-15-5673,525-21-5423,585-11-1234])
Alternative approach
Although desired result is achieved, we can do better in terms of performance,
since above implementation does not take advantage of the ordering of Social
Security Numbers that is made available to us by Map
interface.
First, we want to generalize our employeesFrom*
functions and then use it to
partition our Employees
:
employeesFrom :: State -> Employees -> Employees
employeesFrom state es = Map.unions $ map fromRange (statePrefixRangeMap Map.! state)
where fromRange (low, high) = withinPrefixRange low high es
allStateEmployeesMap :: Employees -> Map.Map State Employees
allStateEmployeesMap es = Map.fromSet (`employeesFrom` es) allStates
Calling allStateEmployeesMap
will produce undesired empty Maps of Employees
for some States, but there is a reason behind it, it will help us demonstrate
similarities between Map.filter
and Map.mapMaybe
:
statePersonsMap' :: Employees -> Map.Map State [Person]
statePersonsMap' = Map.map Map.elems . Map.filter (not . Map.null) . allStateEmployeesMap
stateSocialsMap' :: Employees -> Map.Map State (Set.Set SSN)
stateSocialsMap' = Map.mapMaybe nonEmptyElems . allStateEmployeesMap
where nonEmptyElems sem | Map.null sem = Nothing
| otherwise = Just $ Map.keysSet sem
Subset and Submap.
Here are two equivalent approaches of how we would check if we are missing a
State
from our Map
:
λ> Set.isProperSubsetOf (Map.keysSet $ allStateEmployeesMap employees) allStates
False
λ> Set.isProperSubsetOf (Map.keysSet $ statePersonsMap employees) allStates
True
λ> Map.isProperSubmapOfBy (const . const True) (allStateEmployeesMap employees) statePrefixMap
False
λ> Map.isProperSubmapOfBy (const . const True) (statePersonsMap employees) statePrefixMap
True
Conclusion
In this tutorial we looked at very useful Map
and Set
containers, but there
are plenty of other packages that provide implementations of all sorts of data
structures that might be a better match for your needs. In particular, if keys and
elements for Map
and Set
respectively are hashable and their ordering is of
no importance, it is recommended to use HashMap
and HashSet
from
unordered-containers instead.
That package provides very similar interface to one from containers library, but it has
much smaller memory impact and
better performance.
Here is a list of some packages and corresponding tutorials, that can help you choose and
get started with an appropriate data structure for your problem:
Strict vs lazy values
Summary: unless you really know what you're doing, always import the
.Strict
modules.
Maps are always strict in their keys, meaning that forcing a map
always forcing all of the keys. That's because it's necessary to
analyze the keys themselves to put them into the hash table or binary
tree appropriately. However, it's not necessary to be strict in the
values. The default modules (Data.Map
and Data.IntMap
) are
lazy. Use the .Strict
modules unless you have strong reason to do
otherwise.
Mutability
Unlike other languages, Maps are immutable. This means you can safely
pass them to other threads and functions, you never worry about data
races, etc. The functions that "mutate" actually return brand new map
values. If you need mutation across threads, you need to use a mutable
variable like IORef
or TVar
(which we haven't covered yet).
Downside of immutability: Maps do have a performance overhead. There's
a mutable hashtables library, but it's not nearly as well used as
containers and unordered-containers.
Exercises
Calculate the frequency of each byte available on standard
input. Example usage:
$ echo hello world | ./Main.hs
(10,1)
(32,1)
(100,1)
(101,1)
(104,1)
(108,3)
(111,2)
(114,1)
(119,1)
#!/usr/bin/env stack
-- stack --resolver lts-12.21 script
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map.Strict as Map
main :: IO ()
main = do
lbs <- BL.getContents
let add m w = Map.insertWith (+) w 1 m
mapM_ print $ Map.toList $ BL.foldl' add Map.empty lbs
Implement a MultiMap
based on Map
and Set
with the following
signature and provides instances for Show
, Eq
, Foldable
,
Semigroup
, and Monoid
.
newtype MultiMap k v
insert :: (Ord k, Ord v) => k -> v -> MultiMap k v -> MultiMap k v
delete :: (Ord k, Ord v) => k -> v -> MultiMap k v -> MultiMap k v
deleteAll :: Ord k => k -> MultiMap k v -> MultiMap k v
lookup :: Ord k => k -> MultiMap k v -> Set v
member :: (Ord k, Ord v) => k -> v -> MultiMap k v -> Bool
#!/usr/bin/env stack
-- stack --resolver lts-12.21 script
{-# LANGUAGE DeriveFoldable #-}
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)
import Data.Semigroup
import Data.Maybe (fromMaybe)
import Test.Hspec
import Prelude hiding (lookup)
newtype MultiMap k v = MultiMap
{ toMap :: Map k (Set v)
}
deriving (Show, Eq, Foldable)
instance (Ord k, Ord v) => Semigroup (MultiMap k v) where
MultiMap l <> MultiMap r = MultiMap $ Map.unionWith Set.union l r
instance (Ord k, Ord v) => Monoid (MultiMap k v) where
mempty = MultiMap mempty
mappend = (<>)
insert :: (Ord k, Ord v) => k -> v -> MultiMap k v -> MultiMap k v
insert k v (MultiMap m) =
MultiMap $ Map.insertWith Set.union k (Set.singleton v) m
delete :: (Ord k, Ord v) => k -> v -> MultiMap k v -> MultiMap k v
delete k v (MultiMap m) =
MultiMap $ Map.update fixSet k m
where
fixSet set0
| Set.null set1 = Nothing
| otherwise = Just set1
where
set1 = Set.delete v set0
deleteAll :: Ord k => k -> MultiMap k v -> MultiMap k v
deleteAll k (MultiMap m) = MultiMap $ Map.delete k m
lookup :: Ord k => k -> MultiMap k v -> Set v
lookup k (MultiMap m) = fromMaybe Set.empty $ Map.lookup k m
member :: (Ord k, Ord v) => k -> v -> MultiMap k v -> Bool
member k v (MultiMap m) = maybe False (v `Set.member`) $ Map.lookup k m
main :: IO ()
main = hspec $ do
it "member on empty fails" $ member () () mempty `shouldBe` False
it "insert/member" $ member () () (insert () () mempty) `shouldBe` True
it "insert/lookup"
$ lookup () (insert () () mempty) `shouldBe` Set.singleton ()
it "deleteAll" $ deleteAll () (insert () () mempty) `shouldBe` mempty
it "delete" $ delete () () (insert () () mempty) `shouldBe` mempty
it "<>" $
lookup () (insert () False mempty <> insert () True mempty) `shouldBe`
Set.fromList [False, True]
More exercises!
Exercise 1
Implement the printScore
function:
#!/usr/bin/env stack
-- stack --resolver lts-12.21 script
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
printScore :: Map String Int -> String -> IO ()
printScore = _
main :: IO ()
main =
mapM_ (printScore scores) ["Alice", "Bob", "David"]
where
scores :: Map String Int
scores = Map.fromList
[ ("Alice", 95)
, ("Bob", 90)
, ("Charlies", 85)
]
Exercise 2
We're going to write a program to figure out how much money people
have after a number of transactions. Fill in the implementation of
addMoney
:
#!/usr/bin/env stack
-- stack --resolver lts-12.21 script
import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HashMap
import Control.Monad.State
addMoney :: (String, Int) -> State (HashMap String Int) ()
addMoney = _
main :: IO ()
main =
print $ execState (mapM_ addMoney transactions) HashMap.empty
where
transactions :: [(String, Int)]
transactions =
[ ("Alice", 5)
, ("Bob", 12)
, ("Alice", 20)
, ("Charles", 3)
, ("Bob", -7)
]
The result should be:
fromList [("Bob",5),("Alice",25),("Charles",3)]
Exercise 3
The final output from the previous program is kind of ugly. Instead, I want it to say:
Alice: 25
Bob: 5
Charles: 3
Modify the program above to get that result.
BONUS: If you really want to experience real-world Haskell code, go
for better performance and use this module:
https://www.stackage.org/haddock/lts-12.21/bytestring-0.10.8.1/Data-ByteString-Builder.html.
Exercise 4
#!/usr/bin/env stack
-- stack --resolver lts-12.21 script
import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HashMap
import Data.List (sort)
type Name = String
type StudentId = Int
type Score = Double
students :: HashMap Name StudentId
students = HashMap.fromList
[ ("Alice", 1)
, ("Bob", 2)
, ("Charlie", 3)
]
scores :: HashMap Name Score
scores = HashMap.singleton "Bob" 90.4
noTestScore :: [Name]
noTestScore = _
main :: IO ()
main = do
putStrLn "The following students have not taken the test"
mapM_ putStrLn $ sort noTestScore
Implement noTestScore
such that the output is:
The following students have not taken the test
Alice
Charlie
NOTE: hard-coding ["Alice", "Charlie"]
is cheating! :)
Exercise 5
I want to drop the bottom 20% of test scores. Fill in the helper function.
NOTE I'm switching this exercise from lts-12.21 to lts-12.21. There's
a new helper library function available that makes this much more
straightforward to implement. Browse the docs at:
https://www.stackage.org/haddock/lts-12.21/containers-0.5.10.2/Data-Set.html
You'll also want to use the div
function, which does integration
division.
#!/usr/bin/env stack
-- stack --resolver lts-12.21 script
import Data.Set (Set)
import qualified Data.Set as Set
scores :: Set Int
scores = Set.fromList [1..10]
dropBottom20Percent :: Ord a => Set a -> Set a
dropBottom20Percent = _
main :: IO ()
main = print $ dropBottom20Percent scores
CHALLENGE: Do it for a HashSet
instead. Why is this different?
Exercise solutions
Exercise 1
#!/usr/bin/env stack
-- stack --resolver lts-12.21 script
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
printScore :: Map String Int -> String -> IO ()
printScore scores name =
case Map.lookup name scores of
Just score -> putStrLn $ concat
[ "Score for "
, name
, ": "
, show score
]
Nothing -> putStrLn $ "No score for " ++ name
main :: IO ()
main =
mapM_ (printScore scores) ["Alice", "Bob", "David"]
where
scores :: Map String Int
scores = Map.fromList
[ ("Alice", 95)
, ("Bob", 90)
, ("Charlies", 85)
]
Exercise 2
#!/usr/bin/env stack
-- stack --resolver lts-12.21 script
import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HashMap
import Control.Monad.State
addMoney :: (String, Int) -> State (HashMap String Int) ()
addMoney (name, amt) = modify $ HashMap.insertWith (+) name amt
main :: IO ()
main =
print $ execState (mapM_ addMoney transactions) HashMap.empty
where
transactions :: [(String, Int)]
transactions =
[ ("Alice", 5)
, ("Bob", 12)
, ("Alice", 20)
, ("Charles", 3)
, ("Bob", -7)
]
Exercise 3
The first thing to note is that the easiest way to sort the people is
to switch from a HashMap
to a Map
. A simple find-replace on the
file will work for this.
There are a few different solutions here. Perhaps the most obvious looks like this:
import Data.Foldable (for_)
for_ (Map.toList $ execState (mapM_ addMoney transactions) Map.empty)
$ \(name, total) -> putStrLn $ name ++ ": " ++ show total
However, this is arguably bad: we're doing more inside IO
than
really necessary. Instead, we should stick to pure code as much as
possible, and instead build up a String
value:
putStr $ mapToString $ execState (mapM_ addMoney transactions) Map.empty
where
transactions :: [(String, Int)]
transactions = ...
pairToString :: (String, Int) -> String
pairToString (name, total) = name ++ ": " ++ show total
pairsToString :: [(String, Int)] -> String
pairsToString = unlines . map pairToString
mapToString :: Map String Int -> String
mapToString = pairsToString . Map.toList
This reduces the surface area of code that can do dangerous things,
and makes it easier to test our pure functions. One downside of this
is that string concatenation isn't particularly efficient. Using a
bytestring builder approach is even better.
#!/usr/bin/env stack
-- stack --resolver lts-12.21 script
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.State
import Data.ByteString.Builder (Builder, hPutBuilder, intDec)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Monoid ((<>))
import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8Builder)
import System.IO (stdout)
addMoney :: (Text, Int) -> State (Map Text Int) ()
addMoney (name, amt) = modify $ Map.insertWith (+) name amt
main :: IO ()
main =
hPutBuilder stdout $
mapToBuilder $ execState (mapM_ addMoney transactions) Map.empty
where
transactions :: [(Text, Int)]
transactions =
[ ("Alice", 5)
, ("Bob", 12)
, ("Alice", 20)
, ("Charles", 3)
, ("Bob", -7)
]
pairToBuilder :: (Text, Int) -> Builder
pairToBuilder (name, total) =
encodeUtf8Builder name <> ": " <> intDec total <> "\n"
pairsToBuilder :: [(Text, Int)] -> Builder
pairsToBuilder = foldMap pairToBuilder
mapToBuilder :: Map Text Int -> Builder
mapToBuilder = pairsToBuilder . Map.toList
This is more involved than the way most people would write this
solution, but provides great performance. Note that it assumes your
output will be UTF8 encoded.
Exercise 4
noTestScore :: [Name]
noTestScore = HashMap.keys $ students `HashMap.difference` scores
Also: we can bypass the sort
if we move over to a Map
instead.
Exercise 5
dropBottom20Percent :: Ord a => Set a -> Set a
dropBottom20Percent s = Set.drop (Set.size s `div` 5) s