# Using a haskell dfa type to match strings

A friend of mine recently covered deterministic and non-deterministic finite state machines (automata for the cool kids) in one of his classes and passed along a sample problem for me to figure out. Informally, build a dfa that will match a given string against another string in the provided alphabet. An example of this class of problem would be to build one that matches 'aab' in any string for the alphabet [ab] ('aaba' would match, 'aba' would not).
To help with visualizing the answer for this example I've built a graph of the states and edges that make up our dfa (yay graphviz):
At each state the machine follows the edge that matches the next input from a given string. If it ever hits the fourth state, it's found a match. Checking the string "baab" for "aab" (for which our machine is built), starting at state one it would take the path 1 -b-> 1 -a-> 2 -a-> 3 -b-> 4, and conclude that "aab" was in fact contained within "baab".
After doing this on paper I thought it might be fun to use haskell and put together a function that took a dfa, a string to match, and produced a boolean result. The first step I took was to define a data type that represented the dfa.

type Dfa = [State] data State = State { identifier :: Int , edges :: [Edge] } | FinalState { identifier :: Int, edges :: [Edge] } data Edge = Edge { action :: (Char -> Bool), edge_identifier :: Int }This is an informal representation (you can find the formal one here) but here, a dfa is a set of States, a State is an identifier and a set of Edges, and an Edge is a function to determine if it is the edge to follow for a given input and its target state. If this looks a bit clunky at first a cleaned up version is further down. With my types in place I went ahead with building the functions to traverse a DFA for a given input:

match :: State -> Dfa -> String -> Bool match current_state _ [] = is_final current_state match current_state dfa (x:xs) = match next_state dfa xs where next_state = traverse dfa current_state x traverse :: Dfa -> State -> Char -> State traverse dfa state = (find_state dfa) . (next_id $ edges state) is_final (FinalState x y) = True is_final (State x y) = False find_state :: Dfa -> Int -> State find_state (st:sts) state_id | identifier st == state_id = st | otherwise = find_state sts state_id next_id :: [Edge] -> Char -> Int next_id (e:es) input | (action e) input = edge_identifier e | otherwise = next_id es inputThe short explanation is that for each input it checks the current state for where it should go next, having determined the destination state's id it finds that state in the dfa and then proceeds with the rest of the input. Now, to test a sample graph:

state1 = State 1 [ Edge is_a 2, Edge is_b 1] state2 = State 2 [ Edge is_a 3, Edge is_b 1] state3 = State 3 [ Edge is_a 3, Edge is_b 4] state4 = FinalState 4 [ Edge is_a 4, Edge is_b 4] graph = [state1, state2, state3, state4] is_a = (== 'a') is_b = (== 'b')You'll note that state4 refers to itself no matter the input so that the match function can operate exhaustively on the input string. And it works! But the best part of this whole exercise (for me at least) was yet to be realized because this whole first definition has been done in a very inelegant manner by failing to make use of haskell's lazy nature and recursive data types.

## Improved

The first, and easiest, thing to improve is how the Edges reference their destination states. Before, an integer was defined corresponding the identifier of the target state, but in haskell types can be defined recursively:type Dfa = [State] data State = State { edges :: [Edge] } | FinalState data Edge = Edge { condition :: (Char -> Bool), destination :: State }For reference:

type Dfa = [State] data State = State { identifier :: Int , edges :: [Edge] } | FinalState { identifier :: Int, edges :: [Edge] } data Edge = Edge { action :: (Char -> Bool), edge_identifier :: Int }Not only is the new definition much cleaner, but its also more intuitive. An edge simply hands off the _actual_ state its pointing at. No need to go find it. As a result the code necessary to operate the state machine is also significantly simplified:

match :: State -> String -> Bool match FinalState _ = True match (State _) [] = False match current_state (x:xs) = match next_state xs where next_state = traverse (edges current_state) x traverse :: [Edge] -> Char -> State traverse (e:es) input | (condition e) input = destination e | otherwise = traverse es inputNotice the type signature of traverse is very intuitive:

traverse :: [Edge] -> Char -> StateGiven a list of edges and an input it will give you the resulting state. Not the id, which would then be used to search for its corresponding state. Finally, the actual graph:

state1 = State [ Edge is_a state2, Edge is_b state1] state2 = State [ Edge is_a state3, Edge is_b state1] state3 = State [ Edge is_a state3, Edge is_b state4] state4 = FinalState is_a = (== 'a') is_b = (== 'b')