DFA in Java
In this tutorial I will show you how to implement any deterministic finite automaton (DFA) in Java. Before we start, would be good thing to refresh our knowledge about them (see Theory of DFA ).
DFA as 5-tuple
For implementation, I will use DFA given in Image 1.0.0 . While state diagrams are fun to look at, they are not as useful as Definition 1.0.0 . It will be much easier for us to implement DFA looking at it as 5-tuple rather using state diagram. Thus, DFA given in Image 1.0.0 is described as
$$M = (Q, Σ, δ, q_0, F )$$, where
- $$Q = \{q_0, q_1, q_2, q_3, q_4\}$$,
- $$Σ = \{$$
:
$$,$$)
$$,$$(
$$,$$_
$$\}$$ - $$q_0 = q_0$$,
- $$F = \{q_2, q_3\}$$.
- $$δ$$ is described as a transition table
$$:$$ | $$)$$ | $$($$ | $$\_$$ | |
---|---|---|---|---|
$$q_0$$ | $$q_1$$ | $$q_4$$ | $$q_4$$ | $$q_0$$ |
$$q_1$$ | $$q_4$$ | $$q_2$$ | $$q_2$$ | $$q_4$$ |
$$q_2$$ | $$q_4$$ | $$q_4$$ | $$q_4$$ | $$q_3$$ |
$$q_3$$ | $$q_1$$ | $$q_4$$ | $$q_4$$ | $$q_3$$ |
$$q_4$$ | $$q_4$$ | $$q_4$$ | $$q_4$$ | $$q_4$$ |
Implementing states
Given all this information, we can start coding. We see that automaton
has 4states: $$q_0$$, $$q_1$$, $$q_2$$, $$q_3$$, $$q_4$$. All of them are known in advance. Thus, we can create
private enum States {...}
to represent them. Nice thing about Java enum, is that you can define methods and instance variables in them. For example,
to differentiate accept and non accept states, we can introduce instance variable
final boolean accept;
for each state.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
private enum States {
Q0(false), Q1(false), Q2(true), Q3(true), Q4(false);
final boolean accept;
// Constructor for sate.
// Every state is either accepting or not.
States(boolean accept) {
this.accept = accept;
}
//...
}
private
access modifier. Line 3 creates all four states and represents $$Q$$ from definition. $$F$$ is
represented as an instance variable inside enum. We could create set consisting of all accept states as per definition. But
remember, Java is object oriented, thus better treat states as objects, rather as data structures, in this case sets.
Lastly, on Line 9 we are passing boolean
value to the constructor. Line 13 indicates that this
class is not finished.
Implementing transitions
Just like that, we defined automaton's states $$Q$$ and a subset of accepting states $$F$$. Now, let's think how to implement
transitions. Probably your first thought is: we can use two dimensional array or a map or a hashtable. All these
solutions are good, and they would work. Unfortunately they are not very object oriented. Think of a state as being
an object, to which you can send some messages for enquiry or making them to do something for you. For example, we can
ask it if it is an accept state or not? By introducing a method
boolean isAccept() {...}
which would return it's
instance variable accept
. But in our case, it is not necessary, because
private enum States {...}
will be a member of
public class DFA {...}
, thus all its variables will be accessible to outer class.
Similarly, given a character from the alphabet, we can ask a state to transit to another. So, instead of writing
transition table and then passing it to some kind of function, we will define method
States transition(char ch) {...}
inside
private enum States {...}
to get next state. For that we will have to introduce additional instance variables in the enum.
Each variable will contain a reference to the state. I think, to show is much easier than to explain.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
private enum States {
Q0(false), Q1(false), Q2(true), Q3(true), Q4(false);
States eyes;
States smile;
States sad;
States space;
static {
Q0.eyes = Q1; Q0.smile = Q4; Q0.sad = Q4; Q0.space = Q0;
Q1.eyes = Q4; Q1.smile = Q2; Q1.sad = Q2; Q1.space = Q4;
Q2.eyes = Q4; Q2.smile = Q4; Q2.sad = Q4; Q2.space = Q3;
Q3.eyes = Q1; Q3.smile = Q4; Q3.sad = Q4; Q3.space = Q3;
Q4.eyes = Q4; Q4.smile = Q4; Q4.sad = Q4; Q4.space = Q4;
}
States transition(char ch) {
switch (ch) {
case ':':
return this.eyes;
case ')':
return this.smile;
case '(':
return this.sad;
case '_':
return this.space;
default:
throw new RuntimeException("Symbol is " +
"not in the alphabet");
}
}
//...
}
First few lines are nothing new. On the Line 5, like I said, we are introducing some instance variables. There are four
symbols in the alphabet, thus each state will have four transitions. For example
States eyes;
represents a state, to which
current state would transit if input symbol would be :
.
Given these extra variables, we need to assign values to them. We do this in
static {...}
block. You may ask Why can't we pass
values as arguments to the constructor, like we did for final boolean accept;
variable. For two reasons:
- Forward referencing. When we are trying to assign a state to the variable, some of the states are not have been
initialised. For example, to assign
Q1
toeyes
inQ0
we need reference ofQ1
, but it is not initialised yet, so we get an error. - Circular referencing. It is similar to forward referencing. We are trying to pass non existing state to the constructor.
The difference is, that constructor belongs to the state itself. For example, passing
Q4
toQ4
constructor.
On the other hand, in static {...}
block all the states are initialised and available to use.
For convenience I arranged assignments of variables, so that
they would resemble transition table. Also, lets not forget about
States transition(char ch) {...}
method which checks what kind of
character is passed, and based on that returns next state. We could have transition function in
public class DFA {...}
class, it would
make no difference.
Fixing transition method
At this moment you should notice one thing: transition methodStates transition(char ch) {...}
breaks definition of
DFA as 5-tuple
.
Because it accepts more than four symbols of our alphabet. In fact, it accepts any $$Unicode$$ character.
Therefore, $$Σ = \{$$:
$$,$$)
$$,$$(
$$,$$_
$$\} \neq Unicode $$. There are few ways how to fix it:
- By encapsulating alphabet.
- By changing DFA itself.
By encapsulating alphabet
To enforce $$Σ = \{$$:
$$,$$)
$$,$$(
$$,$$_
$$\}$$ we could encapsulate alphabet. By creating
private enum Alphabet {...}
and changing transition signature to
States transition(Alphabet symbol) {...}
.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
private enum Alphabet {
EYES, SMILE, SAD, SPACE
}
private enum States {
//...
States transition(Alphabet symbol) {
switch (symbol) {
case EYES:
return this.eyes;
case SMILE:
return this.smile;
case SAD:
return this.sad;
case SPACE:
return this.space;
default:
return null;
}
}
}
Now, would be impossible to pass non exciting character to the transition method. While this approach would solve our problem,
it would create new one: what to do with other $$Unicode$$ characters which are not in the alphabet? In the original
Source code 1.0.1
we threw an exception in the transition method. Now we could to the same, only outside DFA. Or change
strings to List<Alphabet>
. The first approach would solve no problem, while latter
would be impractical, unless DFA would be confined in Java program and wouldn't interact with outside word.
By changing DFA itself
The main problem is an alphabet $$Σ$$. We want to change it from $$\{$$:
$$,$$)
$$,$$(
$$,$$_
$$\}$$ to $$Unicode$$.
Image 1.0.2
shows a state
diagram of a new DFA.
Notice, we removed all symbols from the arrows pointing to the state $$q_4$$. An empty arrow stands for
a transition from a state $$q_i$$, where $$i ∈ \{0, 1, 2, 3\}$$, to $$q_4$$ if and only if there are no transition from
$$q_i$$ to $$q_i$$, given a symbol from the $$Unicode$$. For example, if current state is $$q_0$$ then given
:
DFA transitions to $$q_1$$, while given ^
to $$q_4$$.
Source code 1.0.3
shows new implementation of private enum States {...}
.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
private enum States {
//...
static {
Q0.eyes = Q1; Q0.space = Q0;
Q1.smile = Q2; Q1.sad = Q2;
Q2.space = Q3;
Q3.eyes = Q1; Q3.space = Q3;
}
States transition(char ch) {
switch (ch) {
case ':':
return this.eyes == null? Q4 : this.eyes;
case ')':
return this.smile == null? Q4: this.smile;
case '(':
return this.sad == null? Q4: this.sad;
case '_':
return this.space == null? Q4: this.space;
default:
return Q4;
}
}
//...
So, we removed some of the assignments of states instance variables and changed transition method. Now, before returning
next state it checks if instance variable is null
or not. In case of null
and by default it returns Q4
state.
Notice, even we did not define all possible transitions of the state, this DFA implementation conforms with Definition 1.0.0 , because for every state and every symbol from the alphabet there is a state.
Implementing $$δ^*$$
The only thing what left is to implement extended transition function $$δ^*$$ (see
Definition 1.0.2
). We will
do this by creating
public boolean accept(String string) {...}
method
in public class DFA {...}
which checks if automaton accepts string.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
public class DFA {
private enum States {
//...
}
public boolean accept(String string) {
States state = States.Q0;
for (int i = 0; i < string.length(); i++) {
state = state.transition(string.charAt(i));
}
return state.accept;
}
}
Additional notes
Using enums we can implement any DFA. Unfortunately they are not very flexible. Meaning, that if we would like to introduce additional state or extra character, then we would need to modify enum itself, which violates Open/Close principle (Wikipedia) . For that we could use State pattern (Wikipedia) . But if you are thinking about modification of DFA, then probably your don't need DFA in the first place. Because by definition, DFA (Deterministic Finite Automaton) is finite, meaning it has finite set of states and deterministic - its next state is determined by input symbol.