DFA in Java

In this tutorial I will show you how to implement any deterministic finite automaton (DFA) in Java. Before we start, would be good thing to refresh our knowledge about them (see Theory of DFA ).

DFA as 5-tuple

For implementation, I will use DFA given in Image 1.0.0 . While state diagrams are fun to look at, they are not as useful as Definition 1.0.0 . It will be much easier for us to implement DFA looking at it as 5-tuple rather using state diagram. Thus, DFA given in Image 1.0.0 is described as

$$M = (Q, Σ, δ, q_0, F )$$, where

  1. $$Q = \{q_0, q_1, q_2, q_3, q_4\}$$,
  2. $$Σ = \{$$:$$,$$)$$,$$($$,$$_$$\}$$
  3. $$q_0 = q_0$$,
  4. $$F = \{q_2, q_3\}$$.
  5. $$δ$$ is described as a transition table
$$:$$ $$)$$ $$($$ $$\_$$
$$q_0$$ $$q_1$$ $$q_4$$ $$q_4$$ $$q_0$$
$$q_1$$ $$q_4$$ $$q_2$$ $$q_2$$ $$q_4$$
$$q_2$$ $$q_4$$ $$q_4$$ $$q_4$$ $$q_3$$
$$q_3$$ $$q_1$$ $$q_4$$ $$q_4$$ $$q_3$$
$$q_4$$ $$q_4$$ $$q_4$$ $$q_4$$ $$q_4$$
Table 1.0.0: Transition table of DFA.

Implementing states

Given all this information, we can start coding. We see that automaton has 4states: $$q_0$$, $$q_1$$, $$q_2$$, $$q_3$$, $$q_4$$. All of them are known in advance. Thus, we can create private enum States {...} to represent them. Nice thing about Java enum, is that you can define methods and instance variables in them. For example, to differentiate accept and non accept states, we can introduce instance variable final boolean accept; for each state.

Source code 1.0.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
         
private enum States {

  Q0(false), Q1(false), Q2(true), Q3(true), Q4(false);

  final boolean accept;

  // Constructor for sate.
  // Every state is either accepting or not.
  States(boolean accept) {
    this.accept = accept;
  }

//...
}
      
Nothing fancy going on in Source code 1.0.0 . Line 1 declares enum. Notice, because it will be inside of another class, enum has private access modifier. Line 3 creates all four states and represents $$Q$$ from definition. $$F$$ is represented as an instance variable inside enum. We could create set consisting of all accept states as per definition. But remember, Java is object oriented, thus better treat states as objects, rather as data structures, in this case sets. Lastly, on Line 9 we are passing boolean value to the constructor. Line 13 indicates that this class is not finished.

Implementing transitions

Just like that, we defined automaton's states $$Q$$ and a subset of accepting states $$F$$. Now, let's think how to implement transitions. Probably your first thought is: we can use two dimensional array or a map or a hashtable. All these solutions are good, and they would work. Unfortunately they are not very object oriented. Think of a state as being an object, to which you can send some messages for enquiry or making them to do something for you. For example, we can ask it if it is an accept state or not? By introducing a method boolean isAccept() {...} which would return it's instance variable accept. But in our case, it is not necessary, because private enum States {...} will be a member of public class DFA {...}, thus all its variables will be accessible to outer class.

Similarly, given a character from the alphabet, we can ask a state to transit to another. So, instead of writing transition table and then passing it to some kind of function, we will define method States transition(char ch) {...} inside private enum States {...} to get next state. For that we will have to introduce additional instance variables in the enum. Each variable will contain a reference to the state. I think, to show is much easier than to explain.

Source code 1.0.1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
         
private enum States {

  Q0(false), Q1(false), Q2(true), Q3(true), Q4(false);

  States eyes;
  States smile;
  States sad;
  States space;

  static {
    Q0.eyes = Q1; Q0.smile = Q4; Q0.sad = Q4; Q0.space = Q0;
    Q1.eyes = Q4; Q1.smile = Q2; Q1.sad = Q2; Q1.space = Q4;
    Q2.eyes = Q4; Q2.smile = Q4; Q2.sad = Q4; Q2.space = Q3;
    Q3.eyes = Q1; Q3.smile = Q4; Q3.sad = Q4; Q3.space = Q3;
    Q4.eyes = Q4; Q4.smile = Q4; Q4.sad = Q4; Q4.space = Q4;
}

  States transition(char ch) {
    switch (ch) {
      case ':':
        return this.eyes;
      case ')':
        return this.smile;
      case '(':
        return this.sad;
      case '_':
        return this.space;
      default:
        throw new RuntimeException("Symbol is " +
           "not in the alphabet");
  }
}

//...
}
      

First few lines are nothing new. On the Line 5, like I said, we are introducing some instance variables. There are four symbols in the alphabet, thus each state will have four transitions. For example States eyes; represents a state, to which current state would transit if input symbol would be :.

Given these extra variables, we need to assign values to them. We do this in static {...} block. You may ask Why can't we pass values as arguments to the constructor, like we did for final boolean accept; variable. For two reasons:

  1. Forward referencing. When we are trying to assign a state to the variable, some of the states are not have been initialised. For example, to assign Q1 to eyes in Q0 we need reference of Q1, but it is not initialised yet, so we get an error.
  2. Circular referencing. It is similar to forward referencing. We are trying to pass non existing state to the constructor. The difference is, that constructor belongs to the state itself. For example, passing Q4 to Q4 constructor.

On the other hand, in static {...} block all the states are initialised and available to use. For convenience I arranged assignments of variables, so that they would resemble transition table. Also, lets not forget about States transition(char ch) {...} method which checks what kind of character is passed, and based on that returns next state. We could have transition function in public class DFA {...} class, it would make no difference.

Fixing transition method

At this moment you should notice one thing: transition method States transition(char ch) {...} breaks definition of DFA as 5-tuple .

Because it accepts more than four symbols of our alphabet. In fact, it accepts any $$Unicode$$ character. Therefore, $$Σ = \{$$:$$,$$)$$,$$($$,$$_$$\} \neq Unicode $$. There are few ways how to fix it:

  • By encapsulating alphabet.
  • By changing DFA itself.

By encapsulating alphabet

To enforce $$Σ = \{$$:$$,$$)$$,$$($$,$$_$$\}$$ we could encapsulate alphabet. By creating private enum Alphabet {...} and changing transition signature to States transition(Alphabet symbol) {...}.

Source code 1.0.2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
         
private enum Alphabet {

  EYES, SMILE, SAD, SPACE
}

private enum States {

  //...

  States transition(Alphabet symbol) {
    switch (symbol) {
      case EYES:
        return this.eyes;
      case SMILE:
        return this.smile;
      case SAD:
        return this.sad;
      case SPACE:
        return this.space;
      default:
        return null;
    }
  }
}
      

Now, would be impossible to pass non exciting character to the transition method. While this approach would solve our problem, it would create new one: what to do with other $$Unicode$$ characters which are not in the alphabet? In the original Source code 1.0.1 we threw an exception in the transition method. Now we could to the same, only outside DFA. Or change strings to List<Alphabet>. The first approach would solve no problem, while latter would be impractical, unless DFA would be confined in Java program and wouldn't interact with outside word.

By changing DFA itself

The main problem is an alphabet $$Σ$$. We want to change it from $$\{$$:$$,$$)$$,$$($$,$$_$$\}$$ to $$Unicode$$. Image 1.0.2 shows a state diagram of a new DFA.

State diagram of DFA with Unicode alphabet.
Image 1.0.2: State diagram of DFA with Unicode alphabet.

Notice, we removed all symbols from the arrows pointing to the state $$q_4$$. An empty arrow stands for a transition from a state $$q_i$$, where $$i ∈ \{0, 1, 2, 3\}$$, to $$q_4$$ if and only if there are no transition from $$q_i$$ to $$q_i$$, given a symbol from the $$Unicode$$. For example, if current state is $$q_0$$ then given : DFA transitions to $$q_1$$, while given ^ to $$q_4$$. Source code 1.0.3 shows new implementation of private enum States {...} .

Source code 1.0.3
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
         
private enum States {

  //...

  static {
    Q0.eyes = Q1;  Q0.space = Q0;
    Q1.smile = Q2; Q1.sad = Q2;
    Q2.space = Q3;
    Q3.eyes = Q1; Q3.space = Q3;
  }

  States transition(char ch) {
    switch (ch) {
      case ':':
        return this.eyes == null? Q4 : this.eyes;
      case ')':
        return this.smile == null? Q4: this.smile;
      case '(':
        return this.sad == null? Q4: this.sad;
      case '_':
        return this.space == null? Q4: this.space;
      default:
        return Q4;
    }
  }

  //...
      

So, we removed some of the assignments of states instance variables and changed transition method. Now, before returning next state it checks if instance variable is null or not. In case of null and by default it returns Q4 state.

Notice, even we did not define all possible transitions of the state, this DFA implementation conforms with Definition 1.0.0 , because for every state and every symbol from the alphabet there is a state.

Implementing $$δ^*$$

The only thing what left is to implement extended transition function $$δ^*$$ (see Definition 1.0.2 ). We will do this by creating public boolean accept(String string) {...} method in public class DFA {...} which checks if automaton accepts string.

Source code 1.0.4
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
         
public class DFA {

  private enum States {
    //...
  }

  public boolean accept(String string) {
    States state = States.Q0;

    for (int i = 0; i < string.length(); i++) {
      state = state.transition(string.charAt(i));
    }
    return state.accept;
  }
}
      

Additional notes

Using enums we can implement any DFA. Unfortunately they are not very flexible. Meaning, that if we would like to introduce additional state or extra character, then we would need to modify enum itself, which violates Open/Close principle (Wikipedia) . For that we could use State pattern (Wikipedia) . But if you are thinking about modification of DFA, then probably your don't need DFA in the first place. Because by definition, DFA (Deterministic Finite Automaton) is finite, meaning it has finite set of states and deterministic - its next state is determined by input symbol.

Download (GitHub)