ε-NFA in Java
In previous tutorials we talked about DFA in Java and NFA in Java . Here we will focus on ε-NFA (Nondeterministic Finite Automaton) which may or may not have ε(read as "epsilon")-transitions.
What the heck are ε-transitions?
As you may guessed, the main difference between NFA and ε-NFA is ε-transitions. That is, ε-NFA may have one or more of them. I say may have because some ε-NFA don't have, in that case we will call it simply NFA. Thus, NFA can be seen as a special kind of ε-NFA without ε-transitions.
We have already encountered something similar. In Definition 1.0.2 I mentioned Kleene closure $$Σ^*$$ , which denotes all the strings over alphabet $$Σ$$. One element of $$Σ^*$$ is an empty string, denoted by $$ε$$. It was used in the base case of $$δ^∗$$ function $$δ^∗(q, ε) = q$$ to indicate the end of the string.
Similarly, ε-transition means a state's transition to the next state when input symbol is an empty string. In other words, a transition when there are no input symbol at all. Image 1.0.6 shows a state diagram of ε-NFA which recognises language of $$(ab ∪ a)^*$$.
We can see bunch of ε-transitions: from $$q_0$$ to $$q_1$$, then from $$q_1$$ to both $$q_2$$ and $$q_6$$ and so on.
How should we interpret all those transitions? Let's say automaton's current state is $$q_0$$, then without
any input it transitions to $$q_1$$, from there to $$q_2$$ and $$q_6$$. So actually, at the beginning of computation
automaton is in 4 states. Or we can think of it as 4 parallel machines computing simultaneously.
Image 1.0.7
shows a computation tree for ab
string.
First of all, computation begins at the start state $$q_0$$ (root node). Notice, $$q_0$$ has ε-transition to $$q_1$$, thus we create new branch node with a leaf of $$q_1$$. From $$q_1$$ there are two more transitions: one to $$q_2$$, another to $$q_6$$, therefore we create two more branches. Lastly, to indicate that $$q_0$$ is included in a set of current states, we make a branch for it as well.
So, the second level of the tree shows all current states of the machine. This is interesting, because we haven't even
input any symbol and yet ε-NFA is already in 4 states. What is more interesting, that one of those states $$q_0$$ is an
accept state. Therefore, if we would give an empty string, it would accept it. Anyway, let's look at first input symbol
a
. There are no transition from $$q_0$$ and $$q_1$$, thus both machines die. On the other hand,
$$q_6$$ has transition to $$q_7$$, from there are four ε-transitions to other states, itself included. Also, $$q_2$$
has transition to $$q_3$$. From there are two ε-transitions: one to itself and another to $$q_4$$.
For next symbol b
, there is only one transition: from $$q_4$$ to $$q_5$$. Therefore,
all other machines die. From $$q_5$$ are four ε-transitions: to $$q_5$$, $$q_1$$, $$q_2$$, $$q_6$$. One of those
transitions leads to $$q_5$$ state, which is an accept state. Thus, ε-NFA accepts string ab
.
Formal definition of ε-NFA?
A nondeterministic finite automaton (ε-NFA) $$M$$ is a 5-tuple $$(Q, Σ, δ, q_0, F)$$, where
- $$Q$$ is a finite set called the states,
- $$Σ$$ is a finite set called the alphabet,
- $$δ : Q × Σ_ε → P(Q)$$ is the transition function and $$Σ_ε = Σ ∪ \{ε\}$$
- $$q_0 ∈ Q$$ is the start state, and
- $$F ⊆ Q$$ is the set of accept states.
The main difference between Definition 1.0.6 and Definition 1.0.9 is the domain of transition function $$δ$$, more particularly - an alphabet $$Σ$$. Former doesn't include an empty string $$ε$$, while latter does.
Extended transition function for ε-NFA
Usually, after formally defining automaton we give definition of extended transition function $$δ^*$$ following of acceptance of strings. Unfortunately, ε-NFA has ε-transitions which complicate things. Thus, before we can talk about $$δ^*$$ function, we have to define ε-closure. Informally, we ε-close a state $$q$$ by following all the transitions which are labeled ε. When we get to other states, then we follow ε-transitions out of those states, and so on, eventually finding every state that can be reach from $$q$$ along any path whose arrows are labeled ε. More formally,
Let $$M = (Q, Σ, δ, q_0, F)$$ be a nondeterministic finite automaton (ε-NFA) and $$S ⊆ Q$$. We define the ε-closure of $$S$$,
$$$E(S)$$$as follows:
- $$S ⊆ E(S)$$
- For every $$q ∈ E(S)$$, $$δ(q, ε) ⊆ E(S)$$
At first glance Definition 1.1.1 is not very intuitive. So, let's tackle this together. We take any set $$S$$ which is a subset of $$Q$$. Meaning, $$S$$ represents some states from all the states in the machine $$M$$. In addition, we are defining $$E(S)$$ in terms of set of states, rather just one state. So, if I would want to implement ε-closure as a method, I would need to pass set of states to it, rather just one state.Then, the base case $$S ⊆ E(S)$$ tells us that a set $$S$$ is a subset of $$F(S)$$. But how does it help us? We still don't know what elements $$E(S)$$ has in the first place. In fact, we are defining $$E(S)$$ in this very moment. But we do know definition of the subset
Let $$A$$ and $$B$$ be sets. Then $$A$$ is a subset of $$B$$, written
$$$A ⊆ B$$$if and only if
$$$\forall x \bullet\text{ If }x \in A, \text{ then }x \in B$$$This funny looking symbol $$\forall$$ stands for "for all". So, one set is a subset of another if and only if, all the elements in it are elements in another. Thus, by saying that set $$S$$ is a subset of $$E(S)$$, we are saying, that all elements in $$S$$ is also in $$E(S)$$, otherwise our statement would be false. And that makes sense. Because, when we where considering ε-transitions, we said that a state itself is always included in the resulting set.
If the base case in Definition 1.1.1 includes current states in $$F(S)$$, then the second part includes all the states which can be reached following ε-transitions. Let's look at specific example of $$E(S)$$ when $$S=\{q_0\}$$,
$$$\begin{align} S = \{q_0\} \subseteq E(S) &\implies q_0 \in E(S) \\ δ(q_0, ε) = \{q_1\} ⊆ E(S) &\implies q_1 \in E(S)\\ δ(q_1, ε) = \{q_2, q_6\} ⊆ E(S) &\implies q_2,q_6 \in E(S) \\ δ(q_2, ε) = \emptyset ⊆ E(S) \\ δ(q_6, ε) = \emptyset ⊆ E(S) \\ \end{align} $$$Thus, if $$S = \{q_0\}$$, then $$F(S) = \{q_0, q_1, q_2, q_6\}$$. Notice, there are no ε-transitions from $$q_2$$ or $$q_6$$, therefore $$δ(q_2 \text{ or } q_6, ε) = \emptyset$$. Empty set has no elements, thus it has no implication in this case. Recursive part of Definition 1.1.1 is considered finished, when all the elements (states) in $$F(S)$$ has been checked for ε-transitions.
Now, we are ready to define extended transition function.
Let $$M = (Q, Σ, δ, q_0, F)$$ be a nondeterministic finite automaton (ε-NFA). We define the extended transition function
$$$δ^* : Q × Σ^* → P(Q)$$$as follows:
- For every $$q ∈ Q$$, $$δ^∗(q, ε) = E(\{q\})$$
- For every $$q ∈ Q$$, every $$y ∈ Σ^*$$, and every $$a ∈ Σ$$,
$$$δ^∗(q, ya ) = E(\bigcup \{δ(p,a) \mid p ∈ δ^∗(q, y)\})$$$
The main difference between NFA's Definition 1.0.7 of $$δ^*$$ and Definition 1.1.0 is ε-closure inclusion in the latter. For example, the base case changed from $$δ^∗(q, ε) = \{q\}$$ to $$δ^∗(q, ε) = E(\{q\})$$. This has an effect of including all the ε-transitions of set of states at the end of input string. Similarly, in recursive case at each pass, $$δ^*$$ returns not just set of states given a symbol, but all the states which can be reached following ε-transitions as well. For definition of string acceptance see Definition 1.0.8 .
ε-NFA as 5-tuple
We have all the necessary information to implement ε-NFA in Java. So, as always we will define a specific automaton as 5-tuple and then will write code for it. Image 1.0.6 shows an ε-NFA which recognises $$(ab ∪ a)^*$$ language, its formal definition is
$$M = (Q, Σ, δ, q_0, F )$$, where
- $$Q = \{q_0, q_1, q_2, q_3, q_4, q_5, q_6, q_7\}$$,
- $$Σ = \{$$
a
$$,$$b
$$\}$$ - $$q_0 = q_0$$,
- $$F = \{q_0, q_5, q_7\}$$.
- $$δ$$ is described as a transition table
$$a$$ | $$b$$ | $$ε$$ | |
---|---|---|---|
$$q_0$$ | $$\emptyset$$ | $$\emptyset$$ | $$\{q_1\}$$ |
$$q_1$$ | $$\emptyset$$ | $$\emptyset$$ | $$\{q_2, q_6\}$$ |
$$q_2$$ | $$\{q_3\}$$ | $$\emptyset$$ | $$\emptyset$$ |
$$q_3$$ | $$\emptyset$$ | $$\emptyset$$ | $$\{q_4\}$$ |
$$q_4$$ | $$\emptyset$$ | $$\{q_5\}$$ | $$\emptyset$$ |
$$q_5$$ | $$\emptyset$$ | $$\emptyset$$ | $$\{q_1\}$$ |
$$q_6$$ | $$\{q_7\}$$ | $$\emptyset$$ | $$\emptyset$$ |
$$q_6$$ | $$\emptyset$$ | $$\emptyset$$ | $$\{q_1\}$$ |
Implementing states
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
private enum States {
Q0(true), Q1(false), Q2(false), Q3(false),
Q4(false), Q5(true), Q6(false), Q7(true);
final boolean accept;
Set<States> a;
Set<States> b;
Set<States> epsilon;
static {
Q0.a = Collections.EMPTY_SET;
Q0.b = Collections.EMPTY_SET;
Q0.epsilon = new HashSet<>(Arrays.asList(Q1));
Q1.a = Collections.EMPTY_SET;
Q1.b = Collections.EMPTY_SET;
Q1.epsilon = new HashSet<>(Arrays.asList(Q2, Q6));
Q2.a = new HashSet<>(Arrays.asList(Q3));
Q2.b = Collections.EMPTY_SET;
Q2.epsilon = Collections.EMPTY_SET;
Q3.a = Collections.EMPTY_SET;
Q3.b = Collections.EMPTY_SET;
Q3.epsilon = new HashSet<>(Arrays.asList(Q4));
Q4.a = Collections.EMPTY_SET;
Q4.b = new HashSet<>(Arrays.asList(Q5));
Q4.epsilon = Collections.EMPTY_SET;
Q5.a = Collections.EMPTY_SET;
Q5.b = Collections.EMPTY_SET;
Q5.epsilon = new HashSet<>(Arrays.asList(Q1));
Q6.a = new HashSet<>(Arrays.asList(Q7));
Q6.b = Collections.EMPTY_SET;
Q6.epsilon = Collections.EMPTY_SET;
Q7.a = Collections.EMPTY_SET;
Q7.b = Collections.EMPTY_SET;
Q7.epsilon = new HashSet<>(Arrays.asList(Q1));
}
States(boolean accept) {this.accept = accept;}
Set<States> transition(char ch) {
switch (ch) {
case 'a': return this.a;
case 'b': return this.b;
default: return Collections.EMPTY_SET;
}
}
//...
}
As always we create private enum States {...}
which represents all states in the automaton.
Then, in static {...}
block we initialise transition table. Due to the fact that this is
ε-NFA implementation, each state has set of ε-transitions. Note, Set<States> epsilon;
does not represent ε-closure, but subset of it. The actual e-closure is implemented in
Set<States> eclose() {...}
method.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
private enum States {
//...
Set<States> eClose() {
Set<States> states = new HashSet<>();
states.add(this);
for(States e: this.epsilon)
states.addAll(e.eClose());
return states;
}
}
As you can see, Set<States> eclose() {...}
method belongs to
private enum States {...}
. Because Java is object oriented, it makes sense
to ask a state for its ε-closure. First of all, we create an empty set of states which corresponds to $$E(S)$$, then we add
state itself to it, this action corresponds to the base case of
Definition 1.1.1
. Later we loop trough each
state which can be reached along ε-transitions and add them all recursively, just like in recursive case of
Definition 1.1.1
.
The only difference, we don't call Set<States> transition(char ch) {...}
because
we can access ε-transitions directly through Set<States> epsilon;
instance variable.
The recursion is finished when all states in $$E(S)$$ has been checked.
Implementing $$δ^*$$
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
public class eNFA {
private enum States {...}
public boolean accept(String string) {
Set<States> currStates = new HashSet<>(States.Q0.eClose());
for (int i = 0; i < string.length(); i++)
currStates = union(currStates, string.charAt(i));
return currStates.stream().anyMatch(s -> s.accept);
}
// helper method which returns set of next states
private Set<States> union(Set<States> currStates, char ch) {
Set<States> nextStates = new HashSet<>();
for (States cState : currStates)
for(States s: cState.transition(ch))
nextStates.addAll(s.eClose());
return nextStates;
}
}
The first method public boolean accept(String string) {...}
in public class eNFA {...}
looks almost identical to NFA in
Source code 1.0.6
Except that start state is ε-closed. Helper method has similar
modification, it returns not just next states reachable from current states given symbol, but
their ε-closure as well.