Translating regular expressions to NFAs

Next: Summing up ... Up: Regular expressions Previous: The meaning of regular

Translating regular expressions to NFAs

Theorem 3.1 For each regular expression

we can construct ab NFA

s.t.

, i.e. the automaton accepts the language described by the regular expression.

Proof:

We do this again by induction on the syntax of regular expressions:

$N(\emptyset)$ :
$\epsfbox{re-empty.eps}$
which will reject everything (it has got no final states) and hence

$\displaystyle L(N(\emptyset))$ $\displaystyle = \emptyset$

$\displaystyle = L(\emptyset)$
$N(\epsilon)$ :
$\epsfbox{re-epsilon.eps}$
This automaton accepts the empty word but rejects everything else, hence:

$\displaystyle L(N(\epsilon))$ $\displaystyle = \{\epsilon\}$

$\displaystyle = L(\epsilon)$
$N(\textbf{x})$ :
$\epsfbox{re-x.eps}$
This automaton only accepts the word x, hence:

$\displaystyle L(N(\textbf{x}))$ $\displaystyle = \{ \textrm{x} \}$

$\displaystyle = L(\textbf{x})$
:
We merge the diagrams for and into one:
$\epsfbox{re-plus.eps}$
I.e. given

$\displaystyle N(E)$ $\displaystyle = (Q_E,\Sigma,\delta_E,S_E,F_E)$

$\displaystyle N(F)$ $\displaystyle = (Q_F,\Sigma,\delta_F,S_F,F_F)$

Now we use the disjoint union opration on sets (hope you still remember from MCS), otherwise see pp.55 in the G51MCS notes.

$\displaystyle Q_{E+F}$ $\displaystyle = Q_E + Q_F$

$\displaystyle \delta_{E+F}((0,q),x)$ $\displaystyle = \{ (0,q') \mid q' \in \delta_E(q,x) \}$

$\displaystyle \delta_{E+F}((1,q)),x$ $\displaystyle = \{ (1,q') \mid q' \in \delta_F(q,x) \}$

$\displaystyle S_{E+F}$ $\displaystyle = S_E + S_F$

$\displaystyle F_{E+F}$ $\displaystyle = F_E + F_F$

$\displaystyle N(E+F)$ $\displaystyle = (Q_{E+F},\Sigma,\delta_{E+F},S_{E+F},F_{E+F})$

The disjoint union just signals that we are not going to identify states, even if they accidently happen to have the same name.
Just thinking of the game with markers you should be able to convince yourself that

$\displaystyle L(N(E+F)) = L(N(E)) \cup L(N(F))$

Moreover to show that

$\displaystyle L(N(E+F)) = L(E+F)$
we are allowed to assume that

$\displaystyle L(N(E))$ $\displaystyle = L(E)$

$\displaystyle L(N(F))$ $\displaystyle = L(F)$

that's what is meant by induction over the syntax of regular expressions.
Now putting everything together:

$\displaystyle L(N(E+F))$ $\displaystyle = L(N(E)) \cup L(N(F))$

$\displaystyle = L(E) \cup L(F)$

$\displaystyle = L(E+F)$
:
We want to put the two automata and in series. We do this by connecting the final states of with the initial states of in a way explained below.
$\epsfbox{re-conc.eps}$
In this diagram I only depicted one initial and one final state of each of the automata although they may be several of them.
Here is how we construct from and :

$\displaystyle N(E)$ $\displaystyle = (Q_E,\Sigma,\delta_E,S_E,F_E)$

$\displaystyle N(F)$ $\displaystyle = (Q_F,\Sigma,\delta_F,S_F,F_F)$
- The states of are the disjoint union of the states of and :
  
  $\displaystyle Q_{EF} = Q_E + Q_F$
- The transition function of contains all the transitions of and (as for ) and for each pair of a final state of and an initial state of we add all the arrows coming out of to .
  
  $\displaystyle \delta_{EF}((0,q),x)$ $\displaystyle = \{ (0,q') \mid q' \in \delta_E(q,x) \}$
  
  $\displaystyle \quad \cup \{ (1,q') \mid q\in F_E \wedge \exists q''\in S_F.q'\in \delta_F(q'',x) \}$
  
  $\displaystyle \delta_{EF}((1,q))$ $\displaystyle = \{ (1,q') \mid q' \in \delta_F(q)) \}$
- The initial states of are the initial states of , and the initial states of if there is an initial state of which is also a final state.
  
  $\displaystyle S_{EF} = \{ (0,q) \mid q\in S_E \} \cup \{ (1,q) \mid q\in S_F \wedge S_E\cap F_E \not=\emptyset \}$
- The final states of are the final states of .
  
  $\displaystyle F_{EF} = \{ (1,q) \mid q\in F_F \}$
We now set

$\displaystyle N(EF) = (Q_{EF},\Sigma,\delta_{EF},S_{EF},Z_{EF})$

I hope that you are able to convince yourself that

$\displaystyle L(N(EF)) = \{ uv \mid u\in L(N(E)) \wedge v\in L(N(F))$
and hence we can reason

$\displaystyle L(N(EF))$ $\displaystyle = \{ uv \mid u\in L(N(E)) \wedge v\in L(N(F))$ $\displaystyle = \{ uv \mid u\in L(E) \wedge v\in L(F)$

$\displaystyle = L(EF)$
:
We construct from by merging initial and final states of in a way similar to the previous construction and we add a new state which is initial and final.
$\epsfbox{re-star.eps}$
Given

$\displaystyle N(E)$ $\displaystyle = (Q_E,\Sigma,\delta_E,S_E,F_E)$

we construct .
- We add one extra state :
  
  $\displaystyle Q_{E^*} = Q_E + \{ * \}$
- ${N_E}^*$ inherits all transitions form and for each state which has an arrow to the final state labelled we also add an arrow to all the initial states labelled .
  
  $\displaystyle \delta_{E^*}((0,q),x) =$ $\displaystyle \{ (0,q') \mid q' \in \delta_E(q,x) \}$
  
  $\displaystyle \cup (0,q') \mid \exists \delta_E(q,x)\cap F_E \not= \emptyset \wedge q' \in S_E \}$
- The initial states of are the initial states of and :
  
  $\displaystyle S_{E^*} = \{ (0,q) \mid q\in S_E \} \cup \{ (1,*) \}$
- The final states of $N_{E^*}$ are the final states of and :
  
  $\displaystyle F_{E^*} = \{ (0,q) \mid q\in F_E \} \cup \{ (1,*) \}$
We define

$\displaystyle N(E^*) = (Q_{E^*},\Sigma,\delta_{E^*},S_{E^*},F_{E^*})$

We claim that

$\displaystyle L(N(E^*)) = \{ w_0w_1\dots w_{n-1} \mid n\in\mathbb{N}\wedge \forall i<n. w_i \in L(N(E)) \}$
since we can run through the automaton an arbitrary number of times. The new state allows us also to accept the empty sequence. Hence:

$\displaystyle L(N(E^*))$ $\displaystyle = \{ w_0w_1\dots w_{n-1} \mid n\in\mathbb{N}\wedge \forall i<n. w_i \in L(N(E)) \}$

$\displaystyle = L(N(E))^*$

$\displaystyle = L(E)^*$

$\displaystyle = L(E^*)$
$N(\textbf{(}E\textbf{)}) = N(E)$
I.e. using brackets does not change anything.

$\Box$

As an example we construct $N(\textrm{a}^*\textrm{b}^*)$ . First we construct $N(\textbf{a})$ :

$\epsfbox{na.eps}$

Now we have to apply the -construction and we obtain:

$\epsfbox{nastar.eps}$

is just the same and we get

$\epsfbox{nastarbstar.eps}$

and now we have to serialize the two automata and we get:

$\epsfbox{nastarconcbstar.eps}$

Now, you may observe that this automaton, though correct, is unnecessary complicated, since we could have just used

$\epsfbox{nastarbstarx.eps}$

However, we shall not be concerned with minimality at the moment.

Next: Summing up ... Up: Regular expressions Previous: The meaning of regular

Thorsten Altenkirch 2001-05-08

$\displaystyle L(N(\emptyset))$	$\displaystyle = \emptyset$
	$\displaystyle = L(\emptyset)$

$\displaystyle L(N(\epsilon))$	$\displaystyle = \{\epsilon\}$
	$\displaystyle = L(\epsilon)$

$\displaystyle L(N(\textbf{x}))$	$\displaystyle = \{ \textrm{x} \}$
	$\displaystyle = L(\textbf{x})$

$\displaystyle N(E)$	$\displaystyle = (Q_E,\Sigma,\delta_E,S_E,F_E)$
$\displaystyle N(F)$	$\displaystyle = (Q_F,\Sigma,\delta_F,S_F,F_F)$

$\displaystyle Q_{E+F}$	$\displaystyle = Q_E + Q_F$
$\displaystyle \delta_{E+F}((0,q),x)$	$\displaystyle = \{ (0,q') \mid q' \in \delta_E(q,x) \}$
$\displaystyle \delta_{E+F}((1,q)),x$	$\displaystyle = \{ (1,q') \mid q' \in \delta_F(q,x) \}$
$\displaystyle S_{E+F}$	$\displaystyle = S_E + S_F$
$\displaystyle F_{E+F}$	$\displaystyle = F_E + F_F$
$\displaystyle N(E+F)$	$\displaystyle = (Q_{E+F},\Sigma,\delta_{E+F},S_{E+F},F_{E+F})$

$\displaystyle L(N(E))$	$\displaystyle = L(E)$
$\displaystyle L(N(F))$	$\displaystyle = L(F)$

$\displaystyle L(N(E+F))$	$\displaystyle = L(N(E)) \cup L(N(F))$
	$\displaystyle = L(E) \cup L(F)$
	$\displaystyle = L(E+F)$

$\displaystyle \delta_{EF}((0,q),x)$	$\displaystyle = \{ (0,q') \mid q' \in \delta_E(q,x) \}$
	$\displaystyle \quad \cup \{ (1,q') \mid q\in F_E \wedge \exists q''\in S_F.q'\in \delta_F(q'',x) \}$
$\displaystyle \delta_{EF}((1,q))$	$\displaystyle = \{ (1,q') \mid q' \in \delta_F(q)) \}$

$\displaystyle L(N(EF))$	$\displaystyle = \{ uv \mid u\in L(N(E)) \wedge v\in L(N(F))$	$\displaystyle = \{ uv \mid u\in L(E) \wedge v\in L(F)$
	$\displaystyle = L(EF)$

$\displaystyle \delta_{E^*}((0,q),x) =$	$\displaystyle \{ (0,q') \mid q' \in \delta_E(q,x) \}$
	$\displaystyle \cup (0,q') \mid \exists \delta_E(q,x)\cap F_E \not= \emptyset \wedge q' \in S_E \}$

$\displaystyle L(N(E^*))$	$\displaystyle = \{ w_0w_1\dots w_{n-1} \mid n\in\mathbb{N}\wedge \forall i<n. w_i \in L(N(E)) \}$
	$\displaystyle = L(N(E))^*$
	$\displaystyle = L(E)^*$
	$\displaystyle = L(E^*)$