next up previous
Next: Summing up ... Up: Regular expressions Previous: The meaning of regular


Translating regular expressions to NFAs

Theorem 3.1   For each regular expression $ E$ we can construct ab NFA $ N(E)$ s.t. $ L(N(E)) = L(E)$, i.e. the automaton accepts the language described by the regular expression.

Proof:

We do this again by induction on the syntax of regular expressions:

  1. $ N(\emptyset)$:

    \epsfbox{re-empty.eps}

    which will reject everything (it has got no final states) and hence

    $\displaystyle L(N(\emptyset))$ $\displaystyle = \emptyset$    
      $\displaystyle = L(\emptyset)$    

  2. $ N(\epsilon)$:

    \epsfbox{re-epsilon.eps}

    This automaton accepts the empty word but rejects everything else, hence:

    $\displaystyle L(N(\epsilon))$ $\displaystyle = \{\epsilon\}$    
      $\displaystyle = L(\epsilon)$    

  3. $ N(\textbf{x})$:

    \epsfbox{re-x.eps}

    This automaton only accepts the word x, hence:

    $\displaystyle L(N(\textbf{x}))$ $\displaystyle = \{ \textrm{x} \}$    
      $\displaystyle = L(\textbf{x})$    

  4. $ N(E+F)$:

    We merge the diagrams for $ N(E)$ and $ N(F)$ into one:

    \epsfbox{re-plus.eps}

    I.e. given

    $\displaystyle N(E)$ $\displaystyle = (Q_E,\Sigma,\delta_E,S_E,F_E)$    
    $\displaystyle N(F)$ $\displaystyle = (Q_F,\Sigma,\delta_F,S_F,F_F)$    

    Now we use the disjoint union opration on sets (hope you still remember from MCS), otherwise see pp.55 in the G51MCS notes.

    $\displaystyle Q_{E+F}$ $\displaystyle = Q_E + Q_F$    
    $\displaystyle \delta_{E+F}((0,q),x)$ $\displaystyle = \{ (0,q') \mid q' \in \delta_E(q,x) \}$    
    $\displaystyle \delta_{E+F}((1,q)),x$ $\displaystyle = \{ (1,q') \mid q' \in \delta_F(q,x) \}$    
    $\displaystyle S_{E+F}$ $\displaystyle = S_E + S_F$    
    $\displaystyle F_{E+F}$ $\displaystyle = F_E + F_F$    
    $\displaystyle N(E+F)$ $\displaystyle = (Q_{E+F},\Sigma,\delta_{E+F},S_{E+F},F_{E+F})$    

    The disjoint union just signals that we are not going to identify states, even if they accidently happen to have the same name.

    Just thinking of the game with markers you should be able to convince yourself that

    $\displaystyle L(N(E+F)) = L(N(E)) \cup L(N(F)) $

    Moreover to show that

    $\displaystyle L(N(E+F)) = L(E+F) $

    we are allowed to assume that

    $\displaystyle L(N(E))$ $\displaystyle = L(E)$    
    $\displaystyle L(N(F))$ $\displaystyle = L(F)$    

    that's what is meant by induction over the syntax of regular expressions.

    Now putting everything together:

    $\displaystyle L(N(E+F))$ $\displaystyle = L(N(E)) \cup L(N(F))$    
      $\displaystyle = L(E) \cup L(F)$    
      $\displaystyle = L(E+F)$    

  5. $ N(EF)$:

    We want to put the two automata $ N(E)$ and $ N(F)$ in series. We do this by connecting the final states of $ N(E)$ with the initial states of $ N(F)$ in a way explained below.

    \epsfbox{re-conc.eps}

    In this diagram I only depicted one initial and one final state of each of the automata although they may be several of them.

    Here is how we construct $ N(EF)$ from $ N(E)$ and $ N(F)$:

    $\displaystyle N(E)$ $\displaystyle = (Q_E,\Sigma,\delta_E,S_E,F_E)$    
    $\displaystyle N(F)$ $\displaystyle = (Q_F,\Sigma,\delta_F,S_F,F_F)$    

    We now set

    $\displaystyle N(EF) = (Q_{EF},\Sigma,\delta_{EF},S_{EF},Z_{EF}) $

    I hope that you are able to convince yourself that

    $\displaystyle L(N(EF)) = \{ uv \mid u\in L(N(E)) \wedge v\in L(N(F)) $

    and hence we can reason

    $\displaystyle L(N(EF))$ $\displaystyle = \{ uv \mid u\in L(N(E)) \wedge v\in L(N(F))$ $\displaystyle = \{ uv \mid u\in L(E) \wedge v\in L(F)$    
      $\displaystyle = L(EF)$    

  6. $ N(E^*)$:

    We construct $ N(E^*)$ from $ N(E)$ by merging initial and final states of $ N(E)$ in a way similar to the previous construction and we add a new state $ *$ which is initial and final.

    \epsfbox{re-star.eps}

    Given

    $\displaystyle N(E)$ $\displaystyle = (Q_E,\Sigma,\delta_E,S_E,F_E)$    

    we construct $ N(E^*)$.

    We define

    $\displaystyle N(E^*) = (Q_{E^*},\Sigma,\delta_{E^*},S_{E^*},F_{E^*}) $

    We claim that

    $\displaystyle L(N(E^*)) = \{ w_0w_1\dots w_{n-1} \mid n\in\mathbb{N}\wedge \forall i<n.
w_i \in L(N(E)) \} $

    since we can run through the automaton an arbitrary number of times. The new state $ *$ allows us also to accept the empty sequence. Hence:

    $\displaystyle L(N(E^*))$ $\displaystyle = \{ w_0w_1\dots w_{n-1} \mid n\in\mathbb{N}\wedge \forall i<n. w_i \in L(N(E)) \}$    
      $\displaystyle = L(N(E))^*$    
      $\displaystyle = L(E)^*$    
      $\displaystyle = L(E^*)$    

  7. $ N(\textbf{(}E\textbf{)}) = N(E)$

    I.e. using brackets does not change anything.

$ \Box$

As an example we construct $ N(\textrm{a}^*\textrm{b}^*)$. First we construct $ N(\textbf{a})$:

\epsfbox{na.eps}

Now we have to apply the $ *$-construction and we obtain:

\epsfbox{nastar.eps}

$ N(b^*)$ is just the same and we get

\epsfbox{nastarbstar.eps}

and now we have to serialize the two automata and we get:

\epsfbox{nastarconcbstar.eps}

Now, you may observe that this automaton, though correct, is unnecessary complicated, since we could have just used

\epsfbox{nastarbstarx.eps}

However, we shall not be concerned with minimality at the moment.


next up previous
Next: Summing up ... Up: Regular expressions Previous: The meaning of regular
Thorsten Altenkirch 2001-05-08