G52MAL UNMC 2010/11: Notes on Coursework, Exercises Set 3
=========================================================
Marking Scheme
--------------
100 marks total
* 1: 45
- 1a: 30
- 1b: 15
* 2: 35
- 2a: 10
- 2b: 10
- 2c: 10
- 2d: 2
- 2d: 3
* 3: 20
Notes
-----
* 1a. I remind you that when doing the subset construction, the EMPTY
set of NFA states may well arise as one of the reachable DFA states.
Most people had done the subset construction itself properly,
including the EMPTY set which in this case is one of the reachable
DFA states, but a significant number of you had forgotten to include
this "dead state" in the transition diagram for the DFA.
As a result, the diagram actually is not a representation of a DFA at all as
there are some states for which there is no outgoing edges for certain
symbols of the alphabet. Recall that one of the criteria for being
a DFA is that there is *exactly* one possible transition for each
state and input symbol. So, if even a single transition is missing,
the machine depicted is NOT a DFA.
Another problem was that a number of people had carried out a flawed
variation on the subset construction by starting from states corresponding
to singleton sets of individual NFA start states.
The result if you do this correctly is some kind of overlay of DFAs
for three *different* languages, one for each possible NFA start
state. The relation between this automaton and the correct one is
not clear to me, but what *is* clear is that the resulting automaton
again is NOT a DFA as it has more than one start state, requiring
a *nondeterministic choice* to be made among these to get started.
The DFA that results from the subset construction has a *single*
start state which is the SET of NFA start states. If you carry out
the subset construction lazily, you need to start from this set.
If you made any of the above mistakes, you are advised to study the
model solutions very carefully.
* 2. Pretty good results overall. However, please note:
(ab)* /= (a*)(b*)
The meaning of the former is the set
{epsilon, ab, abab, ababab, abababab, ...}
The meaning of the latter is the set
{epsilon, a, b, aa, ab, bb, aaa, aab, abb, bbb,
aaaa, aaab, aabb, abbb, bbbb, aaaaa, aaaab, aaabb,
aabbb, abbbb, bbbbbb, ...}
Also note that, due to precedence conventions:
(ab)* /= ab*
as (ab* = a(b*)), while
(a*)(b*) = a*b*
* 3. This was probably the question that caused you most trouble.
You are advised to study the model solutions very carefully
and compare with your own.
A common problem was that many of the submitted solutions
were regular expressions that described a set of words that
satisfy the prescribed constraint, but failed to describe
*all possible* words over the given alphabet that satisfy
the constraint, which is what you were asked to do.
For example, consider a regular expression for "all words over
the alphabet {a,b,c} that contain at least one a". The
regular expression
a
denoting the language {a} is certainly a regular expression for
which all words in the denoted language satisfies the constraint
(trivially). However, as you can see, it denotes far from all
words satisfying the constraint. Missing words include aa, aaa,
aaaa, abc, cab, bac, and many many more.
Another problem was that many of the submitted solutions really
were way too complicated. For example, using
(a*b*c*)*
in place of the much clearer and simpler
(a+b+c)*
suggests that a bit more time should be spent on trying to
gain a better understanding of how regular expressions work
and what they mean.