G52MAL UNMC 2010/11: Notes on Coursework, Exercises Set 3 ========================================================= Marking Scheme -------------- 100 marks total * 1: 45 - 1a: 30 - 1b: 15 * 2: 35 - 2a: 10 - 2b: 10 - 2c: 10 - 2d: 2 - 2d: 3 * 3: 20 Notes ----- * 1a. I remind you that when doing the subset construction, the EMPTY set of NFA states may well arise as one of the reachable DFA states. Most people had done the subset construction itself properly, including the EMPTY set which in this case is one of the reachable DFA states, but a significant number of you had forgotten to include this "dead state" in the transition diagram for the DFA. As a result, the diagram actually is not a representation of a DFA at all as there are some states for which there is no outgoing edges for certain symbols of the alphabet. Recall that one of the criteria for being a DFA is that there is *exactly* one possible transition for each state and input symbol. So, if even a single transition is missing, the machine depicted is NOT a DFA. Another problem was that a number of people had carried out a flawed variation on the subset construction by starting from states corresponding to singleton sets of individual NFA start states. The result if you do this correctly is some kind of overlay of DFAs for three *different* languages, one for each possible NFA start state. The relation between this automaton and the correct one is not clear to me, but what *is* clear is that the resulting automaton again is NOT a DFA as it has more than one start state, requiring a *nondeterministic choice* to be made among these to get started. The DFA that results from the subset construction has a *single* start state which is the SET of NFA start states. If you carry out the subset construction lazily, you need to start from this set. If you made any of the above mistakes, you are advised to study the model solutions very carefully. * 2. Pretty good results overall. However, please note: (ab)* /= (a*)(b*) The meaning of the former is the set {epsilon, ab, abab, ababab, abababab, ...} The meaning of the latter is the set {epsilon, a, b, aa, ab, bb, aaa, aab, abb, bbb, aaaa, aaab, aabb, abbb, bbbb, aaaaa, aaaab, aaabb, aabbb, abbbb, bbbbbb, ...} Also note that, due to precedence conventions: (ab)* /= ab* as (ab* = a(b*)), while (a*)(b*) = a*b* * 3. This was probably the question that caused you most trouble. You are advised to study the model solutions very carefully and compare with your own. A common problem was that many of the submitted solutions were regular expressions that described a set of words that satisfy the prescribed constraint, but failed to describe *all possible* words over the given alphabet that satisfy the constraint, which is what you were asked to do. For example, consider a regular expression for "all words over the alphabet {a,b,c} that contain at least one a". The regular expression a denoting the language {a} is certainly a regular expression for which all words in the denoted language satisfies the constraint (trivially). However, as you can see, it denotes far from all words satisfying the constraint. Missing words include aa, aaa, aaaa, abc, cab, bac, and many many more. Another problem was that many of the submitted solutions really were way too complicated. For example, using (a*b*c*)* in place of the much clearer and simpler (a+b+c)* suggests that a bit more time should be spent on trying to gain a better understanding of how regular expressions work and what they mean.