Next: What are regular expressions? Up: Machines and their languages Previous: The subset construction

Regular expressions

Given an alphabet $\Sigma$ a language is a set of words $L\subseteq \Sigma^*$ . So far we were able to describe languages either by using set theory (i.e. enumeration or comprehension) or by an automaton. In this section we shall introduce regular expressions as an elegant and concise way to describe languages. We shall see that the languages definable by regular expressions are precisely the same as those accepted by deterministic or nondeterministic finite automata. These languages are called regular languages or (according to the Chomsky hierarchy) Type 3 languages.

As already mentioned in the introduction regular expressions are used to define patterns in programs such as grep. grep gets as an argument a regular expression and then filters out all those lines from a file which match the regular expression, where matching means that the line contains a substring which is in the language assigned to the regular expression. It is interesting to note that even in the case when we search for a specific word (this is a special case of a regular expresion) programs like grep are more efficient than a naive implementation of word search.

To find out more about grep have a look at the UNIX manual page and play around with grep. Note that the syntax grep uses is slightly different from the one we use here. grep also use some convenient shorthands which are not relevant for a theoretical analysis of regular expressions because they do not extend the class of languages.

Subsections

Next: What are regular expressions? Up: Machines and their languages Previous: The subset construction

Thorsten Altenkirch 2001-05-08