Next: Beyond LL(1) - use
Up: How to implement a
Previous: Constructing an LL(1) grammar
We can now implement a parser - one way would be to construct a
deterministic PDA. However, using JAVA we can implement the parser
using recursion - here the internal JAVA stack plays the role of the
stack of the PDA.
First of all we have to separate the input into tokens which
are the terminal symbols of our grammar. To keep things simple I
assume that tokens are separated by blanks, i.e. one has to type
( a + a ) * a
for (a+a)*a. This has the advantage that we can use
java.util.StringTokenizer. In a real implementation
tokenizing is usually done by using finite automata.
I don't want to get lost in java details - in the main program I read
a line and produce a tokenizer:
String line=in.readLine();
st = new StringTokenizer(line+" $");
The tokenizer st and the current token are static
variables. I implement the convenience method next which
assigns the next token to curr.
static StringTokenizer st;
static String curr;
static void next() {
try {
curr=st.nextToken().intern();
} catch( NoSuchElementException e) {
curr=null;
}
}
We also implement a convenience method error(String) to report
an error and terminate the program.
Now we can translate all productions into methods using the
sets to determine which alternative to choose.
E.g. we translate
into (using E1 for to follow JAVA rules):
static void parseE1() {
if (curr=="+") {
next();
parseT();
parseE1();
} else if(curr==")" || curr=="$" ) {
} else {
error("Unexpected :"+curr);
}
The basic idea is to
- Translate each occurrence of a non terminal symbol into a
test that this symbol has been read and a call of next().
- Translate each nonterminal symbol into a call of the method with
the same name.
- If you have to decide between different productions use the
lookahead sets to determine which one to use.
- If you find that there is no way to continue call
error().
We initiate the parsing process by calling next() to read the
first symbol and then call parseE(). If after processing
parseE() we are at the end marker, then the parsing has been
successful.
next();
parseE();
if(curr=="$") {
System.out.println("OK ");
} else {
error("End expected");
}
The complete parser can be found at
http://www.cs.nott.ac.uk/~txa/g51mal/ParseE0.java.
Actually, we can be a bit more realistic and turn the parser into a
simple evaluator by
- Replace a by any integer. I.e. we use
Integer.valueOf(curr).intValue();
to translate the current token into a number. JAVA will raise an
exception if this fails.
- Calculate the value of the expression read. I.e. we have to
change the method interfaces:
static int parseE()
static int parseE1(int x)
static int parseT()
static int parseT1(int x)
static int parseF()
The idea behind parseE1 and parseT1 is to pass the
result calculated so far and leave it to the method to incorporate
the missing part of the expression. I.e. in the case of
parseE1
static int parseE1(int x) {
if (curr=="+") {
next();
int y = parseT();
return parseE1(x+y);
} else if(curr==")" || curr=="$" ) {
return x;
} else {
error("Unexpected :"+curr);
return x;
}
}
Here is the complete program with evaluation
http://www.cs.nott.ac.uk/~txa/g51mal/ParseE.java.
We can try the program and observe that it handles precedence of
operators and brackets properly:
[txa@jacob misc]$ java ParseE
3 + 4 * 5
OK 23
[txa@jacob misc]$ java ParseE
( 3 + 4 ) * 5
OK 35
Next: Beyond LL(1) - use
Up: How to implement a
Previous: Constructing an LL(1) grammar
Thorsten Altenkirch
2001-05-08