In (standard) Haskell you do not update values as you would in an imperative language. You instead create a new variable and instantiate it to the value of the old variable, with any modifications. Consider the following Java and Haskell programs:
----------------------------------------------------------------
class Person {
    public int age;
    public String name;

    public Person(int a, String n) {
        age = a;
        name = n;
    }
}

class AgeUpdater {
    public static void main(String args[]) {
        Person henrik = new Person(25, "Henrik");
        System.out.println(ageDif(henrik));
    }

    private static int ageDif(Person p) {
        Person oldHenrik = new Person(p.age = 29, p.name);
        int d = oldHenrik.age - p.age;
        return d;
    }
}
----------------------------------------------------------------
module AgeUpdater where

data Person = MkPerson {age :: Int, name :: String}

main :: IO ()
main = let henrik = MkPerson {age = 25, name = "Henrik"}
       in putStrLn (show (ageDif henrik))

ageDif :: Person -> Int
ageDif p = let oldHenrik = p {age = 29}
               d = age oldHenrik - age p
           in d
----------------------------------------------------------------
Both programs appear the same, but they give different answers. The reason for this is that the Java program destructively updates the value of the "age" field in "henrik", whereas the Haskell program only modifies it for that expression (in the same way that typing "y = x + 1" (in either language) has no effect on the value of x). So in the Java program, "p.age" evaluates to 29, whereas in the Haskell program, "age p" evaluates to 25.
What actually happens behind the scenes is that

let oldHenrik = p {age = 29}

gets translated to

let oldHenrik = f p

where

f (MkPerson {name = n}) = MkPerson {name = n, age = 29}
Prelude> x <- readFile "filepath"

x is just an arbitrary variable name, while filepath needs to be the file path of the file you wish to read in. If you are in the same directory as the target file, then filepath is just the name of the file. The contents of the file are read in and saved as a string, which can be accessed under the variable name given (in this case x). For example, if you have a TXL file called test.txl in the current directory, containing the code

let x = 'a' + '\r' in x + 1

then you could type

Prelude> test1 <- readFile "test.txl"

to instantiate the value of test1 to the contents of the file. You may now use test1 in future statements, such as

Prelude> :t test1
test1 :: String
Prelude> test1
"let x = 'a' + '\\r' in x + 1"
Prelude> head test1
'l'

For those of you taking the advanced functional programming module, note that readFile is a function mapping a String to a computation in the IO monad that produces a string.

type FilePath = String
readFile :: FilePath -> IO String

Have a look at the function "main" in the source code and see if you can understand it.
The School's standard late submission policy applies to the coursework. Thus a late submission will be accepted, but earns you fewer marks. Also, in any case, you need to be ready in time for your oral examination slots.
A → X | A Y

where X and Y stand for any sequence of terminals and/or non-terminals that do not contain A. For example

Commands → Command | Commands ; Command

which follows the pattern where

A = Commands
X = Command
Y = ; Command

Rewrite

A → X | A Y

to

A → X Y*

In our Commands example we would obtain

Commands → Command (; Command)*
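The rewritten, right-hand-side-iterative form maps directly onto a loop in a hand-written recursive-descent parser. Here is a minimal sketch of that idea in Haskell; all the names (Token, parseCommands, and so on) are illustrative only and are not taken from the HMTC sources:

```haskell
-- Sketch: parsing  Commands -> Command (; Command)*  as a loop.
data Token = TCmd String | TSemi deriving (Eq, Show)

-- Parse one Command, then repeatedly consume "; Command".
parseCommands :: [Token] -> Maybe ([String], [Token])
parseCommands ts = do
    (c, rest) <- parseCommand ts
    go [c] rest
  where
    go acc (TSemi : rest) = do
        (c, rest') <- parseCommand rest
        go (acc ++ [c]) rest'
    go acc rest = Just (acc, rest)

-- A "Command" here is just a single token, to keep the sketch short.
parseCommand :: [Token] -> Maybe (String, [Token])
parseCommand (TCmd s : rest) = Just (s, rest)
parseCommand _               = Nothing
```

For example, parseCommands [TCmd "a", TSemi, TCmd "b"] yields Just (["a","b"], []). Because the iteration consumes input left to right, the parser never recurses on Commands itself, which is exactly what eliminating the left recursion buys us.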
There is quite a bit of confusion about adding character literals to HMTC, so here are some clarifications:
Bear in mind the different representations of escape sequences in the various environments. In the .mt file, there are already newline characters, but your text editor interprets them and so displays the following code on the next line, rather than actually showing a newline character. Haskell, on the other hand, does not do this, but instead represents newline characters as '\n'. Now, the initial scanner you are given already throws away any newline characters, considering them white space. What you are asked to extend the scanner for are sequences of characters that your scanner will then interpret as character literals, some of which you will represent as escape sequences. Thus, the sequence of four characters '\n' needs to be read by the scanner, and then stored in a character-literal token with the escape sequence \n as the associated character. Note that when you read single quotes or backslashes into Haskell, it represents them with its own escape sequences (which happen, somewhat confusingly, to be the same as the ones you're scanning for). So reading in the sequence of four characters ' \ n ' actually gives you this list of characters: ['\'', '\\', 'n', '\''].
However, Haskell formats lists so as to make them more readable. Thus, rather than displaying 'h' : 'e' : 'l' : 'l' : 'o' : [] it displays "hello". Perhaps unfortunately, one other formatting trick it performs is to display any '\'' characters as just a single quote character (not as an escape sequence). This can be done without creating ambiguity, as it's known we're inside a string. Thus, 'i' : 't' : '\'' : 's' : [] is displayed as "it's", not as "it\'s" as you may expect. And this happens in reverse as well. However, the escape sequence for a single quote also works as normal inside a string, meaning there are two ways to represent the same character within a string. Thus, "aaa\'bbb" == "aaa'bbb". Try typing
head "'twas"

and

head "\'twas"

into Haskell.
So, now consider the earlier example of ' \ n ' which is represented as the Haskell list ['\'', '\\', 'n', '\'']. When displayed to the screen as a string, this would look like "'\\n'". The single quotes are not displayed as escape sequences, while the backslash still is.
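The claim above can be checked directly in Haskell. This small snippet (names are mine, purely for illustration) builds the four-character list and confirms that it is exactly the string "'\\n'":

```haskell
-- The four characters  ' \ n '  as read from a file, written as a
-- list of Haskell character literals.
example :: String
example = ['\'', '\\', 'n', '\'']

-- The same four characters written as a string literal: the quotes
-- need no escaping inside a string, but the backslash still does.
sameAsString :: Bool
sameAsString = example == "'\\n'" && length example == 4
```

Evaluating sameAsString in GHCi gives True, confirming that the list and the string notation denote the same four characters.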
You may find it easier to test the scanner by writing HMTC programs in their own files, and then reading the files into GHCi (see above), rather than writing them directly into GHCi, as there is a lot of potential to make an error with the escape sequences.
A student once put forth this relevant question:
Looking at MiniTriangle's Abstract Syntax ... my assumption is that there should be a line reading
| begin Command end BeginCommand
Abstract syntax describes only the relevant aspects of syntax. For example, we do not care whether a variable declaration starts with an identifier and is followed by a type-denoter (as in MiniTriangle) or the other way around (as in Java). Abstract syntax only cares about what components make up a language construct, not about their order or the tokens in between, like punctuation tokens, that only play their role during parsing.
Nonetheless, abstract syntax must be specified in some way. If we choose to write them down on a line we need to pick an order, but such order is just for presentation. For instance, instead of writing the abstract syntax of a MiniTriangle Declaration in grammatical form as in

Declaration → Identifier Type-denoter

we could have used a tree representation with Declaration at the root node and two sub-nodes Identifier and Type-denoter. The following two trees would be valid representations:

      Declaration                  Declaration
       /      \                     /      \
      /        \                   /        \
     /          \                 /          \
Identifier  Type-Denoter   Type-Denoter  Identifier

In the coursework and the book, MiniTriangle's abstract syntax is written as a grammar. In the book the authors have also included the tokens in the specification of the abstract syntax, which is a bit confusing. The reason is that in constructs like Command, where there are two different types of command with Identifier and Expression components, we can't just write

Command → Identifier Expression

for we cannot distinguish between an assignment and a call command. The tokens aid the distinction and that's why they are shown. The provided UML diagrams for the abstract syntax trees should clear up the confusion.
Program → Command

which may strike you in the sense that it seems ridiculous to have a program composed of a single statement, but if you look at the productions for Command, you find the two interesting cases:

Command → ... | let Declaration in Command ... | begin Commands end

which allow you to write more stuff as part of the program. Hence, if we want to write a short program in MiniTriangle that prints two integers, according to this grammar we cannot write

putint(1); putint(2)

but instead we must write

begin putint(1); putint(2); end

which perhaps makes more sense.

We can have more than one command inside a let command by using the same device:

let var m : Integer in begin putint(1); m := 1; putint(m); end

The Parser code contains a few comments on this matter.
Notice that, although expressions like x>y>4 do not usually make sense (indeed, x>y returns a boolean value and in most languages you cannot compare a boolean to an integer), they are valid syntactically. They should get parsed with the right precedence, e.g. ((x>y)>4).
It is the Checker, not the Parser, that invalidates such expressions. Notice that expressions like
5 > 3 == (7 > 2)
which should be parsed as (5 > 3) == (7 > 2), also have three relational operators but do make sense.
If we don't allow more than one relational operator in an expression, the only way to handle expressions like the previous one in the parsing phase is to distinguish between "<" and ">" on one side and "==" on the other (modifying the grammar).
In MiniTriangle, operator "==" cannot be used associatively as in "5 == 3 == 2" (the C language allows this). It can only be used conjunctively as in "5 == 3 and 3 == 2" which has a completely different meaning. Indeed, you would be overloading "=" as an operator that can be applied to booleans ("false == true", yields false) and integers ("7 == 8" yields false) which is not the case in our MiniTriangle language.
An easier solution is to allow all relational operators to be used associatively in the parsing phase and leave the job of checking their validity to the type-checker (contextual analysis).
This is as it should be.
Typing rules (except in extremely trivial cases) are examples of contextual constraints: loosely speaking, one needs to take the context (such as an environment that lists all variables that are in scope at some particular point in a program and their types) into account in order to decide whether the constraint is met or not.
A context-free grammar cannot express such rules. Hence the name! And if you look at a production in a context free grammar, for example,
A → aBc
it is easy to see why this is so: the production says that the non-terminal A can be replaced by aBc in any string of grammar symbols, regardless of the context (surrounding symbols) in that string.
You will also note that the grammar as given also allows ill-typed expressions. For example, the following is a syntactically correct MiniTriangle expression,
1 + (2 < 3)
but it is not type correct.
And the reason is that the MiniTriangle grammar makes no attempt to capture any of the typing rules as that isn't possible to do using a context-free grammar for a language like MiniTriangle. MiniTriangle may be small and simple, but it exhibits most of the characteristics of a "real" programming language.
You might think you could create separate productions for the boolean and arithmetic operators, and indeed it is very instructive to try. If you do, you will quickly and concretely experience why the formalism of context-free grammars simply isn't powerful enough to express what you need to express.
Thus, to express the typing rules, we need something else. For this, MiniTriangle uses typing rules expressed in logic.
When you need to extend the Happy parser, you need to change the Happy source code Parser.y and re-generate Parser.hs from that. (The build system should do this for you: just invoke "gmake".)
This can be generalised: Never edit generated code unless it is absolutely unavoidable! If you find yourself thinking about editing generated code, then think again!
As always, there are exceptions. Some tools do generate code that is intended to be edited. Usually some kind of "boiler-plate code" generator that aims at saving work by generating tedious, repetitive stub code from some kind of specification, allowing the programmer to then add the interesting missing bits. This isn't a great design in my opinion, as this does not address the need to save work when maintaining code, only to ease writing the code the first time round.
However, some really sophisticated such tools may even be able to re-generate the boiler plate code and add any user-added bits from the previously generated and edited version back in.
That's a better design, but complicated. It would seem to be a better idea to embed code fragments in a specification in Yacc/Bison/Happy style, and generate everything from there. A single spec/source file is much simpler and easier to maintain.
Finally, in some unfortunate circumstances, it simply may not be possible to re-generate code. Such as if the original source is gone, or the tool used to generate code can no longer be run for whatever reason. Then one does what one has to, if starting over from scratch isn't an option.
cond :: Bool -> a -> a -> a
cond b a1 a2 = if b then a1 else a2
(Haskell only supports binary operators, though, not ternary ones, hence the name "cond" instead of "?:".)
The answer is yes, in principle, but:
"cond" only works as a conditional because Haskell is a non-strict ("lazy") language! MiniTriangle is intended to be strict (because it is an imperative language, which does not blend well with laziness).
So, if "?:" indeed were truly handled just like a function in MiniTriangle, it wouldn't work when we try to run programs using it!
Of course, this point is kind of moot at present, since there isn't a MiniTriangle code generator, and hence we cannot run MiniTriangle programs. But nevertheless, the issue is there.
f :: [Int] -> Int
f xs = cond (xs == []) 0 (head xs)
This works as intended in Haskell. If the list "xs" is empty, the expression "head xs" will not be evaluated thanks to the non-strict semantics of Haskell, and thus the result of the conditional expression is 0.
Note that a conditional expression is not just about selecting one of two values, but also about making sure only one of two expressions gets evaluated.
In a strict language, on the other hand, all three arguments to "cond" would be evaluated before the function is called. Thus "head xs" will be evaluated regardless of whether the list "xs" is empty or not, and so, in case "xs" is empty, the result of the conditional expression is not 0; instead the program stops and reports an error.
Try to implement something like "cond" in a strict language like Java, and you'll see what I mean. It's quite instructive!
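For reference, the two snippets above assemble into one runnable Haskell file, and you can check the claim directly: f [] really does return 0, because the "head xs" argument is never forced.

```haskell
-- "cond" is an ordinary function; it only behaves like a
-- conditional because Haskell is non-strict.
cond :: Bool -> a -> a -> a
cond b a1 a2 = if b then a1 else a2

-- Safe despite "head xs": when xs is empty, cond returns 0 and
-- the "head xs" argument is simply never evaluated.
f :: [Int] -> Int
f xs = cond (xs == []) 0 (head xs)
```

In GHCi, f [] evaluates to 0 and f [5, 6] to 5. The same definitions transliterated into a strict language would crash on the empty list before cond even ran.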
Strictly speaking: no type! The type system can only be used to infer types for well-typed expressions.
However, I suppose the question really is: "What should the compiler do in the event that it discovers this particular type error?"
There are a number of possible answers to that question. The approach used in the MiniTriangle compiler is that the type checker should first report any errors, and then try to "recover" so that further type errors may be found and reported, as opposed to just finding and reporting a single error at a time.
So, how can we recover? Three obvious alternatives suggest themselves:
The question here, then, is what would work best in practice. That's not easy to say for sure. But at least we can say that in the first two cases, there are some source programs for which either guess is going to be the wrong guess. That would likely lead to reporting false type errors: that is, type errors that are a mere consequence of earlier type errors. This could be confusing for a user, and one can thus argue that this should be avoided.
The type "SomeType" on the other hand, means "the type should be some type (all expressions have a type in a type correct program), but unclear which one". Thus, "SomeType" is compatible with any other type! Therefore, if "SomeType" is returned, the compiler will not report a later error simply because we guessed wrong as to the type of the conditional.
On the other hand, being this forgiving might mean that we miss a genuine type error this time round. That's not a big problem, though, as it would be found once the first error has been corrected and the source program is recompiled.
So, pick the approach you think seems best, and explain why you think so!
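If you do go the "SomeType" route, its core fits in a few lines of Haskell. The names below (Type, SomeType, compatible) follow the discussion above but are purely illustrative, not the actual HMTC definitions:

```haskell
-- A sketch of error recovery with a special "unknown" type.
data Type = IntType | BoolType | SomeType deriving (Eq, Show)

-- SomeType is compatible with every type, so a subexpression
-- whose type could not be determined never triggers follow-on
-- false errors; all other types must match exactly.
compatible :: Type -> Type -> Bool
compatible SomeType _ = True
compatible _ SomeType = True
compatible t1 t2      = t1 == t2
```

A checker using compatible in place of plain equality reports the original error once, then carries on without cascading complaints about the ill-typed subexpression.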
%prec is used at the end of some of the productions of the grammar to set the precedence and associativity of those productions.
If you have a production that explicitly mentions X, like

| foo X fum

then you don't need the %prec bit. So

| expression '+' expression

would be fine as it is. However, in the actual production '+' is not mentioned explicitly (because there are often multiple operators at the same precedence level, and essentially repeating the production for each one individually would be tedious and bad style). Instead the permissible operators are given by the production for, in this case, opclass_additive. But Happy does not attempt to figure out what operators opclass_additive might stand for in order to figure out what the precedence should be (as it would not work in general). So instead the grammar writer has to explicitly state the desired precedence in a case like this:

| expression opclass_additive expression %prec '+'

The actual precedences (and associativity) are set in a list of declarations elsewhere in the Happy code. The order of the declarations sets the precedence, and the left, right and nonassoc keywords set the associativity. In the Happy code for the MiniTriangle parser, this looks like:

%left '||'
%left '&&'
%nonassoc '<' '<=' '==' '!=' '>=' '>'
%left '+' '-'
%left '*' '/'
%right '^'
If you write a MiniTriangle program that begins

let var v : Integer in

and you typecheck it with ./hmtc --print-after-checking, the output includes

CmdLet <line 1, column 1>
  DeclVar <line 1, column 5>
    "v@1" : Ref Integer
What this means is that v is declared to be a variable.
In an imperative language, a variable is essentially a memory cell into which a piece of data can be written or from which a piece of data can be read. In the case of v, the type of the data is declared to be Integer. Thus, the way MiniTriangle is set up, v is a reference to (a memory cell that can hold an) Integer.
On the other hand, a literal like 1 has type Integer because it is an integer.
Thus

v := 1

makes sense, because it means "write the integer 1 into the cell capable of holding an integer that v refers to".

v + 1
also makes sense because it means "read whatever integer that is stored in the cell that v refers to and add the integer 1 to it, producing a new integer".
But
1 := v
does not make sense because it is not possible to overwrite an integer!
Similarly, you will also find that there are notions of "Source" and "Sink" in MiniTriangle. These are references that have only "read-only" and "write-only" capabilities, respectively, while a plain reference has both.
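The capabilities just described can be captured in a small Haskell sketch. These are illustrative definitions only, not the actual HMTC types:

```haskell
-- Reference types and the capabilities they grant.
data Type
    = Integer
    | Ref Type     -- read and write
    | Source Type  -- read-only
    | Sink Type    -- write-only
    deriving (Eq, Show)

-- May we read a value of the underlying type out of it?
readable :: Type -> Bool
readable (Ref _)    = True
readable (Source _) = True
readable _          = False

-- May we write into it, as on the left-hand side of ":="?
writable :: Type -> Bool
writable (Ref _)  = True
writable (Sink _) = True
writable _        = False
```

Under these rules, "v := 1" type-checks because v has type Ref Integer, which is writable, while "1 := v" is rejected because the literal 1 has type Integer, which is not writable.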
The problem could be that tuck and your personal computer have different versions of Happy and GHC installed.
If you've already compiled HMTC on your personal computer, then if you copy the entire directory to tuck you also copy the Happy-generated "Parser.hs" file. Even if you "gmake clean", this "Parser.hs" file is not deleted. When you try to recompile, this existing "Parser.hs" file may be used, rather than being regenerated.
The solution is to delete the "Parser.hs" file before recompiling.
The main difference between the two is that the IR contains type information whereas the AST does not. The purpose of the type checker is to convert an AST into an IR, rejecting ill-typed programs.
Additionally, identifiers in the AST have been replaced by symbols in the IR. These symbols carry semantic information such as: type, scope level, internal or external, and the value of the symbol if external.
Generally, MTIR is an abstract syntax tree annotated by semantic information. The module documentation says:
Module: MTIR
Purpose: MiniTriangle Internal Representation (typechecked AST with semantical annotations)

-- MiniTriangle Internal Representation. The definitions mirror the
-- AST, but the program should be type correct at this stage and semantical
-- annotations have been added. In particular, variables are represented
-- by symbols (with name, type info, and source position) as opposed to just
-- their names.
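The difference between a name and a symbol can be sketched in miniature. Everything below is illustrative only (the actual HMTC definitions are richer, carrying source positions among other things):

```haskell
-- A toy type language for the sketch.
data Type = IntType | BoolType deriving (Eq, Show)

-- AST: a variable is just its name.
data AstExp = AstVar String | AstLitInt Int

-- MTIR: a variable is a symbol, carrying semantic annotations
-- such as its type.
data Symbol  = Symbol { symName :: String, symType :: Type }
data MtirExp = MtirVar Symbol | MtirLitInt Int

-- The type checker turns AST into MTIR, looking names up in an
-- environment and rejecting unbound variables.
check :: [(String, Type)] -> AstExp -> Maybe MtirExp
check env (AstVar n)    = do t <- lookup n env
                             Just (MtirVar (Symbol n t))
check _   (AstLitInt i) = Just (MtirLitInt i)
```

So check [("x", IntType)] (AstVar "x") succeeds and records the type on the resulting symbol, whereas check [] (AstVar "y") fails, just as the real type checker rejects programs with unbound identifiers.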
The most common mistakes with Part I of the coursework last year stemmed from failing to pattern match on lists. In particular, while some students pattern matched on the head and tail of a list, very few pattern matched to greater depth. In Haskell you can pattern match to arbitrary depth on any datatype (though only against the constructors of that datatype).
Basic examples of pattern matching on lists:
head :: [a] -> a
head (a : _) = a

tail :: [a] -> [a]
tail (_ : as) = as

(++) :: [a] -> [a] -> [a]
[]       ++ bs = bs
(a : as) ++ bs = a : (as ++ bs)

But you can pattern match deeper if you want:

secondelement :: [a] -> a
secondelement (_ : a2 : _) = a2

You can also pattern match on the structure of the elements within the list:

iffy :: [Bool] -> Bool
iffy (True  : b1 : _)     = b1
iffy (False : _ : b2 : _) = b2

hasPrefixABC :: [Char] -> Bool
hasPrefixABC ('A' : 'B' : 'C' : cs) = True
hasPrefixABC _                      = False

There's a guide to pattern matching on the Haskell Wiki that you may find useful.
People are often caught out by issues related to how different operating systems (like Windows and Unix) represent text files, and differences in how different tools (in particular editors, like Emacs and Wordpad) choose to interpret certain characters in text files.
Here are a couple of suggestions to help you get around these issues.
First of all, as you know, parsing of Haskell programs takes layout (indentation) into account (unless you explicitly group things using curly braces and semicolons). If tab characters are used in a Haskell file, it thus becomes a critical question just how wide (in spaces) a tab stop is supposed to be. The Haskell language standard has a precise definition (to ensure that Haskell programs are always interpreted the same way): a tab stop is 8 spaces wide. This is also the default in many (most?) text editors, like Emacs.
However, for example Wordpad, which is a popular text editor for Windows users at least, seems to have a different idea about the default: it opts for tab stops being 4 spaces wide.
To avoid unnecessary grief caused by these seemingly inexplicable parse errors, I recommend that you, when working on the G53CMP labs, go into the Wordpad settings menu and change the width of tabs to 8, and that you also tick the box "treat tabs like spaces".
The second issue is the difference between Windows and Unix (and hence also Linux, Solaris, Mac OS) when it comes to how end of lines are encoded in text files. The Unix convention is to use a single new line character (NL, ASCII/UNICODE character 10, denoted by the escape sequence "\n" in languages like C, Java, Haskell). The Windows convention is to use a carriage return (CR, ASCII/UNICODE character 13, denoted by the escape sequence "\r" in languages like C, Java, Haskell), followed by NL (see en.wikipedia.org/wiki/Newline).
This can cause issues with the MiniTriangle scanner if the input to the MiniTriangle compiler are files that were created under Windows: the scanner might complain about "illegal character '\r'".
There are a number of ways around this. One is to create any test files under Linux. Another is to work exclusively within GHCi and provide the input to the scanner as a Haskell string.
Yet another is to use a tool that translates between the two different conventions. For example, the program "dos2unix" converts from Windows conventions to Unix conventions. The dos2unix command takes two arguments: the original filename and the filename to save the converted file as.
dos2unix inputFile outputFile
However, as you have the source code for the scanner, you can do one better and simply fix the problem once and for all by extending it a little bit. The idea is essentially to make it ignore CR characters. Here is how. The interpretation of the carriage return character is that it returns the printing position to the beginning of the line (column 1), but does not advance the position to the next line. So we should do the same thing when we keep track of the source-code position within the scanner. Here's the extra line you need to add to the scanner, in context:
scan l c ('\n' : s) = scan (l + 1) 1 s
scan l c ('\r' : s) = scan l 1 s
scan l c ('\t' : s) = scan l (nextTabStop c) s
Last updated 3rd October 2014.