The idea behind this is to parse unary/binary operators into function calls with 1/2 operands. So the AST actually has a FunctionCall with the name "+". Function names may now be quoted operators, and thus you can also have function declarations with names such as "+". Resolving is *not* done in the parser for these function names, but rather every "+" is left as "+" (no matter what types it operates on, or what is in scope) by the parser (see later patches to InferTypes instead).
When parsing an occam source file, we automatically insert a bunch of PRAGMA TOCKEXTERNAL that define the default occam operators (e.g. + on INT) as external C functions (which they are!). The naming scheme for these C functions is standardised, and must be used by functions such as mulExprs (which bases the function on the type of its operands) and the new versions mulExprsInt (which are pegged to INT).
The Types module also has some new functions for dealing with operator-functions.
This may seem like an odd change, but it simplifies the logic a lot. I kept having problems with passes not operating on externals (e.g. functions-to-procs, adding array sizes, constant folding in array dimensions) and adding a special case every time to also process the externals was getting silly.
Putting the externals in the AST therefore made sense, but I didn't want to just add dummy bodies as this would cause them to throw up errors (e.g. in the type-checking for functions). So I turned the bodies into a Maybe type, and that has worked out well.
I also stopped storing the formals in csExternals (since they are now in csNames, and the tree), which streamlined that nicely, and stopped me having to keep them up to date.
There was a bug where things scoped in via pragmas were never scoped out again, which was screwing up the local names stack. I then realised/decided that pragmas were really specifications, and decided to put them there in the parser.
The rest of this patch is just some rewiring to allow the special name munging involved in pragmas (they have already got a munged version of their name) and to stop the scoped in pragmas appearing in the AST.
The second part of the patch is essential, given the first. Otherwise names in different pragmas in the same file can overlap -- this already happened in oak!
One change, based on Adam's suggestion, was to rename the pragma to TOCKEXTERNAL.
Another, also based on Adam's suggestion, was to generate both the munged name and the original name, which allows (along with a previous patch) different files to declare the same PROC, and will remove the need for the occam_ prefix in the backend.
I also stopped using specific states in the lexer, in favour of just using the normal lexing function (which has had its type generalised slightly).
This will allow (along with a few patches in a minute) different occam files to declare the same PROC, and have it resolved correctly based on the order of their declaration, just like if it was all in one file.
The solution is a bit hacky, but this was an important problem. If your PRAGMA failed to parse, that was worthy of a warning. But if that then caused the parse to fail, all you would get is the parser error (could not find name), and you would never see the warnings about the pragmas not being recognised. So now the pragmas are shoved into the error (using a basic encoding) and pulled out and issued if the parser dies.
The separately compiled occam PROCs now use #PRAGMA OCCAMEXTERNAL, which also discards the "= number" thing at the end. These PROCs then need to be processed differently when adding on the sizes (C externals have one size per dimension, occam externals have the normal array of sizes).
We also now record which processes were originally at the top-level, and keep their original names (i.e. minus the _u43 suffixes) plus an "occam_" prefix to avoid collisions.
With my previous change to PRAGMAs, unknown pragmas would fatally fail in the lexer, so that an unknown pragma would always stop compilation, which is not good. I've changed it more towards Adam's suggestion of re-lexing and re-parsing the pragma from the parser, so we now gracefully ignore unknown pragmas again. The lexer is a bit messy, though.
At the moment, the information is only needed in the parser, which must define recursive names before parsing the body of the function. But in future, we should keep the information when the function becomes a proc, and then the C/C++ backends may need to use it (for example, when calculating stack space usage)
For now, I have fixed the occam parser so that it allows 1 or more direction specifiers after channel names. So c?? is valid, and should end up being equivalent to c?, but this may need altering later.
This lets you write things like "[cs! FOR 5]", which is horrible; I
would prefer "[cs FOR 5]!", since then that doesn't imply that you can
do things like "cs![0] ! 0".
However, Tock now compiles and passes cgtest87 -- the first occam-pi
cgtest we've handled. :)