The previous method, using the C preprocessor, was both nasty and crazily resource-intensive. The new method stores stack size information in files that are read in and processed by the compiler when it comes time to link.
They are now recursed into with no type context, and the type is deduced afterwards. This seems to be how they were meant to work, and it is also much faster than what I was doing.
This patch follows on from the previous change to the parser. When it spots a function-call, it looks for operators and treats them differently.
It keeps a stack of operators in scope (csOperators in CompState), and when an operator is used, it searches the stack (with all old definitions masked out) for operator definitions to resolve to.
The way it chooses which operator to use in the presence of overloadings (e.g. + on INT vs + on INT32) is simply to try them all. If exactly one matches, it uses that; if none match, or more than one does, it gives an error. This makes the code simple and seems logical, but I'm not totally confident that this is the required behaviour for resolving overloaded operators.
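The matching rule is roughly the following; this is a minimal sketch with invented names and types, not the actual Tock code:

    -- A sketch of the "try them all" rule: look through the visible operator
    -- definitions and demand exactly one whose formal types match the operands.
    resolveOperator :: Eq ty
                    => String                  -- operator symbol, e.g. "+"
                    -> [ty]                    -- operand types at this use
                    -> [(String, [ty], name)]  -- visible definitions (symbol, formals, name)
                    -> Either String name
    resolveOperator op argTypes defs =
      case [n | (sym, formals, n) <- defs, sym == op, formals == argTypes] of
        [n] -> Right n                                   -- exactly one match: use it
        []  -> Left ("no matching definition of " ++ op)
        _   -> Left ("ambiguous use of " ++ op)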
The idea behind this is to parse unary/binary operators into function calls with one or two operands, so the AST actually has a FunctionCall with the name "+". Function names may now be quoted operators, and thus you can also have function declarations with names such as "+". Resolution is *not* done in the parser for these names: every "+" is left as "+" (no matter what types it operates on, or what is in scope); see the later patches to InferTypes instead.
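As a toy illustration of the shape this gives the tree (these are not Tock's actual AST constructors):

    -- Operators represented as plain function calls whose name is the
    -- quoted operator symbol.
    data Expr = Literal Int
              | Variable String
              | FunctionCall String [Expr]   -- the name may be an operator, e.g. "+"
      deriving Show

    -- "x + 1" becomes a two-operand call; "- y" becomes a one-operand call.
    plusExample, negExample :: Expr
    plusExample = FunctionCall "+" [Variable "x", Literal 1]
    negExample  = FunctionCall "-" [Variable "y"]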
When parsing an occam source file, we automatically insert a bunch of PRAGMA TOCKEXTERNAL declarations that define the default occam operators (e.g. + on INT) as external C functions (which they are!). The naming scheme for these C functions is standardised, and must be followed by functions such as mulExprs (which picks the function based on the type of its operands) and the new variants such as mulExprsInt (which are pegged to INT).
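To give the flavour of what such a scheme looks like, here is a purely illustrative sketch; the helper name and the generated strings are assumptions, not the scheme Tock actually uses:

    -- One plausible shape for a standardised operator-to-C-function naming rule.
    defaultOperatorCName :: String -> String -> String
    defaultOperatorCName op typeName = "occam_" ++ mangle op ++ "_" ++ typeName
      where
        mangle "+" = "add"
        mangle "-" = "sub"
        mangle "*" = "mul"
        mangle "/" = "div"
        mangle sym = sym
    -- e.g. defaultOperatorCName "+" "int" == "occam_add_int"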
The Types module also has some new functions for dealing with operator-functions.
This may seem like an odd change, but it simplifies the logic a lot. I kept having problems with passes not operating on externals (e.g. functions-to-procs, adding array sizes, constant folding in array dimensions) and adding a special case every time to also process the externals was getting silly.
Putting the externals in the AST therefore made sense, but I didn't want to just add dummy bodies as this would cause them to throw up errors (e.g. in the type-checking for functions). So I turned the bodies into a Maybe type, and that has worked out well.
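A cut-down illustration of the idea (these are not the real AST types): the body becomes optional, so an external can sit in the tree with no body and no dummy placeholder.

    data Formal = Formal String deriving Show   -- stand-in for a real formal parameter
    data Body   = Body String deriving Show     -- stand-in for a real body

    data Definition = Definition
      { defName    :: String
      , defFormals :: [Formal]
      , defBody    :: Maybe Body   -- Nothing for externals declared via pragmas
      } deriving Show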
I also stopped storing the formals in csExternals (since they are now in csNames, and the tree), which streamlined that nicely, and stopped me having to keep them up to date.
There was a bug where things scoped in via pragmas were never scoped out again, which was screwing up the local names stack. I then realised that pragmas are really specifications, and decided to put them there in the parser.
The rest of this patch is just some rewiring to allow the special name munging involved in pragmas (they already have a munged version of their name) and to stop the scoped-in pragmas from appearing in the AST.
The second part of the patch is essential, given the first. Otherwise names in different pragmas in the same file can overlap -- this already happened in oak!
I must admit, this was mainly done to allow munged names back in again as valid identifiers.
OEP 144 suggests replacing dot with underscore; this change just allows underscore alongside dot. It won't break any existing code, and seems like something we want anyway, so I think it's a valid thing to do.
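The effect on the lexer amounts to something like this sketch (not Tock's actual lexer code): underscore is accepted in identifiers wherever dot already was.

    import Data.Char (isAlphaNum)

    isOccamIdentChar :: Char -> Bool
    isOccamIdentChar c = isAlphaNum c || c == '.' || c == '_'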
One change, based on Adam's suggestion, was to rename the pragma to TOCKEXTERNAL.
Another, also based on Adam's suggestion, was to generate both the munged name and the original name, which allows (along with a previous patch) different files to declare the same PROC, and will remove the need for the occam_ prefix in the backend.
I also stopped using specific states in the lexer, in favour of just using the normal lexing function (which has had its type generalised slightly).
This will allow (along with a few patches in a minute) different occam files to declare the same PROC, and have it resolved correctly based on the order of their declaration, just like if it was all in one file.
The solution is a bit hacky, but this was an important problem. If your PRAGMA failed to parse, that was worthy of a warning. But if that then caused the parse to fail, all you would get was the parser error (could not find name), and you would never see the warnings about the pragmas not being recognised. So now those warnings are shoved into the error (using a basic encoding) and pulled out and issued if the parser dies.
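A rough sketch of the trick, with the encoding details invented: glue the pending pragma warnings onto the error text behind a separator, so whoever reports the parse failure can split them back out and issue them first.

    encodeWarnings :: [String] -> String -> String
    encodeWarnings warnings err = unlines warnings ++ "\0" ++ err

    decodeWarnings :: String -> ([String], String)
    decodeWarnings s = case break (== '\0') s of
      (ws, '\0' : err) -> (lines ws, err)
      _                -> ([], s)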
The separately compiled occam PROCs now use #PRAGMA OCCAMEXTERNAL, which also discards the "= number" thing at the end. These PROCs then need to be processed differently when adding on the sizes (C externals have one size per dimension, occam externals have the normal array of sizes).
We also now record which processes were originally at the top-level, and keep their original names (i.e. minus the _u43 suffixes) plus an "occam_" prefix to avoid collisions.
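As an illustration of the renaming (the helper name here is invented), the original name is recovered by dropping the trailing "_u<digits>" suffix and prefixing "occam_":

    import Data.Char (isDigit)

    topLevelCName :: String -> String
    topLevelCName munged = "occam_" ++ original
      where
        original = case span isDigit (reverse munged) of
          (_:_, 'u' : '_' : rest) -> reverse rest
          _                       -> munged
    -- e.g. topLevelCName "main_u43" == "occam_main"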
I changed a little bit of the code, but mainly the tests. Several of the remaining failures are actually real failures, so I need to dig through the rest carefully. A lot are failing because the C++ backend is broken.
With my previous change to PRAGMAs, an unknown pragma would cause a fatal error in the lexer and so always stop compilation, which is not good. I've changed it more towards Adam's suggestion of re-lexing and re-parsing the pragma from the parser, so we now gracefully ignore unknown pragmas again. The lexer is a bit messy, though.