The second part of the patch is essential, given the first. Otherwise names in different pragmas in the same file can overlap -- this already happened in oak!
I must admit, this was mainly done to allow munged names back in again as valid identifiers.
OEP 144 suggests replacing dot with underscore; this change just allows underscore alongside dot. It won't break any existing code, and seems like something we want anyway, so I think it's a valid thing to do.
One change, based on Adam's suggestion, was to rename the pragma to TOCKEXTERNAL.
Another, also based on Adam's suggestion, was to generate both the munged name and the original name, which allows (along with a previous patch) different files to declare the same PROC, and will remove the need for the occam_ prefix in the backend.
I also stopped using specific states in the lexer, in favour of just using the normal lexing function (which has had its type generalised slightly).
The separately compiled occam PROCs now use #PRAGMA OCCAMEXTERNAL, which also discards the "= number" thing at the end. These PROCs then need to be processed differently when adding on the sizes (C externals have one size per dimension, occam externals have the normal array of sizes).
We also now record which processes were originally at the top-level, and keep their original names (i.e. minus the _u43 suffixes) plus an "occam_" prefix to avoid collisions.
With my previous change to PRAGMAs, unknown pragmas would fatally fail in the lexer, so that an unknown pragma would always stop compilation, which is not good. I've changed it more towards Adam's suggestion of re-lexing and re-parsing the pragma from the parser, so we now gracefully ignore unknown pragmas again. The lexer is a bit messy, though.
At the moment, the information is only needed in the parser, which must define recursive names before parsing the body of the function. But in future, we should keep the information when the function becomes a proc, and then the C/C++ backends may need to use it (for example, when calculating stack space usage)
Previously it was a tuple, which meant it couldn't have sensible
custom instances. Token and TokenType now have Show instances, so we
get more useful output when parsing fails.
The order of initial passes is now:
lex -> preprocess -> structure -> expand-include -> parse
which means that #IFing out structurally-invalid code (like inline VALOF) now
works. This also cleans up the preprocessor code a bit.
This implements #DEFINE, #UNDEF, #IF, #ELSE and #ENDIF, macro expansion with
##, and TRUE, FALSE, AND, OR, NOT and DEFINED within #IF expressions, with the
same semantics as occ21.
The macro COMPILER.TOCK is always defined by default, so you can now say things
like "#IF NOT DEFINED (COMPILER.TOCK) ... #ENDIF".