Summary:
The ability to use pre-determined character widths will benefit alternative
layout engines such as gagern's canvas layout engine. I would also like to
experiment would using CSS transforms to absolutely position each glyph. This
diff adds a new make rule, make extended_metrics, which generates metrics that
also containing glyph widths.
Test Plan:
- run `make extended_metrics`
- verify that fontMetricsData.js contains entries with 5 numbers instead of 4
Reviewers: emily alpert
There are two main motivations for this commit. One is unicode input, which
requires unicode characters to get past the lexer. See discussion in #261.
The second is in preparation for #266, where we'd deal with one token of
look-ahead but might be lexing that token in an unknown mode in some cases.
The unit test shipped with this commit addresses the latter concern, since
it checks that a math-mode-only token may immediately follow some text mode
content group.
In this new implementation, all the various things that could get matched
have been collected into a single regular expression. The hope is that
this will be beneficial for performance and keep the code simpler.
The code was written with Unicode input in mind, including non-BMP codepoints.
The role of the lexer as a gate keeper, keeping out invalid TeX syntax, has
been abandoned. That role is still fulfilled by the symbols and functions
tables, though, since any input which is neither a symbol nor a command is
still considered invalid input, even though it lexes successfully.
Fixes issue #255.
Mixing the variable number of arguments a function receives from TeX code
with the fixed arguments which the parser provides can cause some confusion.
After this change, a handler will receive exactly two arguments: one is a
context object from which things provided by the parser can be accessed by
name, which allows for simple extensions in the future. The other is the
list of TeX arguments, passed as an array.
If we ever switch to EcmaScript 2015, we might want to use its destructuring
features to name the elements of the args array in the function head. Until
then, destructuring that array manually immediately at the beginning of the
function seems like a useful convention to easily find the meaning of these
arguments.