134 lines
5.9 KiB
Plaintext
134 lines
5.9 KiB
Plaintext
_combinator-parser_
|
|
|
|
This documentation provides directions on using the combinator parser library. It assumes familiarity with lexing and with combinator parsers.
|
|
|
|
_combinator-unit.ss_
|
|
This library provides a unit implementing four higher-order functions
|
|
that can be used to build a combinator parser, and the export and
|
|
import signatures related to it. The functions contained in this unit
|
|
automatically build error reporting mechanisms in the event that no parse
|
|
is found. Unlike other combinator parsers, this system assumes that the
|
|
input is already lexed into tokens using _lex.ss_. This library relies on
|
|
_(lib "lazy.ss" "lazy")_.
|
|
|
|
The unit _combinator-parser-tools_ exports the signature
|
|
_combinator-parser^_ and imports the signatures _error-format-parameters^_, _language-format-parameters^_, and _language-dictionary^_.
|
|
|
|
The signature combinator-parser^ references functions to build combinators,
|
|
a function to build a runable parser using a combinator, a structure for
|
|
recording errors and macro definitions to specify combinators with:
|
|
|
|
>(terminal predicate result name spell-check case-check type-check) ->
|
|
(list token) -> parser-result
|
|
The returned function accepts one terminal from a token stream, and
|
|
returns produces an opaque value that interacts with other combinators.
|
|
|
|
predicate: token -> boolean - check that the token is the expected one
|
|
result: token -> beta - create the ast node for this terminal
|
|
name: string - human-language name for this terminal
|
|
spell-check, case-check, type-check: (U bool (token -> bool))
|
|
optional arguments, default to #f, perform spell checking, case
|
|
checking, and kind checking on incorrect tokens
|
|
|
|
>(seq sequence result name) -> (list token) -> parser-result
|
|
The returned function accepts a term made up of a sequence of smaller
|
|
terms, and produces an opaque value that interacts with other
|
|
combinators.
|
|
|
|
sequence: (listof ((list token) -> parser-result)) - the subterms
|
|
result: (list alpha) -> beta - create the ast node for this sequence.
|
|
Input list matches length of sequence list
|
|
name: human-language name for this term
|
|
|
|
>(choice options name) -> (list token) -> parser-result
|
|
The returned function selects between different terms, and produces an
|
|
opaque value that interacts with other combinators
|
|
|
|
options: (listof ((list token) -> parser-result) - the possible terms
|
|
name: human-language name for this term
|
|
|
|
>(repeat term) -> (list token) -> parser-result
|
|
The returned function accepts 0 or more instances of term, and produces
|
|
an opaque value that interacts with other combinators
|
|
|
|
term: (list token) -> parser-result
|
|
|
|
>(parser term) -> (list token) location -> ast-or-error
|
|
Returns a function that parses a list of tokens, producing either the
|
|
result of calling all appropriate result functions or an err
|
|
|
|
term: (list token) -> parser-result
|
|
location: string | editor
|
|
Either the string representing the file name or the editor being read,
|
|
typically retrieved from file-path
|
|
ast-or-error: AST | err
|
|
AST is the result of calling the given result function
|
|
|
|
The err structure is:
|
|
>(make-err string source-list)
|
|
|
|
>(err-msg err) -> string
|
|
The error message
|
|
>(err-src err) -> (list location line-k col-k pos-k span-k)
|
|
This list is suitable for calling raise-read-error,
|
|
*-k are positive integers
|
|
|
|
The language forms provided are:
|
|
>(define-simple-terminals NAME (simple-spec ...))
|
|
Expands to a define-empty-tokens and one terminal definition per
|
|
simple-spec
|
|
|
|
NAME is an identifier specifying a group of tokens
|
|
|
|
simple-spec = NAME | (NAME string) | (NAME proc) | (NAME string proc)
|
|
NAME is an identifier specifying a token/terminal with no value
|
|
proc: token -> ast - A procedure from tokens to AST nodes. id is used
|
|
by default. The token will be a symbol.
|
|
string is the human-language name for the terminal, NAME is used by
|
|
default
|
|
|
|
>(define-terminals NAME (terminal-spec ...))
|
|
Like define-simple-terminals, except uses define-tokens
|
|
|
|
terminal-spec = (NAME proc) | (NAME string proc)
|
|
proc: token -> ast - a procedure from tokens to AST node.
|
|
The token will be the token defined as NAME and will be a value token.
|
|
|
|
>(sequence (NAME ...) proc string)
|
|
Generates a call to seq with the specified names in a list,
|
|
proc => result and string => name.
|
|
The name can be omitted when nested in another sequence or choose
|
|
|
|
>(sequence (NAME_ID ...) proc string)
|
|
where NAME_ID is either NAME or (^ NAME)
|
|
The ^ form identifies a parser production that can be used to identify
|
|
this production in an error message. Otherwise the same as above
|
|
|
|
>(choose (NAME ...) string)
|
|
Generates a call to choice using the given terms as the list of options,
|
|
string => name.
|
|
The name can be omitted when nested in another sequence or choose
|
|
|
|
>(eta NAME)
|
|
Eta expands name with a wrapping that properly mimcs a parser term
|
|
|
|
The _error-format-parameters^_ signature requires five names:
|
|
src?: boolean- will the lexer include source information
|
|
input-type: string- used to identify the source of input
|
|
show-options: boolean- presently ignored
|
|
max-depth: int- The depth of errors reported
|
|
max-choice-depth: int- The max number of options listed in an error
|
|
|
|
The _language-format-parameters^_ requires two names
|
|
class-type: string - general term for language keywords
|
|
input->output-name: token -> string - translates tokens into strings
|
|
|
|
The _language-dictionary^_ requires three names
|
|
misspelled: string string -> number -
|
|
check the spelling of the second arg against the first, return a number
|
|
that is the probability that the second is a misspelling of the first
|
|
misscap: string string -> boolean -
|
|
check the capitalization of the second arg against the first
|
|
missclass: string string -> boolean -
|
|
check if the second arg names a correct token kind
|