_combinator-parser_ This documentation provides directions on using the combinator parser library. It assumes familiarity with lexing and with combinator parsers. _combinator-unit.ss_ This library provides a unit implementing four higher-order functions that can be used to build a combinator parser, and the export and import signatures related to it. The functions contained in this unit automatically build error reporting mechanisms in the event that no parse is found. Unlike other combinator parsers, this system assumes that the input is already lexed into tokens using _lex.ss_. This library relies on _(lib "lazy.ss" "lazy")_. The unit _combinator-parser-tools_ exports the signature _combinator-parser^_ and imports the signatures _error-format-parameters^_, _language-format-parameters^_, and _language-dictionary^_. The signature combinator-parser^ references functions to build combinators, a function to build a runable parser using a combinator, a structure for recording errors and macro definitions to specify combinators with: >(terminal predicate result name spell-check case-check type-check) -> (list token) -> parser-result The returned function accepts one terminal from a token stream, and returns produces an opaque value that interacts with other combinators. predicate: token -> boolean - check that the token is the expected one result: token -> beta - create the ast node for this terminal name: string - human-language name for this terminal spell-check, case-check, type-check: (U bool (token -> bool)) optional arguments, default to #f, perform spell checking, case checking, and kind checking on incorrect tokens >(seq sequence result name) -> (list token) -> parser-result The returned function accepts a term made up of a sequence of smaller terms, and produces an opaque value that interacts with other combinators. sequence: (listof ((list token) -> parser-result)) - the subterms result: (list alpha) -> beta - create the ast node for this sequence. Input list matches length of sequence list name: human-language name for this term >(choice options name) -> (list token) -> parser-result The returned function selects between different terms, and produces an opaque value that interacts with other combinators options: (listof ((list token) -> parser-result) - the possible terms name: human-language name for this term >(repeat term) -> (list token) -> parser-result The returned function accepts 0 or more instances of term, and produces an opaque value that interacts with other combinators term: (list token) -> parser-result >(parser term) -> (list token) location -> ast-or-error Returns a function that parses a list of tokens, producing either the result of calling all appropriate result functions or an err term: (list token) -> parser-result location: string | editor Either the string representing the file name or the editor being read, typically retrieved from file-path ast-or-error: AST | err AST is the result of calling the given result function The err structure is: >(make-err string source-list) >(err-msg err) -> string The error message >(err-src err) -> (list location line-k col-k pos-k span-k) This list is suitable for calling raise-read-error, *-k are positive integers The language forms provided are: >(define-simple-terminals NAME (simple-spec ...)) Expands to a define-empty-tokens and one terminal definition per simple-spec NAME is an identifier specifying a group of tokens simple-spec = NAME | (NAME string) | (NAME proc) | (NAME string proc) NAME is an identifier specifying a token/terminal with no value proc: token -> ast - A procedure from tokens to AST nodes. id is used by default. The token will be a symbol. string is the human-language name for the terminal, NAME is used by default >(define-terminals NAME (terminal-spec ...)) Like define-simple-terminals, except uses define-tokens terminal-spec = (NAME proc) | (NAME string proc) proc: token -> ast - a procedure from tokens to AST node. The token will be the token defined as NAME and will be a value token. >(sequence (NAME ...) proc string) Generates a call to seq with the specified names in a list, proc => result and string => name. The name can be omitted when nested in another sequence or choose >(sequence (NAME_ID ...) proc string) where NAME_ID is either NAME or (^ NAME) The ^ form identifies a parser production that can be used to identify this production in an error message. Otherwise the same as above >(choose (NAME ...) string) Generates a call to choice using the given terms as the list of options, string => name. The name can be omitted when nested in another sequence or choose >(eta NAME) Eta expands name with a wrapping that properly mimcs a parser term The _error-format-parameters^_ signature requires five names: src?: boolean- will the lexer include source information input-type: string- used to identify the source of input show-options: boolean- presently ignored max-depth: int- The depth of errors reported max-choice-depth: int- The max number of options listed in an error The _language-format-parameters^_ requires two names class-type: string - general term for language keywords input->output-name: token -> string - translates tokens into strings The _language-dictionary^_ requires three names misspelled: string string -> number - check the spelling of the second arg against the first, return a number that is the probability that the second is a misspelling of the first misscap: string string -> boolean - check the capitalization of the second arg against the first missclass: string string -> boolean - check if the second arg names a correct token kind