Adding a set of higher-order combinator-builders with good error message generation

Combinator implementation is the responsiblity of kathyg, not sowens

svn: r5718
This commit is contained in:
Kathy Gray 2007-03-02 15:15:09 +00:00
parent eadc853016
commit c064fe4238
9 changed files with 1247 additions and 2 deletions

View File

@ -0,0 +1,10 @@
(module combinator-unit mzscheme
(require "private-combinator/combinator-parser.scm"
"private-combinator/parser-sigs.ss")
(provide combinator-parser-tools@
combinator-parser^
error-format-parameters^ language-format-parameters^ language-dictionary^)
)

View File

@ -1,7 +1,6 @@
_parser-tools_
This documentation assumes familiarity with lex and yacc style lexer
and parser generators.
This documentation provides directions on using the lexer, Yacc-style parser generator and combinator parser library. It assumes familiarity with lex and yacc style lexer and parser generators and with combinator parsers.
_lex.ss_
A _regular expression_ is one of the following:
@ -426,4 +425,131 @@ the original grammar have nested blocks the tool will fail.
Annotated examples are in the examples subdirectory of the parser-tools
collection directory.
_combinator-unit.ss_
This library provides a unit implementing four higher-order functions
that can be used to build a combinator parser, and the export and
import signatures related to it. The functions contained in this unit
automatically build error reporting mechanisms in the event that no parse
is found. Unlike other combinator parsers, this system assumes that the
input is already lexed into tokens using _lex.ss_. This library relies on
_(lib "lazy.ss" "lazy")_.
The unit _combinator-parser-tools_ exports the signature
_combinator-parser^_ and imports the signatures _error-format-parameters^_, _language-format-parameters^_, and _language-dictionary^_.
The signature combinator-parser^ references functions to build combinators,
a function to build a runable parser using a combinator, a structure for
recording errors and macro definitions to specify combinators with:
>(terminal predicate result name spell-check case-check type-check) ->
(list token) -> parser-result
The returned function accepts one terminal from a token stream, and
returns produces an opaque value that interacts with other combinators.
predicate: token -> boolean - check that the token is the expected one
result: token -> beta - create the ast node for this terminal
name: string - human-language name for this terminal
spell-check, case-check, type-check: (U bool (token -> bool))
optional arguments, default to #f, perform spell checking, case
checking, and kind checking on incorrect tokens
>(seq sequence result name) -> (list token) -> parser-result
The returned function accepts a term made up of a sequence of smaller
terms, and produces an opaque value that interacts with other
combinators.
sequence: (listof ((list token) -> parser-result)) - the subterms
result: (list alpha) -> beta - create the ast node for this sequence.
Input list matches length of sequence list
name: human-language name for this term
>(choice options name) -> (list token) -> parser-result
The returned function selects between different terms, and produces an
opaque value that interacts with other combinators
options: (listof ((list token) -> parser-result) - the possible terms
name: human-language name for this term
>(repeat term) -> (list token) -> parser-result
The returned function accepts 0 or more instances of term, and produces
an opaque value that interacts with other combinators
term: (list token) -> parser-result
>(parser term) -> (list token) location -> ast-or-error
Returns a function that parses a list of tokens, producing either the
result of calling all appropriate result functions or an err
term: (list token) -> parser-result
location: string | editor
Either the string representing the file name or the editor being read,
typically retrieved from file-path
ast-or-error: AST | err
AST is the result of calling the given result function
The err structure is:
>(make-err string source-list)
>(err-msg err) -> string
The error message
>(err-src err) -> (list location line-k col-k pos-k span-k)
This list is suitable for calling raise-read-error,
*-k are positive integers
The language forms provided are:
>(define-simple-terminals NAME (simple-spec ...))
Expands to a define-empty-tokens and one terminal definition per
simple-spec
NAME is an identifier specifying a group of tokens
simple-spec = NAME | (NAME string) | (NAME proc) | (NAME string proc)
NAME is an identifier specifying a token/terminal with no value
proc: token -> ast - A procedure from tokens to AST nodes. id is used
by default. The token will be a symbol.
string is the human-language name for the terminal, NAME is used by
default
>(define-terminals NAME (terminal-spec ...))
Like define-simple-terminals, except uses define-tokens
terminal-spec = (NAME proc) | (NAME string proc)
proc: token -> ast - a procedure from tokens to AST node.
The token will be the token defined as NAME and will be a value token.
>(sequence (NAME ...) proc string)
Generates a call to seq with the specified names in a list,
proc => result and string => name.
The name can be omitted when nested in another sequence or choose
>(sequence (NAME_ID ...) proc string)
where NAME_ID is either NAME or (^ NAME)
The ^ form identifies a parser production that can be used to identify
this production in an error message. Otherwise the same as above
>(choose (NAME ...) string)
Generates a call to choice using the given terms as the list of options,
string => name.
The name can be omitted when nested in another sequence or choose
>(eta NAME)
Eta expands name with a wrapping that properly mimcs a parser term
The _error-format-parameters^_ signature requires five names:
src?: boolean- will the lexer include source information
input-type: string- used to identify the source of input
show-options: boolean- presently ignored
max-depth: int- The depth of errors reported
max-choice-depth: int- The max number of options listed in an error
The _language-format-parameters^_ requires two names
class-type: string - general term for language keywords
input->output-name: token -> string - translates tokens into strings
The _language-dictionary^_ requires three names
misspelled: string string -> boolean -
check the spelling of the second arg against the first
misscap: string string -> boolean -
check the capitalization of the second arg against the first
missclass: string string -> boolean -
check if the second arg names a correct token kind

View File

@ -0,0 +1,58 @@
(module combinator-example mzscheme
(require (lib "unit.ss")
(lib "lex.ss" "parser-tools")
(lib "combinator-unit.ss" "parser-tools"))
(define-unit support
(import)
(export error-format-parameters^
language-format-parameters^
language-dictionary^)
(define src? #t)
(define input-type "file")
(define show-options #f)
(define max-depth 1)
(define max-choice-depth 2)
(define class-type "keyword")
(define (input->output-name t) (token-name t))
(define (misspelled s1 s2)
(and (equal? s1 "lam")
(equal? s2 "lambda")))
(define (misscap s1 s2)
(and (equal? s1 "lam")
(equal? s2 "Lam")))
(define (missclass s1 s2) #f)
)
(define-signature parser^ (parse-prog))
(define-unit lambda-calc
(import combinator-parser^)
(export parser^)
(define-simple-terminals keywords
(lam (O_paren "(") (C_paren ")")))
(define-terminals ids
((id "variable" string->symbol) (number (lambda (x) (read (open-input-string x))))))
(define app
(sequence (O_paren (repeat (eta expr)) C_paren)
(lambda (id) id)
"application"))
(define func
(sequence (O_paren lam O_paren (repeat id) (eta expr))
(lambda (id) id)
"function"))
(define expr (choose (id number app func) "expression"))
(define parse-prog (parser expr))
)
)

View File

@ -0,0 +1,92 @@
(module combinator-parser (lib "lazy.ss" "lazy")
(require (lib "unit.ss")
(lib "lex.ss" "parser-tools"))
(require "structs.scm" "parser-sigs.ss" "combinator.scm" "errors.scm")
(provide combinator-parser-tools@)
(define-unit main-parser@
(import error^ out^ error-format-parameters^ language-format-parameters^)
(export parser^)
(define (sort-used reses)
(sort reses (lambda (a b) (> (res-used a) (res-used b)))))
(define (parser start)
(lambda (input file)
(let* ([result (start input)]
[out
(cond
[(and (res? result) (res-a result) (null? (res-rest result)))
(car (res-a (!!! result)))]
[(and (res? result) (res-a result) (res-possible-error result))
(fail-type->message (!!! (res-possible-error result)))]
[(and (res? result) (res-a result))
(make-err
(format "Found extraneous input after ~a, starting with ~a, at the end of ~a."
(!!! (res-msg result))
(input->output-name (!!! (car (res-rest result)))) input-type)
(and src?
(make-src-lst (position-token-start-pos (!!! (car (res-rest result)))))))]
[(res? result) (fail-type->message (res-msg (!!! result)))]
[(or (choice-res? result) (pair? result))
(let* ([options (if (choice-res? result) (choice-res-matches result) result)]
[finished-options (filter (lambda (o)
(cond [(res? o) (null? (res-rest o))]
[(repeat-res? o)
(eq? (repeat-res-stop o) 'out-of-input)]))
options)]
[possible-errors (filter res-possible-error
(map (lambda (a) (if (repeat-res? a) (repeat-res-a a) a))
options))])
(cond
[(not (null? finished-options)) (car (res-a (!!! (car finished-options))))]
[(not (null? possible-errors))
(!!! (fail-type->message
(res-possible-error (!!! (car (sort-used possible-errors))))))]
[else
(let ([used-sort (sort-used options)])
(make-err
(format "Found additional content after ~a, begining with ~a."
(!!! (res-msg (car used-sort)))
(input->output-name (!!! (car (res-rest (car used-sort))))))
(and src?
(make-src-lst (position-token-start-pos
(!!! (car (res-rest (car used-sort)))))))))]))]
[(and (repeat-res? result) (eq? 'out-of-input (repeat-res-stop (!!! result))))
(res-a (repeat-res-a result))]
[(and (repeat-res? result) (fail-type? (repeat-res-stop (!!! result))))
(!!! (fail-type->message (!!! (repeat-res-stop (!!! result)))))]
[else (error 'parser (format "Internal error: recieved unexpected input ~a"
(!!! result)))])])
(cond
[(err? out)
(make-err (!!! (err-msg out))
(cons file (!!list (err-src out))))]
[else out]))))
)
(define-unit rank-max@
(import)
(export ranking-parameters^)
(define (rank-choice choices) (apply max choices)))
(define-unit out-struct@
(import)
(export out^)
(define-struct err (msg src)))
(define-compound-unit/infer combinator-parser@
(import error-format-parameters^ language-format-parameters^ language-dictionary^)
(export combinator-parser-forms^ parser^ out^)
(link out-struct@ main-parser@ rank-max@ error-formatting@ combinators@))
(define-unit/new-import-export combinator-parser-tools@
(import error-format-parameters^ language-format-parameters^ language-dictionary^)
(export combinator-parser^)
((combinator-parser-forms^ parser^ out^) combinator-parser@ error-format-parameters^ language-format-parameters^
language-dictionary^))
)

View File

@ -0,0 +1,427 @@
(module combinator (lib "lazy.ss" "lazy")
(require (lib "unit.ss")
(only (lib "etc.ss") opt-lambda))
(require "structs.scm"
"parser-sigs.ss"
(lib "lex.ss" "parser-tools"))
(provide (all-defined))
(define-unit combinators@
(import error-format-parameters^ ranking-parameters^ language-dictionary^)
(export combinator-parser-forms^)
(define return-name "dummy")
;terminal: ('a -> bool 'a -> 'b string) -> ( (list 'a) -> res )
(define terminal
(opt-lambda (pred build name [spell? #f] [case? #f] [class? #f])
(let* ([fail-str (string-append "failed " name)]
[t-name
(lambda (t) (if src? (token-name (position-token-token t)) (token-name t)))]
[t-val
(lambda (t) (if src? (token-value (position-token-token t)) (token-value t)))]
[spell? (if spell? spell?
(lambda (token)
(when (position-token? token) (set! token (position-token-token token)))
(and (token-value token)
(misspelled name (token-value token)))))]
[case? (if case? case?
(lambda (token)
(when (position-token? token) (set! token (position-token-token token)))
(and (token-value token)
(misscap name (token-value token)))))]
[class? (if class? class?
(lambda (token)
(when (position-token? token) (set! token (position-token-token token)))
(missclass name (token-name token))))]
[make-fail
(lambda (c n k i u)
(make-terminal-fail c (if src?
(make-src-lst (position-token-start-pos i)
(position-token-end-pos i))
null)
n 0 u k (if src? (position-token-token i) i)))]
[value (lambda (t) (or (t-val t) name))]
[builder
(if src?
(lambda (token) (build (position-token-token token)
(position-token-start-pos token)
(position-token-end-pos token)))
build)])
(opt-lambda (input [last-src (list 0 0 0 0)] [alts 1])
#;(!!! (printf "terminal ~a~n" name))
(cond
[(eq? input return-name) name]
[(null? input)
(make-terminal-fail null last-src .4 0 0 'end #f)]
[(pred (if src? (position-token-token (car input)) (car input)))
(make-res (list (builder (car input))) (cdr input)
name (value (car input)) 1 #f (car input))]
[else
#;(printf "Incorrect input for ~a : ~a miscase? ~a misspell? ~a ~n" name
(cond
[(and (position-token? (car input))
(token-value (position-token-token (car input))))
(token-value (position-token-token (car input)))]
[(position-token? (car input))
(token-name (position-token-token (car input)))]
[else (car input)])
(case? (car input))
(spell? (car input)))
(fail-res (cdr input)
(let-values ([(chance kind may-use)
(cond
[(case? (car input)) (values 9/10 'misscase 1)]
[(spell? (car input))
(values 4/5 'misspell 1)]
[(class? (car input)) (values 2/5 'missclass 1)]
[else (values 1/5 'wrong 0)])])
(make-fail chance name kind (car input) may-use)))])))))
;seq: ( (list ((list 'a) -> res)) ((list 'b) -> 'c) string -> ((list 'a) -> result)
(define seq
(opt-lambda (sub-list build name [id-position 0])
(let* ([sequence-length (length sub-list)]
[memo-table (make-hash-table 'weak)]
[prev (lambda (x)
(cond [(eq? x return-name) "default previous"]
[else (fail-res null null)]))]
[builder
(lambda (r)
(cond
[(res? r)
(make-res (list (build (res-a r))) (res-rest r)
name (res-id r) (res-used r)
(res-possible-error r)
(res-first-tok r))]
[(repeat-res? r)
(make-res (list (build (res-a (repeat-res-a r))))
(res-rest (repeat-res-a r))
name (res-id (repeat-res-a r))
(res-used (repeat-res-a r))
(repeat-res-stop r)
(res-first-tok (repeat-res-a r)))]))]
[my-error (sequence-error-gen name sequence-length)]
[my-walker (seq-walker id-position name my-error)])
(opt-lambda (input [alts 1] [last-src (list 0 0 0 0)])
#;(printf "seq ~a~n" name)
(cond
[(eq? input return-name) name]
[(hash-table-get memo-table input #f) (hash-table-get memo-table input)]
[(null? sub-list)
(builder (make-res null input name #f 0 #f #f))]
[else
(let* ([pre-build-ans (my-walker sub-list input prev #f #f null 0 alts last-src)]
[ans
(cond
[(and (res? pre-build-ans) (res-a pre-build-ans)) (builder pre-build-ans)]
[(pair? pre-build-ans) (map builder pre-build-ans)]
[else pre-build-ans])])
(hash-table-put! memo-table input ans)
#;(printf "sequence ~a returning ~n" name)
ans)])))))
;seq-walker: int string error-gen -> [(list parser) (list alpha) parser result (U bool string) (list string) int int -> result
(define (seq-walker id-position seq-name build-error)
(letrec ([next-res
(lambda (a id used tok rst)
(cond
[(res? rst)
(make-res (append a (res-a rst)) (res-rest rst)
seq-name (or id (res-id rst))
(+ used (res-used rst)) (res-possible-error rst) tok)]
[(repeat-res? rst)
(make-res (append a (res-a (repeat-res-a rst)))
(res-rest (repeat-res-a rst)) seq-name
(or id (res-id (repeat-res-a rst)))
(+ used (res-used (repeat-res-a rst)))
(repeat-res-stop rst) tok)]))]
[walker
(lambda (subs input previous? look-back curr-id seen used alts last-src)
(let* ([next-preds (cdr subs)]
[curr-pred (car subs)]
[id-spot? (= id-position (add1 (length seen)))]
[next-call
(lambda (old-result curr curr-name new-id tok alts)
(let* ([old-answer (res-a old-result)]
[rest (res-rest old-result)]
[old-used (res-used old-result)]
[rsts (walker next-preds rest curr-pred curr
(or new-id curr-id) (cons curr-name seen)
(+ old-used used) alts
(make-src-lst (res-first-tok old-result)))])
(cond
[(and (res? rsts) (res-a rsts))
(next-res old-answer new-id old-used tok rsts)]
[(res? rsts) (fail-res rest (res-msg rsts))]
[(pair? rsts)
(map (lambda (rst) (next-res old-answer new-id old-used tok rst))
rsts)])))])
(cond
[(null? next-preds)
(build-error (curr-pred input last-src)
(previous? input) (previous? return-name)
look-back used curr-id seen alts last-src)]
[else
#;(printf "seq-walker called: else case~n")
(let ([fst (curr-pred input last-src)])
(cond
[(res? fst)
(cond
[(res-a fst) (next-call fst fst (res-msg fst) (and id-spot? (res-id fst))
(res-first-tok fst) alts)]
[else
(build-error fst (previous? input) (previous? return-name)
look-back used curr-id seen alts last-src)])]
[(repeat-res? fst) (next-call (repeat-res-a fst) fst
(res-msg (repeat-res-a fst)) #f
(res-first-tok (repeat-res-a fst)) alts)]
[(or (choice-res? fst) (pair? fst))
#;(printf "choice-res or pair: ~a ~a ~n"
(if (choice-res? fst) (map res-rest (choice-res-matches fst)) fst)
(if (choice-res? fst) (map res-a (choice-res-matches fst)) fst))
(let*-values
([(lst name curr)
(if (choice-res? fst)
(values (choice-res-matches fst)
(lambda (_) (choice-res-name fst))
(lambda (_) fst))
(values fst res-msg (lambda (x) x)))]
[(new-alts) (+ alts (length lst))]
[(rsts)
(map (lambda (res)
(cond
[(res? res)
(next-call res (curr res) (name res) (and id-spot? (res-id res))
(res-first-tok res) new-alts)]
[(repeat-res? res)
(next-call (repeat-res-a res) res
(res-msg (repeat-res-a res)) #f
(res-first-tok (repeat-res-a res)) new-alts)])) lst)]
[(correct-rsts) (correct-list rsts)])
#;(printf "correct-rsts ~a~n" (map res-a correct-rsts))
#;(printf "rsts: ~a~n" (map res-a rsts))
(cond
[(null? correct-rsts)
(let ([fails (map (lambda (rst)
(res-msg
(build-error rst (previous? input) (previous? return-name)
look-back used curr-id seen alts last-src)))
rsts)])
(fail-res input
(make-options-fail
(rank-choice (map fail-type-chance fails)) #f seq-name
(rank-choice (map fail-type-used fails))
(rank-choice (map fail-type-may-use fails)) fails)))]
[else correct-rsts]))]))])))])
walker))
;get-fail-info: fail-type -> (values symbol 'a 'b)
(define (get-fail-info fail)
(cond
[(terminal-fail? fail)
(values (terminal-fail-kind fail)
(fail-type-name fail)
(terminal-fail-found fail))]
[(sequence-fail? fail)
(values 'sub-seq (sequence-fail-expected fail) fail)]
[(choice-fail? fail) (values 'choice null fail)]
[(options-fail? fail) (values 'options null fail)]))
;update-src: symbol src-list src-list token -> src-list
(define (update-src error-kind src prev-src tok)
(and src?
(case error-kind
[(choice options) prev-src]
[(sub-seq misscase misspell end) src]
[(missclass wrong) (update-src-start src (position-token-start-pos tok))])))
;build-sequence-error: result boolean result string int [U #f string] [listof string] int int -> result
(define (sequence-error-gen name len)
(lambda (old-res prev prev-name look-back used id seen alts last-src)
(cond
[(and (pair? old-res) (null? (cdr old-res))) (car old-res)]
[(repeat-res? old-res)
(cond
[(fail-type? (repeat-res-stop old-res))
(let ([res (repeat-res-a old-res)])
(make-res (res-a res) (res-rest res) (res-msg res) (res-id res) (res-used res)
(repeat-res-stop old-res) (res-first-tok res)))]
[else (repeat-res-a old-res)])]
[(or (and (res? old-res) (res-a old-res))
(choice-res? old-res)
(pair? old-res)) old-res]
[else
;There actually was an error
(fail-res (res-rest old-res)
(let*-values ([(fail)
(cond
[(and (repeat-res? look-back)
(fail-type? (repeat-res-stop look-back))
(> (fail-type-chance (repeat-res-stop look-back))
(fail-type-chance (res-msg old-res))))
(repeat-res-stop look-back)]
[else (res-msg old-res)])]
[(kind expected found) (get-fail-info fail)]
[(new-src) (update-src kind
(fail-type-src fail)
last-src
(res-first-tok old-res))]
[(seen-len) (length seen)]
[(updated-len) (+ (- used seen-len) len)])
#;(printf "sequence ~a failed.~n seen ~a~n" name (reverse seen))
#;(when (repeat-res? look-back)
(printf "look-back ~a : ~a vs ~a : ~a > ~a~n"
(repeat-res-stop look-back)
(and (fail-type? (repeat-res-stop look-back)) (fail-type-name (repeat-res-stop look-back)))
(fail-type-name (res-msg old-res))
(and (fail-type? (repeat-res-stop look-back)) (fail-type-chance (repeat-res-stop look-back)))
(fail-type-chance (res-msg old-res))))
#;(printf "old-probability ~a new probability ~a~n"
(cond
[(= 0 used) (fail-type-chance fail)]
[else (+ (* (fail-type-chance fail) (/ 1 updated-len))
(/ used updated-len))])
(compute-chance len seen-len used alts (fail-type-chance fail)))
(make-sequence-fail
(cond
[(= 0 used) (fail-type-chance fail)]
[else (compute-chance len seen-len used alts (fail-type-chance fail))])
(fail-type-src fail)
name used
(+ used (fail-type-may-use fail))
id kind (reverse seen) expected found (and (res? prev) (res-a prev) (res-msg prev))
prev-name)))])))
(define (compute-chance expected-length seen-length used-toks num-alts sub-chance)
(let* ([revised-expectation (+ (- used-toks seen-length) expected-length)]
[probability-with-sub (* (/ (add1 used-toks) revised-expectation) (/ 1 num-alts))]
[probability-without-sub (* (/ used-toks revised-expectation) (/ 1 num-alts))]
[expected-sub probability-with-sub]
[expected-no-sub probability-without-sub]
[probability (/ (* expected-sub sub-chance) (+ (* expected-sub sub-chance)
(* expected-no-sub (- 1 sub-chance))))])
#;(printf "compute-chance: args ~a ~a ~a ~a ~a~n"
expected-length seen-length used-toks num-alts sub-chance)
#;(printf "compute-chance: intermediate values: ~a ~a ~a ~a ~a~n"
revised-expectation probability-with-sub probability-without-sub expected-sub expected-no-sub)
probability))
;repeat: (list 'a) -> result -> (list 'a) -> result
(define (repeat sub)
(letrec ([repeat-name (string-append "any number of " (sub return-name))]
[memo-table (make-hash-table 'weak)]
[process-rest
(lambda (curr-ans rest-ans)
(cond
[(repeat-res? rest-ans)
(let ([a (res-a curr-ans)]
[rest (repeat-res-a rest-ans)])
(make-repeat-res
(cond
[(res? rest)
(make-res (append a (res-a rest)) (res-rest rest) repeat-name ""
(+ (res-used curr-ans) (res-used rest)) (res-possible-error rest)
(res-first-tok curr-ans))]
[(and (pair? rest) (null? (cdr rest)))
(make-res (append a (res-a (car rest))) (res-rest (car rest)) repeat-name ""
(+ (res-used curr-ans) (res-used (car rest)))
(res-possible-error (car rest))
(res-first-tok curr-ans))]
[(pair? rest)
(correct-list
(map (lambda (rs)
(make-res (append a (res-a rs)) (res-rest rs) repeat-name ""
(+ (res-used curr-ans) (res-used rs))
(res-possible-error rs)
(res-first-tok curr-ans)))
rest))])
(repeat-res-stop rest-ans)))]
[(pair? rest-ans)
(map (lambda (r) (process-rest curr-ans r)) rest-ans)]))])
(opt-lambda (input [last-src (list 0 0 0 0)] [alts 1])
(cond
[(eq? input return-name) repeat-name]
[(hash-table-get memo-table input #f) (hash-table-get memo-table input)]
[else
(let ([ans
(let loop ([curr-input input])
(cond
[(null? curr-input)
(make-repeat-res (make-res null null repeat-name "" 0 #f #f) 'out-of-input)]
[else
(let ([this-res (sub curr-input last-src)])
#;(printf "Repeat of ~a called it's repeated entity: ~a~n" repeat-name this-res)
(cond
[(and (res? this-res) (res-a this-res))
(process-rest this-res (loop (res-rest this-res)))]
[(res? this-res)
(make-repeat-res (make-res null curr-input repeat-name "" 0 #f #f) (res-msg this-res))]
[(or (choice-res? this-res) (pair? this-res))
(map (lambda (match) (process-rest match (loop (res-rest match))))
(if (choice-res? this-res) (choice-res-matches this-res) this-res))]))]))])
(hash-table-put! memo-table input ans)
#;(printf "repeat of ~a ended with ans ~a~n" repeat-name ans)
ans)]))))
;choice: [list [[list 'a ] -> result]] name -> result
(define (choice opt-list name)
(let ([memo-table (make-hash-table 'weak)]
[num-choices (length opt-list)]
[choice-names (map (lambda (o) (o return-name)) opt-list)])
(opt-lambda (input [last-src (list 0 0 0 0)] [alts 1])
#;(printf "choice ~a~n" name)
(let ([sub-opts (sub1 (+ alts num-choices))])
(cond
[(hash-table-get memo-table input #f) (hash-table-get memo-table input)]
[(eq? input return-name) name]
[else
(let* ([options (map (lambda (term) (term input last-src sub-opts)) opt-list)]
[fails (map res-msg options)]
[corrects (correct-list options)]
[ans
(cond
[(null? corrects)
(fail-res input
(make-choice-fail (rank-choice (map fail-type-chance fails)) #f name
(rank-choice (map fail-type-used fails))
(rank-choice (map fail-type-may-use fails))
num-choices choice-names fails))]
[(null? (cdr corrects)) (car corrects)]
[else (make-choice-res name corrects)])])
#;(printf "choice ~a is returning ~a options were ~a ~n" name ans choice-names)
(hash-table-put! memo-table input ans) ans)])))))
;correct-list: (list result) -> (list result)
(define (correct-list subs)
(cond
[(pair? subs)
(cond
[(and (res? (car subs)) (res-a (car subs)))
(cons (car subs) (correct-list (cdr subs)))]
[(choice-res? (car subs))
(append (choice-res-matches (car subs)) (correct-list (cdr subs)))]
[(repeat-res? (car subs))
(correct-list (cons (repeat-res-a (car subs)) (cdr subs)))]
[(pair? (car subs))
(append (car subs) (correct-list (cdr subs)))]
[else (correct-list (cdr subs))])]
[(null? subs) null]))
(define (update-src-start src new-start)
(list (position-line new-start)
(position-col new-start)
(position-offset new-start)
(+ (- (third src) (position-offset new-start))
(fourth src))))
(define (update-src-end src new-end)
(list (first src) (second src) (third src)
(- (position-offset new-end) (third src))))
)
)

View File

@ -0,0 +1,261 @@
(module errors mzscheme
(require "structs.scm" "parser-sigs.ss")
(require (lib "force.ss" "lazy")
(lib "etc.ss")
(lib "unit.ss")
(lib "list.ss"))
(provide (all-defined))
(define-unit error-formatting@
(import error-format-parameters^ language-format-parameters^ out^)
(export (rename error^ (public-fail-type->message fail-type->message)))
;public-fail-type->message : fail-type -> err
(define (public-fail-type->message fail)
(fail-type->message fail null))
;fail-type->message: fail-type (listof err) -> err
(define (fail-type->message fail-type message-to-date)
(let* ([fail-type (!!!-fail fail-type)]
[input->output-name (!!! input->output-name)]
[name (fail-type-name fail-type)]
[a (a/an name)]
[msg (lambda (m) (make-err m (fail-type-src fail-type)))])
#;(printf "fail-type->message ~a~n" fail-type)
(cond
[(terminal-fail? fail-type)
(combine-message
(msg
(case (terminal-fail-kind fail-type)
[(end) (format "Expected to find ~a ~a, but ~a ended prematurely."
a name input-type)]
[(wrong) (format "Expected to find ~a ~a, but instead found ~a, which is illegal here."
a name (input->output-name (terminal-fail-found fail-type)))]
[(misscase) (format "Expected to find ~a ~a, found ~a which may be miscapitalized."
a name (input->output-name (terminal-fail-found fail-type)))]
[(misspell) (format "Expected to find ~a ~a, found ~a which seems to be misspelled."
a name (input->output-name (terminal-fail-found fail-type)))]
[(missclass) (format "Found ~a instead of ~a ~a, a ~a cannot be used as ~a ~a."
(input->output-name (terminal-fail-found fail-type)) a name class-type a name)]))
message-to-date)]
[(sequence-fail? fail-type)
(let* ([id-name
(if (sequence-fail-id fail-type)
(string-append name " " (sequence-fail-id fail-type)) name)]
[expected (sequence-fail-expected fail-type)]
[a2 (a/an expected)]
[show-sequence (sequence-fail-correct fail-type)])
(case (sequence-fail-kind fail-type)
[(end)
(combine-message
(msg (format "Expected ~a to contain ~a ~a to complete the ~a. ~nFound ~a before ~a ended."
input-type a2 expected id-name (format-seen show-sequence) input-type)) message-to-date)]
[(wrong)
(combine-message
(msg
(cond
[(sequence-fail-repeat? fail-type)
(format "Found a repitition of ~a; the required number are present. Expected ~a ~a next."
(sequence-fail-last-seen fail-type) a2 expected)]
[(null? show-sequence)
(format "Expected ~a ~a to begin this ~a, instead found ~a."
a2 expected id-name (input->output-name (sequence-fail-found fail-type)))]
[else
(format "Expected ~a ~a to continue this ~a. Instead, found ~a after ~a."
a2 expected id-name (input->output-name (sequence-fail-found fail-type))
(format-seen show-sequence))]))
message-to-date)]
[(misscase)
(combine-message
(msg (format "Expected to find ~a ~a to continue this ~a, found ~a which may be miscapitalized."
a2 expected id-name (input->output-name (sequence-fail-found fail-type))))
message-to-date)]
[(misspell)
(combine-message
(msg (format "Expected to find ~a ~a to continue this ~a, found ~a which seems to be misspelled."
a2 expected id-name (input->output-name (sequence-fail-found fail-type))))
message-to-date)]
[(missclass)
(combine-message
(msg (format "Found ~a instead of ~a ~a, a ~a cannot be used as a(n) ~a."
(input->output-name (sequence-fail-found fail-type)) a2 expected class-type expected))
message-to-date)]
[(sub-seq choice)
(fail-type->message (sequence-fail-found fail-type)
(add-to-message (msg (format "An error occured in ~a.~n" id-name)) name message-to-date))]
[(options)
(let ([sorted-opts (sort (options-fail-opts (sequence-fail-found fail-type))
(lambda (a b) (>= (fail-type-chance a) (fail-type-chance b))))])
(fail-type->message (car sorted-opts)
(add-to-message
(msg (format "There is an error in this ~a after ~a, it is likely you intended a(n) ~a here.~n"
id-name (car (reverse show-sequence)) (fail-type-name (car sorted-opts))))
name message-to-date)))]))]
[(options-fail? fail-type)
#;(printf "selecting for options on ~a~n" name)
(let* ([winners (select-errors (options-fail-opts fail-type))]
[top-names (map fail-type-name winners)]
[non-dup-tops (remove-dups top-names name)]
[top-name (car top-names)])
(fail-type->message
(car winners)
(add-to-message
(msg
(cond
[(and (> (length winners) 1) (> (length non-dup-tops) 1))
(format "There is an error in this ~a. It is likely you intended one of ~a here.~n"
name (nice-list non-dup-tops))]
[else
(format "There is an error in this ~a~a.~n"
name
(if (equal? top-name name) ""
(format ", it is likely you intended ~a ~a here" (a/an top-name) top-name)))]))
name message-to-date)))]
[(choice-fail? fail-type)
#;(printf "selecting for ~a~n" name)
(let* ([winners (select-errors (choice-fail-messages fail-type))]
[top-names (map fail-type-name winners)]
[top-name (car top-names)]
[no-dup-names (remove-dups (choice-fail-names fail-type) name)])
(fail-type->message
(car winners)
(add-to-message
(msg (cond
[(and (<= (choice-fail-options fail-type) max-choice-depth)
(> (length no-dup-names) 1)
(> (length winners) 1)
(equal? top-names no-dup-names))
(format "An error occured in this ~a, one of ~a is expected here."
name (nice-list no-dup-names))]
[(and (<= (choice-fail-options fail-type) max-choice-depth)
(> (length no-dup-names) 1)
(> (length winners) 1))
(format "An error occured in this ~a, one of ~a is expected here. Input is close to one of ~a.~n"
name (nice-list no-dup-names) (nice-list top-names))]
[(and (<= (choice-fail-options fail-type) max-choice-depth)
(> (length no-dup-names) 1))
(format "An error occured in this ~a, one of ~a is expected here. Current input is close to ~a.~a~n"
name (nice-list no-dup-names) top-name
(if show-options " To see all options click here." ""))] ;Add support for formatting and passing up all options
[else
(format "An error occured in this ~a~a.~a~n"
name
(if (equal? name top-name) "" (format ", it is likely that you intended ~a ~a here"
(a/an top-name) top-name))
(if show-options " To see all options click here." ""))]))
name message-to-date)))])))
(define (chance-used a) (* (fail-type-chance a) (fail-type-used a)))
(define (chance-may-use a) (* (fail-type-chance a) (fail-type-may-use a)))
(define (chance a) (fail-type-chance a))
(define (composite a)
(/ (+ (chance-used a) (chance-may-use a) (chance a)) 3))
(define (narrow-opts rank options)
(get-ties (sort options (lambda (a b) (> (rank a) (rank b)))) rank))
(define (select-errors opts-list)
(let* ([composite-winners
(narrow-opts composite (!!list opts-list))]
[chance-used-winners
(narrow-opts chance-used composite-winners)]
[chance-may-winners
(narrow-opts chance-may-use chance-used-winners)]
[winners (narrow-opts chance chance-may-winners)])
#;(printf "all options: ~a~n" (!!list opts-list))
#;(printf "~a ~a ~a ~a ~n"
(map fail-type-name (map !!! (!!list opts-list)))
(map !!! (map fail-type-chance (!!list opts-list)))
(map !!! (map fail-type-used (!!list opts-list)))
(map !!! (map fail-type-may-use (!!list opts-list))))
#;(printf "composite round: ~a ~a ~n"
(map fail-type-name (map !!! composite-winners))
(map composite (map !!! composite-winners)))
#;(printf "final sorting: ~a~n" (map fail-type-name (map !!! winners)))
winners))
(define (first-n n lst)
(let loop ([count 0] [l lst])
(cond
[(= count n) null]
[else (cons (car lst) (loop (add1 count) (cdr lst)))])))
(define (get-ties lst evaluate)
(letrec ([getter
(lambda (sub)
(cond
[(null? sub) null]
[(>= (- (evaluate (car lst)) (evaluate (car sub))) .0001) null]
[else (cons (car sub) (getter (cdr sub)))]))])
(cons (car lst) (getter (cdr lst)))))
(define (a/an next-string)
(if (string? next-string)
(if (member (substring next-string 0 1) `("a" "e" "i" "o" "u"))
"an" "a")
"a"))
(define (format-seen l)
(if (null? l)
""
(string-append "'"
(car l)
(apply string-append
(map (lambda (i) (string-append " " i)) (cdr l)))
"'")))
(define (nice-list l)
(letrec ([formatter
(lambda (l)
(cond
[(null? (cdr l)) (string-append "or " (car l))]
[else (string-append (car l) ", " (formatter (cdr l)))]))])
(cond
[(null? l) (error 'internal-error "nice-list in combinator-parser/errors.scm recieved null list")]
[(null? (cdr l)) (car l)]
[(null? (cddr l)) (string-append (car l) " or " (cadr l))]
[else (formatter l)])))
(define (downcase string)
(string-append (string-downcase (substring string 0 1))
(substring string 1 (string-length string))))
(define (remove-dups l n)
(cond
[(null? l) null]
[(equal? (car l) n)
(remove-dups (cdr l) n)]
[(member (car l) (cdr l))
(remove-dups (cdr l) n)]
[else (cons (car l) (remove-dups (cdr l) n))]))
(define-struct ms (who say))
;add-to-message: err string (list err) -> (list err)
(define (add-to-message msg name rest)
(let ([next (make-ms name msg)])
(cond
[(null? rest) (list next)]
[(equal? (ms-who (car rest)) name) (cons next (cdr rest))]
[(< (length rest) max-depth) (cons next rest)]
[else (cons next (first-n (sub1 max-depth) rest))])))
;combine-message: err (list ms) -> err
(define (combine-message end-msg messages)
(cond
[(null? messages) end-msg]
[else
(combine-message
(make-err (string-append (err-msg (ms-say (car messages)))
(err-msg end-msg))
(err-src end-msg))
(cdr messages))]))
)
)

View File

@ -0,0 +1,2 @@
(module info (lib "infotab.ss" "setup")
(define name "Parser-tools private-combinator"))

View File

@ -0,0 +1,193 @@
(module parser-sigs mzscheme
(require (lib "unit.ss"))
(require (only (lib "etc.ss") opt-lambda)) ; Required for expansion
(require (lib "lex.ss" "parser-tools")
(lib "string.ss") (lib "list.ss"))
(provide (all-defined))
(define-signature-form (terminals stx)
(syntax-case stx ()
[(_ group (elt ...))
(and (identifier? #'group)
(andmap identifier? (syntax->list #'(elt ...))))
(syntax->list #`(elt ...
#,@(map (lambda (e)
(datum->syntax-object e
(string->symbol
(format "token-~a" (syntax-e e)))))
(syntax->list #'(elt ...)))))]))
(define-for-syntax (insert-name stx name)
(let loop ([term stx]
[pos 0]
[id-pos 0]
[terms null])
(syntax-case term (sequence choose ^)
[((sequence a b) . rest)
(loop (syntax rest) (add1 pos) id-pos
(cons (quasisyntax (sequence a b #,name)) terms))]
[((choose a) . rest)
(loop (syntax rest) (add1 pos) id-pos
(cons (quasisyntax (choose a #,name)) terms))]
[((^ a) . rest)
(loop (syntax (a . rest))
pos (add1 pos) terms)]
[(a . rest)
(loop (syntax rest) (add1 pos) id-pos (cons (syntax a) terms))]
[() (list (reverse terms) id-pos)])))
(define-signature language-dictionary^ (misspelled misscap missclass))
(define-signature combinator-parser-forms^
(terminal choice seq repeat
(define-syntaxes (define-simple-terminals)
(values
(lambda (stx)
(syntax-case stx ()
((_ group elts)
(let ([name-string-thunks
(let loop ([elt-list (syntax elts)])
(syntax-case elt-list (lambda)
[() null]
[(id . rest)
(identifier? (syntax id))
(cons (list (syntax id)
(syntax (symbol->string (quote id)))
`(lambda (x . args) x))
(loop (syntax rest)))]
[((id name) . rest)
(and (identifier? (syntax id)) (string? (syntax-e (syntax name))))
(cons (list (syntax id)
(syntax name)
`(lambda (x . args) x))
(loop (syntax rest)))]
[((id thunk) . rest)
(and (identifier? (syntax id)) (identifier? (syntax thunk)))
(cons (list (syntax id)
(syntax (symbol->string (quote id)))
(syntax thunk))
(loop (syntax rest)))]
[((id (lambda x body ...)) . rest)
(identifier? (syntax id))
(cons (list (syntax id)
(syntax (symbol->string (quote id)))
(syntax (lambda x body ...)))
(loop (syntax rest)))]
[((id name thunk) . rest)
(and (identifier? (syntax id)) (string? (syntax-e (syntax name))))
(cons (list (syntax id)
(syntax name)
(syntax thunk))
(loop (syntax rest)))]))])
(with-syntax ([(id ...) (map car name-string-thunks)]
[(name ...) (map cadr name-string-thunks)]
[(thunk ...) (map caddr name-string-thunks)])
(syntax
(begin
(define-empty-tokens group (id ...))
(define id
(terminal
(lambda (token) (eq? (token-name token) (quote id)))
thunk
name)) ...)))))))))
(define-syntaxes (define-terminals)
(values
(lambda (stx)
(syntax-case stx ()
[(_ group elts)
(identifier? (syntax group))
(let ([name-string-thunks
(let loop ([elt-list (syntax elts)])
(syntax-case elt-list (lambda)
[() null]
[((id (lambda (arg1 ...) body ...)) . rest)
(identifier? (syntax id))
(cons (list (syntax id)
(syntax (symbol->string (quote id)))
(syntax (lambda (arg1 ...) body ...)))
(loop (syntax rest)))]
[((id thunk) . rest)
(and (identifier? (syntax id)) (identifier? (syntax thunk)))
(cons (list (syntax id)
(syntax (symbol->string (quote id)))
(syntax thunk))
(loop (syntax rest)))]
[((id name thunk) . rest)
(cons (list (syntax id)
(syntax name)
(syntax thunk))
(loop (syntax rest)))]))])
(with-syntax ([(id ...) (map car name-string-thunks)]
[(name ...) (map cadr name-string-thunks)]
[(thunk ...) (map caddr name-string-thunks)])
(syntax
(begin
(define-tokens group (id ...))
(define id
(terminal
(lambda (token) (eq? (token-name token) (quote id)))
(lambda (x . args)
(if (null? args)
(thunk (token-value x))
(thunk (token-value x) (car args) (cadr args))))
name
(lambda (token) #f)
(lambda (token) #f))) ...))))]))))
(define-syntaxes (sequence)
(values
(lambda (stx)
(syntax-case stx (^)
[(_ (term ...) proc)
(syntax
(seq (list term ...) proc (symbol->string (gensym 'seq))))]
[(_ terms proc name)
(let ([new-terms (insert-name (syntax terms) (syntax name))])
(with-syntax (((term ...) (car new-terms))
(id-pos (cadr new-terms)))
(syntax (seq (list term ...) proc name id-pos))))]))))
(define-syntaxes(choose)
(values
(lambda (stx)
(syntax-case stx ()
[(_ (term ...))
(syntax
(choice (list term ...) (symbol->string (gensym 'choice))))]
[(_ terms name)
(with-syntax (((term ...) [car (insert-name (syntax terms) (syntax name))]))
(syntax
(choice (list term ...) name)))]))))
(define-syntaxes (^)
(values
(syntax-rules ()
[(_ f) f])))
(define-syntaxes (eta)
(values (syntax-rules ()
[(_ f)
(opt-lambda (x [c 1]) (f x c))])))
))
(define-signature parser^ (parser))
(define-signature out^ ((struct err (msg src))))
(define-signature language-format-parameters^ (class-type input->output-name))
(define-signature error-format-parameters^
(src? input-type show-options max-depth max-choice-depth))
(define-signature ranking-parameters^ (rank-choice))
(define-signature error^ (fail-type->message))
(define-signature combinator-parser^ extends combinator-parser-forms^
(parser (struct err (msg src))))
)

View File

@ -0,0 +1,76 @@
(module structs mzscheme
(provide (all-defined-except make-fail-type))
(require (lib "force.ss" "lazy")
(lib "lex.ss" "parser-tools"))
;fail-src: (list line col pos span loc)
;make-src-lst: position position -> src-list
(define (make-src-lst start end)
(list (position-line start)
(position-col start)
(position-offset start)
(- (position-offset end)
(position-offset start))))
;(make-fail-type float fail-src string int int)
(define-struct fail-type (chance src name used may-use) (make-inspector))
;(make-terminal-fail float fail-src string symbol 'a)
(define-struct (terminal-fail fail-type) (kind found))
;(make-sequence-fail float fail-src string symbol (list string) string 'a boolean string)
(define-struct (sequence-fail fail-type) (id kind correct expected found repeat? last-seen))
;(make-choice-fail float fail-src string int (list string) (list fail-type))
(define-struct (choice-fail fail-type) (options names messages) (make-inspector))
;(make-options-fail float #f #f (list fail-type))
(define-struct (options-fail fail-type) (opts))
(define (!!!-fail fail)
(let*-values ([(chance src name used may-use)
(if (fail-type? fail)
(values (!!! (fail-type-chance fail))
(!!! (fail-type-src fail))
(!!! (fail-type-name fail))
(!!! (fail-type-used fail))
(!!! (fail-type-may-use fail)))
(values #f #f #f #f #f))])
(cond
[(terminal-fail? fail)
(make-terminal-fail chance src name used may-use
(!!! (terminal-fail-kind fail))
(!!! (terminal-fail-found fail)))]
[(sequence-fail? fail)
(make-sequence-fail chance src name used may-use
(!!! (sequence-fail-id fail))
(!!! (sequence-fail-kind fail))
(!!! (sequence-fail-correct fail))
(!!! (sequence-fail-expected fail))
(!!!-fail (sequence-fail-found fail))
(!!! (sequence-fail-repeat? fail))
(!!! (sequence-fail-last-seen fail)))]
[(choice-fail? fail)
(make-choice-fail chance src name used may-use
(!!! (choice-fail-options fail))
(!!! (choice-fail-names fail))
(map !!!-fail (!!! (choice-fail-messages fail))))]
[(options-fail? fail)
(make-options-fail chance src name used may-use
(map !!!-fail (!!! (options-fail-opts fail))))]
[else (!!! fail)])))
;result = res | choice-res | repeat-res | (listof (U res choice-res))
;(make-res (U #f (listof 'b)) (listof 'a) (U string fail-type) (U string 'a) int) [U #f fail-type] token
(define-struct res (a rest msg id used possible-error first-tok) (make-inspector))
;make-choice-res string (listof res)
(define-struct choice-res (name matches))
;(make-repeat-res answer (U symbol fail-type))
(define-struct repeat-res (a stop) (make-inspector))
(define (fail-res rst msg) (make-res #f rst msg "" 0 #f #f))
)