#lang scribble/manual @(require (for-label typed/racket/base syntax/parse ;"template.rkt" )) @(define ellipses (racket ...)) @title[#:tag "template-lib"]{Versatile parser and template library} Keywords: grammar, parser, template. @defform[(parse expr [pattern body …] …)]{ Analogous to @racket[syntax-parse], except it isn't specialized for syntax, but rather works for arbitrary s-expressions, including syntax ones (denoted by @racket[#'(…)] in the pattern).} @defform[#:literals (: :: ... else struct) (tmpl template) #:grammar [(template variable [variable : type] ;; (ann variable type) ;; cons-template (template . template) (template :: template) ;; list (template**) ;; list* template**-dotted ;; vector #(template**) (vector . template**-dotted) ;; hash-template: template** must expand to a list of pairs. (hash . template**-dotted) ;; TODO: how to distinguish (hasheq . template**-dotted) ;; mutable and immutable? (hasheqv . template**-dotted) #hash([template . template]) #hasheq([template . template]) #hasheqv([template . template]) ;; struct-template (struct-id template …) (struct struct-id template …) #s(prefab-id template …) #s(template template …) ;; Only allowed in untyped racket ;; box #&template ;; call-template (~identifier args …) ;; calls (identifier args …) (~ expr args …) ;; calls (expr args …) ;; unquote-template ,expr ,@(list expr) ,@(list* expr) ;; must appear in last position. ;; template-expander template-expander-id (template-expander-id args …) ;; maybe-template (should all be template expanders ;; which means the system is extensible enough to express ;; these special cases). (?? alt-template …) (?@ . template**-dotted) (??@ . template**-dotted) (?if condition template template) (|@if| condition template template) (if@ condition template template) (|@cond| [condition template] …) (|@cond| [condition template] … [else template]) (cond@ condition template template) ;; like #,@(with-syntax ([meta-var #'template]) ;; #'(template**)) (~let ([meta-var+args template]) . template**-dotted) (~sort key template ooo) (~loc stxloc . template) ;; Like (template . template), but discards the first and ;; keeps just the second. If the first contains pattern ;; variables which are repeated, this has the effect of ;; repeating the second as many times as the first. Example: ;; #'(vector (~each some-pattern-var '())) ;; => (vector '() '() '() '() '()) (~each template template) ;; escaped (ddd escaped) ;; ;; literal #t #f string bytes number char keyword regexp pregexp) (meta-var+args meta-var (meta-var meta-arg …)) (tail-template template) ;; specialize mid-sequence in repetition (diagonal-matrix-style) (variable identifier) (template**-dotted (template* … . template) template**) (template** (code:line template* …) (code:line template* … :: template) (code:line template* … (~rest . template))) (template* template (code:line template ooo) special-cased-template) (special-cased-template (code:line template vardd) (code:line template ddvar)) ;; Where var is an iterated variable. (vardd var.. ;; exclude the current iteration var...) ;; include the current iteration (ddvar ..var ;; exclude the current iteration ...var) ;; include the current iteration (ooo #,ellipses ;; TODO: make it a hyperlink ___ ..k ;; k positive integer __k ;; k positive integer (code:line .. expr) ;; expr must return a positive integer (code:line __ expr)) ;; expr must return a positive integer (ddd #,ellipses) ]]{ TODO: implement the versatile template library. @racket[...] TODO: support for typed/racket. TODO: optimization feature: would it be useful if the expanded code could be optimized? For example, when looking at the output of syntax-parse, the code is far from being concise. The patterns for @racket[parse] should all have a way to create a symmetric counterpart for @racket[tmpl], which produces the original value. This symmetry is important because it allows lens-like macros, which operate on only a part of the data structure, leaving everything else intact. @racket[??] works like @racket[??] from @racket[syntax/parse/experimental/template], except it allows any number of alternatives (including 0, to avoid special-casing in macros). It is more or less equivalent to @racket[(?? a (?? b (?? c …)))], following syntax/parse's semantics. @racket[?@] has the same meaning as in syntax/parse. @racket[(??@ t* …)] is a shortcut for @racket[(?? (?@ t* …))] For better compatibility with at-exp, @racket[|@if|] can be written @racket[if@], and the same goes for @racket[|@cond|] etc. TODO: what's the difference between @racket[~], @racket[template-expander] and @racket[unquote]? @racket[template-expander] runs at compile-time and should treat its arguments as syntax. Concerning unquoting, unlike @racket[racket]'s default behaviour in @RACKET[#'([x #,(y …)] …)], unquoting should not break the nesting of ellipses. How should we express voluntary variation of the level of nesting? @racket[~let] already allows expanding part of the template at some level and inserting it verbatim somewhere below, but it's not a silver bullet. One case which comes to mind is when some of the nested data should be mixed with less-nested data, for example going from @racket[([10 1 2 3] [100 4 5] [1000 6])] to @racket[([10 20 30] [400 500] [6000])] should be relatively easy to express. Maybe @racket[~let] with parameters can be a suitable generalized solution: @RACKET[({~let ([(addx v) #,(+ x v)]) [(addx y) …]} …)] The special-cased template syntax should allow special treatment of the @racket[i]-th iteration in a doubly-nested loop: matching @racket[x] on @racket[(1 2 3 4 5)], and using the template @racket[(0 x.. ,(* x x) ..x 1) …] will produce @racket[(1 1 1 1 1) (0 4 1 1 1) (0 0 9 1 1) (0 0 0 16 1) (0 0 0 0 24)]. The pattern before @racket[x..] and the pattern after @racket[..x] can expand to multiple items which will be spliced in by wrapping it with @racket[?@].} @section{Ideas for implementation} @subsection{Extensibility (expanders)} Allow normal, inline-prefix, inline-postfix and inline-infix expanders, which can bind using regular expressions. This allows implementing exotic syntax like @racket[var..] (postfix, operates on the pattern preceeding it), @racket[..var] (postfix, operates on the pattern after it), @racket[(… escaped-pattern)] (normal, operates on the containing s-exp) @subsection{Customization} For things that are likely to be customized by the user in the whole file scope, define a grammar/custom module, used as follows: @racketblock[(require grammar/custom) (grammar/custom option …)] The @racket[grammar/custom] macro expands to @racket[(require grammar/core)] followed by a bunch of @racket[define-syntax] which wrap the core macros, providing them the custom options: @racketblock[(require grammar/core) (define-syntax-rule (parse . rest) (parse/core #:global-options (option …) . rest)) (define-syntax-rule (tmpl . rest) (parse/core #:global-options (option …) . rest))] This can also be used to rename the @racket[parse] and @racket[tmpl] macros, if desired (for example, @racket[tmpl] could be renamed to @racket[quasisyntax], or something similar). Otherwise, @racket[grammar/custom] could just @racket[set!] some for-syntax variable which stores the options. A second boolean for-syntax variable could be used to check if @racket[grammar/custom] was called twice, and throw an error in that case. Or maybe we should just use units? Can they be customized in a similar way? The idea is to avoid having to wrap the whole file in a @racket[(parameterize …)], and be able to easily @racket[provide] a customized variation of this library: @racketblock[(provide (customized-out grammar/custom))] @subsection{Unsorted ideas} @subsubsection{Global pattern constraints} For patterns, have global constraints: @racket[(~global-or id)] binds @racket[id] to true if the enclosing pattern was matched at least once, and false otherwise. Multiple occurrences of the same @racket[(~global-or id)] make the @racket[id] true if any of the containing clauses was matched at least once. Inside a @racket[{~no-order}], it should be possible to impose some partial order constraints, so that we can say: @racketblock[ {~no-order {~optional pat-a} {~optional pat-b} pat-c {~optional {~constrain pat-d {~after pat-a}}}}] The above code means that @racket[pat-a], @racket[pat-b] and @racket[pat-d] are optional (but not @racket[pat-c]), the four patterns can appear in any order, but if @racket[pat-a] and @racket[pat-d] are both present, then @racket[pat-d] must appear after @racket[pat-a]. Scopes: the global constraints apply within a scope. By default, there is an implicit top-level scope, and some forms might implicitly introduce a catch-all scope unless otherwise specified, like the implicit @racket[~demimit-cut] for @racket[define-syntax-class] from @racket[syntax/parse]. There could be two kinds of scopes: unhygienic catch-all scopes which scope all "global" constraints within, and naming scopes, which explicitly say which identifiers they scope. @racketblock[ {~scope {a} {~vector {~scope {b} {~no-order {~once a} {~optional b}}} {~scope {b} {~no-order {~once a} {~optional b}}}}}] The code above matches against a vector of two @racket[~no-order] lists. The @racket[a] pattern must appear exactly once, either in the first list or in the second, but not in both. On the other hand, the @racket[b] pattern may appear zero or one time in the first list, zero or one time in the second list, and may appear in both since its constraint is scoped for each list. Although it is less clear, the following code is semantically identical: @racketblock[ {~scope {a b} {~vector {~no-order {~once a} {~optional b}} {~scope {b} {~no-order {~once a} {~optional b}}}}}] Since the @racket[b] in the @racket{~no-order} is bound to the enclosing @racket[{~scope {b} …}], it does not interact in any way with the outer scope. The @racket[~optional] constraint on the @racket[b] in the first @racket[~no-order] therefore does not interact withe the @racket[~optional] constraint in the second @racket[~no-order]. @subsubsection{Generalization of pattern/template kinds} Nearly all patterns and templates should work equally well for regular lists and syntax objects. It should be possible and easy enough to create new "kinds" of data, which modify how patterns and templates work all the way through the pattern or template tree, until it switches to a new kind. As an example, the following pattern starts as a normal s-expr pattern, and switches to syntax in two nodes: @racketblock[ {~s-expr 1 2 (buckle {~optional my} shoe) 3 4 {~syntax (knock {~optional at the} door)} 5 6 (pick {~optional-wrap (up _) (sticks)}) 7 8 {~syntax (lay {~optional-wrap (them _) (straight)})}}] That pattern should match the following value: @racketblock[ `(1 2 (buckle shoe) 3 4 ,#'(knock door) 5 6 (pick (up (sticks))) 7 8 ,#'(lay (them (straight))))] The @racket[~syntax] indicates that the whole subtree should start matching (or producing) syntax objects, instead of regular s-expressions. It is worht noting that syntax objects have extra information (source location, syntax properties) that regular s-expressions lack. One way of implementing this would be to make the pattern directives operate on "enhanced" s-expressions. Enhanced s-expressions are s-expressions with arbitrary kind-specific data attached to them. The @racket[~s-expr] simply translates s-expressions into enhanced s-expressions with an empty data attached, while @racket[~syntax] is a sort of pre-processor which turns syntax objects into enhanced s-expressions with source location and syntax properties attached. These "kind" pre-processors run before the normal pattern directives are applied. Some kind-specific pattern directives can access those properties (if they are used in within the scope of the appropriate @racket[~kind]), so that a @racket[(~loc srcloc . pattern)] matches @racket[pattern] and saves its source location into the variable @racket[srcloc]. Kinds should also be able to alter how the pattern variables are bound: @racket[~s-expr] simply binds (in patterns) and uses (in templates) normal Racket variables. On the other hand, @racket[~syntax] binds and uses syntax pattern variables, so that the bound variables are used as @racket[#'var] instead of @racket[var]. Different pattern and template forms can specify a default kind (possibly by simply wrapping their pattern or tempalte with the appropriate @racket[~kind]). For example, a @racket[define/match] form would use @racket[~s-expr] by default, whereas a @racket[define-syntax/match] would use @racket[~syntax]. The same would apply for re-implementations of Racket's @racket[match] and @racket[syntax-parse]. Do the "kinds" form some sort of monad? TODO: Think about this, and try to see if there are some monads which can be translated to pattern/template kinds usefully. @subsubsection{Lenses} It should be possible to describe lenses using the patterns: you can work on the focused part of the match, possibly access (read-only) other parts, and return a new value. What should happen when the focused part is under an ellipsis and has more than one match ? Implicitly execute the code n times, like a sort of @racket[for/list]? @subsubsection{Backtracking} Since the parser may need to backtrack, we need to expose the backtracking mechanism to the user in some way, so that the user can: @itemlist[ @item{Cut the current branch} @item{Perform some side-effects and undo them when backtracking (dangerous)} @item{Record a side-effectful lambda which is executed when the match succeeds or when the current branch is @racket[~commit]ted.} @item{Querry information about the previously failed branches} @item{Maybe affect the order in which non-deterministic branches are taken. This feature would mainly be used by optimizers. As a toy "just because we can" example, the backtracking mechanism should be configurable enough that some CSP algorithm like AC2003 can be expressed by the user, turning the pattern library into a CSP solver (where the CSP problem is expressed as a pattern over an empty object). Another toy "just because we can" example would be a datalog implementation built upon this library, where the deduction rules are expressed as patterns. The goal is that the parser's backtracking mechanism should be modular enough to allow us to implement a dead-simple unoptimized backtracker, and allow optimizers to be written as plug-ins. For example, an optimiazer could statically detect branches that can be cut due to a prior failure (e.g. if the two-element-list pattern @racket[(foo:id bar:number)] failed because the first element was not an @racket[identifier?], there's no point in trying @racket[(baz:id quux:string fuzz:number)] on the same term. Extensive configurability of the backtracking mechanism and optimization features may interact badly with partial application and partial compilation, see below. Think it through before giving too much or too little expressivity to the user.}] @subsubsection{Partial application} It should be possible to give a partial input with holes to a pattern or template form, and, for optimization purposes, request that the pattern or template processes the input as much as it can (for the parser, it would potentially open a bounded number of backtracking branches, ready to switch to the next one if one fails), leaving an efficient "continuation". @subsubsection{Partial compilation} One of the drawbacks of @racketmodname[syntax/parse] is that compiling a @racket[syntax-parse] form takes some non-negligible time. This means that if a macro generates another macro, and the generated macro code uses syntax-parse, each call to the "generator" macro will be expensive. A complex macro generating syntax which contains hundreds of uses of syntax-case will be reasonnably fast. The same code using syntax-parse will be much slower. Since the generated uses of @racket[syntax-parse] will all have the same "shape" with a few identifiers etc. changing, it would be nice to be able to partially pre-expand a use of @racket[syntax-parse], leaving only the "holes" to be expanded. With a bottom-up expansion mechanism there's not much to do, so we have to try hard to make the pattern / template expander top-down as much as possible, and/or use a lazy language (for which most things can be evaluated, leaving a continuation for the few things that actually depend on the holes). Although partial compilation sounds like a very interesting academic project, it might be too difficult to get something useful out of it in practice. An alternative, which would procude the sought performance benefits for macros generating code which uses the pattern/template library, would be to make as many of the concepts first-class, so that they can easily be supplied as a parameter. Note that firs-class in this case does not necessarily mean "run-time first-class", but possibly "compile-time first-class": we only need to be able to pre-declare parametric templates, then use them in the code generated by a macro. As long as the parametric templates support a form of "separate compilation" and optimization, filling in the parameters can be handled by a fast macro. Some of the optimization plug-ins may however rely on a closed-world assumption (i.e. they want to have the whole, final pattern or template, in order to optimize it). If such an optimization plug-in is used, we may have to fall back to the idea of using partial compilation, or simply accept that macros which generate such code will take a while to expand. @subsubsection{QuickCheck test generation} It should be possible to generate random data that matches (and does not match, too, that's a distinct problem) a pattern (unless there's a user-provided predicate that is opaque to the library, in which case we can just ignore it and generate instances at random, hoping that some will match and some won't). Combined with the fact that pattern directives should be reversible into template directives, and vica versa, it means that each directive should also express its set of accepted values in terms of its contents. Of course, we don't expect to be able to uniformly sample random instances, nor do we expect to be able to support in a useful way complex patterns with lots of opaque predicates. @subsubsection{Error messages} @racketmodname[syntax/parse] generates good error messages, but it does not work as well when the patterns become complex. Think this through, so that the annotation burden is minimal, and so that users don't have to think too hard about where to put a @racket[~describe] (I frequently had the problem with @racket[syntax/parse] where I wrote a @racket[~describe], but it wasn't taken into account. @subsection{Things to look at} @itemlist[ @item{@racket[math/arry], for @racket[::] and array broadcasting.} @item{Quasipatterns in @racket[match].} @item{The @racket[lens] library} @item{@url{https://github.com/racket/racket/issues/1304} non-linear matching (with repeated binding variables, for example, that should be eq? or equal?)}]