phc-toolkit/scribblings/template.scrbl
2017-04-27 23:38:55 +02:00

#lang scribble/manual
@(require (for-label typed/racket/base
syntax/parse
;"template.rkt"
))
@(define ellipses (racket ...))
@title[#:tag "template-lib"]{Versatile parser and template library}
Keywords: grammar, parser, template.
@defform[(parse expr [pattern body ] )]{
Analogous to @racket[syntax-parse], except it isn't
specialized for syntax, but rather works for arbitrary
s-expressions, including syntax ones (denoted by
@racket[#'()] in the pattern).}
@defform[#:literals (: :: ... else struct)
(tmpl template)
#:grammar
[(template variable
[variable : type] ;; (ann variable type)
;; cons-template
(template . template)
(template :: template)
;; list
(template**)
;; list*
template**-dotted
;; vector
#(template**)
(vector . template**-dotted)
;; hash-template: template** must expand to a list of pairs.
(hash . template**-dotted) ;; TODO: how to distinguish
(hasheq . template**-dotted) ;; mutable and immutable?
(hasheqv . template**-dotted)
#hash([template . template])
#hasheq([template . template])
#hasheqv([template . template])
;; struct-template
(struct-id template )
(struct struct-id template )
#s(prefab-id template )
#s(template template ) ;; Only allowed in untyped racket
;; box
#&template
;; call-template
(~identifier args ) ;; calls (identifier args …)
(~ expr args ) ;; calls (expr args …)
;; unquote-template
,expr
,@(list expr)
,@(list* expr) ;; must appear in last position.
;; template-expander
template-expander-id
(template-expander-id args )
;; maybe-template (should all be template expanders
;; which means the system is extensible enough to express
;; these special cases).
(?? alt-template )
(?@ . template**-dotted)
(??@ . template**-dotted)
(?if condition template template)
(|@if| condition template template)
(if@ condition template template)
(|@cond| [condition template] )
(|@cond| [condition template] [else template])
(cond@ condition template template)
;; like #,@(with-syntax ([meta-var #'template])
;; #'(template**))
(~let ([meta-var+args template])
. template**-dotted)
(~sort key template ooo)
(~loc stxloc . template)
;; Like (template . template), but discards the first and
;; keeps just the second. If the first contains pattern
;; variables which are repeated, this has the effect of
;; repeating the second as many times as the first. Example:
;; #'(vector (~each some-pattern-var '()))
;; => (vector '() '() '() '() '())
(~each template template)
;; escaped
(ddd escaped)
;;
;; literal
#t
#f
string
bytes
number
char
keyword
regexp
pregexp)
(meta-var+args meta-var
(meta-var meta-arg ))
(tail-template template)
;; specialize mid-sequence in repetition (diagonal-matrix-style)
(variable identifier)
(template**-dotted (template* . template)
template**)
(template** (code:line template* )
(code:line template* :: template)
(code:line template* (~rest . template)))
(template* template
(code:line template ooo)
special-cased-template)
(special-cased-template (code:line template vardd)
(code:line template ddvar))
;; Where var is an iterated variable.
(vardd var.. ;; exclude the current iteration
var...) ;; include the current iteration
(ddvar ..var ;; exclude the current iteration
...var) ;; include the current iteration
(ooo #,ellipses ;; TODO: make it a hyperlink
___
..k ;; k positive integer
__k ;; k positive integer
(code:line .. expr) ;; expr must return a positive integer
(code:line __ expr)) ;; expr must return a positive integer
(ddd #,ellipses)
]]{
TODO: implement the versatile template library.
@racket[...]
TODO: support for typed/racket.
TODO: optimization feature: would it be useful if the
expanded code could be optimized? For example, when looking
at the output of syntax-parse, the code is far from being
concise.
The patterns for @racket[parse] should all have a way to
create a symmetric counterpart for @racket[tmpl], which
produces the original value. This symmetry is important
because it allows lens-like macros, which operate on only a
part of the data structure, leaving everything else
intact.
@racket[??] works like @racket[??] from
@racket[syntax/parse/experimental/template], except it
allows any number of alternatives (including 0, to avoid
special-casing in macros). It is more or less equivalent to
@racket[(?? a (?? b (?? c )))], following syntax/parse's
semantics.
@racket[?@] has the same meaning as in syntax/parse.
@racket[(??@ t* )] is a shortcut for
@racket[(?? (?@ t* ))].
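
As a point of comparison, recent versions of Racket spell these two forms
@racket[~?] and @racket[~@] in plain @racket[syntax] templates. The sketch
below is not the proposed @racket[tmpl] form; it only illustrates, with the
existing @racket[syntax-parse] facilities, the nesting that a
multi-alternative @racket[??] would abbreviate. The function name
@racket[pick], the keywords and the @racket[fallback] symbol are made up for
the example.

@codeblock|{
#lang racket/base
(require syntax/parse)

;; pick : syntax -> datum
;; Returns a list starting with the first available alternative among
;; a, b and a literal fallback symbol, followed by the remaining terms.
(define (pick stx)
  (syntax-parse stx
    [((~optional (~seq #:a a)) (~optional (~seq #:b b)) extra ...)
     (syntax->datum
      ;; (~? a (~? b fallback)) tries a, then b, then the literal symbol.
      ;; (~@ extra ...) splices the remaining terms into the result.
      #'((~? a (~? b fallback)) (~@ extra ...)))]))

(pick #'(#:a 1 #:b 2 x y)) ; => '(1 x y)
(pick #'(#:b 2 x))         ; => '(2 x)
(pick #'())                ; => '(fallback)
}|
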
For better compatibility with at-exp, @racket[|@if|] can be
written @racket[if@], and the same goes for
@racket[|@cond|] etc.
TODO: what's the difference between @racket[~],
@racket[template-expander] and @racket[unquote]?
@racket[template-expander] runs at compile-time and should
treat its arguments as syntax.
Concerning unquoting, unlike @racket[racket]'s default
behaviour in @RACKET[#'([x #,(y )] )], unquoting should
not break the nesting of ellipses. How should we express
voluntary variation of the level of nesting? @racket[~let]
already allows expanding part of the template at some level
and inserting it verbatim somewhere below, but it's not a
silver bullet. One case which comes to mind is when some of
the nested data should be mixed with less-nested data, for
example going from
@racket[([10 1 2 3] [100 4 5] [1000 6])] to
@racket[([10 20 30] [400 500] [6000])] should be relatively
easy to express. Maybe @racket[~let] with parameters can be
a suitable generalized solution:
@RACKET[({~let ([(addx v) #,(+ x v)]) [(addx y) ]} )]
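
To make the target behaviour concrete, here is the same transformation
written by hand with @racket[match] and explicit loops. It is only a
reference for what the template is expected to produce, not a proposal for
the syntax; the name @racket[scale-rows] is hypothetical.

@codeblock|{
#lang racket/base
(require racket/match)

;; scale-rows : (listof (cons number (listof number))) -> (listof (listof number))
;; Multiplies every nested element by the head of its own row.
(define (scale-rows rows)
  (match rows
    ;; x has ellipsis depth 1, ys has ellipsis depth 2
    [(list (list x ys ...) ...)
     (for/list ([x (in-list x)]
                [ys (in-list ys)])
       ;; the less-nested x is mixed with each more-nested y
       (for/list ([y (in-list ys)])
         (* x y)))]))

(scale-rows '([10 1 2 3] [100 4 5] [1000 6]))
;; => '((10 20 30) (400 500) (6000))
}|
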
The special-cased template syntax should allow special
treatment of the @racket[i]-th iteration in a doubly-nested
loop: matching @racket[x] on @racket[(1 2 3 4 5)], and
using the template @racket[(0 x.. ,(* x x) ..x 1) ] will
produce @racket[(1 1 1 1 1)
(0 4 1 1 1)
(0 0 9 1 1)
(0 0 0 16 1)
(0 0 0 0 25)]. The pattern before
@racket[x..] and the pattern after @racket[..x] can expand
to multiple items which will be spliced in by wrapping it
with @racket[?@].}
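
The diagonal-matrix example above can be checked against a hand-written
version. The following sketch computes, with ordinary loops, the value that
the special-cased template is meant to produce: a @racket[0] for every
element before the current one, the square of the current element, and a
@racket[1] for every element after it. The function name
@racket[diagonal] is made up for the illustration.

@codeblock|{
#lang racket/base

;; diagonal : (listof number) -> (listof (listof number))
(define (diagonal xs)
  (define n (length xs))
  (for/list ([x (in-list xs)]
             [i (in-naturals)])
    (append (build-list i (lambda (k) 0))             ; one 0 per element before x
            (list (* x x))                            ; the current element, squared
            (build-list (- n i 1) (lambda (k) 1)))))  ; one 1 per element after x

(diagonal '(1 2 3 4 5))
;; => '((1 1 1 1 1) (0 4 1 1 1) (0 0 9 1 1) (0 0 0 16 1) (0 0 0 0 25))
}|
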
@section{Ideas for implementation}
@subsection{Extensibility (expanders)}
Allow normal, inline-prefix, inline-postfix and inline-infix
expanders, which can bind using regular expressions. This
allows implementing exotic syntax like @racket[var..]
(postfix, operates on the pattern preceding it),
@racket[..var] (prefix, operates on the pattern after it),
@racket[( escaped-pattern)] (normal, operates on the
containing s-exp).
@subsection{Customization}
For things that are likely to be customized by the user in
the whole file scope, define a grammar/custom module, used
as follows:
@racketblock[(require grammar/custom)
(grammar/custom option )]
The @racket[grammar/custom] macro expands to
@racket[(require grammar/core)] followed by a bunch of
@racket[define-syntax] which wrap the core macros, providing
them the custom options:
@racketblock[(require grammar/core)
(define-syntax-rule (parse . rest)
(parse/core #:global-options (option ) . rest))
(define-syntax-rule (tmpl . rest)
(tmpl/core #:global-options (option ) . rest))]
This can also be used to rename the @racket[parse] and
@racket[tmpl] macros, if desired (for example,
@racket[tmpl] could be renamed to @racket[quasisyntax], or
something similar).
Otherwise, @racket[grammar/custom] could just @racket[set!]
some for-syntax variable which stores the options. A second
boolean for-syntax variable could be used to check if
@racket[grammar/custom] was called twice, and throw an error
in that case.
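
A minimal sketch of this second approach, assuming a single module and
hypothetical names (@racket[configure] stands in for
@racket[grammar/custom], and @racket[show-options] only exists to observe the
stored value): one for-syntax variable holds the options, and a second one
records whether they were already set.

@codeblock|{
#lang racket/base
(require (for-syntax racket/base))

(begin-for-syntax
  (define options '())      ; the options, stored at compile time
  (define configured? #f))  ; has configure already been used?

;; Stores the options, or raises a syntax error on a second use.
(define-syntax (configure stx)
  (syntax-case stx ()
    [(_ opt ...)
     (begin
       (when configured?
         (raise-syntax-error 'configure "options were already set" stx))
       (set! configured? #t)
       (set! options (syntax->datum #'(opt ...)))
       #'(void))]))

;; Expands to a quoted copy of the options recorded so far.
(define-syntax (show-options stx)
  (syntax-case stx ()
    [(_) (with-syntax ([opts (datum->syntax #'here options)])
           #''opts)]))

(configure #:strict #t)
(show-options) ; => '(#:strict #t)
;; A second (configure ...) in this module would fail at expansion time.
}|

The state lives only for the expansion of the enclosing module, which is the
scope this section is after, but it also means the options are not visible to
other modules unless they are re-exported in some other way.
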
Or maybe we should just use units? Can they be customized in
a similar way?
The idea is to avoid having to wrap the whole file in a
@racket[(parameterize )], and be able to easily
@racket[provide] a customized variation of this library:
@racketblock[(provide (customized-out grammar/custom))]
@subsection{Unsorted ideas}
@subsubsection{Global pattern constraints}
For patterns, have global constraints: @racket[(~global-or id)] binds
@racket[id] to true if the enclosing pattern was matched at least once, and
false otherwise. Multiple occurrences of the same @racket[(~global-or id)] make
the @racket[id] true if any of the containing clauses was matched at least
once.
Inside a @racket[{~no-order}], it should be possible to impose some partial
order constraints, so that we can say:
@racketblock[
{~no-order
{~optional pat-a}
{~optional pat-b}
pat-c
{~optional {~constrain pat-d {~after pat-a}}}}]
The above code means that @racket[pat-a], @racket[pat-b] and @racket[pat-d] are
optional (but not @racket[pat-c]), the four patterns can appear in any order,
but if @racket[pat-a] and @racket[pat-d] are both present, then @racket[pat-d]
must appear after @racket[pat-a].
Scopes: the global constraints apply within a scope. By default, there is an
implicit top-level scope, and some forms might implicitly introduce a catch-all
scope unless otherwise specified, like the implicit @racket[~delimit-cut] for
@racket[define-syntax-class] from @racket[syntax/parse]. There could be two
kinds of scopes: unhygienic catch-all scopes which scope all "global"
constraints within, and naming scopes, which explicitly say which identifiers
they scope.
@racketblock[
{~scope {a}
{~vector
{~scope {b} {~no-order {~once a} {~optional b}}}
{~scope {b} {~no-order {~once a} {~optional b}}}}}]
The code above matches against a vector of two @racket[~no-order] lists. The
@racket[a] pattern must appear exactly once, either in the first list or in the
second, but not in both. On the other hand, the @racket[b] pattern may appear
zero or one time in the first list, zero or one time in the second list, and may
appear in both since its constraint is scoped for each list. Although it is less
clear, the following code is semantically identical:
@racketblock[
{~scope {a b}
{~vector
{~no-order {~once a} {~optional b}}
{~scope {b} {~no-order {~once a} {~optional b}}}}}]
Since the @racket[b] in the second @racket[~no-order] is bound to the enclosing
@racket[{~scope {b} }], it does not interact in any way with the outer scope.
The @racket[~optional] constraint on the @racket[b] in the first
@racket[~no-order] therefore does not interact with the @racket[~optional]
constraint in the second @racket[~no-order].
@subsubsection{Generalization of pattern/template kinds}
Nearly all patterns and templates should work equally well for regular lists and
syntax objects. It should be possible and easy enough to create new "kinds" of
data, which modify how patterns and templates work all the way through the
pattern or template tree, until it switches to a new kind. As an example, the
following pattern starts as a normal s-expr pattern, and switches to syntax in
two nodes:
@racketblock[
{~s-expr 1 2 (buckle {~optional my} shoe)
3 4 {~syntax (knock {~optional at the} door)}
5 6 (pick {~optional-wrap (up _) (sticks)})
7 8 {~syntax (lay {~optional-wrap (them _) (straight)})}}]
That pattern should match the following value:
@racketblock[
`(1 2 (buckle shoe)
3 4 ,#'(knock door)
5 6 (pick (up (sticks)))
7 8 ,#'(lay (them (straight))))]
The @racket[~syntax] indicates that the whole subtree should start matching (or
producing) syntax objects, instead of regular s-expressions. It is worth noting
that syntax objects have extra information (source location, syntax properties)
that regular s-expressions lack. One way of implementing this would be to make
the pattern directives operate on "enhanced" s-expressions. Enhanced
s-expressions are s-expressions with arbitrary kind-specific data attached to
them. The @racket[~s-expr] simply translates s-expressions into enhanced
s-expressions with an empty data attached, while @racket[~syntax] is a sort of
pre-processor which turns syntax objects into enhanced s-expressions with source
location and syntax properties attached. These "kind" pre-processors run before
the normal pattern directives are applied. Some kind-specific pattern directives
can access those properties (if they are used within the scope of the
appropriate @racket[~kind]), so that a @racket[(~loc srcloc . pattern)] matches
@racket[pattern] and saves its source location into the variable
@racket[srcloc].
Kinds should also be able to alter how the pattern variables are bound:
@racket[~s-expr] simply binds (in patterns) and uses (in templates) normal
Racket variables. On the other hand, @racket[~syntax] binds and uses syntax
pattern variables, so that the bound variables are used as @racket[#'var]
instead of @racket[var].
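
The difference in binding style already exists today between
@racket[match] and @racket[syntax-parse], which gives a rough picture of what
the two kinds would do. The sketch below uses only those existing forms; the
function names are made up, and nothing here is the proposed
@racket[~s-expr] or @racket[~syntax] directive.

@codeblock|{
#lang racket/base
(require racket/match syntax/parse)

;; s-expression kind: who is an ordinary Racket variable.
(define (visitor/s-expr v)
  (match v
    [(list 'knock who 'door) who]))

;; syntax kind: who is a syntax pattern variable, used as #'who.
(define (visitor/syntax stx)
  (syntax-parse stx
    #:datum-literals (knock door)
    [(knock who door) (syntax->datum #'who)]))

(visitor/s-expr '(knock at door))  ; => 'at
(visitor/syntax #'(knock at door)) ; => 'at
}|
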
Different pattern and template forms can specify a default kind (possibly by
simply wrapping their pattern or template with the appropriate @racket[~kind]).
For example, a @racket[define/match] form would use @racket[~s-expr] by default,
whereas a @racket[define-syntax/match] would use @racket[~syntax]. The same
would apply for re-implementations of Racket's @racket[match] and
@racket[syntax-parse].
Do the "kinds" form some sort of monad? TODO: Think about this, and try to see
if there are some monads which can be translated to pattern/template kinds
usefully.
@subsubsection{Lenses}
It should be possible to describe lenses using the patterns: you can work on
the focused part of the match, possibly access (read-only) other parts, and
return a new value. What should happen when the focused part is under an
ellipsis and has more than one match? Implicitly execute the code n times, like
a sort of @racket[for/list]?
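
One possible reading of the @racket[for/list] interpretation, written by hand
with @racket[match]: the focus is the second element of every row, the update
function runs once per match, and everything outside the focus is rebuilt
unchanged. The name @racket[update-seconds] and the choice of focus are
assumptions made for the example.

@codeblock|{
#lang racket/base
(require racket/match)

;; update-seconds : (any -> any) (listof list) -> (listof list)
;; Applies f to the focused part of each row and keeps the rest intact.
(define (update-seconds f rows)
  (match rows
    [(list (list head focus more ...) ...)
     (for/list ([h (in-list head)]
                [x (in-list focus)]
                [tail (in-list more)])
       (list* h (f x) tail))]))

(update-seconds add1 '((x 1 p q) (y 2) (z 3 r)))
;; => '((x 2 p q) (y 3) (z 4 r))
}|
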
@subsubsection{Backtracking}
Since the parser may need to backtrack, we need to expose the backtracking
mechanism to the user in some way, so that the user can:
@itemlist[
@item{Cut the current branch}
@item{Perform some side-effects and undo them when backtracking (dangerous)}
@item{Record a side-effectful lambda which is executed when the match succeeds
or when the current branch is @racket[~commit]ted.}
@item{Query information about the previously failed branches}
@item{Maybe affect the order in which non-deterministic branches are taken.
This feature would mainly be used by optimizers.
As a toy "just because we can" example, the backtracking mechanism should be
configurable enough that some CSP algorithm like AC2003 can be expressed by
the user, turning the pattern library into a CSP solver (where the CSP problem
is expressed as a pattern over an empty object). Another toy "just because we
can" example would be a datalog implementation built upon this library, where
the deduction rules are expressed as patterns.
The goal is that the parser's backtracking mechanism should be modular enough
to allow us to implement a dead-simple unoptimized backtracker, and allow
optimizers to be written as plug-ins. For example, an optimizer could
statically detect branches that can be cut due to a prior failure (e.g. if the
two-element-list pattern @racket[(foo:id bar:number)] failed because the first
element was not an @racket[identifier?], there's no point in trying
@racket[(baz:id quux:string fuzz:number)] on the same term).
Extensive configurability of the backtracking mechanism and optimization
features may interact badly with partial application and partial compilation,
see below. Think it through before giving too much or too little expressivity
to the user.}]
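
For reference, the first item of the list (cutting the current branch) is what
@racket[syntax-parse] already offers through @racket[~!]. The sketch below
shows its observable effect with existing forms only; the function names and
the @racket[lit] keyword are made up.

@codeblock|{
#lang racket/base
(require syntax/parse)

(define (classify stx)
  (syntax-parse stx
    #:datum-literals (lit)
    ;; Once lit has matched, ~! commits to this clause:
    ;; a later failure will not fall through to the next one.
    [(lit ~! n:number) 'literal]
    [(_ x) 'other]))

;; Turns the syntax error raised by a failed parse into a symbol.
(define (classify/safe stx)
  (with-handlers ([exn:fail:syntax? (lambda (e) 'syntax-error)])
    (classify stx)))

(classify/safe #'(lit 5))   ; => 'literal
(classify/safe #'(foo bar)) ; => 'other
(classify/safe #'(lit bar)) ; => 'syntax-error, the second clause is never tried
}|
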
@subsubsection{Partial application}
It should be possible to give a partial input with holes to a pattern or
template form, and, for optimization purposes, request that the pattern or
template processes the input as much as it can (for the parser, it would
potentially open a bounded number of backtracking branches, ready to switch to
the next one if one fails), leaving an efficient "continuation".
@subsubsection{Partial compilation}
One of the drawbacks of @racketmodname[syntax/parse] is that compiling a
@racket[syntax-parse] form takes some non-negligible time. This means that if a
macro generates another macro, and the generated macro code uses syntax-parse,
each call to the "generator" macro will be expensive. A complex macro generating
syntax which contains hundreds of uses of syntax-case will be reasonably fast.
The same code using syntax-parse will be much slower. Since the generated uses
of @racket[syntax-parse] will all have the same "shape" with a few identifiers
etc. changing, it would be nice to be able to partially pre-expand a use of
@racket[syntax-parse], leaving only the "holes" to be expanded. With a bottom-up
expansion mechanism there's not much to do, so we have to try hard to make the
pattern / template expander top-down as much as possible, and/or use a lazy
language (for which most things can be evaluated, leaving a continuation for the
few things that actually depend on the holes).
Although partial compilation sounds like a very interesting academic project,
it might be too difficult to get something useful out of it in practice. An
alternative, which would produce the sought performance benefits for macros
generating code which uses the pattern/template library, would be to make as
many of the concepts as possible first-class, so that they can easily be supplied as a
parameter. Note that first-class in this case does not necessarily mean "run-time
first-class", but possibly "compile-time first-class": we only need to be able
to pre-declare parametric templates, then use them in the code generated by a
macro. As long as the parametric templates support a form of "separate
compilation" and optimization, filling in the parameters can be handled by a
fast macro.
Some of the optimization plug-ins may however rely on a closed-world assumption
(i.e. they want to have the whole, final pattern or template, in order to
optimize it). If such an optimization plug-in is used, we may have to fall back
to the idea of using partial compilation, or simply accept that macros which
generate such code will take a while to expand.
@subsubsection{QuickCheck test generation}
It should be possible to generate random data that matches (and does not match,
too, that's a distinct problem) a pattern (unless there's a user-provided
predicate that is opaque to the library, in which case we can just ignore it and
generate instances at random, hoping that some will match and some won't).
Combined with the fact that pattern directives should be reversible into
template directives, and vice versa, it means that each directive should also
express its set of accepted values in terms of its contents. Of course, we don't
expect to be able to uniformly sample random instances, nor do we expect to be
able to support in a useful way complex patterns with lots of opaque predicates.
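
As a rough illustration of a directive describing its own set of accepted
values, here is a toy generator and a toy matcher over a deliberately tiny,
hypothetical pattern language (@racket['number], @racket['symbol],
@racket[(listof p)] and @racket[(tuple p ...)], all encoded as quoted data).
It is not the library's pattern syntax, and the sampling is not uniform.

@codeblock|{
#lang racket/base

;; generate : pattern -> any
;; Produces a random value accepted by the pattern.
(define (generate pat)
  (cond
    [(eq? pat 'number) (random 100)]
    [(eq? pat 'symbol)
     (string->symbol (string (integer->char (+ 97 (random 26)))))]
    [(and (pair? pat) (eq? (car pat) 'listof))
     ;; between 0 and 3 elements, each generated from the sub-pattern
     (for/list ([i (in-range (random 4))])
       (generate (cadr pat)))]
    [(and (pair? pat) (eq? (car pat) 'tuple))
     (map generate (cdr pat))]
    [else (error 'generate "unknown pattern: ~e" pat)]))

;; matches? : pattern any -> boolean
(define (matches? pat v)
  (cond
    [(eq? pat 'number) (number? v)]
    [(eq? pat 'symbol) (symbol? v)]
    [(and (pair? pat) (eq? (car pat) 'listof))
     (and (list? v)
          (andmap (lambda (x) (matches? (cadr pat) x)) v))]
    [(and (pair? pat) (eq? (car pat) 'tuple))
     (and (list? v)
          (= (length v) (length (cdr pat)))
          (andmap matches? (cdr pat) v))]
    [else #f]))

;; Every generated value should be accepted by its own pattern.
(define sample-pattern '(tuple number (listof (tuple symbol number))))
(matches? sample-pattern (generate sample-pattern)) ; => #t
}|
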
@subsubsection{Error messages}
@racketmodname[syntax/parse] generates good error messages, but it does not
work as well when the patterns become complex. Think this through, so that the
annotation burden is minimal, and so that users don't have to think too hard
about where to put a @racket[~describe] (I frequently had the problem with
@racket[syntax/parse] where I wrote a @racket[~describe], but it wasn't taken
into account).
@subsection{Things to look at}
@itemlist[
@item{@racket[math/array], for @racket[::] and array
broadcasting.}
@item{Quasipatterns in @racket[match].}
@item{The @racket[lens] library}
@item{@url{https://github.com/racket/racket/issues/1304}
non-linear matching (with repeated binding variables, for
example, that should be eq? or equal?)}]
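
For the last item, note that @racket[match] from @racket[racket/match]
already accepts repeated pattern variables and, by default, compares the
corresponding values with @racket[equal?]. A small example (the function name
is made up):

@codeblock|{
#lang racket/base
(require racket/match)

;; same-ends? : any -> boolean
;; The variable a occurs twice, so both positions must hold equal? values.
(define (same-ends? v)
  (match v
    [(list a b a) #t]
    [_ #f]))

(same-ends? '(1 2 1))                    ; => #t
(same-ends? '(1 2 3))                    ; => #f
(same-ends? (list "x" 0 (string #\x)))   ; => #t, the strings are equal? but not eq?
}|
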