diff --git a/icfp-2016/bib.rkt b/icfp-2016/bib.rkt index e6bcaf7..b7a52ab 100644 --- a/icfp-2016/bib.rkt +++ b/icfp-2016/bib.rkt @@ -32,7 +32,7 @@ (define/short popl "POPL" (string-append ACM Symposium "on Principles of Programming Languages")) (define/short icse "ICSE" "International Conference on Software Engineering") (define/short lncs "LNCS" "Lecture Notes in Computer Science") -(define/short sigmod "SIGMOD" (string-append ACM "SIGMOD" International Conference "on Management of Data")) +(define/short sigmod "SIGMOD" (string-append ACM "SIGMOD " International Conference "on Management of Data")) (define/short sigplan-notices "SIGPLAN Notices" (string-append ACM "SIGPLAN Notices")) (define/short scheme-workshop "SFP" (string-append "Scheme and Functional Programming Workshop")) (define/short jfp "JFP" (string-append Journal "Functional Programming")) @@ -1123,3 +1123,11 @@ #:author (authors "Erik Meijer" "Brain Beckman" "Gavin Bierman") #:location (proceedings-location sigmod #:pages '(706 706)) #:date 2006)) + +(define c-dissertation-2010 + (make-bib + #:title "Refining Syntactic Sugar: Tools for Supporting Macro Development" + #:author "Ryan Culpepper" + #:location (dissertation-location #:institution "Northeastern University") + #:date 2010)) + diff --git a/icfp-2016/fig-staging.tex b/icfp-2016/fig-staging.tex index ccb16a3..17edadb 100644 --- a/icfp-2016/fig-staging.tex +++ b/icfp-2016/fig-staging.tex @@ -1,3 +1,11 @@ +%% TODO +%% draw 'barriers' between my analysis & typed racket +%% | | +%% ME | | +%% + | TR | +%% TR | | +%% | | + \begin{center} \begin{tikzpicture} \node (0) {$\bullet$}; diff --git a/icfp-2016/fig-stats.tex b/icfp-2016/fig-stats.tex index 1cfd3ea..8a785bc 100644 --- a/icfp-2016/fig-stats.tex +++ b/icfp-2016/fig-stats.tex @@ -1,14 +1,13 @@ \newcommand{\twoline}[2]{\parbox[s]{1.44cm}{\flushright\hfill #1\newline#2}} -\newcommand{\mod}[1]{$\mathsf{#1}$} \begin{tabular}{l r r r} - Module & LOC & $\interp$ & $\trans$ \\\hline - \mod{db} & 263 & 
2 (78) & 2 (101) \\ - \mod{format} & 66 & 1 (33) & 1 (21) \\ - \mod{function} & 31 & 1 (8) & 1 (11) \\ - \mod{math} & 90 & 1 (3) & 5 (46) \\ - \mod{regexp} & 122 & 4 (60) & 5 (33) \\ - \mod{vector} & 228 & 1 (19) & 13 (163) \\\hline - {\bf Total} & 800 & 10 (201) & 27 (375) \\ + Module & LOC & $\interp$ & $\trans$ \\\hline + \mod{db} & 263 & 2 (78) & 2 (101) \\ + \mod{format} & 66 & 1 (33) & 1 \,~(21) \\ + \mod{function} & 31 & 1 ~~(8) & 1 \,~(11) \\ + \mod{math} & 90 & 1 ~~(3) & 5 \,~(46) \\ + \mod{regexp} & 122 & 6 (60) & 5 \,~(33) \\ + \mod{vector} & 228 & 1 (19) & 13 (163) \\\hline + {\bf Total} & 800 & 12 (201) & 27 (375) \\ \end{tabular} %% AVG % loc : 228.5 diff --git a/icfp-2016/implementation.scrbl b/icfp-2016/implementation.scrbl index f8b7683..e65b96a 100644 --- a/icfp-2016/implementation.scrbl +++ b/icfp-2016/implementation.scrbl @@ -1,37 +1,394 @@ #lang scribble/sigplan +@require["common.rkt"] -@title{Implementation} - -@; - syntax parse -@; - -@; - identifier macros -@; - let-bindings, make=rename-transformer -@; - define-bindings, free-id-table -@; - local expand -@; - phasing (we have + at all levels) - -@; We don't re-implement format or regexp, -@; but we do implement + and some vector operations, to go faster +@title[#:tag "sec:implementation"]{Implementation} +@; Amazing Macros +@; Whirlwind tour +@; Accounting, taking stock +@; Alas I knew him well +@; TODO this is all so boring right now, need to revise heavily +@Figure-ref{fig:stats} gives a few statistics regarding our implementation. +The purpose of this section is to explain why the numbers are low. +@; Ode to macros, implementation a symphony - - -@section{Correctness} - -Our implementation of @racket[format] in @todo{figure-ref} exhibits a few desirable properties. -@itemlist[ - @item{} +@figure["fig:stats" + "Quantifying the implementation" + @exact|{\input{fig-stats}}| ] -Generally speaking these properties are the ``right'' way to judge if a transformation is correct. 
+In total, our six applications comprise 800 lines of code (LOC). +Another 145 lines implement common functionality, putting the grand total + just under 1000 LOC. + +Except for @exact|{\mod{db}}| and @exact|{\mod{regexp}}|, each of the + core modules defines a single function in @exact|{$\interp$}|. +In @exact|{\mod{db}}| the two functions are the schema predicate and @tt{SQL} + query parser (we omit the trivial interpreter for connections). + @; TODO no parentheses? +On the other hand, @exact|{\mod{regexp}}| implements six group-parsing functions + to match the six string-like input types + @;@note{Strings, Regex literals, Posix Regex literals, and byte-string variations of each.} + accepted by Racket's @racket[regexp-match]. +These group parsers, however, share a 33-line kernel. +Incidentally, the average size of all value-interpreting functions is 33 LOC. +The smallest interpreter is the composition of Racket's @racket[number?] predicate + with the identity interpretation in @exact|{\mod{math}}| (3 LOC). +The largest is the query parser (35 LOC), though the analyses for + format strings and regular expressions are approximately the same size. + +The @exact{$\trans$} functions are aliases for standard library procedures. + @; TODO much better to show off short code. But let's draft it first. +In many cases we are able to re-use code between similar functions. +For instance, the arithmetic operators @racket[+ - * /] are implemented by + a single fold. + + +@; ----------------------------------------------------------------------------- +@section{Implementing Transformations} @; TODO not a great name +@; @section{Ode to Macros: The Long Version} + +At this point, we have carried on long enough talking about the implementation + without actually showing any code. 
+No longer---here is our definition of @racket[vector-length]: + +@codeblock{ + (make-alias #'vector-length + (syntax-parser + [(_ v:vector/length) + #''v.evidence] + [_ #false])) +} + +First of all, this transformation works as specified in @Secref{sec:vector}. +When the length of its argument is known, it expands to that length. +Otherwise, it expands to an ordinary call to @racket[vector-length]. + +Second, we need to introduce a few mysterious characters: +@itemlist[ + @item{ + @racket[(make-alias id f)] creates a transformation from an identifier @racket[id] + and a partial function @racket[f]. + } + @item{ + The symbol @tt{#'} creates a syntax object from a value or template. + } + @item{ + A @racket[syntax-parser] is a match statement over syntactic patterns. + This parser recognizes two cases: application to a single argument via + the pattern @racket[(_ v:vector/length)] and anything else with the + wildcard @tt{_}. + } + @item{ + The colon character (@tt{:}) used in @racket[v:vector/length] + binds the variable @racket[v] to the @emph{syntax class} @racket[vector/length]. + } + @item{ + The dot character (@tt{.}) accesses an @emph{attribute} of the value bound + to @racket[v]. + In this case, the attribute @racket[evidence] is set when + @racket[vector/length] matches successfully. + } +] + +Third, we remark that the pattern @racket[v:vector/length] unfolds all + transformations to @racket[v] recursively. +So we handle each of the following cases, as well as any other combination of + length-preserving vector operations. + +@racketblock[ +> '(vector-length #(H I)) +2 +> '(vector-length (vector-append #(Y O) + #(L O))) +4 +] + +@; The general features are explained in greater detail below + +@; make-alias +@; TODO variable name for f +The overall structure of @racket[vector-length] is common to many of our transformations. 
+That is, we define a rule to handle an interesting syntactic pattern and + then generate an alias from the rule using the helper function @racket[make-alias]. + +@codeblock{ + (define ((make-alias orig-id f) stx) + (or (f stx) + (syntax-parse stx + [_:id + orig-id] + [(_ e* ...) + #`(#,orig-id e* ...)]))) +} + +The transformation defined by @racket[(make-alias id f)] is a function on + syntax objects. +First, the function applies @racket[f] to the syntax object @racket[stx]. +If the result is not + @racket[#false] we return. +Otherwise the function matches its argument against two possible patterns: +@itemize[ + @item{ + @tt{_:id} recognizes identifiers with the built-in syntax class @racket[id]. + When this pattern succeeds, we return the aliased @racket[orig-id]. + } + @item{ + @racket[(_ e* ...)] matches function application. + In the result of this branch, + @; TODO backtick not printing right + we declare a syntax template with @tt{#`} and splice the identifier + @racket[orig-id] into the template with @tt{#,}. + These operators are formally known as @racket[quasisyntax] and @racket[unsyntax]; + you may know their cousins @racket[quasiquote] and @racket[unquote]. + } +] + +@emph{Note:} the identifier @racket[...] is not pseudocode! +In a pattern, it captures zero-or-more repetitions of the preceding pattern---in + this case, the variable @racket[e*] binds anything so @racket[(_ e* ...)] matches + lists with at least one element.@note{The name @racket[e*] is our own convention.} +All but the first element of such a list is then bound to the identifier + @racket[e*] in the result. +We use @racket[...] in the result to flatten the contents of @racket[e*] into + the final expression. + +One last example transformation using @racket[make-alias] + is our definition of @racket[vector-ref], shown below. +When given a sized vector @racket[v] and an expression @racket[e] that + expands to a number @racket[i], the function asserts that @racket[i] is + in bounds. 
+If either @racket[vector/length] or @racket[expr->num] fails to coerce numeric + values, the function returns @racket[#false]. + +@codeblock{ + (make-alias #'vector-ref + (syntax-parser + [(_ v:vector/length e) + (let ([i (expr->num #'e)]) + (if i + (if (< i (syntax->datum #'v.evidence)) + #`(unsafe-vector-ref v.expanded '#,i) + (raise-vector-bounds-error #'v i)) + #false))] + [_ #false])) +} + +Unlike the previous two functions, our @racket[vector-ref] transformation + does more than just matching a pattern and returning a new syntax object. +Crucially, it compares the @emph{value} used to index its argument vector with + that vector's length before choosing how to expand. +To access these integer values outside of a template, we lift the pattern variables + @racket[v] and @racket[e] to syntax objects with a @tt{#'} prefix. +A helper function @racket[expr->num] then fully expands the syntax object @racket[#'e] + and the built-in @racket[syntax->datum] gets the integer value stored at the + attribute @racket[#'v.evidence]. + +Programming in this style is similar to the example-driven explanations we gave + in @Secref{sec:usage}. +The interesting design challenge is making one pattern that covers all + relevant cases and one algorithm to uniformly derive the correct result. + + +@; ============================================================================= +@section{Implementing Interpretations} @; TODO a decidedly bad name + +By now we have seen two useful syntax classes: @racket[id] and @racket[vector/length]. +In fact, we use syntax classes as the front-end for each function in @exact{$\interp$}. 
+@Figure-ref{fig:stxclass} lists all of our syntax classes and ties each to a purpose + motivated in @Secref{sec:usage}.@note{The name @racket[vector/length] should + be read as ``vector @emph{with} length information''.} + +@figure["fig:stxclass" + "Registry of syntax classes" + @exact|{\input{fig-stxclass}}| +] + +These classes are implemented uniformly from predicates on syntax objects. +One such predicate is @racket[arity?], shown below, which counts + the parameters accepted by an uncurried anonymous function and returns + @racket[#false] for all other inputs. + +@codeblock{ + (define arity? + (syntax-parser #:literals (λ) + [(λ (x*:id ...) e* ...) + (length (syntax->datum #'(x* ...)))] + [_ #f])) +} + +The syntax class @racket[procedure/arity] is then defined as ... + +@racketblock[ +> (define-stxclass/pred procedure/arity + arity?) +] + +... in terms of another macro, which handles the routine work of recursively + expanding its input, applying the @racket[arity?] predicate, + and caching results in the @racket[evidence] and @racket[expanded] attributes. + +@codeblock{ + (define-syntax-rule (define-stxclass/pred id p?) + (define-syntax-class id + #:attributes (evidence expanded) + (pattern e + #:with e+ (expand-expr #'e) + #:with p+ (p? #'e+) + #:when (syntax->datum #'p+) + #:attr evidence #'p+ + #:attr expanded #'e+))) +} + +A @racket[define-syntax-rule] is an inlined definition; using it here does not + save any space, but in practice we re-use the same alias for each of + our custom syntax classes. +The @racket[#:attributes] declaration is very important. +This is where the earlier-mentioned @racket[v.expanded] and @racket[v.evidence] + were defined, and indeed these two attributes form the backbone of our value-parsing + protocol. 
+In terms of a pattern @racket[x:procedure/arity], their meaning is: +@itemlist[ + @item{ + @racket[x.expanded] is the result of fully expanding all macros and transformations + contained in the syntax object bound to @racket[x]. + The helper function @racket[expand-expr] triggers this depth-first expansion. + } + @item{ + @racket[x.evidence] is the result of applying the @racket[arity?] predicate + to the expanded version of @racket[x]. + Intuitively, @racket[x.evidence] is the reason why we should be able to + perform transformations using @racket[x]. + } +] + +If the predicate @racket[arity?] returns @racket[#false], then the boolean + @racket[#:when] guard fails because the value contained in the syntax object + @racket[p+] will be @racket[#false]. +When this happens, neither attribute is bound and the pattern + @racket[x:procedure/arity] will fail. + +@; ============================================================================= +@section{Implementing Definitions} + +With that, we have essentially finished our tour of the key ideas underlying + our implementation. +The one detail we elided is precisely how interpreted data is propagated upward + through recursive transformations, especially since transformations may unfold + into arbitrary, difficult-to-parse code. + +An illustrative example is our transformation for @racket[sql-connect], + the library function for connecting a user to a database. +Recall that our library imposes an extra constraint on calls to @racket[sql-connect]: + they must supply a database schema, which is erased in translation. + +@racketblock[ + (syntax-parser + [(_ s:schema/spec e* ...) + (syntax-property + #'(sql-connect e* ...) + connection-key + #'s.evidence)] + [_ (raise-syntax-error 'sql-connect + "Missing schema")]) +] + +Most of this definition is routine. 
+We use the syntax class @racket[schema/spec] to lift schema specifications to + the compile-time environment and we ultimately forward all non-schema arguments + to the default @racket[sql-connect].@note{If one of the arguments + @racket[e* ...] is malformed, this will be reported by the original + @racket[sql-connect]. Three cheers for division of labor!} +The new form is @racket[syntax-property], which tags our new syntax object + with a key/value pair. +Here the key is @racket[connection-key], which we generate when compiling a file + and use to identify connection objects. +The value is the evidence parsed from the schema description. + +Transformation writers must take care to install @racket[syntax-property] + information, but we automate the task of retrieving cached properties + in our syntax classes---before + applying a predicate, we first search for a cached value. +Syntax properties are likewise the trick for propagating metadata through + @racket[let] and @racket[define] bindings. +The technical tools for this are @racket[rename-transformer]s and @racket[free-id-table]s, + which we discuss in @Secref{sec:rename}. + +@;@codeblock{ +@; (make-alias #'vector-append +@; (syntax-parser +@; [(_ v0:vector/length v1:vector/length) +@; (define len0 (syntax-e #'v1.evidence)) +@; (define len1 (syntax-e #'v2.evidence)) +@; (define new-len (+ len0 len1)) +@; (syntax-property +@; #`(build-vector +@; #,new-len +@; (lambda (i) +@; (if (< i '#,len0) +@; (unsafe-vector-ref v1.expanded i) +@; (unsafe-vector-ref v2.expanded i))) +@; vector-length-key +@; new-len))] +@; [_ #f])) +@;} + + +@; ----------------------------------------------------------------------------- +@section{Ode to Macros: Greatest Hits} + +@; Symphony of features + +Whereas the previous section was a code-first tour of key techniques supporting + our implementation, this section is a checklist of important meta-programming + tools provided by the Racket macro system. 
+For ease of reference, our discussion proceeds from the most useful feature to + the least. +Each sub-section title is the name of a function or macro. +Titles marked with an asterisk are essential to our implementation, + just so other macrosystem users can compare with their toolkit. +@; TODO why is it all so shitty + + +@subsection[#:tag "sec:parse"]{Syntax Parse} + +You already know. +Best way to specify transformations. + + +@subsection[#:tag "sec:local-expand"]{Depth-First Expansion (*)} + +Bottom-up recursion. + + +@subsection[#:tag "sec:class"]{Syntax Classes} + +Abstracting patterns. +Honestly non-essential but a pain in the ass without. + + +@subsection[#:tag "sec:idmacro"]{Identifier Macros (*)} + + +@subsection[#:tag "sec:def-implementation"]{Syntax Properties (*)} + +Caching information, lets us go beyond constants. + + +@subsection[#:tag "sec:rename"]{Rename Transformers, Free Id Tables} + +Lets and definitions cannot stop us now. + + +@subsection[#:tag "sec:phase"]{Phasing} +Identify @tt{+*-/} + + +@subsection{Lexical Scope, Source Locations} + +Usability, tooling, debugging. -Open question: can we design a restricted macro language where every transformation - is statically checked to ensure termination and correctness. - -We have argued that our macros help typed programming, but they did so only - by going outside the law of the type checker. - diff --git a/icfp-2016/intro.scrbl b/icfp-2016/intro.scrbl index 2e0787c..24edf2b 100644 --- a/icfp-2016/intro.scrbl +++ b/icfp-2016/intro.scrbl @@ -4,6 +4,7 @@ @; - sam : too vague! @; : can you give more example? @; : too generous to other languages -- why didn't you do the entire paper in hs? +@; - a note on macro-land? where truthy & boolean monads are king? 
@require["common.rkt"] diff --git a/icfp-2016/texstyle.tex b/icfp-2016/texstyle.tex index e531c18..0c9645d 100644 --- a/icfp-2016/texstyle.tex +++ b/icfp-2016/texstyle.tex @@ -38,3 +38,4 @@ \newcommand{\tos}{\mathsf{types_of_spec}} \newcommand{\trt}[1]{\emph{#1}} \newcommand{\tprintf}{\mathsf{t_printf}} +\newcommand{\mod}[1]{$\mathsf{#1}$} diff --git a/icfp-2016/usage.scrbl b/icfp-2016/usage.scrbl index 996ab24..36acd44 100644 --- a/icfp-2016/usage.scrbl +++ b/icfp-2016/usage.scrbl @@ -71,6 +71,8 @@ Think of this convention as removing the oyster shell to get a clear view of the Using infix @tt{:} for type annotations, for instance @racket[(x : Integer)]. These are normally written as @racket[(ann x Integer)]. +sql is short for postgresql + @; ============================================================================= @section{String Formatting} @@ -303,7 +305,7 @@ These transformations are most effective when applied bottom-up, from the @; ============================================================================= -@section{Sized Data Structures} +@section[#:tag "sec:vector"]{Sized Data Structures} Vector bounds errors are always frustrating to debug, especially since their cause is rarely deep.