[icfp] checkpoint: codewalk drafted

2016-03-13 17:50:29 -04:00 · 2016-03-13 17:50:29 -04:00 · fec9513a80
commit fec9513a80
parent 185fc5fea4
7 changed files with 413 additions and 37 deletions
--- a/icfp-2016/bib.rkt
+++ b/icfp-2016/bib.rkt
@ -32,7 +32,7 @@
 (define/short popl "POPL" (string-append ACM Symposium "on Principles of Programming Languages"))
 (define/short icse "ICSE" "International Conference on Software Engineering")
 (define/short lncs "LNCS" "Lecture Notes in Computer Science")
-(define/short sigmod "SIGMOD" (string-append ACM "SIGMOD" International Conference "on Management of Data"))
+(define/short sigmod "SIGMOD" (string-append ACM "SIGMOD " International Conference "on Management of Data"))
 (define/short sigplan-notices "SIGPLAN Notices" (string-append ACM "SIGPLAN Notices"))
 (define/short scheme-workshop "SFP" (string-append "Scheme and Functional Programming Workshop"))
 (define/short jfp "JFP" (string-append Journal "Functional Programming"))
@ -1123,3 +1123,11 @@
   #:author (authors "Erik Meijer" "Brain Beckman" "Gavin Bierman")
   #:location (proceedings-location sigmod #:pages '(706 706))
   #:date 2006))
+
+(define c-dissertation-2010
+  (make-bib
+   #:title "Refining Syntactic Sugar: Tools for Supporting Macro Development"
+   #:author "Ryan Culpepper"
+   #:location (dissertation-location #:institution "Northeastern University")
+   #:date 2010))
+
--- a/icfp-2016/fig-staging.tex
+++ b/icfp-2016/fig-staging.tex
@ -1,3 +1,11 @@
+%% TODO
+%%  draw 'barriers' between my analysis & typed racket
+%%      |      |
+%%  ME  |      |
+%%   +  |  TR  |
+%%  TR  |      |
+%%      |      |
+
 \begin{center}
 \begin{tikzpicture}
  \node (0) {$\bullet$};
--- a/icfp-2016/fig-stats.tex
+++ b/icfp-2016/fig-stats.tex
@ -1,14 +1,13 @@
 \newcommand{\twoline}[2]{\parbox[s]{1.44cm}{\flushright\hfill #1\newline#2}}
-\newcommand{\mod}[1]{$\mathsf{#1}$}
 \begin{tabular}{l r r r}
-  Module             & LOC & $\interp$ & $\trans$ \\\hline
-  \mod{db}           & 263 &   2  (78) &  2 (101) \\
-  \mod{format}       &  66 &   1  (33) &  1  (21) \\
-  \mod{function}     &  31 &   1   (8) &  1  (11) \\
-  \mod{math}         &  90 &   1   (3) &  5  (46) \\
-  \mod{regexp}       & 122 &   4  (60) &  5  (33) \\
-  \mod{vector}       & 228 &   1  (19) & 13 (163) \\\hline
-  {\bf Total}        & 800 &  10 (201) & 27 (375) \\
+  Module             & LOC & $\interp$ & $\trans$   \\\hline
+  \mod{db}           & 263 &   2  (78) &  2  (101)  \\
+  \mod{format}       &  66 &   1  (33) &  1 \,~(21) \\
+  \mod{function}     &  31 &   1 ~~(8) &  1 \,~(11) \\
+  \mod{math}         &  90 &   1 ~~(3) &  5 \,~(46) \\
+  \mod{regexp}       & 122 &   6  (60) &  5 \,~(33) \\
+  \mod{vector}       & 228 &   1  (19) & 13  (163)  \\\hline
+  {\bf Total}        & 800 &  12 (201) & 27  (375)  \\
 \end{tabular}
 %% AVG
 % loc : 228.5
--- a/icfp-2016/implementation.scrbl
+++ b/icfp-2016/implementation.scrbl
@ -1,37 +1,394 @@
 #lang scribble/sigplan
+@require["common.rkt"]

-@title{Implementation}
-
-@; - syntax parse
-@; - 
-@; - identifier macros
-@; - let-bindings, make=rename-transformer
-@; - define-bindings, free-id-table
-@; - local expand
-@; - phasing (we have + at all levels)
-
-@; We don't re-implement format or regexp,
-@; but we do implement + and some vector operations, to go faster
+@title[#:tag "sec:implementation"]{Implementation}
+@; Amazing Macros
+@; Whirlwind tour
+@; Accounting, taking stock
+@; Alas I knew him well


+@; TODO this is all so boring right now, need to revise heavily

+@Figure-ref{fig:stats} gives a few statistics regarding our implementation.
+The purpose of this section is to explain why the numbers are low.
+@; Ode to macros, implementation a symphony

-
-
-@section{Correctness}
-
-Our implementation of @racket[format] in @todo{figure-ref} exhibits a few desirable properties.
-@itemlist[
-  @item{}
+@figure["fig:stats"
+  "Quantifying the implementation"
+  @exact|{\input{fig-stats}}|
 ]

-Generally speaking these properties are the ``right'' way to judge if a transformation is correct.
+In total, our six applications comprise 800 lines of code (LOC).
+Another 145 lines implement common functionality, putting the grand total
+ just under 1000 LOC.
+
+Except for @exact|{\mod{db}}| and @exact|{\mod{regexp}}|, each of the
+ core modules defines a single function in @exact|{$\interp$}|.
+In @exact|{\mod{db}}| the two functions are the schema predicate and @tt{SQL}
+ query parser (we omit the trivial interpreter for connections).
+    @; TODO no parentheses?
+On the other hand, @exact|{\mod{regexp}}| implements six group-parsing functions
+ to match the six string-like input types
+   @;@note{Strings, Regex literals, Posix Regex literls, and byte-string variations of each.}
+ accepted by Racket's @racket[regexp-match].
+These group parsers, however, share a 33-line kernel.
+Incidentally, the average size of all value-interpreting functions is 33 LOC.
+The smallest interpreter is the composition of Racket's @racket[number?] predicate
+ with the identity interpretation in @exact|{\mod{math}}| (3 LOC).
+The largest is the query parser (35 LOC), though the analyses for
+ format strings and regular expressions are approximately the same size.
+
+The @exact{$\trans$} functions are aliases for standard library procedures.
+   @; TODO much better to show off short code. But let's draft it first.
+In many cases we are able to re-use code between similar functions.
+For instance, the arithmetic operators @racket[+ - * /] are implemented by
+ a single fold.
+
+
+@; -----------------------------------------------------------------------------
+@section{Implementing Transformations} @; TODO not a great name
+@; @section{Ode to Macros: The Long Version}
+
+At this point, we have carried on long enough talking about the implementation
+ without actually showing any code.
+No longer---here is our definition of @racket[vector-length]:
+
+@codeblock{
+  (make-alias #'vector-length
+    (syntax-parser
+     [(_ v:vector/length)
+      #''v.evidence]
+     [_ #false]))
+}
+
+First of all, this transformation works as specified in @Secref{sec:vector}.
+When the length of its argument is known, it expands to that length.
+Otherwise, it expands to an ordinary call to @racket[vector-length].
+
+Second, we need to introduce a few mysterious characters:
+@itemlist[
+  @item{
+    @racket[(make-alias id f)] creates a transformation from an identifier @racket[id]
+    and a partial function @racket[f].
+  }
+  @item{
+    The symbol @tt{#'} creates a syntax object from a value or template.
+  }
+  @item{
+    A @racket[syntax-parser] is a match statement over syntactic patterns.
+    This parser recognizes two cases: application to a single argument via
+     the pattern @racket[(_ v:vector/length)] and anything else with the
+     wildcard @tt{_}.
+  }
+  @item{
+    The colon character (@tt{:}) used int @racket[v:vector/length]
+     binds the variable @racket[v] to the @emph{syntax class} @racket[vector/length].
+  }
+  @item{
+    The dot character (@tt{.}) accesses an @emph{attribute} of the value bound
+     to @racket[v].
+    In this case, the attribute @racket[evidence] is set when 
+     @racket[vector/length] matches successfully.
+  }
+]
+
+Third, we remark that the pattern @racket[v:vector/length] unfolds all
+ transformations to @racket[v] recursively.
+So we handle each of the following cases, as well as any other combination of
+ length-preserving vector operations.
+
+@racketblock[
+> '(vector-length #(H I))
+2
+> '(vector-length (vector-append #(Y O)
+                                 #(L O)))
+4
+]
+
+@; The general features are explained in greater deteail below
+
+@; make-alias
+@; TODO variable name for f
+The overall structure of @racket[vector-length] is common to many of our transformations.
+That is, we define a rule to handle an interesting syntactic pattern and
+ then generate an alias from the rule using the helper function @racket[make-alias].
+
+@codeblock{
+  (define ((make-alias orig-id f) stx)
+    (or (f stx)
+        (syntax-parse stx
+         [_:id
+          orig-id]
+         [(_ e* ...)
+          #`(#,orig-id e* ...)])))
+}
+
+The transformation defined by @racket[(make-alias id f)] is a function on
+ syntax objects.
+First, the function applies @racket[f] to the syntax object @racket[stx].
+If the result is not
+ @racket[#false] we return.
+Otherwise the function matches its argument against two possible patterns:
+@itemize[
+  @item{
+    @tt{_:id} recognizes identifiers with the built-in syntax class @racket[id].
+    When this pattern succeeds, we return the aliased @racket[orig-id].
+  }
+  @item{
+    @racket[(_ e* ...)] matches function application.
+    In the result of this branch,
+     @; TODO backtick not printing right
+     we declare a syntax template with @tt{#`} and splice the identifier
+     @racket[orig-id] into the template with @tt{#,}.
+    These operators are formally known as @racket[quasisyntax] and @racket[unsyntax];
+     you may know their cousins @racket[quasiquote] and @racket[unquote].
+  }
+]
+
+@emph{Note:} the identifier @racket[...] is not pseudocode!
+In a pattern, it captures zero-or-more repetitions of the preceding pattern---in
+ this case, the variable @racket[e*] binds anything so @racket[(_ e* ...)] matches
+ lists with at least one element.@note{The name @racket[e*] is our own convention.}
+All but the first element of such a list is then bound to the identifier
+ @racket[e*] in the result.
+We use @racket[...] in the result to flatten the contents of @racket[e*] into
+ the final expression.
+
+One last example transformation using @racket[make-alias]
+ is our definition of @racket[vector-ref], shown below.
+When given a sized vector @racket[v] and an expression @racket[e] that
+ expands to a number @racket[i], the function asserts that @racket[i] is
+ in bounds.
+If either @racket[vector/length] or @racket[expr->num] fail to coerce numeric
+ values, the function returns @racket[#false].
+
+@codeblock{
+  (make-alias #'vector-ref
+    (syntax-parser
+     [(_ v:vector/length e)
+      (let ([i (expr->num #'e)])
+        (if i
+          (if (< i (syntax->datum #'v.evidence))
+            #`(unsafe-vector-ref v.expanded '#,i)
+            (raise-vector-bounds-error #'v i))
+          #false))]
+     [_ #false]))
+}
+
+Unlike the previous two functions, our @racket[vector-ref] transformation
+ does more than just matching a pattern and returning a new syntax object.
+Crucially, it compares the @emph{value} used to index its argument vector with
+ that vector's length before choosing how to expand.
+To access these integer values outside of a template, we lift the pattern variables
+ @racket[v] and @racket[e] to syntax objects with a @tt{#'} prefix.
+A helper function @racket[expr->num] then fully expands the syntax object @racket[#'e]
+ and the built-in @racket[syntax->datum] gets the integer value stored at the
+ attribute @racket[#'v.evidence].
+
+Programming in this style is similar to the example-driven explanations we gave
+ in @Secref{sec:usage}.
+The interesting design challenge is making one pattern that covers all
+ relevant cases and one algorithm to uniformly derive the correct result.
+
+
+@; =============================================================================
+@section{Implementing Interpretations} @; TODO a decidedly bad name
+
+By now we have seen two useful syntax classes: @racket[id] and @racket[vector/length].
+In fact, we use syntax classes as the front-end for each function in @exact{$\interp$}.
+@Figure-ref{fig:stxclass} lists all of our syntax classes and ties each to a purpose
+ motivated in @Secref{sec:usage}.@note{The name @racket[vector/length] should
+  be read as ``vector @emph{with} length information''.}
+
+@figure["fig:stxclass"
+  "Registry of syntax classes"
+  @exact|{\input{fig-stxclass}}|
+]
+
+These classes are implemented uniformly from predicates on syntax objects.
+One such predicate is @racket[arity?], shown below, which counts
+ the parameters accepted by an uncurried anonymous function and returns
+ @racket[#false] for all other inputs.
+
+@codeblock{
+  (define arity?
+    (syntax-parser #:literals (λ)
+     [(λ (x*:id ...) e* ...)
+      (length (syntax->datum #'(x* ...)))]
+     [_ #f]))
+}
+
+The syntax class @racket[procedure/arity] is then defined as ...
+
+@racketblock[
+> (define-stxclass/pred procedure/arity
+    arity?)
+]
+
+... in terms of another macro, which handles the routine work of recursively
+ expanding its input, applying the @racket[arity?] predicate,
+ and caching results in the @racket[evidence] and @racket[expanded] attributes.
+
+@codeblock{
+  (define-syntax-rule (define-stxclass/pred id p?)
+    (define-syntax-class id
+     #:attributes (evidence expanded)
+     (pattern e
+      #:with e+ (expand-expr #'e)
+      #:with p+ (p? #'e+)
+      #:when (syntax->datum #'p+)
+      #:attr evidence #'p+
+      #:attr expanded #'e+)))
+}
+
+A @racket[define-syntax-rule] is an inlined definition; using it here does not
+ save any space, but in practice we re-use the same alias for each of
+ our custom syntax classes.
+The @racket[#:attributes] declaration is very important.
+This is where the earlier-mentioned @racket[v.expanded] and @racket[v.evidence]
+ were defined, and indeed these two attributes form the backbone of our value-parsing
+ protocol.
+In terms of a pattern @racket[x:procedure/arity], their meaning is:
+@itemlist[
+  @item{
+    @racket[x.expanded] is the result of fully expanding all macros and transformations
+     contained in the syntax object bound to @racket[x].
+    The helper function @racket[expand-expr] triggers this depth-first expansion.
+  }
+  @item{
+    @racket[x.evidence] is the result of applying the @racket[arity?] predicate
+     to the expanded version of @racket[x].
+    Intuitively, @racket[x.evidence] is the reason why we should be able to
+     perform transformations using @racket[x].
+  }
+]
+
+If the predicate @racket[arity?] returns @racket[#false], then the boolean
+ @racket[#:when] guard fails because the value contained in the syntax object
+ @racket[p+] will be @racket[#false].
+When this happens, neither attribute is bound and the pattern
+ @racket[x.procedure/arity] will fail.
+
+@; =============================================================================
+@section{Implementing Definitions}
+
+With that, we have essentially finished our tour of the key ideas underlying
+ our implementation.
+The one detail we elided is precisely how interpreted data is propogated upward
+ through recursive transformations, especially since transformations may unfold
+ into arbitrary, difficult-to-parse code.
+
+An illustrative example is our transformation for @racket[sql-connect],
+ the library function for connecting a user to a database.
+Recall that our library imposes an extra constraint on calls to @racket[sql-connect]:
+ they must supply a database schema, which is erased in translation.
+
+@racketblock[
+  (syntax-parser
+   [(_ s:schema/spec e* ...)
+    (syntax-property
+      #'(sql-connect e* ...)
+      connection-key
+      #'s.evidence)]
+   [_ (raise-syntax-error 'sql-connect
+        "Missing schema")])
+]
+
+Most of this definition is routine.
+We use the syntax class @racket[schema/spec] to lift schema specifications to
+ the compile-time environment and we ultimately forward all non-schema arguments
+ to the default @racket[sql-connect].@note{If an one of the arguments
+   @racket[e* ...] is malformed, this will be reported by the original
+   @racket[sql-connect]. Three cheers for division of labor!}
+The new form is @racket[syntax-property], which tags our new syntax object
+ with a key/value pair.
+Here the key is @racket[connection-key], which we generate when compiling a file
+ and use to identify connection objects.
+The value is the evidence parsed from the schema description.
+
+Transformation writers must take care to install @racket[syntax-property]
+ information, but we automate the task of retrieving cached properties
+ in our syntax classes---before
+ applying a predicate, we first search for a cached value.
+Syntax properties are likewise the trick for propagating metadata through
+ @racket[let] and @racket[define] bindings.
+The technical tools for this are @racket[rename-transformer]s and @racket[free-id-table]s,
+ which we discuss in @Secref{sec:rename}.
+
+@;@codeblock{
+@;  (make-alias #'vector-append
+@;    (syntax-parser
+@;     [(_ v0:vector/length v1:vector/length)
+@;      (define len0 (syntax-e #'v1.evidence))
+@;      (define len1 (syntax-e #'v2.evidence))
+@;      (define new-len (+ len0 len1))
+@;      (syntax-property
+@;        #`(build-vector
+@;              #,new-len
+@;              (lambda (i)
+@;                (if (< i '#,len0)
+@;                  (unsafe-vector-ref v1.expanded i)
+@;                  (unsafe-vector-ref v2.expanded i)))
+@;        vector-length-key
+@;        new-len))]
+@;     [_ #f]))
+@;}
+
+
+@; -----------------------------------------------------------------------------
+@section{Ode to Macros: Greatest Hits}
+
+@; Symphony of features
+
+Whereas the previous section was a code-first tour of key techniques supporting
+ our implementation, this section is a checklist of important meta-programming
+ tools provided by the Racket macro system.
+For ease of reference, our discussion proceeds from the most useful feature to
+ the least.
+Each sub-section title is the name of a function or macro.
+Titles marked with an asterisk are essential to our implementation,
+ just so other macrosystem users can compare with their toolkit.
+@; TODO why is it all so shitty
+
+
+@subsection[#:tag "sec:parse"]{Syntax Parse}
+
+You already know.
+Best way to specify transformations.
+
+
+@subsection[#:tag "sec:local-expand"]{Depth-First Expansion (*)}
+
+Bottom-up recursion.
+
+
+@subsection[#:tag "sec:class"]{Syntax Classes}
+
+Abstracting patterns.
+Honestly non-essential but a pain in the ass without.
+
+
+@subsection[#:tag "sec:idmacro"]{Identifier Macros (*)}
+
+
+@subsection[#:tag "sec:def-implementation"]{Syntax Properties (*)}
+
+Caching information, lets us go beyond constants.
+
+
+@subsection[#:tag "sec:rename"]{Rename Transformers, Free Id Tables}
+
+Lets and definitions cannot stop us now.
+
+
+@subsection[#:tag "sec:phase"]{Phasing}
+Identify @tt{+*-/}
+
+
+@subsection{Lexical Scope, Source Locations}
+
+Usability, tooling, debugging.



-Open question: can we design a restricted macro language where every transformation
- is statically checked to ensure termination and correctness.
-
-We have argued that our macros help typed programming, but they did so only
- by going outside the law of the type checker.
-
--- a/icfp-2016/intro.scrbl
+++ b/icfp-2016/intro.scrbl
@ -4,6 +4,7 @@
@; - sam    : too vague!
@;          : can you give more example?
@;          : too generous to other languages -- why didn't you do the entire paper in hs?
+@; - a note on macro-land? where truthy & boolean monads are king?


@require["common.rkt"]
--- a/icfp-2016/texstyle.tex
+++ b/icfp-2016/texstyle.tex
@ -38,3 +38,4 @@
 \newcommand{\tos}{\mathsf{types_of_spec}}
 \newcommand{\trt}[1]{\emph{#1}}
 \newcommand{\tprintf}{\mathsf{t_printf}}
+\newcommand{\mod}[1]{$\mathsf{#1}$}
--- a/icfp-2016/usage.scrbl
+++ b/icfp-2016/usage.scrbl
@ -71,6 +71,8 @@ Think of this convention as removing the oyster shell to get a clear view of the
 Using infix @tt{:} for type annotations, for instance @racket[(x : Integer)].
 These are normally written as @racket[(ann x Integer)].

+sql is short for postgresql
+

@; =============================================================================
@section{String Formatting}
@ -303,7 +305,7 @@ These transformations are most effective when applied bottom-up, from the


@; =============================================================================
-@section{Sized Data Structures}
+@section[#:tag "sec:vector"]{Sized Data Structures}

 Vector bounds errors are always frustrating to debug, especially since
 their cause is rarely deep.