diff --git a/icfp-2016/README.md b/icfp-2016/README.md index 79816f2..f9d4e8c 100644 --- a/icfp-2016/README.md +++ b/icfp-2016/README.md @@ -78,3 +78,30 @@ A submission you wish to have treated as a pearl must be marked as such These steps will alert reviewers to use the appropriate evaluation criteria. Pearls will be combined with ordinary papers, however, for the purpose of computing the conference’s acceptance rate. + + + +Overflow +--- + +This pearl describes a technique for extending a simple type system with + a value-indexed form of polymorphism. +By analyzing the syntactic structure of values and partially evaluating + constant expressions before typechecking, we specialize the types of functions + like @racket[curry], @racket[first], and @racket[regexp-match] at their + call-sites when possible. +Whereas the general type of @racket[curry] is @racket[(⊥ -> ⊤)], + our system infers when it is safe to use a subtype instead. +For instance: + +@racketblock[ + (curry (λ (x y) x)) +] + +generates the type constraint + +@racketblock[ + curry : ((A B -> A) -> (A -> B -> A)) +] + + diff --git a/icfp-2016/bib.rkt b/icfp-2016/bib.rkt index 7b6b247..e6bcaf7 100644 --- a/icfp-2016/bib.rkt +++ b/icfp-2016/bib.rkt @@ -32,6 +32,7 @@ (define/short popl "POPL" (string-append ACM Symposium "on Principles of Programming Languages")) (define/short icse "ICSE" "International Conference on Software Engineering") (define/short lncs "LNCS" "Lecture Notes in Computer Science") +(define/short sigmod "SIGMOD" (string-append ACM "SIGMOD" International Conference "on Management of Data")) (define/short sigplan-notices "SIGPLAN Notices" (string-append ACM "SIGPLAN Notices")) (define/short scheme-workshop "SFP" (string-append "Scheme and Functional Programming Workshop")) (define/short jfp "JFP" (string-append Journal "Functional Programming")) @@ -1101,3 +1102,24 @@ #:number 3 #:pages '(269 285)) #:date 1957)) + +(define wmpk-algol-1968 + (make-bib + #:title "Report on the Algorithmic Language ALGOL 68" + #:author (authors "A. van Wijngaarden" "B. J. Mailloux" "J.E.L. Peck" "C.H.A. Koster") + #:date 1968)) + +(define s-lisp-1990 + (make-bib + #:title "Common Lisp the Language" + #:author "Guy L. Steele" + #:location (book-location #:edition "2nd" #:publisher "Digital Press") + #:is-book? #t + #:date 1990)) + +(define mbb-sigmod-2006 + (make-bib + #:title "LINQ: reconciling object, relations and XML in the .NET framework" + #:author (authors "Erik Meijer" "Brain Beckman" "Gavin Bierman") + #:location (proceedings-location sigmod #:pages '(706 706)) + #:date 2006)) diff --git a/icfp-2016/common.rkt b/icfp-2016/common.rkt index ed7096c..3174a77 100644 --- a/icfp-2016/common.rkt +++ b/icfp-2016/common.rkt @@ -10,6 +10,7 @@ author) ~cite citet + second etal exact generate-bibliography @@ -138,3 +139,7 @@ (define (todo . x*) (make-element 'bold (cons "TODO:" x*))) + +(define (second) + (make-element (make-style "relax" '(exact-chars)) + "$2^\\emph{nd}$")) diff --git a/icfp-2016/paper.scrbl b/icfp-2016/paper.scrbl index c6dc11f..3b1bc4e 100644 --- a/icfp-2016/paper.scrbl +++ b/icfp-2016/paper.scrbl @@ -39,7 +39,7 @@ @include-section{solution.scrbl} @include-section{usage.scrbl} @include-section{experience.scrbl} @; Merge with usage? -@;@include-section{implementation.scrbl} +@include-section{implementation.scrbl} @;@include-section{correctness.scrbl} @;@include-section{related.scrbl} @;@include-section{conclusion.scrbl} diff --git a/icfp-2016/solution.scrbl b/icfp-2016/solution.scrbl index 2761bc5..85dd0f6 100644 --- a/icfp-2016/solution.scrbl +++ b/icfp-2016/solution.scrbl @@ -1,5 +1,6 @@ #lang scribble/sigplan @onecolumn - +@; TODO semantic brackets for translations, but not for sets +@; TODO something like semantic brackets for interpretations? @require["common.rkt"] @@ -9,29 +10,29 @@ The out-of-bounds reference in @code{[0,1,2] !! 3} is evident from the definition of @code{!!} and the values passed to it. We also know that @code{1 `div` 0} will go wrong because division by zero is mathematically undefined. -Similar reasoning based on the meaning of @racket{%d} and the number of variables - bound by anonymous functions like @code{\ x y z -> x} can determine the correctness +Similar reasoning about the meaning of @racket{%d} and the variables + bound in @code{(\ x y z -> x)} can determine the correctness of calls to @racket[printf] and @racket[curry]. -In general, our analysis begins with a class of predicates for extracting - meta-information from expressions; for example the length of a - list value or arity of a procedure. +Generalizing these observations, our analysis begins with a class of + predicates for extracting meta-information from expressions; + for example deriving the length of a list value or arity of a procedure value. - @exact|{$$\interp\ : \big\{\emph{expr} \rightarrow \emph{Maybe(value)}\big\}$$}| + @exact|{$$\interp\ : \big\{\emph{expr} \rightarrow \emph{Maybe\,(value)}\big\}$$}| -Applying a function @code{f} @exact|{$\in \interp\ $}| to a syntactically +Applying a function @exact|{${\tt f} \in \interp\ $}| to a syntactically well-formed expression should either yield a value describing some aspect of the input expression or return a failure result.@note{The name @emph{interp} is a mnemonic for @emph{interpret} or @emph{interpolant}@~cite[c-jsl-1997].} Correct predicates @code{f} should recognize expressions with some common - structure (not necessarily the expressions' type) and apply a uniform algorithm + structure (not necessarily a common type) and apply a uniform algorithm to computer their result. The reason for specifying @exact|{$\interp\ $}| over expressions rather than values will be made clear in @Secref{sec:usage}. Once we have predicates for extracting data from the syntax of expressions, we can use the data to guide program transformations. -The main result of this pearl is defining a compelling set of such transformations. +@; The main result of this pearl is defining a compelling set of such transformations. @exact|{$$\trans : \big\{ \emph{expr} \rightarrow \emph{expr}\big\} $$}| @@ -40,39 +41,46 @@ Each @exact|{${\tt g} \in \trans$}| is a partial function such that @exact|{${\t error. These transformations should be applied to expressions @exact|{${\tt e}$}| before type-checking; the critera for correct transformations can then be given - in terms of the language's typing judgment @exact|{$\vdash {\tt e} : \tau$}| - and untyped evaluation relation @exact|{$\untyped{{\tt e}} \Downarrow {\tt v}$}|, + in terms of the language's typing judgment @exact|{$~\vdash {\tt e} : \tau$}| + and evaluation relation @exact|{$\untyped{{\tt e}} \Downarrow {\tt v}$}|, where @exact|{$\untyped{{\tt e}}$}| is the untyped erasure of @exact|{${\tt e}$}|. We also assume a subtyping relation @exact|{$\subt$}| on types. @itemlist[ @item{@emph{ - If @exact|{$\vdash {\tt e} : \tau$}| and @exact|{$\vdash {\tt e'} : \tau'$}| - then @exact|{$\tau' \subt \tau$}| and both + If @exact|{$~\vdash {\tt e} : \tau$}| and @exact|{$~\vdash {\tt e'} : \tau'$}| + @exact|{\\}| + then @exact|{$\tau' \subt \tau$}| + @exact|{\\}| + and both @exact|{$\untyped{{\tt e}} \Downarrow {\tt v}$}| and @exact|{$\untyped{{\tt e'}} \Downarrow {\tt v}$}|. }} @; e:t e':t' => t' <: t /\ e <=> e' @item{@emph{ - If @exact|{$\not\vdash {\tt e} : \tau$}| but @exact|{$\vdash {\tt e'} : \tau'$}| + If @exact|{$~\not\vdash {\tt e} : \tau$}| but @exact|{$~\vdash {\tt e'} : \tau'$}| + @exact|{\\}| then @exact|{$\untyped{{\tt e}} \Downarrow {\tt v}$}| and @exact|{$\untyped{{\tt e'}} \Downarrow {\tt v}$}|. }} @; -e:t e':t' => e <=> e' @item{@emph{ - If @exact|{$\vdash {\tt e} : \tau$}| but @exact|{${\tt e'} = \bot$}| - or @exact|{$\not\vdash {\tt e'} : \tau'$}| - then @exact|{$\untyped{{\tt e}} \Downarrow \mathsf{wrong}$}| or diverges. + If @exact|{$~\vdash {\tt e} : \tau$}| but @exact|{${\tt e'} = \bot$}| + or @exact|{$~\not\vdash {\tt e'} : \tau'$}| + @exact|{\\}| + then @exact|{$\untyped{{\tt e}} \Downarrow \mathsf{wrong}$}| or + @exact|{$\untyped{{\tt e}}$}| diverges. }} @; e:t -e':t' => e^ ] If neither @exact|{${\tt e}$}| nor @exact|{${\tt e'}$}| type checks, then we have no guarantees about the run-time behavior of either term. -The hope is that both diverge, but proving this fact in a realistic language is - more trouble than it is worth. +In a perfect world both would diverge, but the fundamental limitations of + static typing@~cite[fagan-dissertation-1992] and computability apply to our + humble system. @; Erasure, moral, immoral Finally, we say that a translation @exact|{${\tt (g~e) = e'}$}| is @emph{moral} if @@ -80,6 +88,7 @@ Finally, we say that a translation @exact|{${\tt (g~e) = e'}$}| is @emph{moral} Otherwise, the tranlation has altered more than just type annotations and is @emph{immoral}. All our examples in @Secref{sec:intro} can be implemented as moral translations. +@; @todo{really, even curry? well it's just picking from an indexed family} Immoral translations are harder to show correct, but also much more useful. diff --git a/icfp-2016/texstyle.tex b/icfp-2016/texstyle.tex index 6458e63..e531c18 100644 --- a/icfp-2016/texstyle.tex +++ b/icfp-2016/texstyle.tex @@ -35,3 +35,6 @@ \newcommand{\untyped}[1]{\lfloor #1 \rfloor} \newcommand{\trans}{\llbracket\emph{transform}\rrbracket} \newcommand{\subt}{\mathbf{<:\,}} +\newcommand{\tos}{\mathsf{types_of_spec}} +\newcommand{\trt}[1]{\emph{#1}} +\newcommand{\tprintf}{\mathsf{t_printf}} diff --git a/icfp-2016/usage.scrbl b/icfp-2016/usage.scrbl index d1d4f7a..996ab24 100644 --- a/icfp-2016/usage.scrbl +++ b/icfp-2016/usage.scrbl @@ -1,138 +1,512 @@ #lang scribble/sigplan -@require["common.rkt"] +@(require "common.rkt" racket/string racket/sandbox) + +@; TODO 'need to' etc +@; TODO clever examples +@; TODO definition pane for examples +@; +------------ +@; | f \in \interp +@; | +@; | > (f x) +@; | y +@; +------------ +@; TODO -- maybe a better title is "what can we learn from values"? +@; TODO remove all refs to implementation? + @title[#:tag "sec:usage"]{Real-World Metaprogramming} +@;@(define base-eval +@; (make-evaluator 'racket/base +@; (sandbox-output 'string) +@; (sandbox-error-output 'string) +@; (error-display-handler (lambda (x y) (displayln "wepa"))))) We have defined useful letter-of-values transformations for a variety of common programming tasks ranging from type-safe string formatting to constant-folding of arithmetic. These transformations are implemented in Typed Racket@~cite[TypedRacket], which inherits Racket's powerful macro system@~cite[plt-tr1]. +For us, this means that Typed Racket comes equipped with powerful tools + for analyzing and transforming the syntax of programs before any type checking + occurs. -Our exposition does not assume any knowledge of Racket or Typed Racket, but - a key design choice of the Typed Racket language model bears mentioning: - evaluation of a Typed Racket program happens across three distinct stages, - shown in @Figure-ref{fig:staging}. -First the program is read and macro-expanded; as expanding a macro introduces - new code, the result is recursively read and expanded until no macros remain. -Next, the @emph{fully-expanded} Typed Racket program is type checked. -If checking succeeds, types are erased and the program is handed to the Racket - compiler. -For us, this means that we can implement @exact|{$\trans\ $}| - functions as macros referencing @exact|{$\interp\ $}| functions and rely - on the macro expander to invoke our transformations before the type checker runs. +@;Our exposition does not assume any knowledge of Racket or Typed Racket, but +@; a key design choice of the Typed Racket language model bears mentioning: +@; evaluation of a Typed Racket program happens across three distinct stages, +@; shown in @Figure-ref{fig:staging}. +@;First the program is read and macro-expanded; as expanding a macro introduces +@; new code, the result is recursively read and expanded until no macros remain. +@;@; TODO stronger distinction between read/expand and typecheck +@;Next, the @emph{fully-expanded} Typed Racket program is type checked. +@;If checking succeeds, types are erased and the program is handed to the Racket +@; compiler. +@; +@;For us, this means that we can implement @exact|{$\trans$}| +@; functions as macros and rely on the macro expander to invoke our +@; transformations before the type checker runs. +@;The transformations can use @exact|{$\interp$}| functions as +@; needed to complete their analysis. +@; +@;@figure["fig:staging" +@; @list{Typed Racket language model} +@; @exact|{\input{fig-staging}}| +@;] -@figure["fig:staging" - @list{Typed Racket language model} - @exact|{\input{fig-staging}}| -] -Though we describe each of the following transformations using in-line constant - values, our implementation applies @exact|{$\interp\ $}| functions to every - definition and let-binding in the program and then associates compile-time - data with the bound identifier. -When a defined value flows into a function like @racket[printf] without being - mutated along the way, we retrieve this cached information. -The macro system features used to implement this behavior are described in - @Secref{sec:implementation}. +@parag{Conventions} +Interpretation and transformation functions are defined over syntactic expressions + and values. +To make clear the difference between Typed Racket terms and syntactic representations + of terms, we quote the latter and typeset it in green. +Using this notation, @racket[(λ (x) x)] is a term implementing the identity + function and @racket['(λ (x) x)] is a syntactic object that will evaluate + to the identity function. +Values are typeset in green because their syntax and term representations are identical. -@; regexp-match -@; format -@; math -@; vectors -@; arity +In practice, syntax objects carry lexical information. +Such information is extremely important, but to simplify our presentation we + pretend it does not exist. +Think of this convention as removing the oyster shell to get a clear view of the pearl. -@section{Regexp} -Regular expression patterns are another common value whose structure is not - expressed by conventional type systems. -Consider the function @racket[regexp-match] from Typed Racket: +Using infix @tt{:} for type annotations, for instance @racket[(x : Integer)]. +These are normally written as @racket[(ann x Integer)]. + + +@; ============================================================================= +@section{String Formatting} + +@; TODO add note about the ridiculous survey figure? Something like 4/5 doctors +Format strings are the world's second most-loved domain-specific language (DSL). +@; @~cite[wmpk-algol-1968] +At first glance, a format string is just a string; in particular, any string + used as the first argument in a call to @racket[printf]. +But between the quotation marks, format strings may contain @emph{format directives} + that tell @emph{where} and @emph{how} values can be spliced into the format string. +@; TODO scheme or common lisp? +Racket follows the Lisp tradition@~cite[s-lisp-1990] of using a tilde character (@tt{~}) + to prefix format directives. +For example, @racket[~s] converts any value to a string and @racket[~b] converts a + number to binary form. @interaction[ - (regexp-match #rx"types" "types@ccs.neu.edu") - - (regexp-match #rx"lambda" "types@ccs.neu.edu") - - (regexp-match #rx"(.*)@(.*)" "types@ccs.neu.edu") + (printf "binary(~s) = ~b" 7 7) ] -@; TODO note that nested groups cannot fail. -When called with a pattern @racket[p] and string @racket[s] to match with, - @racket[(regexp-match p s)] returns a list of all substrings in @racket[s] - matching groups in the pattern @racket[p]. -The pattern itself counts for one group and parentheses within @racket[p] - delimit nested groups. -Hence the third example returns a list of three elements. -A match can also fail, in which case the value @racket[#f] is returned. +If the format directives do not match the arguments to @racket[printf], most + languages fail at run-time@~cite[a-icfp-1999]. +This is a simple kind of value error that should be caught statically. -Although Typed Racket's @racket[regexp-match] has the type@note{Assuming that a - successful match implies that all nested groups successfully match.} - @racket[(Regexp String -> (U #f (Listof String)))], the number of strings - in the result list is determined by the number of groups in the input regular - expression. -Like the arity of a function or the size of a tuple, the number of groups - in a pattern is often clear to the programmer. -We could imagine using - an indexed family of @racket[regexp-match] functions for patterns of - two, three, or more groups. -Ideally though, the programming language should understand - these unconventional or domain-specific flavors of polymorphism. +@; TODO print errors nicer +@interaction[ + (printf "binary(~s) = ~b" "7" "7") +] + +Detecting inconsistencies between a format string and its arguments is easy + provided we have a function @racket[fmt->types] @exact|{$\in \interp$}| for + reading types from a format string. +In Typed Racket this function is rather simple because the most common + directives accept @code{Any} type of value. +Such are the joys of uniform syntax---printing is free. + +@racketblock[ +> (fmt->types "binary(~s) = ~b") +'[Any Integer] +> (fmt->types '(λ (x) x)) +#false +] + +Now to use @racket[fmt->types] in a function @racket[t-printf] @exact|{$\in \trans$}|. +Given a call to @racket[printf], we validate the number of arguments and + add type annotations derived using @racket[fmt->types]. +For all other syntax patterns, @racket[t-printf] is the identity transformation. + +@racketblock[ +> (t-printf '(printf "~a")) +⊥ +> (t-printf '(printf "~b" "2")) +'(printf "~b" ("2" : Integer)) +> (t-printf printf) +'printf +] + +The first example is rejected immediately as a syntax error. +The second is temporarily accepted, but will cause a static type error. +Put another way, the format string @racket{~b} specializes the type of + @racket[printf] from @racket[(String Any * -> Void)] to @racket[(String Integer -> Void)]. +The third is slightly more interesting; it demonstrates that higher-order + uses of @racket[printf] default to the standard behavior. -@section{Database} +@; ============================================================================= +@section{Regular Expressions} +Moving now from the second most-loved DSL to the first, regular expressions + are often used to capture sub-patterns within a string. + +@racketblock[ +> (regexp-match #rx"(.*)@(.*)" "toni@merchant.net") +'("toni@merchant.net" "toni" "merchant.net") +] + +The first argument to @racket[regexp-match] is a regular expression pattern. +Inside the pattern, the parentheses delimit sub-pattern @emph{groups}, the dots match + any single character, and the Kleene star matches zero-or-more repetitions + of the pattern-or-character preceding it. +The second argument is a string to match against the pattern. +If the match succeeds, the result is a list containing the entire matched string + and substrings corresponding to each group captured by a sub-pattern. +If the match fails, Racket's @racket[regexp-match] returns @racket[#false]. + +@racketblock[ +> (regexp-match #rx"-(2*)-" "111-222-3333") +'("-222-" "222") +> (regexp-match #rx"¥(.*)" "$2,000") +#false +] + +Certain groups can also fail to capture even when the overall match succeeds. +This can happen, for example, when a group is followed by a Kleene star. + +@racketblock[ +> (regexp-match #rx"(a)*(b)" "b") +'("b" #f "b") +] + +Therefore, a simple catch-all type for @racket[regexp-match] is + @racket[(Regexp String -> (U #f (Listof (U #f String))))]. +Using the general type, however, is cumbersome for simple patterns + where a match implies that all groups will successfully capture. + +@;@(define tr-eval (make-base-eval '(require typed/racket/base racket/list))) +@racketblock[ +> (define (get-domain [full-name : String]) : String + (cond + [(regexp-match #rx"(.*)@(.*)" full-name) + => third] + [else "Match Failed"])) +] +@;Error: expected String, got (U #false String) +@; `third` could not be applied to arguments +@; Arguments: (Listof (U #false String)) +@; Expected Result: String + +Analysing the parentheses contained in a regular expression pattern can often + determine the number of groups statically. +We implement this as a function @racket[rx->groups] @exact|{$\in \interp$}| + +@racketblock[ +> (rx->groups #rx"(a)(b)(c)") +3 +> (rx->groups #rx"((a)b)") +2 +> (rx->groups #rx"(a)*(b)") +#false +] + +@; TODO can we not talk about casts? +The corresponding transformation + @racket[t-regexp] @exact|{$\in \trans$}| + inserts casts to refine the type of results + produced by @racket[regexp-match]. +It also flags malformed groups in uncompiled regular expressions. + +@racketblock[ +> (t-regexp '(regexp-match #rx"(a)b" str)) +'(cast (regexp-match #rx"(a)b" str) + (U #f (List String String))) +> (t-regexp '(regexp-match "(" str)) +⊥ +] + + +@; ============================================================================= +@section{Procedure Arity} + +Anonymous functions are another value form whose representation contains + useful data. +By tokenizing symbolic λ-expressions, we can parse their domain + syntactically in a function @racket[fun->domain] @exact|{$\in \interp$}| + +@racketblock[ +> (fun->arity '(λ (x y z) (x (z y) y))) +'[Any Any Any] +> (fun->arity '(λ ([x : Real] [y : Real]) x)) +'[Real Real] +] + +When domain information is available at call sites to a @racket[curry] function, + we can produce code that will pass typechecking. +One method is to swap @racket[curry] with an indexed version, + as demonstrated by @racket[t-swap] @exact|{$\in \trans$}| + +@racketblock[ +> (t-swap '(curry (λ (x y) x))) +'(curry_1 (λ (x y) x)) +] + +Another option is to implement @racket[curry] with unsafe tools from + the language runtime and assign it the trivial type @racket[(⊥ -> ⊤)] +A transformation would replace this type with a subtype where possible + and let the type checker reject other calls. +Our implementation employs the first strategy, but generates a new @racket[curry_i] + at each call site by folding over the inferred function domain. +@; Mention this later, or show code? + + +@; ============================================================================= +@section{Constant Folding} + +The identity interpretation is useful for lifting constant values to + the compile-time environment. +@racket[id] @exact|{$\in \interp$}| + +@racketblock[ +> (id 2) +2 +> (id "~s") +"~s" +] + +When composed with a filter, we can recognize classes of constants. +That is, if @racket[int?] is the identity function on integer values + and the constant function returning @racket[#false] otherwise, we have + (@racket[int?]@exact{$\,\circ\,$}@racket[id])@exact|{$\,\in \interp$}| + +@racketblock[ +> (int? (id 2)) +2 +> (int? (id "~s")) +#false +] + +Using this technique we can detect division by the constant zero and + implement constant-folding transformation of arithmetic operations. +@racket[t-arith] @exact{$\in \interp$} + +@racketblock[ +> (t-arith '(/ 1 0)) +⊥ +> (t-arith '(* 2 3 7 z)) +'(* 42 z) +] + +These transformations are most effective when applied bottom-up, from the + leaves of a syntax tree to the root. + +@racketblock[ +> (t-arith '(/ 1 (- 5 5))) +⊥ +] + +@Secref{sec:implementation} describes how we accomplish this recursion scheme + with the Racket macro system. +@; The implementation of @racket[t-arith] expects flat/local data. + + +@; ============================================================================= +@section{Sized Data Structures} + +Vector bounds errors are always frustrating to debug, especially since + their cause is rarely deep. +In many cases, vectors are declared with a constant size and + accessed at constant positions, plus or minus an offset.@note{Loops do not + fit this pattern, but many languages already provide for/each constructs + to streamline common looping patterns.} +But unless the compiler or runtime system tracks vector bounds, the programmer + must unfold definitions and manually find the source of the problem. + +A second, more compelling reason to track vector bounds is to remove the + run-time checks inserted by memory-safe languages. +If we can statically prove that a reference is in-bounds, we should be able + to optimize it. + +Racket provides a few ways to create vectors. +Length information is explicit in some and derivable for + others, provided we interpret sub-expressions recursively. +@racket[vector->length] @exact{$\in \interp$} + +@racketblock[ +> (vector->length '#(0 1 2 3)) +3 +> (vector->length '(make-vector 100 #true)) +100 +> (vector->length '(vector-append #(left) #(right))) +7 +> (vector->length '(vector-drop #(A B C) 2)) +1 +] + +Once we have sizes associated with compile-time vectors, + we can define optimized translations of built-in functions. +For the following, assume that @racket[unsafe-ref] is an unsafe version of @racket[vector-ref]. +@racket[t-vec] @exact{$\in \trans$} + +@racketblock[ +> (t-vec '(vector-ref #(Y E S) 2)) +'(unsafe-ref #(Y E S) 2) +> (t-vec '(vector-ref #(N O) 2)) +⊥ +> (t-vec '(vector-length (make-vector 100 #true))) +100 +] + +We can define vectorized arithmetic using similar translations. +Each operation match the length of its arguments at compile-time and expand + to an efficient implementation. + + +@; ============================================================================= +@section{Database Queries} @; db @; TODO Ocaml, Haskell story -Racket's @racket[db] library provides a direct connection to the database, - modulo some important safety checks. - -Typed Racket programmers may use the @racket[db] library by giving type signatures - specialized to their needs. -For example, the statement below asserts that @racket[query-row] always returns - a natural number and a string. -In contrast, @racket[query-maybe-row] can optionally fail, but otherwise returns - a natural number and a string. +Racket's @racket[db] library provides a string-based API for executing SQL + queries. +Connecting to a database requires only a username. +Queries are strings, but may optionally reference query parameters---natural numbers + prefixed by a dollar sign (@tt{$}). +Arguments substituted for query parameters are guarded against SQL injection. @racketblock[ - (require/typed db - (#:opaque Connection connection?) - (sql-connect (->* () (#:user String #:database String) Connection)) - (query-row (-> Connection String Any * (Vector Natural String))) - (query-maybe-row (-> Connection String Any * (Option (Vector Natural String)))) - ....) +> (define C (sql-connect #:user "shylock" + #:database "accounts")) +> (query-row C + "SELECT amt_owed FROM loans WHERE name = '$1'" + "antonio") +3000 ] -To manage additional tables, the programmer can re-import @racket[db] identifiers - with a fresh name and type signature. +This is a far cry from language-integrated query@~cite[mbb-sigmod-2006], but the interface + is relatively safe and very simple to use. -@todo{coherent example} +Typed Racket programmers may use the @racket[db] library by assigning specific + type signatures to functions like @racket[query-row]. +This is somewhat tedious, as the distinct operations for querying one value, + one row, or a lazy sequence of rows each need a type. +A proper type signature might express the database library as a functor over the + database schema, but Typed Racket does not have functors or even existential types. +Even if it did, the queries-as-strings interface makes it impossible for a standard + type checker to infer type constraints on query parameters. + +The situation worsens if the programmer uses multiple database connections. +One can either alias one query function to multiple identifiers (each with a specific type) + or weaken type signatures and manually type-cast query results. + +Our technique provides a solution. +We associate database connections with a compile-time value representing the + database schema. +Operations like @racket[query-row] then extract the schema from a connection + and interpret type constraints from the query string. +These constraints are passed to the type system in two forms: as annotations + on expressions used as query parameters and as casts on the result of a query. +Note that the casts are not guaranteed to succeed---in particular, the schema + may differ from the actual types in the database. + +In total, the solution requires three interpretation functions and at least two + transformations. +First, we implement a predicate to recognize schema values. +A schema is a list of table schemas; a table schema is a tagged list of + column names and column types. + +(@racket[schema?]@exact{$\,\circ\,$}@racket[id]) @exact{$\in \interp$} @racketblock[ - (require/typed (prefix-in w: db) - (w:query-row (-> Connection String Any * (Vector Natural String)))) - - (require/typed (prefix-in s: db) - (s:query-row (-> Connection String Any * (Vector Natural Natural)))) +> (schema? '([loans [(id . Natural) + (name . String) + (amt_owed . Integer)]])) +'([loans ...]) ] -This approach works, but is tedious and error-prone. -Our contribution is to associate a type signature with database connections. -Calls to generic operations like @racket[query-row] are then specialized to - the exact result type. +Composed with the identity interpretation, this predicate lifts schema values + from the program text to the compile-time environment. -@; PERKS -@; - Single point-of-control -@; - lightweight -@; - no changes to underlying library / type system -@; - compile-time validation of query strings +@; TODO this must be clearer +@; TODO note that this isn't a drop-in replacement +Next, we need to associate database connections with database schemas. +We do this by transforming calls to @racket[sql-connect]; connections made + without an explicit schema argument are rejected. -Incidentally, @racket[regexp-match:] is useful for parsing queries. -We also implemented @racket[query-row] to return a sized vector. +(@racket[sc]) @exact{$\in \trans$} +@racketblock[ +> (sc '(sql-connect #:user "shylock" + #:database "accounts")) +⊥ +> (sc '(sql-connect ([loans ...]) + #:user "shylock" + #:database "accounts")) +'(sql-connect #:user "shylock" + #:database "accounts") +] + +To be consistent with our other examples, we should now describe an + interpretation from connection objects to schemas. +That is, we should be able to extract the schema for the @racket{accounts} database + from the syntactic representation of @racket{shylock}'s connection object. +Our implementation, however, defines @racket[connection->schema] @exact{$\in \interp$} as + @racket[(λ (x) #false)]. +Instead, we assume that all connection objects are named and rely on a general + technique for associating compile-time data with identifiers. +See @Secref{sec:def-overview} for an overview and @Secref{sec:def-implementation} + for details. + +Our final interpretation function tokenizes query strings and returns + column and table information. +In particular, we parse @tt{SELECT} statements and extract +@itemlist[ + @item{the names of selected columns,} + @item{the table name, and} + @item{an association from query parameters to column names.} +] + +@racketblock[ +> "SELECT amt_owed FROM loans WHERE name = '$1'" +'[(amt_owed) loans ($1 . name)] +> "SELECT * FROM loans" +'[* loans ()] +] + +The schema, connection, and query constraints now come together in transformations + such as @racket[t] @exact{$\in \trans$}. +There is still a non-trivial amount of work to be done resolving wildcards and + validating table and row names before the type-annotated result is produced, + but all the necessary information is available. + +@racketblock[ +> (t '(query-row C + "SELECT amt_owed FROM loans WHERE name = '$1'" + "antonio")) +'(cast (query-row C + "SELECT amt_owed FROM loans WHERE name = '$1'" + ("antonio" : String)) + (Vector Integer)) +> (t '(query-row C + "SELECT * FROM loans WHERE name = '$1'" + "antonio")) +'(cast (query-row C + "SELECT amt_owed FROM loans WHERE name = '$1'" + ("antonio" : String)) + (Vector Natural String Integer)) +> (t '(query-row C "SELECT * FROM itunes")) +⊥ +] -While light years away from LINQ, this technique provides at least basic - safety guarantees for an existing library and demonstrates the generality of - our technique. +@; ============================================================================= +@section[#:tag "sec:def-overview"]{Definitions and Let-bindings} +Though we described each of the above transformations using in-line constant + values, our implementation cooperates with definitions and let bindings. +By means of an example, the difference @racket[(- n m)] in the following program + is compiled to the value 0. -@section{} -@; identifier macros -@; rename transformers +@racketblock[ +> (define m (* 6 7)) +> (let ([n m]) + (- m n)) +] + +Conceptually, we accomplish this by applying known functions + @exact{${\tt f} \in \interp$} at definition sites and keeping a compile-time + mapping from identifiers to interpreted data. +Thanks to Racket's meta-programming tools, implementing this table in a way + that cooperates with aliases and lexical scope is simple. + +@; TODO set!, for defines and lets