[icfp] checkpoint, got applications drafted
This commit is contained in:
parent
3cc1917410
commit
28e5b4b04f
|
@ -78,3 +78,30 @@ A submission you wish to have treated as a pearl must be marked as such
|
||||||
These steps will alert reviewers to use the appropriate evaluation criteria.
|
These steps will alert reviewers to use the appropriate evaluation criteria.
|
||||||
Pearls will be combined with ordinary papers, however,
|
Pearls will be combined with ordinary papers, however,
|
||||||
for the purpose of computing the conference’s acceptance rate.
|
for the purpose of computing the conference’s acceptance rate.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
Overflow
|
||||||
|
---
|
||||||
|
|
||||||
|
This pearl describes a technique for extending a simple type system with
|
||||||
|
a value-indexed form of polymorphism.
|
||||||
|
By analyzing the syntactic structure of values and partially evaluating
|
||||||
|
constant expressions before typechecking, we specialize the types of functions
|
||||||
|
like @racket[curry], @racket[first], and @racket[regexp-match] at their
|
||||||
|
call-sites when possible.
|
||||||
|
Whereas the general type of @racket[curry] is @racket[(⊥ -> ⊤)],
|
||||||
|
our system infers when it is safe to use a subtype instead.
|
||||||
|
For instance:
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
(curry (λ (x y) x))
|
||||||
|
]
|
||||||
|
|
||||||
|
generates the type constraint
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
curry : ((A B -> A) -> (A -> B -> A))
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -32,6 +32,7 @@
|
||||||
(define/short popl "POPL" (string-append ACM Symposium "on Principles of Programming Languages"))
|
(define/short popl "POPL" (string-append ACM Symposium "on Principles of Programming Languages"))
|
||||||
(define/short icse "ICSE" "International Conference on Software Engineering")
|
(define/short icse "ICSE" "International Conference on Software Engineering")
|
||||||
(define/short lncs "LNCS" "Lecture Notes in Computer Science")
|
(define/short lncs "LNCS" "Lecture Notes in Computer Science")
|
||||||
|
(define/short sigmod "SIGMOD" (string-append ACM "SIGMOD" International Conference "on Management of Data"))
|
||||||
(define/short sigplan-notices "SIGPLAN Notices" (string-append ACM "SIGPLAN Notices"))
|
(define/short sigplan-notices "SIGPLAN Notices" (string-append ACM "SIGPLAN Notices"))
|
||||||
(define/short scheme-workshop "SFP" (string-append "Scheme and Functional Programming Workshop"))
|
(define/short scheme-workshop "SFP" (string-append "Scheme and Functional Programming Workshop"))
|
||||||
(define/short jfp "JFP" (string-append Journal "Functional Programming"))
|
(define/short jfp "JFP" (string-append Journal "Functional Programming"))
|
||||||
|
@ -1101,3 +1102,24 @@
|
||||||
#:number 3
|
#:number 3
|
||||||
#:pages '(269 285))
|
#:pages '(269 285))
|
||||||
#:date 1957))
|
#:date 1957))
|
||||||
|
|
||||||
|
(define wmpk-algol-1968
|
||||||
|
(make-bib
|
||||||
|
#:title "Report on the Algorithmic Language ALGOL 68"
|
||||||
|
#:author (authors "A. van Wijngaarden" "B. J. Mailloux" "J.E.L. Peck" "C.H.A. Koster")
|
||||||
|
#:date 1968))
|
||||||
|
|
||||||
|
(define s-lisp-1990
|
||||||
|
(make-bib
|
||||||
|
#:title "Common Lisp the Language"
|
||||||
|
#:author "Guy L. Steele"
|
||||||
|
#:location (book-location #:edition "2nd" #:publisher "Digital Press")
|
||||||
|
#:is-book? #t
|
||||||
|
#:date 1990))
|
||||||
|
|
||||||
|
(define mbb-sigmod-2006
|
||||||
|
(make-bib
|
||||||
|
#:title "LINQ: reconciling object, relations and XML in the .NET framework"
|
||||||
|
#:author (authors "Erik Meijer" "Brain Beckman" "Gavin Bierman")
|
||||||
|
#:location (proceedings-location sigmod #:pages '(706 706))
|
||||||
|
#:date 2006))
|
||||||
|
|
|
@ -10,6 +10,7 @@
|
||||||
author)
|
author)
|
||||||
~cite
|
~cite
|
||||||
citet
|
citet
|
||||||
|
second
|
||||||
etal
|
etal
|
||||||
exact
|
exact
|
||||||
generate-bibliography
|
generate-bibliography
|
||||||
|
@ -138,3 +139,7 @@
|
||||||
|
|
||||||
(define (todo . x*)
|
(define (todo . x*)
|
||||||
(make-element 'bold (cons "TODO:" x*)))
|
(make-element 'bold (cons "TODO:" x*)))
|
||||||
|
|
||||||
|
(define (second)
|
||||||
|
(make-element (make-style "relax" '(exact-chars))
|
||||||
|
"$2^\\emph{nd}$"))
|
||||||
|
|
|
@ -39,7 +39,7 @@
|
||||||
@include-section{solution.scrbl}
|
@include-section{solution.scrbl}
|
||||||
@include-section{usage.scrbl}
|
@include-section{usage.scrbl}
|
||||||
@include-section{experience.scrbl} @; Merge with usage?
|
@include-section{experience.scrbl} @; Merge with usage?
|
||||||
@;@include-section{implementation.scrbl}
|
@include-section{implementation.scrbl}
|
||||||
@;@include-section{correctness.scrbl}
|
@;@include-section{correctness.scrbl}
|
||||||
@;@include-section{related.scrbl}
|
@;@include-section{related.scrbl}
|
||||||
@;@include-section{conclusion.scrbl}
|
@;@include-section{conclusion.scrbl}
|
||||||
|
|
|
@ -1,5 +1,6 @@
|
||||||
#lang scribble/sigplan @onecolumn
|
#lang scribble/sigplan @onecolumn
|
||||||
|
@; TODO semantic brackets for translations, but not for sets
|
||||||
|
@; TODO something like semantic brackets for interpretations?
|
||||||
|
|
||||||
@require["common.rkt"]
|
@require["common.rkt"]
|
||||||
|
|
||||||
|
@ -9,29 +10,29 @@ The out-of-bounds reference in @code{[0,1,2] !! 3} is evident from the
|
||||||
definition of @code{!!} and the values passed to it.
|
definition of @code{!!} and the values passed to it.
|
||||||
We also know that @code{1 `div` 0} will go wrong because division by zero is
|
We also know that @code{1 `div` 0} will go wrong because division by zero is
|
||||||
mathematically undefined.
|
mathematically undefined.
|
||||||
Similar reasoning based on the meaning of @racket{%d} and the number of variables
|
Similar reasoning about the meaning of @racket{%d} and the variables
|
||||||
bound by anonymous functions like @code{\ x y z -> x} can determine the correctness
|
bound in @code{(\ x y z -> x)} can determine the correctness
|
||||||
of calls to @racket[printf] and @racket[curry].
|
of calls to @racket[printf] and @racket[curry].
|
||||||
|
|
||||||
In general, our analysis begins with a class of predicates for extracting
|
Generalizing these observations, our analysis begins with a class of
|
||||||
meta-information from expressions; for example the length of a
|
predicates for extracting meta-information from expressions;
|
||||||
list value or arity of a procedure.
|
for example deriving the length of a list value or arity of a procedure value.
|
||||||
|
|
||||||
@exact|{$$\interp\ : \big\{\emph{expr} \rightarrow \emph{Maybe(value)}\big\}$$}|
|
@exact|{$$\interp\ : \big\{\emph{expr} \rightarrow \emph{Maybe\,(value)}\big\}$$}|
|
||||||
|
|
||||||
Applying a function @code{f} @exact|{$\in \interp\ $}| to a syntactically
|
Applying a function @exact|{${\tt f} \in \interp\ $}| to a syntactically
|
||||||
well-formed expression should either yield a value describing some aspect
|
well-formed expression should either yield a value describing some aspect
|
||||||
of the input expression or return a failure result.@note{The name @emph{interp}
|
of the input expression or return a failure result.@note{The name @emph{interp}
|
||||||
is a mnemonic for @emph{interpret} or @emph{interpolant}@~cite[c-jsl-1997].}
|
is a mnemonic for @emph{interpret} or @emph{interpolant}@~cite[c-jsl-1997].}
|
||||||
Correct predicates @code{f} should recognize expressions with some common
|
Correct predicates @code{f} should recognize expressions with some common
|
||||||
structure (not necessarily the expressions' type) and apply a uniform algorithm
|
structure (not necessarily a common type) and apply a uniform algorithm
|
||||||
to computer their result.
|
to computer their result.
|
||||||
The reason for specifying @exact|{$\interp\ $}| over expressions rather than
|
The reason for specifying @exact|{$\interp\ $}| over expressions rather than
|
||||||
values will be made clear in @Secref{sec:usage}.
|
values will be made clear in @Secref{sec:usage}.
|
||||||
|
|
||||||
Once we have predicates for extracting data from the syntax of expressions,
|
Once we have predicates for extracting data from the syntax of expressions,
|
||||||
we can use the data to guide program transformations.
|
we can use the data to guide program transformations.
|
||||||
The main result of this pearl is defining a compelling set of such transformations.
|
@; The main result of this pearl is defining a compelling set of such transformations.
|
||||||
|
|
||||||
@exact|{$$\trans : \big\{ \emph{expr} \rightarrow \emph{expr}\big\} $$}|
|
@exact|{$$\trans : \big\{ \emph{expr} \rightarrow \emph{expr}\big\} $$}|
|
||||||
|
|
||||||
|
@ -40,39 +41,46 @@ Each @exact|{${\tt g} \in \trans$}| is a partial function such that @exact|{${\t
|
||||||
error.
|
error.
|
||||||
These transformations should be applied to expressions @exact|{${\tt e}$}| before
|
These transformations should be applied to expressions @exact|{${\tt e}$}| before
|
||||||
type-checking; the critera for correct transformations can then be given
|
type-checking; the critera for correct transformations can then be given
|
||||||
in terms of the language's typing judgment @exact|{$\vdash {\tt e} : \tau$}|
|
in terms of the language's typing judgment @exact|{$~\vdash {\tt e} : \tau$}|
|
||||||
and untyped evaluation relation @exact|{$\untyped{{\tt e}} \Downarrow {\tt v}$}|,
|
and evaluation relation @exact|{$\untyped{{\tt e}} \Downarrow {\tt v}$}|,
|
||||||
where @exact|{$\untyped{{\tt e}}$}| is the untyped erasure of @exact|{${\tt e}$}|.
|
where @exact|{$\untyped{{\tt e}}$}| is the untyped erasure of @exact|{${\tt e}$}|.
|
||||||
We also assume a subtyping relation @exact|{$\subt$}| on types.
|
We also assume a subtyping relation @exact|{$\subt$}| on types.
|
||||||
|
|
||||||
@itemlist[
|
@itemlist[
|
||||||
@item{@emph{
|
@item{@emph{
|
||||||
If @exact|{$\vdash {\tt e} : \tau$}| and @exact|{$\vdash {\tt e'} : \tau'$}|
|
If @exact|{$~\vdash {\tt e} : \tau$}| and @exact|{$~\vdash {\tt e'} : \tau'$}|
|
||||||
then @exact|{$\tau' \subt \tau$}| and both
|
@exact|{\\}|
|
||||||
|
then @exact|{$\tau' \subt \tau$}|
|
||||||
|
@exact|{\\}|
|
||||||
|
and both
|
||||||
@exact|{$\untyped{{\tt e}} \Downarrow {\tt v}$}| and
|
@exact|{$\untyped{{\tt e}} \Downarrow {\tt v}$}| and
|
||||||
@exact|{$\untyped{{\tt e'}} \Downarrow {\tt v}$}|.
|
@exact|{$\untyped{{\tt e'}} \Downarrow {\tt v}$}|.
|
||||||
}}
|
}}
|
||||||
@; e:t e':t' => t' <: t /\ e <=> e'
|
@; e:t e':t' => t' <: t /\ e <=> e'
|
||||||
|
|
||||||
@item{@emph{
|
@item{@emph{
|
||||||
If @exact|{$\not\vdash {\tt e} : \tau$}| but @exact|{$\vdash {\tt e'} : \tau'$}|
|
If @exact|{$~\not\vdash {\tt e} : \tau$}| but @exact|{$~\vdash {\tt e'} : \tau'$}|
|
||||||
|
@exact|{\\}|
|
||||||
then @exact|{$\untyped{{\tt e}} \Downarrow {\tt v}$}| and
|
then @exact|{$\untyped{{\tt e}} \Downarrow {\tt v}$}| and
|
||||||
@exact|{$\untyped{{\tt e'}} \Downarrow {\tt v}$}|.
|
@exact|{$\untyped{{\tt e'}} \Downarrow {\tt v}$}|.
|
||||||
}}
|
}}
|
||||||
@; -e:t e':t' => e <=> e'
|
@; -e:t e':t' => e <=> e'
|
||||||
|
|
||||||
@item{@emph{
|
@item{@emph{
|
||||||
If @exact|{$\vdash {\tt e} : \tau$}| but @exact|{${\tt e'} = \bot$}|
|
If @exact|{$~\vdash {\tt e} : \tau$}| but @exact|{${\tt e'} = \bot$}|
|
||||||
or @exact|{$\not\vdash {\tt e'} : \tau'$}|
|
or @exact|{$~\not\vdash {\tt e'} : \tau'$}|
|
||||||
then @exact|{$\untyped{{\tt e}} \Downarrow \mathsf{wrong}$}| or diverges.
|
@exact|{\\}|
|
||||||
|
then @exact|{$\untyped{{\tt e}} \Downarrow \mathsf{wrong}$}| or
|
||||||
|
@exact|{$\untyped{{\tt e}}$}| diverges.
|
||||||
}}
|
}}
|
||||||
@; e:t -e':t' => e^
|
@; e:t -e':t' => e^
|
||||||
]
|
]
|
||||||
|
|
||||||
If neither @exact|{${\tt e}$}| nor @exact|{${\tt e'}$}| type checks, then we have no guarantees
|
If neither @exact|{${\tt e}$}| nor @exact|{${\tt e'}$}| type checks, then we have no guarantees
|
||||||
about the run-time behavior of either term.
|
about the run-time behavior of either term.
|
||||||
The hope is that both diverge, but proving this fact in a realistic language is
|
In a perfect world both would diverge, but the fundamental limitations of
|
||||||
more trouble than it is worth.
|
static typing@~cite[fagan-dissertation-1992] and computability apply to our
|
||||||
|
humble system.
|
||||||
|
|
||||||
@; Erasure, moral, immoral
|
@; Erasure, moral, immoral
|
||||||
Finally, we say that a translation @exact|{${\tt (g~e) = e'}$}| is @emph{moral} if
|
Finally, we say that a translation @exact|{${\tt (g~e) = e'}$}| is @emph{moral} if
|
||||||
|
@ -80,6 +88,7 @@ Finally, we say that a translation @exact|{${\tt (g~e) = e'}$}| is @emph{moral}
|
||||||
Otherwise, the tranlation has altered more than just type annotations and is
|
Otherwise, the tranlation has altered more than just type annotations and is
|
||||||
@emph{immoral}.
|
@emph{immoral}.
|
||||||
All our examples in @Secref{sec:intro} can be implemented as moral translations.
|
All our examples in @Secref{sec:intro} can be implemented as moral translations.
|
||||||
|
@; @todo{really, even curry? well it's just picking from an indexed family}
|
||||||
Immoral translations are harder to show correct, but also much more useful.
|
Immoral translations are harder to show correct, but also much more useful.
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -35,3 +35,6 @@
|
||||||
\newcommand{\untyped}[1]{\lfloor #1 \rfloor}
|
\newcommand{\untyped}[1]{\lfloor #1 \rfloor}
|
||||||
\newcommand{\trans}{\llbracket\emph{transform}\rrbracket}
|
\newcommand{\trans}{\llbracket\emph{transform}\rrbracket}
|
||||||
\newcommand{\subt}{\mathbf{<:\,}}
|
\newcommand{\subt}{\mathbf{<:\,}}
|
||||||
|
\newcommand{\tos}{\mathsf{types_of_spec}}
|
||||||
|
\newcommand{\trt}[1]{\emph{#1}}
|
||||||
|
\newcommand{\tprintf}{\mathsf{t_printf}}
|
||||||
|
|
|
@ -1,138 +1,512 @@
|
||||||
#lang scribble/sigplan
|
#lang scribble/sigplan
|
||||||
@require["common.rkt"]
|
@(require "common.rkt" racket/string racket/sandbox)
|
||||||
|
|
||||||
|
@; TODO 'need to' etc
|
||||||
|
@; TODO clever examples
|
||||||
|
@; TODO definition pane for examples
|
||||||
|
@; +------------
|
||||||
|
@; | f \in \interp
|
||||||
|
@; |
|
||||||
|
@; | > (f x)
|
||||||
|
@; | y
|
||||||
|
@; +------------
|
||||||
|
@; TODO -- maybe a better title is "what can we learn from values"?
|
||||||
|
@; TODO remove all refs to implementation?
|
||||||
|
|
||||||
|
|
||||||
@title[#:tag "sec:usage"]{Real-World Metaprogramming}
|
@title[#:tag "sec:usage"]{Real-World Metaprogramming}
|
||||||
|
@;@(define base-eval
|
||||||
|
@; (make-evaluator 'racket/base
|
||||||
|
@; (sandbox-output 'string)
|
||||||
|
@; (sandbox-error-output 'string)
|
||||||
|
@; (error-display-handler (lambda (x y) (displayln "wepa")))))
|
||||||
|
|
||||||
We have defined useful letter-of-values transformations for a variety of
|
We have defined useful letter-of-values transformations for a variety of
|
||||||
common programming tasks ranging from type-safe string formatting to
|
common programming tasks ranging from type-safe string formatting to
|
||||||
constant-folding of arithmetic.
|
constant-folding of arithmetic.
|
||||||
These transformations are implemented in Typed Racket@~cite[TypedRacket],
|
These transformations are implemented in Typed Racket@~cite[TypedRacket],
|
||||||
which inherits Racket's powerful macro system@~cite[plt-tr1].
|
which inherits Racket's powerful macro system@~cite[plt-tr1].
|
||||||
|
For us, this means that Typed Racket comes equipped with powerful tools
|
||||||
|
for analyzing and transforming the syntax of programs before any type checking
|
||||||
|
occurs.
|
||||||
|
|
||||||
Our exposition does not assume any knowledge of Racket or Typed Racket, but
|
@;Our exposition does not assume any knowledge of Racket or Typed Racket, but
|
||||||
a key design choice of the Typed Racket language model bears mentioning:
|
@; a key design choice of the Typed Racket language model bears mentioning:
|
||||||
evaluation of a Typed Racket program happens across three distinct stages,
|
@; evaluation of a Typed Racket program happens across three distinct stages,
|
||||||
shown in @Figure-ref{fig:staging}.
|
@; shown in @Figure-ref{fig:staging}.
|
||||||
First the program is read and macro-expanded; as expanding a macro introduces
|
@;First the program is read and macro-expanded; as expanding a macro introduces
|
||||||
new code, the result is recursively read and expanded until no macros remain.
|
@; new code, the result is recursively read and expanded until no macros remain.
|
||||||
Next, the @emph{fully-expanded} Typed Racket program is type checked.
|
@;@; TODO stronger distinction between read/expand and typecheck
|
||||||
If checking succeeds, types are erased and the program is handed to the Racket
|
@;Next, the @emph{fully-expanded} Typed Racket program is type checked.
|
||||||
compiler.
|
@;If checking succeeds, types are erased and the program is handed to the Racket
|
||||||
For us, this means that we can implement @exact|{$\trans\ $}|
|
@; compiler.
|
||||||
functions as macros referencing @exact|{$\interp\ $}| functions and rely
|
@;
|
||||||
on the macro expander to invoke our transformations before the type checker runs.
|
@;For us, this means that we can implement @exact|{$\trans$}|
|
||||||
|
@; functions as macros and rely on the macro expander to invoke our
|
||||||
|
@; transformations before the type checker runs.
|
||||||
|
@;The transformations can use @exact|{$\interp$}| functions as
|
||||||
|
@; needed to complete their analysis.
|
||||||
|
@;
|
||||||
|
@;@figure["fig:staging"
|
||||||
|
@; @list{Typed Racket language model}
|
||||||
|
@; @exact|{\input{fig-staging}}|
|
||||||
|
@;]
|
||||||
|
|
||||||
@figure["fig:staging"
|
|
||||||
@list{Typed Racket language model}
|
|
||||||
@exact|{\input{fig-staging}}|
|
|
||||||
]
|
|
||||||
|
|
||||||
Though we describe each of the following transformations using in-line constant
|
@parag{Conventions}
|
||||||
values, our implementation applies @exact|{$\interp\ $}| functions to every
|
Interpretation and transformation functions are defined over syntactic expressions
|
||||||
definition and let-binding in the program and then associates compile-time
|
and values.
|
||||||
data with the bound identifier.
|
To make clear the difference between Typed Racket terms and syntactic representations
|
||||||
When a defined value flows into a function like @racket[printf] without being
|
of terms, we quote the latter and typeset it in green.
|
||||||
mutated along the way, we retrieve this cached information.
|
Using this notation, @racket[(λ (x) x)] is a term implementing the identity
|
||||||
The macro system features used to implement this behavior are described in
|
function and @racket['(λ (x) x)] is a syntactic object that will evaluate
|
||||||
@Secref{sec:implementation}.
|
to the identity function.
|
||||||
|
Values are typeset in green because their syntax and term representations are identical.
|
||||||
|
|
||||||
@; regexp-match
|
In practice, syntax objects carry lexical information.
|
||||||
@; format
|
Such information is extremely important, but to simplify our presentation we
|
||||||
@; math
|
pretend it does not exist.
|
||||||
@; vectors
|
Think of this convention as removing the oyster shell to get a clear view of the pearl.
|
||||||
@; arity
|
|
||||||
|
|
||||||
@section{Regexp}
|
Using infix @tt{:} for type annotations, for instance @racket[(x : Integer)].
|
||||||
Regular expression patterns are another common value whose structure is not
|
These are normally written as @racket[(ann x Integer)].
|
||||||
expressed by conventional type systems.
|
|
||||||
Consider the function @racket[regexp-match] from Typed Racket:
|
|
||||||
|
@; =============================================================================
|
||||||
|
@section{String Formatting}
|
||||||
|
|
||||||
|
@; TODO add note about the ridiculous survey figure? Something like 4/5 doctors
|
||||||
|
Format strings are the world's second most-loved domain-specific language (DSL).
|
||||||
|
@; @~cite[wmpk-algol-1968]
|
||||||
|
At first glance, a format string is just a string; in particular, any string
|
||||||
|
used as the first argument in a call to @racket[printf].
|
||||||
|
But between the quotation marks, format strings may contain @emph{format directives}
|
||||||
|
that tell @emph{where} and @emph{how} values can be spliced into the format string.
|
||||||
|
@; TODO scheme or common lisp?
|
||||||
|
Racket follows the Lisp tradition@~cite[s-lisp-1990] of using a tilde character (@tt{~})
|
||||||
|
to prefix format directives.
|
||||||
|
For example, @racket[~s] converts any value to a string and @racket[~b] converts a
|
||||||
|
number to binary form.
|
||||||
|
|
||||||
@interaction[
|
@interaction[
|
||||||
(regexp-match #rx"types" "types@ccs.neu.edu")
|
(printf "binary(~s) = ~b" 7 7)
|
||||||
|
|
||||||
(regexp-match #rx"lambda" "types@ccs.neu.edu")
|
|
||||||
|
|
||||||
(regexp-match #rx"(.*)@(.*)" "types@ccs.neu.edu")
|
|
||||||
]
|
]
|
||||||
@; TODO note that nested groups cannot fail.
|
|
||||||
|
|
||||||
When called with a pattern @racket[p] and string @racket[s] to match with,
|
If the format directives do not match the arguments to @racket[printf], most
|
||||||
@racket[(regexp-match p s)] returns a list of all substrings in @racket[s]
|
languages fail at run-time@~cite[a-icfp-1999].
|
||||||
matching groups in the pattern @racket[p].
|
This is a simple kind of value error that should be caught statically.
|
||||||
The pattern itself counts for one group and parentheses within @racket[p]
|
|
||||||
delimit nested groups.
|
|
||||||
Hence the third example returns a list of three elements.
|
|
||||||
A match can also fail, in which case the value @racket[#f] is returned.
|
|
||||||
|
|
||||||
Although Typed Racket's @racket[regexp-match] has the type@note{Assuming that a
|
@; TODO print errors nicer
|
||||||
successful match implies that all nested groups successfully match.}
|
@interaction[
|
||||||
@racket[(Regexp String -> (U #f (Listof String)))], the number of strings
|
(printf "binary(~s) = ~b" "7" "7")
|
||||||
in the result list is determined by the number of groups in the input regular
|
]
|
||||||
expression.
|
|
||||||
Like the arity of a function or the size of a tuple, the number of groups
|
Detecting inconsistencies between a format string and its arguments is easy
|
||||||
in a pattern is often clear to the programmer.
|
provided we have a function @racket[fmt->types] @exact|{$\in \interp$}| for
|
||||||
We could imagine using
|
reading types from a format string.
|
||||||
an indexed family of @racket[regexp-match] functions for patterns of
|
In Typed Racket this function is rather simple because the most common
|
||||||
two, three, or more groups.
|
directives accept @code{Any} type of value.
|
||||||
Ideally though, the programming language should understand
|
Such are the joys of uniform syntax---printing is free.
|
||||||
these unconventional or domain-specific flavors of polymorphism.
|
|
||||||
|
@racketblock[
|
||||||
|
> (fmt->types "binary(~s) = ~b")
|
||||||
|
'[Any Integer]
|
||||||
|
> (fmt->types '(λ (x) x))
|
||||||
|
#false
|
||||||
|
]
|
||||||
|
|
||||||
|
Now to use @racket[fmt->types] in a function @racket[t-printf] @exact|{$\in \trans$}|.
|
||||||
|
Given a call to @racket[printf], we validate the number of arguments and
|
||||||
|
add type annotations derived using @racket[fmt->types].
|
||||||
|
For all other syntax patterns, @racket[t-printf] is the identity transformation.
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (t-printf '(printf "~a"))
|
||||||
|
⊥
|
||||||
|
> (t-printf '(printf "~b" "2"))
|
||||||
|
'(printf "~b" ("2" : Integer))
|
||||||
|
> (t-printf printf)
|
||||||
|
'printf
|
||||||
|
]
|
||||||
|
|
||||||
|
The first example is rejected immediately as a syntax error.
|
||||||
|
The second is temporarily accepted, but will cause a static type error.
|
||||||
|
Put another way, the format string @racket{~b} specializes the type of
|
||||||
|
@racket[printf] from @racket[(String Any * -> Void)] to @racket[(String Integer -> Void)].
|
||||||
|
The third is slightly more interesting; it demonstrates that higher-order
|
||||||
|
uses of @racket[printf] default to the standard behavior.
|
||||||
|
|
||||||
|
|
||||||
@section{Database}
|
@; =============================================================================
|
||||||
|
@section{Regular Expressions}
|
||||||
|
Moving now from the second most-loved DSL to the first, regular expressions
|
||||||
|
are often used to capture sub-patterns within a string.
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (regexp-match #rx"(.*)@(.*)" "toni@merchant.net")
|
||||||
|
'("toni@merchant.net" "toni" "merchant.net")
|
||||||
|
]
|
||||||
|
|
||||||
|
The first argument to @racket[regexp-match] is a regular expression pattern.
|
||||||
|
Inside the pattern, the parentheses delimit sub-pattern @emph{groups}, the dots match
|
||||||
|
any single character, and the Kleene star matches zero-or-more repetitions
|
||||||
|
of the pattern-or-character preceding it.
|
||||||
|
The second argument is a string to match against the pattern.
|
||||||
|
If the match succeeds, the result is a list containing the entire matched string
|
||||||
|
and substrings corresponding to each group captured by a sub-pattern.
|
||||||
|
If the match fails, Racket's @racket[regexp-match] returns @racket[#false].
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (regexp-match #rx"-(2*)-" "111-222-3333")
|
||||||
|
'("-222-" "222")
|
||||||
|
> (regexp-match #rx"¥(.*)" "$2,000")
|
||||||
|
#false
|
||||||
|
]
|
||||||
|
|
||||||
|
Certain groups can also fail to capture even when the overall match succeeds.
|
||||||
|
This can happen, for example, when a group is followed by a Kleene star.
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (regexp-match #rx"(a)*(b)" "b")
|
||||||
|
'("b" #f "b")
|
||||||
|
]
|
||||||
|
|
||||||
|
Therefore, a simple catch-all type for @racket[regexp-match] is
|
||||||
|
@racket[(Regexp String -> (U #f (Listof (U #f String))))].
|
||||||
|
Using the general type, however, is cumbersome for simple patterns
|
||||||
|
where a match implies that all groups will successfully capture.
|
||||||
|
|
||||||
|
@;@(define tr-eval (make-base-eval '(require typed/racket/base racket/list)))
|
||||||
|
@racketblock[
|
||||||
|
> (define (get-domain [full-name : String]) : String
|
||||||
|
(cond
|
||||||
|
[(regexp-match #rx"(.*)@(.*)" full-name)
|
||||||
|
=> third]
|
||||||
|
[else "Match Failed"]))
|
||||||
|
]
|
||||||
|
@;Error: expected String, got (U #false String)
|
||||||
|
@; `third` could not be applied to arguments
|
||||||
|
@; Arguments: (Listof (U #false String))
|
||||||
|
@; Expected Result: String
|
||||||
|
|
||||||
|
Analysing the parentheses contained in a regular expression pattern can often
|
||||||
|
determine the number of groups statically.
|
||||||
|
We implement this as a function @racket[rx->groups] @exact|{$\in \interp$}|
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (rx->groups #rx"(a)(b)(c)")
|
||||||
|
3
|
||||||
|
> (rx->groups #rx"((a)b)")
|
||||||
|
2
|
||||||
|
> (rx->groups #rx"(a)*(b)")
|
||||||
|
#false
|
||||||
|
]
|
||||||
|
|
||||||
|
@; TODO can we not talk about casts?
|
||||||
|
The corresponding transformation
|
||||||
|
@racket[t-regexp] @exact|{$\in \trans$}|
|
||||||
|
inserts casts to refine the type of results
|
||||||
|
produced by @racket[regexp-match].
|
||||||
|
It also flags malformed groups in uncompiled regular expressions.
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (t-regexp '(regexp-match #rx"(a)b" str))
|
||||||
|
'(cast (regexp-match #rx"(a)b" str)
|
||||||
|
(U #f (List String String)))
|
||||||
|
> (t-regexp '(regexp-match "(" str))
|
||||||
|
⊥
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
@; =============================================================================
|
||||||
|
@section{Procedure Arity}
|
||||||
|
|
||||||
|
Anonymous functions are another value form whose representation contains
|
||||||
|
useful data.
|
||||||
|
By tokenizing symbolic λ-expressions, we can parse their domain
|
||||||
|
syntactically in a function @racket[fun->domain] @exact|{$\in \interp$}|
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (fun->arity '(λ (x y z) (x (z y) y)))
|
||||||
|
'[Any Any Any]
|
||||||
|
> (fun->arity '(λ ([x : Real] [y : Real]) x))
|
||||||
|
'[Real Real]
|
||||||
|
]
|
||||||
|
|
||||||
|
When domain information is available at call sites to a @racket[curry] function,
|
||||||
|
we can produce code that will pass typechecking.
|
||||||
|
One method is to swap @racket[curry] with an indexed version,
|
||||||
|
as demonstrated by @racket[t-swap] @exact|{$\in \trans$}|
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (t-swap '(curry (λ (x y) x)))
|
||||||
|
'(curry_1 (λ (x y) x))
|
||||||
|
]
|
||||||
|
|
||||||
|
Another option is to implement @racket[curry] with unsafe tools from
|
||||||
|
the language runtime and assign it the trivial type @racket[(⊥ -> ⊤)]
|
||||||
|
A transformation would replace this type with a subtype where possible
|
||||||
|
and let the type checker reject other calls.
|
||||||
|
Our implementation employs the first strategy, but generates a new @racket[curry_i]
|
||||||
|
at each call site by folding over the inferred function domain.
|
||||||
|
@; Mention this later, or show code?
|
||||||
|
|
||||||
|
|
||||||
|
@; =============================================================================
|
||||||
|
@section{Constant Folding}
|
||||||
|
|
||||||
|
The identity interpretation is useful for lifting constant values to
|
||||||
|
the compile-time environment.
|
||||||
|
@racket[id] @exact|{$\in \interp$}|
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (id 2)
|
||||||
|
2
|
||||||
|
> (id "~s")
|
||||||
|
"~s"
|
||||||
|
]
|
||||||
|
|
||||||
|
When composed with a filter, we can recognize classes of constants.
|
||||||
|
That is, if @racket[int?] is the identity function on integer values
|
||||||
|
and the constant function returning @racket[#false] otherwise, we have
|
||||||
|
(@racket[int?]@exact{$\,\circ\,$}@racket[id])@exact|{$\,\in \interp$}|
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (int? (id 2))
|
||||||
|
2
|
||||||
|
> (int? (id "~s"))
|
||||||
|
#false
|
||||||
|
]
|
||||||
|
|
||||||
|
Using this technique we can detect division by the constant zero and
|
||||||
|
implement constant-folding transformation of arithmetic operations.
|
||||||
|
@racket[t-arith] @exact{$\in \interp$}
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (t-arith '(/ 1 0))
|
||||||
|
⊥
|
||||||
|
> (t-arith '(* 2 3 7 z))
|
||||||
|
'(* 42 z)
|
||||||
|
]
|
||||||
|
|
||||||
|
These transformations are most effective when applied bottom-up, from the
|
||||||
|
leaves of a syntax tree to the root.
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (t-arith '(/ 1 (- 5 5)))
|
||||||
|
⊥
|
||||||
|
]
|
||||||
|
|
||||||
|
@Secref{sec:implementation} describes how we accomplish this recursion scheme
|
||||||
|
with the Racket macro system.
|
||||||
|
@; The implementation of @racket[t-arith] expects flat/local data.
|
||||||
|
|
||||||
|
|
||||||
|
@; =============================================================================
|
||||||
|
@section{Sized Data Structures}
|
||||||
|
|
||||||
|
Vector bounds errors are always frustrating to debug, especially since
|
||||||
|
their cause is rarely deep.
|
||||||
|
In many cases, vectors are declared with a constant size and
|
||||||
|
accessed at constant positions, plus or minus an offset.@note{Loops do not
|
||||||
|
fit this pattern, but many languages already provide for/each constructs
|
||||||
|
to streamline common looping patterns.}
|
||||||
|
But unless the compiler or runtime system tracks vector bounds, the programmer
|
||||||
|
must unfold definitions and manually find the source of the problem.
|
||||||
|
|
||||||
|
A second, more compelling reason to track vector bounds is to remove the
|
||||||
|
run-time checks inserted by memory-safe languages.
|
||||||
|
If we can statically prove that a reference is in-bounds, we should be able
|
||||||
|
to optimize it.
|
||||||
|
|
||||||
|
Racket provides a few ways to create vectors.
|
||||||
|
Length information is explicit in some and derivable for
|
||||||
|
others, provided we interpret sub-expressions recursively.
|
||||||
|
@racket[vector->length] @exact{$\in \interp$}
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (vector->length '#(0 1 2 3))
|
||||||
|
3
|
||||||
|
> (vector->length '(make-vector 100 #true))
|
||||||
|
100
|
||||||
|
> (vector->length '(vector-append #(left) #(right)))
|
||||||
|
7
|
||||||
|
> (vector->length '(vector-drop #(A B C) 2))
|
||||||
|
1
|
||||||
|
]
|
||||||
|
|
||||||
|
Once we have sizes associated with compile-time vectors,
|
||||||
|
we can define optimized translations of built-in functions.
|
||||||
|
For the following, assume that @racket[unsafe-ref] is an unsafe version of @racket[vector-ref].
|
||||||
|
@racket[t-vec] @exact{$\in \trans$}
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (t-vec '(vector-ref #(Y E S) 2))
|
||||||
|
'(unsafe-ref #(Y E S) 2)
|
||||||
|
> (t-vec '(vector-ref #(N O) 2))
|
||||||
|
⊥
|
||||||
|
> (t-vec '(vector-length (make-vector 100 #true)))
|
||||||
|
100
|
||||||
|
]
|
||||||
|
|
||||||
|
We can define vectorized arithmetic using similar translations.
|
||||||
|
Each operation match the length of its arguments at compile-time and expand
|
||||||
|
to an efficient implementation.
|
||||||
|
|
||||||
|
|
||||||
|
@; =============================================================================
|
||||||
|
@section{Database Queries}
|
||||||
@; db
|
@; db
|
||||||
@; TODO Ocaml, Haskell story
|
@; TODO Ocaml, Haskell story
|
||||||
|
|
||||||
Racket's @racket[db] library provides a direct connection to the database,
|
Racket's @racket[db] library provides a string-based API for executing SQL
|
||||||
modulo some important safety checks.
|
queries.
|
||||||
|
Connecting to a database requires only a username.
|
||||||
Typed Racket programmers may use the @racket[db] library by giving type signatures
|
Queries are strings, but may optionally reference query parameters---natural numbers
|
||||||
specialized to their needs.
|
prefixed by a dollar sign (@tt{$}).
|
||||||
For example, the statement below asserts that @racket[query-row] always returns
|
Arguments substituted for query parameters are guarded against SQL injection.
|
||||||
a natural number and a string.
|
|
||||||
In contrast, @racket[query-maybe-row] can optionally fail, but otherwise returns
|
|
||||||
a natural number and a string.
|
|
||||||
|
|
||||||
@racketblock[
|
@racketblock[
|
||||||
(require/typed db
|
> (define C (sql-connect #:user "shylock"
|
||||||
(#:opaque Connection connection?)
|
#:database "accounts"))
|
||||||
(sql-connect (->* () (#:user String #:database String) Connection))
|
> (query-row C
|
||||||
(query-row (-> Connection String Any * (Vector Natural String)))
|
"SELECT amt_owed FROM loans WHERE name = '$1'"
|
||||||
(query-maybe-row (-> Connection String Any * (Option (Vector Natural String))))
|
"antonio")
|
||||||
....)
|
3000
|
||||||
]
|
]
|
||||||
|
|
||||||
To manage additional tables, the programmer can re-import @racket[db] identifiers
|
This is a far cry from language-integrated query@~cite[mbb-sigmod-2006], but the interface
|
||||||
with a fresh name and type signature.
|
is relatively safe and very simple to use.
|
||||||
|
|
||||||
@todo{coherent example}
|
Typed Racket programmers may use the @racket[db] library by assigning specific
|
||||||
|
type signatures to functions like @racket[query-row].
|
||||||
|
This is somewhat tedious, as the distinct operations for querying one value,
|
||||||
|
one row, or a lazy sequence of rows each need a type.
|
||||||
|
A proper type signature might express the database library as a functor over the
|
||||||
|
database schema, but Typed Racket does not have functors or even existential types.
|
||||||
|
Even if it did, the queries-as-strings interface makes it impossible for a standard
|
||||||
|
type checker to infer type constraints on query parameters.
|
||||||
|
|
||||||
|
The situation worsens if the programmer uses multiple database connections.
|
||||||
|
One can either alias one query function to multiple identifiers (each with a specific type)
|
||||||
|
or weaken type signatures and manually type-cast query results.
|
||||||
|
|
||||||
|
Our technique provides a solution.
|
||||||
|
We associate database connections with a compile-time value representing the
|
||||||
|
database schema.
|
||||||
|
Operations like @racket[query-row] then extract the schema from a connection
|
||||||
|
and interpret type constraints from the query string.
|
||||||
|
These constraints are passed to the type system in two forms: as annotations
|
||||||
|
on expressions used as query parameters and as casts on the result of a query.
|
||||||
|
Note that the casts are not guaranteed to succeed---in particular, the schema
|
||||||
|
may differ from the actual types in the database.
|
||||||
|
|
||||||
|
In total, the solution requires three interpretation functions and at least two
|
||||||
|
transformations.
|
||||||
|
First, we implement a predicate to recognize schema values.
|
||||||
|
A schema is a list of table schemas; a table schema is a tagged list of
|
||||||
|
column names and column types.
|
||||||
|
|
||||||
|
(@racket[schema?]@exact{$\,\circ\,$}@racket[id]) @exact{$\in \interp$}
|
||||||
@racketblock[
|
@racketblock[
|
||||||
(require/typed (prefix-in w: db)
|
> (schema? '([loans [(id . Natural)
|
||||||
(w:query-row (-> Connection String Any * (Vector Natural String))))
|
(name . String)
|
||||||
|
(amt_owed . Integer)]]))
|
||||||
(require/typed (prefix-in s: db)
|
'([loans ...])
|
||||||
(s:query-row (-> Connection String Any * (Vector Natural Natural))))
|
|
||||||
]
|
]
|
||||||
|
|
||||||
This approach works, but is tedious and error-prone.
|
Composed with the identity interpretation, this predicate lifts schema values
|
||||||
Our contribution is to associate a type signature with database connections.
|
from the program text to the compile-time environment.
|
||||||
Calls to generic operations like @racket[query-row] are then specialized to
|
|
||||||
the exact result type.
|
|
||||||
|
|
||||||
@; PERKS
|
@; TODO this must be clearer
|
||||||
@; - Single point-of-control
|
@; TODO note that this isn't a drop-in replacement
|
||||||
@; - lightweight
|
Next, we need to associate database connections with database schemas.
|
||||||
@; - no changes to underlying library / type system
|
We do this by transforming calls to @racket[sql-connect]; connections made
|
||||||
@; - compile-time validation of query strings
|
without an explicit schema argument are rejected.
|
||||||
|
|
||||||
Incidentally, @racket[regexp-match:] is useful for parsing queries.
|
(@racket[sc]) @exact{$\in \trans$}
|
||||||
We also implemented @racket[query-row] to return a sized vector.
|
@racketblock[
|
||||||
|
> (sc '(sql-connect #:user "shylock"
|
||||||
|
#:database "accounts"))
|
||||||
|
⊥
|
||||||
|
> (sc '(sql-connect ([loans ...])
|
||||||
|
#:user "shylock"
|
||||||
|
#:database "accounts"))
|
||||||
|
'(sql-connect #:user "shylock"
|
||||||
|
#:database "accounts")
|
||||||
|
]
|
||||||
|
|
||||||
|
To be consistent with our other examples, we should now describe an
|
||||||
|
interpretation from connection objects to schemas.
|
||||||
|
That is, we should be able to extract the schema for the @racket{accounts} database
|
||||||
|
from the syntactic representation of @racket{shylock}'s connection object.
|
||||||
|
Our implementation, however, defines @racket[connection->schema] @exact{$\in \interp$} as
|
||||||
|
@racket[(λ (x) #false)].
|
||||||
|
Instead, we assume that all connection objects are named and rely on a general
|
||||||
|
technique for associating compile-time data with identifiers.
|
||||||
|
See @Secref{sec:def-overview} for an overview and @Secref{sec:def-implementation}
|
||||||
|
for details.
|
||||||
|
|
||||||
|
Our final interpretation function tokenizes query strings and returns
|
||||||
|
column and table information.
|
||||||
|
In particular, we parse @tt{SELECT} statements and extract
|
||||||
|
@itemlist[
|
||||||
|
@item{the names of selected columns,}
|
||||||
|
@item{the table name, and}
|
||||||
|
@item{an association from query parameters to column names.}
|
||||||
|
]
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> "SELECT amt_owed FROM loans WHERE name = '$1'"
|
||||||
|
'[(amt_owed) loans ($1 . name)]
|
||||||
|
> "SELECT * FROM loans"
|
||||||
|
'[* loans ()]
|
||||||
|
]
|
||||||
|
|
||||||
|
The schema, connection, and query constraints now come together in transformations
|
||||||
|
such as @racket[t] @exact{$\in \trans$}.
|
||||||
|
There is still a non-trivial amount of work to be done resolving wildcards and
|
||||||
|
validating table and row names before the type-annotated result is produced,
|
||||||
|
but all the necessary information is available.
|
||||||
|
|
||||||
|
@racketblock[
|
||||||
|
> (t '(query-row C
|
||||||
|
"SELECT amt_owed FROM loans WHERE name = '$1'"
|
||||||
|
"antonio"))
|
||||||
|
'(cast (query-row C
|
||||||
|
"SELECT amt_owed FROM loans WHERE name = '$1'"
|
||||||
|
("antonio" : String))
|
||||||
|
(Vector Integer))
|
||||||
|
> (t '(query-row C
|
||||||
|
"SELECT * FROM loans WHERE name = '$1'"
|
||||||
|
"antonio"))
|
||||||
|
'(cast (query-row C
|
||||||
|
"SELECT amt_owed FROM loans WHERE name = '$1'"
|
||||||
|
("antonio" : String))
|
||||||
|
(Vector Natural String Integer))
|
||||||
|
> (t '(query-row C "SELECT * FROM itunes"))
|
||||||
|
⊥
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
While light years away from LINQ, this technique provides at least basic
|
@; =============================================================================
|
||||||
safety guarantees for an existing library and demonstrates the generality of
|
@section[#:tag "sec:def-overview"]{Definitions and Let-bindings}
|
||||||
our technique.
|
|
||||||
|
|
||||||
|
Though we described each of the above transformations using in-line constant
|
||||||
|
values, our implementation cooperates with definitions and let bindings.
|
||||||
|
By means of an example, the difference @racket[(- n m)] in the following program
|
||||||
|
is compiled to the value 0.
|
||||||
|
|
||||||
@section{}
|
@racketblock[
|
||||||
@; identifier macros
|
> (define m (* 6 7))
|
||||||
@; rename transformers
|
> (let ([n m])
|
||||||
|
(- m n))
|
||||||
|
]
|
||||||
|
|
||||||
|
Conceptually, we accomplish this by applying known functions
|
||||||
|
@exact{${\tt f} \in \interp$} at definition sites and keeping a compile-time
|
||||||
|
mapping from identifiers to interpreted data.
|
||||||
|
Thanks to Racket's meta-programming tools, implementing this table in a way
|
||||||
|
that cooperates with aliases and lexical scope is simple.
|
||||||
|
|
||||||
|
@; TODO set!, for defines and lets
|
||||||
|
|
Loading…
Reference in New Issue
Block a user