trivial/icfp-2016/intro.scrbl
2016-06-27 15:59:59 -04:00

132 lines
6.1 KiB
Racket

#lang scribble/sigplan @onecolumn
@require["common.rkt"]
@title[#:tag "sec:intro"]{The Spirit and Letter of the Law}
Well-typed programs @emph{do} go wrong.
All the time, in fact:
@exact|{
\begin{SCodeFlow}\begin{RktBlk}\begin{SingleColumn}\RktSym{{\Stttextmore}}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktMeta{}\RktPn{(}\RktSym{vector{-}ref}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktMeta{}\RktPn{(}\RktSym{make{-}vector}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktMeta{}\RktVal{2}\RktPn{)}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktMeta{}\RktVal{3}\RktPn{)}\RktMeta{}
\RktMeta{}\RktSym{=={\Stttextmore}}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktErr{vector-ref: index is out of range}
\RktMeta{}\RktSym{{\Stttextmore}}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktMeta{}\RktPn{(}\RktSym{/}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktMeta{}\RktVal{1}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktMeta{}\RktVal{0}\RktPn{)}\RktMeta{}
\RktMeta{}\RktSym{=={\Stttextmore}}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktErr{/: division by zero}
\RktMeta{}\RktSym{{\Stttextmore}}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktMeta{}\RktPn{(}\RktSym{printf}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktMeta{}\RktVal{"$\sim$s"}\RktPn{)}\RktMeta{}
\RktMeta{}\RktSym{=={\Stttextmore}}\RktMeta{}\mbox{\hphantom{\Scribtexttt{x}}}\RktErr{printf: format string requires 1 argument}
\end{SingleColumn}\end{RktBlk}\end{SCodeFlow}
}|
Of course, Milner's catchphrase was about preventing type errors.
The above are all @emph{value errors} that depend on properties not expressed
by Typed Racket's standard vector, integer, and string datatypes.
Even so, it is clear to the programmer that the expressions will go wrong.
Likewise, there are useful functions that many type systems cannot express.
Simple examples include a @racket[first] function for tuples of arbitrary size
and a @racket[curry] function for procedures that consume such tuples.
The standard work-around@~cite[fi-jfp-2000] is to maintain size-indexed
families of functions to handle the common cases, for instance:
@codeblock{
> (define (curry_3 f)
(λ (x) (λ (y) (λ (z) (f x y z)))))
}
@; Prelude> let first_3 (x, y, z) = x
These problems are well known, and are often used to motivate research on
dependently typed programming languages@~cite[a-icfp-1999].
Short of abandoning ship for a completely new type system, languages including
Haskell, Java, OCaml, and Typed Racket have seen proposals for detecting
certain values errors or expressing the polymorphism in functions
such as @racket[curry] with few (if any) changes to the existing type system@~cite[ddems-icse-2011 ew-haskell-2012 ks-plpv-2006 kkt-pldi-2016 lb-sigplan-2014].
What stands between these proposals and their adoption is the complexity or
annotation burden they impose on language users.
This pearl describes a low-complexity, annotation-free (@Secref{sec:experience})
technique for detecting
value errors and expressing polymorphism over values.
The key is to run a @emph{textualist}@note{A textualist interprets laws by
reading exactly the words on the page rather than by guessing the words' intended meaning.}
elaboration over programs before type-checking and propagate value information
evident from the program syntax to the type checker.
In terms of the first example in this section, our elaborator infers that the
@racket[vector-ref] at position @racket[3] is out of bounds because it knows that
@racket[(make-vector n)]
constructs a vector with @racket[n] elements.
We have implemented the technique as a Typed Racket library that shadows
functions such as @racket[make-vector] with textualist elaborators following
the guidelines stated in @Secref{sec:solution}.
The implementation makes essential use of Racket's macro system@~cite[fcdb-jfp-2012]
to reason locally, associate inferred data with bound identifiers, and
cooperate with the rules of lexical scope.
For the adventurous reader, @Secref{sec:implementation} describes the
main services provided by Racket's macros.
Nevertheless, Typed Clojure@~cite[clojure-macros],
Rust@~cite[rust-compiler-plugins], and Scala@~cite[ramho-hosc-2013]
could implement our approach just as well using those languages' powerful
macro and compiler plugin systems.
For a sense of the practical use-cases we envision, consider the function
@racket[regexp-match], which matches a regular expression pattern against
a string and returns either a list of matched substrings
or @racket[#false] if the match failed.
@racketblock[
> (regexp-match #rx"(.*) v\\. (.*),"
"Morrison v. Olson, 487 U.S. 654")
'("Morrison v. Olson" "Morrison" "Olson")
]
The parentheses in the regular expression delimit groups to match and
return.
In this example, there are two such groups.
We have written an elaborator for @racket[regexp-match] that will statically
parse its first argument, count groups, and refine the
result type of specific calls to @racket[regexp-match].
The elaborator also handles the common case where the regular expression
argument is a compile-time constant and respects @exact{$\alpha$}-equivalence.
Whereas Typed Racket will raise a type error on the following code because
it cannot be sure @racket[second] will produce a string,
importing our library convinces Typed Racket that the code will succeed.
@codeblock{
(define case-regexp #rx"(.*) v\\. (.*),")
(define rx-match regexp-match)
(define (get-plaintiff (s : String)) : String
(cond
[(rx-match case-regexp s)
=> second]
[else "J. Doe"]))
}
@Secref{sec:usage} has more examples.
@Secref{sec:conclusion} concludes.
@; =============================================================================
@parag{Lineage}
Our technique builds on the approach of Herman and Meunier, who first
demonstrated how Racket macros can propagate
data embedded in string values and syntax patterns to a
static analyzer@~cite[hm-icfp-2004].
Their illustrative examples were format strings, regular expressions,
and database queries.
Relative to their pioneering work, we adapt Herman and Meunier's
transformations to a typed language, suggest new applications, and
describe how to compose the results of analyses.