[icfp] checkpoint, mid-regexp

This commit is contained in:
ben 2016-03-15 14:34:20 -04:00
parent 4fcd414971
commit 0d13cda84b
4 changed files with 109 additions and 75 deletions

View File

@ -44,7 +44,7 @@ The standard work-around@~cite[fi-jfp-2000] is to maintain size-indexed
}
@; Prelude> let first_3 (x, y, z) = x
These problems are well known, and are often used to motive research on
These problems are well known, and are often used to motivate research on
dependently typed programming languages@~cite[a-icfp-1999].
Short of abandoning ship for a completely new type system, languages including
Haskell, OCaml, Java, and Typed Racket have seen proposals for detecting
@ -94,7 +94,7 @@ In this example, there are two groups.
We have written an elaborator for @racket[regexp-match] that will statically
parse its first argument, count these groups, and refine the
result type for specific calls to @racket[regexp-match].
The elaborator @emph{also} handles the common case where the regular expression
The elaborator also handles the common case where the regular expression
argument is a compile-time constant and respects @exact{$\alpha$}-equivalence.
@codeblock{
@ -107,7 +107,7 @@ The elaborator @emph{also} handles the common case where the regular expression
(define (get-plaintiff (s : String)) : String
(cond
[(r-m rx-case s)
=> cadr]
=> second]
[else "J. Doe"]))
}

View File

@ -117,3 +117,8 @@ SoundX -- cannot reporduce this because
[anti-TH](http://stackoverflow.com/questions/10857030/whats-so-bad-about-template-haskell)
---
Not great news for me / metaprogramming
[LMS](https://scala-lms.github.io//publications.html)
---
The good scala macro system.

View File

@ -27,8 +27,8 @@ If @exact|{${\tt p?} \in \interp$}| and @exact|{${\tt e} \in \emph{expr}$}|,
it may be useful to think of
@exact|{${\tt (p?~e)}$}| as @emph{evidence} that the expression @exact|{${\tt e}$}|
is recognized by @exact|{${\tt p?}$}|.
Alternatively, @exact|{${\tt (p?~e)}$}| is a kind of interpolant@~cite[c-jsl-1997]
representing details about @exact|{${\tt e}$}| relevant for program elaboration.
Alternatively, @exact|{${\tt (p?~e)}$}| is a kind of interpolant@~cite[c-jsl-1997],
representing key data embedded in @exact|{${\tt e}$}|.
Correct interpretation functions @exact|{${\tt p?}$}| obey three guidelines:
@itemize[
@ -100,3 +100,4 @@ If neither @exact|{${\tt e}$}| nor @exact|{${\tt e'}$}| type checks, then we hav
In a perfect world both would diverge, but the fundamental limitations of
static typing@~cite[fagan-dissertation-1992] and computability
keep us imperfect.
TODO TODO TODO Extra space hierExtra space hierExtra space hierExtra space

View File

@ -21,7 +21,7 @@ These elaborators are implemented in Typed Racket@~cite[TypedRacket], a macro-ex
typed language that compiles into Racket@~cite[plt-tr1].
An important component of Typed Racket's design is that all macros in a program
are fully expanded before type-checking begins.
This convention lets us implement our elaborators as macros that expand into
This protocol lets us implement our elaborators as macros that expand into
typed code.
@parag{Conventions}
@ -43,8 +43,14 @@ Such information is extremely important, especially for implementing @racket[def
} @item{
We use an infix @tt{:} to write explicit type annotations and casts,
for instance @racket[(x : Integer)].
These are normally @racket[(ann x Integer)] and @racket[(cast x Integer)],
respectively.
These normally have two different syntaxes, respectively
@racket[(ann x Integer)] and @racket[(cast x Integer)].
} @item{
TODO phantom item for space
TODO phantom item for space
TODO phantom item for space
TODO phantom item for space
TODO phantom item for space
} @item{
In @Secref{sec:sql}, @tt{sql} is short for @tt{postgresql}, i.e.
the code we present in that section is only implemented for the @tt{postgresql}
@ -58,9 +64,10 @@ In fact, this practice aligns with our implementation---once the interpretations
]
@; =============================================================================
@section{String Formatting}
@section{Format Strings}
@; TODO add note about the ridiculous survey figure? Something like 4/5 doctors
@; TODO note regexp is the first?
Format strings are the world's second most-loved domain-specific language (DSL).
@; @~cite[wmpk-algol-1968]
All strings are valid format strings; additionally, a format string may contain
@ -72,59 +79,77 @@ Racket follows the Lisp tradition@~cite[s-lisp-1990] of using a tilde character
For example, @racket[~s] converts any value to a string and @racket[~b] converts a
number to binary form.
@interaction[
(printf "binary(~s) = ~b" 7 7)
]
@exact|{
\begin{SCodeFlow}\begin{RktBlk}\begin{SingleColumn}\Scribtexttt{{\Stttextmore} }\RktPn{(}\RktSym{printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"binary($\sim$s) = $\sim$b"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{7}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{7}\RktPn{)}
\RktOut{binary(7) = 111}
\begin{SingleColumn}\end{SingleColumn}\end{SingleColumn}\end{RktBlk}\end{SCodeFlow}
}|
If the format directives do not match the arguments to @racket[printf], most
languages fail at run-time@~cite[a-icfp-1999].
This is a simple kind of value error that should be caught statically.
languages fail at run-time.
This is a simple kind of value error that could be caught statically.
@; TODO print errors nicer
@interaction[
(printf "binary(~s) = ~b" "7" "7")
]
@exact|{
\begin{SCodeFlow}\begin{RktBlk}\begin{SingleColumn}\Scribtexttt{{\Stttextmore} }\RktPn{(}\RktSym{printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"binary($\sim$s) = $\sim$b"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"7"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"7"}\RktPn{)}
Detecting inconsistencies between a format string and its arguments is easy
provided we have a function @racket[fmt->types] @exact|{$\in \interp$}| for
reading types from a format string.
\RktErr{printf: format string requires argument of type $<$exact{-}number$>$}
\begin{SingleColumn}\end{SingleColumn}\end{SingleColumn}\end{RktBlk}\end{SCodeFlow}
}|
Detecting inconsistencies between a format string and its arguments is straightforward
if we define an interpretation @racket[fmt->types] @exact|{$\in \interp$}| for
reading types from a format string value.
In Typed Racket this function is rather simple because the most common
directives accept @code{Any} type of value.
Such are the joys of uniform syntax---printing is free.
directives accept @code{Any} type of value---in a language with uniform syntax,
printing comes for free.
@racketblock[
> (fmt->types "binary(~s) = ~b")
'[Any Integer]
> (fmt->types '(λ (x) x))
#false
]
@exact|{
\hfill\fbox{\RktMeta{fmt->types} $\in \interp$}
Now to use @racket[fmt->types] in a function @racket[t-printf] @exact|{$\in \trans$}|.
Given a call to @racket[printf], we validate the number of arguments and
add type annotations derived using @racket[fmt->types].
\begin{SCodeFlow}\begin{RktBlk}\begin{SingleColumn}\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{fmt{-}{\Stttextmore}types}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"binary($\sim$s) = $\sim$b"}\RktPn{)}
\RktVal{{\textquotesingle}}\RktVal{[}\RktVal{Any}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{Integer}\RktVal{]}
\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{fmt{-}{\Stttextmore}types}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{{\textquotesingle}}\RktVal{(}\RktVal{$\lambda$}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{(}\RktVal{x}\RktVal{)}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{x}\RktVal{)}\RktPn{)}
\RktVal{\#false}\end{SingleColumn}\end{RktBlk}\end{SCodeFlow}
}|
Now to use @racket[fmt->types] in an elaboration.
Given a call to @racket[printf], we check the number of arguments and
add type annotations using the inferred types.
For all other syntax patterns, @racket[t-printf] is the identity transformation.
@racketblock[
> (t-printf '(printf "~a"))
> (t-printf '(printf "~b" "2"))
'(printf "~b" ("2" : Integer))
> (t-printf printf)
'printf
]
@exact|{
\hfill\fbox{$\elabf \in \interp$}
\begin{SCodeFlow}\begin{RktBlk}\begin{SingleColumn}\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{t{-}printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{{\textquotesingle}}\RktVal{(}\RktVal{printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"$\sim$a"}\RktVal{)}\RktPn{)}
\RktSym{$\perp$}
\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{t{-}printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{{\textquotesingle}}\RktVal{(}\RktVal{printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"$\sim$b"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"2"}\RktVal{)}\RktPn{)}
\RktVal{{\textquotesingle}}\RktVal{(}\RktVal{printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"$\sim$b"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{(}\RktVal{"2"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{{\hbox{\texttt{:}}}}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{Integer}\RktVal{)}\RktVal{)}
\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{t{-}printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{printf}\RktPn{)}
\RktVal{{\textquotesingle}}\RktVal{printf}\end{SingleColumn}\end{RktBlk}\end{SCodeFlow}
}|
The first example is rejected immediately as a syntax error.
The second is temporarily accepted, but will cause a static type error.
The second is a valid translation, but will lead to a static type error.
Put another way, the format string @racket{~b} specializes the type of
@racket[printf] from @racket[(String Any * -> Void)] to @racket[(String Integer -> Void)].
The third is slightly more interesting; it demonstrates that higher-order
uses of @racket[printf] default to the standard behavior.
The third example demonstrates that higher-order
uses of @racket[printf] default to the standard, unspecialized behavior.
@; =============================================================================
@section{Regular Expressions}
Moving now from the second most-loved DSL to the first, regular expressions
are often used to capture sub-patterns within a string.
Regular expressions are often used to capture sub-patterns within a string.
@racketblock[
> (regexp-match #rx"(.*)@(.*)" "toni@merchant.net")
@ -138,7 +163,7 @@ Inside the pattern, the parentheses delimit sub-pattern @emph{groups}, the dots
The second argument is a string to match against the pattern.
If the match succeeds, the result is a list containing the entire matched string
and substrings corresponding to each group captured by a sub-pattern.
If the match fails, Racket's @racket[regexp-match] returns @racket[#false].
If the match fails, @racket[regexp-match] returns @racket[#false].
@racketblock[
> (regexp-match #rx"-(2*)-" "111-222-3333")
@ -148,34 +173,39 @@ If the match fails, Racket's @racket[regexp-match] returns @racket[#false].
]
Certain groups can also fail to capture even when the overall match succeeds.
This can happen, for example, when a group is followed by a Kleene star.
This can happen when a group is followed by a Kleene star.
@racketblock[
> (regexp-match #rx"(a)*(b)" "b")
'("b" #f "b")
]
Therefore, a simple catch-all type for @racket[regexp-match] is
Therefore, a catch-all type for @racket[regexp-match] is fairly large:
@racket[(Regexp String -> (U #f (Listof (U #f String))))].
Using the general type, however, is cumbersome for simple patterns
Using this general type is cumbersome for simple patterns
where a match implies that all groups will successfully capture.
@;@(define tr-eval (make-base-eval '(require typed/racket/base racket/list)))
@racketblock[
> (define (get-domain [full-name : String]) : String
(cond
[(regexp-match #rx"(.*)@(.*)" full-name)
=> third]
[else "Match Failed"]))
]
@;Error: expected String, got (U #false String)
@; `third` could not be applied to arguments
@; Arguments: (Listof (U #false String))
@; Expected Result: String
@exact|{
\begin{SCodeFlow}\begin{RktBlk}\begin{SingleColumn}\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{define}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{get{-}domain}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{[}\RktSym{full{-}name}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{{\hbox{\texttt{:}}}}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{String}\RktPn{]}\RktPn{)}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{{\hbox{\texttt{:}}}}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{String}
Analysing the parentheses contained in a regular expression pattern can often
determine the number of groups statically.
We implement this as a function @racket[rx->groups] @exact|{$\in \interp$}|
\mbox{\hphantom{\Scribtexttt{xxxx}}}\RktPn{(}\RktSym{cond}
\mbox{\hphantom{\Scribtexttt{xxxxx}}}\RktPn{[}\RktPn{(}\RktSym{regexp{-}match}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{\#rx"({\hbox{\texttt{.}}}*)@({\hbox{\texttt{.}}}*)"}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{full{-}name}\RktPn{)}
\mbox{\hphantom{\Scribtexttt{xxxxxx}}}\RktSym{={\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{third}\RktPn{]}
\mbox{\hphantom{\Scribtexttt{xxxxx}}}\RktPn{[}\RktSym{else}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"Match Failed"}\RktPn{]}\RktPn{)}\RktPn{)}
\RktErr{Error: expected $<$String$>$, got $<$(U \#false String)$>$}
\end{SingleColumn}\end{RktBlk}\end{SCodeFlow}
}|
@; TODO
We implement a parentheses-counting interpretation that parses regular expressions
and returns the number of groups.
@todo{fbox} @racket[rx->groups] @exact|{$\in \interp$}|
@racketblock[
> (rx->groups #rx"(a)(b)(c)")
@ -186,12 +216,12 @@ We implement this as a function @racket[rx->groups] @exact|{$\in \interp$}|
#false
]
@; TODO can we not talk about casts?
The corresponding transformation
@racket[t-regexp] @exact|{$\in \trans$}|
inserts casts to refine the type of results
produced by @racket[regexp-match].
It also flags malformed groups in uncompiled regular expressions.
inserts casts to subtype the result of calls to @racket[regexp-match].
It also raises syntax errors when an uncompiled regular expression contains
unmatched parentheses.
@todo{fbox}
@racketblock[
> (t-regexp '(regexp-match #rx"(a)b" str))
@ -203,12 +233,10 @@ It also flags malformed groups in uncompiled regular expressions.
@; =============================================================================
@section{Procedure Arity}
@section{Anonymous Functions}
Anonymous functions are another value form whose representation contains
useful data.
By tokenizing symbolic λ-expressions, we can parse their domain
syntactically in a function @racket[fun->domain] @exact|{$\in \interp$}|
By tokenizing symbolic λ-expressions, we can interpret their domain
statically. @todo{fbox} @racket[fun->domain] @exact|{$\in \interp$}|
@racketblock[
> (fun->arity '(λ (x y z) (x (z y) y)))
@ -239,7 +267,7 @@ TODO same goes for zipWith in a language without polydots
@; =============================================================================
@section{Constant Folding}
@section{Numeric Constants}
The identity interpretation is useful for lifting constant values to
the compile-time environment.