[icfp] checkpoint, mid-regexp

ben 2016-03-15 14:34:20 -04:00
parent 4fcd414971
commit 0d13cda84b
4 changed files with 109 additions and 75 deletions

@@ -44,7 +44,7 @@ The standard work-around@~cite[fi-jfp-2000] is to maintain size-indexed
 }
 @; Prelude> let first_3 (x, y, z) = x
-These problems are well known, and are often used to motive research on
+These problems are well known, and are often used to motivate research on
 dependently typed programming languages@~cite[a-icfp-1999].
 Short of abandoning ship for a completely new type system, languages including
 Haskell, OCaml, Java, and Typed Racket have seen proposals for detecting
@@ -94,7 +94,7 @@ In this example, there are two groups.
 We have written an elaborator for @racket[regexp-match] that will statically
 parse its first argument, count these groups, and refine the
 result type for specific calls to @racket[regexp-match].
-The elaborator @emph{also} handles the common case where the regular expression
+The elaborator also handles the common case where the regular expression
 argument is a compile-time constant and respects @exact{$\alpha$}-equivalence.
 @codeblock{
@@ -107,7 +107,7 @@ The elaborator @emph{also} handles the common case where the regular expression
 (define (get-plaintiff (s : String)) : String
 (cond
 [(r-m rx-case s)
-=> cadr]
+=> second]
 [else "J. Doe"]))
 }
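The hunk above swaps `cadr` for `second` in `get-plaintiff`, whose pattern `rx-case` and helper `r-m` are defined outside the hunk. The sketch below shows the untyped run-time behavior the elaborator refines. It assumes `r-m` abbreviates `regexp-match`, and its one-group `rx-case` is a hypothetical stand-in, since the real pattern is not shown.

```racket
#lang racket/base
(require racket/list) ; provides `second`

;; Hypothetical stand-in for the elided `rx-case`: one group capturing
;; the text before " v. " in a case caption.
(define rx-case #rx"^(.*) v\\. ")

(define (get-plaintiff s)
  (cond
    ;; A successful match returns (list whole-match group-1); the `=>`
    ;; clause passes that list to `second`, which selects group 1.
    [(regexp-match rx-case s) => second]
    ;; regexp-match returned #f: fall back to a default name.
    [else "J. Doe"]))
```

Under the catch-all type for `regexp-match`, Typed Racket would give `second` the result type `(U #f String)` here. That gap is what the group-counting elaborator is meant to close.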

@@ -117,3 +117,8 @@ SoundX -- cannot reporduce this because
 [anti-TH](http://stackoverflow.com/questions/10857030/whats-so-bad-about-template-haskell)
 ---
 Not great news for me / metaprogramming
+[LMS](https://scala-lms.github.io//publications.html)
+---
+The good scala macro system.

@@ -27,8 +27,8 @@ If @exact|{${\tt p?} \in \interp$}| and @exact|{${\tt e} \in \emph{expr}$}|,
 it may be useful to think of
 @exact|{${\tt (p?~e)}$}| as @emph{evidence} that the expression @exact|{${\tt e}$}|
 is recognized by @exact|{${\tt p?}$}|.
-Alternatively, @exact|{${\tt (p?~e)}$}| is a kind of interpolant@~cite[c-jsl-1997]
-representing details about @exact|{${\tt e}$}| relevant for program elaboration.
+Alternatively, @exact|{${\tt (p?~e)}$}| is a kind of interpolant@~cite[c-jsl-1997],
+representing key data embedded in @exact|{${\tt e}$}|.
 Correct interpretation functions @exact|{${\tt p?}$}| obey three guidelines:
 @itemize[
@@ -100,3 +100,4 @@ If neither @exact|{${\tt e}$}| nor @exact|{${\tt e'}$}| type checks, then we hav
 In a perfect world both would diverge, but the fundamental limitations of
 static typing@~cite[fagan-dissertation-1992] and computability
 keep us imperfect.
+TODO TODO TODO Extra space hierExtra space hierExtra space hierExtra space

@@ -21,7 +21,7 @@ These elaborators are implemented in Typed Racket@~cite[TypedRacket], a macro-ex
 typed language that compiles into Racket@~cite[plt-tr1].
 An important component of Typed Racket's design is that all macros in a program
 are fully expanded before type-checking begins.
-This convention lets us implement our elaborators as macros that expand into
+This protocol lets us implement our elaborators as macros that expand into
 typed code.
 @parag{Conventions}
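The hunk above relies on every macro being expanded before type checking. A minimal untyped sketch of an "elaborator as a macro", using a hypothetical `static-length` that is not part of the paper: it inspects its argument at expansion time, computes the answer for a string literal, expands to an ordinary call otherwise, and expands to plain `string-length` when used as a bare identifier (a higher-order use).

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hypothetical elaborator: runs at expansion time, so any later
;; checker (such as Typed Racket's) only sees the expanded code.
(define-syntax (static-length stx)
  (syntax-case stx ()
    ;; Literal string argument: compute the length during expansion.
    [(_ arg)
     (string? (syntax-e #'arg))
     (datum->syntax stx (string-length (syntax-e #'arg)))]
    ;; Any other argument: expand to the ordinary run-time call.
    [(_ arg) #'(string-length arg)]
    ;; Bare identifier (e.g. passed to map): fall back to string-length.
    [id (identifier? #'id) #'string-length]))
```

The three clauses mirror a common elaborator shape: specialize what can be interpreted statically and leave everything else to the standard behavior.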
@@ -43,8 +43,14 @@ Such information is extremely important, especially for implementing @racket[def
 } @item{
 We use an infix @tt{:} to write explicit type annotations and casts,
 for instance @racket[(x : Integer)].
-These are normally @racket[(ann x Integer)] and @racket[(cast x Integer)],
-respectively.
+These normally have two different syntaxes, respectively
+@racket[(ann x Integer)] and @racket[(cast x Integer)].
+} @item{
+TODO phantom item for space
+TODO phantom item for space
+TODO phantom item for space
+TODO phantom item for space
+TODO phantom item for space
 } @item{
 In @Secref{sec:sql}, @tt{sql} is short for @tt{postgresql}, i.e.
 the code we present in that section is only implemented for the @tt{postgresql}
@@ -58,9 +64,10 @@ In fact, this practice aligns with our implementation---once the interpretations
 ]
 @; =============================================================================
-@section{String Formatting}
+@section{Format Strings}
 @; TODO add note about the ridiculous survey figure? Something like 4/5 doctors
+@; TODO note regexp is the first?
 Format strings are the world's second most-loved domain-specific language (DSL).
 @; @~cite[wmpk-algol-1968]
 All strings are valid format strings; additionally, a format string may contain
@@ -72,59 +79,77 @@ Racket follows the Lisp tradition@~cite[s-lisp-1990] of using a tilde character
 For example, @racket[~s] converts any value to a string and @racket[~b] converts a
 number to binary form.
-@interaction[
-(printf "binary(~s) = ~b" 7 7)
-]
+@exact|{
+\begin{SCodeFlow}\begin{RktBlk}\begin{SingleColumn}\Scribtexttt{{\Stttextmore} }\RktPn{(}\RktSym{printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"binary($\sim$s) = $\sim$b"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{7}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{7}\RktPn{)}
+\RktOut{binary(7) = 111}
+\begin{SingleColumn}\end{SingleColumn}\end{SingleColumn}\end{RktBlk}\end{SCodeFlow}
+}|
 If the format directives do not match the arguments to @racket[printf], most
-languages fail at run-time@~cite[a-icfp-1999].
-This is a simple kind of value error that should be caught statically.
-@; TODO print errors nicer
-@interaction[
-(printf "binary(~s) = ~b" "7" "7")
-]
-Detecting inconsistencies between a format string and its arguments is easy
-provided we have a function @racket[fmt->types] @exact|{$\in \interp$}| for
-reading types from a format string.
+languages fail at run-time.
+This is a simple kind of value error that could be caught statically.
+@exact|{
+\begin{SCodeFlow}\begin{RktBlk}\begin{SingleColumn}\Scribtexttt{{\Stttextmore} }\RktPn{(}\RktSym{printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"binary($\sim$s) = $\sim$b"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"7"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"7"}\RktPn{)}
+\RktErr{printf: format string requires argument of type $<$exact{-}number$>$}
+\begin{SingleColumn}\end{SingleColumn}\end{SingleColumn}\end{RktBlk}\end{SCodeFlow}
+}|
+Detecting inconsistencies between a format string and its arguments is straightforward
+if we define an interpretation @racket[fmt->types] @exact|{$\in \interp$}| for
+reading types from a format string value.
 In Typed Racket this function is rather simple because the most common
-directives accept @code{Any} type of value.
-Such are the joys of uniform syntax---printing is free.
-@racketblock[
-> (fmt->types "binary(~s) = ~b")
-'[Any Integer]
-> (fmt->types '(λ (x) x))
-#false
-]
-Now to use @racket[fmt->types] in a function @racket[t-printf] @exact|{$\in \trans$}|.
-Given a call to @racket[printf], we validate the number of arguments and
-add type annotations derived using @racket[fmt->types].
+directives accept @code{Any} type of value---in a language with uniform syntax,
+printing comes for free.
+@exact|{
+\hfill\fbox{\RktMeta{fmt->types} $\in \interp$}
+\begin{SCodeFlow}\begin{RktBlk}\begin{SingleColumn}\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{fmt{-}{\Stttextmore}types}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"binary($\sim$s) = $\sim$b"}\RktPn{)}
+\RktVal{{\textquotesingle}}\RktVal{[}\RktVal{Any}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{Integer}\RktVal{]}
+\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{fmt{-}{\Stttextmore}types}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{{\textquotesingle}}\RktVal{(}\RktVal{$\lambda$}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{(}\RktVal{x}\RktVal{)}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{x}\RktVal{)}\RktPn{)}
+\RktVal{\#false}\end{SingleColumn}\end{RktBlk}\end{SCodeFlow}
+}|
+Now to use @racket[fmt->types] in an elaboration.
+Given a call to @racket[printf], we check the number of arguments and
+add type annotations using the inferred types.
 For all other syntax patterns, @racket[t-printf] is the identity transformation.
-@racketblock[
-> (t-printf '(printf "~a"))
-> (t-printf '(printf "~b" "2"))
-'(printf "~b" ("2" : Integer))
-> (t-printf printf)
-'printf
-]
+@exact|{
+\hfill\fbox{$\elabf \in \interp$}
+\begin{SCodeFlow}\begin{RktBlk}\begin{SingleColumn}\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{t{-}printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{{\textquotesingle}}\RktVal{(}\RktVal{printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"$\sim$a"}\RktVal{)}\RktPn{)}
+\RktSym{$\perp$}
+\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{t{-}printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{{\textquotesingle}}\RktVal{(}\RktVal{printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"$\sim$b"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"2"}\RktVal{)}\RktPn{)}
+\RktVal{{\textquotesingle}}\RktVal{(}\RktVal{printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"$\sim$b"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{(}\RktVal{"2"}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{{\hbox{\texttt{:}}}}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{Integer}\RktVal{)}\RktVal{)}
+\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{t{-}printf}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{printf}\RktPn{)}
+\RktVal{{\textquotesingle}}\RktVal{printf}\end{SingleColumn}\end{RktBlk}\end{SCodeFlow}
+}|
 The first example is rejected immediately as a syntax error.
-The second is temporarily accepted, but will cause a static type error.
+The second is a valid translation, but will lead to a static type error.
 Put another way, the format string @racket{~b} specializes the type of
 @racket[printf] from @racket[(String Any * -> Void)] to @racket[(String Integer -> Void)].
-The third is slightly more interesting; it demonstrates that higher-order
-uses of @racket[printf] default to the standard behavior.
+The third example demonstrates that higher-order
+uses of @racket[printf] default to the standard, unspecialized behavior.
 @; =============================================================================
 @section{Regular Expressions}
-Moving now from the second most-loved DSL to the first, regular expressions
-are often used to capture sub-patterns within a string.
+Regular expressions are often used to capture sub-patterns within a string.
 @racketblock[
 > (regexp-match #rx"(.*)@(.*)" "toni@merchant.net")
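The format-string hunk above describes an interpretation `fmt->types` and a transformation `t-printf`, but their definitions are not part of the diff. A minimal untyped sketch over quoted code, in the style of the hunk's examples, under stated assumptions: only a subset of Racket's directives is recognized, `~b`/`~o`/`~x` are mapped to `'Integer` as in the `'[Any Integer]` example (Racket itself accepts any exact rational there), and arguments whose type is `Any` are left unannotated.

```racket
#lang racket/base
(require racket/match)

;; fmt->types : Any -> (U #f (Listof Symbol))
;; One type name per argument-consuming directive, or #f when the input
;; is not a string or uses a directive this sketch does not recognize.
(define (fmt->types v)
  (and (string? v)
       (let loop ([cs (string->list v)] [acc '()])
         (match cs
           ['() (reverse acc)]
           [(list #\~ d rest ...)
            (case (char-downcase d)
              [(#\a #\s #\v #\e) (loop rest (cons 'Any acc))]
              [(#\b #\o #\x) (loop rest (cons 'Integer acc))]
              [(#\c) (loop rest (cons 'Char acc))]
              [(#\n #\% #\~) (loop rest acc)] ; consume no argument
              [else #f])]
           [(list #\~) #f] ; dangling tilde at the end
           [(cons _ rest) (loop rest acc)]))))

;; t-printf : S-expression -> S-expression
;; Annotates the arguments of a printf call, raises an error on an arity
;; mismatch, and returns every other form unchanged.
(define (t-printf e)
  (match e
    [(list 'printf (? string? fmt) args ...)
     (let ([tys (fmt->types fmt)])
       (cond
         [(not tys) e] ; cannot interpret the string: leave the call alone
         [(not (= (length tys) (length args)))
          (error 't-printf "arity mismatch in ~s" e)]
         [else
          ;; Only arguments narrower than Any get an annotation.
          `(printf ,fmt
                   ,@(for/list ([a (in-list args)] [t (in-list tys)])
                       (if (eq? t 'Any) a `(,a : ,t))))]))]
    [_ e]))
```

With these definitions, `(t-printf '(printf "~b" "2"))` returns `'(printf "~b" ("2" : Integer))`, `(t-printf 'printf)` returns `'printf`, and `(t-printf '(printf "~a"))` raises an error, matching the three cases discussed in the hunk.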
@@ -138,7 +163,7 @@ Inside the pattern, the parentheses delimit sub-pattern @emph{groups}, the dots
 The second argument is a string to match against the pattern.
 If the match succeeds, the result is a list containing the entire matched string
 and substrings corresponding to each group captured by a sub-pattern.
-If the match fails, Racket's @racket[regexp-match] returns @racket[#false].
+If the match fails, @racket[regexp-match] returns @racket[#false].
 @racketblock[
 > (regexp-match #rx"-(2*)-" "111-222-3333")
@@ -148,34 +173,39 @@ If the match fails, Racket's @racket[regexp-match] returns @racket[#false].
 ]
 Certain groups can also fail to capture even when the overall match succeeds.
-This can happen, for example, when a group is followed by a Kleene star.
+This can happen when a group is followed by a Kleene star.
 @racketblock[
 > (regexp-match #rx"(a)*(b)" "b")
 '("b" #f "b")
 ]
-Therefore, a simple catch-all type for @racket[regexp-match] is
+Therefore, a catch-all type for @racket[regexp-match] is fairly large:
 @racket[(Regexp String -> (U #f (Listof (U #f String))))].
-Using the general type, however, is cumbersome for simple patterns
+Using this general type is cumbersome for simple patterns
 where a match implies that all groups will successfully capture.
-@;@(define tr-eval (make-base-eval '(require typed/racket/base racket/list)))
-@racketblock[
-> (define (get-domain [full-name : String]) : String
-(cond
-[(regexp-match #rx"(.*)@(.*)" full-name)
-=> third]
-[else "Match Failed"]))
-]
-@;Error: expected String, got (U #false String)
-@; `third` could not be applied to arguments
-@; Arguments: (Listof (U #false String))
-@; Expected Result: String
-Analysing the parentheses contained in a regular expression pattern can often
-determine the number of groups statically.
-We implement this as a function @racket[rx->groups] @exact|{$\in \interp$}|
+@exact|{
+\begin{SCodeFlow}\begin{RktBlk}\begin{SingleColumn}\RktSym{{\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{define}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{(}\RktSym{get{-}domain}\mbox{\hphantom{\Scribtexttt{x}}}\RktPn{[}\RktSym{full{-}name}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{{\hbox{\texttt{:}}}}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{String}\RktPn{]}\RktPn{)}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{{\hbox{\texttt{:}}}}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{String}
+\mbox{\hphantom{\Scribtexttt{xxxx}}}\RktPn{(}\RktSym{cond}
+\mbox{\hphantom{\Scribtexttt{xxxxx}}}\RktPn{[}\RktPn{(}\RktSym{regexp{-}match}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{\#rx"({\hbox{\texttt{.}}}*)@({\hbox{\texttt{.}}}*)"}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{full{-}name}\RktPn{)}
+\mbox{\hphantom{\Scribtexttt{xxxxxx}}}\RktSym{={\Stttextmore}}\mbox{\hphantom{\Scribtexttt{x}}}\RktSym{third}\RktPn{]}
+\mbox{\hphantom{\Scribtexttt{xxxxx}}}\RktPn{[}\RktSym{else}\mbox{\hphantom{\Scribtexttt{x}}}\RktVal{"Match Failed"}\RktPn{]}\RktPn{)}\RktPn{)}
+\RktErr{Error: expected $<$String$>$, got $<$(U \#false String)$>$}
+\end{SingleColumn}\end{RktBlk}\end{SCodeFlow}
+}|
+@; TODO
+We implement a parentheses-counting interpretation that parses regular expressions
+and returns the number of groups.
+@todo{fbox} @racket[rx->groups] @exact|{$\in \interp$}|
 @racketblock[
 > (rx->groups #rx"(a)(b)(c)")
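The hunk above introduces a parentheses-counting interpretation `rx->groups` without showing its definition. A minimal untyped sketch of the same idea, under a hypothetical name `count-groups`: it reads the source string of a regexp value (via `object-name`), skips escaped characters and bracketed character classes, does not count `(?` forms (non-capturing groups, lookaround), and returns `#f` for non-regexp input or unbalanced parentheses. It is deliberately simplified; for example, a `]` placed first inside a character class and the conditional `(?(...)...)` form are not handled.

```racket
#lang racket/base

;; count-groups : Any -> (U #f Natural)
(define (count-groups v)
  (let ([str (cond [(regexp? v) (object-name v)] ; source of a #rx value
                   [(string? v) v]
                   [else #f])])
    (and str
         (let loop ([i 0] [depth 0] [groups 0] [in-class? #f])
           (if (>= i (string-length str))
               ;; End of pattern: succeed only if every "(" was closed.
               (and (zero? depth) (not in-class?) groups)
               (let ([c (string-ref str i)])
                 (cond
                   ;; Escape: skip the backslash and the character after it.
                   [(char=? c #\\) (loop (+ i 2) depth groups in-class?)]
                   ;; Inside [...], parentheses are literal characters.
                   [in-class? (loop (add1 i) depth groups (not (char=? c #\])))]
                   [(char=? c #\[) (loop (add1 i) depth groups #t)]
                   [(char=? c #\()
                    ;; "(?" starts a non-capturing or lookaround form.
                    (let ([capturing?
                           (not (and (< (add1 i) (string-length str))
                                     (char=? (string-ref str (add1 i)) #\?)))])
                      (loop (add1 i) (add1 depth)
                            (if capturing? (add1 groups) groups) #f))]
                   ;; Unmatched ")" means the pattern is malformed.
                   [(char=? c #\))
                    (and (positive? depth) (loop (add1 i) (sub1 depth) groups #f))]
                   [else (loop (add1 i) depth groups #f)])))))))
```

For example, `(count-groups #rx"(a)(b)(c)")` returns 3 and `(count-groups "(a")` returns `#f`. A transformation could use such a count n to refine a successful match to a list of n + 1 strings, but only when every group is guaranteed to capture, which this sketch does not check (recall the Kleene-star example above).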
@@ -186,12 +216,12 @@ We implement this as a function @racket[rx->groups] @exact|{$\in \interp$}|
 #false
 ]
+@; TODO can we not talk about casts?
 The corresponding transformation
-@racket[t-regexp] @exact|{$\in \trans$}|
-inserts casts to refine the type of results
-produced by @racket[regexp-match].
-It also flags malformed groups in uncompiled regular expressions.
+inserts casts to subtype the result of calls to @racket[regexp-match].
+It also raises syntax errors when an uncompiled regular expression contains
+unmatched parentheses.
+@todo{fbox}
 @racketblock[
 > (t-regexp '(regexp-match #rx"(a)b" str))
@@ -203,12 +233,10 @@ It also flags malformed groups in uncompiled regular expressions.
 @; =============================================================================
-@section{Procedure Arity}
-Anonymous functions are another value form whose representation contains
-useful data.
-By tokenizing symbolic λ-expressions, we can parse their domain
-syntactically in a function @racket[fun->domain] @exact|{$\in \interp$}|
+@section{Anonymous Functions}
+By tokenizing symbolic λ-expressions, we can interpret their domain
+statically. @todo{fbox} @racket[fun->domain] @exact|{$\in \interp$}|
 @racketblock[
 > (fun->arity '(λ (x y z) (x (z y) y)))
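The hunk above reads information from a symbolic λ-expression, but the result of the `fun->arity` example lies outside the diff. A minimal sketch of the idea, under the hypothetical name `lambda-arity`: it returns the number of formals of a quoted `λ` or `lambda` form with a plain list of identifiers, and `#f` for anything else. Rest arguments, optional or keyword arguments, and non-symbol formals are all treated as uninterpretable, and duplicate formals are not checked.

```racket
#lang racket/base
(require racket/match)

;; lambda-arity : Any -> (U #f Natural)
(define (lambda-arity e)
  (match e
    ;; (λ (x ...) body ...+) with every formal a symbol.
    [(list (or 'λ 'lambda) (list (? symbol? xs) ...) body ...+)
     (length xs)]
    ;; Anything else, including (λ (x . rest) ...), has no interpretation.
    [_ #f]))
```

For instance, `(lambda-arity '(λ (x y z) (x (z y) y)))` returns 3, while `(lambda-arity '(λ (x . rest) x))` returns `#f` because the formals are not a proper list.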
@@ -239,7 +267,7 @@ TODO same goes for zipWith in a language without polydots
 @; =============================================================================
-@section{Constant Folding}
+@section{Numeric Constants}
 The identity interpretation is useful for lifting constant values to
 the compile-time environment.
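The diff ends as the numeric-constants section begins, so the paper's own use of the identity interpretation is not shown. As one hypothetical illustration only: a division macro, `checked-div`, interprets a numeric literal as itself at expansion time (via the illustrative helper `literal-number`) and rejects an exact-zero divisor before the program runs. Any other divisor expands to an ordinary call to `/`.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Identity interpretation at compile time: a numeric literal denotes
;; itself; any other syntax has no interpretation (#f).
(begin-for-syntax
  (define (literal-number stx)
    (let ([v (syntax-e stx)])
      (and (number? v) v))))

(define-syntax (checked-div stx)
  (syntax-case stx ()
    [(_ n d)
     (let ([d-val (literal-number #'d)])
       ;; A literal exact zero divisor is a compile-time error.
       (when (and d-val (exact? d-val) (zero? d-val))
         (raise-syntax-error 'checked-div "division by literal zero" stx #'d))
       ;; Otherwise expand to the ordinary division.
       #'(/ n d))]))
```

Here `(checked-div 10 4)` evaluates to `5/2`, while `(checked-div 1 0)` is rejected during expansion rather than raising a run-time error.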