Sam Tobin-Hochstadt 9c7a001a36 init
2017-07-10 13:02:10 -04:00


\newcommand\Strust{\textsc{trust}}
\begin{schemeregion}
\section{Typing Modules}
\label{sect:type-multi}
Type-checking a typed module is more complicated than type-checking an
isolated definition or expression. Module bodies may refer to
variables that are neither primitive nor locally defined but are instead
imported from other modules. Furthermore, module exports must be protected from
misuse in other modules, both typed and untyped.

As with a single definition or expression, type-checking a module
involves fully expanding the contents of the module and then analyzing
the result. Typed Scheme uses the module transformer hook
to type-check the contents of the module.

The variable protocol handles variables whose definitions or bindings
occur within the body of the module, but typing imported variables
requires additional communication between typed modules. The protocol
that supports this communication also affects the way a typed module's
exports are compiled.
There are three kinds of module interactions that typed modules can
participate in:
\begin{enumerate}
\item A typed module requires an untyped module.
\item A typed module requires another typed module.
\item An untyped module requires a typed module.
\end{enumerate}

The first case requires only a way of importing untrusted code so that
it cannot break the type system's invariants; this in turn requires the
programmer to supply types for the imported names. The other two cases
determine the behavior of a typed module's exports, and they demand
different behaviors from a typed module depending on whether its
importing context is typed or untyped.

This section explains how Typed Scheme interacts with the module
system. We begin with the simplest case, a typed module importing
untyped code, which involves only the import statement. Then we
consider a typed module importing another typed module, and we develop
the basic typed-module framework. Finally, we show how to extend the
behavior of exports to support importing a typed module into an
untyped context.
\subsection{Untyped to Typed}
Typed modules cannot use untyped modules without additional protection.%
\footnote{However, typed modules can safely import untyped
\emph{macro} libraries (such as \scheme{match}) if the macros do not
expand into untyped, non-primitive variables.}
%
Instead, typed modules use a special \scheme{require/typed} form to
import names at specific types. The \scheme{require/typed} form wraps
the untyped imports with contracts~\cite{ff:ho-contracts} that enforce
the supplied types via runtime checks. It also adds the name to the type
environment with the specified type.

For example, the following use of \scheme{require/typed} imports the
\scheme{find-files} procedure from a standard library
module:\footnote{The \scheme|Path| of this library is a filesystem
path, not the paths of chapter~\ref{chap:occur-extend}.}
\begin{schemedisplay}
(require/typed scheme/file
  [find-files ((Path -> Boolean) Path -> (Listof Path))])
\end{schemedisplay}
It is equivalent to the following code fragment:
\begin{schemedisplay}
(require (rename-in scheme/file [find-files unsafe-find-files]))$^{\mbox{\scriptsize \Strust}}$
(define: find-files : ((Path -> Boolean) Path -> (Listof Path))
  (contract (type->contract
             ((Path -> Boolean) Path -> (Listof Path)))
            unsafe-find-files
            'find-files
            '<typed-scheme>)$^{\mbox{\scriptsize \Strust}}$)
\end{schemedisplay}

The \Strust{} annotation marks the labeled form with a syntax property
that directs the type-checker to accept the form as is.
%
The \scheme{contract} expression wraps the unsafe version of the
\scheme{find-files} procedure with a contract derived from the given
type. The last two arguments indicate the parties involved in the
contract; if something goes wrong, one of the parties is blamed.

The \scheme{find-files} contract checks both the procedure's arguments
and its result, so that a faulty value is caught before it can
interfere with the typed program.
%
The first argument contract is itself a higher-order contract, so the
contract system wraps the function passed to \scheme{find-files} with
a contract corresponding to the \scheme{(Path -> Boolean)} type. This
contract prevents the untyped \scheme{find-files} from calling the
function with faulty arguments; if it does so, the contract system
raises an error and blames \scheme{'find-files} for the violation.
%
The second argument contract is a first-order contract. It can only be
violated if typed code supplies an argument of the wrong type, which
cannot happen if the type system is sound.
%
Finally, if \scheme{find-files} were to return something other than a
list of paths, the contract system would stop the program and blame
\scheme{'find-files}, thus protecting the typed code that expects to
process the result.
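
To make the wrapping concrete, the following minimal sketch in
present-day Racket (the successor to PLT Scheme) applies a hand-written
counterpart of the \scheme{find-files} contract to three stand-in
procedures. The stand-ins and the names \scheme{protect} and
\scheme{find-files/c} are hypothetical, and the contract is written
directly with \scheme{->} and \scheme{listof} instead of being produced
by \scheme{type->contract}.
\begin{schemedisplay}
#lang racket
;; Hypothetical stand-ins for an untyped library procedure with the
;; interface of find-files; none of them touches the filesystem.
(define (well-behaved-find-files keep? start)
  (filter keep? (list start (build-path start "notes.txt"))))
(define (bad-result-find-files keep? start)
  'not-a-list)
(define (bad-call-find-files keep? start)
  (if (keep? "not-a-path") (list start) '()))

;; Hand-written counterpart of the contract for the type of find-files.
(define find-files/c
  (-> (-> path? boolean?) path? (listof path?)))

;; Wrap an untyped procedure, naming the two parties as in the text.
(define (protect proc)
  (contract find-files/c proc 'find-files '<typed-scheme>))

(define checked-find-files (protect well-behaved-find-files))
(define checked-bad-result (protect bad-result-find-files))
(define checked-bad-call (protect bad-call-find-files))
\end{schemedisplay}
With a predicate that accepts every path, \scheme{checked-find-files}
returns both paths unchanged, whereas \scheme{checked-bad-result} fails
the check on its result and \scheme{checked-bad-call} fails the check
on the predicate's argument; in both failing cases the contract system
raises an error that blames \scheme{'find-files}.
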
%% Some types have no contract rep, like polymorphic types
%% prob. also unions of function types.
\subsection{Typed to Typed}
Typed Scheme installs a \scheme{HPmodule-begin} macro that first
performs the normal module expansion (using \scheme{local-expand}),
analyzes the result, and produces a module body that follows a new
\emph{module variable protocol}, which provides the type-checker with
the types of module variables:
\begin{schemedisplay}
(define-syntax (module-begin stx)
  (syntax-case stx ()
    [(module-begin form ...)
     (type-check-module-body
      (local-expand #'(#%plain-module-begin form ...)
                    'module-begin
                    null))]))
\end{schemedisplay}

Unlike the type-checking procedure for top-level forms,
\scheme{type-check-module-body} does more than type-check the expanded
module body; it also transforms that code to produce the final module
body.

When one typed module requires another typed module, type-checking the
first module requires knowing the types associated with all of the
definitions of the second module. The type-checker needs the types for
all of the definitions, even the unexported ones, because an imported
macro can expand into references to the unexported variables of the
module in which it was defined.
%
This requires a new protocol, the module variable protocol.
Let us consider which of the protocol mechanisms introduced in
section~\ref{sect:protocols} can implement it.
%
An imported identifier does not carry any syntax properties, so syntax
properties alone are insufficient.
%
Static binding provides a partial solution: instead of directly
providing a variable, a typed module could instead provide a macro
that expands into a use of the actual variable. The macro would place
a type annotation on the reference as a syntax property.
%
The problem with the static binding approach is that it annotates only
the references that cross the public import/export boundary.
% FIXME: make sure to explain this point in section 3.
Variable references introduced by imported macros, however, do not go
through the static binding mechanism; they refer directly to the
module variables.
%
Since Typed Scheme aims to support macros, static binding is not
a viable approach.
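
The following sketch, in present-day Racket, shows the situation with
submodules standing in for separate modules; all of its names are
hypothetical. The macro \scheme{lookup} is the only export of
\scheme{table}, yet its use in \scheme{client} expands into a direct
reference to the unexported \scheme{secret-table}, so a checker for
\scheme{client} must know the type of a variable that no exported
static binding could have annotated.
\begin{schemedisplay}
#lang racket
;; Submodules stand in for separate files; names are hypothetical.
(module table racket
  (provide lookup)
  ;; Not exported, so clients cannot name it directly.
  (define secret-table (hash 'a 1 'b 2))
  ;; Every use of this macro expands into a reference to secret-table.
  (define-syntax-rule (lookup key)
    (hash-ref secret-table 'key)))

(module client racket
  (require (submod ".." table))
  (provide a-value)
  (define a-value (lookup a)))

(require 'client)
\end{schemedisplay}
Running the program binds \scheme{a-value} to \scheme{1}.
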

That leaves compile-time side effects. We extend the type environment
table to include all known typed-module definitions instead of just
primitives and local definitions. A typed module relies on the global
type environment to contain types for all variables that appear within
its body, and it guarantees that its client modules have access to its
own type associations.
\begin{quotation}\noindent
\textbf{The Module Variable Protocol:}
During the compilation of a typed module, the global type environment
contains bindings for all definitions in all typed modules
transitively required by the module being compiled.
\end{quotation}

Since a module's contributions to the global type environment need to
be present during the compilation of every module that depends on it,
we use the persistent effect pattern described in
section~\ref{sect:syntax:persistent}. In addition to verifying the
correctness of the module's contents, the
\scheme{type-check-module-body} procedure also appends compile-time
type declarations to the end of the module.
%
We illustrate the effect of the module transformer on the following
modules:
\begin{schemedisplay}
langts ;; one
(provide one)
(: one Number)
(define one 1)

langts ;; plus
(require one)
(provide plus1)
(: plus1 (Number -> Number))
(define (plus1 n)
  (+ n one))
\end{schemedisplay}
The first module passes the type-checker, which also adds a type
declaration for \scheme{one} to the end of the compiled module:
\begin{schemedisplay}
(compiled-module one
  (require typed-scheme)
  (provide one)
  (define one 1)
  (begin-for-syntax
    (declare-type! #'one (typeKW Number))))
\end{schemedisplay}

The reference to \scheme{declare-type!} was inserted by a macro from
the \scheme{typed-scheme} module. Even though \scheme{one} does not
import the \scheme{env} module directly, the procedure is available
indirectly through \scheme{typed-scheme}. Since \scheme{typed-scheme}
imports \scheme{env} for syntax (through the \scheme{type-check}
module), it is correct to use \scheme{declare-type!} within the
compile-time part of \scheme{one}.

When the compiler encounters the \scheme{plus} module, the module
system invokes the compile-time part of \scheme{typed-scheme},
initializing the global type environment with the primitive bindings
only. Then, when the compiler encounters the import of \scheme{one} in
the module body, it invokes the compile-time part of the \scheme{one}
module, which loads its type declaration for \scheme{one} into
the type environment.

The \scheme{plus} module includes just one new definition, and the
module transformer adds the corresponding declaration to the module:
\begin{schemedisplay}
(compiled-module plus
  (require typed-scheme)
  (require one)
  (provide plus1)
  (define plus1 (lambda (n) (+ n one)))
  (begin-for-syntax
    (declare-type! #'plus1 (typeKW (Number -> Number)))))
\end{schemedisplay}

The two modules are able to communicate using \scheme{typed-scheme}'s
type environment because the compile-time parts of the \scheme{one}
module and the \scheme{plus} module share a single invocation of
\scheme{typed-scheme} and thus a single invocation of the \scheme{env}
module.
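
The protocol can be mimicked in present-day Racket, with submodules
standing in for the separately compiled \scheme{env},
\scheme{typed-scheme}, \scheme{one}, and \scheme{plus} modules. In this
sketch the names \scheme{define/type}, \scheme{type-of}, and
\scheme{lookup-type} are hypothetical, and types are plain data keyed
by symbol rather than by identifier. What it shares with the text is
the mechanism: each definition leaves behind a
\scheme{begin-for-syntax} declaration, and compiling \scheme{plus}
reruns the declaration from \scheme{one}, so the type of \scheme{one}
is known while \scheme{plus} is expanded.
\begin{schemedisplay}
#lang racket
;; Hypothetical sketch: env holds the global type environment,
;; keyed here by symbols rather than identifiers.
(module env racket
  (provide declare-type! lookup-type)
  (define type-env (make-hasheq))
  (define (declare-type! name type) (hash-set! type-env name type))
  (define (lookup-type name) (hash-ref type-env name #f)))

;; A tiny stand-in for typed-scheme that uses env only for syntax.
(module typed racket
  (require (for-syntax (submod ".." env)))
  (provide define/type type-of)
  ;; Define the variable and leave behind a compile-time declaration.
  (define-syntax (define/type stx)
    (syntax-case stx ()
      [(_ id type rhs)
       #'(begin
           (define id rhs)
           (begin-for-syntax (declare-type! 'id 'type)))]))
  ;; Expand to the type that env records for id, as a quoted datum.
  (define-syntax (type-of stx)
    (syntax-case stx ()
      [(_ id)
       (with-syntax ([type (lookup-type (syntax-e #'id))])
         #''type)])))

(module one racket
  (require (submod ".." typed))
  (provide one)
  (define/type one Number 1))

(module plus racket
  (require (submod ".." typed) (submod ".." one))
  (provide plus1 type-of-one)
  (define/type plus1 (Number -> Number) (lambda (n) (+ n one)))
  ;; Compiling plus visits one, which reruns its declaration.
  (define type-of-one (type-of one)))

(require 'plus)
\end{schemedisplay}
Requiring \scheme{plus} yields \scheme{type-of-one} bound to the symbol
\scheme{Number}, a value computed entirely at compile time.
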

Figures~\ref{fig:typed-scheme-module} and~\ref{fig:type-check-module}
show the implementation of typed modules and the module variable
protocol.
\begin{figure}[p!]
\begin{schemedisplay}
langs ;; typed-scheme
(require (for-syntax type-check))
(provide (rename-out [module-begin HPmodule-begin])
         (rename-out [top-interaction HPtop-interaction])
         (except-out (all-from-out scheme)
                     HPmodule-begin HPtop-interaction)
         define:
         lambda:)

(define-syntax (module-begin stx)
  (syntax-case stx ()
    [(module-begin form ...)
     (type-check-module-body
      (local-expand #'(#%plain-module-begin form ...)
                    'module-begin
                    null))]))

(define-syntax top-interaction ELIDED)
(define-syntax define: ELIDED)
(define-syntax lambda: ELIDED)
\end{schemedisplay}
\caption{The \variablefont{typed-scheme} module}
\label{fig:typed-scheme-module}
\end{figure}
\begin{figure}[p!]
\begin{schemedisplay}
langs ;; context
(provide typed-context?)
;; typed-context? : (box-of boolean)
;; True when the module being \emph{compiled} is a typed module.
(define typed-context? (box #f))

langs ;; typed-scheme
ELIDED
(require (for-syntax context))
(define-syntax (module-begin stx)
  (syntax-case stx ()
    [(module-begin form ...)
     (begin
       (set-box! typed-context? #t)
       (type-check-module-body
        (local-expand #'(#%plain-module-begin form ...)
                      'module-begin
                      null)))]))
ELIDED
\end{schemedisplay}
\caption{Modified \variablefont{typed-scheme} module}
\label{fig:new-ts-mod}
\end{figure}
\begin{figure}[p!]
\begin{schemedisplay}
langs ;; type-check
(require env)
(provide (all-defined-out))

;; type-check-top-level : syntax $\rightarrow$ void
(define (type-check-top-level form) ELIDED)

;; type-check-module-body : syntax $\rightarrow$ syntax
(define (type-check-module-body form)
  (syntax-case form ()
    [(module-begin top-level-form ...)
     (let ([def-types
            (get-definition-types (syntax->list #'(top-level-form ...)))])
       (for ([def def-types])
         (declare-type! (binding-id def) (binding-type def)))
       (for-each type-check-module-level-form
                 (syntax->list #'(top-level-form ...)))
       ;; Generate declarations to reload types into the
       ;; global type environment
       (with-syntax ([(type-declaration ...)
                      (map binding->type-declaration def-types)])
         #'(module-begin top-level-form ... type-declaration ...)))]))

;; type-check-module-level-form : syntax $\rightarrow$ void
(define (type-check-module-level-form form) ELIDED)

;; type-check-expression : syntax environment $\rightarrow$ type
(define (type-check-expression expr env) ELIDED)

;; get-definition-types : (list-of syntax) $\rightarrow$ (list-of binding)
(define (get-definition-types forms)
  (if (null? forms)
      null
      (syntax-case (car forms) (define)
        [(define name rhs)
         (cons (make-binding #'name (get-id-type #'name))
               (get-definition-types (cdr forms)))]
        [_ (get-definition-types (cdr forms))])))

;; get-id-type : identifier $\rightarrow$ type
(define (get-id-type id) ELIDED)

;; binding$\rightarrow$type-declaration : binding $\rightarrow$ syntax
(define (binding->type-declaration b)
  (with-syntax ([id (binding-id b)]
                [type-expr (type->type-expression (binding-type b))])
    #'(begin-for-syntax (declare-type! #'id type-expr))))

;; type$\rightarrow$type-expression : type $\rightarrow$ syntax
(define (type->type-expression type) ELIDED)
\end{schemedisplay}
\caption{Type Checker}
\label{fig:type-check-module}
\end{figure}
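
The declaration-generating part of Figure~\ref{fig:type-check-module}
can be exercised on its own. The sketch below, in present-day Racket,
assumes fully expanded definitions of the form
\scheme{(define name rhs)}, as the figure does; the table behind
\scheme{get-id-type} and the name \scheme{append-type-declarations} are
hypothetical, types are emitted as quoted data instead of through
\scheme{type->type-expression}, and no type checking takes place.
\begin{schemedisplay}
#lang racket
;; Hypothetical stand-in for get-id-type: a fixed table of types.
(define declared (hasheq 'one 'Number 'plus1 '(Number -> Number)))
(define (get-id-type id) (hash-ref declared (syntax-e id)))

;; Collect an identifier and its type for each (define name rhs) form.
(define (get-definition-types forms)
  (append*
   (for/list ([form (in-list forms)])
     (syntax-case form (define)
       [(define name rhs) (list (cons #'name (get-id-type #'name)))]
       [_ '()]))))

;; Turn one binding into a compile-time declaration form.
(define (binding->type-declaration b)
  (with-syntax ([id (car b)] [type-expr (cdr b)])
    #'(begin-for-syntax (declare-type! #'id 'type-expr))))

;; Append the declarations to the end of an expanded module body.
(define (append-type-declarations body)
  (syntax-case body ()
    [(module-begin form ...)
     (with-syntax ([(decl ...)
                    (map binding->type-declaration
                         (get-definition-types (syntax->list #'(form ...))))])
       #'(module-begin form ... decl ...))]))
\end{schemedisplay}
Given a body containing \scheme{(define one 1)}, the result ends with a
\scheme{begin-for-syntax} form that declares \scheme{one} to have type
\scheme{Number}; forms other than definitions pass through unchanged.
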
\subsection{Typed to Untyped}
When a typed module is imported into another typed module, it must
provide its definitions and load the type declarations into the global
type environment. The type-checker ensures that the exported values
are used safely, so there is no need for run-time checking or
wrapping.

In contrast, when a typed module is imported into an untyped module,
it must protect its exports so that the untyped context cannot
violate the type invariants. As in the ``untyped to typed'' case, we use
contracts to enforce the type constraints of the definitions. For any
defined variable, it is a simple matter to generate a definition that
wraps the variable in the appropriate contract.
%
For example, the \scheme{plus} module above has a \scheme{plus1}
procedure with type \scheme{(Number -> Number)}. Given that information,
we can generate \scheme{defensive-plus1}:
\begin{schemedisplay}
(define/contract defensive-plus1
  (type->contract (Number -> Number))
  plus1)
\end{schemedisplay}
\noindent
The \scheme{define/contract} form is like a definition that uses
\scheme{contract} explicitly, except that it automatically computes
the blame parties.

A typed module, then, needs to provide one set of definitions to typed
contexts and another set to untyped contexts.
%
Of course, a compiled module cannot change the contents of its
\scheme{provide} clauses depending on which module imports it. Instead,
it can provide a set of \emph{indirection} macros that choose whether
to expand into the trusting or defensive versions of exported names,
assuming the macros can determine whether the importing context is
typed or untyped. PLT Scheme provides \emph{rename transformers} as a
convenient way of writing such identifier-to-identifier translations.

Continuing the \scheme{plus} module example, the module transformer
rewrites
\begin{schemedisplay}
(provide plus1)
\end{schemedisplay}
into the following indirection definition and renamed-provide clause:
\begin{schemedisplay}
(define-syntax export-plus1
  (if ELIDED ;; Will it be used in a typed context?
      (make-rename-transformer #'plus1)
      (make-rename-transformer #'defensive-plus1)))
(provide (rename-out [export-plus1 plus1]))
\end{schemedisplay}
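
The indirection can be tried out within a single module. In the
following sketch, written in present-day Racket, a compile-time box
stands in for the \scheme{typed-context?} flag,
\scheme{(-> number? number?)} plays the role of
\scheme{(type->contract (Number -> Number))}, and the helper
\scheme{make-indirection} and both indirection names are hypothetical.
In the text the choice is made each time \scheme{plus} is visited
during a client's compilation; here, to stay within one module, one
indirection is evaluated before the flag is set and one after.
\begin{schemedisplay}
#lang racket
;; Hypothetical sketch; make-indirection is an invented helper.
(define (plus1 n) (+ n 1))
(define/contract defensive-plus1 (-> number? number?) plus1)

;; Compile-time flag standing in for the typed-context? box.
(define-for-syntax typed-context? (box #f))

;; An indirection picks its target when its right-hand side runs.
(define-for-syntax (make-indirection trusting defensive)
  (make-rename-transformer (if (unbox typed-context?) trusting defensive)))

;; Evaluated while the flag is false, as for an untyped client.
(define-syntax plus1-seen-by-untyped
  (make-indirection #'plus1 #'defensive-plus1))

;; What the typed module transformer does before any require.
(begin-for-syntax (set-box! typed-context? #t))

;; Evaluated while the flag is true, as for a typed client.
(define-syntax plus1-seen-by-typed
  (make-indirection #'plus1 #'defensive-plus1))
\end{schemedisplay}
Applying \scheme{plus1-seen-by-untyped} to a string raises a contract
error that assigns blame, while \scheme{plus1-seen-by-typed} passes the
string straight to \scheme{+}, which fails with an ordinary error.
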

The indirection definitions depend on some way of determining whether
the context they are imported into is typed or untyped. The context
that matters is the main module currently being compiled. If the require
chain includes intervening modules, they have already been compiled,
and references within the compiled modules are already resolved to the
right version of the exports. Thus, the problem boils down to
determining whether the main module currently being compiled is a typed
module.

The property that distinguishes a typed module is that it specifies
\scheme{typed-scheme} as its language module, and thus its module body
is under the control of the typed module transformer. Given that, it
is critical to understand the exact order of events in the compilation
process:
\begin{enumerate}
\item
The compiler invokes the initial language module's compile-time
part.\footnote{Although this invocation occurs prior to any
compilation of a typed module, it cannot be used to determine
whether compilation is occurring in a typed context, since the
Typed Scheme module can be required from untyped as well as typed
modules. }
\item
Then, it executes the initial language module's module transformer on
the body of the module being compiled.
\item
As the compiler encounters \scheme{require}s in the module's body, it
invokes the compile-time parts of the relevant modules.
\end{enumerate}

In particular, the execution of the module transformer precedes the
execution of any of the indirection definitions in compiled typed
modules. The Typed Scheme module transformer can therefore set a flag
indicating that the module being compiled is a typed module, and the
indirection definitions can simply check the value of the flag.
%
Figure~\ref{fig:new-ts-mod} presents the modified \scheme{typed-scheme} module.
The \scheme{type-check} module also adds \scheme{(require context)} so
that the indirection definitions it inserts can refer to
\scheme{typed-context?}.
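
The ordering argument can be checked with a small experiment. The
following sketch, in present-day Racket, uses submodules in place of
separate files; \scheme{lang}, \scheme{typed-module-begin}, and
\scheme{context-of-this-module} are hypothetical names. The language
module's transformer sets a compile-time flag before the body is
expanded, so a client that uses the language as its module language
sees the flag set, while a client that merely imports a name from the
language does not.
\begin{schemedisplay}
#lang racket
;; Hypothetical stand-in for the typed-scheme language module.
(module lang racket
  (provide (except-out (all-from-out racket) #%module-begin)
           (rename-out [typed-module-begin #%module-begin])
           context-of-this-module)
  ;; False until a typed module body starts expanding.
  (define-for-syntax typed-context? (box #f))
  ;; The module transformer sets the flag, then expands the body.
  (define-syntax (typed-module-begin stx)
    (syntax-case stx ()
      [(_ form ...)
       (begin
         (set-box! typed-context? #t)
         #'(#%module-begin form ...))]))
  ;; Reports the flag as seen while expanding the current module.
  (define-syntax (context-of-this-module stx)
    (if (unbox typed-context?) #''typed #''untyped)))

;; An untyped client merely imports a name from lang.
(module untyped-client racket
  (require (only-in (submod ".." lang) context-of-this-module))
  (provide result)
  (define result (context-of-this-module)))

;; A typed client uses lang as its module language.
(module typed-client (submod ".." lang)
  (provide result)
  (define result (context-of-this-module)))

(require (prefix-in untyped: 'untyped-client)
         (prefix-in typed: 'typed-client))
\end{schemedisplay}
The final \scheme{require} binds \scheme{untyped:result} to
\scheme{'untyped} and \scheme{typed:result} to \scheme{'typed}.
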

The following program illustrates how the flag works. We add an untyped
\scheme{main} module to the \scheme{one} and \scheme{plus} modules
from our earlier examples.
\begin{schemedisplay}
langts ;; one
(provide one)
(: one Number)
(define one 1)

langts ;; plus
(require one)
(provide plus1)
(: plus1 (Number -> Number))
(define (plus1 x)
  (+ x one))

langs ;; main
(require plus)
(display (plus1 41)) (newline)
\end{schemedisplay}
The compiler processes the typed \scheme{one} module first, creating
the context-dependent indirection definition for the exported variable
\scheme{one}.
%
When the compiler encounters the typed \scheme{plus} module,
it first invokes the compile-time part of \scheme{typed-scheme}. That,
in turn, causes the invocation of the \scheme{context} module,
including a new \scheme{typed-context?} box initialized to
false. Executing the Typed Scheme \scheme{HPmodule-begin} macro sets
the value in the \scheme{typed-context?} box to true. Subsequently,
when the compiler encounters the \scheme{(require one)} form in the
module body, it invokes \scheme{one}'s compile-time part. Since the
\scheme{typed-context?} box now contains true, the indirection for
\scheme{one} selects the typed variant, and the compiler resolves uses
of the imported name to the unwrapped definition.

The compilation of the \scheme{main} module proceeds differently. When
the compiler encounters the \scheme{(require plus)} form, it invokes
\scheme{plus}'s compile-time part, which in turn invokes the
compile-time part of \scheme{typed-scheme} and hence the
\scheme{context} module. This creates a fresh \scheme{typed-context?}
box initialized to false, just as before. The box's value is never
changed to true, however, because Typed Scheme's
\scheme{HPmodule-begin} macro is not used in the expansion of the
\scheme{main} module. Consequently, when \scheme{plus}'s indirection
definitions are executed, they point to the contract-wrapped variants,
and the occurrence of \scheme{plus1} in the \scheme{main} module is
wrapped in code that checks the type of its argument.
\end{schemeregion}