307 lines
14 KiB
TeX
307 lines
14 KiB
TeX
\newbfop\dlsint{int}
|
|
\def\ii{\dlsproctype{\dlsint}{\dlsint}}
|
|
|
|
\begin{schemeregion}
|
|
|
|
This chapter presents a formal account of the integration of typed and
|
|
untyped code in a single program. For simplicity, we consider only a
|
|
simple type system and module system. Further, we restrict the
|
|
language to have only one typed module.
|
|
|
|
\section{Relationship to Typed Scheme}
|
|
|
|
% Of course, the implementation
|
|
% of Typed Scheme works with the full type system as well as with
|
|
% arbitrary combinations of typed and untyped modules.
|
|
|
|
This chapter presents a formalism that admits only simple types.
|
|
Further, each module may contain exactly one expression, and other modules
|
|
are referred to implicitly. Finally, there may only be one typed
|
|
module in the system. This formalism, initially designed and
|
|
published prior to the implementation of Typed Scheme, is thus
|
|
inadequate for direct use in Typed Scheme.
|
|
|
|
Therefore, in the full Typed Scheme language, several changes are
|
|
made. First, a program may consist of an arbitrary combination of
|
|
arbitrarily many typed and untyped modules. Second, each module
|
|
explicitly specifies which modules it depends upon. Third, and most
|
|
significantly, the importing typed module must specify the types to be
|
|
assigned to any imports from untyped modules.
|
|
|
|
The implemented system has a number of advantages. First, the type
|
|
specifications for imports provide checked documentation of the
|
|
expected behavior of the untyped code. Second, as in the wrapper
|
|
modules presented in this chapter, they provide a localized target of
|
|
blame. Therefore, when the untyped module fails to live up to the
|
|
contract assigned to it, instead of blaming the untyped module, which
|
|
may not be at fault, the type specification is blamed instead. Third,
|
|
it does not require the complex contract inference process from this
|
|
chapter for determining the desired type of imported untyped identifiers.
|
|
|
|
Of course, the crucial result of this chapter,
|
|
theorem~\ref{thm:dlsmain}, is still applicable to the resulting
|
|
system, despite these changes. Developing a formalism that more
|
|
accurately captures the design of the Typed Scheme implementation
|
|
remains future work.
|
|
|
|
\section{An Informal Tour}
|
|
|
|
To make things simple, we assume that a program is a sequence of
|
|
modules followed by a ``main expression.''
|
|
|
|
The evaluation of such a program starts with the main expression. Evaluation
|
|
proceeds as usual in an expression-oriented language. When the evaluation
|
|
process encounters a reference to a module, it imports the appropriate
|
|
expression from there and evaluates it.
|
|
|
|
To keep things as simple as possible, we work with a typed variant of
|
|
the untyped language where all binding positions
|
|
come with type declarations. In short, we migrate to
|
|
an explicitly typed language that is otherwise syntactically and semantically
|
|
equivalent to the untyped one.
|
|
|
|
In a program that mixes typed and untyped modules, evaluation proceeds
|
|
as before. This implies that the values of the typed language are
|
|
those of the untyped language (and vice
|
|
versa). Figures~\ref{fig:syntax-untyped},
|
|
\ref{fig:syntax-mixed}, \ref{fig:syntax-runtime},
|
|
and~\ref{fig:dls-reduction} present the formal syntax and semantics of our
|
|
model, though this section liberally adds features to provide
|
|
intuition; section~\ref{sec:dls-formal} explains these figures in detail.
|
|
|
|
\begin{figure}[t]
|
|
$$
|
|
\begin{altgrammar}
|
|
P & ::= & \overvec{M}\ \e{} & \mbox{Programs} \\
|
|
M & ::= & \modu{f}{v} & \mbox{Modules} \\
|
|
e & ::= & n \alt \dlsabs{\x{}}{\e{}} \alt x \alt f \alt \comb{e}{e} \alt \ifz{e}{e}{e} &
|
|
\mbox{Source Expressions} \\
|
|
\end{altgrammar}
|
|
$$
|
|
|
|
\caption{Scripting Language Syntax} \label{fig:syntax-untyped}
|
|
\end{figure}
|
|
|
|
\begin{figure}[t]
|
|
$$
|
|
\begin{altgrammar}
|
|
P & ::= & \overvec{M}\ \e{} & \mbox{Programs} \\
|
|
M & ::= & M_u \alt M_t & \mbox{Modules} \\
|
|
M_u & ::= & \modu{f}{v} & \mbox{Untyped Modules} \\
|
|
% M_c & ::= & \mc{f}{c}{v} & \mbox{Contracted Modules} \\
|
|
M_t & ::= & \mt{f}{t}{v_t} & \mbox{Typed Modules} \\
|
|
t &::= & \dlsint \alt {\dlsproctype t t} &\mbox{Types}\\
|
|
v &::=& n \alt \dlsabs{\x{}}{\e{}} &
|
|
\mbox{Untyped Values} \\
|
|
v_t &::=& n \alt \Tabs{\x{}}{t}{\e{t}} & \mbox{Typed Values} \\
|
|
%v_m &::=& n \alt \Tabs{\x{}}{t}{\e{m}} \alt \dlsabs{\x{}}{\e{m}} & \mbox{Mixed Values} \\
|
|
e & ::= & v \alt x \alt f \alt \comb{e}{e} \alt \ifz{e}{e}{e} &
|
|
\mbox{Untyped Expressions} \\
|
|
\e{t} & ::= & v_t \alt x \alt f \alt \comb{e_t}{e_t} \alt \ifz{e_t}{e_t}{e_t}
|
|
& \mbox{Typed Expressions} \\
|
|
%\e{m} & ::= & v_m \alt x \alt f \alt \comb{e_m}{e_m} \alt \ifz{e_m}{e_m}{e_m}
|
|
%& \mbox{Mixed Expressions} \\
|
|
\end{altgrammar}
|
|
$$
|
|
|
|
\caption{Migrated Language Syntax} \label{fig:syntax-mixed}
|
|
\end{figure}
|
|
|
|
|
|
\begin{figure}[tp]
|
|
$$
|
|
\begin{altgrammar}
|
|
c &::= & \dlsint \alt {\proccnt c c} \alt {\barrcnt c c} &
|
|
\hspace{-0.2in}\mbox{Contracts} \supset \mbox{Types}\\
|
|
l &::=& \Tabs{\x{}}{t}{\e{}}^f \alt \dlsabs{\x{}}{\e{}}^f \alt
|
|
\cast{\barrcnt{c}{c}}{f}{v}&
|
|
\hspace{-0.2in}\mbox{Abstractions} \\
|
|
v &::=& n^f \alt l &
|
|
\hspace{-0.2in}\mbox{Runtime Values}\\
|
|
e & ::= & v \alt x^f \alt f^f \alt \comb{e}{e}^f \alt \ifz{e}{e}{e}^f \alt \cast{c}{f}{e} &
|
|
\hspace{-0.2in}\mbox{Runtime Expressions} \\
|
|
E & ::= & [] \alt \comb{E}{e} \alt \comb{v}{E} \alt \ifz{E}{e}{e} \alt \cast{c}{f}{E}&
|
|
\hspace{-0.2in}\mbox{Contexts}\\
|
|
\end{altgrammar}
|
|
$$
|
|
\caption{Runtime Syntax} \label{fig:syntax-runtime}
|
|
\end{figure}
|
|
|
|
\newcommand{\dosub}[1]{[v^f/x^g]{#1}}
|
|
|
|
\begin{figure}
|
|
$$
|
|
\begin{array}{lclr}
|
|
\redrule{\comb{{\dlsabs{x}{e^f}}^g}{{v}^h}^i}{[v^h/x^g]e^f}{Subst} \\
|
|
\redrule{\comb{{\Tabs{x}{t}{e^f}}^g}{{v}^h}^i}{[v^h/x^g]e^f}{TypedSubst} \\
|
|
% \redrule{\comb{\Tabs{x}{t}{e}}{v}}{[v/x]e}{TypedSubst} \\
|
|
\redrule{{\comb{n^g}{v^h}}^f}{\blame{f}}{App-Error} \\
|
|
\redrule{\ifz{0^f}{e_1^g}{e_2^h}^i}{e_1^g}{if0-True} \\
|
|
\redrule{\ifz{v^f}{e_1^g}{e_2^h}^i}{e_2^h}{if0-False} \\
|
|
\redrule{\cast{\dlsint}{g}{n^f}}{n^f}{Int-Int} \\
|
|
\redrule{\cast{\dlsint}{g}{l^f}}{\blame{g}}{Int-Lam} \\
|
|
% \redrule{\cast{\any}{g}{v}}{v}{Any} \\
|
|
\redrule{\cast{\proccnt{c_1}{c_2}}{g}{n^f}}{\blame g}{Lam-Int} \\
|
|
% \redrule{\cast{\orcnt{\proccnt{c_1}{c_2}}{c}}{g}{n}}{\cast{c}{g}{n}}{Lam-IntOr} \\
|
|
\redruletwoline{\cast{\proccnt{c_1}{c_2}}{g}{l^f}}{\cast{\barrcnt{c_1}{c_2}}{g}{l^f}}{Lam-Lam} \\
|
|
% \redrule{\cast{\orcnt{\proccnt{c_1}{c_2}}{c}}{g}{v}}{\cast{\barrcnt{c_1}{c_2}}{g}{v}}{Lam-LamOr} \\
|
|
\redruletwoline{{\comb{\cast{\barrcnt{c_1}{c_2}}{g}{l^h}}{w^i}}^f}
|
|
{\cast{c_2}{g}{{\comb{l^h}{\cast{c_1}{f}{w^i}}}^f}}{Split} \\
|
|
% here begin modules
|
|
\redruletwoline{\ldots\modu{f}{v^f}\ldots\context{f^g}}{\ldots\modu{f}{v^f}\ldots\context{v^f}}{Lookup} \\
|
|
% \redruletwoline{\ldots\mc{f}{c}{v}\ldots\context{f}}{\ldots\mc{f}{c}{v}\ldots\context{\cast{c}{f}{v}}}{Lookup-Contract} \\
|
|
\redruletwoline{\ldots\mt{f}{t}{v^f}\ldots\context{f^g}\mbox{\ where $g \neq f$\hspace*{-0.1in}}}{\ldots\mt{f}{t}{v^f}\ldots\context{\cast{t}{g}{v^f}}}{Lookup-Type} \\
|
|
\redruletwoline{\ldots\mt{f}{t}{v^f}\ldots\context{f^f}}{\ldots\mt{f}{t}{v^f}\ldots\context{v^f}}{Lookup-Type-Self} \\
|
|
\end{array}
|
|
$$
|
|
$$
|
|
\begin{array}{lcl}
|
|
\dosub{x^g} & = &v^f\\
|
|
\dosub{\comb{e_1^{h_1}}{ e_2^{h_2}}}^i & = & ([v^f/x^g]e_1^{h_1} [v^f/x^g]e_2^{h_2})^i\\
|
|
\dosub{\ifz{e_1^{h_1}}{ e_2^{h_2}}{ e_3^{h_3}}}^i & = &
|
|
\ifz{\dosub{e_1^{h_1}}}{\dosub{e_2^{h_2}}}{\dosub{e_3^{h_3}}}^i\\
|
|
\dosub{\cast{t}{h}{e^{i}}} & = & \cast{t}{h}{\dosub{e^i}}\\
|
|
\dosub{\dlsabs{y}{e^{h_1}}^{h_2}} & = & {\dlsabs{y}{\dosub{e^{h_1}}}^{h_2}}\\
|
|
\dosub{\Tabs{y}{t}{e^{h_1}}^{h_2}} & = & {\Tabs{y}{t}{\dosub{e^{h_1}}}^{h_2}}\\
|
|
\end{array}
|
|
$$
|
|
\caption{Reduction Rules} \label{fig:dls-reduction}
|
|
\end{figure}
|
|
|
|
Given an untyped modular program, the first step of interlanguage
|
|
migration is to turn one module into a typed module. We assume that this
|
|
step simply adds types to all binding positions of the module, including
|
|
the module exports. While porting to Typed Scheme is not always this
|
|
easy, the additional changes sometimes required do not affect the
|
|
modeling process.
|
|
% Naturally, we don't believe that this is possible in
|
|
% reality. Instead, we expect that programmers rewrite their modules in a
|
|
% meaning-preserving manner---possibly using software analysis tools---so
|
|
% that they can add type declarations where needed. Still, because the goal
|
|
% of migration is to change the program without changing its meaning and our
|
|
% goal is to understand the effect of the overall process, we consider it
|
|
% justified for a first framework to simplify this step as much as possible.
|
|
|
|
After the chosen module has been rewritten in the typed version of the
|
|
language, we need to check the types and infer from them how the typed
|
|
module is going to interact with the others, which remain untyped.
|
|
Consider the following simple program:
|
|
\begin{exmp}
|
|
\begin{schemedisplay}
|
|
(module f (int -> int) (lambda ([x : int]) (g x)))
|
|
(module g 999)
|
|
(f 5)
|
|
\end{schemedisplay}
|
|
\label{ex:dlsone}
|
|
\end{exmp}
|
|
|
|
It consists of two modules: the first is presumably a module that has
|
|
been rewritten in the typed language, the second one is still in the
|
|
untyped language. Also, the first one exports a function from integers
|
|
to integers; the second one exports a simple integer.
|
|
|
|
If we were to evaluate this program as is, it would eventually attempt
|
|
to apply \scheme|999| to \scheme|5| via the application $\comb{g}{x}$ in the
|
|
typed module. In other words, the typed portion of the program would
|
|
trigger a run-time error, which, assuming proper source tracking,
|
|
would tell the programmer that the typed module went wrong. We
|
|
model this source tracking with labels on each expression in the
|
|
original program, which indicate the module the expression comes
|
|
from.
|
|
|
|
A different view of the problem is that when one module changes, the
|
|
rest of the program has to play by new rules, too. In this case, the
|
|
very fact that the export from \scheme|g|, the second module, is used as
|
|
a function in the typed module establishes an agreement between the
|
|
two modules. This agreement, however, is informal (and unuttered) and
|
|
is neither checked nor monitored during run-time. The evaluation
|
|
therefore results in a run-time
|
|
error seemingly due to the typed module.
|
|
|
|
Thus our first lesson is that informal agreements don't play well with the
|
|
goal of introducing types. To reap the benefits of types, we must not
|
|
only have agreements, we must state them and enforce them. This line
|
|
of reasoning naturally suggests the use of behavioral contracts in the
|
|
spirit of \citet{ff:ho-contracts}. More precisely, we assume that an
|
|
interlanguage migration process may add contracts at any point where
|
|
untyped and typed modules interact.\footnote{In the implementation of
|
|
Typed Scheme, these contracts are inserted at the point where the
|
|
untyped module is \emph{required}, as described in
|
|
chapter~\ref{sect:type-multi}.}
|
|
|
|
For our running example we would expect that migration changes the program
|
|
as follows:
|
|
\begin{exmp}
|
|
\begin{schemedisplay}
|
|
(module f (int -> int)
|
|
(lambda ([x : int])
|
|
({(integer? -> integer?) <=$^{\scheme|g|}$ g} x)))
|
|
(module g 999)
|
|
(f 5)
|
|
\end{schemedisplay}
|
|
\label{ex:dlsone-res}
|
|
\end{exmp}
|
|
Put differently, we can infer from the types of the first module that
|
|
the second module must provide a function from integers to
|
|
integers. In our framework, we express this fact with a contract at
|
|
the module reference.
|
|
|
|
The example has two implications for interlanguage migration and
|
|
its formal model. First, the language must also include optional
|
|
contracts at module boundaries. Second, a type checker for the typed
|
|
variant of the language must not only enforce the rules of the type
|
|
system, it must also infer the contracts that these type annotations
|
|
imply for the remaining modules.
|
|
|
|
\label{sec:types-as-contracts}
|
|
Unfortunately, there are yet more problems. Consider this second
|
|
program, which applies an $\ii{}$ function to
|
|
\scheme|#f|, a boolean.
|
|
\begin{exmp}
|
|
\begin{schemedisplay}
|
|
(module h (int -> int)
|
|
(lambda ([y : int])
|
|
(let ((g (lambda ([x : int]) (+ x 10))))
|
|
(+ (g y) (g 10)))))
|
|
(h #f)
|
|
\end{schemedisplay}
|
|
\label{ex:hmod}
|
|
\end{exmp}
|
|
Since our evaluator ignores types, the boolean value
|
|
flows into the typed module without any objections, only to cause havoc
|
|
there. Again, the typed module appears to have gone wrong.
|
|
|
|
In this case, the solution is to interpret the types on the module exports
|
|
as contracts so that the evaluator monitors how the other modules use
|
|
functional exports from the typed module. For flat types such as $\dlsint$,
|
|
the values that flow into typed functions are checked immediately; for
|
|
functional values, the contracts are distributed over the domain and range
|
|
of the function until flat values show up
|
|
\cite{ff:ho-contracts}. Technically, the types become
|
|
contracts on external references to the module \scheme|h|, and are
|
|
interpreted as runtime checks, or casts, which specify the party to
|
|
be blamed if they fail. Considering the main expression from
|
|
example~\ref{ex:hmod}, with $l$ as the body of module \scheme|h|,
|
|
\begin{schemedisplay}
|
|
(h #f)$^{Main}$
|
|
\end{schemedisplay}
|
|
steps to
|
|
\begin{schemedisplay}
|
|
({(integer? -> integer?) <=$^{\scheme|h|}$ $l^h$} #f)$^{Main}$
|
|
\end{schemedisplay}
|
|
which steps to
|
|
\begin{schemedisplay}
|
|
{integer? <=$^{\scheme|h|}$ ($l^h$ {integer? <=$^{\scheme|Main|}$ #f})}
|
|
\end{schemedisplay}
|
|
At this point it has become clear that \scheme|#f| is a bad value, and the
|
|
evaluator can abort the execution blaming the main expression for supplying
|
|
bad values to the typed module.
|
|
|
|
The rest of the chapter presents a formal model of this migration
|
|
process. The focus of our presentation concerns the derivation of
|
|
contracts via type checking; the use of these contracts
|
|
to protect the typed module; and last but not least a theorem that proves that
|
|
typed modules can't go wrong in this setting.
|
|
|
|
\end{schemeregion}
|