\newbfop\dlsint{int} \def\ii{\dlsproctype{\dlsint}{\dlsint}} \begin{schemeregion} This chapter presents a formal account of the integration of typed and untyped code in a single program. For simplicity, we consider only a simple type system and module system. Further, we restrict the language to have only one typed module. \section{Relationship to Typed Scheme} % Of course, the implementation % of Typed Scheme works with the full type system as well as with % arbitrary combinations of typed and untyped modules. This chapter presents a formalism that admits only simple types. Further, each module may contain exactly one expression, and other modules are referred to implicitly. Finally, there may only be one typed module in the system. This formalism, initially designed and published prior to the implementation of Typed Scheme, is thus inadequate for direct use in Typed Scheme. Therefore, in the full Typed Scheme language, several changes are made. First, a program may consist of an arbitrary combination of arbitrarily many typed and untyped modules. Second, each module explicitly specifies which modules it depends upon. Third, and most significantly, the importing typed module must specify the types to be assigned to any imports from untyped modules. The implemented system has a number of advantages. First, the type specifications for imports provide checked documentation of the expected behavior of the untyped code. Second, as in the wrapper modules presented in this chapter, they provide a localized target of blame. Therefore, when the untyped module fails to live up to the contract assigned to it, instead of blaming the untyped module, which may not be at fault, the type specification is blamed instead. Third, it does not require the complex contract inference process from this chapter for determining the desired type of imported untyped identifiers. Of course, the crucial result of this chapter, theorem~\ref{thm:dlsmain}, is still applicable to the resulting system, despite these changes. Developing a formalism that more accurately captures the design of the Typed Scheme implementation remains future work. \section{An Informal Tour} To make things simple, we assume that a program is a sequence of modules followed by a ``main expression.'' The evaluation of such a program starts with the main expression. Evaluation proceeds as usual in an expression-oriented language. When the evaluation process encounters a reference to a module, it imports the appropriate expression from there and evaluates it. To keep things as simple as possible, we work with a typed variant of the untyped language where all binding positions come with type declarations. In short, we migrate to an explicitly typed language that is otherwise syntactically and semantically equivalent to the untyped one. In a program that mixes typed and untyped modules, evaluation proceeds as before. This implies that the values of the typed language are those of the untyped language (and vice versa). Figures~\ref{fig:syntax-untyped}, \ref{fig:syntax-mixed}, \ref{fig:syntax-runtime}, and~\ref{fig:dls-reduction} present the formal syntax and semantics of our model, though this section liberally adds features to provide intuition; section~\ref{sec:dls-formal} explains these figures in detail. \begin{figure}[t] $$ \begin{altgrammar} P & ::= & \overvec{M}\ \e{} & \mbox{Programs} \\ M & ::= & \modu{f}{v} & \mbox{Modules} \\ e & ::= & n \alt \dlsabs{\x{}}{\e{}} \alt x \alt f \alt \comb{e}{e} \alt \ifz{e}{e}{e} & \mbox{Source Expressions} \\ \end{altgrammar} $$ \caption{Scripting Language Syntax} \label{fig:syntax-untyped} \end{figure} \begin{figure}[t] $$ \begin{altgrammar} P & ::= & \overvec{M}\ \e{} & \mbox{Programs} \\ M & ::= & M_u \alt M_t & \mbox{Modules} \\ M_u & ::= & \modu{f}{v} & \mbox{Untyped Modules} \\ % M_c & ::= & \mc{f}{c}{v} & \mbox{Contracted Modules} \\ M_t & ::= & \mt{f}{t}{v_t} & \mbox{Typed Modules} \\ t &::= & \dlsint \alt {\dlsproctype t t} &\mbox{Types}\\ v &::=& n \alt \dlsabs{\x{}}{\e{}} & \mbox{Untyped Values} \\ v_t &::=& n \alt \Tabs{\x{}}{t}{\e{t}} & \mbox{Typed Values} \\ %v_m &::=& n \alt \Tabs{\x{}}{t}{\e{m}} \alt \dlsabs{\x{}}{\e{m}} & \mbox{Mixed Values} \\ e & ::= & v \alt x \alt f \alt \comb{e}{e} \alt \ifz{e}{e}{e} & \mbox{Untyped Expressions} \\ \e{t} & ::= & v_t \alt x \alt f \alt \comb{e_t}{e_t} \alt \ifz{e_t}{e_t}{e_t} & \mbox{Typed Expressions} \\ %\e{m} & ::= & v_m \alt x \alt f \alt \comb{e_m}{e_m} \alt \ifz{e_m}{e_m}{e_m} %& \mbox{Mixed Expressions} \\ \end{altgrammar} $$ \caption{Migrated Language Syntax} \label{fig:syntax-mixed} \end{figure} \begin{figure}[tp] $$ \begin{altgrammar} c &::= & \dlsint \alt {\proccnt c c} \alt {\barrcnt c c} & \hspace{-0.2in}\mbox{Contracts} \supset \mbox{Types}\\ l &::=& \Tabs{\x{}}{t}{\e{}}^f \alt \dlsabs{\x{}}{\e{}}^f \alt \cast{\barrcnt{c}{c}}{f}{v}& \hspace{-0.2in}\mbox{Abstractions} \\ v &::=& n^f \alt l & \hspace{-0.2in}\mbox{Runtime Values}\\ e & ::= & v \alt x^f \alt f^f \alt \comb{e}{e}^f \alt \ifz{e}{e}{e}^f \alt \cast{c}{f}{e} & \hspace{-0.2in}\mbox{Runtime Expressions} \\ E & ::= & [] \alt \comb{E}{e} \alt \comb{v}{E} \alt \ifz{E}{e}{e} \alt \cast{c}{f}{E}& \hspace{-0.2in}\mbox{Contexts}\\ \end{altgrammar} $$ \caption{Runtime Syntax} \label{fig:syntax-runtime} \end{figure} \newcommand{\dosub}[1]{[v^f/x^g]{#1}} \begin{figure} $$ \begin{array}{lclr} \redrule{\comb{{\dlsabs{x}{e^f}}^g}{{v}^h}^i}{[v^h/x^g]e^f}{Subst} \\ \redrule{\comb{{\Tabs{x}{t}{e^f}}^g}{{v}^h}^i}{[v^h/x^g]e^f}{TypedSubst} \\ % \redrule{\comb{\Tabs{x}{t}{e}}{v}}{[v/x]e}{TypedSubst} \\ \redrule{{\comb{n^g}{v^h}}^f}{\blame{f}}{App-Error} \\ \redrule{\ifz{0^f}{e_1^g}{e_2^h}^i}{e_1^g}{if0-True} \\ \redrule{\ifz{v^f}{e_1^g}{e_2^h}^i}{e_2^h}{if0-False} \\ \redrule{\cast{\dlsint}{g}{n^f}}{n^f}{Int-Int} \\ \redrule{\cast{\dlsint}{g}{l^f}}{\blame{g}}{Int-Lam} \\ % \redrule{\cast{\any}{g}{v}}{v}{Any} \\ \redrule{\cast{\proccnt{c_1}{c_2}}{g}{n^f}}{\blame g}{Lam-Int} \\ % \redrule{\cast{\orcnt{\proccnt{c_1}{c_2}}{c}}{g}{n}}{\cast{c}{g}{n}}{Lam-IntOr} \\ \redruletwoline{\cast{\proccnt{c_1}{c_2}}{g}{l^f}}{\cast{\barrcnt{c_1}{c_2}}{g}{l^f}}{Lam-Lam} \\ % \redrule{\cast{\orcnt{\proccnt{c_1}{c_2}}{c}}{g}{v}}{\cast{\barrcnt{c_1}{c_2}}{g}{v}}{Lam-LamOr} \\ \redruletwoline{{\comb{\cast{\barrcnt{c_1}{c_2}}{g}{l^h}}{w^i}}^f} {\cast{c_2}{g}{{\comb{l^h}{\cast{c_1}{f}{w^i}}}^f}}{Split} \\ % here begin modules \redruletwoline{\ldots\modu{f}{v^f}\ldots\context{f^g}}{\ldots\modu{f}{v^f}\ldots\context{v^f}}{Lookup} \\ % \redruletwoline{\ldots\mc{f}{c}{v}\ldots\context{f}}{\ldots\mc{f}{c}{v}\ldots\context{\cast{c}{f}{v}}}{Lookup-Contract} \\ \redruletwoline{\ldots\mt{f}{t}{v^f}\ldots\context{f^g}\mbox{\ where $g \neq f$\hspace*{-0.1in}}}{\ldots\mt{f}{t}{v^f}\ldots\context{\cast{t}{g}{v^f}}}{Lookup-Type} \\ \redruletwoline{\ldots\mt{f}{t}{v^f}\ldots\context{f^f}}{\ldots\mt{f}{t}{v^f}\ldots\context{v^f}}{Lookup-Type-Self} \\ \end{array} $$ $$ \begin{array}{lcl} \dosub{x^g} & = &v^f\\ \dosub{\comb{e_1^{h_1}}{ e_2^{h_2}}}^i & = & ([v^f/x^g]e_1^{h_1} [v^f/x^g]e_2^{h_2})^i\\ \dosub{\ifz{e_1^{h_1}}{ e_2^{h_2}}{ e_3^{h_3}}}^i & = & \ifz{\dosub{e_1^{h_1}}}{\dosub{e_2^{h_2}}}{\dosub{e_3^{h_3}}}^i\\ \dosub{\cast{t}{h}{e^{i}}} & = & \cast{t}{h}{\dosub{e^i}}\\ \dosub{\dlsabs{y}{e^{h_1}}^{h_2}} & = & {\dlsabs{y}{\dosub{e^{h_1}}}^{h_2}}\\ \dosub{\Tabs{y}{t}{e^{h_1}}^{h_2}} & = & {\Tabs{y}{t}{\dosub{e^{h_1}}}^{h_2}}\\ \end{array} $$ \caption{Reduction Rules} \label{fig:dls-reduction} \end{figure} Given an untyped modular program, the first step of interlanguage migration is to turn one module into a typed module. We assume that this step simply adds types to all binding positions of the module, including the module exports. While porting to Typed Scheme is not always this easy, the additional changes sometimes required do not affect the modeling process. % Naturally, we don't believe that this is possible in % reality. Instead, we expect that programmers rewrite their modules in a % meaning-preserving manner---possibly using software analysis tools---so % that they can add type declarations where needed. Still, because the goal % of migration is to change the program without changing its meaning and our % goal is to understand the effect of the overall process, we consider it % justified for a first framework to simplify this step as much as possible. After the chosen module has been rewritten in the typed version of the language, we need to check the types and infer from them how the typed module is going to interact with the others, which remain untyped. Consider the following simple program: \begin{exmp} \begin{schemedisplay} (module f (int -> int) (lambda ([x : int]) (g x))) (module g 999) (f 5) \end{schemedisplay} \label{ex:dlsone} \end{exmp} It consists of two modules: the first is presumably a module that has been rewritten in the typed language, the second one is still in the untyped language. Also, the first one exports a function from integers to integers; the second one exports a simple integer. If we were to evaluate this program as is, it would eventually attempt to apply \scheme|999| to \scheme|5| via the application $\comb{g}{x}$ in the typed module. In other words, the typed portion of the program would trigger a run-time error, which, assuming proper source tracking, would tell the programmer that the typed module went wrong. We model this source tracking with labels on each expression in the original program, which indicate the module the expression comes from. A different view of the problem is that when one module changes, the rest of the program has to play by new rules, too. In this case, the very fact that the export from \scheme|g|, the second module, is used as a function in the typed module establishes an agreement between the two modules. This agreement, however, is informal (and unuttered) and is neither checked nor monitored during run-time. The evaluation therefore results in a run-time error seemingly due to the typed module. Thus our first lesson is that informal agreements don't play well with the goal of introducing types. To reap the benefits of types, we must not only have agreements, we must state them and enforce them. This line of reasoning naturally suggests the use of behavioral contracts in the spirit of \citet{ff:ho-contracts}. More precisely, we assume that an interlanguage migration process may add contracts at any point where untyped and typed modules interact.\footnote{In the implementation of Typed Scheme, these contracts are inserted at the point where the untyped module is \emph{required}, as described in chapter~\ref{sect:type-multi}.} For our running example we would expect that migration changes the program as follows: \begin{exmp} \begin{schemedisplay} (module f (int -> int) (lambda ([x : int]) ({(integer? -> integer?) <=$^{\scheme|g|}$ g} x))) (module g 999) (f 5) \end{schemedisplay} \label{ex:dlsone-res} \end{exmp} Put differently, we can infer from the types of the first module that the second module must provide a function from integers to integers. In our framework, we express this fact with a contract at the module reference. The example has two implications for interlanguage migration and its formal model. First, the language must also include optional contracts at module boundaries. Second, a type checker for the typed variant of the language must not only enforce the rules of the type system, it must also infer the contracts that these type annotations imply for the remaining modules. \label{sec:types-as-contracts} Unfortunately, there are yet more problems. Consider this second program, which applies an $\ii{}$ function to \scheme|#f|, a boolean. \begin{exmp} \begin{schemedisplay} (module h (int -> int) (lambda ([y : int]) (let ((g (lambda ([x : int]) (+ x 10)))) (+ (g y) (g 10))))) (h #f) \end{schemedisplay} \label{ex:hmod} \end{exmp} Since our evaluator ignores types, the boolean value flows into the typed module without any objections, only to cause havoc there. Again, the typed module appears to have gone wrong. In this case, the solution is to interpret the types on the module exports as contracts so that the evaluator monitors how the other modules use functional exports from the typed module. For flat types such as $\dlsint$, the values that flow into typed functions are checked immediately; for functional values, the contracts are distributed over the domain and range of the function until flat values show up \cite{ff:ho-contracts}. Technically, the types become contracts on external references to the module \scheme|h|, and are interpreted as runtime checks, or casts, which specify the party to be blamed if they fail. Considering the main expression from example~\ref{ex:hmod}, with $l$ as the body of module \scheme|h|, \begin{schemedisplay} (h #f)$^{Main}$ \end{schemedisplay} steps to \begin{schemedisplay} ({(integer? -> integer?) <=$^{\scheme|h|}$ $l^h$} #f)$^{Main}$ \end{schemedisplay} which steps to \begin{schemedisplay} {integer? <=$^{\scheme|h|}$ ($l^h$ {integer? <=$^{\scheme|Main|}$ #f})} \end{schemedisplay} At this point it has become clear that \scheme|#f| is a bad value, and the evaluator can abort the execution blaming the main expression for supplying bad values to the typed module. The rest of the chapter presents a formal model of this migration process. The focus of our presentation concerns the derivation of contracts via type checking; the use of these contracts to protect the typed module; and last but not least a theorem that proves that typed modules can't go wrong in this setting. \end{schemeregion}