233 lines
11 KiB
TeX
233 lines
11 KiB
TeX
\begin{schemeregion}
|
|
\section{The Formal Framework}\label{sec:dls-formal}
|
|
|
|
This section formally describes the interlanguage migration framework.
|
|
\subsection{Syntax}
|
|
|
|
Our scripting language is a simplified version of the language of
|
|
\citet{mff:popl06}, augmented with types and typed
|
|
modules. It consists of the lambda calculus enriched with numeric
|
|
constants and a conditional, as well as casts, modules, and
|
|
contracts. The initial syntax used in the original program is specified in
|
|
figure \ref{fig:syntax-untyped}. After the migration step, the syntax
|
|
is more complicated: see figure~\ref{fig:syntax-mixed}. The runtime
|
|
system collapses some distinctions and adds casts, which are specified
|
|
in figure \ref{fig:syntax-runtime}.
|
|
|
|
\noindent {\bf Modules} Our language has a simple first-order module system,
|
|
in which each module consists of a name and a value. The module
|
|
exports its value via its name. Modules can also be \emph{typed}.
|
|
Typed modules have a top-level type, and contain only typed
|
|
values, $v_t$ in figure~\ref{fig:syntax-mixed}.
|
|
|
|
\noindent {\bf Contracts and Casts} The contracts allow the base $\dlsint$
|
|
contract, as well as function contracts. Function
|
|
contracts have the Findler--Felleisen semantics \cite{ff:ho-contracts}.
|
|
At run-time, contracts turn into checks, which we express with casts.
|
|
Syntactically, a cast $\cast {c} {e} {m}$ combines a contract with an
|
|
expression and a label for a module, which it blames for the contract
|
|
violation if the check fails.
|
|
|
|
Casts are not part of the source language, but the state space of the
|
|
transformation and runtime system.
|
|
|
|
\noindent {\bf Types} The types of the typed fragment of the language are just
|
|
the base type $\dlsint$ and function types. Importantly, every type is
|
|
syntactically also a contract.
|
|
|
|
\noindent {\bf Expressions} The language contains three kinds of
|
|
expressions: typed, untyped, and mixed. Typed expressions occur only
|
|
in the typed module. Untyped expressions occur in all other modules.
|
|
Mixed expressions can contain typed and untyped subexpressions and
|
|
appear only in the main expression during reduction. The main
|
|
expression is initially an untyped closed expression, making it also a
|
|
mixed expression.
|
|
|
|
\subsection{Semantics}
|
|
|
|
For the dynamic semantics, we assume that every expression (including
|
|
values) in the original program has been labeled with the name of its
|
|
original source module. The syntax of this annotation is given in
|
|
figure~\ref{fig:syntax-runtime}.
|
|
Contracts and casts, which do not appear in the
|
|
original program, do not receive labels. The main expression is
|
|
labeled with \scheme|Main|, assumed not to be the name of any module.
|
|
This labeling is necessary for appropriate
|
|
blame assignment when a dynamic error occurs. It corresponds to an
|
|
annotation pass for source location tracking. For clarity, we omit
|
|
these labels wherever they are not needed.
|
|
|
|
The dynamic semantics is defined in figure \ref{fig:dls-reduction} as a
|
|
reduction semantics, and
|
|
again follows \citet{mff:popl06}, with additions for types and
|
|
typed modules.
|
|
Reduction takes place in the context of modules, which are
|
|
not altered during reduction. The module context is used only in the
|
|
{\sc Lookup} rules. The relation $\rightarrow$ is the
|
|
one-step (standard) reduction relation, with $\dlsreduces$ as its reflexive,
|
|
transitive closure, using the set of evaluation contexts defined in
|
|
figure~\ref{fig:syntax-runtime}.
|
|
|
|
|
|
Rules that do not refer explicitly to the context are implicitly wrapped in
|
|
$E[-]$ on both sides, with the exception of the rules that reduce to
|
|
$\blame{f}$, which discard the evaluation context. Rules are read in
|
|
order, from top to bottom.
|
|
|
|
The reduction rules fall into the following categories:
|
|
|
|
\begin{itemize}
|
|
\item The rules that look up module references all refer explicitly to
|
|
the module context. The {\sc Lookup} rule refers to untyped
|
|
modules, and simply substitutes the body of the module for the
|
|
reference. The {\sc Lookup-Contract} and {\sc Lookup-Type} rules
|
|
retrieve the appropriate expression and wrap it in a contract. The
|
|
contract wrapped around typed module bodies is necessary so that typed
|
|
expressions are never used in incorrect ways, even when the untyped
|
|
modules refer to the typed module. This check is not necessary when
|
|
the typed module refers to itself, and is thus omitted in the {\sc
|
|
Lookup-Type-Self} rule~\cite{ff:ho-contracts}.
|
|
\item
|
|
The rules for the core are straightforward.
|
|
These include {\sc Subst} and {\sc TypedSubst}, which perform
|
|
$\beta_v$-reduction on untyped and typed abstractions, respectively.
|
|
{\sc IF0-True} and {\sc IF0-False} are also simple.
|
|
\item
|
|
{\sc App-Error} is the one runtime error that does not involve a
|
|
contract or a cast. If a number is in the application position of
|
|
an application, clearly the invariants of the language have been
|
|
violated. We blame the source of the application for this error.
|
|
|
|
The remaining rules handle contracts and casts.
|
|
\item {\sc Int-Int} passes numbers through $\dlsint$ contracts unchanged.
|
|
|
|
\item {\sc Int-Lam} and {\sc Lam-Int} represent contract failures and
|
|
blame the appropriate party, as labeled on the cast. $v^f$ in the
|
|
{\sc Int-Lam} rule cannot be an integer, otherwise {\sc Int-Int}
|
|
would have applied.
|
|
|
|
\item {\sc Lam-Lam} blesses an arrow contract applied to an
|
|
abstraction, turning it into a blessed arrow contract. In contrast
|
|
to {\sc Int-Int}, we must keep the contract around for later use.
|
|
The resulting expression, while still a cast, is also a syntactic
|
|
value. The value must be an abstraction (or blessed cast applied to
|
|
an abstraction), all numbers match the {\sc Lam-Int} rule.
|
|
|
|
\item The {\sc Split} rule breaks a blessed arrow
|
|
contract into its positive and negative halves, and places them around
|
|
the argument and the entire application. The creation of two new
|
|
casts, with appropriate blame assignment, is the key to proper
|
|
contract checking for higher-order functions~\cite{ff:ho-contracts}.
|
|
The blame for the negative position is taken from the label of the
|
|
application itself, and the blame for the positive position is taken
|
|
from the label of the cast.
|
|
|
|
\end{itemize}
|
|
|
|
\renewcommand\tau{a}
|
|
\newmeta\tty{t}
|
|
|
|
\subsection{Adding Type Declarations}
|
|
|
|
The first step in interlanguage migration requires
|
|
the programmer to change one module from the untyped language to the
|
|
typed one. In our system, this involves adding types to every
|
|
variable binding
|
|
and to the module export as a whole.
|
|
|
|
Once this module is annotated, the new program is referred to as
|
|
$P^\tau$, where $\tau$ is the name of the now-typed module.
|
|
|
|
Since the simply-typed $\lambda$-calculus has a straightforward
|
|
type-soundness theorem, we might expect a similar one to hold for
|
|
migrated programs,
|
|
provided the type annotations are self-consistent. Sadly, this is not the case.
|
|
For example, the typed module named $\tau$ might refer to some
|
|
other module, which could provide an arbitrary value or raise a
|
|
run-time error. The other modules may contain outright errors, such
|
|
as $\comb{3}{4}$, as well as untypeable expressions such as
|
|
$\dlsabs{x}{\ifz{x}{1}{\dlsabs{y}{y}}}$; we do not rule these out, since the
|
|
programmer is only adding types to one module. Furthermore, since we
|
|
do not typecheck the other modules, they may use the typed module in
|
|
ways that do not accord with its type. Because of these
|
|
possibilities, the migration process must protect the typed module
|
|
from its untyped brethren.
|
|
|
|
\subsection{Inferring Contracts from Types}
|
|
|
|
In order to protect the typed module when it refers to untyped ones,
|
|
we apply a transformation to the program that expresses the implicit
|
|
agreements between the typed and untyped modules as contracts.
|
|
Our transformation
|
|
examines the references to other modules in the typed module and from these uses
|
|
guesses contracts that become obligations of those other modules. For
|
|
example, in the following program, the context makes it obvious that
|
|
$g$ must have type $\ii$:
|
|
\begin{schemedisplay}
|
|
(module f int (g 5))
|
|
\end{schemedisplay}
|
|
The transformation would therefore add a cast using the contract $\ii$ to the
|
|
reference to $g$ (no other casts make the module type at \scheme|int|).
|
|
|
|
\input{dls-figure}
|
|
|
|
The type system in figure~\ref{fig:trivial-transform} formalizes this
|
|
intuition. Its rules define the judgement
|
|
$$\ttgproves{e_t}{t}{e_t'}$$ which states that given program $P$ in type environment
|
|
$\Gamma$, expression $e_t$ has type $t$ and is transformed to expression
|
|
$e_t'$.
|
|
The rules are similar to those of the simply typed
|
|
$\lambda$-calculus, propagating and combining the
|
|
types and transformed expressions from their constituents, except
|
|
for the module variable reference rules: {\sc T-ModVarSelf} and
|
|
{\sc{T-ModVar}}. The former checks references to the typed
|
|
module itself. The latter checks a reference to an untyped module.
|
|
This module can be given any type consistent with the type of the
|
|
surrounding typed expression, but that type is converted to a contract
|
|
and a cast is added to check that the contract is upheld.
|
|
|
|
Because of the non-determinism of the {\sc{T-ModVar}} rule, these rules do not
|
|
naturally map onto a syntax-directed type checker.
|
|
They do define a logic program, however, which potentially produces
|
|
many solutions, each satisfying the desired type, and including a
|
|
matching set of inserted contracts for checking the other modules in
|
|
the program. For example, consider the following program:
|
|
|
|
\begin{schemedisplay}
|
|
(module f int (g h))
|
|
\end{schemedisplay}
|
|
|
|
There are many possible sets of contracts that could be generated by
|
|
our system. A simple one requires that $h$ be an $\dlsint$ and that $g$
|
|
have the contract $\ii$. Of course, there are infinitely many
|
|
possibilities. While a real system for migration would use
|
|
programmer input and static analysis to choose one solution, our
|
|
soundness theorem holds for all of them.
|
|
|
|
|
|
\subsection{Summary of the transformation}
|
|
|
|
The transformation we have just described is $MT$, for
|
|
migration transformation.
|
|
$MT(P)$ is the set of possible results of transforming $P$ as follows:
|
|
\begin{enumerate}
|
|
\item The programmer chooses one module $M = \modu{a}{e}$ named $a$ from $P$
|
|
and adds types to this module, so that it is now a typed module
|
|
according to the grammar of figure~\ref{fig:syntax-mixed}, producing
|
|
the new module
|
|
$M' = \mt{a}{t}{e'}$.
|
|
\item Using the type system described in figure
|
|
\ref{fig:trivial-transform}, transform the body of $M'$ into $M''$.
|
|
Formally, if $\ttproves{\emptyset}{e'}{\t{}}{e''}$, then $M'' = \mt{a}{t}{e''}$.
|
|
\end{enumerate}
|
|
\noindent
|
|
%The new program, which is $P$ with $M$ replaced by $M''$, will be
|
|
%called $P^a$.
|
|
|
|
Since step 2 in this process is nondeterministic, the set $MT(P)$ may
|
|
contain many distinct elements. We prove in the next section that
|
|
every element $Q$ of this set is sound with respect to the type
|
|
system and blame assignment.
|
|
|
|
\end{schemeregion}
|