samth-dissertation/dls-formalism.tex

\begin{schemeregion}
\section{The Formal Framework}\label{sec:dls-formal}

This section formally describes the interlanguage migration framework.
\subsection{Syntax}

Our scripting language is a simplified version of the language of
\citet{mff:popl06}, augmented with types and typed
modules.  It consists of the lambda calculus enriched with numeric
constants and a conditional, as well as casts, modules, and
contracts. The initial syntax used in the original program is specified in
figure \ref{fig:syntax-untyped}.  After the migration step, the syntax
is more complicated: see figure~\ref{fig:syntax-mixed}.  The runtime
system collapses some distinctions and adds casts, which are specified
in figure \ref{fig:syntax-runtime}.

\noindent {\bf Modules} Our language has a simple first-order module system,
in which each module consists of a name and a value. The module
exports its value via its name. Modules can also be \emph{typed}.
Typed modules have a top-level type, and contain only typed
values, $v_t$ in figure~\ref{fig:syntax-mixed}.

\noindent {\bf Contracts and Casts} The contracts allow the base $\dlsint$
contract, as well as function contracts.  Function
contracts have the Findler--Felleisen semantics \cite{ff:ho-contracts}.
At run-time, contracts turn into checks, which we express with casts.
Syntactically, a cast $\cast {c} {e} {m}$ combines a contract with an
expression and a label for a module, which it blames for the contract
violation if the check fails.

Casts are not part of the source language, but the state space of the
transformation and runtime system.

\noindent {\bf Types} The types of the typed fragment of the language are just
the base type $\dlsint$ and function types. Importantly, every type is
syntactically also a contract.

\noindent {\bf Expressions} The language contains three kinds of
expressions: typed, untyped, and mixed.  Typed expressions occur only
in the typed module.  Untyped expressions occur in all other modules.
Mixed expressions can contain typed and untyped subexpressions and
appear only in the main expression during reduction.  The main
expression is initially an untyped closed expression, making it also a
mixed expression.

\subsection{Semantics}

For the dynamic semantics, we assume that every expression (including
values) in the original program has been labeled with the name of its
original source module. The syntax of this annotation is given in
figure~\ref{fig:syntax-runtime}.
 Contracts and casts, which do not appear in the
original program, do not receive labels.  The main expression is
labeled with \scheme|Main|, assumed not to be the name of any module.
  This labeling is necessary for appropriate
blame assignment when a dynamic error occurs. It corresponds to an
annotation pass for source location tracking.  For clarity, we omit
these labels wherever they are not needed.

The dynamic semantics is defined in figure \ref{fig:dls-reduction} as a
reduction semantics, and
again follows \citet{mff:popl06}, with additions for types and
typed modules.
  Reduction takes place in the context of modules, which are
not altered during reduction.  The module context is used only in the
{\sc Lookup} rules.  The relation $\rightarrow$ is the
one-step (standard) reduction relation, with $\dlsreduces$ as its reflexive,
transitive closure, using the set of evaluation contexts defined in
figure~\ref{fig:syntax-runtime}.


Rules that do not refer explicitly to the context are implicitly wrapped in
$E[-]$ on both sides, with the exception of the rules that reduce to
$\blame{f}$, which discard the evaluation context.  Rules are read in
order, from top to bottom.

The reduction rules fall into the following categories:

\begin{itemize}
\item The rules that look up module references all refer explicitly to
  the module context. The {\sc Lookup} rule refers to untyped
  modules, and simply substitutes the body of the module for the
  reference.  The {\sc Lookup-Contract} and {\sc Lookup-Type} rules
  retrieve the appropriate expression and wrap it in a contract.  The
  contract wrapped around typed module bodies is necessary so that typed
  expressions are never used in incorrect ways, even when the untyped
  modules refer to the typed module.  This check is not necessary when
  the typed module refers to itself, and is thus omitted in the {\sc
    Lookup-Type-Self} rule~\cite{ff:ho-contracts}.
\item
  The rules for the core are straightforward.
  These include {\sc Subst} and {\sc TypedSubst}, which perform
  $\beta_v$-reduction on untyped and typed abstractions, respectively.
  {\sc IF0-True} and {\sc IF0-False} are also simple.
\item
  {\sc App-Error} is the one runtime error that does not involve a
  contract or a cast.  If a number is in the application position of
  an application, clearly the invariants of the language have been
  violated.  We blame the source of the application for this error.

The remaining rules handle contracts and casts.
\item  {\sc Int-Int} passes numbers through $\dlsint$ contracts unchanged.

\item {\sc Int-Lam} and {\sc Lam-Int} represent contract failures and
  blame the appropriate party, as labeled on the cast.  $v^f$ in the
  {\sc Int-Lam} rule cannot be an integer, otherwise {\sc Int-Int}
  would have applied.

\item {\sc Lam-Lam} blesses an arrow contract applied to an
  abstraction, turning it into a blessed arrow contract.  In contrast
  to {\sc Int-Int}, we must keep the contract around for later use.
  The resulting expression, while still a cast, is also a syntactic
  value.  The value must be an abstraction (or blessed cast applied to
  an abstraction), all numbers match the {\sc Lam-Int} rule.

\item The {\sc Split} rule breaks a blessed arrow
  contract into its positive and negative halves, and places them around
  the argument and the entire application.  The creation of two new
  casts, with appropriate blame assignment, is the key to proper
  contract checking for higher-order functions~\cite{ff:ho-contracts}.
  The blame for the negative position is taken from the label of the
  application itself, and the blame for the positive position is taken
  from the label of the cast.

\end{itemize}

\renewcommand\tau{a}
\newmeta\tty{t}

\subsection{Adding Type Declarations}

The first step in interlanguage migration requires
the programmer to change one module from the untyped language to the
typed one.  In our system, this involves adding types to every
variable binding
and to the module export as a whole.

Once this module is annotated, the new program is referred to as
$P^\tau$, where $\tau$ is the name of the now-typed module.

Since the simply-typed $\lambda$-calculus has a straightforward
type-soundness theorem, we might expect a similar one to hold for
migrated programs,
provided the type annotations are self-consistent.  Sadly, this is not the case.
For example, the typed module named $\tau$ might refer to some
other module, which could provide an arbitrary value or raise a
run-time error.  The other modules may contain outright errors, such
as $\comb{3}{4}$, as well as untypeable expressions such as
$\dlsabs{x}{\ifz{x}{1}{\dlsabs{y}{y}}}$; we do not rule these out, since the
programmer is only adding types to one module.  Furthermore, since we
do not typecheck the other modules, they may use the typed module in
ways that do not accord with its type.  Because of these
possibilities, the migration process must protect the typed module
from its untyped brethren.

\subsection{Inferring Contracts from Types}

In order to protect the typed module when it refers to untyped ones,
we apply a transformation to the program that expresses the implicit
agreements between the typed and untyped modules as contracts.
  Our transformation
examines the references to other modules in the typed module and from these uses
guesses contracts that become obligations of those other modules.  For
example, in the following program, the context makes it obvious that
$g$ must have type $\ii$:
\begin{schemedisplay}
(module f int (g 5))
\end{schemedisplay}
The transformation would therefore add a cast using the contract $\ii$ to the
reference to $g$ (no other casts make the module type at \scheme|int|).

\input{dls-figure}

The type system in figure~\ref{fig:trivial-transform} formalizes this
intuition.  Its rules define the judgement
$$\ttgproves{e_t}{t}{e_t'}$$ which states that given program $P$ in type environment
$\Gamma$, expression $e_t$ has type $t$ and is transformed to expression
$e_t'$.
The rules are similar to those of the simply typed
$\lambda$-calculus, propagating and combining the
types and transformed expressions from their constituents, except
for the module variable reference rules: {\sc T-ModVarSelf} and
{\sc{T-ModVar}}.  The former  checks references to the typed
module itself.  The latter checks a reference to an untyped module.
This module can be given any type consistent with the type of the
surrounding typed expression, but that type is converted to a contract
and a cast is added to check that the contract is upheld.

Because of the non-determinism of the {\sc{T-ModVar}} rule, these rules do not
naturally map onto a syntax-directed type checker.
They do define a logic program, however, which potentially produces
many solutions, each satisfying the desired type, and including a
matching set of inserted contracts for checking the other modules in
the program.  For example, consider the following program:

\begin{schemedisplay}
(module f int (g h))
\end{schemedisplay}

There are many possible sets of contracts that could be generated by
our system.  A simple one requires that $h$ be an $\dlsint$ and that $g$
have the contract $\ii$.  Of course, there are infinitely many
possibilities.  While a real system for migration would use
programmer input and static analysis to choose one solution, our
soundness theorem holds for all of them.


\subsection{Summary of the transformation}

The transformation we have just described is $MT$, for
migration transformation.
$MT(P)$ is the set of possible results of transforming $P$ as follows:
\begin{enumerate}
\item The programmer chooses one module $M = \modu{a}{e}$ named $a$ from $P$
  and adds types to this module, so that it is now a typed module
  according to the grammar of figure~\ref{fig:syntax-mixed}, producing
  the new module
  $M' = \mt{a}{t}{e'}$.
\item Using the type system described in figure
  \ref{fig:trivial-transform}, transform the body of $M'$ into $M''$.
  Formally, if $\ttproves{\emptyset}{e'}{\t{}}{e''}$, then $M'' = \mt{a}{t}{e''}$.
\end{enumerate}
\noindent
%The new program, which is $P$ with $M$ replaced by $M''$, will be
%called $P^a$.

Since step 2 in this process is nondeterministic, the set $MT(P)$ may
contain many distinct elements.  We prove in the next section that
every element $Q$ of this set is sound with respect to the type
system and blame assignment.

\end{schemeregion}