samth-dissertation/dls-theorems.tex
Sam Tobin-Hochstadt 9c7a001a36 init
2017-07-10 13:02:10 -04:00

389 lines
16 KiB
TeX

{
\section{Soundness}
Before we can prove that our migration transformation is sound, we
must first define what soundness means for a partially typed system.
It cannot mean absence of runtime errors, since not all modules are
necessarily typed. All we can say instead is that the \emph{typed}
modules do not go wrong.
\subsection{Soundness for Mixed Programs}
\renewcommand\tau{a}
Soundness for interlanguage migration is a relation
between the program before and after migration.
Intuitively, this relation states that the two programs
agree when they both produce values and that the typed module never
produces a type error at runtime. We say that a program $P$ with exactly one typed module
named $a$, is \emph{partially typed at $a$}, or simply \emph{partially
typed} when the name of the typed module is not relevant.
\begin{definition}[Soundness]
$\related{P}{Q}$, where $Q$ is partially typed at $a$ iff:
\begin{enumerate}
\item If $P \dlsreduces v$ then there exists $v'$ where $Q \dlsreduces v'$ with $\similar{v}{v'}$ or $Q
\dlsreduces \blame{g}$ with $g \neq \tau$.
\item If $P \dlsreduces \blame{h}$ then there exists $g$ where $Q \dlsreduces \blame{g}$
with $g \neq \tau$.
\item If $P$ reduces forever, then $Q$ reduces forever or
there exists $g$ where $Q \dlsreduces \blame{g}$
with $g \neq \tau$.
\end{enumerate}
\end{definition}
\noindent
This definition relies on the \emph{similarity} relation $\similar{v}{v'}$, which
states that $v'$ is the same as $v$, with the possible addition of
types, contracts, and casts.
\begin{figure}
$$
\begin{array}{c}
\inferrule[Programs]{\overline{\similar{M}{M'}} \\
\similar{e}{e'}}{\similar{\overline{M}\ e}{\overline{M'}\ e'}} \vspace*{4mm}\\
\inferrule[Modules]{}{\similar{ \modu{f}{e}}{ \modu{f}{e} }}\\
\similar{ \modu{f}{e} }{\mc{f}{c}{e}} \vspace{1mm} \\
\inferrule{\similar e {e'}}{\similar{ \modu{f}{e} }{\mt{f}{t}{e'}}} \vspace{4mm} \\
\inferrule[Expressions]{}{\similar x x} \qquad {\similar f f} \qquad {\similar n n} \vspace{1mm} \\
\inferrule{\similar {\e{}}{ e'}}
{\similar{\e{}}{ \cast{c}{f}{e'}}} \qquad
\inferrule{\similar{\e1}{\e1'} \\ \similar{\e2}{\e2'}}{\similar{\comb{\e1}{\e2}}
{\comb{\e1'}{\e2'}}} \vspace{1mm} \\
\inferrule{\similar e e'}{\similar{\dlsabs{x}{e}}{\dlsabs{x}{e'}}} \qquad
\inferrule{\similar e e'}{\similar{\dlsabs{x}{e}}{\Tabs{x}{t}{e'}}} \vspace{1mm} \\
\inferrule{\similar{\e1}{\e1'} \\ \similar{\e2}{\e2'} \\\similar{\e3}{\e3'}}
{\similar{\ifz {e_1} {e_2} {e_3}}{\ifz {e_1'} {e_2'} {e_3'}}} \vspace{4mm}\\
\inferrule[Contexts]{\similar{e}{e'} \Rightarrow \similar{E[e]}{E'[e']}}{\similar{E}{E'}}
\end{array}
$$
\caption{Similarity} \label{fig:similarity}
\end{figure}
In figure~\ref{fig:similarity}, this relation is defined formally and
extended to modules and to programs. For programs,
$\similar{P}{Q}$ states that $P$ and $Q$ are syntactically identical,
ignoring casts, contracts and types.
We say that a system for typed migration is {\it sound} if
the migrated program is always in the $\trianglerighteq$ relation to
the original program. For the $MT$ transformation, this requires that
every element of $MT(P)$ is related to $P$.
This captures our intuition as to how typed migration should work:
that once we have migrated, we have proven the absence of errors in
the typed module. Further, if we get an answer, it is related to the
original answer. Since our reduction system tracks where errors occur, we are
able to express this statement formally.
\subsection{Soundness of our system}
Proving soundness for our system is a multi-step process. First, we
establish that the system resulting from the transformation agrees
with the original one, when
errors are ignored. This is established through a simple simulation relation
between programs. Second, we prove that programs in $MT(P)$ never
blame the typed module.
For the first of these steps, we again make use of the similarity property
mentioned above and defined in figure~\ref{fig:similarity}.
This relation between an untyped program (respectively, module or
expression) and a partially typed one, states syntactically that the
two programs $P$ and $Q$ are similar, written
$\similar{P}{Q}$, if they are identical ignoring contracts, types
and casts.
Similarity satisfies three lemmas:
\begin{lemma}
If $\similar{P}{Q}$ and $P \dlsreduces w$ and $Q \dlsreduces w'$ then $\similar{w}{w'}$.
\end{lemma}
\begin{lemma}
If $\similar{P}{Q}$ and $P$ reduces forever then $Q$ reduces
forever or $Q \dlsreduces \blame{f}$ for some $f$.
\end{lemma}
\begin{lemma}
If $\similar{P}{Q}$ and $P \dlsreduces \blame{f}$ for some $f$ then $Q
\dlsreduces \blame{g}$ for some $g$.\end{lemma}
\noindent\textbf{Proof Sketch}
These three lemmas all follow from similar simulation arguments. If
$\similar{e_1}{e_2}$, then there are three possibilities:
\begin{enumerate}
\item $e_2 = E[\cast{c}{f}{v}]$ Then either $e_2 \rightarrow
\blame{f}$ or $e_2 \rightarrow e_2'$ where $\similar{e_1}{e_2'}$.
This can be seen by simple inspection of the reduction rules for
casts.
\item $e_2 = E[\comb{\cast{\barrcnt{c_1}{c_2}}{f}{l}}{w}]$. Then $e_2
\rightarrow e_2'$ and $\similar{e_1}{e_2'}$
\item $e_1 = E[r_1]$ and $e_2 = E'[r_2]$ where $\similar{r_1}{r_2}$
and $\similar{E}{E'}$. Then $r_1
\rightarrow r_1'$ and $r_2 \rightarrow r_2'$ where
$\similar{r_1'}{r_2'}$ or $r_1 \rightarrow \blame{f}$ and $r_2
\rightarrow \blame{g}$. That the hypothesis holds is true from the
definition of similarity and the grammar for $E[]$. The fact that
the redexes reduce to similar terms or to errors can be seen from inspection of
the reduction rules where the redex is not a cast or the application
of a (blessed arrow) cast to a value.
Additionally, by the compositionality of similarity, placing
similar values into similar contexts produces similar terms.
Therefore, $\similar{E[r'_1]}{E'[r'_2]}$.
\end{enumerate}
Given this, similarity is consistently maintained by reduction, which
is all we need for the three lemmas. \qed
These lemmas are insufficient to establish soundness, since
they make no claim about who is blamed for an error. We therefore
introduce the notion of \emph{valid} expressions to prove that the
typed module is never blamed.
\paragraph{Validity}
Validity is a property of \emph{mixed} terms, those that combine code
from both typed and untyped modules (and the original main
expression). Before defining validity, we first present a new type
system for mixed terms which ensures that mixed terms use their typed subcomponents
appropriately. This type system has the judgement
$$
\mtypej{e}{t}
$$
which states that mixed expression $e$ has type $t$ in program $P$
and environment
$\Gamma$.
The rules of the type system are given in figure~\ref{fig:mixed-types}.
Note that untyped terms are not inspected by this judgement---instead,
their casts are simply trusted.
% Validity implies that
% there are no applications of numbers where the application is labeled
% with $\tau$, and also that every cast blaming $\tau$ is applied to a
% term that satisfies that cast.
\begin{figure*}[htbp]
$$
\inferrule[VT-Var]{}{\mtypej{x}{\Gamma(x)}} \qquad
\inferrule[VT-Cast]{e \mbox{\ closed}}{\mtypej{\cast{t}{f}{\e{}}}{t}} \qquad
$$
$$
\inferrule[VT-BlessedCast]{}{\mtypej{\cast{\barrcnt{t_1}{t_2}}{f}{v}}{\dlsproctype{t_1}{t_2}}} \qquad
\inferrule[VT-Int]{}{\mtypej{n}{\dlsint}}
$$
$$
\inferrule[VT-App]{{\mtypej{\e1}{\dlsproctype{t_1}{t_2}}} \\ % \\ separates premises
{\mtypej{\e2}{t_1}}}
{ {\mtypej {\comb{\e1}{\e2}} {t_2}}} \qquad
$$
$$
\inferrule[VT-If0]{{\mtypej{\e1}{t_1}} \\
{\mtypej{\e2}{t_2}} \\
{\mtypej{\e3}{t_2}}}
{ {\mtypej {\ifz{\e1}{\e2}{\e3}}{t_2}}}
$$
$$
\inferrule[VT-Abs]{\mtypejg{\Gamma,\hastype{x}{t}}{\e{}}{s}}
{\mtypej{\Tabs{x}{t}{\e{}}}{\dlsproctype t s}}
$$
$$
\inferrule[VT-TypeMod]{\mbox{$\mt{\tau}{t}{e} \in P$}}{\mtypej{\tau^{\tau}}{t}}
$$
\caption{Consistency Type System}\label{fig:mixed-types}
\end{figure*}
There a several rules to note in the type system. First, the rule
{\sc VT-Cast} does not ensure that its argument is well-typed.
Therefore, it applies even where the argument is an untyped term, and
the cast is protecting the context of the cast from its argument.
Second, the {\sc VT-BlessedCast} rule is necessary so that blessed
casts can appear during reduction, even though they are not part of
the syntax of types. Third, we allow the typed module to be used
without a cast in rule {\sc VT-TypeMod}. Such module references are
still protected from the untyped world, because they are within a
typed expression.
We now define two important properties of mixed terms.
\begin{definition}
A mixed term $e$ is \emph{consistent at $t$ in $P$} iff
$\mtypejg{\emptyset}{e}{t}$, or simply \emph{consistent} if there is
some $t$ such that $\mtypejg{\emptyset}{e}{t}$.
\end{definition}
\noindent
Terms may be consistent even if they do not originate in a typed module,
or even if some of their subterms are not consistent. For example,
$\mtypejg{\emptyset}{\cast{\dlsint}{f}{\dlsabs{x}{\comb{3}{x}}}}{\dlsint}$ for
any $f$ (including $\tau$), even
though the expression is patently erroneous.
Based on this definition, only some kinds of terms can be consistent:
typed abstractions, numbers, casts and applications of two consistent
terms.
\begin{definition}
A mixed term $e^f$ is \emph{originally typed in $P^\tau$} iff $f = \tau$.
\end{definition}
\noindent
This definition gives us a handle on those terms that came from the
original typed module. These are the terms that must not trigger errors
during the execution of the program. No guarantees are made about the
subterms of originally typed terms, however, in the case where they have a
different label.
With these definitions, we can define validity, the property that we use for our
central lemma. This property ensures both that numbers are
not in the operator position of a typed application, and that typed
terms satisfy any immediately surrounding casts. Maintenance of these
two properties is sufficient to ensure that $\tau$ is never blamed. The
third portion of the definition states that there is always a syntactic
barrier between consistent and inconsistent portions of the expression,
with the exception of numbers. This is the mechanism that is central
to maintaining the other two during reduction.
\begin{definition}
A closed mixed term $e_m$ is \emph{valid in program $P$} where $P$
has typed module $a$ iff all of the following hold:
\begin{enumerate}
\item every originally typed subexpression $d^\tau \in e_m$ is either consistent
or of the form ${\comb{\dlsabs{x}{e}}{e'}}^\tau$
\item for every subexpression of the form $\cast{t}{\tau}{e}$, $e$
is consistent at $t$.
\item every consistent term $d \in e_m$ is either a number, a cast, or the
immediate subterm of a consistent term or $d = e_m$.
\end{enumerate}
\end{definition}
Note that since the initial main expression $e$ in a program $P$ is
closed, it is also valid, since it does not contain any
originally typed subexpression or casts.
\begin{lemma}
\label{lem:central}
If a mixed term $e_m$ is valid in program $Q \in MT(P)$ (where $Q$
has typed module $a$) for
some $P$, and $e_m \rightarrow e_m'$ then $e_m'$
is valid in program $Q$.
\end{lemma}
\noindent\textbf{Proof Sketch}
This proof proceeds by cases on the reduction rule that takes $e_m$ to
$e_m'$. The important cases are {\sc Split}, {\sc Subst}, {\sc
TypedSubst} and {\sc LookupTyped}. We show two of these here.
Consider case {\sc Split}. Then $r =
{\comb{\cast{\barrcnt{c_1}{c_2}}{g}{v}}{w}}^f$ and $r' =
\cast{c_2}{g}{\comb{v}{\cast{c_1}{f}{w}}^f}$. We consider the three
components of validity in turn.
\begin{enumerate}
\item Originally-typed
subexpressions of $r'$ occur only in $v$ and $w$, or if the
resulting application has label $\tau$ (that is, that $f = \tau$).
If $v$ or $w$ contain
originally typed subexpression, these subexpressions must have been
originally typed in $r$, and so by
hypothesis they are still consistent.
Since $v$ is the argument to a blessed function cast, it must be an
abstraction, and therefore the application satisfies condition 1 (if
$f = a$).
If the redex is part of
a larger originally typed expression, then $r$ must have been consistent, and
must have had type $c_2$. But $\mtypejg[Q]{\emptyset}{r'}{c_2}$, so this
property is maintained.
\item Consider first the inner cast.
If $w$ is originally typed, then it must have been consistent (since the only
other possibility is not a value). Therefore, the whole application
must have been consistent, and thus $\mtypejg[Q]{\emptyset}{w}{c_1}$, which is
precisely the desired type.
If $g = \tau$, then we must also show that
$\mtypejg[Q]{\emptyset}{{\comb{v}{\cast{c_1}{f}{w}}^f}}{c_2}$. But then by
assumption, $\mtypejg[Q]{\emptyset}{v}{\dlsproctype{c_1}{c_2}}$ and
$\cast{c_1}{f}{w}$ trivially has type $c_1$.
If $r$ is the argument to a cast, we merely
must show that the type is preserved by reduction. But the original
type of $r$ must have been $c_2$, which is also the type of $r'$.
\item Both casts trivially satisfy this case. Thus we have to
consider $v$, $w$, and the application. Both the application and
$w$ are immediate arguments to a cast. If $v$ is consistent, then it
must be been a typed abstraction, since it is the argument of a
blessed arrow contract, and untyped abstractions are not consistent.
If it is a type-annotated abstraction, it must have label $\tau$,
as required by the grammar.
Thus, by hypothesis, it must satisfy its cast, and have type
$\dlsproctype{c_1}{c_2}$. Therefore, since the operand is a cast to
$c_2$, $\mtypejg{}{\comb{v}{\cast{c_1}{f}{w}}}{c_2}$.
\end{enumerate}
This concludes the case.
Consider case {\sc Subst}. Then $r = \comb{\dlsabs{x}{e}}{v}$ and $r'
= [v/x]e$. We again consider the three components of validity.
First, note that references to $x$ in $e$ cannot be either originally
typed or part of a consistent expression (without an intervening
cast), since $x$ is bound by an untyped abstraction.
\begin{enumerate}
\item First, any
originally typed and consistent expressions in $v$ remain so in
$r'$. This leaves the originally typed expressions in $e$. Choose
one of these, $d^a$. By hypothesis, either \mtypej{d^a}{t} or $d^a
= \comb{\dlsabs{y}{e_1}}{e_2}$. In the former case, $x$
is not in the free variables of $d^a$, since it cannot occur outside
of a cast, and {\sc VT-Cast} requires that the cast argument be
closed. Thus, $[v/x]d^a = d^a$. In the latter case, $[v/x]d^a =
\comb{\dlsabs{y}{[v/x]e_1}}{[v/x]e_2}$ which is of the appropriate
form.
\item By a similar argument, if $e$ has a subexpression of the form
$\cast{t}{a}{e'}$, then $e'$ is consistent and $[v/x]e' = e'$.
\item First, note that neither $v$ nor $e$ is consistent unless they are numbers or
casts, since they are not immediate subterms of consistent terms.
Thus consistent subterms of $v$ and $e$ that are not numbers or
casts must be proper subterms. The consistent subterms of $v$ are
preserved in $r'$. Further, since no reference to $x$ in the body
of $e'$ is consistent, the consistent subterms of of $e$ are
maintained.
\end{enumerate}
This concludes the case. The others are proved in a similar way. \qed
Given this lemma, we can now show that blame is never assigned to the
typed module.
\begin{lemma}[MT never blames the typed module]
If $P^\tau \in MT(P)$, and $P^\tau \dlsreduces \blame{g}$ then $g \neq
\tau$.
\label{lem:no-blame}
\end{lemma}
\noindent\textbf{Proof Sketch}
The only way we could ever reduce to \blame{\tau} is if
${\comb{n}{v}}^\tau$ is the redex or if a cast fails and blames $\tau$.
However, since the main expression is originally untyped, and thus
valid, and remains valid by lemma~\ref{lem:central}, this is impossible. \qed
We can now conclude the main theorem of this chapter.
\begin{theorem}[Soundness of MT]
\label{thm:dlsmain}
If $P^\tau \in MT(P)$ then $\related{P}{P^\tau}$.
\end{theorem}
\noindent\textbf{Proof Sketch}
First, note that if $\ttgproves{e}{t}{e'}$ then
$\similar{e}{e'}$. Therefore, if $P^\tau \in MT(P)$, then
$\similar{P}{P^\tau}$.
%
From this, if $P \dlsreduces v$ and $P^\tau \dlsreduces v'$ then
$\similar{v}{v'}$.
Similarly, if $P$ reduces forever, then $P^\tau \not\dlsreduces v$ for any
$v$. Therefore, lemma~\ref{lem:no-blame}, stating that if an error occurs,
the blame is assigned to one of the untyped modules, suffices for the
proof. \qed
}