samth-dissertation/prior.tex
Sam Tobin-Hochstadt 9c7a001a36 init
2017-07-10 13:02:10 -04:00

266 lines
13 KiB
TeX

\begin{schemeregion}
\label{chap:prior}
The design of Typed Scheme builds on extensive prior work. This
chapter surveys the most important strands. The first
section discusses the extensive history of static checking for
untyped languages, in particular LISP and Scheme. While Typed Scheme
differs significantly from all of these prior efforts, their design
choices and experience informed the choices made in creating Typed
Scheme.
%
The second section reviews prior work on designing multi-language
systems, especially those that use contracts at the boundaries.
%
The third section describes previous solutions to the
problem of typechecking a language with macros,
as it has been explored both in the context of static checkers and soft
typing.
%
The final section covers two key innovative technologies that
Typed Scheme builds on--runtime software contracts and module
systems.
\section{Static Checking for Scheme and LISP}
The history of programming languages knows many attempts to add or to
use type information in conjunction with untyped languages.
In 1976, \citet{c:typed-lisp} proposed a type system to
simplify the operation and use of Boyer-Moore-style theorem
provers~\cite{boyer-moore97}.
Since then, the designers of Common Lisp~\cite{cl2} as well as many
other languages have included type
declarations in such languages, often to help compilers, sometimes to
assist programmers. From the late 1980s until recently, people have
studied soft
typing~\cite{cf:pldi91,awl:popl94,wc:toplas97,hr:fpca95,ff:toplas99,mff:popl06},
a form of type inference to assist programmers debug their programs
statically. This work has mainly been in the context of Scheme but
has also been applied to Python~\cite{some-mit-ms-thesis} and Erlang~\cite{mw:erlang}.
%Recently,
%the slogan of ``gradual typing'' has resurrected the LISP-style
%annotation mechanisms and has had a first impact with its tentative
%inclusion in Perl6~\cite{tang}.
This section surveys this body of work, starting with previous static
type systems for Scheme and LISP.
\subsection{Static Type Systems}
Many researchers have developed systems that brought together both
static type systems and LISP or Scheme programming, usually to
make it easier to reason about programs.
In 1976, Cartwright presented TYPED
LISP~\citeyearpar{c:typed-lisp}. However, in his paper (section 5) he
describes having to abandon the policy of rejecting type-incorrect
programs because the types of variables in conditionals had overly
broad types, which is precisely the issue that Typed Scheme and
occurrence typing address. TYPED LISP also did not consider the
problem of interoperation with untyped code.
Wand's Semantic Prototyping System (or SPS)~\cite{wand84} included a type
system for Scheme programs, designed for implementing denotational
models of languages. The type system did not include any features
for handling typical Scheme idioms, nor for interoperating soundly
with untyped code.
Haynes~\cite{h:infer} designed a type system called Infer based on
row types for Scheme. His system did not accommodate Scheme
programming idioms in the way
that Typed Scheme does, and aimed for complete type inference
for all programs, a non-goal of the Typed Scheme project. However,
his system does support restricted forms of variable-arity
polymorphism~\cite{dh:var-ar}. This system is discussed in detail in
section~\ref{sec:esop-related}.
\citet{m:nisp} developed Nisp, an unsound type system for Common
LISP. It allows the description and checking of many types similar to
those in Typed Scheme, but does not provide checking for typical LISP
idioms. It also does not provide sound interaction with untyped
code.
Leavens~\cite{leavens} designed an ML-like type system for Scheme,
but made no attempt to accommodate Scheme idioms, or to integrate with
untyped code.
The Bigloo Scheme compiler~\cite{bigloo} provides a simple type
system for Scheme, primarily for the purpose of optimization. This
system does not have safe interoperation with untyped code, nor does
it accommodate Scheme idioms.
The Strongtalk system~\cite{strongtalk} includes a sophisticated
typechecker for
Smalltalk programs, designed by analysis of existing Smalltalk
idioms. The system was therefore able to accomodate large bodies of
existing Smalltalk code, including portions of the then-standard
library. However, Strongtalk did not attempt to check the behavior
of still-untyped portions of a program.
\subsection{Soft Typing}
The goal of the soft typing research agenda is to provide an optional type
checker for programs in untyped languages. One key premise is that
programmers shouldn't have to write down any type definitions or type
declarations. Soft typing should work via type inference only.
Another premise is that soft type systems should never prevent
programmers from running any program. If the type
checker encounters an ill-typed program, it should insert run-time checks that
restore typability and ensure that the invariants of the type system
are not violated.
Naturally, a soft type system should minimize these insertions of
run-time checks. Furthermore, since these insertions represent potential
failures of type invariants, a good soft type system must allow programmers
to inspect the sites of these run-time checks to determine whether they
represent genuine errors or weaknesses of the type system.
Such soft typing systems are complex
and brittle, however. On one hand, these systems may infer extremely large types
for seemingly simple expressions, greatly confusing the
programmer, either the original or the maintenance
programmer who has taken on old code. On the other hand, a
small syntactic change to a program without semantic consequences can
introduce vast changes into the types of both nearby and remote
expressions. Experiments with undergraduates---representative of
average programmers---suggest that only the very best understood the tools
well enough to make sense of the inferred types and to exploit them for
the assigned tasks. For the others, these tools turned into time sinks with
little benefit.
Roughly speaking, soft typing systems fall into one of two classes, depending
on the kind of underlying inference system. The first soft type
systems~\cite{cf:pldi91,wc:toplas97,hr:fpca95, h:dynamic-article,
am:static-types-dynamic-language} used inference engines
based on Hindley-Milner though with extensible record types. These systems
are able to type many actual Scheme programs, including those using
outlandish-looking recursive datatypes. Unfortunately, these systems
severely suffer from the general Hindley-Milner error-recovery
problem. That is, when the type system signals a type error, it is
extremely difficult---often impossible---to decipher its meaning and to
fix it.
% \footnote{Fagan gives an example of the \scheme|taut| function, which
% checks if arbitrary-arity boolean functions are tautologies. This example
% works in Typed Scheme as well.}
In response to this error-recovery problem, others built inference systems
based on Shivers's control-flow analyses~\citeyearpar{shivers-thesis} and
Aiken's and Heintze's set-based analyses~\cite{awl:popl94,nh:sba}. Roughly
speaking, these soft typing systems introduce sets-of-values constraints
for variables and values and propagate them via a generalized
transitive-closure propagation~\cite{awl:popl94,ff:toplas99}. In this
world, it is easy to communicate to a programmer how a value might flow
into a particular operation and violate a type invariant, thus eliminating
one of the major problems of Hindley-Milner based soft
typing~\cite{ffkwf:mrspidey}.
My experience and evaluation, as well as that of other users of soft
typing systems, suggest that Typed Scheme works
well compared to soft typing. First, programmers can easily
convert entire modules with just a few type declarations and annotations
to function headers. Second, assigning explicit types and rejecting programs
actually pinpoints errors better than soft typing systems, where
programmers must always keep in mind that the type inference system is
conservative. Third, soft typing systems do not support type
abstractions well. Starting from an explicit,
static type system for an untyped language should help introduce these
features and deploy them as needed.
The prior soft typing research inspired this work on occurrence typing. These
systems employed simplistic \scheme{if}-splitting rules that performed a case
analysis for types based on the syntactic predicates in the test
expression. This idea was derived from \citet{c:typed-lisp}'s
\texttt{typecase} construct (also see below) and---due to its
usefulness---inspired the generalization to occurrence typing.
The major advantage of soft
typing over an explicitly typed Scheme is that it does not require any
assistance from the programmer. In the future, I hope to borrow
techniques from soft typing for automating some of the conversion process
from untyped modules to typed modules.
\section{Interlanguage Interoperability}
Smooth operation between typed and untyped code is essential to Typed
Scheme. The design of this interoperation relies heavily on prior
work on language interoperation. \citet{gff:oopsla05} developed
ProfessorJ, which combines PLT Scheme and Java. Their
interoperability strategy relies on custom \emph{mirrors}, generated
specifically for each class, which dynamically check the invariants of
methods. They also extend each language with the features of the
other, using the Java-like class system of PLT
Scheme~\cite{fff:classes}, and adding first-class functions to their
implementation of Java. Their system also automatically generates
contracts as part of the mirrors for each class.
\citet{mf:multilang-toplas} consider
interlanguage interoperability between a typed and untyped lambda
calculus, and demonstrate the such interoperability corresponds to
contracts, although their work does not generate contracts. They
claim, but do not prove, that only untyped portions of the program can
be blamed. However, their system has only two parties.
\section{Implementing Types in Scheme}
The implementation of typecheckers for Scheme poses several
challenges. First, the typechecker must somehow cope with
user-written macros. Second, the typechecker must accommodate
pre-existing code in a sound manner, preferably without requiring
changes to the untyped Scheme implementation. Third, several systems,
like Typed Scheme, have implemented the type system via the underlying
macro system.
The SPS system mentioned above used macros to implement the
typechecker, although without hygiene. Typed Scheme scales up the
approach of SPS to a modern macro and module system, enabling the
seamless integration between typed and untyped code.
Flanagan's static analyzer for DrScheme~\cite{ffkwf:mrspidey} analyzed
expanded programs and used syntax source information to display the
analysis results in the program editor. The analyzer included a macro
protocol that let programmers annotate their programs with hints for
the analyzer.
The Ziggurat project~\cite{fs:ziggurat-jfp} has investigated
alternate approaches to static analysis and other program observations
in the presence of macros. Analyses in Ziggurat are implemented as
methods on expression nodes, and derived expression forms (that is,
macros) may either override analysis methods with special behavior or
defer to the default analysis of the macro's expansion. Ziggurat
represents a new approach to defining macro protocols, and it is as
yet unclear how the Ziggurat approach compares with the mechanisms
described here.
\section{Contracts and Modules}
The design and implementation of Typed Scheme builds heavily on prior
work in contracts and module systems.
Runtime software contracts have a long history, but traditionally did
not support two features necessary for the design of Typed Scheme.
First, Scheme, and thus Typed Scheme, makes extensive use of
higher-order language constructs, which must therefore be accommodated
by the contract system. Second, Typed Scheme isolates typed from
untyped code, and thus needs a mechanism for determining which portion
of the program violated a contract.
These two shortcomings are addressed by Findler's contract
system~\cite{ff:ho-contracts}, as implemented in PLT
Scheme~\cite{plt-tr2009-reference-v4.2.2}. This system provides
guarantees that ensure that the appropriate portion of code is blamed
for violating a contract, even when the violation is of a higher-order
contract and takes place arbitrarily later in the execution of the
program. Typed Scheme builds directly on this system, automatically
generating contracts from types.
Typed Scheme also relies on a module system that provides strong
abstraction barriers, in order to prevent unauthorized access by the
untyped code to unprotected portions of the typed program. For Typed
Scheme, this module system is the PLT module
system~\cite{f:modules,plt-tr2009-reference-v4.2.2}. This system
provides strong abstraction barriers for both values and macros. It
also allows Typed Scheme to be implemented entirely as a library,
using the techniques described in chapter~\ref{chap:impl}.
\end{schemeregion}