samth-dissertation/icse-exp.tex
Sam Tobin-Hochstadt 9c7a001a36 init
2017-07-10 13:02:10 -04:00

550 lines
24 KiB
TeX

\newcommand\onestar{\multicolumn{1}{c|}{$\star$}}
\newcommand\onestarlast{\multicolumn{1}{c||}{$\star$}}
\newcommand\twostar{\multicolumn{1}{c|}{$\star\star$}}
\newcommand\threestar{\multicolumn{1}{c|}{$\star\star\star$}}
\begin{schemeregion}
Assessing whether Typed Scheme truly provides an easy transition from
untyped to typed code
demands practical experience.\footnote{For the development of Typed
Scheme, several thousand lines of textbook Scheme code were used
as the initial data set for determining what idioms to consider.
Since such code tends to be illustrative of idioms, it is well
suited for this purpose but not for evaluation. Additionally, a
system should never be evaluated for performance on its own
training data. } This chapter presents the result
of porting several scripts, libraries, and complete applications
from PLT Scheme to Typed Scheme. The chosen examples are
representative of working code in that they are created by experienced
programmers and they have been in use for a significant amount of
time. Similarly, the process itself is representative: for all of the
examples,
all communication between the original creator of the program and the
``porter'' was
disallowed.
The first section discusses the selected examples and the
particular problems that they pose. The second section
describes the changes that had to be made for the selected programs
to typecheck. Most changes simply added type annotations. Some required changes to the code to
appease the typechecker, but other
changes fixed bugs, or handled errors
properly where they had been ignored. In particular, a description of all the
changes made to the code of the largest example is given in section~\ref{sec:ss-changes}.
The third section
provides a quantitative account of the complexity involved in the
porting process, as well as a subjective account.
\section{Ported Programs} % FIXME - new title
\label{s:prog}
The chosen programs run the gamut from widely distributed
libraries to single-use scripts that rely heavily on PLT Scheme
features.
%
Together, they total over 5000 lines of code. In addition,
over 1000 lines of code in the PLT Scheme standard library is written
in Typed Scheme, including wrapper modules for many libraries that
are used in the selected programs.
\paragraph{Squadron Scramble}
This application, created by Matthias Felleisen, is a version of the
multi-player card game Squadron Scramble~\cite{squadron-scramble},
which resembles Rummy. The
original untyped implementation consists of 10 PLT Scheme modules,
totaling 2200 lines of implementation code, plus 500 lines of
unit tests. The program maintains data about the set of cards in an
external XML file, which it reads on startup.
\paragraph{Scheme Code Metrics}
This script is designed for applying syntactic metrics to collections
of PLT Scheme modules and files. It traverses directory trees,
looking for source files, and analyzes the files found to
determine the frequency of various programming constructs.
The program does not have a test suite; it is used by its author on a
regular basis. It consists of a single module of just over
400 lines.
\paragraph{Money Management}
This program, also developed by Matthias Felleisen, helps with balancing
checkbooks, and has been used for this purpose for the past decade. It
is not released for any other use, nor does it have a test suite.
During its extensive use, the program has seemed bug-free over the
past several years. The program consists of one script and one
library module, totaling just under 400 lines.
\paragraph{Spam Filtering}
This script is used to analyze the contents of an
spam folder. It has been used only by its creator, and
consists of one PLT Scheme module of 311 lines.
\paragraph{System Administration}
This library is designed for handling system administration tasks,
with particular emphasis on those relating to Subversion repository administration.
It consists of 15 modules and one data file, totaling more than 1200 lines.
\paragraph{Random Number Generation}
This is an old version of the PLT Scheme implementation of the
widely-used SRFI 27 library for random number and bit generation. It
has been in wide distribution along with PLT Scheme for several years.
It consists of a single library module of almost 600 lines.
\section{Program Changes}
Porting this wide variety of programs poses a number of common
challenges and brings similar benefits to the programs
involved. Most changes are simply providing types, but some require
changes to the code as executed.
\subsection{Annotations}
The vast majority of the changes required are simple type annotations.
The most common is providing the type of a top-level function
definition. Often, this simply involves translating an existing
comment to a statically-checked type annotation. For example, the
untyped code:
\begin{schemedisplay}
;; Player Hand $\rightarrow$ Void
(define (clean-up player hand)
(set-player-hand! player hand))
\end{schemedisplay}
\noindent
becomes
\begin{schemedisplay}
(: clean-up (player Hand -> Void))
(define (clean-up player hand)
(set-player-hand! player hand))
\end{schemedisplay}
\noindent
Almost all top-level definitions are annotated in this manner, without
requiring changes to the definition of the function.
The other common site of annotations is structure definitions. The
fields of a structure must be annotated with types, so that Typed
Scheme can determine the types of the structure functions.
For example, the untyped code:
\begin{schemedisplay}
;; \scheme|Administrator| = \scheme|(make-administrator [Listof IPlayer])|
(define-struct administrator (iplayers))
\end{schemedisplay}
becomes
\begin{schemedisplay}
(define-struct: administrator ([iplayers : (Listof iplayer)])
\end{schemedisplay}
\noindent
Again, as the example shows, this is often a translation of a
pre-existing comment.
Two other forms of required annotation are rare. One is the definition of
new type aliases:
\begin{schemedisplay}
(define-type-alias Raw-Group (Pair Symbol (Listof Raw-Student)))
\end{schemedisplay}
\noindent
This specifies that the type name \scheme|Raw-Group| is equivalent to
a pair of a symbol and list of \scheme|Raw-Student|s. Type aliases
are useful for simplifying and shortening type annotations.
Finally, when an untyped library is used, the type of imported values
must be specified with the \scheme|require/typed| form.
For example, this require of the untyped library \scheme|srfi/13|:
\begin{schemedisplay}
(require srfi/13)
\end{schemedisplay}
becomes, in the typed version:
\begin{schemedisplay}
(require/typed srfi/13
[string-pad-right (String Number Char -> String)])
\end{schemedisplay}
\noindent
The \scheme|require/typed| form explicitly specifies the type of each
imported identifier, allowing Typed Scheme to generate the appropriate
contracts as described in chapter~\ref{chap:integrate}.
In many cases, the PLT Scheme standard library provides a typed
version of existing libraries.\footnote{These libraries were contributed in part
by Yinso Chen, Felix Klock, Ivan Gazeau, and David van Horn.}
In that case, the new \scheme|require|
statement would be
\scheme|(require typed/srfi/13)|, and no annotations would be required
for the \scheme|string-pad-right| function.
All four of these annotation forms merely specify the programmers'
understanding of the type structure of their program, and
provide executable form for documentation that is often already in the
source code.
Some other annotations are required. Typed Scheme's inference for
polymorphic functions does not handle the case where polymorphic
values are used as the argument to polymorphic functions. In this
case, one of the functions must be explicitly instantiated at the
appropriate type. In some other cases, local type inference cannot
determine the type of a variable, and it must be explicitly
annotated. Finally, the bound variables of $\lambda$-abstractions used as
the argument to polymorphic functions must be explicitly annotated,
since local inference is also unable to determine the desired type.
Fortunately, all of these occur rarely in the programs ported---see
section~\ref{sec:stats} for precise measurements.
\subsection{Improvements}
The first improvement concerns simple mistakes, such as passing the
wrong number of arguments to a function. The chosen examples contain
only five such errors, four of which are in example uses of data
structures, rather than in critical code paths. This is not
surprising because these programs had been in significant prior use.
In the context of finding simple errors, type systems are mostly
useful during the actual programming process. Such simple mistakes do
turn up in comments, which are not exercised by testing, and Typed
Scheme discovered several instances where the comments about a
procedure disagreed with its implementation.
The second category of problems concerns
handling of erroneous input. This manifests itself in three forms:
\begin{itemize}
\item Many scripts, especially those originally intended to be used
only once, are not designed for robustness. Therefore, the
possibility of failures is ignored to simplify the programming
task. In PLT Scheme, the simplest case is when a script uses a
function that produces \scheme{#f} to indicate failure, but the script
does not deal with such results. For example, the
expression
\scheme|(regexp-match "([^<, ]*)@acm\\.org" s)| produces a list
containing the email address in the string \scheme{s} if it
succeeds, and \scheme{#f} if it fails. Many scripts that use such
library functions ignore the possibility of failure.
Our solution is to provide an \scheme{assert} function, which
translates \scheme{#f} into an exception. This allows programmers
to compactly state assumptions in a
type-correct way. For example, the expression
\begin{schemedisplay}
(assert (regexp-match "([^<, ]*)@acm\\.org" s))
\end{schemedisplay}
always
produces a list---or, if the regular expression matching fails,
raises an error. \scheme|assert| has the type
\begin{schemedisplay}
(All (X) ((U X #f) -> X))
\end{schemedisplay}
\item A similar case occurs when a function \scheme{f} uses a function \scheme{g}
with a restricted domain. Ideally, \scheme{f} should check the
applicability of \scheme{g} and raise an error if the condition isn't
met. Unfortunately, many scripts ignore such cases, and thus fail
late and in unexpected ways. For example, the \scheme|/| function
requires at least one argument. One function in the code base
applied \scheme|/| to a list of arguments, even when that list
might be empty. To address this, the Typed Scheme versions
of our samples make types as precise as possible for functions such as
\scheme{g} and thus force errors to be discovered as early as possible,
with informative error messages. In the example case, the
function was changed to always take a minimum of one argument.
\item The most serious problem is due to input obtained from an
external source and assumed to be of a certain shape. For
example, the Squadron Scramble game keeps information about the
cards of the game in an external XML file. When the data
is read in, it is assumed to be of the required format, rather
than being parsed. The resulting data is used directly as input
to remainder of the computation, although
the required invariants are not checked. Indeed, the documentation for the code
contains a schema for the XML format, but this is not integrated into
the system.
These assumptions about input can be pervasive in large portions
of code, as in Squadron Scramble. If they are, it
can be helpful to factor out all of the input-handling code into a
module that remains untyped, exporting only functions that produce
the relevant results in a type-correct faction. In
simple cases, additional error checking suffices.
\end{itemize}
Our third category of improvements is due to the elimination of
type-inconsistent assignment statements. For example, one program
always creates a data structure using an \scheme{init}
function that constructs the structure with potentially
type-incorrect data and then mutates it to be correct. This pattern
has two problems. On one hand, it assumes that all the instances of
this data structure are created using the \scheme{init} function, but
this is not guaranteed. On the other hand, it makes the invariant
harder to discover and maintain by future programmers.
In such cases, our Typed Scheme programs forgo mutation, and ensure
that the data structure is always correct by construction. Another
example of this is a list variable that is later initialized by
mutation. In two examples, the initial value was \scheme|#f|, a value
which is not acceptable to the other uses of the variable. The
necessary change is simply to initialize the variable with
\scheme|'()| instead.
\subsection{Limitations of Typed Scheme}
Unfortunately, not all changes that Typed Scheme requires
are clear-cut improvements to the program. Our experiments have
pinpointed three weaknesses in our approach. This section
discusses a few categories of such problems and how they affect the
porting process.
First, some facts about a program that are obvious to the programmer
can not be proved by typechecker. For example, this
\scheme|cond| expression covers all possibilities (provided that
\scheme|x| is an integer), even though Typed
Scheme cannot prove it:
\begin{schemedisplay}
(cond [(= x 0) ...]
[(< x 0) ...]
[(> x 0) ...])
\end{schemedisplay}
\noindent
Fortunately, it is easy to rewrite this expression to use
\scheme|else|, at which point Typed Scheme is able to check it
correctly.
Another example of this is that sometimes information is known
about the relationship between the types of two variables: if
\scheme|x| is \scheme|#f|, then \scheme|y| is not. While this can be
expressed in some cases via the logical system describe in
chapter~\ref{chap:occur-extend}, it is not always possible. In this
case, additional dynamic checks must be added to the program.
Second, Typed Scheme cannot, in general, typecheck macros that
generate hidden recursive code. In most cases, local type
inference is sufficient to infer the types of the generated loop
variables. When this is not the case, however, the programmer must
rewrite the code with an explicit loop, which can then be given an
explicit type annotation. This is a rare problem,
but it causes significant difficulty to the porter when it occurs.
Third, some functions are used in special ways in PLT Scheme, such
that the type system handles them specially. An example is the
\scheme|list| function, which generates fixed-length lists when
applied to a known set of arguments. When used to generate
fixed-length lists in a higher-order context, Typed Scheme is unable
to apply the special type rule. The workaround is to $\eta$-expand
the \scheme|list| function for the particular number of arguments,
which allows Typed Scheme to apply the type rule for \scheme|list| to
the explicit application. Another example of such a function is the
{\variablefont{values}} function.
Fourth, in some cases, occurrence typing does not properly track the
uses of type predicates. For example, the expression
\scheme|(andmap number? xs)| should verify
that \scheme|xs| is a list of numbers:
However, this is not currently the case for Typed Scheme. The porting
effort found a few of these examples, which I hope to address in
future work.
\subsection{Changes in Squadron Scramble}
\label{sec:ss-changes}
The Squadron Scramble game is the largest of our experiments. This
section describes all of the non-annotation changes required for
Squadron Scramble to typecheck. A total of twelve changes were
required to the code of the game, some of which fixed bugs, and some
of which worked around limitations of Typed Scheme.
\paragraph{Workarounds}
By far the most significant change was factoring the XML reading code
into a separate, untyped file, and then using \scheme|require/typed|
to import the relevant values into Typed Scheme with appropriate
types. This involved creating a new file of 30 lines, all taken from
one of the existing files. Two values from the new file were then
\scheme|require/typed| back into the original file.\footnote{The use
of untyped external data sources was the original motivation of \citet{abadi}}
In two places, the same datatype was assumed to be covered by two
predicates in the untyped code. However, Typed Scheme could not prove
that these predicates were sufficient to cover the data type---in
fact, the untyped program assumed that all data fit a format that was
never checked. The Typed Scheme code added an additional error case
in both places.
In one place, a function produced multiple values of correlated
types. Typed Scheme is not able to keep track of this correlation,
necessitating an additional dynamic check.
As mentioned, Typed Scheme does not yet propagate type information
from the \scheme|andmap| function, which necessitated an additional
check in one place. Similarly, Typed Scheme does not allow the
programmer to express that the user-written \scheme|set?| predicate
verifies that the argument is a list when it is a set. Finally, the
expansion of \scheme|or| uses a temporary variable, which does not fit
with the rules for $\cf$ given in chapter~\ref{chap:occur}. Manually
expanding the \scheme|(or A B)| expression into \scheme|(if A #t B)|
is sufficient to work around this limitation. Each of these occurred
precisely once in Squadron Scramble (and in none of the other
experiments).
\paragraph{Bug Fixes}
In one location, the \scheme|string->number| procedure was used, while
neglecting that the result might be \scheme|#f| if the provided string
cannot be parsed as a number. The \scheme|assert| procedure converts
this possibility into an exception, as described above.
Four similar functions, all used as examples, call a constructor
function with the wrong number of arguments. Fixing these was very
simple.
\section{Statistics and Results}
\label{sec:stats}
\begin{figure}[t]
\begin{tabular}{|l|r|r|r|r|r|r||r|}
\hline
\mbox{} & Squad & Metrics & Acct & Spam & Sys & Rand & Total \\
\hline
{\sc Lines} & 2369 & 511 & 407 & 315 & 1290 & 618 & 5510 \\ \hline
{\sc Increase} & 7\% & 25\% & 7\% & 6\% & 1\% & 3\% & 7\% \\ \hline \hline
{\sc Useful Ann} & 96 & 51 & 25 & 16 & 53 & 26 & 267 \\ \hline
{\sc \scheme|l:| Ann} & 34 & 22 & 2 & 8 & 22 & 0 & 88 \\ \hline
{\sc Other Ann} & 12 & 7 & 1 & 1 & 0 & 1 & 22 \\ \hline
{\scheme|define-struct:|} & 16 & 2 & 0 & 0 & 4 & 1 & 23 \\ \hline
{\sc Type Alias} & 13 & 7 & 0 & 2 & 2 & 3 & 27 \\ \hline
{\scheme|require/typed|} & 3 & 1 & 0 & 0 & 5 & 0 & 9 \\ \hline
{\sc Ann/100 Line} & 7.3 & 18 & 6.9 & 8.6 & 6.7 & 5.0 & 7.9 \\ \hline\hline
% total annotations: 436
{\sc Fixes} & 5 & 3 & 4 & 5 & 8 & 0 & 25 \\ \hline
{\sc Fixes/100 Line} & 0.21 & 0.59 & 0.98 & 1.6 & 0.62 & 0.0 & 0.45 \\ \hline
{\sc Problems} & 7 & 4 & 3 & 1 & 0 & 1 & 16 \\ \hline
{\sc Prob/100 Line} & 0.29 & 0.78 & 0.73 & 0.32 & 0.0 & 0.16 & 0.29 \\ \hline\hline
{\sc Difficulty} & \twostar & \threestar & \twostar & \onestar & \onestar & \onestarlast & \\ \hline
\end{tabular}
\caption{Statistics}
\label{f:stats}
\end{figure}
Figure~\ref{f:stats} presents the quantitative results of our measurement
effort.
Each column represents one of the programs described in
section~\ref{s:prog}. The rows quantify the size of the programs,
and the effort required to port them to Typed Scheme:
\begin{itemize}
\item {\sc Lines:} This is the total line count for each program
after porting, including comments and blank lines.
\item {\sc Increase:} This is the percentage increase in number of
lines from the untyped version to the typed version. The {\bf
Metrics} program is an outlier here---many of its types are quite
large and were formatted on multiple lines.
\item {\sc Useful Ann:} This the count of annotations of variables
that recorded valuable design information about the program. It
includes top-level functions, local functions, top-level constants
in which the untyped program provided a comment specifying the type,
and specifications of the types of mutable data, such as hash
tables.
\item \scheme|lambda:| {\sc Ann} This is the count of uses of the
\scheme|lambda:| form, which annotates the bound variables of a
\scheme|lambda| with types. With the exception of a handful of
cases, this is necessary only
when an anonymous function is provided as the argument to a
polymorphic function such as \scheme|map|.
\item {\sc Other Ann:} This is a count of all other variable
annotations and instantiations in the program, which do not appear
to record useful design information.
\item \scheme|define-struct:| This is the number of uses of
\scheme|define-struct:| in the typed version of the program. In all
cases, this is the same as the number of \scheme|define-struct|s in
the original program.
\item {\sc Type Alias:} This is the number of type aliases defined in
the typed version of the program.
\item \scheme|require/typed|: This is the number of identifiers
imported with \scheme|require/typed|, each of which requires a type
to be specified.
\item {\sc Ann/100 Line:} This is the number of annotations, including
structs, aliases, and \scheme|require|s, per 100 lines of the typed
version.
\item {\sc Fixes:} This is the number of changes to the program that
fixed actual or potential errors in the original code.
\item {\sc Fixes/100 Line:} The number of such fixes per 100 lines of
the typed program.
\item {\sc Problems:} This is the number of changes to the program that
are merely to work around the typechecker. These varied widely in
lines of code: moving XML parsing to a separate file affected close
to 100 lines, whereas other changes involved the addition of a
single function call.
\item {\sc Prob/100 Line:} The number of such workarounds per 100 lines of
the typed program.
\item {\sc Difficulty:} Finally, the last row indicates the subjective
difficulty, as experienced by the programmer(s) doing the port of the
program. One star is easiest, and three represent the most
difficult. Here, the {\bf Metrics} system is the most
difficult, largely due to its complex data structures. In
particular, one type of structure is used for multiple different
purposes in the program, which makes it difficult to express its
intended type in Typed Scheme. Some of the complexity of these data
structures is reflected in the significant increase in the size of
the program.
In contrast, the programs that were simplest to port have a simple
type structure, mostly dealing with only one type or having no interesting data
structures. These distinctions, and not the
quantifiable labor effort, ultimately make the greatest difference in porting
effort.
The {\bf Squadron Scramble} game implementation fell in between. It does not
pose serious conceptual problems, and the code is meticulously
commented and maintained. But it is a code artifact of significant
size, and thus has its own idiosyncrasies, such as the use of an
external XML file to store data. Therefore, it required more than a
trivial amount of effort to port. Ultimately, the statistical
metrics indicate that {\bf Squadron Scramble} was representative of the
overall sample.
\end{itemize}
\subsection{Interpretation}
This quantitative measurement provides a guide to the effort required to port programs
written in PLT Scheme to Typed Scheme. It indicates that the program
increases in size by less than a tenth, that annotations are not overly frequent,
and that content changes to the running code are rarely
required.
These numbers can only give a rough guide to the effort of
porting programs to Typed Scheme---the most difficult portion of any
programming task is understanding the nature of the problem and the
form of the correct solution. In some cases, the porting programmer
must develop a sophisticated understanding of the workings of the
system. No type system can offer guidance on
the ultimate nature of this task. But the estimates offered
here show that Typed Scheme requires a reasonably small effort,
especially given the benefits for future program maintenance.
\end{schemeregion}