361 lines
15 KiB
TeX
361 lines
15 KiB
TeX
\begin{schemeregion}
|
|
\section{Modules, or You Want it When, Again?}
|
|
\label{sect:modules}
|
|
|
|
The PLT Scheme module system~\cite{f:modules} allows programmers to
|
|
group definitions, use imports and exports to control the
|
|
scope of names, and specify the dependencies between modules. The
|
|
presence of macros complicates the notion of dependence between
|
|
modules.
|
|
|
|
In the presence of procedural macros, a compiler must execute parts of
|
|
a program in order to deal with the remainder of the program. This
|
|
blurs the line between compilation and execution.
|
|
%
|
|
In particular, an interpreter may draw the line in a different place
|
|
than the compiler, requiring programmers to debug their compiled
|
|
program after they have already debugged their interpreted program. To
|
|
eliminate this potential for inconsistency, the PLT Scheme module
|
|
system require explicit module dependencies and, based on these,
|
|
provides uniform behavior in both interactive and
|
|
batch-compilation mode.
|
|
|
|
% Modules are compilation units, and every module must be compiled
|
|
% before it can be used. Modules contain declarations of their direct
|
|
% dependencies. When a module is compiled, the module system uses those
|
|
% declarations to determine the portions of existing modules that must
|
|
% be executed to support the compilation of the current module.
|
|
% %
|
|
% If a macro transformer depends on a value definition, the macro's
|
|
% module must declare a ``for-syntax'' dependency on the value
|
|
% definition's module.
|
|
% %
|
|
% Scoping rules prevent access from macros to undeclared run-time
|
|
% dependencies, and the compiler creates separate instantiations of
|
|
% declared dependencies to prevent interference across separate
|
|
% compilations.
|
|
|
|
\subsection{Split environments}
|
|
|
|
Syntactically, a module declaration contains a
|
|
module reference specifying the language that the module is written
|
|
in, the module's name, and a sequence of definitions and expressions.
|
|
In our examples, the module's name is left implicit, and provided in
|
|
a comment. In the PLT Scheme implementation, the name is taken from
|
|
the filename.
|
|
\begin{schemedisplay}
|
|
hlang initial-language ;; module-name
|
|
module-contents $\cdots$
|
|
\end{schemedisplay}
|
|
Denotationally, a module consists of two code parts (plus a dependency
|
|
specification): a compile-time
|
|
component and a run-time component. The compile-time part consists of
|
|
the syntax definitions. The run-time part consists of ordinary
|
|
definitions and expressions.
|
|
|
|
The compiler keeps separate environments for the compile-time
|
|
expressions and run-time expressions. If a module defines a procedure
|
|
as a run-time value, a macro transformer in the same module cannot
|
|
\emph{use} that procedure; the binding is unavailable in the
|
|
compile-time phase. The macro can, of course, expand into code that
|
|
\emph{refers} to the procedure. Likewise, a binding in the
|
|
compile-time phase cannot be used in the run-time phase. This phase
|
|
separation permits the compiler to compile a module without also
|
|
executing its entire contents.%
|
|
\footnote{The same name may have (possibly distinct) meanings in both phases
|
|
simultaneously. For example, modules written in the \scheme{scheme}
|
|
language automatically import all primitive bindings into both
|
|
phases.}
|
|
|
|
The two environments yield two kinds of module dependencies and thus two
|
|
distinct module import forms. The plain \scheme{require} form imports
|
|
bindings into the environment for run-time expressions, and the
|
|
\scheme{for-syntax} variant imports bindings into the
|
|
environment for compile-time expressions.
|
|
|
|
\begin{figure}
|
|
\begin{schemedisplay}
|
|
langs ;; macro-util
|
|
(provide check-for-duplicate-identifier)
|
|
(define (check-for-duplicate-identifier ids) ELIDED)
|
|
|
|
langs ;; rec
|
|
(require (for-syntax macro-util))
|
|
(define-syntax (recur stx)
|
|
(syntax-case stx ()
|
|
[(recur name ([var init] ...) . body)
|
|
(begin
|
|
(check-for-duplicate-identifier #'(var ...))
|
|
#'(letrec ([name (lambda (var ...) . body)])
|
|
(name init ...)))]))
|
|
(define (build-list n f)
|
|
(recur loop ([i 0])
|
|
(if (< i n)
|
|
(cons (f i) (loop (+ i 1)))
|
|
null)))
|
|
\end{schemedisplay}
|
|
\caption{Four kinds of references}
|
|
\label{fig:four-references}
|
|
\end{figure}
|
|
|
|
Macros bridge the gap between the two phases. The implementation of a
|
|
macro is a compile-time expression, but the macro definition extends
|
|
the environment for run-time expressions. To understand this idea, it
|
|
is important to distinguish between the notions of macro versus value
|
|
bindings from the notions of environments for compile-time versus run-time
|
|
expressions.
|
|
|
|
The modules in Figure~\ref{fig:four-references} illustrate the four
|
|
different possibilities.
|
|
%
|
|
In the context of the \scheme{rec} module,
|
|
\scheme{check-for-duplicate-identifier} is a value binding in the
|
|
compile-time environment; thus, it is available for use in the body of
|
|
the \scheme{recur} macro definition. Even though
|
|
\scheme{check-duplicate-identifier} is a ``compile-time procedure,''
|
|
it is not a macro. In fact, it cannot be used in run-time expressions
|
|
at all.
|
|
%
|
|
In contrast, \scheme{recur} is a macro binding in the run-time
|
|
environment. It is bound to a compile-time value, but the
|
|
binding is available to run-time expressions such as the definition of
|
|
\scheme{build-list}.
|
|
%
|
|
The occurrence of \scheme{syntax-case} refers to a macro binding in
|
|
the compile-time environment. Finally, the definition of
|
|
\scheme{build-list} creates a value binding in the run-time
|
|
environment.
|
|
|
|
Compilation of a module involves executing its dependencies%
|
|
\footnote{If the module depends on modules that are not already
|
|
compiled, they are automatically compiled when the dependency is
|
|
detected.} and expanding uses of macros in the module's body. The
|
|
dependencies include the compile-time part of the module's initial
|
|
language module, the compile-time part of every module imported with
|
|
\scheme{require}, and both compile-time and run-time parts of every
|
|
module imported with \scheme{for-syntax} inside \scheme|require|.
|
|
|
|
The rules for compilation (and also for invoking a module's compile
|
|
time part) are as follows:
|
|
\begin{itemize}
|
|
\item
|
|
For every \scheme{require} import, including the initial language
|
|
module, invoke that module's compile-time part in the same phase.
|
|
\item
|
|
For every \scheme{for-syntax} import, invoke that module's
|
|
compile-time and run-time parts in the next higher phase.
|
|
\end{itemize}
|
|
If a module is imported twice, once with plain \scheme{require} and once with
|
|
\scheme{for-syntax}, the two corresponding invocations of the
|
|
module are separate. They do not share mutable state. The module
|
|
system uses phase numbers to distinguish the different instances.
|
|
%
|
|
Finally, a module is invoked only once per phase, per
|
|
compilation. Multiple modules that depend on a single module in the
|
|
same phase share a single invocation of that module and its state.
|
|
|
|
\subsection{Compilation independence}
|
|
|
|
True separate compilation is impossible in a module system that
|
|
supports the import and export of macros. Instead, the module system
|
|
has a principle of compilation independence:
|
|
\begin{quote}
|
|
Compiling a module depends only on the compiled forms of the
|
|
modules that it (transitively) requires.
|
|
\end{quote}
|
|
This principle has two consequences:
|
|
\begin{itemize}
|
|
\item The compilation of two modules, neither of which transitively
|
|
requires the other, should produce the same two results no matter
|
|
which is compiled first, or whether they are compiled in parallel.
|
|
\item The compilation of a module does not depend on side effects that
|
|
occurred during the compilation of modules that it transitively
|
|
requires. This has important implications for the use of
|
|
side-effects at compile time.
|
|
\end{itemize}
|
|
|
|
The compiler effectively creates a new store for each module
|
|
that it compiles. Each compilation gets a new execution of all
|
|
supporting module code.
|
|
%
|
|
Since the result of the compilation process is nothing but a body of
|
|
code, the states of mutable variables and objects created during the
|
|
compilation process of any module are discarded at the end.
|
|
|
|
The pair of modules in figure~\ref{fig:side-effect}
|
|
illustrates the interaction between
|
|
side-effects and compilation.
|
|
\begin{figure}
|
|
\begin{schemedisplay}
|
|
langs ;; storage
|
|
(define storage '())
|
|
(define (add! x) (set! storage (cons x storage)))
|
|
(provide storage add!)
|
|
|
|
langs ;; memory
|
|
(require (for-syntax storage))
|
|
(define-syntax (remember stx)
|
|
(syntax-case stx ()
|
|
[(remember sym)
|
|
(begin (add! (syntax->datum #'sym))
|
|
(with-syntax ([syms storage])
|
|
#`(begin (display (quote syms))
|
|
(newline))))]))
|
|
(remember a)
|
|
(remember b)
|
|
\end{schemedisplay}
|
|
\caption{Side Effects and Compilation}
|
|
\label{fig:side-effect}
|
|
\end{figure}
|
|
The first module defines two variables. The second module accesses the
|
|
variables at compile time, so it imports the first module via
|
|
\scheme{for-syntax}. It defines a \scheme{remember} macro that
|
|
adds a symbol to the remembered list and generates code to print out
|
|
the updated list of remembered symbols. Then it uses the macro
|
|
twice. At the end of compiling the \scheme{memory} module, the
|
|
\scheme{storage} variable has the value \schemeresult{(b a)}. Executing the
|
|
\scheme{memory} module prints out the lists \schemeresult{(a)} and
|
|
\schemeresult{(b a)}, as expected.
|
|
|
|
Consider the following addition to the program:
|
|
\begin{schemedisplay}
|
|
langs ;; inspect-storage
|
|
(require storage)
|
|
(require memory)
|
|
(display storage) (newline)
|
|
\end{schemedisplay}
|
|
When this module is executed, the last line it prints out is
|
|
\schemeresult{()}, not \schemeresult{(b a)}, because the
|
|
\emph{run-time} instance of the \scheme{storage} module is distinct
|
|
from the \emph{compile-time} instance. That is, side-effects do not
|
|
cross phases.
|
|
|
|
Now consider this further addition to the program:
|
|
\begin{schemedisplay}
|
|
langs ;; more
|
|
(require memory)
|
|
(remember c)
|
|
\end{schemedisplay}
|
|
When this module is executed, the last line it prints out is
|
|
\schemeresult{(c)}, not \schemeresult{(c b a)}.
|
|
%
|
|
% This result often surprises macro programmers. Many of them expect the
|
|
% final line to be \schemeresult{(c b a)}. It seems to them as if the
|
|
% effects in \scheme{memory} occur and are subsequently unwound behind
|
|
% their backs. Programming with compile-time side effects can result in
|
|
% unexpected behavior---or lack of behavior---unless programmers
|
|
% recognize the forgetful nature of the compilation process.
|
|
%
|
|
The reason that the \scheme{(remember c)} in \scheme{more} prints just
|
|
\schemeresult{(c)} is that \scheme{more} was compiled with a fresh
|
|
instance of \scheme{storage} (initially the empty list), and because
|
|
executing the compile-time part of \scheme{memory} does not change
|
|
that value. The variable is updated during \emph{macro expansion};
|
|
the side-effects are not present in the compiled form of
|
|
\scheme{memory}:
|
|
\begin{schemedisplay}
|
|
(compiled-module memory
|
|
(require scheme)
|
|
(require (for-syntax storage))
|
|
(define-syntax (remember stx) ELIDED)
|
|
(begin (display '(a)) (newline))
|
|
(begin (display '(b a)) (newline)))
|
|
\end{schemedisplay}
|
|
\input{fig-module-instances}
|
|
Figure~\ref{fig:module-invocations} shows all of the module invocations
|
|
involved in compiling and executing the program \scheme|more|. Each box represents
|
|
a module invocation, and the text at the bottom of each box indicates
|
|
what parts of the module are executed. Each column represents a shared
|
|
store; effects in one column are not visible in another column.
|
|
|
|
The furthest left column simply represents the compilation of
|
|
\scheme|storage|---this module has no \scheme|for-syntax|
|
|
dependencies, and so its compilation triggers no computation in other
|
|
modules. The second column is the compilation of \scheme|memory|,
|
|
which requires first running the compile-time portions of the
|
|
\scheme|storage| module, since \scheme|memory| requires
|
|
\scheme|storage| \scheme|for-syntax|,
|
|
then expanding any macros in the \scheme|memory| module. The first two
|
|
columns are performed since \scheme|storage| and \scheme|memory| are
|
|
both dependencies of \scheme|more|.
|
|
Third, the \scheme|more| module is compiled. This requires running
|
|
the compile-time portion of \scheme|memory| (which is
|
|
\scheme|require|d by \scheme|more|) and therefore the compile- and run-time
|
|
portions of \scheme|storage| (which is \scheme|require|d
|
|
\scheme|for-syntax| by \scheme|memory|). Finally, the fourth column
|
|
is the final runtime, which invokes both the compile- and run-time
|
|
portions of \scheme|more| and \scheme|memory|, as well as
|
|
\scheme|storage|.
|
|
|
|
\subsection{Persistent effects}
|
|
\label{sect:syntax:persistent}
|
|
|
|
The compilation rules of the module system require the development of
|
|
a design pattern for expressing persistent effects.
|
|
%
|
|
Since compile-time side effects are transient, only the code in the compiled
|
|
module is permanent. Thus, the way to express a persistent effect is
|
|
to make it part of the module:
|
|
\begin{schemedisplay}
|
|
langs ;; memory.v2
|
|
(require (for-syntax storage))
|
|
(define-syntax (storage-now stx)
|
|
(syntax-case stx ()
|
|
[(storage-here)
|
|
(with-syntax ([syms storage])
|
|
#'(quote syms))]))
|
|
(define-syntax (remember stx)
|
|
(syntax-case stx ()
|
|
[(remember sym)
|
|
#'(begin (define-syntax _ (add! (quote sym)))
|
|
(display (storage-now))
|
|
(newline))]))
|
|
(remember a)
|
|
(remember b)
|
|
\end{schemedisplay}
|
|
The effect of adding new symbols to the \scheme|storage| variable
|
|
is not executed within the macro, but the macro expander
|
|
executes the resulting \scheme{define-syntax} form when it continues
|
|
expanding the module body, so the effect of the first addition to the
|
|
list still occurs before the second \scheme{remember} is
|
|
expanded. This version introduces a helper macro,
|
|
\scheme{storage-now}, to retrieve the value of \scheme{storage} after
|
|
the update.
|
|
|
|
Since the compile-time part of a compiled module includes all of the
|
|
macro definitions, the side-effect is preserved:
|
|
\begin{schemedisplay}
|
|
(compiled-module memory.v2
|
|
(require scheme)
|
|
(require (for-syntax storage))
|
|
(define-syntax (storage-now stx) ELIDED)
|
|
(define-syntax (remember stx) ELIDED)
|
|
(define-syntax _1 (add! 'a))
|
|
(display '(a)) (newline)
|
|
(define-syntax _2 (add! 'b))
|
|
(display '(b a)) (newline))
|
|
\end{schemedisplay}
|
|
The calls to \scheme{add!} are executed whenever
|
|
\scheme{memory} is required for the compilation of another
|
|
module. Thus they are executed when \scheme{more.v2} is compiled
|
|
(refer back to Figure~\ref{fig:module-invocations}), so the storage is
|
|
already set to \scheme{(b a)} when the use of \scheme{remember} in
|
|
\scheme{more} is expanded. Thus, executing the new version of
|
|
the program prints \scheme{(c b a)}.
|
|
|
|
As a matter of readability, the \scheme{begin-for-syntax} form
|
|
accomplishes the same effect as the awkward use of
|
|
\scheme{define-syntax} with a throw-away name. Using
|
|
\scheme{begin-for-syntax} also explicitly signals the programmer's
|
|
intent to generate an expression that creates a persistent effect.
|
|
|
|
%In summary:
|
|
%\begin{itemize}
|
|
%\item Persistent effects must be part of the compiled module.
|
|
%\item \scheme{require} forms indicate dependence, not just the import
|
|
% of names.
|
|
%\item There is no way for one module to affect another module that
|
|
% doesn't depend on it.
|
|
%\end{itemize}
|
|
\end{schemeregion}
|