\begin{schemeregion}

\section{Modules, or You Want it When, Again?}
\label{sect:modules}

The PLT Scheme module system~\cite{f:modules} allows programmers to
group definitions, use imports and exports to control the scope of
names, and specify the dependencies between modules.

The presence of macros complicates the notion of dependence between
modules. In the presence of procedural macros, a compiler must execute
parts of a program in order to deal with the remainder of the program.
This blurs the line between compilation and execution.
%
In particular, an interpreter may draw the line in a different place
than the compiler, requiring programmers to debug their compiled
program after they have already debugged their interpreted program.
To eliminate this potential for inconsistency, the PLT Scheme module
system requires explicit module dependencies and, based on these,
provides uniform behavior in both interactive and batch-compilation
mode.

% Modules are compilation units, and every module must be compiled
% before it can be used. Modules contain declarations of their direct
% dependencies. When a module is compiled, the module system uses those
% declarations to determine the portions of existing modules that must
% be executed to support the compilation of the current module.
% %
% If a macro transformer depends on a value definition, the macro's
% module must declare a ``for-syntax'' dependency on the value
% definition's module.
% %
% Scoping rules prevent access from macros to undeclared run-time
% dependencies, and the compiler creates separate instantiations of
% declared dependencies to prevent interference across separate
% compilations.

\subsection{Split environments}

Syntactically, a module declaration contains a module reference
specifying the language that the module is written in, the module's
name, and a sequence of definitions and expressions. In our examples,
the module's name is left implicit and provided in a comment.
In the PLT Scheme implementation, the name is taken from the filename.

\begin{schemedisplay}
hlang initial-language
;; module-name
module-contents $\cdots$
\end{schemedisplay}

Denotationally, a module consists of two code parts (plus a dependency
specification): a compile-time component and a run-time component. The
compile-time part consists of the syntax definitions. The run-time part
consists of ordinary definitions and expressions.

The compiler keeps separate environments for the compile-time
expressions and run-time expressions. If a module defines a procedure
as a run-time value, a macro transformer in the same module cannot
\emph{use} that procedure; the binding is unavailable in the
compile-time phase. The macro can, of course, expand into code that
\emph{refers} to the procedure. Likewise, a binding in the compile-time
phase cannot be used in the run-time phase. This phase separation
permits the compiler to compile a module without also executing its
entire contents.%
\footnote{The same name may have (possibly distinct) meanings in both
  phases simultaneously. For example, modules written in the
  \scheme{scheme} language automatically import all primitive bindings
  into both phases.}

The two environments yield two kinds of module dependencies and thus
two distinct module import forms. The plain \scheme{require} form
imports bindings into the environment for run-time expressions, and the
\scheme{for-syntax} variant imports bindings into the environment for
compile-time expressions.

\begin{figure}
\begin{schemedisplay}
langs
;; macro-util
(provide check-for-duplicate-identifier)
(define (check-for-duplicate-identifier ids)
  ELIDED)

langs
;; rec
(require (for-syntax macro-util))
(define-syntax (recur stx)
  (syntax-case stx ()
    [(recur name ([var init] ...) . body)
     (begin
       (check-for-duplicate-identifier #'(var ...))
       #'(letrec ([name (lambda (var ...) . body)])
           (name init ...)))]))
(define (build-list n f)
  (recur loop ([i 0])
    (if (< i n)
        (cons (f i) (loop (+ i 1)))
        null)))
\end{schemedisplay}
\caption{Four kinds of references}
\label{fig:four-references}
\end{figure}

Macros bridge the gap between the two phases. The implementation of a
macro is a compile-time expression, but the macro definition extends
the environment for run-time expressions. To understand this idea, it
is important to distinguish the notions of macro versus value bindings
from the notions of environments for compile-time versus run-time
expressions. The modules in Figure~\ref{fig:four-references} illustrate
the four different possibilities.
%
In the context of the \scheme{rec} module,
\scheme{check-for-duplicate-identifier} is a value binding in the
compile-time environment; thus, it is available for use in the body of
the \scheme{recur} macro definition. Even though
\scheme{check-for-duplicate-identifier} is a ``compile-time
procedure,'' it is not a macro. In fact, it cannot be used in run-time
expressions at all.
%
In contrast, \scheme{recur} is a macro binding in the run-time
environment. It is bound to a compile-time value, but the binding is
available to run-time expressions such as the definition of
\scheme{build-list}.
%
The occurrence of \scheme{syntax-case} refers to a macro binding in the
compile-time environment. Finally, the definition of
\scheme{build-list} creates a value binding in the run-time
environment.

Compilation of a module involves executing its dependencies%
\footnote{If the module depends on modules that are not already
  compiled, they are automatically compiled when the dependency is
  detected.}
and expanding uses of macros in the module's body. The dependencies
include the compile-time part of the module's initial language module,
the compile-time part of every module imported with \scheme{require},
and both compile-time and run-time parts of every module imported with
\scheme{for-syntax} inside \scheme|require|.
The rules for compilation (and also for invoking a module's
compile-time part) are as follows:
\begin{itemize}
\item For every \scheme{require} import, including the initial language
  module, invoke that module's compile-time part in the same phase.
\item For every \scheme{for-syntax} import, invoke that module's
  compile-time and run-time parts in the next higher phase.
\end{itemize}
If a module is imported twice, once with plain \scheme{require} and
once with \scheme{for-syntax}, the two corresponding invocations of the
module are separate. They do not share mutable state. The module system
uses phase numbers to distinguish the different instances.
%
Finally, a module is invoked only once per phase, per compilation.
Multiple modules that depend on a single module in the same phase share
a single invocation of that module and its state.

\subsection{Compilation independence}

True separate compilation is impossible in a module system that
supports the import and export of macros. Instead, the module system
has a principle of compilation independence:
\begin{quote}
  Compiling a module depends only on the compiled forms of the modules
  that it (transitively) requires.
\end{quote}
This principle has two consequences:
\begin{itemize}
\item The compilation of two modules, neither of which transitively
  requires the other, should produce the same two results no matter
  which is compiled first, or whether they are compiled in parallel.
\item The compilation of a module does not depend on side effects that
  occurred during the compilation of modules that it transitively
  requires. This has important implications for the use of side effects
  at compile time.
\end{itemize}

The compiler effectively creates a new store for each module that it
compiles. Each compilation gets a new execution of all supporting
module code.
%
Since the result of the compilation process is nothing but a body of
code, the states of mutable variables and objects created during the
compilation process of any module are discarded at the end.

The pair of modules in Figure~\ref{fig:side-effect} illustrates the
interaction between side effects and compilation.

\begin{figure}
\begin{schemedisplay}
langs
;; storage
(define storage '())
(define (add! x)
  (set! storage (cons x storage)))
(provide storage add!)

langs
;; memory
(require (for-syntax storage))
(define-syntax (remember stx)
  (syntax-case stx ()
    [(remember sym)
     (begin
       (add! (syntax->datum #'sym))
       (with-syntax ([syms storage])
         #`(begin (display (quote syms))
                  (newline))))]))
(remember a)
(remember b)
\end{schemedisplay}
\caption{Side Effects and Compilation}
\label{fig:side-effect}
\end{figure}

The first module defines two variables. The second module accesses the
variables at compile time, so it imports the first module via
\scheme{for-syntax}. It defines a \scheme{remember} macro that adds a
symbol to the remembered list and generates code to print out the
updated list of remembered symbols. Then it uses the macro twice. At
the end of compiling the \scheme{memory} module, the \scheme{storage}
variable has the value \schemeresult{(b a)}. Executing the
\scheme{memory} module prints out the lists \schemeresult{(a)} and
\schemeresult{(b a)}, as expected.

Consider the following addition to the program:

\begin{schemedisplay}
langs
;; inspect-storage
(require storage)
(require memory)
(display storage)
(newline)
\end{schemedisplay}

When this module is executed, the last line it prints out is
\schemeresult{()}, not \schemeresult{(b a)}, because the
\emph{run-time} instance of the \scheme{storage} module is distinct
from the \emph{compile-time} instance. That is, side effects do not
cross phases.
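
The separation of instances can also be observed within a single
module. The following is a minimal sketch, not part of the program
above: the module name \scheme{both-phases} and the macro
\scheme{add-at-compile-time} are hypothetical, and the module is
assumed to sit alongside the \scheme{storage} module of
Figure~\ref{fig:side-effect}.

\begin{schemedisplay}
langs
;; both-phases (hypothetical)
(require storage)
(require (for-syntax storage))
;; Expansion-time effect: touches only the compile-time instance.
(define-syntax (add-at-compile-time stx)
  (syntax-case stx ()
    [(add-at-compile-time sym)
     (begin
       (add! (syntax->datum #'sym))
       #'(void))]))
(add-at-compile-time a)
;; Run-time effect: touches only the run-time instance.
(add! 'b)
(display storage)
(newline)
\end{schemedisplay}

Under the rules described above, executing this module should print
\schemeresult{(b)} rather than \schemeresult{(b a)}: the symbol
\scheme{a} is added to the instance of \scheme{storage} imported with
\scheme{for-syntax}, while \scheme{display} reads the instance imported
with plain \scheme{require}.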

Now consider this further addition to the program:

\begin{schemedisplay}
langs
;; more
(require memory)
(remember c)
\end{schemedisplay}

When this module is executed, the last line it prints out is
\schemeresult{(c)}, not \schemeresult{(c b a)}.
%
% This result often surprises macro programmers. Many of them expect the
% final line to be \schemeresult{(c b a)}. It seems to them as if the
% effects in \scheme{memory} occur and are subsequently unwound behind
% their backs. Programming with compile-time side effects can result in
% unexpected behavior---or lack of behavior---unless programmers
% recognize the forgetful nature of the compilation process.
%
The \scheme{(remember c)} in \scheme{more} prints just
\schemeresult{(c)} for two reasons: \scheme{more} is compiled with a
fresh instance of \scheme{storage} (initially the empty list), and
executing the compile-time part of \scheme{memory} does not change that
value. The variable is updated during \emph{macro expansion}; the
side effects are not present in the compiled form of \scheme{memory}:

\begin{schemedisplay}
(compiled-module memory
  (require scheme)
  (require (for-syntax storage))
  (define-syntax (remember stx) ELIDED)
  (begin (display '(a)) (newline))
  (begin (display '(b a)) (newline)))
\end{schemedisplay}

\input{fig-module-instances}

Figure~\ref{fig:module-invocations} shows all of the module invocations
involved in compiling and executing the program \scheme|more|. Each box
represents a module invocation, and the text at the bottom of each box
indicates what parts of the module are executed. Each column represents
a shared store; effects in one column are not visible in another
column. The leftmost column represents the compilation of
\scheme|storage|; this module has no \scheme|for-syntax| dependencies,
and so its compilation triggers no computation in other modules.
The second column is the compilation of \scheme|memory|, which requires
first running the compile-time and run-time portions of the
\scheme|storage| module, since \scheme|memory| requires
\scheme|storage| \scheme|for-syntax|, then expanding any macros in the
\scheme|memory| module. The compilations in the first two columns are
performed because \scheme|storage| and \scheme|memory| are both
dependencies of \scheme|more|. Third, the \scheme|more| module is
compiled. This requires running the compile-time portion of
\scheme|memory| (which is \scheme|require|d by \scheme|more|) and
therefore the compile- and run-time portions of \scheme|storage| (which
is \scheme|require|d \scheme|for-syntax| by \scheme|memory|). Finally,
the fourth column is the final runtime, which invokes both the compile-
and run-time portions of \scheme|more| and \scheme|memory|, as well as
\scheme|storage|.

\subsection{Persistent effects}
\label{sect:syntax:persistent}

The compilation rules of the module system call for a design pattern
for expressing persistent effects.
%
Since compile-time side effects are transient, only the code in the
compiled module is permanent. Thus, the way to express a persistent
effect is to make it part of the module:

\begin{schemedisplay}
langs
;; memory.v2
(require (for-syntax storage))
(define-syntax (storage-now stx)
  (syntax-case stx ()
    [(storage-now)
     (with-syntax ([syms storage])
       #'(quote syms))]))
(define-syntax (remember stx)
  (syntax-case stx ()
    [(remember sym)
     #'(begin
         (define-syntax _ (add! (quote sym)))
         (display (storage-now))
         (newline))]))
(remember a)
(remember b)
\end{schemedisplay}

The effect of adding new symbols to the \scheme|storage| variable is
not executed within the macro itself. Instead, the macro expander
executes the resulting \scheme{define-syntax} form when it continues
expanding the module body, so the effect of the first addition to the
list still occurs before the second \scheme{remember} is expanded.
This version introduces a helper macro, \scheme{storage-now}, to
retrieve the value of \scheme{storage} after the update.

Since the compile-time part of a compiled module includes all of the
macro definitions, the side effect is preserved:

\begin{schemedisplay}
(compiled-module memory.v2
  (require scheme)
  (require (for-syntax storage))
  (define-syntax (storage-now stx) ELIDED)
  (define-syntax (remember stx) ELIDED)
  (define-syntax _1 (add! 'a))
  (display '(a))
  (newline)
  (define-syntax _2 (add! 'b))
  (display '(b a))
  (newline))
\end{schemedisplay}

The calls to \scheme{add!} are executed whenever \scheme{memory.v2} is
required for the compilation of another module. Thus they are executed
when \scheme{more.v2} is compiled (refer back to
Figure~\ref{fig:module-invocations}), so the storage is already set to
\schemeresult{(b a)} when the use of \scheme{remember} in
\scheme{more.v2} is expanded. Thus, executing the new version of the
program prints \schemeresult{(c b a)}.

As a matter of readability, the \scheme{begin-for-syntax} form
accomplishes the same effect as the awkward use of
\scheme{define-syntax} with a throw-away name. Using
\scheme{begin-for-syntax} also explicitly signals the programmer's
intent to generate an expression that creates a persistent effect.

%In summary:
%\begin{itemize}
%\item Persistent effects must be part of the compiled module.
%\item \scheme{require} forms indicate dependence, not just the import
%  of names.
%\item There is no way for one module to affect another module that
%  doesn't depend on it.
%\end{itemize}

\end{schemeregion}
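
For illustration, the \scheme{remember} macro of the persistent-effects
pattern might be rewritten with \scheme{begin-for-syntax} roughly as
follows. This is a sketch under stated assumptions rather than code
from the implementation: the module name \scheme{memory.v3} is
hypothetical, and \scheme{storage-now} plays the same role as the
helper macro in \scheme{memory.v2}.

\begin{schemedisplay}
langs
;; memory.v3 (hypothetical)
(require (for-syntax storage))
(define-syntax (storage-now stx)
  (syntax-case stx ()
    [(storage-now)
     (with-syntax ([syms storage])
       #'(quote syms))]))
(define-syntax (remember stx)
  (syntax-case stx ()
    [(remember sym)
     #'(begin
         ;; The effect is part of the expansion, so it stays in the
         ;; compiled module and runs again whenever the module's
         ;; compile-time part is invoked.
         (begin-for-syntax (add! (quote sym)))
         (display (storage-now))
         (newline))]))
(remember a)
(remember b)
\end{schemedisplay}

Compared with the \scheme{define-syntax} version, this variant binds no
throw-away identifier, and the generated \scheme{begin-for-syntax} form
makes it visible in the expansion that the call to \scheme{add!} is
meant to persist.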