racket/collects/scribblings/inside/overview.scrbl
2008-07-08 15:31:06 +00:00

592 lines
23 KiB
Racket

#lang scribble/doc
@(require "utils.ss")
@title[#:tag "overview"]{Overview}
@section{CGC versus 3m}
Before mixing any C code with MzScheme, first decide whether to use
the @bold{3m} variant of PLT Scheme, the @bold{CGC} variant of PLT
Scheme, or both:
@itemize{
@item{@bold{@as-index{3m}} : the main variant of PLT Scheme, which
uses @defterm{precise} garbage collection instead of conservative
garbage collection, and it may move objects in memory during a
collection.}
@item{@bold{@as-index{CGC}} : the original variant of PLT Scheme,
where memory management depends on a @defterm{conservative} garbage
collector. The conservative garbage collector can automatically find
references to managed values from C local variables and (on some
platforms) static variables.}
}
At the C level, working with CGC can be much easier than working with
3m, but overall system performance is typically better with 3m.
@; ----------------------------------------------------------------------
@section{Writing MzScheme Extensions}
@section-index["extending MzScheme"]
The process of creating an extension for 3m or CGC is essentially the
same, but the process for 3m is most easily understood as a variant of
the process for CGC.
@subsection{CGC Extensions}
To write a C/C++-based extension for PLT Scheme CGC, follow these
steps:
@itemize{
@item{@index['("header files")]{For} each C/C++ file that uses PLT
Scheme library functions, @cpp{#include} the file
@as-index{@filepath{escheme.h}}.
This file is distributed with the PLT software in an
@filepath{include} directory, but if @|mzc| is used to compile, this
path is found automatically.}
@item{Define the C function @cppi{scheme_initialize}, which takes a
@cpp{Scheme_Env*} namespace (see @secref["im:env"]) and returns a
@cpp{Scheme_Object*} Scheme value.
This initialization function can install new global primitive
procedures or other values into the namespace, or it can simply
return a Scheme value. The initialization function is called when the
extension is loaded with @scheme[load-extension] (the first time);
the return value from @cpp{scheme_initialize} is used as the return
value for @scheme[load-extension]. The namespace provided to
@cpp{scheme_initialize} is the current namespace when
@scheme[load-extension] is called.}
@item{Define the C function @cppi{scheme_reload}, which has the same
arguments and return type as @cpp{scheme_initialize}.
This function is called if @scheme[load-extension] is called a second
time (or more times) for an extension. Like @cpp{scheme_initialize},
the return value from this function is the return value for
@scheme[load-extension].}
@item{Define the C function @cppi{scheme_module_name}, which takes
no arguments and returns a @cpp{Scheme_Object*} value, either a
symbol or @cpp{scheme_false}.
The function should return a symbol when the effect of calling
@cpp{scheme_initialize} and @cpp{scheme_reload} is only to declare
a module with the returned name. This function is called when the
extension is loaded to satisfy a @scheme[require] declaration.
The @cpp{scheme_module_name} function may be called before
@cpp{scheme_initialize} and @cpp{scheme_reload}, after those
functions, or both before and after, depending on how the extension
is loaded and re-loaded.}
@item{Compile the extension C/C++ files to create platform-specific
object files.
The @as-index{@|mzc|} compiler, which is distributed with PLT Scheme,
compiles plain C files when the @as-index{@DFlag{cc}} flag is
specified. More precisely, @|mzc| does not compile the files itself,
but it locates a C compiler on the system and launches it with the
appropriate compilation flags. If the platform is a relatively
standard Unix system, a Windows system with either Microsoft's C
compiler or @exec{gcc} in the path, or a Mac OS X system with Apple's
developer tools installed, then using @|mzc| is typically easier than
working with the C compiler directly. Use the @as-index{@DFlag{cgc}}
flag to indicate that the build is for use with PLT Scheme CGC.}
@item{Link the extension C/C++ files with
@as-index{@filepath{mzdyn.o}} (Unix, Mac OS X) or
@as-index{@filepath{mzdyn.obj}} (Windows) to create a shared object. The
resulting shared object should use the extension @filepath{.so} (Unix),
@filepath{.dll} (Windows), or @filepath{.dylib} (Mac OS X).
The @filepath{mzdyn} object file is distributed in the installation's
@filepath{lib} directory. For Windows, the object file is in a
compiler-specific sub-directory of @filepath{plt\lib}.
The @|mzc| compiler links object files into an extension when the
@as-index{@DFlag{ld}} flag is specified, automatically locating
@filepath{mzdyn}. Again, use the @DFlag{cgc} flag with @|mzc|.}
@item{Load the shared object within Scheme using
@scheme[(load-extension _path)], where @scheme[_path] is the name of
the extension file generated in the previous step.
Alternately, if the extension defines a module (i.e.,
@cpp{scheme_module_name} returns a symbol), then place the shared
object in a special directory with a special name, so that it is
detected by the module loader when @scheme[require] is used. The
special directory is a platform-specific path that can be obtained by
evaluating @scheme[(build-path "compiled" "native"
(system-library-subpath))]; see @scheme[load/use-compiled] for more
information. For example, if the shared object's name is
@filepath{example_ss.dll}, then @scheme[(require "example.ss")] will
be redirected to @filepath{example_ss.dll} if the latter is placed in
the sub-directory @scheme[(build-path "compiled" "native"
(system-library-subpath))] and if @filepath{example.ss} does not
exist or has an earlier timestamp.
Note that @scheme[(load-extension _path)] within a @scheme[module]
does @italic{not} introduce the extension's definitions into the
module, because @scheme[load-extension] is a run-time operation. To
introduce an extension's bindings into a module, make sure that the
extension defines a module, put the extension in the
platform-specific location as described above, and use
@scheme[require].}
}
@index['("allocation")]{@bold{IMPORTANT:}} With PLT Scheme CGC, Scheme
values are garbage collected using a conservative garbage collector,
so pointers to Scheme objects can be kept in registers, stack
variables, or structures allocated with @cppi{scheme_malloc}. However,
static variables that contain pointers to collectable memory must be
registered using @cppi{scheme_register_extension_global} (see
@secref["im:memoryalloc"]).
As an example, the following C code defines an extension that returns
@scheme["hello world"] when it is loaded:
@verbatim[#:indent 2]{
#include "escheme.h"
Scheme_Object *scheme_initialize(Scheme_Env *env) {
return scheme_make_utf8_string("hello world");
}
Scheme_Object *scheme_reload(Scheme_Env *env) {
return scheme_initialize(env); /* Nothing special for reload */
}
Scheme_Object *scheme_module_name() {
return scheme_false;
}
}
Assuming that this code is in the file @filepath{hw.c}, the extension
is compiled under Unix with the following two commands:
@commandline{mzc --cgc --cc hw.c}
@commandline{mzc --cgc --ld hw.so hw.o}
(Note that the @DFlag{cgc}, @DFlag{cc}, and @DFlag{ld} flags are each
prefixed by two dashes, not one.)
The @filepath{collects/mzscheme/examples} directory in the PLT
distribution contains additional examples.
@subsection{3m Extensions}
To build an extension to work with PLT Scheme 3m, the CGC instructions
must be extended as follows:
@itemize{
@item{Adjust code to cooperate with the garbage collector as
described in @secref["im:3m"]. Using @|mzc| with the
@as-index{@DFlag{xform}} might convert your code to implement part of
the conversion, as described in @secref["im:3m:mzc"].}
@item{In either your source in the in compiler command line,
@cpp{#define} @cpp{MZ_PRECISE_GC} before including
@filepath{escheme.h}. When using @|mzc| with the @DFlag{cc} and
@as-index{@DFlag{3m}} flags, @cpp{MZ_PRECISE_GC} is automatically
defined.}
@item{Link with @as-index{@filepath{mzdyn3m.o}} (Unix, Mac OS X) or
@as-index{@filepath{mzdyn3m.obj}} (Windows) to create a shared
object. When using @|mzc|, use the @DFlag{ld} and @DFlag{3m} flags
to link to these libraries.}
}
For a relatively simple extension @filepath{hw.c}, the extension is
compiled under Unix for 3m with the following three commands:
@commandline{mzc --xform --cc hw.c}
@commandline{mzc --3m --cc hw.3m.c}
@commandline{mzc --3m --ld hw.so hw.o}
Some examples in @filepath{collects/mzscheme/examples} work with
MzScheme3m in this way. A few examples are manually instrumented, in
which case the @DFlag{xform} step should be skipped.
@; ----------------------------------------------------------------------
@section[#:tag "embedding"]{Embedding MzScheme into a Program}
@section-index["embedding MzScheme"]
Like creating extensions, the embedding process for PLT Scheme CGC or
PLT Scheme 3m is essentially the same, but the process for PLT Scheme
3m is most easily understood as a variant of the process for
PLT Scheme CGC.
@subsection{CGC Embedding}
To embed PLT Scheme CGC in a program, follow these steps:
@itemize{
@item{Locate or build the PLT Scheme CGC libraries. Since the
standard distribution provides 3m libraries, only, you will most
likely have to build from source.
Under Unix, the libraries are @as-index{@filepath{libmzscheme.a}}
and @as-index{@filepath{libmzgc.a}} (or
@as-index{@filepath{libmzscheme.so}} and
@as-index{@filepath{libmzgc.so}} for a dynamic-library build, with
@as-index{@filepath{libmzscheme.la}} and
@as-index{@filepath{libmzgc.la}} files for use with
@exec{libtool}). Building from source and installing places the
libraries into the installation's @filepath{lib} directory. Be sure
to build the CGC variant, since the default is 3m.
Under Windows, stub libraries for use with Microsoft tools are
@filepath{libmzsch@italic{x}.lib} and
@filepath{libmzgc@italic{x}.lib} (where @italic{x} represents the
version number) are in a compiler-specific directory in
@filepath{plt\lib}. These libraries identify the bindings that are
provided by @filepath{libmzsch@italic{x}.dll} and
@filepath{libmzgc@italic{x}.dll} --- which are typically installed
in @filepath{plt\lib}. When linking with Cygwin, link to
@filepath{libmzsch@italic{x}.dll} and
@filepath{libmzgc@italic{x}.dll} directly. At run time, either
@filepath{libmzsch@italic{x}.dll} and
@filepath{libmzgc@italic{x}.dll} must be moved to a location in the
standard DLL search path, or your embedding application must
``delayload'' link the DLLs and explicitly load them before
use. (@filepath{MzScheme.exe} and @filepath{MrEd.exe} use the latter
strategy.)
Under Mac OS X, dynamic libraries are provided by the
@filepath{PLT_MzScheme} framework, which is typically installed in
@filepath{lib} sub-directory of the installation. Supply
@exec{-framework PLT_MzScheme} to @exec{gcc} when linking, along
with @exec{-F} and a path to the @filepath{lib} directory. Beware
that CGC and 3m libraries are installed as different versions within
a single framework, and installation marks one version or the other
as the default (by setting symbolic links); install only CGC to
simplify accessing the CGC version within the framework. At run
time, either @filepath{PLT_MzScheme.framework} must be moved to a
location in the standard framework search path, or your embedding
executable must provide a specific path to the framework (possibly
an executable-relative path using the Mach-O @tt["@executable_path"]
prefix).}
@item{For each C/C++ file that uses MzScheme library functions,
@cpp{#include} the file @as-index{@filepath{scheme.h}}.
The C preprocessor symbol @cppi{SCHEME_DIRECT_EMBEDDED} is defined
as @cpp{1} when @filepath{scheme.h} is @cpp{#include}d, or as
@cpp{0} when @filepath{escheme.h} is @cpp{#include}d.
The @filepath{scheme.h} file is distributed with the PLT software in
the installation's @filepath{include} directory. Building and
installing from source also places this file in the installation's
@filepath{include} directory.}
@item{Start your main program through the @cpp{scheme_main_setup} (or
@cpp{scheme_main_stack_setup}) trampoline, and put all uses of
MzScheme functions inside the function passed to
@cpp{scheme_main_setup}. The @cpp{scheme_main_setup} function
registers the current C stack location with the memory manager. It
also creates the initial namespace @cpp{Scheme_Env*} by calling
@cppi{scheme_basic_env} and passing the result to the function
provided to @cpp{scheme_main_setup}. (The
@cpp{scheme_main_stack_setup} trampoline registers the C stack with
the memory manager without creating a namespace.)}
@item{Configure the namespace by adding module declarations. The
initial namespace contains declarations only for a few primitive
modules, such as @scheme['#%kernel], and no bindings are imported
into the top-level environment.
To embed a module like @schememodname[scheme/base] (along with all
its dependencies), use @exec{mzc --c-mods}, which generates a C file
that contains modules in bytecode form as encapsulated in a static
array. The generated C file defines a @cppi{declare_modules}
function that takes a @cpp{Scheme_Env*}, installs the modules into
the environment, and adjusts the module name resolver to access the
embedded declarations.
Alternately, use @cpp{scheme_set_collects_path} and
@cpp{scheme_init_collection_paths} to configure and install a path
for finding modules at run time.}
@item{Access Scheme through @cppi{scheme_dynamic_require},
@cppi{scheme_load}, @cppi{scheme_eval}, and/or other functions
described in this manual.}
@item{Compile the program and link it with the MzScheme libraries.}
}
@index['("allocation")]{With} PLT Scheme CGC, Scheme values are
garbage collected using a conservative garbage collector, so pointers
to Scheme objects can be kept in registers, stack variables, or
structures allocated with @cppi{scheme_malloc}. In an embedding
application on some platforms, static variables are also automatically
registered as roots for garbage collection (but see notes below
specific to Mac OS X and Windows).
For example, the following is a simple embedding program which
evaluates all expressions provided on the command line and displays
the results, then runs a @scheme[read]-@scheme[eval]-@scheme[print]
loop. Run
@commandline{mzc --c-mods base.c ++lib scheme/base}
to generate @filepath{base.c}, which encapsulates @scheme[scheme/base]
and all of its transitive imports (so that they need not be found
separately a run time).
@verbatim[#:indent 2]{
#include "scheme.h"
#include "base.c"
static int run(Scheme_Env *e, int argc, char *argv[])
{
Scheme_Object *curout;
int i;
mz_jmp_buf * volatile save, fresh;
/* Declare embedded modules in "base.c": */
declare_modules(e);
scheme_namespace_require(scheme_intern_symbol("scheme/base"));
curout = scheme_get_param(scheme_current_config(),
MZCONFIG_OUTPUT_PORT);
for (i = 1; i < argc; i++) {
save = scheme_current_thread->error_buf;
scheme_current_thread->error_buf = &fresh;
if (scheme_setjmp(scheme_error_buf)) {
scheme_current_thread->error_buf = save;
return -1; /* There was an error */
} else {
Scheme_Object *v, *a[2];
v = scheme_eval_string(argv[i], e);
scheme_display(v, curout);
scheme_display(scheme_make_char('\n'), curout);
/* read-eval-print loop, uses initial Scheme_Env: */
a[0] = scheme_intern_symbol("scheme/base");
a[1] = scheme_intern_symbol("read-eval-print-loop");
scheme_apply(scheme_dynamic_require(2, a), 0, NULL);
scheme_current_thread->error_buf = save;
}
}
return 0;
}
int main(int argc, char *argv[])
{
return scheme_main_setup(1, run, argc, argv);
}
}
Under Mac OS X, or under Windows when MzScheme is compiled to a DLL
using Cygwin, the garbage collector cannot find static variables
automatically. In that case, @cppi{scheme_main_setup} must be called with a
non-zero first argument.
Under Windows (for any other build mode), the garbage collector finds
static variables in an embedding program by examining all memory
pages. This strategy fails if a program contains multiple Windows
threads; a page may get unmapped by a thread while the collector is
examining the page, causing the collector to crash. To avoid this
problem, call @cpp{scheme_main_setup} with a non-zero first argument.
When an embedding application calls @cpp{scheme_main_setup} with a non-zero
first argument, it must register each of its static variables with
@cppi{MZ_REGISTER_STATIC} if the variable can contain a GCable
pointer. For example, if @cpp{curout} above is made @cpp{static}, then
@cpp{MZ_REGISTER_STATIC(curout)} should be inserted before the call to
@cpp{scheme_get_param}.
When building an embedded MzSchemeCGC to use SenoraGC (SGC) instead of
the default collector, @cpp{scheme_main_setup} must be called with a
non-zero first argument. See @secref["im:memoryalloc"] for more
information.
@subsection{3m Embedding}
MzScheme3m can be embedded mostly the same as MzScheme, as long as the
embedding program cooperates with the precise garbage collector as
described in @secref["im:3m"].
In either your source in the in compiler command line, @cpp{#define}
@cpp{MZ_PRECISE_GC} before including @filepath{scheme.h}. When using
@|mzc| with the @DFlag{cc} and @DFlag{3m} flags, @cpp{MZ_PRECISE_GC}
is automatically defined.
In addition, some library details are different:
@itemize{
@item{Under Unix, the library is just
@as-index{@filepath{libmzscheme3m.a}} (or
@as-index{@filepath{libmzscheme3m.so}} for a dynamic-library build,
with @as-index{@filepath{libmzscheme3m.la}} for use with
@exec{libtool}). There is no separate library for 3m analogous to
CGC's @filepath{libmzgc.a}.}
@item{Under Windows, the stub library for use with Microsoft tools is
@filepath{libmzsch3m@italic{x}.lib} (where @italic{x} represents the
version number). This library identifies the bindings that are
provided by @filepath{libmzsch3m@italic{x}.dll}. There is no
separate library for 3m analogous to CGC's
@filepath{libmzgc@italic{x}.lib}.}
@item{Under Mac OS X, 3m dynamic libraries are provided by the
@filepath{PLT_MzScheme} framework, just as for CGC, but as a version
suffixed with @filepath{_3m}.}
}
For MzScheme3m, an embedding application must call @cpp{scheme_main_setup}
with a non-zero first argument.
The simple embedding program from the previous section can be
processed by @exec{mzc --xform}, then compiled and linked with
MzScheme3m. Alternately, the source code can be extended to work with
either CGC or 3m depending on whether @cpp{MZ_PRECISE_GC} is defined
on the compiler's command line:
@verbatim[#:indent 2]{
#include "scheme.h"
#include "base.c"
static int run(Scheme_Env *e, int argc, char *argv[])
{
Scheme_Object *curout = NULL, *v = NULL, *a[2] = {NULL, NULL};
Scheme_Config *config = NULL;
int i;
mz_jmp_buf * volatile save = NULL, fresh;
MZ_GC_DECL_REG(8);
MZ_GC_VAR_IN_REG(0, e);
MZ_GC_VAR_IN_REG(1, curout);
MZ_GC_VAR_IN_REG(2, save);
MZ_GC_VAR_IN_REG(3, config);
MZ_GC_VAR_IN_REG(4, v);
MZ_GC_ARRAY_VAR_IN_REG(5, a, 2);
MZ_GC_REG();
declare_modules(e);
v = scheme_intern_symbol("scheme/base");
scheme_namespace_require(v);
config = scheme_current_config();
curout = scheme_get_param(config, MZCONFIG_OUTPUT_PORT);
for (i = 1; i < argc; i++) {
save = scheme_current_thread->error_buf;
scheme_current_thread->error_buf = &fresh;
if (scheme_setjmp(scheme_error_buf)) {
scheme_current_thread->error_buf = save;
return -1; /* There was an error */
} else {
v = scheme_eval_string(argv[i], e);
scheme_display(v, curout);
v = scheme_make_char('\n');
scheme_display(v, curout);
/* read-eval-print loop, uses initial Scheme_Env: */
a[0] = scheme_intern_symbol("scheme/base");
a[1] = scheme_intern_symbol("read-eval-print-loop");
v = scheme_dynamic_require(2, a);
scheme_apply(v, 0, NULL);
scheme_current_thread->error_buf = save;
}
}
MZ_GC_UNREG();
return 0;
}
int main(int argc, char *argv[])
{
return scheme_main_setup(1, run, argc, argv);
}
}
Strictly speaking, the @cpp{config} and @cpp{v} variables above need
not be registered with the garbage collector, since their values are
not needed across function calls that allocate. The code is much
easier to maintain, however, when all local variables are registered
and when all temporary values are put into variables.
@; ----------------------------------------------------------------------
@section{MzScheme and Threads}
MzScheme implements threads for Scheme programs without aid from the
operating system, so that MzScheme threads are cooperative from the
perspective of C code. Under Unix, stand-alone MzScheme uses a single
OS-implemented thread. Under Windows and Mac OS X, stand-alone
MzScheme uses a few private OS-implemented threads for background
tasks, but these OS-implemented threads are never exposed by the
MzScheme API.
In an embedding application, MzScheme can co-exist with additional
OS-implemented threads, but the additional OS threads must not call
any @cpp{scheme_} function. Only the OS thread that originally calls
@cpp{scheme_basic_env} can call @cpp{scheme_} functions. (This
restriction is stronger than saying all calls must be serialized
across threads. MzScheme relies on properties of specific threads to
avoid stack overflow and garbage collection.) When
@cpp{scheme_basic_env} is called a second time to reset the
interpreter, it can be called in an OS thread that is different from
the original call to @cpp{scheme_basic_env}. Thereafter, all calls to
@cpp{scheme_} functions must originate from the new thread.
See @secref["threads"] for more information about threads, including
the possible effects of MzScheme's thread implementation on extension
and embedding C code.
@; ----------------------------------------------------------------------
@section[#:tag "im:unicode"]{MzScheme, Unicode, Characters, and Strings}
A character in MzScheme is a Unicode code point. In C, a character
value has type @cppi{mzchar}, which is an alias for @cpp{unsigned} ---
which is, in turn, 4 bytes for a properly compiled MzScheme. Thus, a
@cpp{mzchar*} string is effectively a UCS-4 string.
Only a few MzScheme functions use @cpp{mzchar*}. Instead, most
functions accept @cpp{char*} strings. When such byte strings are to be
used as a character strings, they are interpreted as UTF-8
encodings. A plain ASCII string is always acceptable in such cases,
since the UTF-8 encoding of an ASCII string is itself.
See also @secref["im:strings"] and @secref["im:encodings"].
@; ----------------------------------------------------------------------
@section[#:tag "im:intsize"]{Integers}
MzScheme expects to be compiled in a mode where @cppi{short} is a
16-bit integer, @cppi{int} is a 32-bit integer, and @cppi{long} has
the same number of bits as @cpp{void*}. The @cppi{mzlonglong} type has
64 bits for compilers that support a 64-bit integer type, otherwise it
is the same as @cpp{long}; thus, @cpp{mzlonglong} tends to match
@cpp{long long}. The @cppi{umzlonglong} type is the unsigned version
of @cpp{mzlonglong}.