racket/collects/scribblings/foreign/intro.scrbl
2012-08-12 20:17:29 -06:00

#lang scribble/doc
@(require "utils.rkt"
scribble/racket
(for-syntax racket/base)
(for-label ffi/unsafe/define))
@(define-syntax _MEVENT (make-element-id-transformer
(lambda (stx) #'@schemeidfont{_MEVENT})))
@(define-syntax _MEVENT-pointer (make-element-id-transformer
(lambda (stx) #'@schemeidfont{_MEVENT-pointer})))
@(define-syntax _WINDOW-pointer (make-element-id-transformer
(lambda (stx) #'@schemeidfont{_WINDOW-pointer})))
@(define-syntax _mmask_t (make-element-id-transformer
(lambda (stx) #'@schemeidfont{_mmask_t})))
@title[#:tag "intro"]{Overview}
Although using the FFI requires writing no new C code, it provides
very little insulation against the issues that C programmers face
related to safety and memory management. An FFI programmer must be
particularly aware of memory management issues for data that spans the
Racket--C divide. Thus, this manual relies in many ways on the
information in @|InsideRacket|, which defines how Racket
interacts with C APIs in general.

Since using the FFI entails many safety concerns that Racket
programmers can normally ignore, the library name includes
@racketidfont{unsafe}. Importing the library should be considered a
declaration that your code is itself unsafe and can therefore lead to
serious problems in case of bugs; it is your responsibility to provide
a safe interface. If your library provides an unsafe interface, then it
should have @racketidfont{unsafe} in its name, too.

For more information on the motivation and design of the Racket FFI,
see @cite["Barzilay04"].
@; --------------------------------------------------
@section{Libraries, C Types, and Objects}
To use the FFI, you must have in mind
@itemlist[
@item{a particular library from which you want to access a function
or value, }
@item{a particular symbol exported by the file, and}
@item{the C-level type (typically a function type) of the exported
symbol.}
]
The library corresponds to a file with a suffix such as
@filepath{.dll}, @filepath{.so}, or @filepath{.dylib} (depending on
the platform), or it might be a library within a @filepath{.framework}
directory on Mac OS X.
Knowing the library's name and/or path is often the trickiest part of
using the FFI. Sometimes, when using a library name without a path
prefix or file suffix, the library file can be located automatically,
especially on Unix. See @racket[ffi-lib] for advice.
The @racket[ffi-lib] function gets a handle to a library. To extract
exports of the library, it's simplest to use
@racket[define-ffi-definer] from the @racketmodname[ffi/unsafe/define]
library:
@racketmod[
racket/base
(require ffi/unsafe
         ffi/unsafe/define)

(define-ffi-definer define-curses (ffi-lib "libcurses"))
]
This @racket[define-ffi-definer] declaration introduces a
@racket[define-curses] form for binding a Racket name to a value
extracted from @filepath{libcurses}---which might be located
at @filepath{/usr/lib/libcurses.so}, depending on
the platform.
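
When the plain name does not resolve, a version list and a failure
thunk can make the lookup more forgiving. The following is a minimal
sketch rather than part of the running example: the version strings
@racket["5"] and @racket["6"] are assumptions about what might be
installed, and @racket[curses-lib] is a name introduced only for
illustration.

@racketblock[
(require ffi/unsafe)
(code:comment @#,t{try versions 5 and 6 (assumed), then the unversioned name})
(code:comment @#,t{the @racket[#:fail] thunk yields @racket[#f] instead of an exception})
(define curses-lib
  (ffi-lib "libcurses" '("5" "6" #f)
           #:fail (lambda () #f)))
]

A @racket[#f] result can then be checked before any
@racket[define-curses] form is used, so that a missing library is
reported in a controlled way.
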
To use @racket[define-curses], we need the names and C types of
functions from @filepath{libcurses}. We'll start by using the
following functions:
@verbatim[#:indent 2]{
WINDOW* initscr(void);
int waddstr(WINDOW *win, char *str);
int wrefresh(WINDOW *win);
int endwin(void);
}
We make these functions callable from Racket as follows:
@margin-note{By convention, an underscore prefix
indicates a representation of a C type (such as @racket[_int]) or a
constructor of such representations (such as @racket[_cpointer]).}
@racketblock[
(define _WINDOW-pointer (_cpointer 'WINDOW))
(define-curses initscr (_fun -> _WINDOW-pointer))
(define-curses waddstr (_fun _WINDOW-pointer _string -> _int))
(define-curses wrefresh (_fun _WINDOW-pointer -> _int))
(define-curses endwin (_fun -> _int))
]
The definition of @racket[_WINDOW-pointer] creates a Racket value that
reflects a C type via @racket[_cpointer], which creates a type
representation for a pointer type---usually one that is opaque. The
@racket['WINDOW] argument could have been any value, but by
convention, we use a symbol matching the C base type.
Each @racket[define-curses] form uses the given identifier as both the
name of the library export and the Racket identifier to
bind.@margin-note*{An optional @racket[#:c-id] clause for
@racket[define-curses] can specify a name for the library export that
is different from the Racket identifier to bind.} The @racket[(_fun
... -> ...)] part of each definition describes the C type of the
exported function, since the library file does not encode that
information for its exports. The types listed to the left of @racket[->] are the
argument types, while the type to the right of @racket[->] is the
result type. The pre-defined @racket[_int] type naturally corresponds
to the @tt{int} C type, while @racket[_string] corresponds to the
@tt{char*} type when it is intended as a string to read.
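
The @racket[#:c-id] clause mentioned in the margin note can be
sketched as follows. The Racket-side name @racket[add-string] is
hypothetical, chosen only to show a binding whose name differs from
the C export, and the block assumes that @filepath{libcurses} can be
loaded on the current platform.

@racketblock[
(require ffi/unsafe
         ffi/unsafe/define)
(define-ffi-definer define-curses (ffi-lib "libcurses"))
(define _WINDOW-pointer (_cpointer 'WINDOW))
(code:comment @#,t{bind the C export @tt{waddstr} to the Racket name @racket[add-string]})
(define-curses add-string (_fun _WINDOW-pointer _string -> _int)
  #:c-id waddstr)
]
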
At this point, @racket[initscr], @racket[waddstr], @racket[wrefresh],
and @racket[endwin] are normal Racket bindings to Racket functions
(that happen to call C functions), and so they can be exported from
the defining module or called directly:
@racketblock[
(define win (initscr))
(void (waddstr win "Hello"))
(void (wrefresh win))
(sleep 1)
(void (endwin))
]
@; --------------------------------------------------
@section{Function-Type Bells and Whistles}
Our initial use of functions like @racket[waddstr] is sloppy, because
we ignore return codes. C functions often return error
codes, and checking them is a pain. A better approach is to build the
check into the @racket[waddstr] binding and raise an exception when
the code is non-zero.
The @racket[_fun] function-type constructor includes many options to
help convert C functions to nicer Racket functions. We can use some of
those features to convert return codes into either @|void-const| or an
exception:
@racketblock[
(define (check v who)
  (unless (zero? v)
    (error who "failed: ~a" v)))

(define-curses initscr (_fun -> _WINDOW-pointer))
(define-curses waddstr (_fun _WINDOW-pointer _string -> (r : _int)
                             -> (check r 'waddstr)))
(define-curses wrefresh (_fun _WINDOW-pointer -> (r : _int)
                              -> (check r 'wrefresh)))
(define-curses endwin (_fun -> (r : _int)
                            -> (check r 'endwin)))
]
Using @racket[(r : _int)] as a result type gives the local name
@racket[r] to the C function's result. This name is then used in the
result post-processing expression that is specified after a second
@racket[->] in the @racket[_fun] form.
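
When many functions share the same return-code convention, the check
can also be folded into a derived C type instead of being repeated in
each @racket[_fun] form. The following is a sketch of that
alternative, not something the example above requires:
@racket[_status] is a hypothetical helper built with
@racket[make-ctype], and @racket[cast] is used here only to exercise
the conversion without calling into C.

@racketblock[
(require ffi/unsafe)
(define (check v who)
  (unless (zero? v)
    (error who "failed: ~a" v)))
(code:comment @#,t{an @tt{int} whose C-to-Racket conversion runs @racket[check]})
(define (_status who)
  (make-ctype _int #f (lambda (v) (check v who))))
(code:comment @#,t{a zero code converts quietly})
(cast 0 _int (_status 'demo))
(code:comment @#,t{a non-zero code raises; here the message is captured instead})
(with-handlers ([exn:fail? exn-message])
  (cast -1 _int (_status 'demo)))
]

With such a type, a binding could be written as
@racket[(define-curses waddstr (_fun _WINDOW-pointer _string -> (_status 'waddstr)))].
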
@; --------------------------------------------------
@section{By-Reference Arguments}
To get mouse events from @filepath{libcurses}, we must explicitly
enable them through the @racket[mousemask] function:
@verbatim[#:indent 2]{
typedef unsigned long mmask_t;
#define BUTTON1_CLICKED 004L
mmask_t mousemask(mmask_t newmask, mmask_t *oldmask);
}
Setting @racket[BUTTON1_CLICKED] in the mask enables button-click
events. At the same time, @racket[mousemask] reports the mask that was
in effect before the call by storing it at the address provided as its
second argument.
Since these kinds of call-by-reference interfaces are common in C,
@racket[_fun] cooperates with a @racket[_ptr] form to automatically
allocate space for a by-reference argument and extract the value put
there by the C function. Give the extracted value a name to use in the
post-processing expression. The post-processing expression can combine
the by-reference result with the function's direct result (which, in
this case, reports a subset of the given mask that is actually
supported).
@racketblock[
(define _mmask_t _ulong)
(define-curses mousemask (_fun _mmask_t (o : (_ptr o _mmask_t))
                               -> (r : _mmask_t)
                               -> (values o r)))
(define BUTTON1_CLICKED #o004)

(define-values (old supported) (mousemask BUTTON1_CLICKED))
]
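
To see roughly what @racket[(_ptr o _mmask_t)] automates, the same
steps can be spelled out by hand. This is a sketch under assumptions:
@racket[cell] is a name introduced for illustration, and
@racket[ptr-set!] merely stands in for the write that
@racket[mousemask] would perform through its second argument.

@racketblock[
(require ffi/unsafe)
(define _mmask_t _ulong)
(code:comment @#,t{allocate room for one @tt{mmask_t}})
(define cell (malloc _mmask_t 'atomic))
(code:comment @#,t{stand-in for the C function storing the old mask})
(ptr-set! cell _mmask_t #o004)
(code:comment @#,t{extract the stored value, as @racket[_ptr] does after the call})
(define old (ptr-ref cell _mmask_t))
]
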
@; --------------------------------------------------
@section{C Structs}
Assuming that mouse events are supported, the @filepath{libcurses}
library reports them via @racket[getmouse], which accepts a pointer to
a @cpp{MEVENT} struct to fill with mouse-event information:
@verbatim[#:indent 2]{
typedef struct {
  short id;
  int x, y, z;
  mmask_t bstate;
} MEVENT;

int getmouse(MEVENT *event);
}
To work with @cpp{MEVENT} values, we use @racket[define-cstruct]:
@racketblock[
(define-cstruct _MEVENT ([id _short]
                         [x _int]
                         [y _int]
                         [z _int]
                         [bstate _mmask_t]))
]
This definition binds many names in the same way that
@racket[define-struct] binds many names: @racket[_MEVENT] is a C type
representing the struct type, @racket[_MEVENT-pointer] is a C type
representing a pointer to a @racket[_MEVENT], @racket[make-MEVENT]
constructs a @racket[_MEVENT] value, @racket[MEVENT-x] extracts
the @racket[x] field from an @racket[_MEVENT] value, and so on.
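
The generated names can be tried out without calling into C. In this
small sketch, the field values and the name @racket[ev] are arbitrary
choices for illustration:

@racketblock[
(require ffi/unsafe)
(define _mmask_t _ulong)
(define-cstruct _MEVENT ([id _short]
                         [x _int]
                         [y _int]
                         [z _int]
                         [bstate _mmask_t]))
(code:comment @#,t{build a struct value and read a field})
(define ev (make-MEVENT 0 10 20 0 #o004))
(MEVENT-x ev)
(code:comment @#,t{fields can be updated through the generated setters})
(set-MEVENT-y! ev 21)
(MEVENT-y ev)
(code:comment @#,t{the generated predicate recognizes the value})
(MEVENT? ev)
]
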
With this C struct declaration, we can define the function type for
@racket[getmouse]. The simplest approach is to define
@racket[getmouse] to accept an @racket[_MEVENT-pointer], and then explicitly
allocate the @racket[_MEVENT] value before calling @racket[getmouse]:
@racketblock[
(define-curses getmouse (_fun _MEVENT-pointer -> _int))

(define m (make-MEVENT 0 0 0 0 0))
(when (zero? (getmouse m))
  (code:comment @#,t{use @racket[m]...})
  ....)
]
For a more Racket-like function, use @racket[(_ptr o _MEVENT)] and a
post-processing expression:
@racketblock[
(define-curses getmouse (_fun (m : (_ptr o _MEVENT))
                              -> (r : _int)
                              -> (and (zero? r) m)))

(waddstr win "click me fast...")
(wrefresh win)
(sleep 1)

(define m (getmouse))
(when m
  (waddstr win (format "at ~a,~a"
                       (MEVENT-x m)
                       (MEVENT-y m)))
  (wrefresh win)
  (sleep 1))

(endwin)
]
The difference between @racket[_MEVENT-pointer] and @racket[_MEVENT]
is crucial. Using @racket[(_ptr o _MEVENT-pointer)] would allocate
only enough space for a pointer to an @cpp{MEVENT} struct, which is
not enough space for an @cpp{MEVENT} struct.
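
The size difference can be checked directly with
@racket[ctype-sizeof]. The exact numbers depend on the platform, so
this sketch compares the two sizes instead of assuming particular
values; @racket[_ulong] is written in place of @racket[_mmask_t] to
keep the block self-contained.

@racketblock[
(require ffi/unsafe)
(define-cstruct _MEVENT ([id _short]
                         [x _int]
                         [y _int]
                         [z _int]
                         [bstate _ulong]))
(code:comment @#,t{size of a pointer to the struct})
(ctype-sizeof _MEVENT-pointer)
(code:comment @#,t{size of the struct itself, including padding})
(ctype-sizeof _MEVENT)
(code:comment @#,t{the pointer is smaller than the struct})
(< (ctype-sizeof _MEVENT-pointer) (ctype-sizeof _MEVENT))
]
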
@; --------------------------------------------------
@section{Pointers and Manual Allocation}
To get text from the user instead of a mouse click, @filepath{libcurses}
provides @racket[wgetnstr]:
@verbatim[#:indent 2]{
int wgetnstr(WINDOW *win, char *str, int n);
}
While the @cpp{char*} argument to @racket[waddstr] is treated as a
nul-terminated string, the @cpp{char*} argument to @racket[wgetnstr]
is treated as a buffer whose size is indicated by the final @cpp{int}
argument. The C type @racket[_string] does not work for such
buffers.
One way to approach this function from Racket is to describe the
arguments in their rawest form, using plain @racket[_pointer] for the
second argument to @racket[wgetnstr]:
@racketblock[
(define-curses wgetnstr (_fun _WINDOW-pointer _pointer _int
                              -> _int))
]
To call this raw version of @racket[wgetnstr], allocate memory, zero
it, and pass the size minus one (to leave room for a nul
terminator) to @racket[wgetnstr]:
@racketblock[
(define SIZE 256)
(define buffer (malloc 'raw SIZE))
(memset buffer 0 SIZE)
(void (wgetnstr win buffer (sub1 SIZE)))
]
When @racket[wgetnstr] returns, it has written bytes to
@racket[buffer]. At that point, we can use @racket[cast] to convert the
value from a raw pointer to a string:
@racketblock[
(cast buffer _pointer _string)
]
Conversion via the @racket[_string] type causes the data referenced by
the original pointer to be copied (and UTF-8 decoded), so the memory
referenced by @racket[buffer] is no longer needed. Memory allocated
with @racket[(malloc 'raw ...)] must be released with @racket[free]:
@racketblock[
(free buffer)
]
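
The allocate, use, and free steps can be packaged so that the
@racket[free] is not skipped when the body raises an exception. The
following is a sketch under assumptions:
@racket[call-with-raw-buffer] is a hypothetical helper, and
@racket[memcpy] stands in for @racket[wgetnstr] filling the buffer.
It does not guard against thread termination, and it assumes that
the body is not re-entered through a captured continuation after the
buffer has been freed.

@racketblock[
(require ffi/unsafe)
(code:comment @#,t{allocate a zeroed raw buffer, pass it to @racket[proc], and always free it})
(define (call-with-raw-buffer size proc)
  (define buffer (malloc 'raw size))
  (memset buffer 0 size)
  (dynamic-wind
    void
    (lambda () (proc buffer))
    (lambda () (free buffer))))
(code:comment @#,t{copy the text out as a string before the buffer is released})
(define line
  (call-with-raw-buffer
   256
   (lambda (buffer)
     (memcpy buffer #"hello" 5)
     (cast buffer _pointer _string))))
]
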
@; --------------------------------------------------
@section{Pointers and GC-Managed Allocation}
Instead of allocating @racket[buffer] with @racket[(malloc 'raw ...)],
we could have allocated it with @racket[(malloc 'atomic ...)]:
@racketblock[
(define buffer (malloc 'atomic SIZE))
]
Memory allocated with @racket['atomic] is managed by the garbage
collector, so @racket[free] is neither necessary nor allowed when the
memory referenced by @racket[buffer] is no longer needed. Instead,
when @racket[buffer] becomes inaccessible, the allocated memory will
be reclaimed automatically.
Allowing the garbage collector (GC) to manage memory is usually
preferable. It's easy to forget to call @racket[free], and exceptions
or thread termination can easily skip a @racket[free].
At the same time, using GC-managed memory adds a different burden on
the programmer: data managed by the GC may be moved to a new address
as the GC compacts allocated objects to avoid fragmentation. C
functions, meanwhile, expect to receive pointers to objects that will
stay put.
Fortunately, unless a C function calls back into the Racket run-time
system (perhaps through a function that is provided as an argument),
no garbage collection will happen between the time that a C function
is called and the time that the function returns.
Let's look at a few possibilities related to allocation and pointers:
@itemlist[
@item{Ok:
@racketblock[
(define p (malloc 'atomic SIZE))
(wgetnstr win p (sub1 SIZE))
]
Although the data allocated by @racket[malloc] can move
around, @racket[p] will always point to it, and no garbage collection
will happen between the time that the address is extracted from
@racket[p] to pass to @racket[wgetnstr] and the time that
@racket[wgetnstr] returns.}
@item{Bad:
@racketblock[
(define p (malloc 'atomic SIZE))
(define i (cast p _pointer _intptr))
(wgetnstr win (cast i _intptr _pointer) (sub1 SIZE))
]
The data referenced by @racket[p] can move after the
address is converted to an integer, in which case @racket[i] cast
back to a pointer will be the wrong address.
Obviously, casting a pointer to an integer is generally a bad idea,
but the cast simulates another possibility, which is passing the
pointer to a C function that retains the pointer in its own private
store for later use. Such private storage is invisible to the Racket
GC, so it has the same effect as casting the pointer to an integer.}
@item{Ok:
@racketblock[
(define p (malloc 'atomic SIZE))
(define p2 (ptr-add p 4))
(wgetnstr win p2 (- SIZE 5))
]
The pointer @racket[p2] retains the original reference and
only adds the @racket[4] at the last minute before calling
@racket[wgetnstr] (i.e., after the point that garbage collection is
allowed).}
@item{Ok:
@racketblock[
(define p (malloc 'atomic-interior SIZE))
(define i (cast p _pointer _intptr))
(wgetnstr win (cast i _intptr _pointer) (sub1 SIZE))
]
This is ok assuming that @racket[p] itself stays accessible, so that
the data it references isn't reclaimed. Allocating with
@racket['atomic-interior] puts data at a particular address and
keeps it there. A garbage collection will not change the address in
@racket[p], and so @racket[i] (cast back to a pointer) will always
refer to the data.}
]
Keep in mind that C struct constructors like @racket[make-MEVENT] are
effectively the same as @racket[(malloc 'atomic ...)]; the result values
can move in memory during a garbage collection. The same is true of
byte strings allocated with @racket[make-bytes], which (as a
convenience) can be used directly as a pointer value (unlike character
strings, which are always copied for UTF-8 encoding or decoding).
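
Since a byte string can serve as the buffer directly, text can be
read into one and trimmed at the first nul byte, which avoids both
@racket[malloc] and @racket[cast]. In this minimal sketch,
@racket[bytes->string/nul] is a hypothetical helper, and
@racket[bytes-copy!] stands in for a call such as
@racket[(wgetnstr win buf (sub1 SIZE))] that uses the raw binding.

@racketblock[
(define SIZE 256)
(define buf (make-bytes SIZE 0))
(code:comment @#,t{stand-in for the C function writing into the buffer})
(bytes-copy! buf 0 #"hello")
(code:comment @#,t{decode the bytes that precede the first nul, or all of them})
(define (bytes->string/nul bstr)
  (define end
    (or (for/first ([b (in-bytes bstr)]
                    [i (in-naturals)]
                    #:when (zero? b))
          i)
        (bytes-length bstr)))
  (bytes->string/utf-8 bstr #f 0 end))
(bytes->string/nul buf)
]
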
For more information about memory management and garbage collection,
see @secref[#:doc InsideRacket-doc "im:memoryalloc"] in
@|InsideRacket|.
@; --------------------------------------------------
@section{Reliable Release of Resources}
Using GC-managed memory saves you from manual @racket[free]s for plain
memory blocks, but C libraries often allocate resources and require a
matching call to a function that releases the resources. For example,
@filepath{libcurses} supports windows on the screen that
are created with @racket[newwin] and released with @racket[delwin]:
@verbatim[#:indent 2]{
WINDOW *newwin(int lines, int ncols, int y, int x);
int delwin(WINDOW *win);
}
In a sufficiently complex program, ensuring that every @racket[newwin]
is paired with @racket[delwin] can be challenging, especially if the
functions are wrapped by otherwise safe functions that are provided
from a library. A library that is intended to be safe for use in a
sandbox, say, must protect against resource leaks within the Racket
process as a whole when a sandboxed program misbehaves or is
terminated.
The @racketmodname[ffi/unsafe/alloc] library provides functions to
connect resource-allocating functions and resource-releasing
functions. The library then arranges for finalization to release a resource if
it becomes inaccessible (according to the GC) before it is explicitly
released. At the same time, the library handles tricky atomicity
requirements to ensure that the finalization is properly registered
and never run multiple times.
Using @racketmodname[ffi/unsafe/alloc], the @racket[newwin] and
@racket[delwin] functions can be imported with @racket[allocator]
and @racket[deallocator] wrappers, respectively:
@racketblock[
(require ffi/unsafe/alloc)
(define-curses delwin (_fun _WINDOW-pointer -> _int)
  #:wrap (deallocator))
(define-curses newwin (_fun _int _int _int _int
                            -> _WINDOW-pointer)
  #:wrap (allocator delwin))
]
A @racket[deallocator] wrapper makes a function cancel any existing
finalizer for the function's argument. An @racket[allocator] wrapper
refers to the deallocator, so that the deallocator can be run if
necessary by a finalizer.
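
The pairing can be observed without a C library, since
@racket[allocator] and @racket[deallocator] wrap ordinary
procedures. In this sketch, @racket[acquire], @racket[release!], and
@racket[released] are hypothetical stand-ins for @racket[newwin],
@racket[delwin], and the library's internal state:

@racketblock[
(require ffi/unsafe/alloc)
(code:comment @#,t{record of released resources, for demonstration only})
(define released '())
(define release!
  ((deallocator)
   (lambda (r) (set! released (cons r released)))))
(define acquire
  ((allocator release!)
   (lambda (name) (box name))))
(define r (acquire 'window))
(code:comment @#,t{an explicit release runs the deallocator and cancels the finalizer})
(release! r)
released
]

If @racket[release!] were never called, the finalizer registered by
@racket[acquire] could call it once the box becomes inaccessible.
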
If a resource is scarce or visible to end users, then @tech[#:doc
reference.scrbl]{custodian} management is more appropriate than
mere finalization as implemented by @racket[allocator]. See the
@racketmodname[ffi/unsafe/custodian] library.
@; ------------------------------------------------------------
@section{More Examples}
For more examples of common FFI patterns, see the defined interfaces
in the @filepath{ffi/examples} collection. See also @cite["Barzilay04"].