#lang scribble/doc
@(require "utils.rkt"
          scribble/racket
          (for-syntax racket/base)
          (for-label ffi/unsafe/define))

@(define-syntax _MEVENT
   (make-element-id-transformer
    (lambda (stx) #'@schemeidfont{_MEVENT})))
@(define-syntax _MEVENT-pointer
   (make-element-id-transformer
    (lambda (stx) #'@schemeidfont{_MEVENT-pointer})))
@(define-syntax _WINDOW-pointer
   (make-element-id-transformer
    (lambda (stx) #'@schemeidfont{_WINDOW-pointer})))
@(define-syntax _mmask_t
   (make-element-id-transformer
    (lambda (stx) #'@schemeidfont{_mmask_t})))

@title[#:tag "intro"]{Overview}

Although using the FFI requires writing no new C code, it provides
very little insulation against the issues that C programmers face
related to safety and memory management. An FFI programmer must be
particularly aware of memory management issues for data that spans the
Racket--C divide. Thus, this manual relies in many ways on the
information in @|InsideRacket|, which defines how Racket interacts
with C APIs in general.

Since using the FFI entails many safety concerns that Racket
programmers can normally ignore, the library name includes
@racketidfont{unsafe}. Importing the library should be considered a
declaration that your code is itself unsafe and can therefore lead to
serious problems in case of bugs: it is your responsibility to provide
a safe interface. If your library provides an unsafe interface, then
it should have @racketidfont{unsafe} in its name, too.

For more information on the motivation and design of the Racket FFI,
see @cite["Barzilay04"].
@; --------------------------------------------------

@section{Libraries, C Types, and Objects}

To use the FFI, you must have in mind

@itemlist[

 @item{a particular library from which you want to access a function
       or value,}

 @item{a particular symbol exported by the file, and}

 @item{the C-level type (typically a function type) of the exported
       symbol.}

]

The library corresponds to a file with a suffix such as
@filepath{.dll}, @filepath{.so}, or @filepath{.dylib} (depending on
the platform), or it might be a library within a @filepath{.framework}
directory on Mac OS X.

Knowing the library's name and/or path is often the trickiest part of
using the FFI. Sometimes, when using a library name without a path
prefix or file suffix, the library file can be located automatically,
especially on Unix. See @racket[ffi-lib] for advice.

The @racket[ffi-lib] function gets a handle to a library. To extract
exports of the library, it's simplest to use
@racket[define-ffi-definer] from the @racketmodname[ffi/unsafe/define]
library:

@racketmod[
racket/base
(require ffi/unsafe
         ffi/unsafe/define)

(define-ffi-definer define-curses (ffi-lib "libcurses"))
]

This @racket[define-ffi-definer] declaration introduces a
@racket[define-curses] form for binding a Racket name to a value
extracted from @filepath{libcurses}---which might be located
at @filepath{/usr/lib/libcurses.so}, depending on the platform.

To use @racket[define-curses], we need the names and C types of
functions from @filepath{libcurses}.
We'll start by using the following functions:

@verbatim[#:indent 2]{
 WINDOW* initscr(void);
 int waddstr(WINDOW *win, char *str);
 int wrefresh(WINDOW *win);
 int endwin(void);
}

We make these functions callable from Racket as follows:

@margin-note{By convention, an underscore prefix indicates a
representation of a C type (such as @racket[_int]) or a constructor of
such representations (such as @racket[_cpointer]).}

@racketblock[
(define _WINDOW-pointer (_cpointer 'WINDOW))

(define-curses initscr (_fun -> _WINDOW-pointer))
(define-curses waddstr (_fun _WINDOW-pointer _string -> _int))
(define-curses wrefresh (_fun _WINDOW-pointer -> _int))
(define-curses endwin (_fun -> _int))
]

The definition of @racket[_WINDOW-pointer] creates a Racket value that
reflects a C type via @racket[_cpointer], which creates a type
representation for a pointer type---usually one that is opaque. The
@racket['WINDOW] argument could have been any value, but by
convention, we use a symbol matching the C base type.

Each @racket[define-curses] form uses the given identifier as both the
name of the library export and the Racket identifier to
bind.@margin-note*{An optional @racket[#:c-id] clause for
@racket[define-curses] can specify a name for the library export that
is different from the Racket identifier to bind.} The @racket[(_fun
... -> ...)] part of each definition describes the C type of the
exported function, since the library file does not encode that
information for its exports. The types listed to the left of
@racket[->] are the argument types, while the type to the right of
@racket[->] is the result type. The pre-defined @racket[_int] type
naturally corresponds to the @tt{int} C type, while @racket[_string]
corresponds to the @tt{char*} type when it is intended as a string to
read.
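
As a rough illustration of the @racket[#:c-id] clause and of
@racket[_fun] types, the following sketch binds the C library's
@tt{abs} function, whose C type is @tt{int abs(int)}, under the Racket
name @racket[c-abs]. The names @racket[define-libc] and @racket[c-abs]
are made up for this example, and the sketch assumes a platform
(typically Unix) where @racket[(ffi-lib #f)] makes the C library's
exports available:

@racketmod[
racket/base
(require ffi/unsafe
         ffi/unsafe/define)

(code:comment "Assumption: #f selects the exports already loaded into the process")
(define-ffi-definer define-libc (ffi-lib #f))

(code:comment "Racket name c-abs, C export abs, C type int abs(int)")
(define-libc c-abs (_fun _int -> _int)
  #:c-id abs)

(c-abs -5)
]

Because the library file carries no type information, nothing checks
the @racket[_fun] type against the real C declaration; a mismatch here
is one of the ways in which FFI code is unsafe.
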
At this point, @racket[initscr], @racket[waddstr], @racket[wrefresh],
and @racket[endwin] are normal Racket bindings to Racket functions
(that happen to call C functions), and so they can be exported from
the defining module or called directly:

@racketblock[
(define win (initscr))
(void (waddstr win "Hello"))
(void (wrefresh win))
(sleep 1)
(void (endwin))
]

@; --------------------------------------------------

@section{Function-Type Bells and Whistles}

Our initial use of functions like @racket[waddstr] is sloppy, because
we ignore return codes. C functions often return error codes, and
checking them is a pain. A better approach is to build the check into
the @racket[waddstr] binding and raise an exception when the code is
non-zero.

The @racket[_fun] function-type constructor includes many options to
help convert C functions to nicer Racket functions. We can use some of
those features to convert return codes into either @|void-const| or an
exception:

@racketblock[
(define (check v who)
  (unless (zero? v)
    (error who "failed: ~a" v)))

(define-curses initscr (_fun -> _WINDOW-pointer))
(define-curses waddstr (_fun _WINDOW-pointer _string -> (r : _int)
                             -> (check r 'waddstr)))
(define-curses wrefresh (_fun _WINDOW-pointer -> (r : _int)
                              -> (check r 'wrefresh)))
(define-curses endwin (_fun -> (r : _int)
                            -> (check r 'endwin)))
]

Using @racket[(r : _int)] as a result type gives the local name
@racket[r] to the C function's result. This name is then used in the
result post-processing expression that is specified after a second
@racket[->] in the @racket[_fun] form.

@; --------------------------------------------------

@section{By-Reference Arguments}

To get mouse events from @filepath{libcurses}, we must explicitly
enable them through the @racket[mousemask] function:

@verbatim[#:indent 2]{
 typedef unsigned long mmask_t;
 #define BUTTON1_CLICKED 004L

 mmask_t mousemask(mmask_t newmask, mmask_t *oldmask);
}

Setting @racket[BUTTON1_CLICKED] in the mask enables button-click
events.
At the same time, @racket[mousemask] returns the current mask by
installing it into the pointer provided as its second argument.

Since these kinds of call-by-reference interfaces are common in C,
@racket[_fun] cooperates with a @racket[_ptr] form to automatically
allocate space for a by-reference argument and extract the value put
there by the C function. Give the extracted value a name to use in the
post-processing expression. The post-processing expression can combine
the by-reference result with the function's direct result (which, in
this case, reports a subset of the given mask that is actually
supported).

@racketblock[
(define _mmask_t _ulong)
(define-curses mousemask (_fun _mmask_t (o : (_ptr o _mmask_t))
                               -> (r : _mmask_t)
                               -> (values o r)))
(define BUTTON1_CLICKED #o004)

(define-values (old supported) (mousemask BUTTON1_CLICKED))
]

@; --------------------------------------------------

@section{C Structs}

Assuming that mouse events are supported, the @filepath{libcurses}
library reports them via @racket[getmouse], which accepts a pointer to
a @cpp{MEVENT} struct to fill with mouse-event information:

@verbatim[#:indent 2]{
 typedef struct {
    short id;
    int x, y, z;
    mmask_t bstate;
 } MEVENT;

 int getmouse(MEVENT *event);
}

To work with @cpp{MEVENT} values, we use @racket[define-cstruct]:

@racketblock[
(define-cstruct _MEVENT ([id _short]
                         [x _int]
                         [y _int]
                         [z _int]
                         [bstate _mmask_t]))
]

This definition binds many names in the same way that
@racket[define-struct] binds many names: @racket[_MEVENT] is a C type
representing the struct type, @racket[_MEVENT-pointer] is a C type
representing a pointer to a @racket[_MEVENT], @racket[make-MEVENT]
constructs a @racket[_MEVENT] value, @racket[MEVENT-x] extracts the
@racket[x] field from an @racket[_MEVENT] value, and so on.

With this C struct declaration, we can define the function type for
@racket[getmouse].
The simplest approach is to define @racket[getmouse] to accept an
@racket[_MEVENT-pointer], and then explicitly allocate the
@racket[_MEVENT] value before calling @racket[getmouse]:

@racketblock[
(define-curses getmouse (_fun _MEVENT-pointer -> _int))

(define m (make-MEVENT 0 0 0 0 0))
(when (zero? (getmouse m))
  (code:comment @#,t{use @racket[m]...})
  ....)
]

For a more Racket-like function, use @racket[(_ptr o _MEVENT)] and a
post-processing expression:

@racketblock[
(define-curses getmouse (_fun (m : (_ptr o _MEVENT))
                              -> (r : _int)
                              -> (and (zero? r) m)))

(waddstr win (format "click me fast..."))
(wrefresh win)
(sleep 1)

(define m (getmouse))
(when m
  (waddstr win (format "at ~a,~a"
                       (MEVENT-x m)
                       (MEVENT-y m)))
  (wrefresh win)
  (sleep 1))

(endwin)
]

The difference between @racket[_MEVENT-pointer] and @racket[_MEVENT]
is crucial. Using @racket[(_ptr o _MEVENT-pointer)] would allocate
only enough space for a pointer to an @cpp{MEVENT} struct, which is
not enough space for an @cpp{MEVENT} struct.

@; --------------------------------------------------

@section{Pointers and Manual Allocation}

To get text from the user instead of a mouse click,
@filepath{libcurses} provides @racket[wgetnstr]:

@verbatim[#:indent 2]{
 int wgetnstr(WINDOW *win, char *str, int n);
}

While the @cpp{char*} argument to @racket[waddstr] is treated as a
nul-terminated string, the @cpp{char*} argument to @racket[wgetnstr]
is treated as a buffer whose size is indicated by the final @cpp{int}
argument. The C type @racket[_string] does not work for such buffers.
One way to approach this function from Racket is to describe the
arguments in their rawest form, using plain @racket[_pointer] for the
second argument to @racket[wgetnstr]:

@racketblock[
(define-curses wgetnstr (_fun _WINDOW-pointer _pointer _int
                              -> _int))
]

To call this raw version of @racket[wgetnstr], allocate memory, zero
it, and pass the size minus one (to leave room for a nul terminator)
to @racket[wgetnstr]:

@racketblock[
(define SIZE 256)
(define buffer (malloc 'raw SIZE))
(memset buffer 0 SIZE)

(void (wgetnstr win buffer (sub1 SIZE)))
]

When @racket[wgetnstr] returns, it has written bytes to
@racket[buffer]. At that point, we can use @racket[cast] to convert
the value from a raw pointer to a string:

@racketblock[
(cast buffer _pointer _string)
]

Conversion via the @racket[_string] type causes the data referenced by
the original pointer to be copied (and UTF-8 decoded), so the memory
referenced by @racket[buffer] is no longer needed. Memory allocated
with @racket[(malloc 'raw ...)] must be released with @racket[free]:

@racketblock[
(free buffer)
]

@; --------------------------------------------------

@section{Pointers and GC-Managed Allocation}

Instead of allocating @racket[buffer] with @racket[(malloc 'raw ...)],
we could have allocated it with @racket[(malloc 'atomic ...)]:

@racketblock[
(define buffer (malloc 'atomic SIZE))
]

Memory allocated with @racket['atomic] is managed by the garbage
collector, so @racket[free] is neither necessary nor allowed when the
memory referenced by @racket[buffer] is no longer needed. Instead,
when @racket[buffer] becomes inaccessible, the allocated memory will
be reclaimed automatically.

Allowing the garbage collector (GC) to manage memory is usually
preferable. It's easy to forget to call @racket[free], and exceptions
or thread termination can easily skip a @racket[free].
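
To make the risk of a skipped @racket[free] concrete, here is a
minimal sketch (not a pattern that this manual prescribes) that wraps
the raw-allocation steps in @racket[dynamic-wind]. The helper name
@racket[call-with-raw-buffer] is hypothetical, and a plain
@racket[memcpy] stands in for @racket[wgetnstr] so that the sketch
does not need @filepath{libcurses}. The post thunk runs on normal
return and when an exception escapes, but it does not run if the
thread is terminated, so the sketch narrows the problem rather than
removing it:

@racketmod[
racket/base
(require ffi/unsafe)

(define SIZE 256)

(code:comment "Hypothetical helper: passes a zeroed raw buffer and its usable length")
(code:comment "to fill!, then returns the buffer's content as a string")
(define (call-with-raw-buffer fill!)
  (define buffer (malloc 'raw SIZE))
  (dynamic-wind
   void
   (lambda ()
     (memset buffer 0 SIZE)
     (fill! buffer (sub1 SIZE))
     (cast buffer _pointer _string))
   (code:comment "Release the memory on normal return and on escaping exceptions")
   (lambda () (free buffer))))

(code:comment "Stand-in for wgetnstr: copy at most five bytes into the buffer")
(call-with-raw-buffer
 (lambda (buffer n)
   (memcpy buffer #"hello" (min n 5))))
]
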
At the same time, using GC-managed memory adds a different burden on
the programmer: data managed by the GC may be moved to a new address
as the GC compacts allocated objects to avoid fragmentation. C
functions, meanwhile, expect to receive pointers to objects that will
stay put.

Fortunately, unless a C function calls back into the Racket run-time
system (perhaps through a function that is provided as an argument),
no garbage collection will happen between the time that a C function
is called and the time that the function returns.

Let's look at a few possibilities related to allocation and pointers:

@itemlist[

 @item{Ok:

  @racketblock[
   (define p (malloc 'atomic SIZE))
   (wgetnstr win p (sub1 SIZE))
  ]

 Although the data allocated by @racket[malloc] can move around,
 @racket[p] will always point to it, and no garbage collection will
 happen between the time that the address is extracted from
 @racket[p] to pass to @racket[wgetnstr] and the time that
 @racket[wgetnstr] returns.}

 @item{Bad:

  @racketblock[
   (define p (malloc 'atomic SIZE))
   (define i (cast p _pointer _intptr))
   (wgetnstr win (cast i _intptr _pointer) (sub1 SIZE))
  ]

 The data referenced by @racket[p] can move after the address is
 converted to an integer, in which case @racket[i] cast back to a
 pointer will be the wrong address.

 Obviously, casting a pointer to an integer is generally a bad idea,
 but the cast simulates another possibility, which is passing the
 pointer to a C function that retains the pointer in its own private
 store for later use.
 Such private storage is invisible to the Racket GC, so it has the
 same effect as casting the pointer to an integer.}

 @item{Ok:

  @racketblock[
   (define p (malloc 'atomic SIZE))
   (define p2 (ptr-add p 4))
   (wgetnstr win p2 (- SIZE 5))
  ]

 The pointer @racket[p2] retains the original reference and only adds
 the @racket[4] at the last minute before calling @racket[wgetnstr]
 (i.e., after the point that garbage collection is allowed).}

 @item{Ok:

  @racketblock[
   (define p (malloc 'atomic-interior SIZE))
   (define i (cast p _pointer _intptr))
   (wgetnstr win (cast i _intptr _pointer) (sub1 SIZE))
  ]

 This is ok assuming that @racket[p] itself stays accessible, so that
 the data it references isn't reclaimed. Allocating with
 @racket['atomic-interior] puts data at a particular address and
 keeps it there. A garbage collection will not change the address in
 @racket[p], and so @racket[i] (cast back to a pointer) will always
 refer to the data.}

]

Keep in mind that C struct constructors like @racket[make-MEVENT] are
effectively the same as @racket[(malloc 'atomic ...)]; the result
values can move in memory during a garbage collection. The same is
true of byte strings allocated with @racket[make-bytes], which (as a
convenience) can be used directly as a pointer value (unlike character
strings, which are always copied for UTF-8 encoding or decoding).

For more information about memory management and garbage collection,
see @secref[#:doc InsideRacket-doc "im:memoryalloc"] in
@|InsideRacket|.

@; --------------------------------------------------

@section{Reliable Release of Resources}

Using GC-managed memory saves you from manual @racket[free]s for plain
memory blocks, but C libraries often allocate resources and require a
matching call to a function that releases the resources.
For example, @filepath{libcurses} supports windows on the screen that
are created with @racket[newwin] and released with @racket[delwin]:

@verbatim[#:indent 2]{
 WINDOW *newwin(int lines, int ncols, int y, int x);
 int delwin(WINDOW *win);
}

In a sufficiently complex program, ensuring that every @racket[newwin]
is paired with @racket[delwin] can be challenging, especially if the
functions are wrapped by otherwise safe functions that are provided
from a library. A library that is intended to be safe for use in a
sandbox, say, must protect against resource leaks within the Racket
process as a whole when a sandboxed program misbehaves or is
terminated.

The @racketmodname[ffi/unsafe/alloc] library provides functions to
connect resource-allocating functions and resource-releasing
functions. The library then arranges for finalization to release a
resource if it becomes inaccessible (according to the GC) before it is
explicitly released. At the same time, the library handles tricky
atomicity requirements to ensure that the finalization is properly
registered and never run multiple times.

Using @racketmodname[ffi/unsafe/alloc], the @racket[newwin] and
@racket[delwin] functions can be imported with @racket[allocator] and
@racket[deallocator] wrappers, respectively:

@racketblock[
(require ffi/unsafe/alloc)

(define-curses delwin (_fun _WINDOW-pointer -> _int)
  #:wrap (deallocator))

(define-curses newwin (_fun _int _int _int _int
                            -> _WINDOW-pointer)
  #:wrap (allocator delwin))
]

A @racket[deallocator] wrapper makes a function cancel any existing
finalizer for the function's argument. An @racket[allocator] wrapper
refers to the deallocator, so that the deallocator can be run if
necessary by a finalizer.

If a resource is scarce or visible to end users, then
@tech[#:doc reference.scrbl]{custodian} management is more appropriate
than mere finalization as implemented by @racket[allocator]. See the
@racketmodname[ffi/unsafe/custodian] library.
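
The same wrappers can be applied to ordinary procedures, which makes
it possible to try them without @filepath{libcurses}. In the following
sketch, the names @racket[allocate-block] and @racket[release-block]
are hypothetical, and @racket[(malloc 'raw ...)] and @racket[free]
stand in for @racket[newwin] and @racket[delwin]:

@racketmod[
racket/base
(require ffi/unsafe
         ffi/unsafe/alloc)

(code:comment "Wrap free so that an explicit call cancels any pending finalizer")
(define release-block
  ((deallocator) free))

(code:comment "Wrap raw allocation so that each result is registered for finalization")
(define allocate-block
  ((allocator release-block)
   (lambda (size) (malloc 'raw size))))

(define block (allocate-block 64))
(memset block 0 64)
(code:comment "Explicit release; the finalizer will not run for this block")
(release-block block)
]

If @racket[release-block] is never called for a block, the finalizer
registered by the @racket[allocator] wrapper is expected to release it
at some point after the block becomes inaccessible.
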
@; ------------------------------------------------------------

@section{More Examples}

For more examples of common FFI patterns, see the defined interfaces
in the @filepath{ffi/examples} collection. See also @cite["Barzilay04"].