diff --git a/collects/scribblings/foreign/foreign.scrbl b/collects/scribblings/foreign/foreign.scrbl index 0e13b1d690..014b70a24b 100644 --- a/collects/scribblings/foreign/foreign.scrbl +++ b/collects/scribblings/foreign/foreign.scrbl @@ -15,6 +15,8 @@ interface}. Furthermore, since most APIs consist mostly of functions, the foreign interface is sometimes called a @defterm{foreign function interface}, abbreviated @deftech{FFI}. +@;------------------------------------------------------------------------ + @table-of-contents[] @include-section["intro.scrbl"] @@ -25,4 +27,11 @@ interface}, abbreviated @deftech{FFI}. @include-section["misc.scrbl"] @include-section["unexported.scrbl"] +@(bibliography + (bib-entry #:key "Barzilay04" + #:author "Eli Barzilay and Dmitry Orlovsky" + #:title "Foreign Interface for PLT Scheme" + #:location "Workshop on Scheme and Functional Programming" + #:date "2004")) + @index-section[] diff --git a/collects/scribblings/foreign/intro.scrbl b/collects/scribblings/foreign/intro.scrbl index d2ca27ee1e..ba7912ae02 100644 --- a/collects/scribblings/foreign/intro.scrbl +++ b/collects/scribblings/foreign/intro.scrbl @@ -1,5 +1,18 @@ #lang scribble/doc -@(require "utils.rkt") +@(require "utils.rkt" + scribble/racket + (for-syntax racket/base) + (for-label ffi/unsafe/define)) + +@(define-syntax _MEVENT (make-element-id-transformer + (lambda (stx) #'@schemeidfont{_MEVENT}))) +@(define-syntax _MEVENT-pointer (make-element-id-transformer + (lambda (stx) #'@schemeidfont{_MEVENT-pointer}))) +@(define-syntax _WINDOW-pointer (make-element-id-transformer + (lambda (stx) #'@schemeidfont{_WINDOW-pointer}))) +@(define-syntax _mmask_t (make-element-id-transformer + (lambda (stx) #'@schemeidfont{_mmask_t}))) + @title[#:tag "intro"]{Overview} @@ -20,5 +33,482 @@ responsibility to provide a safe interface. If your library provides an unsafe interface, then it should have @racketidfont{unsafe} in its name, too. -For examples of common FFI usage patterns, see the defined interfaces -in the @filepath{ffi} collection. +For more information on the motivation and design of the Racket FFI, +see @cite["Barzilay04"]. + +@; -------------------------------------------------- + +@section{Libraries, C Types, and Objects} + +To use the FFI, you must have in mind + +@itemlist[ + + @item{a particular library from which you want to access a function + or value, } + + @item{a particular symbol exported by the file, and} + + @item{the C-level type (typically a function type) of the exported + symbol.} + +] + +The library corresponds to a file with a suffix such as +@filepath{.dll}, @filepath{.so}, or @filepath{.dylib} (depending on +the platform), or it might be a library within a @filepath{.framework} +directory on Mac OS X. + +Knowing the library's name and/or path is often the trickiest part of +using the FFI. Sometimes, when using a library name without a path +prefix or file suffix, the library file can be located automatically, +especially on Unix. See @racket[ffi-lib] for advice. + +The @racket[ffi-lib] function gets a handle to a library. To extract +exports of the library, it's simplest to use +@racket[define-ffi-definer] from the @racketmodname[ffi/unsafe/define] +library: + +@racketmod[ +racket/base +(require ffi/unsafe + ffi/unsafe/define) + +(define-ffi-definer define-curses "libcurses") +] + +This @racket[define-ffi-definer] declaration introduces a +@racket[define-curses] form for binding a Racket name to a value +extracted from @filepath{libcurses}---which might be located +at @filepath{/usr/lib/libcurses.so}, depending on +the platform. + +To use @racket[define-curses], we need the names and C types of +functions from @filepath{libcurses}. We'll start by using the +following functions: + +@verbatim[#:indent 2]{ + WINDOW* initscr(void); + int waddstr(WINDOW *win, char *str); + int wrefresh(WINDOW *win); + int endwin(void); +} + +We make these functions callable from Racket as follows: + +@margin-note{By convention, an underscore prefix +indicates a representation of a C type (such as @racket[_int]) or a +constructor of such representations (such as @racket[_cpointer]).} + +@racketblock[ +(define _WINDOW-pointer (_cpointer 'WINDOW)) + +(define-curses initscr (_fun -> _WINDOW-pointer)) +(define-curses waddstr (_fun _WINDOW-pointer _string -> _int)) +(define-curses wrefresh (_fun _WINDOW-pointer -> _int)) +(define-curses endwin (_fun -> _int)) +] + +The definition of @racket[_WINDOW-pointer] creates a Racket value that +reflects a C type via @racket[_cpointer], which creates a type +representation for a pointer type---usually one that is opaque. The +@racket['WINDOW] argument could have been any value, but by +convention, we use a symbol matching the C base type. + +Each @racket[define-curses] form uses the given identifier as both the +name of the library export and the Racket identifier to +bind.@margin-note*{An optional @racket[#:c-id] clause for +@racket[define-curses] can specify a name for the library export that +is different from the Racket identifier to bind.} The @racket[(_fun +... -> ...)] part of each definition describes the C type of the +exported function, since the library file does not encode that +information for its exports. The types listed to the left of @racket[->] are the +argument types, while the type to the right of @racket[->] is the +result type. The pre-defined @racket[_int] type naturally corresponds +to the @tt{int} C type, while @racket[_string] corresponds to the +@tt{char*} type when it is intended as a string to read. + +At this point, @racket[initscr], @racket[waddstr], @racket[wrefresh], +and @racket[endwin] are normal Racket bindings to Racket functions +(that happen to call C functions), and so they can be exported from +the defining module or called directly: + +@racketblock[ +(define win (initscr)) +(void (waddstr win "Hello")) +(void (wrefresh win)) +(sleep 1) +(void (endwin)) +] + +@; -------------------------------------------------- + +@section{Function-Type Bells and Whistles} + +Our initial use of functions like @racket[waddstr] is sloppy, because +we ignore return codes. C functions often return error +codes, and checking them is a pain. A better approach is to build the +check into the @racket[waddstr] binding and raise an exception when +the code is non-zero. + +The @racket[_fun] function-type constructor includes many options to +help convert C functions to nicer Racket functions. We can use some of +those features to convert return codes into either @|void-const| or an +exception: + +@racketblock[ +(define (check v who) + (unless (zero? v) + (error who "failed: ~a" v))) + +(define-curses initscr (_fun -> _WINDOW-pointer)) +(define-curses waddstr (_fun _WINDOW-pointer _string -> (r : _int) + -> (check r 'waddstr))) +(define-curses wrefresh (_fun _WINDOW-pointer -> (r : _int) + -> (check r 'wrefresh))) +(define-curses endwin (_fun -> (r : _int) + -> (check r 'endwin))) +] + +Using @racket[(r : _int)] as a result type gives the local name +@racket[r] to the C function's result. This name is then used in the +result post-processing expression that is specified after a second +@racket[->] in the @racket[_fun] form. + +@; -------------------------------------------------- + +@section{By-Reference Arguments} + +To get mouse events from @filepath{libcurses}, we must explicitly +enable them through the @racket[mousemask] function: + +@verbatim[#:indent 2]{ +typedef unsigned long mmask_t; +#define BUTTON1_CLICKED 004L + +mmask_t mousemask(mmask_t newmask, mmask_t *oldmask); +} + +Setting @racket[BUTTON1_CLICKED] in the mask enables button-click +events. At the same time, @racket[mousemask] returns the current mask +by installing it into the pointer provided as its second +argument. + +Since these kinds of call-by-reference interfaces are common in C, +@racket[_fun] cooperates with a @racket[_ptr] form to automatically +allocate space for a by-reference argument and extract the value put +there by the C function. Give the extracted value name to use in the +post-processing expression. The post-processing expression can combine +the by-reference result with the function's direct result (which, in +this case, reports a subset of the given mask that is actually +supported). + +@racketblock[ +(define _mmask_t _ulong) +(define-curses mousemask (_fun _mmask_t (o : (_ptr o _mmask_t)) + -> (r : _mmask_t) + -> (values o r))) +(define BUTTON1_CLICKED #o004) + +(define-values (old supported) (mousemask BUTTON1_CLICKED)) +] + +@; -------------------------------------------------- + +@section{C Structs} + +Assuming that mouse events are supported, the @filepath{libcurses} +library reports them via @racket[getmouse], which accepts a pointer to +a @cpp{MEVENT} struct to fill with mouse-event information: + +@verbatim[#:indent 2]{ + typedef struct { + short id; + int x, y, z; + mmask_t bstate; + } MEVENT; + + int getmouse(MEVENT *event); +} + +To work with @cpp{MEVENT} values, we use @racket[define-cstruct]: + +@racketblock[ +(define-cstruct _MEVENT ([id _short] + [x _int] + [y _int] + [z _int] + [bstate _mmask_t])) +] + +This definition binds many names in the same way that +@racket[define-struct] binds many names: @racket[_MEVENT] is a C type +representing the struct type, @racket[_MEVENT-pointer] is a C type +representing a pointer to a @racket[_MEVENT], @racket[make-MEVENT] +constructs a @racket[_MEVENT] value, @racket[MEVENT-x] extracts +the @racket[x] fields from an @racket[_MEVENT] value, and so on. + +With this C struct declaration, we can define the function type for +@racket[getmouse]. The simplest approach is to define +@racket[getmouse] to accept an @racket[_MEVENT-pointer], and then explicitly +allocate the @racket[_MEVENT] value before calling @racket[getmouse]: + +@racketblock[ +(define-curses getmouse (_fun _MEVENT-pointer -> _int)) + +(define m (make-MEVENT 0 0 0 0 0)) +(when (zero? (getmouse m)) + (code:comment @#,t{use @racket[m]...}) + ....) +] + +For a more Racket-like function, use @racket[(_ptr o _MEVENT)] and a +post-processing expression: + +@racketblock[ +(define-curses getmouse (_fun (m : (_ptr o _MEVENT)) + -> (r : _int) + -> (and (zero? r) m))) + +(waddstr win (format "click me fast...")) +(wrefresh win) +(sleep 1) + +(define m (getmouse)) +(when m + (waddstr win (format "at ~a,~a" + (MEVENT-x m) + (MEVENT-y m))) + (wrefresh win) + (sleep 1)) + +(endwin) +] + +The difference between @racket[_MEVENT-pointer] and @racket[_MEVENT] +is crucial. Using @racket[(_ptr o _MEVENT-pointer)] would allocate +only enough space for a pointer to an @cpp{MEVENT} struct, which is +not enough space for an @cpp{MEVENT} struct. + +@; -------------------------------------------------- + +@section{Pointers and Manual Allocation} + +To get text from the user instead of a mouse click, @racket{libcurses} +provides @racket[wgetnstr]: + +@verbatim[#:indent 2]{ +int wgetnstr(WINDOW *win, char *str, int n); +} + +While the @cpp{char*} argument to @racket[waddstr] is treated as a +nul-terminated string, the @cpp{char*} argument to @racket[wgetnstr] +is treated as a buffer whose size is indicated by the final @cpp{int} +argument. The C type @racket[_string] does not work for such +buffers. + +One way to approach this function from Racket is to describe the +arguments in their rawest form, using plain @racket[_pointer] for the +second argument to @racket[wgetnstr]: + +@racket[ +(define-curses wgetnstr (_fun _WINDOW-pointer _pointer _int + -> _int)) +] + +To call this raw version of @racket[wgetnstr], allocate memory, zero +it, and pass the size minus one (to leave room a nul +terminator) to @racket[wgetnstr]: + +@racketblock[ +(define SIZE 256) +(define buffer (malloc 'raw SIZE)) +(memset buffer 0 SIZE) + +(void (wgetnstr win buffer (sub1 SIZE))) +] + +When @racket[wgetnstr] returns, it has written bytes to +@racket[buffer]. At that point, we can use @racket[cast] to convert the +value from a raw pointer to a string: + +@racketblock[ +(cast buffer _pointer _string) +] + +Conversion via the @racket[_string] type causes the data refereced by +the original pointer to be copied (and UTF-8 decoded), so the memory +referenced by @racket[buffer] is no longer needed. Memory allocated +with @racket[(malloc 'raw ...)] must be released with @racket[free]: + +@racketblock[ +(free buffer) +] + +@; -------------------------------------------------- + +@section{Pointers and GC-Managed Allocation} + +Instead of allocating @racket[buffer] with @racket[(malloc 'raw ...)], +we could have allocated it with @racket[(malloc 'atomic ...)]: + +@racketblock[ +(define buffer (malloc 'atomic SIZE)) +] + +Memory allocated with @racket['atomic] is managed by the garbage +collector, so @racket[free] is neither necessary nor allowed when the +memory referenced by @racket[buffer] is no longer needed. Instead, +when @racket[buffer] becomes inaccessible, the allocated memory will +be reclaimed automatically. + +Allowing the garbage collector (GC) to manage memory is usually +preferable. It's easy to forget to call @racket[free], and exceptions +or thread termination can easily skip a @racket[free]. + +At the same time, using GC-managed memory adds a different burden on +the programmer: data managed by the GC may be moved to a new address +as the GC compacts allocated objects to avoid fragmentation. C +functions, meanwhile, expect to receive pointers to objects that will +stay put. + +Fortunately, unless a C function calls back into the Racket run-time +system (perhaps through a function that is provided as an argument), +no garbage collection will happen between the time that a C function +is called and the time that the function returns. + +Let's look a few possibilities related to allocation and pointers: + +@itemlist[ + + @item{Ok: + + @racketblock[ + (define p (malloc 'atomic SIZE)) + (wgetnstr win p (sub1 SIZE)) + ] + + Although the data allocated by @racket[malloc] can move + around, @racket[p] will always point to it, and no garbage collection + will happen between the time that the address is extracted form + @racket[p] to pass to @racket[wgetnstr] and the time that + @racket[wgetnstr] returns.} + + @item{Bad: + + @racketblock[ + (define p (malloc 'atomic SIZE)) + (define i (cast p _pointer _intptr)) + (wgetnstr win (cast i _intptr _pointer) (sub1 SIZE)) + ] + + The data referenced by @racket[p] can move after the + address is converted to an integer, in which case @racket[i] cast + back to a pointer will be the wrong address. + + Obviously, casting a pointer to an integer is generally a bad idea, + but the cast simulates another possibility, which is passing the + pointer to a C function that retains the pointer in its own private + store for later use. Such private storage is invisible to the Racket + GC, so it has the same effect as casting the pointer to an integer.} + + @item{Ok: + + @racketblock[ + (define p (malloc 'atomic SIZE)) + (define p2 (ptr-add p 4)) + (wgetnstr win p2 (- SIZE 5)) + ] + + The pointer @racket[p2] retains the original reference and + only adds the @racket[4] at the last minute before calling + @racket[wgetnstr] (i.e., after the point that garbage collection is + allowed).} + + @item{Ok: + + @racketblock[ + (define p (malloc 'atomic-interior SIZE)) + (define i (cast p _pointer _intptr)) + (wgetnstr win (cast i _intptr _pointer) (sub1 SIZE)) + ] + + This is ok assuming that @racket[p] itself stays accessible, so that + the data it references isn't reclaimed. Allocating with + @racket['atomic-interior] puts data at a particular address and + keeps it there. A garbage collection will not change the address in + @racket[p], and so @racket[i] (cast back to a pointer) will always + refer to the data.} + +] + +Keep in mind that C struct constructors like @racket[make-MEVENT] are +effectively the same as @racket[(malloc 'atomic ...)]; the result values +can move in memory during a garbage collection. The same is true of +byte strings allocated with @racket[make-bytes], which (as a +convenience) can be used directly as a pointer value (unlike character +strings, which are always copied for UTF-8 encoding or decoding). + +For more information about memory management and garbage collection, +see @secref[#:doc InsideRacket-doc "im:memoryalloc"] in +@|InsideRacket|. + +@; -------------------------------------------------- + +@section{Reliable Release of Resources} + +Using GC-managed memory saves you from manual @racket[free]s for plain +memory blocks, but C libraries often allocate resources and require a +matching call to a function that releases the resources. For example, +@filepath{libcurses} supports windows on the screen that +are created with @racket[newwin] and released with @racket[delwin]: + +@verbatim[#:indent 2]{ +WINDOW *newwin(int lines, int ncols, int y, int x); +int delwin(WINDOW *win); +} + +In a sufficiently complex program, ensuring that every @racket[newwin] +is paired with @racket[delwin] can be challenging, especially if the +functions are wrapped by otherwise safe functions that are provided +from a library. A library that is intended to be safe for use in a +sandbox, say, must protect against resource leaks within the Racket +process as a whole when a sandboxed program misbehaves or is +terminated. + +The @racketmodname[ffi/unsafe/alloc] library provides functions to +connect resource-allocating functions and resource-releasing +functions. The library then for finalization to release a resource if +it becomes inaccessible (according to the GC) before it is explicitly +released. At the same time, the library handles tricky atomicity +requirements to ensure that the finalization is properly registered +and never run multiple times. + +Using @racketmodname[ffi/unsafe/alloc], the @racket[newwin] and +@racket[delwin] functions can be imported with @racket[allocator] +and @racket[deallocator] wrappers, respectively: + +@racketblock[ +(require ffi/unsafe/alloc) + +(define-curses delwin (_fun _WINDOW-pointer -> _int) + #:wrap (deallocator)) + +(define-curses newwin (_fun _int _int _int _int + -> _WINDOW-pointer) + #:wrap (allocator delwin)) +] + +A @racket[deallocator] wrapper makes a function cancel any existing +finalizer for the function's argument. An @racket[allocator] wrapper +refers to the deallocator, so that the deallocator can be run if +necessary by a finalizer. + +@; ------------------------------------------------------------ + +@section{More Examples} + +For more examples of common FFI patterns, see the defined interfaces +in the @filepath{ffi/examples} collection. See also @cite["Barzilay04"]. + + diff --git a/collects/scribblings/foreign/utils.rkt b/collects/scribblings/foreign/utils.rkt index 0571ef0e54..b3e9b21bef 100644 --- a/collects/scribblings/foreign/utils.rkt +++ b/collects/scribblings/foreign/utils.rkt @@ -12,7 +12,7 @@ ffi/vector)) (provide cpp - InsideRacket + InsideRacket InsideRacket-doc guide.scrbl (all-from-out scribble/manual) (for-label (all-from-out racket/base @@ -21,8 +21,10 @@ ffi/unsafe/cvector ffi/vector))) +(define InsideRacket-doc '(lib "scribblings/inside/inside.scrbl")) + (define InsideRacket - (other-manual '(lib "scribblings/inside/inside.scrbl"))) + (other-manual InsideRacket-doc)) (define guide.scrbl '(lib "scribblings/guide/guide.scrbl"))