ffi docs: clarifications on string representations

This commit is contained in:
Matthew Flatt 2019-05-03 09:16:35 -06:00
parent 920c899ba8
commit 94a24efd10

View File

@ -274,31 +274,50 @@ inputs.}
@subsection{Primitive String Types}
See also @racket[_bytes/nul-terminator] and @racket[_bytes] for
See also @racket[_bytes/nul-terminated] and @racket[_bytes] for
converting between byte strings and C's @cpp{char*} type.
@deftogether[(
@defthing[_string/ucs-4 ctype?]
)]{
A type for Racket's native Unicode strings, which are in UCS-4 format.
These correspond to the C @cpp{mzchar*} type used by Racket's C API.
As usual, the type treats @racket[#f] as @cpp{NULL} and vice versa.}
A type for UCS-4 format strings that include a nul terminator. As
usual, the type treats @racket[#f] as @cpp{NULL} and vice versa.
For the @3m[] and @CGC[] variants of Racket, the conversion of a
Racket string for the foreign side shares memory with the Racket
string representation, since UCS-4 is the native representation format
for those variants. The foreign pointer corresponds to the
@cpp{mzchar*} type in Racket's C API.
For the @CS[] variant of Racket, the conversion of a Racket string for
the foreign side is a copy of the Racket representation, where the
copy is managed by the garbage collector.}
@deftogether[(
@defthing[_string/utf-16 ctype?]
)]{
Unicode strings in UTF-16 format. As usual, the types treat
@racket[#f] as @cpp{NULL} and vice versa.}
Unicode strings in UTF-16 format that include a nul terminator. As
usual, the types treat @racket[#f] as @cpp{NULL} and vice versa.
The conversion of a Racket string for the foreign side is a copy of
the Racket representation (reencoded), where the copy is managed by
the garbage collector.}
@defthing[_path ctype?]{
Simple @cpp{char*} strings, corresponding to Racket's @tech[#:doc
reference.scrbl]{path or string}. As usual, the type treats
@racket[#f] as @cpp{NULL} and vice versa.
Simple @cpp{char*} strings that are nul terminated, corresponding to
Racket's @tech[#:doc reference.scrbl]{path or string}. As usual, the
type treats @racket[#f] as @cpp{NULL} and vice versa.
For the @3m[] and @CGC[] variants of Racket, the conversion of a
Racket path for the foreign side shares memory with the Racket path
representation. Otherwise (for the @CS[] variant or for Racket
strings), conversion for the foreign side creates a copy that is
managed by the garbage collector.
Beware that changing the current directory via
@racket[current-directory] does not change the OS-level current
@ -307,11 +326,21 @@ be converted to absolute form using @racket[path->complete-path]
(which uses the @racket[current-directory] parameter) before passing
them to a foreign function.}
@defthing[_symbol ctype?]{
Simple @cpp{char*} strings as Racket symbols (encoded in UTF-8).
Return values using this type are interned as symbols.}
Simple @cpp{char*} strings as Racket symbols (encoded in UTF-8 and nul
terminated), intended as read-only for the foreign side. Return values
using this type are interned as symbols.
For the @3m[] and @CGC[] variants of Racket, the conversion of a
Racket symbol for the foreign side shares memory with the Racket
symbol representation, but points to the middle of the symbol's
allocated memory---so the string pointer must not be used across a
garbage collection.
For the @CS[] variant of Racket, the conversion of a Racket symbol for
the foreign side is a copy of the Racket representation, where the
copy is managed by the garbage collector.}
@subsection{Fixed Auto-Converting String Types}
@ -337,7 +366,7 @@ Racket paths are converted using @racket[path->bytes].}
@subsection{Variable Auto-Converting String Type}
The @racket[_string/ucs-4] type is rarely useful when interacting with
foreign code, while using @racket[_bytes/nul-terminator] is somewhat unnatural, since
foreign code, while using @racket[_bytes/nul-terminated] is somewhat unnatural, since
it forces Racket programmers to use byte strings. Using
@racket[_string/utf-8], etc., meanwhile, may prematurely commit to a
particular encoding of strings as bytes. The @racket[_string] type
@ -1108,27 +1137,23 @@ See @racket[_list] for more explanation about the examples.}
(_bytes o len-expr)]]{
The @racket[_bytes] form by itself corresponds to C's @cpp{char*}
type; a byte string is passed as @racket[_bytes] without any
copying. In the current Racket implementation, a Racket byte string is
normally nul terminated implicitly, but a future implementation of
Racket may not include an implicit nul terminator for byte strings.
See also @racket[_bytes/nul-terminated].
type; a byte string is passed as @racket[_bytes] without any copying.
Beware that a Racket byte string is not necessarily nul terminated;
see also @racket[_bytes/nul-terminated].
In the current Racket implementation, as @racket[_bytes] result, a C
non-NULL @cpp{char*} is wrapped as a Racket byte string without
copying; future Racket implementations may require copying to
represent a C @cpp{char*} result as a Racket byte string. The C result
must have a nul terminator to determine the Racket byte string's
length.
In the @3m[] and @CGC[] variants of Racket, a C non-NULL @cpp{char*}
is converted to a Racket byte string without copying. In the @CS[]
variant, conversion requires copying to represent a C @cpp{char*}
result as a Racket byte string. In both cases, the C result must have
a nul terminator to determine the Racket byte string's length.
A @racket[(_bytes o len-expr)] form is a @tech{custom function type}.
As an argument, a byte string is allocated with the given length; in
the current Racket implementation, that byte string includes an extra
byte for the nul terminator (but, again, a future Racket
implementation may not behave that way). As a result, @racket[(_bytes
o len-expr)] wraps a C non-NULL @cpp{char*} pointer as a byte string of
the given length (but, again, a future Racket implementation may copy
the indicated number of bytes to a fresh byte string).
the @3m[] and @CGC[] variants, that byte string includes an extra byte
for the nul terminator, and @racket[(_bytes o len-expr)] as a result
wraps a C non-NULL @cpp{char*} pointer as a byte string of the given
length. For the @CS[] variant, the allocated argument does not include
a nul terminator and a copy is made for a result string.
As usual, @racket[_bytes] treats @racket[#f] as @cpp{NULL} and vice
versa. As a result type, @racket[(_bytes o len-expr)] works only for
@ -1145,9 +1170,9 @@ an explicit nul-terminator byte is added to a byte-string argument,
which implies copying. As a result type, a @cpp{char*} is copied to a
fresh byte string (without an explicit nul terminator).
When @racket[(_bytes o len-expr)] is used as an argument type, a byte
When @racket[(_bytes/nul-terminated o len-expr)] is used as an argument type, a byte
string of length @racket[len-expr] is allocated. Similarly, when
@racket[(_bytes o len-expr)] is used as a result type, a @cpp{char*}
@racket[(_bytes/nul-terminated o len-expr)] is used as a result type, a @cpp{char*}
result is copied to a fresh byte string of length @racket[len-expr].
As usual, @racket[_bytes/nul-terminated] treats @racket[#f] as