racket/notes/mzscheme/MzScheme_300.txt
2005-05-27 17:52:04 +00:00

MzScheme version 300 is different from previous versions of MzScheme
in several significant ways:
* MzScheme's reader is case-sensitive for symbols/identifiers by
default. Prefix an S-expression with #ci to make it
case-insensitive.
* MzScheme now directly supports Unicode. The "char" datatype
corresponds to a Unicode code point, and strings correspond to a
sequence of code points. Meanwhile, a new "byte string" datatype
implements a sequence of bytes (exact integers between 0 and 255),
and byte strings take over the old role of strings with respect to
low-level port operations.
Regexp matching works on both char strings and byte strings, and
MzScheme provides various operations for encoding chars as byte
strings. See the "Unicode" section below for more information.
* Related to the Unicode change, MzScheme now uses a distinct "path"
datatype for file and directory names, instead of using
strings. Built-in procedures that accept a path also accept a
string (and implicitly convert it); procedures that produce a path
never produce a string. See the "Paths" section below for more
details.
* The new "foreign.ss" library in MzLib provides access to foreign
libraries dynamically and directly in Scheme. See "PLT Foreign
Interface Manual" for more information.
* File-stream output ports (including file ports, the initial output
port, and ports created by `subprocess') are now block-buffered by
default, instead of line-buffered. The exception is when an output
port corresponds to a terminal, in which case it is line-buffered
by default. Also, the initial error port remains unbuffered.
TCP output ports are block buffered (instead of unbuffered) by
default.
The file-stream changes are especially likely to affect stdio-based
communication among OS-level processes. For example, when
communicating with an ispell subprocess, adding a newline at the
end of a command previously would have been enough to send the
command to ispell. Now, the output must be flushed explicitly
(using `flush-output') or the buffer mode must be explicitly
changed to by-line (using `file-stream-buffer-mode').
The TCP changes affect most TCP-based communication. Explicitly
flush output using `flush-output' or change the buffer mode using
`file-stream-buffer-mode'.
* The class system has changed slightly. The `rename' keyword has
been changed to `rename-super', but the new `super' expression form
eliminates the need for most `rename' declarations. Also, the class
system supports methods that cannot be overridden entirely, but
that can be augmented through "inner" methods (as in Beta). Some
methods in MrEd's classes have been changed to augment-only
methods. Finally, `class*/names' has been eliminated, and `this',
`super-new', etc. are all exported by the "class.ss" module. See
the "Classes" section below for more details.
* The built-in exception hierarchy has been revised and streamlined
(again). See the "Exceptions" section below for more details.
* A continuation is no longer tied to its creating thread.
Various continuation barriers remain in place, such as around the
call to an exception handler or syntax expander, and also around
the start of MzScheme's main thread. The main thread's barrier
prevents continuations captured in the main thread from being used
in other threads (which should make sense, intuitively, because
then other threads could become "main"). A newly created thread,
however, has no such barrier, so that created threads can trade
continuations.
* The "parameter" construct has been redefined. The revised
`parameterize' is like the old one, except that:
- The `parameterize' form accepts only parameter procedures
created by `make-parameter', not arbitrary procedures that
accept 0 or 1 arguments.
- The body of a `parameterize' is in tail position with respect
to the entire `parameterize' expression.
- The given parameter procedures are not called on exit from a
`parameterize' form, so the parameter guards (if any) are not
called.
- A `parameterize' expression tends to execute much more quickly,
while parameter lookup can be slightly slower.
- A `parameterize' has the expected effect if a continuation is
captured during the `parameterize' body and invoked in a
different thread.
Preserved thread cells now provide precisely the semantics of old
"parameters" (but without a form like `parameterize'). Meanwhile, a
new "parameter" maps a continuation to a preserved thread cell,
which in turn provides a thread-specific value.
* The `break-enabled' procedure no longer corresponds to a parameter,
because changing the break-enable state implies a check for a
suspended break, and this check is incompatible with tail
evaluation of `parameterize' forms.
Related to this change, if a `with-handlers' handler is called to
handle an exception, breaks are initially disabled for the handler,
but the handler is not called in tail position with respect to the
`with-handlers' form. (The body is in tail position, though.) Use
`with-handlers*' to have a handler called in tail position, but
without breaks disabled.
* The `object-wait-multiple' function has been renamed to
`sync/timeout', and `sync' is the same procedure without a timeout
argument. The `object-wait-multiple/enable-break' procedure has
been renamed to `sync/timeout/enable-break', and
`sync/enable-break' enables breaks without a timeout.
The "waitable" procedures have been renamed to "evt" procedures in
general, often dropping "make-". "Evt" stands for "synchronizable
event". Several new event-generating procedures have been added.
Old                          New
---                          ---
object-waitable?             evt?
waitables->waitable-set      choice-evt
make-channel-put-waitable    channel-put-evt
make-semaphore-peek          semaphore-peek-evt
make-wrapped-waitable        wrap-evt or handle-evt
make-guard-waitable          guard-evt
make-nack-guard-waitable     nack-guard-evt
make-poll-guard-waitable     poll-guard-evt
thread-dead-waitable         thread-dead-evt
thread-suspend-waitable      thread-suspend-evt
thread-resume-waitable       thread-resume-evt
udp-receive-waitable         udp-receive-ready-evt
udp-send-waitable            udp-send-ready-evt
(new)                        alarm-evt
(new)                        write-bytes-avail-evt
(new)                        udp-receive!-evt
(new)                        udp-send-to-evt
(new)                        udp-send-evt
                             ...
* The new `require-for-template' core form serves as a kind of dual
to `require-for-syntax', and the new `define-for-syntax' and
`begin-for-syntax' forms allow macro helper functions to be placed
closer to macro definitions. See the MzScheme manual for more
information.
* Unexported module bindings are more secure because they can only
appear in certified contexts, and they can be made completely
secure by changing the current code inspector. Certification
management is automatic for most macros, but certification requires
changes to programs that transform the result of `expand' and feed
the transformed program back to `eval'. See the MzScheme manual
for more information.
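
As a minimal sketch of the output-buffering change described in the
list above: a helper that writes one command line to an output port
can flush explicitly, so that a block-buffered port delivers the
command right away. The name `send-command' is hypothetical and chosen
only for illustration; `flush-output' and `file-stream-buffer-mode'
are the procedures named above.

```scheme
;; Hypothetical helper: write one command line to `out' and flush it,
;; so that a block-buffered port (e.g., a subprocess's stdin) sends it now.
(define (send-command out cmd)
  (display cmd out)
  (newline out)
  (flush-output out))

;; Alternative for a file-stream port: switch to line buffering once,
;; after which a trailing newline is enough to push the data through.
;;   (file-stream-buffer-mode out 'line)

;; Usage with a string port, just to show what is written:
(define o (open-output-string))
(send-command o "hello")
(get-output-string o) ; => "hello\n"
```

Flushing per command keeps block buffering for everything else, while
changing the buffer mode is simpler when every line is a complete
message.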
======================================================================
Unicode
======================================================================
The "char" datatype means "Unicode code point", which technically
should not be confused with "Unicode character". But most things that
a literate human would call a "character" can be represented by a
single code point in Unicode, so the "code point" approximation of
"character" works well for many purposes. See section 1.2 in the
MzScheme manual for an overview of MzScheme's approach to Unicode and
locales.
In particular, `integer->char' produces a character for every exact
integer from 0 to #x10FFFF, except #xD800 to #xDFFF (which are
reserved for surrogates in some encodings of Unicode).
The `bytes->string/utf-8' and `string->bytes/utf-8' functions convert
between byte strings and character strings via UTF-8. The
`bytes->string/utf-8' procedure accepts an optional character to use
in place of bad encoding sequences (otherwise an exception is raised).
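
A short sketch of the UTF-8 conversions just described, assuming only
the built-in procedures named above. The sample data is made up for
illustration.

```scheme
;; Encode a one-character string (U+03BB, Greek small lambda) as UTF-8.
(define b (string->bytes/utf-8 "\u03BB"))
(bytes-length b)                        ; => 2 (two bytes encode one char)
(string-length (bytes->string/utf-8 b)) ; => 1

;; The byte 255 is never valid in UTF-8. Supplying a replacement
;; character avoids the exception that would otherwise be raised.
(bytes->string/utf-8 #"\377" #\?)       ; => "?"
```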
A general `bytes-convert' interface converts byte strings among
different encodings, including UTF-8 and the current locale's encoding. The
conversion interface can deal with input that ends mid-encoding, so it
can be used for conversion on streams, too. (The converter uses iconv
where available.)
Internally, strings are encoded as UCS-4, but symbols are encoded in
UTF-8.
Other details:
* The `char->latin-1-integer' and `latin-1-integer->char' procedures
have been removed.
* Added a `bytes-...' operation for nearly every `string-...' operation.
The `byte?' predicate returns true for exact integers in [0,255].
* `regexp' produces a char regexp, and `byte-regexp' produces a byte
regexp. A regexp can be matched against a byte string (or port), in
which case the byte string (or port) is interpreted as a UTF-8
encoding. Similarly, a regexp can be matched against a string, in
which case the string is encoded via UTF-8 before matching.
* A hash before a string makes it a byte-string literal:
(string->list "hi") = '(#\h #\i)
(bytes->list #"hi") = '(104 105)
Similarly, #rx"...." is a regexp, while #rx#"...." is a byte
regexp.
* Use #\uXXXX or #\UXXXXXX for arbitrary character constants, where
each X is a hexadecimal digit and the resulting number identifies a
code point. In a string (but not a byte string), use "\uXXXX" or
"\UXXXXXX".
* All of the `char-whitespace?', `char-alphabetic?', etc. functions
are defined in accordance with SRFI-14. New functions include
`char-title-case?', `char-blank?', `char-graphic?', `char-symbolic?',
and `char-titlecase'.
* The built-in string functions remain locale-independent (as in
SRFI-13), and `string-locale=?', etc. provide locale-sensitive
comparisons. The `string-locale-upcase' and
`string-locale-downcase' functions provide locale-sensitive case
conversion. No locale-sensitive character operations are provided
(the old ones have been removed).
* Case-insensitivity for symbols is consistent with SRFI-13, which
means using the 1-1 character mapping defined by the Unicode
consortium.
Number parsing recognizes only ASCII digits (and A-F/a-f) for
numbers, but all `char-whitespace?' characters are treated as
whitespace by `read'.
* MzScheme effectively assumes UTF-8 stdin and stdout, but library
procedures like `reencode-input-port' can be used to accommodate
other encodings, including the locale's encoding. DrScheme reads
and writes files using UTF-8.
Ports
-----
"Port" still means "byte port" in MzScheme. Various port operations,
like `read-string-avail!', have been renamed to `read-bytes-avail!'.
Character operations on a port, such as `read-char' and `read-string',
are defined in terms of a UTF-8 parsing/writing of the port's byte
stream. (With a custom-port wrapper and the byte-string conversion
functions, other decodings can be implemented.)
Position and column counting for a port is sensitive to UTF-8. For
example, reading #o302 followed by #o251 increments the position and
column by 1, instead of 2.
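
As an illustration of the byte/character split on ports, the following
sketch reads the two bytes from the example above first as a character
and then as bytes. It assumes only built-in port procedures; the names
`in' and `in2' are arbitrary.

```scheme
;; #o302 #o251 is the UTF-8 encoding of U+00A9 (the copyright sign).
(define in (open-input-bytes (bytes #o302 #o251)))
(char->integer (read-char in)) ; => 169 (one char decoded from two bytes)
(eof-object? (read-char in))   ; => #t  (both bytes were consumed)

;; Reading the same data bytewise sees two separate bytes.
(define in2 (open-input-bytes (bytes #o302 #o251)))
(read-byte in2) ; => 194
(read-byte in2) ; => 169
```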
======================================================================
Paths
======================================================================
Under Unix, paths are fundamentally byte strings, not strings.
Typically, the correct printing of a path uses the current locale's
encoding, but there's no guarantee that the path is well-formed using
the current locale's encoding.
To mediate these views of paths, MzScheme now supplies a "path"
datatype, with operations `path->string', `string->path',
`path->bytes', and `bytes->path'. Use `path->string' to print a path
to the user, but use `path->bytes' to marshal a path (e.g., for saving
a pathname in a file).
All functions that consume a pathname accept a string and implicitly
convert it (via the user's locale's default encoding) to a byte-string
pathname.
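
A brief sketch of the path conversions described above. The file name
"notes.txt" is a made-up, ASCII-only example, so the result should not
depend on the locale's encoding.

```scheme
(define p (string->path "notes.txt"))
(path? p)           ; => #t
(path? "notes.txt") ; => #f (strings are accepted as paths, but are not paths)
(path->string p)    ; => "notes.txt"   (for showing to the user)
(path->bytes p)     ; => #"notes.txt"  (for marshaling)

;; Marshal and unmarshal through the byte-string form.
(define p2 (bytes->path (path->bytes p)))
(path->string p2)   ; => "notes.txt"
```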
Under Windows, where a pathname is an array of UTF-16 code units,
MzScheme internally converts to and from byte strings via
UTF-8<->UTF-16, but extended to support unpaired surrogates and other
code units that are invalid in an encoding. A byte string that is not
a UTF-8 encoding will never correspond to a pathname under Windows.
======================================================================
Classes
======================================================================
Changes to the `(lib "class.ss")' object system are in three parts:
- a syntactic clean-up to eliminate `class*/names',
- a syntactic clean-up for super calls, and
- new constructs for augment-only methods.
Meanwhile, keywords such as `public' are now bound to syntactic forms
that report out-of-context uses (much like `unquote' and
`unquote-splicing').
The Demise of `class*/names'
----------------------------
The `class*/names' form allowed the programmer to specify names to be
bound instead of `this', `super-new', etc. The `class*' and `class'
forms non-hygienically introduced those names. Macros that would
naturally expand to `class' or `class*' had to expand to
`class*/names', instead, because expanding to a non-hygienic macro
usually does not work.
In v300, `this', `super-new', etc. are exported by `(lib "class.ss")',
and attempting to use the keywords outside of a `class' or `class*'
form results in a syntax error. Meanwhile, macros can easily and
correctly expand to uses of `class' and `class*'.
Super Calls
-----------
A `rename' clause is no longer necessary in a typical class with
method overrides, due to the new `super' form. For example,
  (class splotch%
    (rename [super-paint paint])
    (define/override (paint x)
      (super-paint x)
      ....)
    (super-new))

can now be written

  (class splotch%
    (define/override (paint x)
      (super paint x)
      ....)
    (super-new))
An `override' declaration enables the corresponding (internal) method
name to be used with the `super' form. The `super' form is legal only
for expressions within a `class' (or `class*', etc.).
For cases where `super' cannot be used --- either because no
overriding method is declared in a class that calls a super method, or
because the super call is in a lexically nested class --- the
`rename-super' form can be used just like the old `rename' form.
The script plt/notes/mzscheme/rename-super-fixup.ss may be useful for
converting code that uses `rename' to use `super'.
Augment-Only Methods
--------------------
A `pubment' clause declares a method like `public', but the resulting
method cannot be overridden. Instead, the `pubment' method can use
`inner' to dispatch to an augmenting method declared in a
subclass. The word "pubment" is a contraction of "public, but merely
augmentable in subclasses".
The `inner' expression form includes an expression to evaluate when a
subclass does not provide an augmenting method. A subclass augments a
`pubment' method with `augment' instead of `override'. The `augment'
declaration itself is non-overridable, and it can use `inner' to allow
further augmentation in further subclasses.
Example:
  (define img%
    (class object%
      ;; No subclass can avoid clearing the dc in `paint',
      ;; but a subclass can augment `paint' to draw afterward.
      ;; The result indicates the size of the drawn image,
      ;; which is 0 if the paint method is not augmented.
      (define/pubment (paint dc)
        (send dc clear)
        (inner 0 paint dc))
      (super-new)))

  (define box%
    (class img%
      ;; Add a square to the drawing, but allow subclasses
      ;; to draw first. Subclasses cannot skip the final
      ;; square-drawing step. Note that the result of the
      ;; method is the result of the `inner' call, which is 20
      ;; if the paint method is not augmented.
      (define/augment (paint dc)
        (begin0
          (inner 20 paint dc)
          (send dc draw-rectangle 0 0 20 20)))
      (super-new)))

  (define frbox%
    (class box%
      ;; Add a larger red square as a background.
      (define/augment (paint dc)
        (send dc set-color (make-object color% "red"))
        (send dc draw-rectangle -1 -1 22 22)
        (send dc set-color (make-object color% "black"))
        (inner 22 paint dc))
      (super-new)))

  (send (new img%) paint dc)   ; => 0
                               ;  and clears the dc
  (send (new box%) paint dc)   ; => 20
                               ;  and clears the dc,
                               ;  then draws a black rectangle
  (send (new frbox%) paint dc) ; => 22
                               ;  and clears the dc,
                               ;  then draws a big red rectangle
                               ;  then draws a black rectangle
An augmentation itself can be made overridable using `augride', which
is a contraction of "augment, but allow the augment to be overridden".
Similarly, `overment' overrides a method, but allows subclasses only
to augment this overriding.
  (define dot%
    (class img%
      ;; This augmentation of img% can be replaced in
      ;; subclasses.
      (define/augride (paint dc)
        (send dc draw-ellipse 0 0 20 20)
        20)
      (super-new)))

  (define emptydot%
    (class dot%
      ;; Draw nothing, but still claim to have
      ;; drawn something of size 20. The dc is still
      ;; cleared in `paint' from img%; the override
      ;; replaces only `paint' in dot%.
      (define/override (paint dc)
        20)
      (super-new)))

  (define frdot%
    (class dot%
      ;; This method re-uses the `paint' augmentation in
      ;; dot%, and allows further augmentation in subclasses
      ;; (which cannot skip the painting here).
      (define/overment (paint dc)
        (send dc set-color (make-object color% "red"))
        (send dc draw-ellipse -1 -1 22 22)
        (send dc set-color (make-object color% "black"))
        (super paint dc)
        (inner 22 paint dc))
      (super-new)))
Note that `pubment', `augment', or `overment' without an `inner' call
is effectively the same as `public-final', `augment-final', or
`override-final'. However, the `-final' variants report a class error
if a subclass attempts to augment the method, whereas the non-`-final'
variants allow subclasses to include an augmentation (that is always
ignored).
In general:
                  Can use `inner'?   Can use `super'?
public                   N                  N
pubment                  Y                  N
override                 N                  Y
augment                  Y                  N
overment                 Y                  Y
augride                  N                  N
public-final             N                  N
override-final           N                  Y
augment-final            N                  N
The `rename-inner' form is similar to `rename-super'. Like
`rename-super', it is rarely useful compared to `inner'. A use of a
binding introduced by `rename-inner' must include a `lambda' pattern
after the identifier to provide the default expression (i.e., the
expression to evaluate if no subclass augments the method); see the
documentation for further information.
Keywords
--------
The various keywords for class clauses are now all defined as syntax
and exported by `(lib "class.ss")'. Use of a keyword in an expression
position produces a syntax error.
A complete list of keywords:
private public override augment
pubment overment augride
public-final override-final augment-final
field init init-field
rename-super rename-inner inherit
super inner
======================================================================
Exceptions
======================================================================
The new exception hierarchy distinguishes between breaks and failures
at nearly the top level of the hierarchy. In particular, most
`with-handlers' expressions should use the `exn:fail?' predicate,
instead of the old (and now removed) `not-break-exn?' predicate.
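
As a sketch of the recommended handler style, the following catches a
specific failure first and then any other failure with `exn:fail?',
leaving breaks alone. The helper name `safe-div' is hypothetical.

```scheme
;; Hypothetical helper: divide, converting failures into plain values.
(define (safe-div a b)
  (with-handlers ([exn:fail:contract:divide-by-zero?
                   (lambda (e) 'div-by-zero)]
                  ;; Catches any other failure, but not a break.
                  [exn:fail? (lambda (e) (exn-message e))])
    (/ a b)))

(safe-div 4 2) ; => 2
(safe-div 1 0) ; => div-by-zero
```

Because `exn:break' is not a subtype of `exn:fail', a user break
during the division still propagates past both handlers.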
The "type" and "mismatch" exceptions have been merged into
`exn:fail:contract'. Similarly, `exn:i/o:tcp' and `exn:i/o:udp'
have been merged into `exn:fail:network'.
Many exception fields have been eliminated, but certain exceptions
contain multiple source locations instead of just one. Instead of a
single type for all exceptions with source locations, the
`exn:srclocs' property identifies exceptions with source-location
information.
Field guards are triggered when an exception record is created, and
they check the "type" of each field argument. Mutators are not exported
for exception fields.
Structs:
  exn - message continuation-marks
    exn:fail
      exn:fail:contract
        exn:fail:contract:arity
        exn:fail:contract:divide-by-zero
        exn:fail:contract:continuation
        exn:fail:contract:variable - id
      exn:fail:syntax - exprs
      exn:fail:read - sources
        exn:fail:read:eof
        exn:fail:read:non-char
      exn:fail:filesystem
        exn:fail:filesystem:exists
        exn:fail:filesystem:version
      exn:fail:network
      exn:fail:out-of-memory
      exn:fail:unsupported
    exn:break - continuation
  special-comment - width
    ; Note: not exn:special-comment, because it doesn't need
    ; a message or marks
Properties:
exn:srclocs - accessor
======================================================================
Inside MzScheme (extend MzScheme via C)
======================================================================
A structure that represents a Scheme type should now start with a
Scheme_Object, instead of Scheme_Type. A Scheme_Object contains only a
Scheme_Type (except in 3m mode), so it takes the same amount of space
as before. But using Scheme_Object instead of Scheme_Type ensures that
casts to and from Scheme_Object* do not run afoul of C99's aliasing
assumptions.
SCHEME_STRINGP(), etc. have been replaced by SCHEME_CHAR_STRINGP(),
etc. and SCHEME_BYTE_STRINGP(), etc. A character is represented by the
`mzchar' type, which corresponds to an unsigned integer (4 bytes).
Use the functions scheme_char_string_to_byte_string() and
scheme_byte_string_to_char_string() to convert between string types
via UTF-8. Several UTF-8/UTF-16 <-> mzchar conversion functions are
also provided.
In addition to functions scheme_char_string...() which operate on
`mzchar' arrays, some functions scheme_utf8_string...() are provided,
which accept a `char' array and interpret it as a UTF-8 encoding.
SCHEME_PATHP() recognizes the new path type. Use SCHEME_STRING_PATHP()
to recognize either a string or path, and use scheme_string_to_path()
to convert a string to a path.
The error_buf field of Scheme_Thread is now a pointer to a mz_jmp_buf,
instead of an inlined mz_jmp_buf. The protocol for temporarily
catching an exception is now as follows:
  mz_jmp_buf *save, fresh;
  save = scheme_current_thread->error_buf;
  scheme_current_thread->error_buf = &fresh;
  if (scheme_setjmp(scheme_error_buf)) {
    /* There was an error or continuation invocation */
    if (scheme_jumping_to_continuation) {
      /* It was a continuation jump */
      scheme_longjmp(*save, 1);
      /* To block the jump, instead: scheme_clear_escape(); */
    } else {
      /* It was a primitive error escape */
    }
  } else {
    /* Whatever might escape. */
    ....
  }
  scheme_current_thread->error_buf = save;
The input and output port driver interfaces have changed to accommodate
progress events and commits (for input ports) and write events (for
output ports). For most port types, the new features can be
implemented automatically by MzScheme with a small amount of extra
work in the driver.