
\documentclass{releasenotes}

\thisversion{Version 9.4.1}
\thatversion{Version 8.4}
\pubmonth{March}
\pubyear{2017}

\begin{document}

\maketitle

% \tableofcontents

\section{Overview}

This document outlines the changes made to {\ChezScheme} for
{\thisversion} since {\thatversion}.

{\thisversion} is supported for the following platforms.
The Chez Scheme machine type (returned by the \scheme{machine-type}
procedure) is given in parentheses.

\begin{itemize}
\item Linux x86, nonthreaded (i3le) and threaded (ti3le)
\item Linux x86\_64, nonthreaded (a6le) and threaded (ta6le)
\item MacOS X x86, nonthreaded (i3osx) and threaded (ti3osx)
\item MacOS X x86\_64, nonthreaded (a6osx) and threaded (ta6osx)
\item Linux ARMv6 (32-bit), nonthreaded (arm32le)
\item Linux PowerPC (32-bit), nonthreaded (ppc32le) and threaded (tppc32le)
\item Windows x86, nonthreaded (i3nt) and threaded (ti3nt)
\item Windows x86\_64, nonthreaded (a6nt) and threaded (ta6nt) [experimental]
%\item OpenBSD x86, nonthreaded (i3ob) and threaded (ti3ob)
%\item OpenBSD x86\_64, nonthreaded (a6ob) and threaded (ta6ob)
%\item FreeBSD x86, nonthreaded (i3fb) and threaded (ti3fb)
%\item FreeBSD x86\_64, nonthreaded (a6fb) and threaded (ta6fb)
%\item NetBSD x86, nonthreaded (i3nb) and threaded (ti3nb)
%\item NetBSD x86\_64, nonthreaded (a6nb) and threaded (ta6nb)
%\item OpenSolaris x86, nonthreaded (i3s2) and threaded (ti3s2)
%\item OpenSolaris x86\_64, nonthreaded (a6s2) and threaded (ta6s2)
\end{itemize}

This document contains three sections describing significant
(1) \href[static]{section:functionality}{functionality changes},
(2) \href[static]{section:bugfixes}{bugs fixed}, and
(3) \href[static]{section:performance}{performance enhancements}.
A version number listed in parentheses in the header for a change
indicates the first minor release or internal prerelease to support
the change.

More information on {\ChezScheme} and {\PetiteChezScheme} can
be found at \hyperlink{http://www.scheme.com/}{http://www.scheme.com},
and extensive documentation is available in
\TSPL{4}{th} (available directly from MIT Press or from online and local retailers)
and the \CSUG{9}.
Online versions of both books can be found at
\hyperlink{http://www.scheme.com/}{http://www.scheme.com}.

%-----------------------------------------------------------------------------
\section{Functionality Changes}\label{section:functionality}

\subsection{Record equality and hashing (9.4.1)}

The new procedures \scheme{record-type-equal-procedure} and
\scheme{record-type-hash-procedure} can be used to customize the
handling of records by \scheme{equal?} and \scheme{equal-hash}, and
the new procedures \scheme{record-equal-procedure} and
\scheme{record-hash-procedure} can be used to look up the
applicable (possibly inherited) equality and hashing procedures
for specific record instances.
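
As a minimal sketch of how these procedures might be used, the following
installs field-wise equality and hashing for a hypothetical \scheme{point}
record type.
It assumes that the equality procedure receives the two records plus a
procedure for comparing their components, and that the hashing procedure
receives the record plus a procedure for hashing its components.

\schemedisplay
;; point is a hypothetical record type used only for illustration
(define-record-type point (fields x y))

;; assumed protocol: two records and a recursive equality procedure
(record-type-equal-procedure (record-type-descriptor point)
  (lambda (p1 p2 eql?)
    (and (eql? (point-x p1) (point-x p2))
         (eql? (point-y p1) (point-y p2)))))

;; assumed protocol: one record and a recursive hash procedure
(record-type-hash-procedure (record-type-descriptor point)
  (lambda (p hash)
    (+ (hash (point-x p)) (hash (point-y p)))))

;; distinct instances with equal fields should now be equal?
(equal? (make-point 1 2) (make-point 1 2))

;; look up the procedure that applies to a pair of instances
(procedure? (record-equal-procedure (make-point 1 2) (make-point 3 4)))
\endschemedisplay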

\subsection{Immutable vectors, fxvectors, bytevectors, strings, and boxes (9.4.1)}

Support for immutable vectors, fxvectors, bytevectors, strings, and boxes
has been added.
Immutable vectors are created via \scheme{vector->immutable-vector},
and immutable fxvectors, bytevectors, and strings are created by similarly named
procedures.
Immutable boxes are created via \scheme{box-immutable}.
Any attempt to modify an immutable object causes an exception to be raised.
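
The following sketch illustrates the behavior described above.
The names \scheme{string->immutable-string}, \scheme{immutable-vector?},
and \scheme{immutable-box?} are assumed from the ``similarly named''
convention and are not spelled out in this entry.

\schemedisplay
(define v (vector->immutable-vector (vector 1 2 3)))
(define s (string->immutable-string (string-copy "ab")))
(define b (box-immutable 42))

;; reads work as usual
(vector-ref v 0)
(unbox b)

;; assumed predicates for testing immutability
(immutable-vector? v)
(immutable-box? b)

;; a mutation attempt raises an exception, caught here for illustration
(guard (c [#t 'rejected])
  (vector-set! v 0 99))
\endschemedisplay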

\subsection{Optional timeout for \protect\scheme{condition-wait} (9.4.1)}

The \scheme{condition-wait} procedure now takes an optional
\var{timeout} argument and returns a boolean indicating whether the
thread was awakened by the condition before the timeout. The
\var{timeout} can be a time record of type \scheme{time-duration} or
\scheme{time-utc}, or it can be \scheme{#f} for no timeout (the
default).
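
A minimal sketch of a timed wait follows.
It assumes a threaded build and that \scheme{make-time} takes the time
type, nanoseconds, and seconds, in that order; the names \scheme{m},
\scheme{c}, and \scheme{signaled?} are illustrative only.

\schemedisplay
(define m (make-mutex))
(define c (make-condition))

;; wait for at most one second; no other thread signals c here,
;; so the wait is expected to end by timeout rather than by a signal
(define signaled?
  (with-mutex m
    (condition-wait c m (make-time 'time-duration 0 1))))
\endschemedisplay

A caller would typically test the returned boolean and recheck the guarded
state in either case.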

\subsection{\protect\scheme{procedure-arity-mask} (9.4.1)}

The new primitive procedure \scheme{procedure-arity-mask} takes a
procedure \var{p} and returns a two's complement bitmask representing
the argument counts accepted by \var{p}.
For example, the arity mask for a two-argument procedure such as
\var{cons} is $4$ (only bit two set),
while the arity mask for a procedure that accepts one or more arguments,
such as \var{list*}, is $-2$ (all but bit 0 set).
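
The mask can be tested with ordinary bitwise operations.
The helper \scheme{accepts-arity?} below is a hypothetical convenience,
not part of the system, sketched with the R6RS procedure
\scheme{bitwise-bit-set?}.

\schemedisplay
;; hypothetical helper: does p accept exactly n arguments?
(define (accepts-arity? p n)
  (bitwise-bit-set? (procedure-arity-mask p) n))

(procedure-arity-mask cons)    ; 4, as described above
(procedure-arity-mask list*)   ; -2, as described above

;; one or three arguments: bits 1 and 3 set, i.e., 2 + 8 = 10
(procedure-arity-mask (case-lambda [(a) a] [(a b c) b]))

(accepts-arity? cons 2)
(accepts-arity? list* 0)
\endschemedisplay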

\subsection{High-precision clock time in Windows 8 and up (9.4.1)}

When running on Windows 8 and up, Chez Scheme uses the high-precision
clock time function for the current date and time.

\subsection{Printing of non-standard (extended) identifiers (9.4.1)}

Chez Scheme extends the syntax of identifiers as described in the
introduction to the Chez Scheme User's Guide, except within forms prefixed
by \scheme{#!r6rs}, which is implied within a library or top-level program.
Prior to Version~9.4.1, the printer always printed such identifiers using
hex scalar value escapes as necessary to render them with valid R6RS identifier syntax.
When the new parameter \scheme{print-extended-identifiers} is set
to \scheme{#t}, these identifiers are printed without escapes, e.g.,
\scheme{1+} prints as \scheme{1+} rather than as \scheme{\x31;+}.
The default value of this parameter is \scheme{#f}.
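
For illustration, the parameter can be bound temporarily with
\scheme{parameterize}, as in this small sketch.

\schemedisplay
;; with the parameter set, the symbol is written without escapes
(parameterize ([print-extended-identifiers #t])
  (write '1+)
  (newline))
\endschemedisplay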

\subsection{Expression-editor Unicode support (9.4.1)}

The expression editor now supports Unicode characters under Linux and MacOS~X
except that combining characters are not treated correctly for
line-wrapping.

\subsection{Extensions to whole-program, whole-library optimization (9.3.1, 9.3.4)}

\scheme{compile-whole-program} now supports incomplete
whole-program optimization, i.e., whole-program optimization that
incorporates only libraries for which wpo files are available while
leaving separate libraries for which only object files are available.
In addition, imported libraries can be left visible for run-time
use by the \scheme{environment} procedure or for dynamically loaded
object files that might require them.
The new procedure \scheme{compile-whole-library} supports the combination
of groups of libraries separate from programs and unconditionally
leaves all imported libraries visible.

\subsection{24-, 40-, 48-, and 56-bit bit-field containers (9.3.3)}

The total size of the fields within an ftype \scheme{bits} can now be
24, 40, 48, or 56 (as well as 8, 16, 32, and 64).
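
As a sketch, a hypothetical \scheme{rgb} ftype with three 8-bit fields
now fits in a 24-bit container.
The type and field names are illustrative only.

\schemedisplay
;; three 8-bit fields: 24 bits in total
(define-ftype rgb
  (bits [r unsigned 8]
        [g unsigned 8]
        [b unsigned 8]))

;; allocate one instance, set a field, and read it back
(define p (make-ftype-pointer rgb (foreign-alloc (ftype-sizeof rgb))))
(ftype-set! rgb (g) p 200)
(define green (ftype-ref rgb (g) p))
(foreign-free (ftype-pointer-address p))
\endschemedisplay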

\subsection{Object-counting for static-generation collections (9.3.3)}

Object counting (see \scheme{object-counts} below) is now enabled for
all collections targeting the static generation.

\subsection{Support for off-line profile-dump processing (9.3.2)}

Previously, the output of \scheme{profile-dump} was not specified.
It is now specified to be a list of source-object, profile-count pairs.
In addition, \scheme{profile-dump-html}, \scheme{profile-dump-list},
and \scheme{profile-dump-data} all now take an optional \var{dump}
argument, which is a list of source-object, profile-count pairs in
the form returned by \scheme{profile-dump} and defaults to the current
value of \scheme{(profile-dump)}.

With these changes, it is now possible to obtain a dump from
\scheme{profile-dump} in one process, and write it to a fasl file
(using \scheme{fasl-write}) for subsequent off-line processing in
another process, where it can be read from the fasl file (using
\scheme{fasl-read}) and processed using \scheme{profile-dump-html},
\scheme{profile-dump-list}, \scheme{profile-dump-data} or some
custom mechanism.
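
The two halves of this workflow might look like the following sketch.
The file name \scheme{"profile.fasl"} is hypothetical, and the first
argument to \scheme{profile-dump-list} is assumed to be its existing
optional \var{warn?} flag.

\schemedisplay
;; in the profiled process: save the dump to a fasl file
(let ([op (open-file-output-port "profile.fasl" (file-options no-fail))])
  (fasl-write (profile-dump) op)
  (close-port op))

;; later, possibly in another process: reload the dump and process it
(let* ([ip (open-file-input-port "profile.fasl")]
       [dump (fasl-read ip)])
  (close-port ip)
  (profile-dump-list #t dump))
\endschemedisplay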

\subsection{More support for controlling return of memory to the O/S (9.3.2)}

A new parameter, \scheme{release-minimum-generation}, determines when
the collector attempts to return unneeded virtual memory to the O/S.
It defaults to the value of \scheme{collect-maximum-generation}, so the
collector attempts to return memory to the O/S only when performing a
maximum-generation collection.
It can be set to a lower generation number to cause the collector to
do so for younger generations as well.

\subsection{sstats changes (9.3.1)}

The vector-based sstats structure has been replaced with a record type.
The time fields are all time objects, and the bytes and count fields
are now exact integers.
\scheme{time-difference} no longer coerces negative results to zero.
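
The following sketch measures the cost of an allocation with the
record-based structure.
It assumes the accessors \scheme{sstats-cpu} and \scheme{sstats-bytes}
and the procedure \scheme{sstats-difference}; the variable names are
illustrative.

\schemedisplay
(define before (statistics))
(define junk (make-vector 100000 'x))
(define after (statistics))

;; the difference of two sstats records is itself an sstats record
(define delta (sstats-difference after before))

(sstats? delta)
(time? (sstats-cpu delta))      ; time fields are time objects
(exact? (sstats-bytes delta))   ; byte counts are exact integers
\endschemedisplay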

\subsection{\protect\scheme{library-group} eliminated (9.3.1)}

With the extensions to \scheme{compile-whole-program} and the
addition of \scheme{compile-whole-library}, as described above,
support for whole-program and whole-library optimization now subsumes
the functionality of the experimental \scheme{library-group} form,
and the form has been eliminated.
This is an \emph{incompatible change}.

\subsection{Support for Version~7 interaction-environment semantics eliminated (9.3.1)}

Prior to Version~8, the semantics of the interaction environment
used by the read-eval-print loop (REPL), aka waiter, and by
\scheme{load}, \scheme{compile}, and \scheme{interpret} without
explicit environment arguments treated all variables in the environment
as mutable, including those bound to primitives.
This meant that top-level references to primitive names could not
be optimized by the compiler because their values might change at
run time, except that, at optimize-level 2 and above, the compiler
did treat primitive names as always having their original values.

In Version 8 and subsequent versions, primitive bindings in the
interaction environment are immutable, as if imported directly from
the immutable Scheme environment.
That is, they cannot be assigned, although they can be replaced
with new bindings with a top-level definition.

To provide temporary backward compatibility, the
\scheme{--revert-interaction-semantics} command-line option and
\scheme{revert-interaction-semantics} parameter allowed programmers
to revert the interaction environment to Version~7 semantics.
This functionality has now been eliminated and along with it the
special treatment of primitive bindings at optimize level 2 and
above.

This is an \emph{incompatible change}.

\subsection{Explicit specification of profile source locations (9.3.1)}

Version 9.3.1 augments existing support for explicit source-code
annotations with additional features targeted at source profiling
for externally generated programs, including programs generated by
language front ends that target Scheme and use Chez Scheme as the
back end.
Included is a \scheme{profile} expression that explicitly associates
a specified source object with a profile count (of times the
expression is evaluated), a \scheme{generate-profile-forms} parameter
that controls whether the compiler (also) associates profile counts
with source locations implicitly identified by annotated expressions
in the input, and a finer-grained method for marking whether an
individual annotation should be used for debugging, profiling, or
both.

\subsection{``Maybe'' file (re)compilation (9.3.1)}

When \scheme{compile-imported-libraries} is set to \scheme{#t},
libraries required indirectly by one of the
file-compilation procedures, e.g., \scheme{compile-library},
\scheme{compile-program}, and \scheme{compile-file}, are automatically
compiled if and only if the object file is not present, the object file
is older than the source (main and include) files, or some library upon
which they depend has been or needs to be recompiled.

Version 9.3.1 adds three new procedures: \scheme{maybe-recompile-library},
\scheme{maybe-recompile-program}, and \scheme{maybe-recompile-file},
that perform a similar analysis and compile the library, program,
or file only under similar circumstances.

\subsection{New primitives for querying memory utilization (9.3.1)}

Three new primitives have been added to allow a Scheme process to
track usage of virtual memory for its heap.

\scheme{current-memory-bytes} returns the total number of bytes of
virtual memory used or reserved to represent the Scheme heap.
This differs from \scheme{bytes-allocated}, which returns the number
of bytes currently occupied by Scheme objects.
\scheme{current-memory-bytes} additionally includes memory used for
heap management as well as memory held in reserve to satisfy future
allocation requests.

\scheme{maximum-memory-bytes} returns the maximum number of bytes
of virtual memory occupied or reserved for the Scheme heap by the
calling process since the last call to \scheme{reset-maximum-memory-bytes!}
or, if \scheme{reset-maximum-memory-bytes!} has never been called,
since system start-up.

\scheme{reset-maximum-memory-bytes!} resets the maximum memory bytes
to the current memory bytes.
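
A short sketch of how the three primitives fit together follows; the
buffer size is arbitrary.

\schemedisplay
;; start a fresh measurement window
(reset-maximum-memory-bytes!)

;; allocate something sizable
(define buf (make-bytevector 10000000 0))

;; object bytes, heap bytes in use or reserved, and the peak since reset
(list (bytes-allocated)
      (current-memory-bytes)
      (maximum-memory-bytes))
\endschemedisplay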

\subsection{Unicode 7.0 support (9.3.1)}

The character sets, character classes, and word-breaking algorithms
for character, string, and Unicode-related bytevector operations
have now been updated to Unicode 7.0.

\subsection{Linux PowerPC (32-bit) support (9.3)}

Support for running {\ChezScheme} on 32-bit PowerPC processors
running Linux has been added, with machine types ppc32le (nonthreaded)
and tppc32le (threaded).
C~code intended to be linked with these versions of the system
should be compiled using the GNU C~compiler's \scheme{-m32} option.

\subsection{Printed representation of procedures (9.2.1)}

The printed representation of a procedure now includes the source
file and beginning file position when available.

\subsection{I/O errors writing to the console error port (9.2.1)}

The default exception handler now catches I/O exceptions that occur
when it attempts to display a condition and, if an I/O exception
does occur, resets as if by calling the \scheme{reset} procedure.
The intent is to avoid an infinite regress (ultimately ending
in exhaustion of memory) in which the process repeatedly recurs
back to the default exception handler trying to write to a console-error
port (typically stderr) that is no longer writable, e.g., due to
the other end of a pipe or socket having been closed.

\subsection{C locking macros (9.2.1)}

The header file scheme.h distributed with Chez Scheme now includes
several new lock-related macros:
\scheme{INITLOCK} (corresponding to \scheme{ftype-init-lock!}),
\scheme{SPINLOCK} (\scheme{ftype-spin-lock!}),
\scheme{UNLOCK} (\scheme{ftype-unlock!}),
\scheme{LOCKED_INCR} (\scheme{ftype-locked-incr!}), and
\scheme{LOCKED_DECR} (\scheme{ftype-locked-decr!}).
All take a pointer to an iptr or uptr.
\scheme{LOCKED_INCR} and \scheme{LOCKED_DECR} also take an
\scheme{lvalue} argument that is set to true (nonzero) if the result
of the increment or decrement is zero, otherwise false (zero).

\subsection{New \protect\scheme{compile-to-file} procedure (9.2.1)}

The new procedure \scheme{compile-to-file} is similar to
\scheme{compile-to-port} with the output port replaced with an
output pathname.
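
For example, assuming that the first argument is a list of expressions as
with \scheme{compile-to-port}, a small object file could be produced and
loaded as sketched below.
The file name \scheme{"square.so"} is hypothetical.

\schemedisplay
;; compile two top-level forms into an object file
(compile-to-file
  '((define (square x) (* x x))
    (define nine (square 3)))
  "square.so")

;; loading the object file evaluates the compiled forms
(load "square.so")
nine
\endschemedisplay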

\subsection{Whole-program optimization (9.2)}

Version 9.2 includes support for whole-program optimization of a top-level
program and the libraries upon which it depends at run time based on ``wpo''
(whole-program-optimization) files produced as a byproduct of compiling
the program and libraries when the parameter \scheme{generate-wpo-files}
is set to \scheme{#t}.
The new procedure \scheme{compile-whole-program} takes as input
a wpo file for a top-level program, combines it with the wpo files for
any libraries the program requires at run time, and produces a single
object file containing a self-contained program.
In so doing, it discards unused code and optimizes across program and
library boundaries, potentially reducing program load time, run time,
and memory requirements.

\scheme{compile-file}, \scheme{compile-program}, \scheme{compile-library},
and \scheme{compile-script} produce wpo files as well as ordinary
object files when the new \scheme{generate-wpo-files} parameter is set
to \scheme{#t} (the default is \scheme{#f}).
\scheme{compile-port} and \scheme{compile-to-port} do so when passed
an optional \var{wpo output port}.
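
The overall workflow might be sketched as follows.
The library \scheme{(wpo-lib)}, the file names, and the procedure
\scheme{twice} are all hypothetical, and the sketch assumes that the
current directory is among the library directories.

\schemedisplay
;; write a tiny library and a program that uses it
(with-output-to-file "wpo-lib.ss"
  (lambda ()
    (write '(library (wpo-lib) (export twice) (import (chezscheme))
              (define (twice x) (* 2 x)))))
  'replace)
(with-output-to-file "wpo-app.ss"
  (lambda ()
    (write '(import (chezscheme) (wpo-lib)))
    (write '(display (twice 21))))
  'replace)

;; compile both, producing wpo files alongside the object files
(parameterize ([generate-wpo-files #t])
  (compile-library "wpo-lib.ss")
  (compile-program "wpo-app.ss"))

;; combine the program and its library into one object file
(compile-whole-program "wpo-app.wpo" "wpo-app-all.so")
\endschemedisplay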

\subsection{Type-specific symbol-hashtable operators (9.2)\label{sec:symbol-hashtables}}

A new set of primitives that operate on symbol
hashtables has been added:

\schemedisplay
symbol-hashtable?
symbol-hashtable-ref
symbol-hashtable-set!
symbol-hashtable-contains?
symbol-hashtable-cell
symbol-hashtable-update!
symbol-hashtable-delete!
\endschemedisplay

These are like their generic counterparts but operate only on symbol
hashtables, i.e., hashtables created with \scheme{symbol-hash} as
the hash function and \scheme{eq?}, \scheme{eqv?}, \scheme{equal?},
or \scheme{symbol=?} as the equivalence function.

These primitives are more efficient at optimize-level 3 than their
generic counterparts when both are applied to symbol hashtables.
The performance of symbol hashtables has been improved even when the new
operators are not used (Section~\ref{sec:symbol-hashtable-performance}).
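
A brief sketch of the new operators in use follows; the table contents
are illustrative only.

\schemedisplay
;; a symbol hashtable: symbol-hash paired with eq?
(define ht (make-hashtable symbol-hash eq?))
(symbol-hashtable? ht)

(symbol-hashtable-set! ht 'apples 3)

;; increment a count, starting from the default 0 for missing keys
(symbol-hashtable-update! ht 'apples (lambda (n) (+ n 1)) 0)
(symbol-hashtable-update! ht 'pears (lambda (n) (+ n 1)) 0)

(symbol-hashtable-ref ht 'apples #f)     ; 4
(symbol-hashtable-contains? ht 'plums)   ; #f

(symbol-hashtable-delete! ht 'pears)
(hashtable-size ht)                      ; 1
\endschemedisplay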

\subsection{\protect\scheme{strip-fasl-file} is now machine-independent (9.2)}

\scheme{strip-fasl-file} can now strip fasl files created for a machine
type other than the machine type of the calling process as long as the
Chez Scheme version is the same.

\subsection{\protect\scheme{source-file-descriptor} and \protect\scheme{locate-source} (9.2)}

The new procedure \scheme{source-file-descriptor} can be used to construct
a custom source-file descriptor or reconstruct a source-file descriptor
from values previously extracted from another source-file descriptor.
It takes two arguments: a string \var{path} and exact nonnegative integer
\var{checksum} and returns a new source-file descriptor.

The new procedure \scheme{locate-source} can be used to determine a full
path, line number, and character position from a source-file descriptor
and file position.
It accepts two arguments: a source-file descriptor \var{sfd} and an
exact nonnegative integer file position \var{fp}.
It returns zero values if the unmodified file is not found in the source
directories and three values (string \var{path}, exact nonnegative
integer \var{line}, and exact nonnegative integer \var{char}) if the
file is found.
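
Because \scheme{locate-source} returns either zero or three values, a
caller needs to handle both cases, e.g., with \scheme{case-lambda}.
In the sketch below, the helper \scheme{source-position}, the path
\scheme{"no-such-file.ss"}, and the checksum \scheme{0} are hypothetical.

\schemedisplay
;; hypothetical helper: a list of path, line, and char, or #f
(define (source-position sfd fp)
  (call-with-values
    (lambda () (locate-source sfd fp))
    (case-lambda
      [() #f]
      [(path line char) (list path line char)])))

;; a descriptor for a file that is not expected to exist
(define sfd (source-file-descriptor "no-such-file.ss" 0))

;; with no matching file, the zero-values case is expected to apply
(source-position sfd 0)
\endschemedisplay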

\subsection{Compressed compiled scripts and partially compressed files (9.2)}

Support for creating and handling files that begin with uncompressed
data and end with compressed data has been added in the form of the
new procedure \scheme{port-file-compressed!}, which takes a port and,
if the port is not already set up to read or write compressed data,
sets it up to do so.
The port must be a file port pointing to a regular file, i.e., a
file on disk rather than a socket or pipe, and the port must not be
an input/output port.
The port can be a binary or textual port.
If the port is an output port, subsequent output sent to the port
will be compressed.
If the port is an input port, subsequent input will be decompressed
if and only if the port is currently pointing at compressed data.

When the parameter \scheme{compile-compressed} is set to \scheme{#t},
the \scheme{compile-script} and \scheme{compile-program} procedures
take advantage of this functionality to copy the \scheme{#!} prefix,
if present in the source file, uncompressed in the object file while
compressing the object code emitted for the program, thus reducing
the size of the resulting file without preventing the \scheme{#!}
line from being read and interpreted properly by the operating
system.

\subsection{Change in library import handling (9.2)}

In previous releases, when an object file was found before the
corresponding source file in the library directories, the object file was
older, and the parameter \scheme{compile-imported-libraries} was not set,
the object file was loaded rather than the source file.
The (newer) source file is now loaded instead, just as it would be if
the source file were found before the corresponding, older object file.
This is an \emph{incompatible change}.

\subsection{Change in fasl-strip options (9.1)}

\scheme{strip-fasl-file} now supports stripping of all compile-time
information and no longer supports stripping of just library visit code.
Stripping all compile-time information nearly always results in smaller
object files than stripping just library visit code, with a corresponding
reduction in the memory required when the resulting
file is loaded.

To reflect this, the old fasl-strip option \scheme{library-visit-code}
has been eliminated, and the new fasl-strip option
\scheme{compile-time-information} has been added.
This is an \emph{incompatible change} in that code that previously
used the fasl-strip option \scheme{library-visit-code} will
have to be modified to omit the option or to replace it with
\scheme{compile-time-information}.

\subsection{Library loading (9.1)}

Visiting (via \scheme{visit}) a library no longer loads the library's
run-time information (invoke dependencies and invoke code), and revisiting
(via \scheme{revisit}) a library no longer loads the library's
compile-time information (import and visit dependencies and import and
visit code).

When a library is invoked due to a run-time dependency of another
library or a top-level program on the library, the library is now
``revisited'' (as if via \scheme{revisit}) rather than ``loaded''
(as if via \scheme{load}).
As a result, the compile-time information is not loaded, which can result
in substantial reductions in both library invocation time and memory
footprint.

If a library is revisited, either explicitly or as the result of run-time
dependency, a subsequent import of the library causes it to be
``visited'' (as if via \scheme{visit}) if the same object file can be
found at the same path and the visit code has not been stripped.
The compile-time code can alternatively be loaded explicitly from the same or a
different file via a direct call to \scheme{visit}.

While this change is mostly transparent (ignoring the reduced invocation
time and memory footprint), it is an \emph{incompatible change} in the
sense that the system potentially reads the file twice and can run
code that is marked using \scheme{eval-when} as both visit
and revisit code.

\subsection{Finding objects in the heap (9.1)}

Version 9.1 includes support for a new heap inspection tool that
allows a programmer to look for objects in the heap according to
arbitrary predicates.
The new procedure \scheme{make-object-finder} takes a predicate \var{pred} and two optional
arguments: a starting point \var{x} and a maximum generation \var{g}.
The starting point defaults to the value of the procedure \scheme{oblist},
and the maximum generation defaults to the value of the parameter
\scheme{collect-maximum-generation}.
\scheme{make-object-finder} returns an object finder \var{p} that can be used to
search for objects satisfying \var{pred} within the starting-point object \var{x}.
Immediate objects and objects in generations older than \var{g} are treated
as leaves.
\var{p} is a procedure accepting no arguments.
If an object \var{y} satisfying \var{pred} can be found starting with \var{x},
\var{p} returns a list whose first element is \var{y} and whose remaining
elements represent the path of objects from \var{x} to \var{y}, listed
in reverse order.
\var{p} can be invoked multiple times to find additional objects satisfying
the predicate, if any.
\var{p} returns \scheme{#f} if no more objects matching the predicate
can be found.

\var{p} maintains internal state recording where it has been so that it
can restart at the point of the last found object and not return
the same object twice.
The state can be several times the size of the starting-point object
\var{x} and all that is reachable from \var{x}.
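
As a small sketch, the following searches a freshly allocated structure
for a particular string.
The names \scheme{haystack}, \scheme{next-match}, and \scheme{hit} are
illustrative only.

\schemedisplay
(define haystack (list 1 2 (vector 10 "needle" 20)))

;; search within haystack for strings equal to "needle"
(define next-match
  (make-object-finder
    (lambda (x) (and (string? x) (string=? x "needle")))
    haystack))

;; the first element of the result is the found object;
;; the remaining elements are the path back to haystack
(define hit (next-match))
(car hit)

;; no other matching object is reachable, so #f is expected
(next-match)
\endschemedisplay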

The interactive inspector provides a convenient interface to the object
finder in the form of \scheme{find} and \scheme{find-next} commands.
The \scheme{find} command evaluates its first argument, which should
evaluate to the desired predicate, and treats its second argument, if
present, as the maximum generation, overriding the default.
The starting point \var{x} is the object upon which the
inspector is currently focused.
If an object is found, the inspector's new focus is the found object,
the parent focus (obtainable via the \scheme{up} command) is the first
element in the (reversed) path, the parent's parent is the next element,
and so on up to \var{x}.
The \scheme{find-next} command repeats the last find, as if by an explicit
invocation of the same object finder.

Relocation tables for static code objects are discarded by default, which
prevents object finders from providing accurate results when static code
objects are involved.
That is, they will not find any objects pointed to directly from a code
object that has been promoted to the static generation.
If this is a problem, the command-line argument
\scheme{--retain-static-relocation} can be used to prevent the relocation
tables from being discarded.

\subsection{Object counts (9.1)}

The new procedure \scheme{object-counts} can be used to determine,
for each type of object, the number and size in bytes of objects of
that type in each generation.
Its return value has the following structure:

\schemedisplay
((\var{type} (\var{generation} \var{count} . \var{bytes}) \dots) \dots)
\endschemedisplay

\var{type} is either the name of a primitive type, represented as a
symbol, e.g., \scheme{pair}, or a record-type descriptor (rtd).
\var{generation} is a nonnegative fixnum between 0 and the value
of \scheme{(collect-maximum-generation)}, inclusive, or the symbol
\scheme{static} representing the static generation.
\var{count} and \var{bytes} are nonnegative fixnums.

Object counts are accurate for a generation $n$ immediately after
a collection of generation $n$ or higher if enabled during that
collection.
Object counts are enabled by setting the parameter
\scheme{enable-object-counts} to \scheme{#t}.
The command-line option \scheme{--enable-object-counts} can be used to
set this parameter to \scheme{#t} on startup.
Object counts are not enabled by default since counting adds overhead to
garbage collection.
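
The following sketch enables counting for a single full collection and
then prints the per-generation figures for pairs, destructuring each
entry according to the structure shown above.
The name \scheme{pair-entry} is illustrative.

\schemedisplay
;; count objects during one maximum-generation collection
(parameterize ([enable-object-counts #t])
  (collect (collect-maximum-generation)))

;; each entry has the form (type (generation count . bytes) ...)
(define pair-entry (assq 'pair (object-counts)))

(when pair-entry
  (for-each
    (lambda (g)
      (printf "generation ~s: ~s pairs, ~s bytes" (car g) (cadr g) (cddr g))
      (newline))
    (cdr pair-entry)))
\endschemedisplay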

To make the information more useful in the presence of ftype pointers,
the ftype descriptors produced by \scheme{define-ftype} for each
defined ftype now carry the name of the ftype rather than a generic
name like \scheme{ftd-struct}.
(Ftype descriptors are subtypes of record-type descriptors and can appear
as types in the \scheme{object-counts} return value.)

\subsection{Native-eol style is now none (9.1)}

To simplify interaction with tools that naively expose multiple-character
end-of-line sequences such as CRLF as separate characters to the user, the
native end-of-line style (\scheme{native-eol-style}) is now \scheme{none}
on all machine types.
This is an \emph{incompatible change}.

\subsection{Library-requirements options (9.1)}

In previous releases, the \scheme{library-requirements} procedure
returned a list of all libraries required by the specified library,
whether they are needed when the specified library is imported,
visited, or invoked.
While this remains the default behavior, \scheme{library-requirements}
now takes an optional ``options'' argument.
This must be a library-requirements-options enumeration set, i.e., the
value of a \scheme{library-requirements-options} form with some subset of
the options \scheme{import}, \scheme{visit@visit}, \scheme{invoke@visit},
and \scheme{invoke}.  \scheme{import} includes the libraries
that must be imported when the specified library is imported;
\scheme{visit@visit} includes the libraries that must be visited when
the specified library is visited; \scheme{invoke@visit} includes the libraries
that must be invoked when the specified library is visited; and
\scheme{invoke} includes the libraries that must be invoked when
the specified library is invoked.
The default behavior is obtained by supplying an enumeration set containing all
of these options.
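
For illustration, the calls below query the requirements of the
\scheme{(rnrs)} library in three ways; the choice of library is arbitrary.

\schemedisplay
;; default: all required libraries
(library-requirements '(rnrs))

;; only the libraries that must be invoked when (rnrs) is invoked
(library-requirements '(rnrs)
  (library-requirements-options invoke))

;; all four options, which is equivalent to the default
(library-requirements '(rnrs)
  (library-requirements-options import visit@visit invoke@visit invoke))
\endschemedisplay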

\subsection{Nested object size and composition (9.1)}

Two new procedures, \scheme{compute-size} and
\scheme{compute-composition}, can be used to determine the
size and make-up of nested objects within the heap.

Both take an object and an optional generation.
The generation must be a fixnum between 0 and the value of
\scheme{(collect-maximum-generation)}, inclusive, or the symbol \scheme{static}.
It defaults to the value of \scheme{(collect-maximum-generation)}.

\scheme{compute-size} returns the number of bytes occupied by the object
and everything to which it points, ignoring objects in generations older
than the specified generation.

\scheme{compute-composition} returns an association list giving the
count and total number of bytes for each type of object that the specified
object is constructed from, ignoring objects in generations older than
the specified generation.  The association list maps type names (e.g.,
\scheme{pair} and \scheme{flonum}) or record-type descriptors to a pair of fixnums
giving the count and bytes.
Types with zero counts are not included in the list.

A surprising number of objects effectively point indirectly to a large
percentage of all objects in the heap due to the attachment of top-level
environment bindings to symbols, but the generation argument can be used
in combination with explicit calls to collect (with automatic collections
disabled) to measure precisely how much space is allocated to freshly
allocated structures.

When used directly from the REPL with no other threads running,
\scheme{(compute-size (oblist) 'static)} effectively gives the size of
the entire heap, and \scheme{(compute-composition (oblist) 'static)}
effectively gives the composition of the entire heap.
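
On a smaller scale, the two procedures can be applied to an individual
structure, as in this sketch; the name \scheme{tree} and its contents are
illustrative, and the exact figures depend on the machine type.

\schemedisplay
(define tree (list (make-vector 10 0) "leaf" 3.14))

;; total bytes for the list and everything it points to
(compute-size tree)

;; association list from type to (count . bytes),
;; expected to include entries for pair, vector, string, and flonum
(compute-composition tree)
\endschemedisplay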

The inspector makes the aggregate size of an object similarly available
through the \scheme{size} inspector-object message and the corresponding
\scheme{size} interactive-inspector command, with the twist that it
does not include objects whose sizes were previously requested in the
same session, making it possible to see the effectively smaller sizes
of what the programmer perceives to be substructures in shared and
cyclic structures.

These procedures potentially allocate a large amount of memory and
so should be used only when the information returned by the
procedure \scheme{object-counts} (see preceding entry) does not suffice.

Relocation tables for static code objects are discarded by default,
which prevents these procedures from providing accurate results when
static code objects are involved.
That is, they will not find any objects pointed to directly from a code
object that has been promoted to the static generation.
If accurate sizes and compositions for static code objects are
required, the command-line argument \scheme{--retain-static-relocation}
can be used to prevent the relocation tables from being discarded.

\subsection{Showing expander and optimizer output (9.1)}

When the parameter \scheme{expand-output} is set to a textual output
port, the output of the expander is printed to the port as a side effect
of running \scheme{compile}, \scheme{interpret}, or any of the file
compiling primitives, e.g., \scheme{compile-file} or
\scheme{compile-library}.
Similarly, when the parameter \scheme{expand/optimize-output} is set to a
textual output port, the output of the source optimizer is printed.
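
Both parameters can be bound temporarily to inspect a single expression,
as in the following sketch.

\schemedisplay
;; print the expander and source-optimizer output to the console
;; while compiling and evaluating one expression
(parameterize ([expand-output (current-output-port)]
               [expand/optimize-output (current-output-port)])
  (compile '(let ([x 3]) (+ x 4))))
\endschemedisplay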

\subsection{Undefined-variable warnings (9.1)}

When \scheme{undefined-variable-warnings} is set to \scheme{#t}, the
compiler issues a warning message whenever it cannot determine that
a variable bound by \scheme{letrec}, \scheme{letrec*}, or an internal
definition will not be referenced before it is defined.
The default value is \scheme{#f}.

Regardless of the setting of this parameter, the compiler inserts code
to check for the error, except at optimize level 3.
The check is fairly inexpensive and does not typically inhibit inlining
or other optimizations.
In code that must be carefully tuned, however, it is sometimes useful
to reorder bindings or make other changes to eliminate the checks.
Enabling this warning can facilitate this process.

The checks are also visible in the output of \scheme{expand/optimize}.

\subsection{Detecting accidental use of generative record types (9.1)}

When the new boolean parameter \scheme{require-nongenerative-clause}
is set to \scheme{#t}, a \scheme{define-record-type} without a
\scheme{nongenerative} clause is treated as a syntax error.
This allows the programmer to detect accidental use of generative
record types.
Generative record types are rarely useful and are less efficient
than nongenerative types, since generative record types require the
construction of a record-type-descriptor each time a
\scheme{define-record-type} form is evaluated rather than once,
at compile time.
To support the rare need for a generative record type while still
allowing accidental generativity to be detected,
\scheme{define-record-type} has been extended to allow a generative
record type to be explicitly declared with a \scheme{nongenerative}
clause with \scheme{#f} for the uid, i.e., \scheme{(nongenerative #f)}.
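
The following sketch shows both kinds of declaration with the check
enabled.
The record names \scheme{point} and \scheme{scratch} are hypothetical,
and the forms are assumed to be entered one at a time, so that the
parameter is set before the definitions are expanded.

\schemedisplay
(require-nongenerative-clause #t)

;; nongenerative, with a uid chosen by the expander
(define-record-type point
  (nongenerative)
  (fields x y))

;; explicitly generative, still accepted when the check is enabled
(define-record-type scratch
  (nongenerative #f)
  (fields value))

(point-x (make-point 1 2)) ;=> 1
\endschemedisplay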

\subsection{Improved support for cross compilation (9.1)}

Cross-compilation support has been improved in two ways: (1) it is
now possible to cross-compile a library and import it later in a
separate process for cross-compilation of dependent libraries, and
(2) the code produced for the target machine when cross compiling is no
longer less efficient than code produced natively on the target
machine.

\subsection{Linux ARMv6 (32-bit) support (9.1)}

Support for running {\ChezScheme} on ARMv6 processors running Linux
has been added, with machine type arm32le (32-bit nonthreaded).
C~code intended to be linked with these versions of the system
should be compiled using the GNU C~compiler's \scheme{-m32} option.

\subsection{Source information in ftype ref/set! error messages (9.0)}

When available at compile time, source information is now included
in run-time error messages produced when \scheme{ftype-&ref},
\scheme{ftype-ref}, \scheme{ftype-set!}, and the locked ftype
operations are handed invalid inputs, e.g., ftype pointers of some
unexpected type, RHS values of some unexpected type, or improper
indices.

\subsection{\protect\scheme{compile-to-port} top-level-program dependencies (9.0)}

When passed a single \scheme{top-level-program} form,
\scheme{compile-to-port} now returns a list of the libraries the
top-level program requires at run time, as with \scheme{compile-program}.
Otherwise, the return value is unspecified.

\subsection{Better feedback for record-type mismatches (9.0)}

When \scheme{make-record-type} or \scheme{make-record-type-descriptor}
detects an incompatibility between two record types with the same
UID, the resulting error messages provide more information to
describe the mismatch, i.e., whether the parent, fields, flags, or
mutability differ.

\subsection{\protect\scheme{enable-cross-library-optimization} parameter (9.0)}

When a library is compiled, information is stored with the object
code to enable propagation of constants and inlining of procedures
defined in the library into dependent libraries.
The new parameter \scheme{enable-cross-library-optimization}, whose
value defaults to \scheme{#t}, can be set to \scheme{#f} to prevent
this information from being stored and disable the corresponding
optimizations.
This might be done to reduce the size of the object files or to
reduce the potential for exposure of near-source information via
the object file.

\subsection{Stripping object files (9.0)}

The new procedure \scheme{strip-fasl-file} allows the removal of
source information of various sorts from a compiled object (fasl) file
produced by \scheme{compile-file} or one of the other file compiling
procedures.
It also allows removal of library visit code, i.e., the code
required to compile (but not run) dependent libraries.

\scheme{strip-fasl-file} accepts three arguments: an input pathname,
an output pathname, and a fasl-strip-options enumeration set,
created by \scheme{fasl-strip-options} with zero or more of the
following options.

\begin{description}
\item[\scheme{inspector-source}:]
Strip inspector source information.

\item[\scheme{source-annotations}:]
Strip source annotations.

\item[\scheme{profile-source}:]
Strip source file and character position information from profiled
code objects.

\item[\scheme{library-visit-code}:]
Strip library visit code from compiled libraries.
\end{description}
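
A minimal sketch of the procedure in use follows.
The file names \scheme{app.ss}, \scheme{app.so}, and
\scheme{app-stripped.so} are hypothetical, and the source file is created
on the spot only to keep the example self-contained.

\schemedisplay
;; create and compile a small source file
(with-output-to-file "app.ss"
  (lambda () (write '(define (square x) (* x x))))
  'replace)
(compile-file "app.ss" "app.so")

;; write a copy of the object file with source information removed
(strip-fasl-file "app.so" "app-stripped.so"
  (fasl-strip-options inspector-source source-annotations profile-source))

;; the stripped file is loaded like any other object file
(load "app-stripped.so")
(square 5) ;=> 25
\endschemedisplay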

\subsection{Ftype array bound of zero (9.0)}

The bound of an ftype array can now be zero and, when zero, is
treated as unbounded in the sense that no run-time upper-bound
checks are performed for accesses to the array.
This simplifies the creation of ftype arrays whose actual bounds
are determined dynamically.
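
As an illustration, the following sketch declares a hypothetical
\scheme{packet} ftype whose trailing \scheme{data} array has a bound of
zero and allocates room for the actual elements by hand.
Since no upper-bound check is performed, keeping indices within the
allocated space is the program's responsibility.

\schemedisplay
(define-ftype packet
  (struct
    [len unsigned-32]
    [data (array 0 unsigned-8)]))

;; allocate a packet with room for n data bytes after the header
(define (make-packet n)
  (let ([p (make-ftype-pointer packet
             (foreign-alloc (+ (ftype-sizeof packet) n)))])
    (ftype-set! packet (len) p n)
    p))

(define pk (make-packet 16))
(ftype-set! packet (data 15) pk 255)
(ftype-ref packet (data 15) pk) ;=> 255
(foreign-free (ftype-pointer-address pk))
\endschemedisplay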

\subsection{\protect\scheme{compile-profile} no longer implies \protect\scheme{generate-inspector-information} (9.0)}

In previous releases, profile and inspector source information was
gathered and stored together so that compiling with profiling enabled
required that inspector information also be stored with each code object.
This is no longer the case.

\subsection{\protect\scheme{case} now uses \protect\scheme{member} (9.0)}

\scheme{case} now uses \scheme{member} rather than \scheme{memv} for key
comparisons, a generalization that allows \scheme{case} to be used for
strings, lists, vectors, etc., rather than just atomic values.
This adds no overhead when keys are comparable with \scheme{memv},
since the compiler converts calls to \scheme{member} into calls to
\scheme{memv} (or \scheme{memq}, or even individual inline pointer
comparisons) when it can determine the more expensive test is not
required.

The \scheme{case} syntax exported by the \scheme{(rnrs)} and
\scheme{(rnrs base)} libraries still uses \scheme{memv} for
compatibility with the R6RS standard.
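
For example, assuming the {\ChezScheme} version of \scheme{case} is the
one in scope (as in the default interaction environment), string keys can
be used directly.
The procedure name \scheme{classify} is made up for this sketch.

\schemedisplay
(define (classify answer)
  (case answer
    [("yes" "y") 'affirmative]
    [("no" "n") 'negative]
    [else 'unknown]))

(classify "yes")   ;=> affirmative
(classify "n")     ;=> negative
(classify "maybe") ;=> unknown
\endschemedisplay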

\subsection{\protect\scheme{write} and \protect\scheme{display} and foreign addresses (9.0)}

The \scheme{write} and \scheme{display} procedures now recognize
foreign addresses that happen to look like Scheme objects and print
them as \scheme{#<foreign>}; previously, \scheme{write} and
\scheme{display} would attempt to treat the addresses as Scheme
objects, typically leading to invalid memory references.
Some foreign addresses are indistinguishable from fixnums and
still print as fixnums.

\subsection{Profile-directed optimization (9.0)}

Compiled code can be instrumented to gather two kinds of
execution counts, source-level and block-level, via different settings
of the \scheme{compile-profile} parameter.
When \scheme{compile-profile} is set to the symbol \scheme{source}
at compile time, source execution counts are gathered by the generated
code, and when \scheme{compile-profile} is set to \scheme{block},
block execution counts are gathered.
Setting it to \scheme{#f} (the default) disables instrumentation.

Source counts are identical to the source counts gathered by generated
code in previous releases when compiled with
\scheme{compile-profile} set to \scheme{#t}, and \scheme{#t}
can still be used in place of \scheme{source} for backward
compatibility.
Source counts can be viewed by the programmer at the end of the run
of the generated code via \scheme{profile-dump-list} and
\scheme{profile-dump-html}.

Block counts are per \emph{basic block}.
Basic blocks are individual sequences of straight-line code and are
the building blocks of the machine code generated by the compiler.
Counting the number of times a block is executed is thus equivalent
to counting the number of times the instructions within it are
executed.

There is no mechanism for the programmer to view block counts, but
both block counts and source counts can now be saved after a sample
run of the generated code for use in guiding various optimizations
during a subsequent compilation of the same code.

The source counts can be used by ``profile-aware macros,'' i.e.,
macros whose expansion is guided by profiling information.
A profile-aware macro can use profile information to optimize
the code it produces.
For example, a macro defining an abstract datatype might choose
representations and algorithms based on the frequencies
of its operations.
Similarly, a macro, like \scheme{case}, that performs a set of
disjoint tests might choose to order those tests based on which are
most likely to succeed.
Indeed, the built-in \scheme{case} now does just that.
A new syntactic form, \scheme{exclusive-cond}, abstracts a common
use case for profile-aware macros.

The block counts are used to guide certain low-level optimizations,
such as block ordering and register allocation.

The procedure \scheme{profile-dump-data} writes to a specified file
the profile data collected during the run of a program compiled
with \scheme{compile-profile} set to either \scheme{source} or
\scheme{block}.
It is similar to \scheme{profile-dump-list} or \scheme{profile-dump-html}
but stores the profile data in a machine readable form.

The procedure \scheme{profile-load-data} loads one or more files
previously created by \scheme{profile-dump-data} into an internal
database.

The database associates \emph{weights} with source locations or
blocks, where a weight is a flonum representing the ratio of the
location's count versus the maximum count.
When multiple profile data sets are loaded, the weights for each
location are averaged across the data sets.

The procedure \scheme{profile-query-weight} accepts a source object
and returns the weight associated with the location identified by
the source object, or \scheme{#f} if no weight is associated with
the location.
This procedure is intended to be used by a profile-aware macro on
pieces of its input to optimize code based on profile data previously
stored by \scheme{profile-dump-data} and loaded by
\scheme{profile-load-data}.

The procedure \scheme{profile-clear-data} clears the database.
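
The procedures above might be combined roughly as follows.
This is only a sketch of one possible workflow: the file names
\scheme{kind.ss} and \scheme{kind.profile} and the procedure
\scheme{kind} are hypothetical, and the sample run is a simple loop
standing in for a representative workload.

\schemedisplay
;; a small source file to profile
(with-output-to-file "kind.ss"
  (lambda ()
    (write '(define (kind x) (if (fixnum? x) 'fixnum 'other))))
  'replace)

;; 1. load the code with source-level instrumentation
(parameterize ([compile-profile 'source])
  (load "kind.ss"))

;; 2. run a sample workload and save the counts
(do ([i 0 (+ i 1)]) ((= i 1000)) (kind i))
(profile-dump-data "kind.profile")

;; 3. start from an empty database, load the saved data, and
;;    recompile so that the weights are available to the compiler
(profile-clear-data)
(profile-load-data "kind.profile")
(compile-file "kind.ss")
\endschemedisplay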

The new \scheme{exclusive-cond} syntax is similar to \scheme{cond}
except it assumes the tests performed by the clauses are disjoint
and reorders them based on available profiling data.
Because the tests might be reordered, the order in which side effects
of the test expressions occur is undefined.
The built-in \scheme{case} form is implemented in terms of
\scheme{exclusive-cond}.
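
A small sketch of \scheme{exclusive-cond} follows; the procedure name
\scheme{type-of} is made up for the example.
The tests are mutually exclusive and free of side effects, so the result
is the same whatever order the clauses are tried in.

\schemedisplay
(define (type-of x)
  (exclusive-cond
    [(fixnum? x) 'fixnum]
    [(string? x) 'string]
    [(symbol? x) 'symbol]
    [else 'other]))

(type-of "abc") ;=> string
(type-of 'abc)  ;=> symbol
\endschemedisplay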

\subsection{New \protect\scheme{ssize_t} foreign type (9.0)}

A new foreign type, \scheme{ssize_t}, is now supported.
It is the signed analogue of \scheme{size_t}.

\subsection{Guardian representatives (9.0)}

When a guardian created by \scheme{make-guardian} is passed a second,
\emph{representative}, argument along with the object to be guarded, the
representative is returned from the guardian in place
of the guarded object when the guarded object is no longer accessible.
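
A minimal sketch of a representative in use follows.
Here the representative is supplied as the second argument when the
object is registered with the guardian \scheme{g}, and the symbol
\scheme{x-is-gone} is just an illustrative choice.

\schemedisplay
(define g (make-guardian))
(define x (cons 'a 'b))

;; register the pair, with a symbol standing in for it
(g x 'x-is-gone)

;; drop the only reference and force a collection
(set! x #f)
(collect)

;; the guardian now yields the representative rather than the pair
(g)
\endschemedisplay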

\subsection{Library reloading on dependency change (9.0)}

A library initially imported from an object file is now reimported from
source when a dependency (another library or include file) has changed
since the library was compiled.

\subsection{Expression-editor filename completion (8.9.5)}

The expression editor now performs filename- rather than
command-completion within string constants.
It looks only at the current line to determine whether the cursor is
within a string constant; this can lead to the wrong kind of command
completion for strings that cross line boundaries.

\subsection{New lock mechanisms and elimination of old lock mechanism (8.9.5)}

The built-in ftype \scheme{ftype-lock} has been eliminated along
with the corresponding procedures, \scheme{acquire-lock},
\scheme{release-lock}, and \scheme{initialize-lock}.
This is an incompatible change, although defining
\scheme{ftype-lock} and the associated procedures is straightforward
using the forms described below.

The functionality has been replaced and generalized by four new syntactic
forms that operate on lock fields wherever they appear within a foreign
type:

\schemedisplay
(ftype-init-lock! \var{T} (\var{a} ...) \var{e})
(ftype-lock! \var{T} (\var{a} ...) \var{e})
(ftype-spin-lock! \var{T} (\var{a} ...) \var{e})
(ftype-unlock! \var{T} (\var{a} ...) \var{e})
\endschemedisplay

The access chain \scheme{\var{a} \dots} must specify a word-size
integer represented using the native endianness, i.e., a \scheme{uptr}
or \scheme{iptr}.
It is a syntax violation when this is not the case.

For each of the forms, the expression \var{e} is evaluated first
and must evaluate to an ftype pointer \var{p} of type \var{T}.

\scheme{ftype-init-lock!} initializes the specified field of the foreign
object to which \var{p} points, puts the field into the unlocked state,
and returns an unspecified value.

If the field is in the unlocked state, \scheme{ftype-lock!} puts it
into the locked state and returns \scheme{#t}.
If the field is already in the locked state, \scheme{ftype-lock!}
returns \scheme{#f}.

\scheme{ftype-spin-lock!} loops until the lock is in the unlocked
state, then puts it into the locked state and returns an unspecified
value.
\emph{This operation will never return if no other thread or process
unlocks the field, causing interrupts and requests for collection to
be ignored.}

Finally, \scheme{ftype-unlock!} puts the field into the unlocked state
(regardless of the current state) and returns an unspecified value.

An additional pair of syntactic forms can be used when just an
atomic increment or decrement is required:

\schemedisplay
(ftype-locked-incr! \var{T} (\var{a} ...) \var{e})
(ftype-locked-decr! \var{T} (\var{a} ...) \var{e})
\endschemedisplay

As for the first set of forms, the access chain \scheme{\var{a} \dots}
must specify a word-size integer represented using the native endianness.
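
The following sketch exercises the lock forms and the atomic increment
on a single foreign object.
The ftype \scheme{counter} and its field names are hypothetical, and the
example runs in one thread, so it shows only the state transitions, not
contention.

\schemedisplay
(define-ftype counter
  (struct
    [lock uptr]
    [hits iptr]))

(define c
  (make-ftype-pointer counter
    (foreign-alloc (ftype-sizeof counter))))

(ftype-init-lock! counter (lock) c)
(ftype-lock! counter (lock) c)   ;=> #t, lock acquired
(ftype-lock! counter (lock) c)   ;=> #f, already locked
(ftype-unlock! counter (lock) c)

(ftype-set! counter (hits) c 0)
(ftype-locked-incr! counter (hits) c)
(ftype-ref counter (hits) c)     ;=> 1

(foreign-free (ftype-pointer-address c))
\endschemedisplay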

\subsection{\protect\scheme{ftype-pointer-null?}, \protect\scheme{ftype-pointer=?} (8.9.5)}

The new procedure \scheme{ftype-pointer-null?} can be used to compare the
address of its single argument, which must be an ftype pointer, against 0.
It returns \scheme{#t} if the address is 0 and \scheme{#f} otherwise.
Similarly, \scheme{ftype-pointer=?} can be used to compare the
addresses of two ftype-pointer arguments.
It returns \scheme{#t} if the addresses are the same and \scheme{#f}
otherwise.

These are potentially more efficient than extracting ftype-pointer
addresses first, which might result in bignum allocation for addresses
outside the fixnum range,
although the compiler also now
tries to avoid allocation when the result of a call to
\scheme{ftype-pointer-address} is directly compared with 0 or with the
result of another call to \scheme{ftype-pointer-address}, as described
in Section~\ref{ftpaopt}.
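
For instance, using a hypothetical ftype \scheme{cell} and pointers
constructed directly from addresses purely for illustration:

\schemedisplay
(define-ftype cell (struct [value int]))

(define null-cell (make-ftype-pointer cell 0))
(define other-cell (make-ftype-pointer cell 64))

(ftype-pointer-null? null-cell)            ;=> #t
(ftype-pointer-null? other-cell)           ;=> #f
(ftype-pointer=? other-cell
  (make-ftype-pointer cell 64))            ;=> #t
\endschemedisplay

The pointers are never dereferenced, so the addresses need not refer to
allocated storage.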

\subsection{\protect\scheme{gensym}'s new optional unique-name argument (8.9.5)}

\scheme{gensym} now accepts a second optional argument, the unique
name to use.
It must be a string and should not be used by any other gensym intended
to be distinct from the new gensym.
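
A brief sketch follows; both strings are arbitrary choices made for the
example, and the first argument is the usual pretty name.

\schemedisplay
(define g (gensym "tmp" "example-unique-name-0001"))

(gensym? g)                ;=> #t
(symbol->string g)         ;=> "tmp"
(gensym->unique-string g)  ;=> "example-unique-name-0001"
\endschemedisplay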

\subsection{GC times now maintained with finer granularity (8.9.5)}

In previous releases, collection times as reported by \scheme{statistics}
or printed by \scheme{display-statistics} were gathered internally
with millisecond granularity at each collection, possibly leading to
significant inaccuracies over the course of many collections.
They are now maintained using high-resolution timers with generally
much better accuracy.

\subsection{New time types for tracking collection times (8.9.5)}

New time types \scheme{time-collector-cpu} and \scheme{time-collector-real}
have been added.
When \scheme{current-time} is passed one of these types, a time
object of the specified type is returned and represents the time
(cpu or real) spent during collection.

Previously, this information was available only via the
\scheme{statistics} or \scheme{display-statistics} procedures, and then
with lower precision.

\subsection{New storage-management introspection procedures (8.9.5)}

Three new storage-management introspection procedures have been
added:

\schemedisplay
(collections)
(initial-bytes-allocated)
(bytes-deallocated)
\endschemedisplay

\scheme{collections} returns the number of collections performed so
far by the current Scheme process.

\scheme{initial-bytes-allocated} returns the number of bytes
allocated after loading the boot files and before running any
non-boot user code.

\scheme{bytes-deallocated} returns the total number of bytes
deallocated by the collector.

Previously, this information was available only via the
\scheme{statistics} or \scheme{display-statistics}
procedures.
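
As a small illustration, the following sketch samples the collection
count around an explicit call to \scheme{collect}.
The byte counts depend on the session and so are shown without values.

\schemedisplay
(define before (collections))
(collect)
(> (collections) before) ;=> #t

(initial-bytes-allocated) ; bytes allocated by the boot files
(bytes-deallocated)       ; total bytes reclaimed so far
\endschemedisplay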

\subsection{New time-object manipulation procedures (8.9.5)}

Three new procedures for performing arithmetic on time objects have
been added, per SRFI~19:

\schemedisplay
(time-difference \var{t1} \var{t2}) ;=> \var{t3}
(add-duration \var{t1} \var{t2}) ;=> \var{t3}
(subtract-duration \var{t1} \var{t2}) ;=> \var{t3}
\endschemedisplay

\scheme{time-difference} takes two time objects \var{t1} and \var{t2},
which must have the same time type, and returns the result of subtracting
\var{t2} from \var{t1}, represented as a new time object with type
\scheme{time-duration}.
\scheme{add-duration} adds time object \var{t2}, which must be of type
\scheme{time-duration}, to time object \var{t1}, producing a new time object
\var{t3} with the same type as \var{t1}.
\scheme{subtract-duration} subtracts time object \var{t2}, which must be
of type \scheme{time-duration}, from time object \var{t1}, producing a new
time object \var{t3} with the same type as \var{t1}.

SRFI~19 also names destructive versions of these operators:

\schemedisplay
(time-difference! \var{t1} \var{t2}) ;=> \var{t3}
(add-duration! \var{t1} \var{t2}) ;=> \var{t3}
(subtract-duration! \var{t1} \var{t2}) ;=> \var{t3}
\endschemedisplay

These are available as well in {\ChezScheme} but are actually
nondestructive, i.e., entirely equivalent to the nondestructive
versions.
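
The following sketch works through the three operations on time objects
built with \scheme{make-time}, whose arguments are the time type, the
nanoseconds, and the seconds.
The particular values are arbitrary.

\schemedisplay
(define start (make-time 'time-utc 0 100))
(define step (make-time 'time-duration 500000000 1)) ; 1.5 seconds

(define end (add-duration start step))
(time-second end)      ;=> 101
(time-nanosecond end)  ;=> 500000000

(define elapsed (time-difference end start))
(time-type elapsed)    ;=> time-duration
(time=? elapsed step)  ;=> #t

(time=? (subtract-duration end step) start) ;=> #t
\endschemedisplay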

\subsection{Better reporting of profile counts (8.9.4, 8.9.5)}

The compiler now collects and reports profile counts for every
source expression that is not determined to be dead either at
compile time or by the time the profile information is obtained via
\scheme{profile-dump-list} or \scheme{profile-dump-html}.
Previously, the compiler suppressed profile counts for constants and
variable references in contexts where the information was likely (though
not guaranteed) to be redundant, and it dropped profile counts for some
forms that were optimized away, such as inlined calls, folded calls,
or useless code.
Furthermore, profile counts now uniformly represent the number of times
a source expression's evaluation was started, which was not always the
case before.

A small related enhancement has been made in the HTML output produced
by \scheme{profile-dump-html}.
Hovering over a source expression now shows, in addition to the count,
the starting position (line number and character) of the source expression
to which the count belongs.
This is useful for identifying when a source expression does not have its
own count but instead inherits the count (and color) from an enclosing
expression.

\subsection{Virtual registers (8.9.4)}

A limited set of \emph{virtual registers} is now supported by the compiler
for use by programs that require high-speed, global, and mutable storage
locations.
Referencing or assigning a virtual register is potentially faster and
never slower than accessing an assignable local or global variable,
and the code sequences for doing so are generally smaller.
Assignment is potentially significantly faster because there is no need
to track pointers from the virtual registers to young objects, as there
is for variable locations that might reside in older generations.
On threaded versions of the system, virtual registers are ``per thread''
and thus serve as thread-local storage in a manner that is less expensive
than thread parameters.

The interface consists of three procedures:

\scheme{(virtual-register-count)} returns the number of virtual registers.
As of this writing, the count is set at 16.  This number is fixed, i.e.,
cannot be changed except by recompiling {\ChezScheme} from source.

\scheme{(set-virtual-register! \var{k} \var{x})} stores \var{x} in virtual
register \var{k}.
\var{k} must be a fixnum between 0 (inclusive) and the value of
\scheme{(virtual-register-count)} (exclusive).

\scheme{(virtual-register \var{k})} returns the value most recently
stored in virtual register \var{k} (on the current thread, in threaded
versions of the system).

To get the fastest possible speed out of the latter two procedures,
\var{k} should be a constant embedded right in the call
(or propagatable via optimization to the call).
To avoid putting these constants in the source code, programmers should
consider using identifier macros to give names to virtual registers, e.g.:

\schemedisplay
(define-syntax foo
  (identifier-syntax
    [id (virtual-register 0)]
    [(set! id e) (set-virtual-register! 0 e)]))
(set! foo 'hello)
foo ;=> hello
\endschemedisplay

Virtual registers must be treated as an application-level resource, i.e.,
libraries intended to be used by multiple applications should generally
not use virtual registers, to avoid conflicts with an application's use of
the registers.

\subsection{24-, 40-, 48-, and 56-bit integer values (8.9.3)}

Support for storing and extracting 24-, 40-, 48-, and 56-bit integers
to and from records, bytevectors, and foreign types (ftypes) has been
added.
For records and ftypes, this is accomplished by declaring a field
to be of type
\scheme{integer-24}, \scheme{unsigned-24},
\scheme{integer-40}, \scheme{unsigned-40},
\scheme{integer-48}, \scheme{unsigned-48},
\scheme{integer-56}, or \scheme{unsigned-56}.
For bytevectors, this is accomplished via the following new
primitives:

\schemedisplay
bytevector-24-ref
bytevector-24-set!
bytevector-40-ref
bytevector-40-set!
bytevector-48-ref
bytevector-48-set!
bytevector-56-ref
bytevector-56-set!
\endschemedisplay

Similarly, support has been added for sending and receiving
24-, 40-, 48-, and 56-bit integers to and from foreign code via
\scheme{foreign-procedure} and \scheme{foreign-callable}.
Arguments and return values of type \scheme{integer-24} and
\scheme{unsigned-24} are passed as 32-bit quantities, while
those of type \scheme{integer-40}, \scheme{unsigned-40},
\scheme{integer-48}, \scheme{unsigned-48}, \scheme{integer-56},
and \scheme{unsigned-56} are passed as 64-bit quantities.

For unpacked ftypes, a 48-bit (6-byte) quantity is aligned
on an even two-byte boundary, while a
24-bit (3-byte), 40-bit (5-byte), or 56-bit (7-byte) quantity
is aligned on an arbitrary byte boundary.

\subsection{New \protect\scheme{pariah} expression (8.9.3)}

A \scheme{pariah} expression:

\schemedisplay
(pariah \var{expr} \var{expr} \dots)
\endschemedisplay

is syntactically similar and semantically equivalent to a \scheme{begin}
expression but tells the compiler that the expressions within are
relatively unlikely to be executed.
This information is currently used by the compiler for prioritizing
allocation of registers to variables and for putting pariah code
out-of-line in an attempt to reduce instruction cache misses for the
remaining code.

A \scheme{pariah} form is generally most usefully wrapped around the
consequent or alternative of an \scheme{if} expression to identify which
is the less likely path.
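
For example, in the following sketch the out-of-range branch is marked
as the unlikely one.
The procedure \scheme{checked-ref} is hypothetical and returns
\scheme{#f} rather than raising an exception, so an explicit
\scheme{pariah} is appropriate.

\schemedisplay
(define (checked-ref v i)
  (if (fx< i (vector-length v))
      (vector-ref v i)
      (pariah
        (display "index out of range")
        (newline)
        #f)))

(checked-ref (vector 'a 'b 'c) 1) ;=> b
\endschemedisplay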

The compiler implicitly treats as pariah code any code that leads
up to an unconditional call to \scheme{raise}, \scheme{error},
\scheme{errorf}, \scheme{assertion-violation}, etc., so it is not
necessary to wrap a \scheme{pariah} around such a call.

At some point, there will likely be an option for gathering similar
information automatically via profiling.
In the meantime, we are interested in feedback about whether the
mechanism is beneficial and whether the benefit of using the
\scheme{pariah} form outweighs the programming overhead.

\subsection{Improved automatic library recompilation (8.9.2)}

Local imports within a library now trigger automatic recompilation
of the library when the imported library has been recompiled or needs
to be recompiled, in the same manner as imports listed directly in the
importing library's \scheme{library} form.
Changes in include files also trigger automatic recompilation.

(Automatic recompilation of a library is enabled when an import of
the library, e.g., in another library or in a top-level program, is
compiled and the parameter \scheme{compile-imported-libraries} is set
to a true value.)

\subsection{Redundant profile information (8.9.2)}

Profiling information is no longer produced for constants and variable
references where the information is likely to be redundant.
It is still produced in contexts where the counts are likely to differ
from those of the enclosing form, e.g., where a constant or variable
reference occurs in the consequent or alternative of an \scheme{if}
expression.
This change brings the profiling information largely in sync with
Version~8.4.1 and earlier, though Version~8.9.2 retains source information
in a few cases where it is inappropriately discarded by Version~8.4.1's
compiler, and Version~8.9.2 discards source information in a few cases
where the code has been optimized away.

\subsection{New \protect\scheme{compile-to-port} procedure (8.9.2)}

The procedure \scheme{compile-to-port} is like \scheme{compile-port}
but, instead of taking an input port from which it reads expressions
to be compiled, takes a list of expressions to be compiled.
As with \scheme{compile-port}, the second argument must be a binary
output port.
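
A minimal sketch, assuming an in-memory bytevector port is an acceptable
destination; the definitions being compiled are placeholders.

\schemedisplay
(define object-code
  (let-values ([(op get-bytes) (open-bytevector-output-port)])
    ;; the first argument is a list of expressions, not a port
    (compile-to-port
      (list '(define demo-x 10)
            '(define (demo-f y) (+ demo-x y)))
      op)
    (get-bytes)))

(bytevector? object-code) ;=> #t
\endschemedisplay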

\subsection{Debug levels (8.9.1)}

Newly introduced debug levels control the amount of debugging support
embedded in the code generated by the compiler.
The current debug level is controlled by the parameter
\scheme{debug-level} and must be set when the compiler is run to have
any effect on the generated code.
Valid debug levels are~0, 1, 2, and~3, and the default is~1.
At present, the only difference between debug levels is whether calls to
certain error-producing routines, like \scheme{error}, whether explicit
or as the result of an implicit run-time check (such as the pair check
in \scheme{car}), are treated as tail calls even when not in tail position.
At debug levels 0 and 1, they are treated as tail calls, and at debug
levels 2 and 3, they are treated as nontail calls.
Treating them as tail calls is more efficient, but treating them as
nontail calls leaves more information on the stack, which affects what
can be shown by the inspector.

For example, assume \scheme{f} is defined as follows:

\schemedisplay
(define f
  (lambda (x)
    (unless (pair? x) (error #f "oops"))
    (car x)))
\endschemedisplay

and is called with a non-pair argument, e.g.:

\schemedisplay
(f 3)
\endschemedisplay

If the debug level is 2 or more at the time the definition is compiled,
the call to \scheme{f} will still be on the stack when the exception
is raised by \scheme{error} and will thus be visible to the inspector:

\schemedisplay
> (f 3)
Exception: oops
Type (debug) to enter the debugger.
> (debug)
debug> i
#<continuation in f>                                              : sf
  0: #<continuation in f>
  1: #<system continuation in new-cafe>
#<continuation in f>                                              : s
  continuation:          #<system continuation in new-cafe>
  procedure code:        (lambda (x) (if (...) ...) (car x))
  call code:             (error #f "oops")
  frame and free variables:
  0. x:                  3
\endschemedisplay

On the other hand, if the debug level is 1 (the default) or 0 at the
time the definition of \scheme{f} is compiled, the call to \scheme{f}
will no longer be on the stack:

\schemedisplay
> (f 3)
Exception: oops
Type (debug) to enter the debugger.
> (debug)
debug> i
#<system continuation in new-cafe>                                : sf
  0: #<system continuation in new-cafe>
\endschemedisplay

\subsection{Cost centers (8.9.1)}

Cost centers are used to track the bytes allocated, instructions executed,
and/or cpu time elapsed while evaluating selected sections of code.
Cost centers are created via the procedure \scheme{make-cost-center}, and
costs are tracked via the procedure \scheme{with-cost-center}.

Allocation and instruction counts are tracked only for code instrumented
for that purpose.
This instrumentation is controlled by the \scheme{generate-allocation-counts}
and \scheme{generate-instruction-counts} parameters.
Instrumentation is disabled by default.
Built-in procedures are not instrumented, nor is interpreted code or
non-Scheme code.
Elapsed time is tracked only when the optional \scheme{timed?} argument to
\scheme{with-cost-center} is provided and is not false.

The \scheme{with-cost-center} procedure accurately tracks costs, subject
to the caveats above, even when reentered with the same cost center, used
simultaneously in multiple threads, and exited or reentered one or more
times via continuation invocation.

\textbf{thread parameter:} \scheme{generate-allocation-counts}

When this parameter has a true value, the compiler inserts a short sequence of
instructions at each allocation point in generated code to track the amount of
allocation that occurs.
This parameter is initially false.

\textbf{thread parameter:} \scheme{generate-instruction-counts}

When this parameter has a true value, the compiler inserts a short
sequence of instructions in each block of generated code to track the
number of instructions executed by that block.
This parameter is initially false.

\textbf{procedure:} \scheme{(make-cost-center)}

Creates a new \scheme{cost-center} object with all of its recorded costs
set to zero.

\textbf{procedure:} \scheme{(cost-center? \var{obj})}

Returns \scheme{#t} if \var{obj} is a \scheme{cost-center} object, otherwise
returns \scheme{#f}.

\textbf{procedure:} \scheme{(with-cost-center \var{cost-center} \var{thunk})}\\
\textbf{procedure:} \scheme{(with-cost-center \var{timed?} \var{cost-center} \var{thunk})}

This procedure invokes \var{thunk} without arguments and returns its
values.
It also tracks, dynamically, the bytes allocated, instructions executed,
and cpu time elapsed while evaluating the invocation of \var{thunk} and
adds the tracked costs to the cost center's running record of these costs.

Allocation counts are tracked only for code compiled with the parameter
\scheme{generate-allocation-counts} set to true, and
instruction counts are tracked only for code compiled with
\scheme{generate-instruction-counts} set to true.
Cpu time is tracked only if \var{timed?} is provided and not false and
includes cpu time spent in instrumented, uninstrumented, and non-Scheme
code.

\textbf{procedure:} \scheme{(cost-center-instruction-count \var{cost-center})}

This procedure returns the number of instructions executed recorded by
\var{cost-center}.

\textbf{procedure:} \scheme{(cost-center-allocation-count \var{cost-center})}

This procedure returns the bytes allocated recorded by \var{cost-center}.

\textbf{procedure:} \scheme{(cost-center-time \var{cost-center})}

This procedure returns the cpu time recorded by \var{cost-center}.

\textbf{procedure:} \scheme{(reset-cost-center! \var{cost-center})}

This procedure resets the costs recorded by \var{cost-center} to zero.
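
Putting these procedures together, a minimal sketch might look as
follows.
The procedure \scheme{build-list} is hypothetical and is compiled with
allocation counting enabled so that its allocation is visible to the
cost center; the recorded counts and times vary by machine and are not
shown.

\schemedisplay
(define cc (make-cost-center))

;; compile the code of interest with allocation instrumentation
(define build-list
  (parameterize ([generate-allocation-counts #t])
    (compile
      '(lambda (n)
         (let f ([i 0])
           (if (= i n) '() (cons i (f (+ i 1)))))))))

;; track allocation and cpu time for one call
(with-cost-center #t cc (lambda () (build-list 1000)))

(cost-center-allocation-count cc) ; bytes allocated by build-list
(cost-center-time cc)             ; cpu time spent in the thunk

(reset-cost-center! cc)
(cost-center-allocation-count cc) ;=> 0
\endschemedisplay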

\subsection{Experimental access to hardware performance counters (8.9.1)}

Two system primitives, \scheme{#%$read-time-stamp-counter} and
\scheme{#%$read-performance-monitoring-counter}, provide access to the
x86 and x86\_64 hardware time-stamp counter register and to the
model-specific performance monitoring registers.

These primitives rely on instructions that might be restricted to run only in
kernel mode, depending on kernel configuration.
The performance monitoring counters must also be configured to enable
monitoring and to specify which event to monitor.
This can be configured only by instructions executed in kernel mode.

\textbf{procedure:} \scheme{(#%$read-time-stamp-counter)}

This procedure returns the current value of the time-stamp counter for
the processor core executing this code.
A general protection fault, which manifests as an invalid memory
reference exception, results if this operation is not permitted by
the operating system.

Since multiple processes might run on the same core between reads of
the time-stamp counter, the counter does not necessarily reflect time
spent only in the current process.
Also, on machines with multiple cores, the executing process might be
swapped to a different core with a different time-stamp counter.

\textbf{procedure:} \scheme{(#%$read-performance-monitoring-counter \var{counter})}

This procedure returns the current value of the model-specific
performance monitoring register specified by \var{counter}.
\var{counter} must be a fixnum and should specify a valid performance
monitoring register.
Allowable values depend on the processor model.
A general protection fault, which manifests as an invalid memory
reference exception, results if this operation is not permitted by
the operating system or if the specified counter does not exist.

In order to get meaningful results, the performance monitoring registers
must be enabled, and the event to be monitored must be configured by
the performance monitoring control register.
This configuration can be done only by code run in kernel mode.

Since multiple processes might run on the same core between reads of
a performance monitoring register, the register does not necessarily reflect
only the activities of the current process.
Also, on machines with multiple cores, the executing process might be
swapped to a different core with its own set of performance monitoring
registers and possibly a different configuration for those registers.

\subsection{New inspector functionality (8.9.1)}

Within the interactive inspector, closure and frame variables can now
be set by name, and the forward (f) and back (b) commands can now be
used to move among the frames that comprise a continuation.

A new show-local (sl) command can be used to look at just the local
variables of a stack frame.
This contrasts with the show (s) command, which shows the free variables
of the frame's closure as well.

Errors occurring during inspection, such as attempts to assign immutable
variables, are handled more smoothly than in previous versions.

\subsection{Fasl support for records with non-ptr fields (8.4.1)}

The fasl writer and reader now support records with non-ptr fields,
e.g., integer-32, wchar, etc., allowing constant record instances with
such fields to appear in source code (or to be introduced as constants
by macros) that is compiled via \scheme{compile-file},
\scheme{compile-library}, \scheme{compile-program},
\scheme{compile-script}, or \scheme{compile-port}.
Ftype-pointer fields are not supported, since storing addresses
in fasl files does not generally make sense.

%-----------------------------------------------------------------------------
\section{Bug Fixes}\label{section:bugfixes}

\subsection{Overflow detection for \protect\scheme{fxsll},
\protect\scheme{fxarithmetic-shift-left}, and
\protect\scheme{fxarithmetic-shift}}

A bug that caused \scheme{fxsll}, \scheme{fxarithmetic-shift-left},
and \scheme{fxarithmetic-shift} to fail to detect overflow in certain
cases was fixed.
[This bug dated back to Version 7.1 or earlier.]

\subsection{Invalid memory reference when \protect\scheme{enum-set-indexer} procedure is not passed a symbol}

A bug that caused the procedure returned by \scheme{enum-set-indexer}
to perform an invalid memory reference when passed an argument that is
not a symbol has been fixed.

\subsection{Storage for inaccessible mutexes and conditions is reclaimed (9.4.1)}

The C heap storage for inaccessible mutexes and conditions is now reclaimed.
[This bug dated back to Version 6.5.]

\subsection{Missing guardian entries when a thread exits (9.4.1)}

A bug that caused guardian entries for a thread to be lost when a
thread exits has been fixed.
[This bug dated back to Version 6.5.]

\subsection{Incorrect code for certain nested \protect\scheme{if} patterns (9.4.1)}

A bug in the source optimizer that produced incorrect code for certain
nested \scheme{if} patterns has been fixed.
For example, the code generated for the following expression:

\schemedisplay
(if (if (if (if (zero? (a)) #f #t) (begin (b) #t) #f)
        (c)
        #f)
    (x)
    (y))
\endschemedisplay

inappropriately evaluated the subexpression \scheme{(b)} when the
subexpression \scheme{(a)} evaluates to 0 and not when \scheme{(a)}
evaluates to 1.
[This bug dated back to Version 9.0.]

\subsection{Leaked or unexpected \protect\scheme{cpvalid-defer} form (9.4.1)}

A bug in the pass of the compiler that inserts validity checks for
\scheme{letrec} and \scheme{letrec*} bindings has been fixed.
The bug resulted in an internal compiler exception with a condition
message regarding a leaked or unexpected \scheme{cpvalid-defer} form.
[This bug dated back to Version 6.9c.]

\subsection{\protect\scheme{string->number} and reader numeric syntax issues (9.4)}

\scheme{string->number} and the reader previously treated all complex
numbers written in polar notation that Chez Scheme cannot represent
exactly as inexact, even with an explicit \scheme{#e} prefix.
For such numbers with the \scheme{#e} prefix, \scheme{string->number}
now returns \scheme{#f} and the reader now raises an exception with
condition type \scheme{&implementation-restriction}.
Both still return an inexact representation for such numbers written without
the \scheme{#e} prefix, even if R6RS requires an exact result, i.e.,
even if they have no decimal point, exponent, or mantissa width.

Ratios with an exponent, like \scheme{1/2e10}, are non-standard and
now cause the procedure \scheme{string->number} imported from
\scheme{(rnrs)} to return \scheme{#f}.
When the reader encounters a ratio followed by an exponent while in R6RS
mode (i.e., when reading a library or top-level program and not following
an \scheme{#!chezscheme}, or when following an explicit \scheme{#!r6rs}),
it raises an exception.

Positive or negative zero followed by a large exponent now properly
produces zero rather than an infinity, e.g., \scheme{0e3000} now produces
\scheme{0.0} rather than \scheme{+inf.0}.

A rounding bug in the conversion of some small ratios into floating-point
numbers, when those numbers fall into the range of denormalized floats, has
been fixed.
This bug also affected the reading of and conversion of strings into
denormalized floating-point numbers.
[Some of these bugs dated back to Version 3.0.]
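
The polar-notation and ratio cases can be illustrated as follows.
This is a sketch only: the inputs are our own, and the comments restate
the behavior described above rather than output captured from a session.

\schemedisplay
(string->number "#e1@1")   ; #f: the exact polar value cannot be represented
(string->number "1@1")     ; still an inexact complex number

(let ()
  (import (only (rnrs) string->number))
  (string->number "1/2e10"))   ; #f: a ratio with an exponent is non-standard
\endschemedisplay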

\subsection{\protect\scheme{date->time-utc} ignoring zone-offset field (9.4)}

\scheme{date->time-utc} has been fixed to properly take into account the
zone-offset field.
[This bug dated back to Version 8.0.]
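
As an illustration, consider a date whose zone offset is one hour east of
UTC.
This is a sketch, assuming the eight-argument form of \scheme{make-date}
(nanosecond, second, minute, hour, day, month, year, offset); the binding
\scheme{d} is our own.

\schemedisplay
;; midnight on 1 January 1970 in a zone one hour east of UTC
(define d (make-date 0 0 0 0 1 1 1970 3600))

(date-zone-offset d)               ; 3600
(time-second (date->time-utc d))   ; -3600 when the offset is honored
\endschemedisplay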

\subsection{\protect\scheme{wchar} and \protect\scheme{wchar_t} record field types fail to inline in Windows (9.4)}

On Windows, the source optimizer has been fixed to handle \scheme{wchar} and
\scheme{wchar_t} record field types.

\subsection{path-related procedures cause invalid memory reference with non-string arguments in Windows (9.4)}

On Windows, the path-related procedures now raise an appropriate exception when the path argument is not a string.

\subsection{Mutex acquisition bug (9.4)}

A bug in the handling of mutexes has been fixed.
The bug typically presented as a spurious ``recursively locked'' exception.

\subsection{\protect\scheme{dynamic-wind} mistakenly enabling interrupts (9.3.3)}

A bug causing \scheme{dynamic-wind} to unconditionally enable
interrupts upon a nonlocal exit from the body thunk has been fixed.
Interrupts are now properly enabled only when the optional
\var{critical?} argument is supplied and is not false.
[This bug dated back to Version 6.9c.]
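
The four-argument form of \scheme{dynamic-wind} referred to above can be
sketched as follows; \scheme{depth} and \scheme{call-with-depth} are
hypothetical names introduced only for this example.

\schemedisplay
(define depth 0)                   ; hypothetical state guarded by the winders

(define (call-with-depth thunk)    ; hypothetical helper
  (dynamic-wind
    #t                             ; critical?: in and out run in a critical section
    (lambda () (set! depth (+ depth 1)))
    thunk
    (lambda () (set! depth (- depth 1)))))

(call/cc
  (lambda (k)
    (call-with-depth (lambda () (k 'escaped)))))   ; nonlocal exit from the body

depth                              ; 0: the out thunk ran on the way out
\endschemedisplay

When the \var{critical?} argument is omitted or false, a nonlocal exit
like the one above no longer changes the interrupt state.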

\subsection{Incorrect optimization of various primitives (9.3.1)}

Mistakes in our primitive database that caused the source optimizer
to treat \scheme{append}, \scheme{append!}, \scheme{list*},
\scheme{cons*}, and \scheme{record-type-parent} as always returning
true values have been fixed, along with mistakes that caused the
source optimizer to treat \scheme{null-environment},
\scheme{source-object-bfp}, \scheme{source-object-efp}, and
\scheme{source-object-sfd} as not requiring argument checks.
[These bugs dated back to Version 6.0.]

\subsection{Increased allocation ceiling under 32-bit Windows (9.3.1)}

We have worked around a limitation in the number of distinct allocation
areas the Windows VirtualAlloc function permits to be allocated by
allocating fewer, larger chunks of memory, effectively increasing the
maximum size of the heap to the full amount permitted by the operating
system.

\subsection{Syntax errors for \protect\scheme{let} and \protect\scheme{let*} (9.2.1)}

The expander now handles \scheme{let} and \scheme{let*} in such a
way that certain syntax errors previously reported as syntax errors
in \scheme{lambda} are now reported properly as syntax errors in
\scheme{let} or \scheme{let*}.  This includes duplicate identifier
errors for \scheme{let} and errors involving internal definitions
for both \scheme{let} and \scheme{let*}.

\subsection{Dropped \protect\scheme{profile-dump-html} calls (9.0)}

A bug that caused effect-context calls to \scheme{profile-dump-html}
to be dropped at optimize-level 3 has been fixed.
[This bug dated back to Version 7.5.]

\subsection{Proper treatment of imported meta bindings (8.9.3)}

A deficiency in the handling of library dependencies that prevented meta
definitions exported in one library from being used reliably by a macro
defined in another library has been fixed.
Handling imported meta bindings involves tracking
visit-visit-requirements, which for a library \scheme{(A)} is the set of
libraries that must be visited (rather than invoked) when \scheme{(A)}
is visited.
An attempt to assign a meta variable imported from a library now results
in a syntax error.
[This bug dated back to Version 7.9.1.]

\subsection{Reexport of identifiers with properties (8.9.3)}

A bug that prevented an identifier given a property via
\scheme{define-property} from being exported from a library \scheme{(A)},
imported into and reexported from a second library \scheme{(B)}, and
imported from both \scheme{(A)} and \scheme{(B)} into and reexported
from a third library \scheme{(C)} has been fixed.
[This bug dated back to Version 8.1.]

\subsection{Cyclic record-type descriptors (8.4.1)}

The fasl (fast load) format used for compiled files now supports cyclic
record-type descriptors (RTDs), which are produced for recursive ftype
definitions.
Previously, compiling a file containing a recursive ftype definition
and subsequently loading the file resulted in corruption of the ftype
descriptor used to typecheck ftype pointers, potentially leading to
incorrect behavior or invalid memory references.
[This bug dated back to Version 8.2.]

\subsection{Invalid folding of record accesses (8.4.1)}

A bug that caused the optimizer to fold calls to record accessors applied
to a constant value of the wrong type, sometimes resulting in compile-time
invalid memory references or other compile-time errors, has been fixed.
[This bug dated back to Version 8.4.]

\subsection{4GB+ allocation for Windows x86\_64 (8.4.1)}

A bug that prevented objects larger than 4GB from being created under
Windows x86\_64 has been fixed.
[This bug dated back to Version 8.4.]

%-----------------------------------------------------------------------------
\section{Performance Enhancements}\label{section:performance}

\subsection{Improved oblist management (9.3.3)}

As a result of improvements in the handling of the oblist (symbol table),
the storage for a symbol is often reclaimed more quickly after it
becomes inaccessible, less space is set aside for the oblist at
start-up, oblist lookups are faster when the oblist contains a large
number of symbols, and the minimum cost of a maximum-generation
collection has been cut significantly, down from tens of microseconds
to just a handful on contemporary hardware.

\subsection{Reduced maximum-generation collection overhead (9.3.3)}

Various changes in the storage manager have reduced the amount of
extra memory required for managing heap storage and increased the
likelihood that memory can be returned to the O/S as the heap
shrinks.
Returning memory to the O/S is now faster, so the minimum time for
a maximum-generation collection, or any other collection where
release of memory to the O/S is enabled, has been cut.

\subsection{Faster library load times (9.3.1)}

Libraries now load faster at both compile and run time, with more
pronounced improvements when dozens of libraries or more are being
loaded.

\subsection{Partially static record instances (9.3.1)}

The source optimizer now maintains information about partially static
record instances to eliminate field accesses and type checks when a
binding site for a record instance is visible to the access or checking
code.
For example,

\schemedisplay
(let ()
  (import scheme)
  (define-record foo ([immutable ptr a] [immutable ptr b]))
  (define (inc r) (make-foo (foo-a r) (+ (foo-b r) 1)))
  (lambda (x)
    (let* ([r (make-foo 37 x)]
           [r (inc r)]
           [r (inc r)])
      r)))
\endschemedisplay

is reduced by the source optimizer down to:

\schemedisplay
(lambda (x) ($record '#<record type foo> 37 (+ (+ x 1) 1)))
\endschemedisplay

where \scheme{$record} is a low-level primitive for creating record
instances.
That is, the source optimizer eliminates the intermediate record
structures, record references, and type checks, in addition to
creating the record-type descriptor at compile time, eliminating
the record-constructor descriptor, record constructor, and record
accessors produced by expansion of the record definition.

\subsection{More source-optimizer improvements (9.3.1)}

The source optimizer now handles \scheme{apply} with a known-list
final argument, e.g., a constant list or list constructed directly
within the apply operation via \scheme{cons}, \scheme{list}, or
\scheme{list*} (\scheme{cons*}) as if it were an ordinary call,
i.e., without the \scheme{apply} and without the constant list
wrapper or list constructor.
For example:

\schemedisplay
(apply apply apply + (list 1 (cons 2 (list x (cons* 4 '(5 6))))))
\endschemedisplay

folds down to \scheme{(+ 18 x)}.
While not common at the source level, patterns like this can
materialize as the result of other source optimizations,
particularly inlining.

The source optimizer now also reduces applications of \scheme{car} and
\scheme{cdr} to the list-building operators \scheme{cons} and
\scheme{list}, e.g.:

\schemedisplay
(car (cons \var{e_1} \var{e_2})) ;-> (begin \var{e_2} \var{e_1})
(car (list \var{e_1} \var{e_2} \var{e_3})) ;-> (begin \var{e_2} \var{e_3} \var{e_1})
(cdr (list \var{e_1} \var{e_2} \var{e_3})) ;-> (begin \var{e_1} (list \var{e_2} \var{e_3}))
\endschemedisplay

discarding side-effect-free expressions in the \scheme{begin} forms
where appropriate.
It similarly treats calls of \scheme{vector-ref} on \scheme{vector};
\scheme{list-ref} on \scheme{list}, \scheme{list*}, and \scheme{cons*};
\scheme{string-ref} on \scheme{string}; and \scheme{fxvector-ref}
on \scheme{fxvector}, taking care with \scheme{string-ref} and
\scheme{fxvector-ref} not to optimize when doing so might mask an
invalid type of argument to a safe constructor.

Finally, the source optimizer now removes certain unnecessary
\scheme{let} bindings within the constraints of evaluation-order
preservation.
For example,

\schemedisplay
(let ([x \var{e_1}] [y \var{e_2}]) (list (cons x y) 7))
\endschemedisplay

reduces to:

\schemedisplay
(list (cons \var{e_1} \var{e_2}) 7)
\endschemedisplay

Such bindings commonly arise from inlining.  Eliminating them tends
to make the output of \scheme{expand/optimize} more readable.

The impact on performance is minimal, but it can result in smaller
expressions and thus enable more inlining within the same size limits.

\subsection{Improved foreign-pointer address handling (9.3.1)}

Various composed operations on ftypes now avoid allocating
and dereferencing intermediate ftype pointers, i.e., \scheme{ftype-ref},
\scheme{ftype-set!}, \scheme{ftype-init-lock!}, \scheme{ftype-lock!},
\scheme{ftype-unlock!}, \scheme{ftype-spin-lock!},
\scheme{ftype-locked-incr!}, or \scheme{ftype-locked-decr!} applied
directly to the result of \scheme{ftype-ref}, \scheme{ftype-&ref}, or
\scheme{make-ftype-pointer}.
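
A composed operation of this kind might look like the following sketch.
The ftypes \scheme{inner} and \scheme{outer} and the helper procedures are
hypothetical, introduced only to show an \scheme{ftype-ref} and an
\scheme{ftype-set!} applied directly to the result of
\scheme{ftype-&ref}.

\schemedisplay
(define-ftype inner (struct [n int]))
(define-ftype outer (struct [tag int] [body inner]))

(define (outer-inner-n p)          ; p is an ftype pointer to an outer
  (ftype-ref inner (n) (ftype-&ref outer (body) p)))

(define (bump-inner-n! p)
  (ftype-set! inner (n) (ftype-&ref outer (body) p)
    (+ (outer-inner-n p) 1)))

(define p (make-ftype-pointer outer (foreign-alloc (ftype-sizeof outer))))
(ftype-set! outer (body n) p 41)
(bump-inner-n! p)
(outer-inner-n p)                  ; 42
(foreign-free (ftype-pointer-address p))
\endschemedisplay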

\subsection{New source optimizations (9.2.1)}

The source optimizer does a few new optimizations: it folds
calls to \scheme{symbol->string}, \scheme{string->symbol}, and
\scheme{gensym->unique-string} if the argument is known at compile
time and has the right type; it folds zero-argument calls to
\scheme{vector}, \scheme{string}, \scheme{bytevector}, and
\scheme{fxvector}; and it discards subsumed case-lambda clauses,
e.g., the second clause in
\scheme{(case-lambda [(x . y) \var{e_1}] [(x y) \var{e_2}])}.
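
These reductions can be observed with \scheme{expand/optimize}.
The calls below are a sketch; the comments describe the folding expected
from the description above, and the exact printed form of the output may
differ.

\schemedisplay
(expand/optimize '(symbol->string 'abc))    ; expected to fold to a string constant
(expand/optimize '(string->symbol "abc"))   ; expected to fold to a quoted symbol
(expand/optimize '(vector))                 ; expected to fold to an empty-vector constant
(expand/optimize
  '(case-lambda
     [(x . y) (list x y)]
     [(x y) (cons x y)]))                   ; second clause can never be selected
\endschemedisplay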

\subsection{Reduced stack requirements after large apply (9.2)}

A call to \scheme{apply} with a very long argument list can cause a
large chunk of memory to be allocated for the topmost portion of
the stack.
This space is now reclaimed during the next collection.

\subsection{Improved symbol-hashtable performance (9.2)\label{sec:symbol-hashtable-performance}}

The performance of operations on symbol hashtables has been improved
generally over previous releases by eliminating call overhead for the
hash and equality functions.
Further improvements are possible with the use of the new type-specific
symbol-hashtable operators (Section~\ref{sec:symbol-hashtables}).
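
For instance, a table keyed by symbols might be updated with the
type-specific operators as sketched below.
The names \scheme{counts} and \scheme{count!} are hypothetical, and the
sketch assumes that a symbol hashtable is created by passing
\scheme{symbol-hash} to \scheme{make-hashtable}.

\schemedisplay
(define counts (make-hashtable symbol-hash eq?))   ; a symbol hashtable

(define (count! sym)                               ; hypothetical helper
  (symbol-hashtable-set! counts sym
    (fx+ (symbol-hashtable-ref counts sym 0) 1)))

(for-each count! '(a b a c a))

(symbol-hashtable-ref counts 'a 0)   ; 3
(symbol-hashtable-ref counts 'z 0)   ; 0, the supplied default
(symbol-hashtable? counts)           ; #t
\endschemedisplay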

\subsection{Reduced library-invocation time, memory consumption (9.1)}

The amount of time required to invoke a library and the amount of memory
occupied by the library when the library is invoked as the result of a
run-time dependency of another library or a top-level program have both
been reduced by ``revisiting'' rather than ``invoking'' the library,
effectively leaving the compile-time information on disk unless and until
it is needed.

\subsection{Discarding relocation tables for static code objects (9.1)}

Unless the command-line parameter \scheme{--retain-static-relocation}
is supplied, the collector now discards relocation tables for code
objects when the code objects are promoted to the static generation,
either at boot time via heap compaction or via a call to \scheme{collect}
with the symbol \scheme{static} as the target generation.
This results in a significant reduction in the memory occupied by the
code object (around 20\% in our tests).

\subsection{Guardian registration (9.1)}

The code to register an object with a guardian is now open-coded, at
the cost of some additional work during the next collection.
The result is a modest net improvement in registration overhead (around
15\% in our tests).
Of potentially greater importance when threaded, each registration no
longer requires synchronization.

\subsection{Generated code improvements (9.1)}

The compiler generates better code in several small ways, resulting
in small decreases in code size and corresponding small
performance improvements in the range of 1--5\% in our tests.

\subsection{Reduced collector overhead for large heaps (9.0)}

In previous releases, a factor in collector performance was the
overall size of the heap (measured both in number of pages and the
amount of virtual memory spanned by the heap).
Through various changes to the data structures used to support the
storage manager, this factor has been eliminated, which can
significantly reduce the cost of collecting a younger generation
with a small number of accessible objects relative to overall heap
size.
In our experiments, the minimum cost of collection on contemporary
hardware exceeded 100 microseconds for heaps of 64MB or more and 5
milliseconds for heaps of 1GB or more.
The minimum cost grew in proportion to the heap size from there.
This is now fixed for all heap sizes at just a few microseconds.

\subsection{Reduced mutation overhead (9.0)}

Improvements in the compiler and storage manager have been made to
reduce the cost of tracking possible pointers from older to younger
generations when objects are mutated.

\subsection{Improved foreign-pointer address handling (8.9.5)\label{ftpaopt}}

Ftype pointers with constant addresses are now created at compile
time, with ftype-pointer address checks optimized away as well.

Bignum allocation overhead is avoided for addresses outside the
fixnum range when the results of two \scheme{ftype-pointer-address}
calls are directly compared or the result of one
\scheme{ftype-pointer-address} call is directly compared with 0.
That is, comparisons like:

\schemedisplay
(= (ftype-pointer-address x) 0)
(= (ftype-pointer-address x) (ftype-pointer-address y))
\endschemedisplay

are effectively optimized to:

\schemedisplay
(ftype-pointer-null? x)
(ftype-pointer=? x y)
\endschemedisplay

This optimization is performed when the comparison procedure is
\scheme{=}, \scheme{eqv?}, or \scheme{equal?} and the arguments
are given in either order.
The optimization is also performed when \scheme{zero?} is applied directly
to the result of \scheme{ftype-pointer-address}.

Bignum allocation overhead is also avoided at optimize-level~3
when \scheme{ftype-pointer-address} is used in combination with
\scheme{make-ftype-pointer} to effect a type cast, as in:

\schemedisplay
(make-ftype-pointer T (ftype-pointer-address x))
\endschemedisplay

Both bignum and ftype-pointer allocation are avoided when the result
of such a cast is used directly as the base pointer in an
\scheme{ftype-ref}, \scheme{ftype-&ref}, \scheme{ftype-set!},
\scheme{ftype-locked-incr!}, \scheme{ftype-locked-decr!},
\scheme{ftype-init-lock!}, \scheme{ftype-lock!}, \scheme{ftype-spin-lock!},
or \scheme{ftype-unlock!} form, as in:

\schemedisplay
(ftype-ref T (fld) (make-ftype-pointer T (ftype-pointer-address x)))
\endschemedisplay

These optimizations do not occur when the calls to
\scheme{ftype-pointer-address} are not nested directly within the outer
form, as when a \scheme{let} binding is used to name the result of the
\scheme{ftype-pointer-address} call, e.g.:

\schemedisplay
(let ([addr (ftype-pointer-address x)]) (= addr 0))
\endschemedisplay

In other places where \scheme{ftype-pointer-address} is used, the compiler
now open-codes the extraction and (if necessary) bignum allocation,
reducing overhead by the cost of a procedure call.

\subsection{Improved performance when profiling (8.9.5)}

In addition to improvements in the tracking of profile counts, the
run-time overhead for gathering profile information has gone down by
5--10\% in our tests and is now typically around 10\% of the total
unprofiled run time.
(Unprofiled code is also slightly faster, but by less than 2\% in
our tests.)

\subsection{New compiler back-end (8.9.1, 8.9.2, 8.9.5)}

Versions starting with 8.9.1 employ a new compiler back end that is
structured as a series of nanopasses and replaces the old linear-time
register allocator with a graph-coloring register allocator.
Compilation with the new back end is substantially slower (up to a factor
of two) than with the old back end, while code generated with the new
back end is faster (14--40\% depending on architecture and optimization
level) in our tests.
These improvements are independent of improvements
resulting from cross-library constant folding and inlining
(Section~\ref{subsection:clcfai}).
The code generated for a specific program might be faster or slower.

\subsection{Open-coding of \protect\scheme{make-guardian} (8.9.4)}

Calls to \scheme{make-guardian} are now open-coded by the compiler to
expose the implicit resulting \scheme{case-lambda} expression so that
calls to the guardian can themselves be inlined, thus reducing the overhead
for registering objects with a guardian and querying the guardian for
resurrected objects.
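
The two kinds of guardian call affected are registration (one argument)
and querying (no arguments), as in the following sketch.
Whether the query returns the object depends on when the collector
determines that it is inaccessible, so no particular result is implied.

\schemedisplay
(define g (make-guardian))

(let ([x (list 'temporary 'data)])
  (g x))            ; register x with the guardian

(collect)           ; x is no longer accessible, so a collection may resurrect it
(g)                 ; the resurrected object if one is ready, otherwise #f
\endschemedisplay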

\subsection{Improved open-coding of \protect\scheme{make-parameter} and \protect\scheme{make-thread-parameter} (8.9.4)}

\scheme{make-parameter} and \scheme{make-thread-parameter}
are now open-coded in all cases to expose the implicit resulting
\scheme{case-lambda} expression.
(They were already open-coded when the second, \emph{filter},
argument was a \scheme{lambda} expression or primitive name.)
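
A parameter with a filter might be defined as in the sketch below;
\scheme{indent-width} is a hypothetical parameter used only for
illustration.

\schemedisplay
(define indent-width               ; hypothetical parameter
  (make-parameter 2
    (lambda (x)                    ; filter: validate each new value
      (unless (and (fixnum? x) (fx>= x 0))
        (error 'indent-width "not a nonnegative fixnum" x))
      x)))

(indent-width)                                       ; 2
(parameterize ([indent-width 4]) (indent-width))     ; 4
(indent-width)                                       ; 2
\endschemedisplay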

\subsection{Cross-library constant folding and inlining (8.9.2)\label{subsection:clcfai}}

The compiler now propagates constants and inlines simple procedures
across library boundaries.
A simple procedure is one that, after optimization of the exporting
library, is smaller than a given threshold, contains no free references
to other bindings in the exporting library, and contains no constants
that cannot be copied without breaking pointer identity.
The size threshold is determined, as for inlining within a library or
other compilation unit, by the parameter \scheme{cp0-score-limit}.
In this case, the size threshold is determined based on the size
\emph{before} inlining rather than the size \emph{after} inlining,
which is often more conservative.
Omitting larger procedures that might generate less code when inlined in
a particular context reduces the amount of information that must be stored
in the exporting library's object code to support cross-library inlining.

One particularly useful benefit of this optimization is that record
predicates, accessors, mutators, and (depending on protocols)
constructors created by a record definition in one library and exported
by another are inlined in the importing library, just as if the record
type were defined in the importing library.
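
As a sketch of the situation described above, consider a record type
defined in one library and used in another.
The libraries \scheme{(point)} and \scheme{(geometry)} are hypothetical;
whether each call is actually inlined depends on the size threshold and
the other conditions listed above.

\schemedisplay
(library (point)
  (export make-point point? point-x point-y)
  (import (rnrs))
  (define-record-type point (fields x y)))

(library (geometry)
  (export norm2)
  (import (rnrs) (point))
  ;; point-x and point-y are candidates for cross-library inlining here
  (define (norm2 p)
    (+ (* (point-x p) (point-x p))
       (* (point-y p) (point-y p)))))
\endschemedisplay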

\end{document}