- added invoke-library
syntax.ss, primdata.ss,
8.ms, root-experr*,
libraries.stex, release_notes.stex
- updated the date
release_notes.stex
- libraries contained within a whole program or library are now
marked pending before their invoke code is run so that invoke
cycles are reported as such rather than as attempts to invoke
while still loading.
compile.ss, syntax.ss, primdata.ss,
7.ms, root-experr*
- the library manager now protects against unbound references
from separately compiled libraries or programs to identifiers
ostensibly but not actually exported by (invisible) libraries
that exist only locally within a whole program. this is done by
marking the invisibility of the library in the library-info and
propagating it to libdesc records; the latter is checked upon
library import, visit, and invoke as well as by verify-loadability.
the import and visit code of each invisible no longer complains
about invisibility since it shouldn't be reachable.
syntax.ss, compile.ss, expand-lang.ss,
7.ms, 8.ms, root-experr*, patch*
- documented that compile-whole-xxx's linearization of the
library initialization code based on static dependencies might
not work for dynamic dependencies.
system.stex
- optimized bignum right shifts so the code (1) doesn't look at
shifted-off bigits if the bignum is positive, since it doesn't
need to know in that case if any bits are set; (2) doesn't look
at shifted-off bigits if the bignum is negative if it determines
that at least one bit is set in the bits shifted off the low-order
partially retained bigit; (3) quits looking, if it must look, for
one bits as soon as it finds one; (4) looks from both ends under
the assumption that set bits, if any, are most likely to be found
toward the high or low end of the bignum rather than just in the
middle; and (5) doesn't copy the retained bigits and then shift;
rather shifts as it copies. This leads to dramatic improvements
when the shift count is large and often significant improvements
otherwise.
number.c,
5_3.ms,
release_notes.stex
- threaded tc argument through to all calls to S_bignum and
S_trunc_rem so they don't have to call get_thread_context()
when it might already have been called.
alloc.c, number.c, fasl.c, print.c, prim5.c, externs.h
- added an expand-primitive handler to partially inline integer?.
cpnanopass.ss
- added some special cases for basic arithmetic operations (+, -, *,
/, quotient, remainder, and the div/div0/mod/mod0 operations) to
avoid doing unnecessary work for large bignums when the result
will be zero (e.g,. multiplying by 0), the same as one of the
inputs (e.g., adding 0 or multiplying by 1), or the additive
inverse of one of the inputs (e.g., subtracting from 0, dividing
by -1). This can have a major beneficial affect when operating
on large bignums in the cases handled. also converted some uses
of / into integer/ where going through the former would just add
overhead without the possibility of optimization.
5_3.ss,
number.c, externs.h, prim5.c,
5_3.ms, root-experr, patch*,
release_notes.stex
- added a queue to hold pending signals for which handlers have
been registered via register-signal-handler so up to 63 (configurable
in the source code) unhandled signals are buffered before the
handler has to start dropping them.
cmacros.ss, library.ss, prims.ss, primdata.ss,
schsig.c, externs.h, prim5.c, thread.c, gc.c,
unix.ms,
system.stex, release_notes.stex
- bytevector-compress now selects the level of compression based
on the compress-level parameter. Prior to this it always used a
default setting for compression. the compress-level parameter
can now take on the new minimum in addition to low, medium, high,
and maximum. minimum is presently treated the same as low
except in the case of lz4 bytevector compression, where it
results in the use of LZ4_compress_default rather than the
slower but more effective LZ4_compress_HC.
cmacros,ss, back.ss,
compress_io.c, new_io.c, externs.h,
bytevector.ms, mats/Mf-base, root-experr*
io.stex, objects.stex, release_notes.stex
original commit: 72d90e4c67849908da900d0b6249a1dedb5f8c7f
A 0 relocation is used by fcallable code as a recognizable cookie, and
its relocations must be preserved.
original commit: 38fb3fdf75cf6540d6bd2568f015af6272d22995
In previous versions of Chez Scheme, multiple object files could be
combined by concatinating them into a single file. To support faster
object file loading and loadability verification, recompile information
and information about libraries and top-level programs within an object
file is now placed at the top of the file. The new
concatenate-object-files procedure can be used to combine multiple object
files while moving this information to the top of the combined file.
original commit: d4ef2ad9393578ff3ffe3b712736bc6a4ae7b8eb
Avoid saving a list of per-field vector descriptions when
field names are not going to be relevant and the rest of
the description is easily computed from information that is
alerady available.
original commit: a20e3f305cee3b4a386582dd50cda344a49174c3
Instead of constaining the use of event-detour so much, make it merely
unlikely that the detour will have to allocate when used in a loop
that otherwise doesn't allocate. We'll only have to allocate if the
available stack space turns out to be too small --- and if we do
allocate, it's not the end of the world.
original commit: f1dbed82df415c18c8304bedcee2ecf4912badc7
It's not clear whether there's something wrong with this case or
whether it's exposing a more general problem, but disabling it for
now allows a parallel Racket CS build to proceeed.
original commit: 00f6733e573c068165abc4dbbdb46cdede9f778e
In some procedures, one of the arguments is a function that will surely be called
and the result is the result of the whole expression. These procedures need an
special version of define-specialize that gives more control.
original commit: f2f0401d2b83313e8cb0d5742e89ed098500cbd6
Rewrite the handler of record? and $sealed-record? to make it easier to
understand.
Also, delay the reductions of lambdas in a sequence of arguments. This helps
to reduce for example
(map (lambda (x) (box? b)) (unbox b))
=>
(map (lambda (x) #t) (unbox b))
original commit: 20e478b9280c779e260f5557c2eee74946313a44
Having the trap check allocate is questionable, since it can be
triggered during a loop that otherwise performs no allocation. Also,
on platforms where at most 1 argument is passed in a register, then
sending two arguments to the event handler could potentially need
stack space that isn't there. So, constrain the smaller trap-check
code to cases where no stack space is needed and where no allocation
happens unless the wrong number of arguments are provided.
original commit: 260a7ef5bc0bf851d9848587b0a78bdb4aab59f8
When a proceudre starts with a trap check, move the check to the very
beginning, even before checking the argument count. That way, event
detection can turn into a compact jump to an event handler, instead of
inserting a general call to `$event` in the procedure body.
original commit: 06b12d505698a2378734689370bb9e0f8eda06b9
This is a `$` function because it is defined only for record types
that have pointer-sized fields (i.e., the normal case).
original commit: 47213a7c8450aa52bd18e8f605c02b6c1081eadf
Fix 'reloc to avoid a crash on static-generation code, and add
'reloc+offset to report an offset for each entry.
original commit: 4d4195044377f9c619cfb46056e365044069d5bc
In the general form of a function call, the return point embeds 4
words of information: offset to the start of the enclosing function,
frame size, live-veriable mask, and multiple-value return address. In
the common case, however, the multiple-value return address is either
the same as the return address or it is a `values-error` library
function, and the frame size and live-variable mask fit into a word
with bits to spare. This patch implements a more compact return point
for that common case, which shrinks the 4 words to 2 and also avoids a
relocation (= 1 more word).
Multiple-value returns are more complex with this change (i.e.,
require more code), since they must check whether the return point is
compact or not. But multiple-value returns are far less common than
function calls, so saving function-call space is a clear win.
Overall, this change tends to reduce code size by about 10% on x86_64.
original commit: 1f53b5eabef966db01086cb32e544bbf8deacfca
loadability without actually loading; also, support for unregistering
guarded objects.
- improved error reporting for library compilation-instance errors:
now including the name of the object file from which the "wrong"
compilation instance was loaded, if it was loaded from (or compiled
to) an object file and the original importing library, if it was
previously loaded from an object file due to a library import.
syntax.ss, 7.ss, interpret.ss,
8.ms, root-experr*
- removed situation and for-input? arguments from $make-load-binary,
since the only consumer always passes 'load and #f.
7.ss,
scheme.c
- $separate-eval now prints the stderr and stdout of the subprocess
to help in diagnosing separate-eval and separate-compile issues.
mat.ss
- added unregister-guardian, which can be used to unregister
the unressurected objects registered with any guardian. guardian?
can be used to distinguish guardian procedures from other objects.
cp0.ss, cmacros.ss, cpnanopass.ss, ftype.ss, primdata.ss,
prims.ss,
gcwrapper.c, prim.c, externs.h,
4.ms, primvars.ms
release_notes.stex
smgmt.stex, threads.stex
- added verify-loadability. given a situation (visit, revisit,
or load) and zero or more pathnames (each of which may be optionally
paired with a library search path), verity-loadability checks
whether the set of object files named by those pathnames and any
additional object files required by library requirements in the
given situation can be loaded together. it raises an exception
in each case where actually attempting to load the files would
raise an exception and additionally in cases where loading files
would result in the compilation or loading of source files in
place of the object files. if the check is successful,
verity-loadability returns an unspecified value. in either case,
although portions of the object files are read, none of the
information read from the object files is retained, and none of
the object code is read, so there are no side effects other than
the file operations and possibly the raising of an exception.
library and program info records are now moved to the top of each
object file produced by one of the file compilation routines,
just after recompile info, with a marker to allow verity-loadability
to stop reading once it reads all such records. this change is
not entirely backward compatible; the repositioning of the records
can be detected by a call to list-library made from a loaded file
before the definition of one or more libraries. it is fully
backward compatible for typical library files that contain a
single library definition and nothing else. adding this feature
required changes to the object-file format and corresponding
changes in the compiler and library manager. it also required
moving cross-library optimization information from library/ct-info
records (which verity-loadability must read) to the invoke-code
for each library (which verity-loadability does not read) to
avoid reading and permanently associating record-type descriptors
in the code with their uids.
compile.ss, syntax.ss, expand-lang.ss, primdata.ss, 7.ss,
7.ms, misc.ms, root-experr*, patch*,
system.stex, release_notes.stex
- fixed a bug that bit only with the compiler compiled at
optimize-level 2: add-library/rt-records was building a library/ct-info
wrapper rather than a library/rt-info wrapper.
compile.ss
- fixed a bug in visit-library that could result in an indefinite
recursion: it was not checking to make sure the call to $visit
actually added compile-time info to the libdesc record. it's not
clear, however, whether the libdesc record can be missing
compile-time information on entry to visit-library, so the code
that calls $visit (and now checks for compile-time information
having been added) might not be reachable. ditto for
revisit-library.
syntax.ss
syntax.ss, primdata.ss,
7.ms, root-experr*, patch*,
system.stex, release_notes.stex
- added some argument-error checks for library-directories and
library-extensions, and fixed up the error messages a bit.
syntax.ss,
7.ms, root-experr*
- compile-whole-program now inserts the program record into the
object file for the benefit of verify-loadability.
syntax.ss,
7.ms, root-experr*
- changed 'loading' import-notify messages to the more precise
'visiting' or 'revisiting' in a couple of places.
syntax.ss,
7.ms, 8.ms
original commit: b911ed47190727b0e1d6a88c0e473d1757accdcd
Allow a library-defined function to be inlined when the inlined
expressions refer to other library-defined functions. Since the
library function's body may already have inlined calls, don't allow
further inlining of calls within the inlined code.
This commit also adds `$app/no-inline`, which can be used to prevent
inlining of a function. For consumers other than Racket on Chez
Scheme, probably it would make sense to provide a nicer-looking
syntactic form that expands to use the internal `$app/no-inline`
function.
original commit: 628d57e1bd2e658aad4da97a3e85bda72c38f6ab
On x86_64, a POPCNT instruction is usually available, and it can speed
up `fxpopcount` operations by a factor of 2-3.
Since POPCNT isn't always available, code using `fxpopcount` is
compiled to a call to a generic implementation. The linker substitutes
a POPCNT instruction when it determines at runtime that POPCNT is
available.
Some measurements on a 2018 MacBook Pro (2.7 GHz Core i7) using the
program below:
popcnt = this implementation, POPCNT discovered
nocnt = this implementation, POPCNT considered unavailable
optcnt = compile to use POPCNT directly (no linker work)
cpcnt = compile to inlined generic (no linker work, no POPCNT)
Since the generic implementation is always a 64-bit popcount, it's not
as good as an inlined version for `fxpopcount32`, but otherwise the
link-edit approach to POPCNT works well:
fxpopcount fxpopcount32
popcnt: 0.098s
nocnt: 0.284s
optcnt 0.109s [slower means noise?]
cpcnt: 0.279s 0.188s
(optimize-level 3)
(time
(let loop ([v #f] [i 100000000])
(if (fx= i 0)
v
(loop (fxpopcount i) (fx- i 1)))))
original commit: 5f090e509f8fe5edc777ed9f0463b20c2e571336
Instead of using `%` to compute the index into an oblist, use a power
of 2 for the oblist length and bit masking to compute an index. (Maybe
the old hashing function was bad; the current hashing function should
produce good hash-code variation at the level of bits.) Also, make the
oblist array a little sparser to reduce bucket chaining.
original commit: fb87fcb8e47902b80654789d059a25bd4a7a8def
After a bignum computation using temporary thread registers W, U, or V
is complete, clear ther register. (The X and Y registers hold only
small bignums, so clearing them doesn't matter in the same way.)
original commit: a9e11fcf9e86aee5d149764476e1fabfeee12f84
Try `fxquotient` with a `fx*` check to implement `/` on fixnums.
That's fast enough to be much faster when it works, and only slows
down a more general `/` a little.
original commit: e91430be9b71f4913965db688a15f6d7206b38f3