Flonum operations like `fltruncate` and `flsin` are implemented by
calling functions from the C library. Unboxing these involves a
generalazation the `foreign-call` intermediate form to handle unboxing
and to work in a non-tail position (especially by telling the register
allocator that caller-saved registers will be trashed). An internal
'atomic convention on a foreign call indicates that no callback into
Scheme is possible, so some setup/teardown (including stashing
callee-saved registers) can be skipped.
original commit: fd89919634d0d5272e046b47bb81bcc66e22a741
This is a follow-up to 276f8da076, where `(%tc-ref cp)` was supposed
to be preserved by moving it into %cp, but intrinisics for bytevector
arguments can kill %cp. Use a temporary to expose things properly to
the register allocator.
original commit: 3a29db06a452e46e69ebcde524b3b9acb435dec3
* Fix calculation of segment index for 32-bit platforms
* Fix allocation of mark-bit and list-bit arrays in certain unusual
cases.
* Fix dirty sweep of records on marked pages that have non-pointer
fields.
* Fix allocation of eveen-sized immobile vectors; a pad word needs to
be cleared.
* Fix and extend the heap checker (which was used to find several of
the other problems).
original commit: 8b5e65f5eafac5aea7394901e1dd2f2fc3ccf2bd
Part of the repair makes it ok to re-sweep an ephemeron, which is more
consistent with evertything else.
original commit: 2c11bb39129b1492108390a704eb08deaa5d6bcc
The C compiler doesn't generate a tail call in a place where I
expected one, and maybe it's better to branch at the call site anyway.
original commit: 70fa8e7f7bd891c548c877cabdd15073aa2aa01b
Change the GC so that it can mark and sweep objects in-place, instead
of always copying. This change is helpful for reducing peak memory
use while performing a collection on a large, old heap.
Some non-copying support was already in place for locked objects,
but the new implementation is faster and more general. As an
alternative to locking, the storage manager now provides "immobile"
allocation (currently only for bytevectors, vectors, and boxes),
which allocates an object that won't move but that can be GCed if
it's not referenced. A locked object is an object that has been
immobiled and that is on a global list --- mostly the old,
non-scalable implementation of locked objects brought back, since
immobile objects cover the cases that need to scale.
original commit: aecb7b736cb1d52764c292fa6364a674958dfde3
When collecting to the maximum generation with object counts enabled,
a structure type would effectively become permanently reachable.
Also, add `bytes-finalized` to report how many bytes were associated
with guardian-based finalization by the most recent collection.
original commit: 852f5e2de95a26d3500321c4d4d732407945a57a
Replace repetitive C code in "gc.c" and "vfasl.c" with an
implementation using a little "Parenthe-C" language, which is a
somewhat declarative description of object tracing. From that
descrition, we generate different kinds of tracing functions, such as
the copy function or the sweep function.
The little language is still bascially C, just with parentheses and
parameterization that is much better than trying to use the C
preprocessor. (The "mkgc.ss" file includes the compiler from
Parenthe-C to C.)
Besides replacing existing code, we also generate a new traversal to
implement `compute-object-sizes`. Finally, the GC can now perform a
fused `collect` and `compute-object-sizes` in a single traversal.
Also improve the way that locked objects are detected during GC. This
can make a significant difference (on the order of 10-20% for a full
collection) when locked objects are long-lived.
original commit: de1f5c41d729ac75822a1f1e633ec6d042c883dc
Only half(!) of the needed space was actually allocated. The extra
space is ony used after a GC, however, and a GC makes the extra room,
so that's why things haven't fallen over completely, but that's more
subtle than intended.
original commit: 3d72bc14b9247d6764809cb651403dbb4063a905
For a collect rendezvous, call the collect-notify handler in
the main thread if it is active. A collect-notify handler can
then make sure the main thread is active and try again, if
that's useful to an application.
original commit: 0bc286e81827f029dd02a3627a192edd053b3b91
This operation effectively allows sending an expression back to a
continuation, instead of just a value. It's the same as Marc Feeley's
`continuation-slice` operation, but adjusted slightly to support
continuation attachments.
original commit: d0e36e72d20a6eaa5d9d8b795da5e77abde75289
While "\44\26\2\f6" currently works as a terminator for non-compressed
fasl streams, the working byte sequence varies as the fasl format
changes. Add "\177" as a simpler and unchanging terminator.
original commit: 332019360491be6cedd2063c9a8056183d764bbb
- added invoke-library
syntax.ss, primdata.ss,
8.ms, root-experr*,
libraries.stex, release_notes.stex
- updated the date
release_notes.stex
- libraries contained within a whole program or library are now
marked pending before their invoke code is run so that invoke
cycles are reported as such rather than as attempts to invoke
while still loading.
compile.ss, syntax.ss, primdata.ss,
7.ms, root-experr*
- the library manager now protects against unbound references
from separately compiled libraries or programs to identifiers
ostensibly but not actually exported by (invisible) libraries
that exist only locally within a whole program. this is done by
marking the invisibility of the library in the library-info and
propagating it to libdesc records; the latter is checked upon
library import, visit, and invoke as well as by verify-loadability.
the import and visit code of each invisible no longer complains
about invisibility since it shouldn't be reachable.
syntax.ss, compile.ss, expand-lang.ss,
7.ms, 8.ms, root-experr*, patch*
- documented that compile-whole-xxx's linearization of the
library initialization code based on static dependencies might
not work for dynamic dependencies.
system.stex
- optimized bignum right shifts so the code (1) doesn't look at
shifted-off bigits if the bignum is positive, since it doesn't
need to know in that case if any bits are set; (2) doesn't look
at shifted-off bigits if the bignum is negative if it determines
that at least one bit is set in the bits shifted off the low-order
partially retained bigit; (3) quits looking, if it must look, for
one bits as soon as it finds one; (4) looks from both ends under
the assumption that set bits, if any, are most likely to be found
toward the high or low end of the bignum rather than just in the
middle; and (5) doesn't copy the retained bigits and then shift;
rather shifts as it copies. This leads to dramatic improvements
when the shift count is large and often significant improvements
otherwise.
number.c,
5_3.ms,
release_notes.stex
- threaded tc argument through to all calls to S_bignum and
S_trunc_rem so they don't have to call get_thread_context()
when it might already have been called.
alloc.c, number.c, fasl.c, print.c, prim5.c, externs.h
- added an expand-primitive handler to partially inline integer?.
cpnanopass.ss
- added some special cases for basic arithmetic operations (+, -, *,
/, quotient, remainder, and the div/div0/mod/mod0 operations) to
avoid doing unnecessary work for large bignums when the result
will be zero (e.g,. multiplying by 0), the same as one of the
inputs (e.g., adding 0 or multiplying by 1), or the additive
inverse of one of the inputs (e.g., subtracting from 0, dividing
by -1). This can have a major beneficial affect when operating
on large bignums in the cases handled. also converted some uses
of / into integer/ where going through the former would just add
overhead without the possibility of optimization.
5_3.ss,
number.c, externs.h, prim5.c,
5_3.ms, root-experr, patch*,
release_notes.stex
- added a queue to hold pending signals for which handlers have
been registered via register-signal-handler so up to 63 (configurable
in the source code) unhandled signals are buffered before the
handler has to start dropping them.
cmacros.ss, library.ss, prims.ss, primdata.ss,
schsig.c, externs.h, prim5.c, thread.c, gc.c,
unix.ms,
system.stex, release_notes.stex
- bytevector-compress now selects the level of compression based
on the compress-level parameter. Prior to this it always used a
default setting for compression. the compress-level parameter
can now take on the new minimum in addition to low, medium, high,
and maximum. minimum is presently treated the same as low
except in the case of lz4 bytevector compression, where it
results in the use of LZ4_compress_default rather than the
slower but more effective LZ4_compress_HC.
cmacros,ss, back.ss,
compress_io.c, new_io.c, externs.h,
bytevector.ms, mats/Mf-base, root-experr*
io.stex, objects.stex, release_notes.stex
original commit: 72d90e4c67849908da900d0b6249a1dedb5f8c7f
A 0 relocation is used by fcallable code as a recognizable cookie, and
its relocations must be preserved.
original commit: 38fb3fdf75cf6540d6bd2568f015af6272d22995
Instead of constaining the use of event-detour so much, make it merely
unlikely that the detour will have to allocate when used in a loop
that otherwise doesn't allocate. We'll only have to allocate if the
available stack space turns out to be too small --- and if we do
allocate, it's not the end of the world.
original commit: f1dbed82df415c18c8304bedcee2ecf4912badc7
Having the trap check allocate is questionable, since it can be
triggered during a loop that otherwise performs no allocation. Also,
on platforms where at most 1 argument is passed in a register, then
sending two arguments to the event handler could potentially need
stack space that isn't there. So, constrain the smaller trap-check
code to cases where no stack space is needed and where no allocation
happens unless the wrong number of arguments are provided.
original commit: 260a7ef5bc0bf851d9848587b0a78bdb4aab59f8
When a proceudre starts with a trap check, move the check to the very
beginning, even before checking the argument count. That way, event
detection can turn into a compact jump to an event handler, instead of
inserting a general call to `$event` in the procedure body.
original commit: 06b12d505698a2378734689370bb9e0f8eda06b9
Fix 'reloc to avoid a crash on static-generation code, and add
'reloc+offset to report an offset for each entry.
original commit: 4d4195044377f9c619cfb46056e365044069d5bc
In the general form of a function call, the return point embeds 4
words of information: offset to the start of the enclosing function,
frame size, live-veriable mask, and multiple-value return address. In
the common case, however, the multiple-value return address is either
the same as the return address or it is a `values-error` library
function, and the frame size and live-variable mask fit into a word
with bits to spare. This patch implements a more compact return point
for that common case, which shrinks the 4 words to 2 and also avoids a
relocation (= 1 more word).
Multiple-value returns are more complex with this change (i.e.,
require more code), since they must check whether the return point is
compact or not. But multiple-value returns are far less common than
function calls, so saving function-call space is a clear win.
Overall, this change tends to reduce code size by about 10% on x86_64.
original commit: 1f53b5eabef966db01086cb32e544bbf8deacfca
loadability without actually loading; also, support for unregistering
guarded objects.
- improved error reporting for library compilation-instance errors:
now including the name of the object file from which the "wrong"
compilation instance was loaded, if it was loaded from (or compiled
to) an object file and the original importing library, if it was
previously loaded from an object file due to a library import.
syntax.ss, 7.ss, interpret.ss,
8.ms, root-experr*
- removed situation and for-input? arguments from $make-load-binary,
since the only consumer always passes 'load and #f.
7.ss,
scheme.c
- $separate-eval now prints the stderr and stdout of the subprocess
to help in diagnosing separate-eval and separate-compile issues.
mat.ss
- added unregister-guardian, which can be used to unregister
the unressurected objects registered with any guardian. guardian?
can be used to distinguish guardian procedures from other objects.
cp0.ss, cmacros.ss, cpnanopass.ss, ftype.ss, primdata.ss,
prims.ss,
gcwrapper.c, prim.c, externs.h,
4.ms, primvars.ms
release_notes.stex
smgmt.stex, threads.stex
- added verify-loadability. given a situation (visit, revisit,
or load) and zero or more pathnames (each of which may be optionally
paired with a library search path), verity-loadability checks
whether the set of object files named by those pathnames and any
additional object files required by library requirements in the
given situation can be loaded together. it raises an exception
in each case where actually attempting to load the files would
raise an exception and additionally in cases where loading files
would result in the compilation or loading of source files in
place of the object files. if the check is successful,
verity-loadability returns an unspecified value. in either case,
although portions of the object files are read, none of the
information read from the object files is retained, and none of
the object code is read, so there are no side effects other than
the file operations and possibly the raising of an exception.
library and program info records are now moved to the top of each
object file produced by one of the file compilation routines,
just after recompile info, with a marker to allow verity-loadability
to stop reading once it reads all such records. this change is
not entirely backward compatible; the repositioning of the records
can be detected by a call to list-library made from a loaded file
before the definition of one or more libraries. it is fully
backward compatible for typical library files that contain a
single library definition and nothing else. adding this feature
required changes to the object-file format and corresponding
changes in the compiler and library manager. it also required
moving cross-library optimization information from library/ct-info
records (which verity-loadability must read) to the invoke-code
for each library (which verity-loadability does not read) to
avoid reading and permanently associating record-type descriptors
in the code with their uids.
compile.ss, syntax.ss, expand-lang.ss, primdata.ss, 7.ss,
7.ms, misc.ms, root-experr*, patch*,
system.stex, release_notes.stex
- fixed a bug that bit only with the compiler compiled at
optimize-level 2: add-library/rt-records was building a library/ct-info
wrapper rather than a library/rt-info wrapper.
compile.ss
- fixed a bug in visit-library that could result in an indefinite
recursion: it was not checking to make sure the call to $visit
actually added compile-time info to the libdesc record. it's not
clear, however, whether the libdesc record can be missing
compile-time information on entry to visit-library, so the code
that calls $visit (and now checks for compile-time information
having been added) might not be reachable. ditto for
revisit-library.
syntax.ss
syntax.ss, primdata.ss,
7.ms, root-experr*, patch*,
system.stex, release_notes.stex
- added some argument-error checks for library-directories and
library-extensions, and fixed up the error messages a bit.
syntax.ss,
7.ms, root-experr*
- compile-whole-program now inserts the program record into the
object file for the benefit of verify-loadability.
syntax.ss,
7.ms, root-experr*
- changed 'loading' import-notify messages to the more precise
'visiting' or 'revisiting' in a couple of places.
syntax.ss,
7.ms, 8.ms
original commit: b911ed47190727b0e1d6a88c0e473d1757accdcd
On x86_64, a POPCNT instruction is usually available, and it can speed
up `fxpopcount` operations by a factor of 2-3.
Since POPCNT isn't always available, code using `fxpopcount` is
compiled to a call to a generic implementation. The linker substitutes
a POPCNT instruction when it determines at runtime that POPCNT is
available.
Some measurements on a 2018 MacBook Pro (2.7 GHz Core i7) using the
program below:
popcnt = this implementation, POPCNT discovered
nocnt = this implementation, POPCNT considered unavailable
optcnt = compile to use POPCNT directly (no linker work)
cpcnt = compile to inlined generic (no linker work, no POPCNT)
Since the generic implementation is always a 64-bit popcount, it's not
as good as an inlined version for `fxpopcount32`, but otherwise the
link-edit approach to POPCNT works well:
fxpopcount fxpopcount32
popcnt: 0.098s
nocnt: 0.284s
optcnt 0.109s [slower means noise?]
cpcnt: 0.279s 0.188s
(optimize-level 3)
(time
(let loop ([v #f] [i 100000000])
(if (fx= i 0)
v
(loop (fxpopcount i) (fx- i 1)))))
original commit: 5f090e509f8fe5edc777ed9f0463b20c2e571336
Instead of using `%` to compute the index into an oblist, use a power
of 2 for the oblist length and bit masking to compute an index. (Maybe
the old hashing function was bad; the current hashing function should
produce good hash-code variation at the level of bits.) Also, make the
oblist array a little sparser to reduce bucket chaining.
original commit: fb87fcb8e47902b80654789d059a25bd4a7a8def
After a bignum computation using temporary thread registers W, U, or V
is complete, clear ther register. (The X and Y registers hold only
small bignums, so clearing them doesn't matter in the same way.)
original commit: a9e11fcf9e86aee5d149764476e1fabfeee12f84
It's not available with musl, either, musl intentionally
doesn't provide a preprocessor test, and we're avoiding
(for now) `configure`-time tests in the style of autoconf.
original commit: a9bfb72027fc83ed6bb690d033bc6fed0629dba7
Use the high bit of a byte to continue instead of the low bit.
That way, ASCII strings look like themselves in uncompressed fasl
form.
original commit: 89a8d24cc051123a7b2b6818c5c4aef144d48797
Uninterned symbols are slightly more expensive to allocate than 0- or
1-argument calls to `gensym`, but they're much cheaper to hash (and
print). They're also more consistently distinct when unfasled, and the
fasled form is determinsitic.
original commit: 3167083008031b1f880e76a6f573563c7d9c888c
The result of `mktime` is -1 for an error. The result is also -1 if
the time is 1 second before the epoch. That's not useful, so ignore
it.
original commit: aa8ca31cef223128fd8ed1abdc76beb31a0e077a