Change the GC so that it can mark and sweep objects in-place, instead
of always copying. This change is helpful for reducing peak memory
use while performing a collection on a large, old heap.
Some non-copying support was already in place for locked objects,
but the new implementation is faster and more general. As an
alternative to locking, the storage manager now provides "immobile"
allocation (currently only for bytevectors, vectors, and boxes),
which allocates an object that won't move but that can be GCed if
it's not referenced. A locked object is now an object that has been
made immobile and that is also registered on a global list --- mostly
the old, non-scalable implementation of locked objects brought back,
since immobile objects cover the cases that need to scale.
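As a rough illustration of the distinction, here is a minimal Scheme
sketch; `lock-object` and `unlock-object` are the existing primitives,
while the immobile-allocation name below is a placeholder assumption,
not necessarily the real one:

  ;; Locking pins an object and keeps it alive until a balancing
  ;; unlock, via the global list described above:
  (define bv (make-bytevector 4096))
  (lock-object bv)    ; bv will not move and will not be collected
  ;; ... hand bv's address to foreign code ...
  (unlock-object bv)  ; bv is an ordinary, movable object again

  ;; Immobile allocation: the object never moves, but it can still
  ;; be collected once it is unreferenced (placeholder name):
  (define ibv (make-immobile-bytevector 4096))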
original commit: aecb7b736cb1d52764c292fa6364a674958dfde3
Replace repetitive C code in "gc.c" and "vfasl.c" with an
implementation using a little "Parenthe-C" language, which is a
somewhat declarative description of object tracing. From that
description, we generate different kinds of tracing functions, such as
the copy function and the sweep function.
The little language is still basically C, just with parentheses and
parameterization that works much better than trying to use the C
preprocessor. (The "mkgc.ss" file includes the compiler from
Parenthe-C to C.)
Besides replacing existing code, we also generate a new traversal to
implement `compute-object-sizes`. Finally, the GC can now perform a
fused `collect` and `compute-object-sizes` in a single traversal.
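To give the flavor (purely illustrative --- not the actual "mkgc.ss"
syntax), a declarative trace description might look something like
this, with a single description expanded into the copy, sweep, and
size traversals:

  ;; Hypothetical Parenthe-C sketch: declare a pair's layout once...
  (define-object-trace pair
    (trace car)   ; both fields hold references to trace
    (trace cdr))
  ;; ...and the compiler in "mkgc.ss" can emit a copy function, a
  ;; sweep function, and a size-accounting function in C from it.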
Also improve the way that locked objects are detected during GC. This
can make a significant difference (on the order of 10-20% for a full
collection) when locked objects are long-lived.
original commit: de1f5c41d729ac75822a1f1e633ec6d042c883dc
For a collect rendezvous, call the collect-notify handler in
the main thread if it is active. A collect-notify handler can
then make sure the main thread is active and try again, if
that's useful to an application.
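The retry pattern an application might use looks roughly like the
following sketch; the installation point, predicate, and helper names
are assumptions for illustration, not the actual API:

  ;; Hypothetical: if the rendezvous reached the handler somewhere
  ;; other than the main thread, hand the work to the main thread
  ;; and try the collection again.
  (collect-notify-handler                 ; assumed installation point
   (lambda ()
     (if (in-main-thread?)                ; assumed predicate
         (collect)
         (ask-main-thread-to-collect))))  ; assumed helper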
original commit: 0bc286e81827f029dd02a3627a192edd053b3b91
In the general form of a function call, the return point embeds 4
words of information: offset to the start of the enclosing function,
frame size, live-variable mask, and multiple-value return address. In
the common case, however, the multiple-value return address is either
the same as the return address or it is a `values-error` library
function, and the frame size and live-variable mask fit into a word
with bits to spare. This patch implements a more compact return point
for that common case, which shrinks the 4 words to 2 and also avoids a
relocation (= 1 more word).
Multiple-value returns are more complex with this change (i.e.,
require more code), since they must check whether the return point is
compact or not. But multiple-value returns are far less common than
function calls, so saving function-call space is a clear win.
Overall, this change tends to reduce code size by about 10% on x86_64.
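As a rough model of the packing, the following sketch folds a frame
size and live-variable mask into a single word next to a marker bit;
the field widths are made up for illustration:

  ;; Hypothetical layout: low bit marks the return point as compact,
  ;; the next 20 bits hold the frame size, and the remaining bits
  ;; hold the live-variable mask.
  (define (pack-compact-return-point frame-size live-mask)
    (fxior 1                          ; compact marker bit
           (fxsll frame-size 1)       ; frame-size field
           (fxsll live-mask 21)))     ; live-variable-mask field

  ;; The check that multiple-value returns must now perform:
  (define (compact-return-point? word)
    (fx= (fxand word 1) 1))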
original commit: 1f53b5eabef966db01086cb32e544bbf8deacfca
The `unlock-object` operation was O(N) with N currently locked objects
--- so, O(N^2) to lock N objects and then unlock them --- because
locked objects were stored in and searched in a global list. Also, GC
was O(N) at any generation with N locked objects across generations,
since every locked object was scanned.
Fix these problems so that locking and unlocking is practically O(1)
and GC time is not proportional to the number of locked objects. More
precisely, locking and unlocking is now O(C) for locking an individual
object C times to be balanced by C unlocks. (Since locking a single
object multiple times is rare, this performance seems good enough.)
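For example, a loop like the following was quadratic overall under the
global-list implementation and is now linear; `lock-object` and
`unlock-object` are the existing primitives:

  ;; Lock N objects, then unlock them: formerly O(N^2), because each
  ;; unlock searched the global list; now O(N) in total.
  (define objs (map (lambda (i) (make-bytevector 16)) (iota 1000)))
  (for-each lock-object objs)
  ;; ... use the pinned objects from foreign code ...
  (for-each unlock-object objs)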
The implementation replaces the global list with segment-specific
lists. Backpointers are managed using the general generational
support, so that unmodified, old-generation locked objects do not
need to be swept during a new-generation collection.
original commit: a57d256ca73a3d507792c471facb7e35afbe88b3
Also adds `get-initial-thread`, since thread values are useful with
`compute-size[-increments]`.
Changes the compiler to inline `weak-pair?` and `ephemeron-pair?`,
since that provides better performance for `compute-size-increments`.
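A hedged usage sketch, assuming `compute-size-increments` takes a list
of objects and returns a list of byte counts, where each count skips
storage already attributed to an earlier object in the list:

  ;; How much storage is reachable from the initial thread?
  (compute-size-increments (list (get-initial-thread)))

  ;; The newly inlined predicates distinguish pair flavors cheaply:
  (weak-pair? (weak-cons 'key 'val))            ; => #t
  (ephemeron-pair? (ephemeron-cons 'key 'val))  ; => #t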
original commit: 57d0cc13f8e932972cba3837b4f54e9c86786091
- when thread_get_room exhausts the local allocation area, it now
goes through a common path with S_get_more_room to allocate a new
local allocation area when appropriate. this can greatly reduce
the use of global allocation (and the number of tc mutex acquires
in threaded builds) when a lot of small objects are allocated by
C code with no intervening Scheme-side allocation or dirty writes.
alloc.c, types.h, externs.h
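A schematic model of the shared path, written as Scheme pseudocode for
illustration (the real code is C in alloc.c, and every helper name
below is made up):

  ;; Illustrative only: a small request that exhausts the thread's
  ;; local allocation area installs a fresh local area (one tc-mutex
  ;; acquire) instead of falling back to global allocation.
  (define (get-room tc n)
    (cond
      [(fits-in-local-area? tc n) (bump-local-pointer! tc n)]
      [(small-enough-for-local-area? n)
       (install-fresh-local-area! tc)  ; shared with S_get_more_room
       (bump-local-pointer! tc n)]
      [else (allocate-globally n)]))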
original commit: 93dfa7674a95837e5a22bc622fecc50b0224f60d
to simplify ($fxu< (most-positive-fixnum) e) => (fx< e 0), so we
don't have any incentive to special-case length checks where the
maximum length happens to be (most-positive-fixnum); the rewrite's
soundness is sketched below.
5_4.ss, 5_6.ss, bytevector.ss, cmacros.ss, cp0.ss, cpnanopass.ss,
mkheader.ss, primdata.ss, prims.ss,
fasl.c, gc.c, types.h
root-experr*, patch*
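A sketch of why the rewrite is sound: for a fixnum e, the unsigned
reading of e's bits exceeds (most-positive-fixnum) exactly when e's
sign bit is set. Modeling $fxu< with exact arithmetic:

  ;; Model $fxu< as an unsigned comparison of two's-complement
  ;; fixnum bits, assuming a width of (fixnum-width) bits.
  (define (fxu<-model x y)
    (define (unsigned v)
      (if (fx< v 0) (+ v (expt 2 (fixnum-width))) v))
    (< (unsigned x) (unsigned y)))

  ;; ($fxu< (most-positive-fixnum) e) behaves like (fx< e 0):
  (for-each
   (lambda (e)
     (assert (boolean=? (fxu<-model (most-positive-fixnum) e)
                        (fx< e 0))))
   (list 0 1 -1 (most-positive-fixnum) (most-negative-fixnum)))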
original commit: 9eb63deda025fd4560b54746b21a881c01af46d6
because these fields can be accessed from multiple threads concurrently.
Updated $yield and $thread-check in mats/thread.ms to be more tolerant of timing variability.
original commit: 0a6a1e14e7ecb9e39fa7a10a8584ed2fec24cbf4