Commit Graph

254 Commits

Author SHA1 Message Date
Matthew Flatt
b2f74f014e add AArch64 (aka Arm64) support as tarm64le
original commit: 9964f27f64cc743fd1dbff7418fce940a4291b01
2020-07-09 06:32:41 -06:00
Matthew Flatt
bdd1eaa874 add tarm32le
Besides adding support for `__collect-safe` and other repairs,
introduce a write-write fence with the write barrier, which is
intended to avoid one thread using an object created in another thread
before the object's initializing writes are visible.

original commit: 543bd16739c08e5a8f88c470b52db0f23a27d260
2020-06-29 05:55:47 -06:00
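The write-write fence described in bdd1eaa874 orders an object's initializing stores before the store that makes the object reachable from another thread. The following is a minimal sketch of that ordering in portable C11, not the code the compiler emits; `box_t`, `shared_slot`, `publish_box`, and `read_box` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical heap object; stands in for any freshly allocated value. */
typedef struct { int value; } box_t;

/* Hypothetical shared location that another thread may read. */
static _Atomic(box_t *) shared_slot = NULL;

/* Initialize the object, then publish it. The release fence keeps the
   initializing store from being reordered after the publishing store. */
void publish_box(box_t *b, int value) {
    assert(b != NULL);
    b->value = value;                            /* initializing write */
    atomic_thread_fence(memory_order_release);   /* write-write ordering */
    atomic_store_explicit(&shared_slot, b, memory_order_relaxed);
}

/* Reader side: an acquire load pairs with the release fence above. */
box_t *read_box(void) {
    return atomic_load_explicit(&shared_slot, memory_order_acquire);
}
```

A thread that obtains a non-NULL pointer from `read_box` is then guaranteed to see the `value` stored before the fence.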
Matthew Flatt
9bdc112b4d ppc32: fix icache flush
original commit: d9bf4ebbc5fe32a1d3d35ba096a54e7b78d1f33c
2020-06-22 17:35:47 -06:00
Matthew Flatt
257a29216e update for ppc32
Besides updating for unboxed floating point, the ppc32 build uses a
return register, and the continuation-attachments implementation was
not right for that mode.

original commit: dd2d01fb26ace819c73f258b9b53739f9dda1d34
2020-06-20 07:36:02 -06:00
Matthew Flatt
d1f20019ae unbox more flonum operations
Flonum operations like `fltruncate` and `flsin` are implemented by
calling functions from the C library. Unboxing these involves a
generalization of the `foreign-call` intermediate form to handle unboxing
and to work in a non-tail position (especially by telling the register
allocator that caller-saved registers will be trashed). An internal
'atomic convention on a foreign call indicates that no callback into
Scheme is possible, so some setup/teardown (including stashing
callee-saved registers) can be skipped.

original commit: fd89919634d0d5272e046b47bb81bcc66e22a741
2020-06-13 14:25:52 -06:00
Matthew Flatt
4b322677fa flush instruction cache on vfasl load
original commit: 57a7c47dcf1f602d208d14f51f456edb3e2689ae
2020-06-12 14:41:00 -06:00
Matthew Flatt
23e3597778 fix vfasl for library/C entry 0
original commit: ab36ca79585b69db135b9edeadbc26e9a071f813
2020-06-11 17:24:17 -06:00
Matthew Flatt
6395bd92ff fix foreign-callable handling of bytevector arguments
This is a follow-up to 276f8da076, where `(%tc-ref cp)` was supposed
to be preserved by moving it into %cp, but intrinsics for bytevector
arguments can kill %cp. Use a temporary to expose things properly to
the register allocator.

original commit: 3a29db06a452e46e69ebcde524b3b9acb435dec3
2020-06-06 19:44:40 -06:00
Matthew Flatt
bbbd5a76ac fix vfasl relocation for arm32
original commit: e15c51c2c29aea545fbb4790f36b15002b7a25a5
2020-06-06 14:29:32 -06:00
Matthew Flatt
0adffe2c19 fix pseudo-random state C view for arm32
original commit: 348c1798d88eea3504961effe7953103044e3ee4
2020-06-06 12:16:11 -06:00
Matthew Flatt
a106c50798 gc repairs
 * Fix calculation of segment index for 32-bit platforms.

 * Fix allocation of mark-bit and list-bit arrays in certain unusual
   cases.

 * Fix dirty sweep of records on marked pages that have non-pointer
   fields.

 * Fix allocation of even-sized immobile vectors; a pad word needs to
   be cleared.

 * Fix and extend the heap checker (which was used to find several of
   the other problems).

original commit: 8b5e65f5eafac5aea7394901e1dd2f2fc3ccf2bd
2020-05-15 14:40:55 -06:00
Matthew Flatt
96616baa47 unbreak non-threaded build
original commit: c077acf7dd65bcb397e846c786ac546888b5798a
2020-05-15 07:19:51 -06:00
Paulo Matos
74ee485b21 Ensure that the literal 1 is wide enough for a shift (#23)
Fixes runtime error found by ubsan.
original commit: 65e05772a1ee14d73c368f311e837b00af771a23
2020-05-07 17:34:45 +02:00
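The class of bug fixed in 74ee485b21 can be sketched as follows. The commit does not say which type the fix uses, so `uintptr_t` and the helper name `bit_mask` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Build a one-bit mask in a word-sized integer. Writing `1 << bit`
   would shift an int, which is undefined when the result does not fit
   in int; casting first makes the shift happen at uintptr_t width. */
uintptr_t bit_mask(unsigned bit) {
    assert(bit < sizeof(uintptr_t) * 8);
    return (uintptr_t)1 << bit;
}
```

This is the kind of shift that ubsan reports at run time when the literal is left as a plain `int`.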
Matthew Flatt
c7f4261611 fix ephemerons when dirty and reachable during counting
Part of the repair makes it ok to re-sweep an ephemeron, which is more
consistent with everything else.

original commit: 2c11bb39129b1492108390a704eb08deaa5d6bcc
2020-04-28 09:02:44 -06:00
Matthew Flatt
a9e37d0548 sync simpler handling of tc U, V, W, X, Y
They apparently don't need to be preserved across a GC.

original commit: 830d176bdaf0c19c44e5f4037da0de621d3d9957
2020-04-26 20:13:54 -06:00
Matthew Flatt
120082f3f9 add list-assuming-immutable?
Build in a Racket-style `list?` using GC cooperation to make recording
the result cheaper.

original commit: 32189af3e4dfc3596fba3163fd1a8295b830448b
2020-04-25 15:33:56 -06:00
Matthew Flatt
7ba7a815b0 tweak copy-vs-mark dispatching
The C compiler doesn't generate a tail call in a place where I
expected one, and maybe it's better to branch at the call site anyway.

original commit: 70fa8e7f7bd891c548c877cabdd15073aa2aa01b
2020-04-24 10:20:50 -06:00
Matthew Flatt
752ee94563 avoid fragmentation at the chunk level
original commit: 5b52a846af7f5d9c030e6dc71f46d83b3f1b8e4c
2020-04-23 17:25:03 -06:00
Matthew Flatt
d755dbc00f cs: fix phantom bytes effect on maximum-memory-bytes
original commit: 78f2c1e3ee1329f44742a23c28a76538eef8cbdd
2020-04-22 16:30:47 -06:00
Matthew Flatt
f53f20b5b9 GC marking (non-copying) mode
Change the GC so that it can mark and sweep objects in-place, instead
of always copying. This change is helpful for reducing peak memory
use while performing a collection on a large, old heap.

Some non-copying support was already in place for locked objects,
but the new implementation is faster and more general. As an
alternative to locking, the storage manager now provides "immobile"
allocation (currently only for bytevectors, vectors, and boxes),
which allocates an object that won't move but that can be GCed if
it's not referenced. A locked object is an object that has been
immobiled and that is on a global list --- mostly the old,
non-scalable implementation of locked objects brought back, since
immobile objects cover the cases that need to scale.

original commit: aecb7b736cb1d52764c292fa6364a674958dfde3
2020-04-22 07:10:02 -06:00
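The mark-in-place idea from f53f20b5b9 can be illustrated with a per-segment mark bitmap. This is only a sketch under assumed names and sizes (`SEG_WORDS`, `segment_marks`, `mark_in_place`); the actual segment layout in the storage manager is not described in the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define SEG_WORDS 1024   /* hypothetical number of words per segment */

/* One mark bit per word of a segment (hypothetical layout). */
typedef struct {
    unsigned char mark_bits[SEG_WORDS / CHAR_BIT];
} segment_marks;

void clear_marks(segment_marks *m) {
    assert(m != NULL);
    memset(m->mark_bits, 0, sizeof m->mark_bits);
}

/* Returns 1 if the object at word_index was already marked; otherwise
   marks it in place and returns 0, so the caller knows to trace it. */
int mark_in_place(segment_marks *m, size_t word_index) {
    assert(word_index < SEG_WORDS);
    unsigned char bit = (unsigned char)(1u << (word_index % CHAR_BIT));
    unsigned char *byte = &m->mark_bits[word_index / CHAR_BIT];
    if (*byte & bit) return 1;
    *byte |= bit;
    return 0;
}
```

Because the object stays where it is, no second copy exists during the collection, which is where the reduction in peak memory use comes from.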
Matthew Flatt
f4de537e1c gc: generate sweep_dirty_object
The `sweep_dirty_intersecting` function still had hand-implemented
sweep cases.

original commit: c51b46b3cc71ed0dbc523071dce3cc496965e0b6
2020-04-18 10:40:15 -06:00
Matthew Flatt
c4ffe39efb fix leak related to object counts
When collecting to the maximum generation with object counts enabled,
a structure type would effectively become permanently reachable.

Also, add `bytes-finalized` to report how many bytes were associated
with guardian-based finalization by the most recent collection.

original commit: 852f5e2de95a26d3500321c4d4d732407945a57a
2020-04-16 16:16:13 -06:00
Matthew Flatt
63baf24ad5 repairs for locking
Fix clearing of locked-object information and copying adjacent pairs.

original commit: 53d092c50c1c24017c52b6e002e6073b81747e09
2020-04-04 16:05:20 -06:00
Matthew Flatt
5458323280 fix segment initialization for new fields
original commit: 90f358a2a33f90d9b64b6750988f679a6fcfcc7d
2020-04-04 12:43:04 -06:00
Matthew Flatt
afebbdd6a9 convert GC to "mkgc.ss" implementation
Replace repetitive C code in "gc.c" and "vfasl.c" with an
implementation using a little "Parenthe-C" language, which is a
somewhat declarative description of object tracing. From that
description, we generate different kinds of tracing functions, such as
the copy function or the sweep function.

The little language is still basically C, just with parentheses and
parameterization that is much better than trying to use the C
preprocessor. (The "mkgc.ss" file includes the compiler from
Parenthe-C to C.)

Besides replacing existing code, we also generate a new traversal to
implement `compute-object-sizes`. Finally, the GC can now perform a
fused `collect` and `compute-object-sizes` in a single traversal.

Also improve the way that locked objects are detected during GC. This
can make a significant difference (on the order of 10-20% for a full
collection) when locked objects are long-lived.

original commit: de1f5c41d729ac75822a1f1e633ec6d042c883dc
2020-04-04 10:21:16 -06:00
Matthew Flatt
8656bbae7e fix ephemeron allocation
Only half(!) of the needed space was actually allocated. The extra
space is only used after a GC, however, and a GC makes the extra room,
so that's why things haven't fallen over completely, but that's more
subtle than intended.

original commit: 3d72bc14b9247d6764809cb651403dbb4063a905
2020-04-04 10:01:04 -06:00
Matthew Flatt
f828cb1eaa fix ephemeron-key tracking in a segment with locked objects
original commit: 9d1252b176e972f92030599dae0ce159c9d36c5b
2020-04-01 07:53:32 -06:00
Matthew Flatt
de465e4f92 fix vfasl problems
Fix problems with record meta-types and symbol interning interleaved
with vfasl loading.

original commit: 2d98d94b3c4d634ba882f10eaebc627a5d9a1ccd
2020-03-28 08:34:48 -06:00
Matthew Flatt
c920f3953d collect in main thread when active
For a collect rendezvous, call the collect-notify handler in
the main thread if it is active. A collect-notify handler can
then make sure the main thread is active and try again, if
that's useful to an application.

original commit: 0bc286e81827f029dd02a3627a192edd053b3b91
2020-03-23 15:32:00 -06:00
Matthew Flatt
5f57648104 add call-in-continuation
This operation effectively allows sending an expression back to a
continuation, instead of just a value. It's the same as Marc Feeley's
`continuation-slice` operation, but adjusted slightly to support
continuation attachments.

original commit: d0e36e72d20a6eaa5d9d8b795da5e77abde75289
2020-03-12 04:48:39 -06:00
Matthew Flatt
d2961790b0 add fasl terminator
While "\44\26\2\f6" currently works as a terminator for non-compressed
fasl streams, the working byte sequence varies as the fasl format
changes. Add "\177" as a simpler and unchanging terminator.

original commit: 332019360491be6cedd2063c9a8056183d764bbb
2020-03-05 17:05:22 -07:00
Matthew Flatt
5b7f4e2fd8 unbreak Windows build
original commit: 6c062f550486dfb9b25dfc62f6d1a829bbce1d1b
2020-02-22 19:41:02 -07:00
Matthew Flatt
995e53ca71 Merge github.com:cisco/ChezScheme
original commit: 8cf52012e2a7b5928cb2602bb17e0128ae0f2776
2020-02-22 15:18:47 -07:00
dybvig
d0b405ac8b library-manager, numeric, and bytevector-compress improvements
- added invoke-library
    syntax.ss, primdata.ss,
    8.ms, root-experr*,
    libraries.stex, release_notes.stex
- updated the date
    release_notes.stex
- libraries contained within a whole program or library are now
  marked pending before their invoke code is run so that invoke
  cycles are reported as such rather than as attempts to invoke
  while still loading.
    compile.ss, syntax.ss, primdata.ss,
    7.ms, root-experr*
- the library manager now protects against unbound references
  from separately compiled libraries or programs to identifiers
  ostensibly but not actually exported by (invisible) libraries
  that exist only locally within a whole program.  this is done by
  marking the invisibility of the library in the library-info and
  propagating it to libdesc records; the latter is checked upon
  library import, visit, and invoke as well as by verify-loadability.
  the import and visit code of each invisible library no longer complains
  about invisibility since it shouldn't be reachable.
    syntax.ss, compile.ss, expand-lang.ss,
    7.ms, 8.ms, root-experr*, patch*
- documented that compile-whole-xxx's linearization of the
  library initialization code based on static dependencies might
  not work for dynamic dependencies.
    system.stex
- optimized bignum right shifts so the code (1) doesn't look at
  shifted-off bigits if the bignum is positive, since it doesn't
  need to know in that case if any bits are set; (2) doesn't look
  at shifted-off bigits if the bignum is negative if it determines
  that at least one bit is set in the bits shifted off the low-order
  partially retained bigit; (3) quits looking, if it must look, for
  one bits as soon as it finds one; (4) looks from both ends under
  the assumption that set bits, if any, are most likely to be found
  toward the high or low end of the bignum rather than just in the
  middle; and (5) doesn't copy the retained bigits and then shift;
  rather shifts as it copies.  This leads to dramatic improvements
  when the shift count is large and often significant improvements
  otherwise.
    number.c,
    5_3.ms,
    release_notes.stex
- threaded tc argument through to all calls to S_bignum and
  S_trunc_rem so they don't have to call get_thread_context()
  when it might already have been called.
    alloc.c, number.c, fasl.c, print.c, prim5.c, externs.h
- added an expand-primitive handler to partially inline integer?.
    cpnanopass.ss
- added some special cases for basic arithmetic operations (+, -, *,
  /, quotient, remainder, and the div/div0/mod/mod0 operations) to
  avoid doing unnecessary work for large bignums when the result
  will be zero (e.g., multiplying by 0), the same as one of the
  inputs (e.g., adding 0 or multiplying by 1), or the additive
  inverse of one of the inputs (e.g., subtracting from 0, dividing
  by -1).  This can have a major beneficial effect when operating
  on large bignums in the cases handled.  also converted some uses
  of / into integer/ where going through the former would just add
  overhead without the possibility of optimization.
    5_3.ss,
    number.c, externs.h, prim5.c,
    5_3.ms, root-experr, patch*,
    release_notes.stex
- added a queue to hold pending signals for which handlers have
  been registered via register-signal-handler so up to 63 (configurable
  in the source code) unhandled signals are buffered before the
  handler has to start dropping them.
    cmacros.ss, library.ss, prims.ss, primdata.ss,
    schsig.c, externs.h, prim5.c, thread.c, gc.c,
    unix.ms,
    system.stex, release_notes.stex
- bytevector-compress now selects the level of compression based
  on the compress-level parameter.  Prior to this it always used a
  default setting for compression.  the compress-level parameter
  can now take on the new minimum in addition to low, medium, high,
  and maximum.  minimum is presently treated the same as low
  except in the case of lz4 bytevector compression, where it
  results in the use of LZ4_compress_default rather than the
  slower but more effective LZ4_compress_HC.
    cmacros.ss, back.ss,
    compress_io.c, new_io.c, externs.h,
    bytevector.ms, mats/Mf-base, root-experr*
    io.stex, objects.stex, release_notes.stex
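
Point (5) of the bignum right-shift item above (shift while copying instead of copying and then shifting) can be sketched as below. The bigit width, the least-significant-first ordering, and the name `shift_right_copy` are assumptions for illustration; the sketch handles only the magnitude and ignores the sign handling from points (1) through (4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift a magnitude right by `shift` bits, writing the result directly
   into dst instead of copying first and shifting afterwards.
   Bigits are 32 bits, least significant first (hypothetical layout).
   Returns the number of bigits written to dst. */
size_t shift_right_copy(uint32_t *dst, const uint32_t *src, size_t n,
                        size_t shift) {
    assert(dst != NULL && src != NULL);
    size_t whole = shift / 32;               /* bigits dropped entirely */
    unsigned part = (unsigned)(shift % 32);  /* remaining bit shift */
    if (whole >= n) return 0;
    size_t out = n - whole;
    for (size_t i = 0; i < out; i++) {
        uint32_t lo = src[i + whole] >> part;
        /* Bring in the low bits of the next bigit; skip when part == 0,
           because shifting a 32-bit value by 32 is undefined. */
        uint32_t hi = 0;
        if (part != 0 && i + whole + 1 < n)
            hi = src[i + whole + 1] << (32 - part);
        dst[i] = lo | hi;
    }
    return out;
}
```

Each source bigit is read once and each destination bigit is written once, so a large shift count also means less data is touched.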

original commit: 72d90e4c67849908da900d0b6249a1dedb5f8c7f
2020-02-21 13:48:47 -08:00
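The pending-signal queue added in d0b405ac8b buffers up to 63 signals before dropping. One way such a bound can arise is a 64-slot ring buffer that keeps one slot empty; that structure, and the names `signal_queue`, `queue_push`, and `queue_pop`, are assumptions for illustration rather than the layout in schsig.c. The sketch is single-threaded and omits the care a real signal handler needs.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SLOTS 64   /* one slot stays empty, so 63 signals fit */

typedef struct {
    int data[QUEUE_SLOTS];
    size_t head;   /* next slot to dequeue */
    size_t tail;   /* next slot to enqueue */
} signal_queue;

void queue_init(signal_queue *q) {
    assert(q != NULL);
    q->head = q->tail = 0;
}

/* Returns 1 if the signal was buffered, 0 if it had to be dropped. */
int queue_push(signal_queue *q, int sig) {
    size_t next = (q->tail + 1) % QUEUE_SLOTS;
    if (next == q->head) return 0;   /* full: drop the signal */
    q->data[q->tail] = sig;
    q->tail = next;
    return 1;
}

/* Returns 1 and stores the oldest signal in *sig, or 0 if empty. */
int queue_pop(signal_queue *q, int *sig) {
    if (q->head == q->tail) return 0;
    *sig = q->data[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    return 1;
}
```

Signals are delivered to the handler in arrival order, and only arrivals beyond the buffered limit are lost.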
Matthew Flatt
745482e3e4 vfasl: repairs for fcallables
A 0 relocation is used by fcallable code as a recognizable cookie, and
its relocations must be preserved.

original commit: 38fb3fdf75cf6540d6bd2568f015af6272d22995
2020-02-20 13:24:47 -07:00
Matthew Flatt
5d45d6dca2 adjust event-detour path again to apply more often
Instead of constraining the use of event-detour so much, make it merely
unlikely that the detour will have to allocate when used in a loop
that otherwise doesn't allocate. We'll only have to allocate if the
available stack space turns out to be too small --- and if we do
allocate, it's not the end of the world.

original commit: f1dbed82df415c18c8304bedcee2ecf4912badc7
2020-02-09 09:43:26 -07:00
Matthew Flatt
baf3bba9de constrain smaller trap-check code to avoid allocation
Having the trap check allocate is questionable, since it can be
triggered during a loop that otherwise performs no allocation. Also,
on platforms where at most 1 argument is passed in a register, then
sending two arguments to the event handler could potentially need
stack space that isn't there. So, constrain the smaller trap-check
code to cases where no stack space is needed and where no allocation
happens unless the wrong number of arguments are provided.

original commit: 260a7ef5bc0bf851d9848587b0a78bdb4aab59f8
2020-02-07 15:27:07 -07:00
Matthew Flatt
d4981dd8c3 less code for trap checks
When a procedure starts with a trap check, move the check to the very
beginning, even before checking the argument count. That way, event
detection can turn into a compact jump to an event handler, instead of
inserting a general call to `$event` in the procedure body.

original commit: 06b12d505698a2378734689370bb9e0f8eda06b9
2020-02-07 10:56:15 -07:00
Matthew Flatt
27e21e6e7d code inspector: improvements to reloc reporting
Fix 'reloc to avoid a crash on static-generation code, and add
'reloc+offset to report an offset for each entry.

original commit: 4d4195044377f9c619cfb46056e365044069d5bc
2020-01-29 16:22:52 -07:00
Matthew Flatt
26ff90e8e6 more compact return points for function calls
In the general form of a function call, the return point embeds 4
words of information: offset to the start of the enclosing function,
frame size, live-variable mask, and multiple-value return address. In
the common case, however, the multiple-value return address is either
the same as the return address or it is a `values-error` library
function, and the frame size and live-variable mask fit into a word
with bits to spare. This patch implements a more compact return point
for that common case, which shrinks the 4 words to 2 and also avoids a
relocation (= 1 more word).

Multiple-value returns are more complex with this change (i.e.,
require more code), since they must check whether the return point is
compact or not. But multiple-value returns are far less common than
function calls, so saving function-call space is a clear win.

Overall, this change tends to reduce code size by about 10% on x86_64.

original commit: 1f53b5eabef966db01086cb32e544bbf8deacfca
2020-01-24 19:19:32 -07:00
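The compact return point in 26ff90e8e6 relies on the frame size and live-variable mask fitting into one word with bits to spare. The commit message does not give the bit layout, so the field widths and all names below (`FRAME_BITS`, `MASK_BITS`, `pack_compact`, and so on) are hypothetical; the sketch only shows the general packing technique and the compact-or-not check that multiple-value returns must perform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of a compact return-point word:
   bit 0      : 1 = compact form
   bits 1..8  : frame size in words (up to 255)
   bits 9..63 : live-variable mask (up to 55 bits) */
#define FRAME_BITS 8
#define MASK_BITS  55

/* Returns 1 if the frame size and mask fit the compact form. */
int fits_compact(uint64_t frame_size, uint64_t live_mask) {
    return frame_size < (UINT64_C(1) << FRAME_BITS)
        && live_mask  < (UINT64_C(1) << MASK_BITS);
}

/* Pack both fields into one word and tag it as compact. */
uint64_t pack_compact(uint64_t frame_size, uint64_t live_mask) {
    assert(fits_compact(frame_size, live_mask));
    return (live_mask << (FRAME_BITS + 1)) | (frame_size << 1) | 1u;
}

/* The check a multiple-value return would have to make first. */
int is_compact(uint64_t word) { return (int)(word & 1u); }

uint64_t compact_frame_size(uint64_t word) {
    return (word >> 1) & ((UINT64_C(1) << FRAME_BITS) - 1);
}

uint64_t compact_live_mask(uint64_t word) {
    return word >> (FRAME_BITS + 1);
}
```

A call site whose values do not satisfy `fits_compact` would fall back to the general four-word form.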
dybvig
48db0a9405 various library-manager improvements including the ability to verify
loadability without actually loading; also, support for unregistering
guarded objects.
- improved error reporting for library compilation-instance errors:
  now including the name of the object file from which the "wrong"
  compilation instance was loaded, if it was loaded from (or compiled
  to) an object file and the original importing library, if it was
  previously loaded from an object file due to a library import.
    syntax.ss, 7.ss, interpret.ss,
    8.ms, root-experr*
- removed situation and for-input? arguments from $make-load-binary,
  since the only consumer always passes 'load and #f.
    7.ss,
    scheme.c
- $separate-eval now prints the stderr and stdout of the subprocess
  to help in diagnosing separate-eval and separate-compile issues.
    mat.ss
- added unregister-guardian, which can be used to unregister
  the unresurrected objects registered with any guardian.  guardian?
  can be used to distinguish guardian procedures from other objects.
    cp0.ss, cmacros.ss, cpnanopass.ss, ftype.ss, primdata.ss,
    prims.ss,
    gcwrapper.c, prim.c, externs.h,
    4.ms, primvars.ms
    release_notes.stex
    smgmt.stex, threads.stex
- added verify-loadability.  given a situation (visit, revisit,
  or load) and zero or more pathnames (each of which may be optionally
  paired with a library search path), verify-loadability checks
  whether the set of object files named by those pathnames and any
  additional object files required by library requirements in the
  given situation can be loaded together.  it raises an exception
  in each case where actually attempting to load the files would
  raise an exception and additionally in cases where loading files
  would result in the compilation or loading of source files in
  place of the object files.  if the check is successful,
  verify-loadability returns an unspecified value.  in either case,
  although portions of the object files are read, none of the
  information read from the object files is retained, and none of
  the object code is read, so there are no side effects other than
  the file operations and possibly the raising of an exception.
  library and program info records are now moved to the top of each
  object file produced by one of the file compilation routines,
  just after recompile info, with a marker to allow verify-loadability
  to stop reading once it reads all such records.  this change is
  not entirely backward compatible; the repositioning of the records
  can be detected by a call to list-library made from a loaded file
  before the definition of one or more libraries.  it is fully
  backward compatible for typical library files that contain a
  single library definition and nothing else.  adding this feature
  required changes to the object-file format and corresponding
  changes in the compiler and library manager.  it also required
  moving cross-library optimization information from library/ct-info
  records (which verify-loadability must read) to the invoke-code
  for each library (which verify-loadability does not read) to
  avoid reading and permanently associating record-type descriptors
  in the code with their uids.
    compile.ss, syntax.ss, expand-lang.ss, primdata.ss, 7.ss,
    7.ms, misc.ms, root-experr*, patch*,
    system.stex, release_notes.stex
- fixed a bug that bit only with the compiler compiled at
  optimize-level 2: add-library/rt-records was building a library/ct-info
  wrapper rather than a library/rt-info wrapper.
    compile.ss
- fixed a bug in visit-library that could result in an indefinite
  recursion: it was not checking to make sure the call to $visit
  actually added compile-time info to the libdesc record.  it's not
  clear, however, whether the libdesc record can be missing
  compile-time information on entry to visit-library, so the code
  that calls $visit (and now checks for compile-time information
  having been added) might not be reachable.  ditto for
  revisit-library.
    syntax.ss
    syntax.ss, primdata.ss,
    7.ms, root-experr*, patch*,
    system.stex, release_notes.stex
- added some argument-error checks for library-directories and
  library-extensions, and fixed up the error messages a bit.
    syntax.ss,
    7.ms, root-experr*
- compile-whole-program now inserts the program record into the
  object file for the benefit of verify-loadability.
    syntax.ss,
    7.ms, root-experr*
- changed 'loading' import-notify messages to the more precise
  'visiting' or 'revisiting' in a couple of places.
    syntax.ss,
    7.ms, 8.ms

original commit: b911ed47190727b0e1d6a88c0e473d1757accdcd
2020-01-23 10:43:17 -08:00
Matthew Flatt
45381612b2 fix popcount support to work on Windows
Avoid RDI, since it's preserved in the Windows ABI.

original commit: 68b2f597ec67ed8752998807bd0c9fc66667c752
2020-01-11 16:41:46 -07:00
Matthew Flatt
540c58bbe8 use POPCNT instruction when available on x86_64
On x86_64, a POPCNT instruction is usually available, and it can speed
up `fxpopcount` operations by a factor of 2-3.

Since POPCNT isn't always available, code using `fxpopcount` is
compiled to a call to a generic implementation. The linker substitutes
a POPCNT instruction when it determines at runtime that POPCNT is
available.

Some measurements on a 2018 MacBook Pro (2.7 GHz Core i7) using the
program below:

 popcnt = this implementation, POPCNT discovered
 nocnt  = this implementation, POPCNT considered unavailable
 optcnt = compile to use POPCNT directly (no linker work)
 cpcnt  = compile to inlined generic (no linker work, no POPCNT)

Since the generic implementation is always a 64-bit popcount, it's not
as good as an inlined version for `fxpopcount32`, but otherwise the
link-edit approach to POPCNT works well:

            fxpopcount      fxpopcount32
 popcnt:       0.098s
 nocnt:        0.284s
 optcnt:       0.109s  [slower means noise?]
 cpcnt:        0.279s         0.188s

 (optimize-level 3)
 (time
  (let loop ([v #f] [i 100000000])
    (if (fx= i 0)
        v
        (loop (fxpopcount i) (fx- i 1)))))

original commit: 5f090e509f8fe5edc777ed9f0463b20c2e571336
2020-01-11 11:04:48 -07:00
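The generic fallback mentioned in 540c58bbe8 is always a 64-bit popcount. A common portable way to write one is the parallel bit-summing technique below; it is a stand-in, not necessarily the routine the commit uses. The commit patches call sites at link time, whereas this sketch uses a function pointer (`popcount_impl`, a hypothetical name) purely to illustrate choosing an implementation at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 64-bit population count using parallel bit sums. */
unsigned popcount64_generic(uint64_t x) {
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
    x = (x & UINT64_C(0x3333333333333333))
      + ((x >> 2) & UINT64_C(0x3333333333333333));
    x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
    return (unsigned)((x * UINT64_C(0x0101010101010101)) >> 56);
}

/* Hypothetical dispatch slot: starts at the generic routine and can be
   repointed once at startup if a faster one is detected. */
typedef unsigned (*popcount_fn)(uint64_t);
static popcount_fn popcount_impl = popcount64_generic;

void install_popcount(popcount_fn fn) {
    assert(fn != 0);
    popcount_impl = fn;
}

unsigned popcount64(uint64_t x) { return popcount_impl(x); }
```

An indirect call like this adds overhead on every use, which is one reason a link-edit substitution of the instruction itself is attractive.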
Matthew Flatt
81ea967aea add stencil vectors and fxpopcount
original commit: ec766fca869b5e0407c4f54230b72619af73b40b
2020-01-06 05:34:28 -07:00
Matthew Flatt
2efa342323 speed up oblist
Instead of using `%` to compute the index into an oblist, use a power
of 2 for the oblist length and bit masking to compute an index. (Maybe
the old hashing function was bad; the current hashing function should
produce good hash-code variation at the level of bits.) Also, make the
oblist array a little sparser to reduce bucket chaining.

original commit: fb87fcb8e47902b80654789d059a25bd4a7a8def
2020-01-01 15:08:52 -07:00
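The indexing change in 2efa342323 replaces `%` with a bit mask, which is valid only when the table length is a power of 2. A minimal sketch of the technique follows; `bucket_index` and `is_power_of_two` are hypothetical helper names, not functions from the runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if n is a power of two (n > 0). */
int is_power_of_two(size_t n) { return n != 0 && (n & (n - 1)) == 0; }

/* Bucket index for a table whose length is a power of two: masking
   with (length - 1) yields the same result as hash % length, without
   a division. */
size_t bucket_index(uintptr_t hash, size_t length) {
    assert(is_power_of_two(length));
    return (size_t)(hash & (uintptr_t)(length - 1));
}
```

Masking keeps only the low bits of the hash, so this approach depends on the hash function varying well in those bits, as the commit message notes.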
Matthew Flatt
bbbda808e5 propagate $AR and $ARFLAGS to submodule builds
original commit: 652aed04f243ce4a7f5c71f18d1754952380a479
2019-12-31 07:58:12 -07:00
Matthew Flatt
69444da5a0 clear temporary bignum registers
After a bignum computation using temporary thread registers W, U, or V
is complete, clear the register. (The X and Y registers hold only
small bignums, so clearing them doesn't matter in the same way.)

original commit: a9e11fcf9e86aee5d149764476e1fabfeee12f84
2019-12-30 07:03:19 -07:00
Matthew Flatt
c8ea435c85 make strings within symbols always immutable
original commit: 7859d16dac7bae6ab836e2200003583dc572deba
2019-12-16 17:11:49 -07:00
Matthew Flatt
f858bec12a skip <xlocale.h> on Linux
It's not available with musl, either; musl intentionally
doesn't provide a preprocessor test, and we're avoiding
(for now) `configure`-time tests in the style of autoconf.

original commit: a9bfb72027fc83ed6bb690d033bc6fed0629dba7
2019-12-11 14:41:07 -07:00
Matthew Flatt
01a40286c2 propagate CC and CPPFLAGS to ZLib and LZ4 builds
original commit: cbb7c5f21a879ee90293c3abf99d344c4fc42b7f
2019-12-09 08:34:50 -07:00