Since Chez Scheme now performs the kind of closure conversion that
Racket does --- ensuring that a closure is not allocated if it is
bound to an identifier that is used only in application positions ---
the variant in schemify is no longer run. The hacky macro-based
lifter in the "rumble" layer can also go.
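As a sketch of the condition (an illustrative example, not code from
this change), no closure needs to be allocated for `loop` below,
because it is bound directly to a `lambda` and referenced only in
application position; the compiler can pass its free variables along
as extra arguments instead:
(define (count-up n step)
  (letrec ([loop (lambda (i acc)
                   (if (>= i n)
                       (reverse acc)
                       (loop (+ i step) (cons i acc))))])
    (loop 0 '())))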
The lifting pass is still preserved in schemify, because it is still
useful to cify. It's not clear whether interpreter mode (which is used
during macro expansion for compile-time code that doesn't cross a
module boundary) is better off with or without schemify's lift, but
it's gone for now.
Replace the vfasl writer (which was in C) with a new implementation
(in Scheme). The main result is that the vfasl writer can be used in
cross-build mode.
Racket uses the vfasl format for its boot images, because they can
load faster --- cutting the Chez Scheme plus boot files startup time
in half, which saves about 40msec on a typical machine. That's not
enough to matter for something like DrRacket, but it can matter for
small Racket scripts. Formerly, cross builds disabled vfasl
generation.
A vfasl file is roughly an image of code and data as it will appear in
memory, and a relatively fast linking step makes the image work in a
running process. The old implementation was in C because it reused GC
structures and code, treating fasl creation as copying objects into a
vfasl image instead of a new generation. The new implementation is
more like a fasl reader, loading objects into a vfasl image instead of
the live heap. The two implementations are about the same amount of
code and both involve a certain amount of repeated implementation
(i.e., imitating a collection or fasl load), but the Scheme
implementation is more flexible and works for cross compilation.
New `#%app/no-return` and `#%app/value` functions at the Chez Scheme
level allow schemify to communicate that function calls will not
return or will return a single value. The schemify pass may have this
information because a Racket-level primitive is declared that way
(such as `error` or `raise-argument-error` for no-return, or most
functions for single-valued) or because single-valuedness is inferred.
There's currently no inference for no-return functions, because those
are relatively rare. An `#%app/value` is used by schemify only for
imported, non-inlined functions, since cp0 can already deal with local
functions and primitives.
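As an illustrative Racket-level sketch (the module and function names
are hypothetical), a call to an imported function like `lookup` below
can be annotated by schemify as `#%app/value` if it is not inlined
across the module boundary, so the `let` binding needs no
multiple-values check, while the `raise-argument-error` call can be
treated as no-return:
#lang racket/base
(module helper racket/base
  (provide lookup)
  (define (lookup ht k) (hash-ref ht k #f)))
(require (submod "." helper))
(define (get ht k)
  (unless (hash? ht)
    (raise-argument-error 'get "hash?" ht)) ; declared no-return
  (let ([v (lookup ht k)])                  ; imported, single-valued call
    (and v (list v))))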
There's a start here at adapting the "optimize.rktl" test suite for CS
--- and that effort triggered these improvements plus some other
low-hanging fruit. But a lot more is needed to adapt "optimize.rktl"
and to make some additional optimizations happen.
Avoid a global table to register structure procedures, and instead use
a wrapper procedure. At the same time, adjust schemify to more
aggressively inline structure operations, which can avoid a significant
performance penalty for local structure types.
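For a rough idea of the kind of code that benefits (an illustrative
example, not from this change), the accessors of a locally defined
structure type like `posn` can be inlined to direct field references
instead of going through separately registered structure procedures:
(struct posn (x y))
(define (taxi-distance a b)
  (+ (abs (- (posn-x a) (posn-x b)))
     (abs (- (posn-y a) (posn-y b)))))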
Closes #3535
Implement weak and ephemeron generic hashtables, and repair weak and
ephemeron `eqv?` hashtables to be weak on numbers.
Racket's implementation of weak `equal?`-based tables now uses weak
generic tables. Datum interning, which needs a weak `equal?`-based
table and was the bottleneck for unfasling literal data, is much
faster. DrRacket's footprint is 1% smaller.
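The datum-interning pattern is roughly the following sketch (using
Racket's `make-weak-hash` and ephemerons for illustration; this is not
the internal implementation):
(define interned (make-weak-hash)) ; weak, `equal?`-based keys
(define (intern v)
  (define e (hash-ref interned v #f))
  (or (and e (ephemeron-value e))
      (begin
        ;; Key the canonical copy by itself via an ephemeron, so the
        ;; table entry can be collected once no other reference remains.
        (hash-set! interned v (make-ephemeron v v))
        v)))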
Use the fixnum-mixing idea of dc82685ce0 on pointers, too.
This change provides a significant improvement on the "hash-mem.rkt"
benchmark, for example, because the allocated objects have a size that
triggers a repeating pattern in the low bits of the allocated address.
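The mixing idea, in a purely hypothetical sketch (neither the
constants nor the code are from the actual change), is to fold higher
address bits into the low bits, so that a repeating low-bit pattern in
allocation addresses cannot collapse hash codes into a few buckets:
(define (mix-address-bits a) ; `a` stands for an address-derived fixnum
  (bitwise-xor a
               (arithmetic-shift a -13)
               (arithmetic-shift a -29)))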
Expose machine-level operations that stay within the fixnum range,
which can be useful for things like hash-code computations where it's
ok to just lose bits. Operations like `unsafe-fx+` turn out to do that
already in the current implementation, but with no guarantee (and with
no checking of arguments).
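For example, assuming the wraparound fixnum operations that
`racket/fixnum` exposes for this purpose (such as `fx*/wraparound`), a
hash-combining step can discard overflow bits instead of ever leaving
the fixnum range:
(require racket/fixnum)
(define (hash-combine a b)
  ;; Overflow bits are simply dropped, which is fine for hashing.
  (fxxor (fx*/wraparound a 31) b))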
For Racket BC, before this commit, JIT-inlined `fxlshift` incorrectly
handled a negative second argument like `arithmetic-shift` instead of
erroring.
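To illustrate the difference (a sketch based on the description above):
(require racket/fixnum)
(arithmetic-shift 8 -1) ; => 4 (a negative shift means shift right)
(fxlshift 8 -1)         ; must raise an error; the buggy JIT-inlined
                        ; version produced 4 instead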
The `DIST_CATALOGS_q` variable was propagated incorrectly, which
mainly affected the `client-compile-any` target that is used for
creating source bundles.
Finally give in and add an option to compile a module as unsafe. This
was going to be easy, since the option already exists at the linklet
level, but it turns out that a lot of plumbing was needed to propagate
the argument, and even more to preserve unsafety with cross-module
inlining.
Macros can effectively conditionalize their expansion in unsafe mode
by generating the pattern
(if (variable-reference-from-unsafe? (#%variable-reference))
<unsafe variant>
<safe variant>)
The compiler will only keep one of the two variants, because it
promises to optimize `(variable-reference-from-unsafe?
(#%variable-reference))` to a literal boolean. The expander will still
expand both variants, however, so avoid putting code in both variants
that itself can have safety variants.
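A small, self-contained sketch of the pattern in use (the macro name
is hypothetical):
(require racket/unsafe/ops)
(define-syntax-rule (fast-vector-ref v i)
  (if (variable-reference-from-unsafe? (#%variable-reference))
      (unsafe-vector-ref v i) ; kept when the module is compiled as unsafe
      (vector-ref v i)))      ; kept otherwise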
To make room in the type encoding, remove immutable fxvectors from
Chez Scheme --- which had been added just to go along with immutable
strings, vectors, and bytevectors, but immutable fxvectors do not seem
useful, and they have no counterpart in Racket.
Change the way boot images are sent to Chez Scheme by the Racket CS
wrapper, especially in the case where boot images are embedded in an
executable (which is always true for a distribution build). The revised
approach avoids a little filesystem work, and it may help Chez Scheme
pull bytes in faster.
The collector tries to match its degree of parallelism to the number
of active threads, but make sure that it doesn't fall back to an
ownership-mangling, non-parallel collection when all but one place
happens to be stalled.
Instead of one big lock, have one big lock and one small lock for just
manipulating allocation state. This separation better reflects the
locking needs of parallel collection, where the main collecting thread
holds the one big lock, and helper threads need to take a different
lock while changing allocation state.
This commit also includes changes to cooperate better with LLVM's
thread sanitizer. It doesn't work for parallel collection, but it
seems to work otherwise and helped expose a missing lock and a
questionable use of a global variable.
Move per-thread allocation information to a separate object, so the
needs of the collector are better separated from the thread
representation. This refactoring sets up a change in the collector to
detangle sweeper threads from Scheme threads.
Change the internal parallelism strategy for the GC to record an owner
for each allocated segment of memory, and have the owner be solely
responsible for copying or marking objects of the segment. When
sweeping, a collecting thread handles references to objects that it
owns or that have been copied or marked already, and it asks another
collecting thread to resweep an object that refers to objects owned by
that thread. At worst, an object ends up being swept by all
collecting threads, one at a time, but that's unlikely for a given
object.
The approach seems likely to scale better than a lock-based approach,
even the one that used a lightweight, CAS-based lock and retries on
lock failure.
All allocation is now thread-local, which recovers a small bit of
performance that was lost when adding thread-local allocation
alongside global allocation.
Parallelism uses thread contexts created by Chez Scheme threads (which
correspond to Racket places and future-running threads), but it
creates its own OS-level threads to perform collection. The number of
collection-helper threads is limited to the number of active Chez
Scheme threads. Only the main "sweep" pass runs in parallel --- that
is, after roots have been traversed, and before weak references and
finalization are handled --- but that's the bulk of collection work.
Also, memory-accounting collections always run single-threaded.
Improved support for thread-local allocation (and using more
fine-grained locks in fasl reading) may provide a small direct
benefit, but the change is mainly intended as setup for more
parallelism in the collector.
The `flsingle` function takes a flonum and discards precision that
wouldn't fit in a single-flonum. If single-precision arithmetic is
somehow useful, then combine flonum operations with `flsingle` to
discard precision on the result; even on Racket BC, that's likely to
perform better than using generic arithmetic on single flonums.
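A small usage sketch: compute in double precision with `racket/flonum`
operations, then round the result to the nearest single-precision
value (still represented as a flonum):
(require racket/flonum)
(flsingle (fl+ 0.1 0.2)) ; => 0.30000001192092896, i.e., the
                         ; double-precision sum rounded to single precision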
This is analogous to PLT_SETUP_OPTIONS, but for the `raco pkg update`
step. This is useful for adding an option like `--pull try` to ignore
failures to update local clones.
The top-level makefile now builds Racket CS as `racket` by default.
Use `make bc` to build Racket BC as `racketbc`. Use `make both` to
build both CS and BC (the latter with the `bc` suffix) overlaid in a
single build. By using `make both` instead of `make cs` plus `make
bc`, you can avoid redundant package downloads and documentation
rendering.
To build Racket BC as plain `racket`, use `make bc RACKETBC_SUFFIX=`,
but then you must supply `RACKETBC_SUFFIX=` to `make` consistently
every time.
The `--enable-cs` and `--enable-csdefault` flags now enable just the
CS build, as long as pb boot files are present or
`--enable-racket=...` is supplied.
Currently, `make` is the same as `make bc`, but the idea is that
`make` can become the same as `make cs` when some other pieces are in
place.
The source of the top-level makefile is now ".makefile", and it's
turned into "Makefile" using "racket/src/makemake.rkt". The
transformation makes non-GNU `make` variants and `nmake` act like GNU
make to propagate variables, which makes abstraction through targets
plus variables (admittedly an abuse of `make`) more reliable and
consistent.
Why abuse `make` this way? Because `make` variants and `nmake` are
similar enough to constitute a portable scripting language, and
one that conveniently provides a large number of entry points.