Add `record-ptr-offset` so that rktboot can use it instead of
assuming an offset of 1.
Also, introduce `type-untyped` and use it instead of `typemod` when
the intent is to leave an address unchanged by tagging. That makes the
implementation a little clearer, and it reduces the code that would
have to be changed to modify the tagging discipline (e.g., to tag by
adding instead of subtracting from an address).
A reference bytevector holds a mixture of addresses within GCable
objects and foreign addresses, where an address within a GCable object
refers to the payload of a bytevector or flvector. The GC knows to
apply a suitable offset to such a reference, so the referenced object
counts as reachable from the reference bytevector, and the reference
is updated if the object is relocated during a collection.
With this change, the restriction in Racket CS against passing
non-atomic memory to a foreign function can be lifted. For example,
`(_list i _string)` can be useful as the type of a foreign-call
argument.
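As a sketch (not from this commit; the foreign function `set_names` and
the library handle are made up), such an argument type converts a list
of Racket strings into a C `char **` argument, which may involve
non-atomic, GCable memory:
  (require ffi/unsafe)
  (define lib (ffi-lib #f)) ; use symbols from the current process
  (define set-names
    (get-ffi-obj "set_names" lib
                 (_fun (_list i _string) _int -> _void)))
  (set-names (list "a" "b" "c") 3)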
Making reference bytevectors a subtype of bytevectors is not an
obvious choice, given that writing to a reference bytevector with
byte-level operations can easily corrupt it. But this choice makes
various things simpler and easier.
Replace lots of mostly-duplicated "Mf-<platform>" and "<platform>.def"
files with just a few "Mf-unix" and "unix.def" files plus
configuration within "configure" and "workarea". Also change
"version.h" to infer more OS details (as was used for pb, anyway).
This change simplifies setting up configurations for different
platforms, and it makes it easier to share among similar
configurations.
Another try at the commit reverted by 471f18c02d, but this time with a
`SINGLE_BRANCH_FLAG` makefile variable that can be set to turn it off
for the rare system that has Git older than version 1.7.10.
Exposing `$immediate?` as just "immediate" will be useful to cptypes.
Meanwhile, introduce "fixmediate" as the term for a union of "fixnum"
and "immediate" (i.e., values that are not allocated).
The new terminology helps avoid internal inconsistencies, such as the
`Simmediatep` kernel macro meaning "immediate" while the `$immediate?`
primitive meant the union.
The predicate for a sealed structure type can be faster than a
predicate for a non-sealed structure type, and Chez Scheme takes
advantage of that opportunity.
The BC JIT could be improved to take advantage of sealed structure
types, but it currently does not.
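As a Racket-level sketch (assuming the `#:sealed` structure-type
option), `fish?` below can be compiled to a single type-descriptor
comparison, because a sealed type cannot have subtypes:
  (struct fish (weight) #:sealed)
  (fish? (fish 12)) ; no subtype chain to search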
This commit also fixes CS checking of a supertype for certain shapes
of prefab struct-type declarations.
Instead of keeping a record type's ancestry in a vector ordered from
subtype to supertype, keep the vector the other way around, and
include a record type at the end of its own vector. This order makes a
more direct test possible when checking for a known record type,
especially one without a supertype, and it's generally easier to
reason about.
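As an illustrative sketch (`rtd-ancestry` here is a stand-in name, not
the actual kernel accessor), a test for an instance of `rtd` at known
ancestry depth `d` becomes a length check plus one `eq?` comparison:
  (define (instance-of-rtd? x rtd d) ; assumes x is known to be a record
    (let ([a (rtd-ancestry (record-rtd x))])
      (and (fx> (vector-length a) d)
           (eq? (vector-ref a d) rtd))))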
The revised compilation of record predicates trades some speed in one
case for smaller generated code and speed in other cases. The case
that becomes slower is when a predicate succeeds because the record is
an immediate instance of the record type. The cases that go faster are
when a predicate fails for a non-instance record or when a predicate
succeeds for a non-immediate instance. An immediate-instance shortcut
might be worthwhile on the grounds that it's a common case for a
predicate used as a contract, but we opt for the simpler, smaller code
for now, because the difference is small.
Also, add `record-instance?`, which in unsafe mode can skip the check
that its first argument is a record, and cptypes can substitute
`record-instance?` for `record?` in some cases. This change was worked
out independently and earlier by @yjqww6, especially the cptypes part.
Related to #3679
Use a comment trick to make `nmake` see a different default target
than `make`, which saves a small amount of hassle when someone forgets
to type `nmake win`.
After pulling in patches that change the Chez Scheme version to 9.5.5
(with not much changing besides the version, since we've pulled other
patches), update the Racket version number to reflect the change to
compiled files.
Change the representation of records to keep an ancestor vector
instead of just a parent, so a record-type predicate (for a non-sealed
type) can be constant-time.
The `fake-installers[-from-built]` target drives a distribution build
like `installers[-from-built]`, but instead of building on clients,
just copies a "README" as an installer.
Since Chez Scheme now performs the kind of closure conversion that
Racket does --- ensuring that a closure is not allocated if it is
bound to an identifier that is used only in application positions ---
the variant in schemify is no longer run. The hacky macro-based
lifter in the "rumble" layer can also go.
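For example, `loop` below is bound to an identifier that is used only
in application positions, so no closure needs to be allocated for it:
  (define (count-up n)
    (let loop ([i 0])
      (if (= i n)
          i
          (loop (add1 i)))))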
The lifting pass is still preserved in schemify, because it is still
useful to cify. It's not clear whether interpreter mode (which is used
during macro expansion for compile-time code that doesn't cross a
module boundary) is better off with or without schemify's lift, but
it's gone for now.
Replace the vfasl writer (which was in C) with a new implementation
(in Scheme). The main result is that the vfasl writer can be used in
cross-build mode.
Racket uses the vfasl format for its boot images, because they can
load faster --- cutting the Chez Scheme plus boot files startup time
in half, which saves about 40msec on a typical machine. That's not
enough to matter for something like DrRacket, but it can matter for
small Racket scripts. Formerly, cross builds disabled vfasl
generation.
A vfasl file is roughly an image of code and data as it will appear in
memory, and a relatively fast linking step makes the image work in a
running process. The old implementation was in C because it reused GC
structures and code, treating fasl creation as copying objects into a
vfasl image instead of a new generation. The new implementation is
more like a fasl reader, loading objects into a vfasl image instead of
the live heap. The two implementations are about the same amount of
code and both involve a certain amount of repeated implementation
(i.e., imitating a collection or fasl load), but the Scheme
implementation is more flexible and works for cross compilation.
New `#%app/no-return` and `#%app/value` functions at the Chez Scheme
level allow schemify to communicate that function calls will not
return or will return a single value. The schemify pass may have this
information because a Racket-level primitive is declared that way
(such as `error` or `raise-argument-error` for no-return, or most
functions for single-valued) or because single-valuedness is inferred.
There's currently no inference for no-return functions, because those
are relatively rare. An `#%app/value` is used by schemify only for
imported, non-inlined functions, since cp0 can already deal with local
functions and primitives.
There's a start here at adapting the "optimize.rktl" test suite for CS
--- and that effort triggered these improvements plus some other
low-hanging fruit. But a lot more is needed to adapt "optimize.rktl"
and to make some additional optimizations happen.
Avoid a global table to register structure procedures, and instead use
a wrapper procedure. At the same time, adjust schemify to more
aggressively inline structure operations, which can avoid a significant
performance penalty for local structure types.
Closes #3535
Implement weak and ephemeron generic hashtables, and repair weak and
ephemeron `eqv?` hashtables to be weak on numbers.
Racket's implementation of weak `equal?`-based tables now uses weak
generic tables. Datum interning, which needs a weak `equal?`-based
table and was the bottleneck for unfasling literal data, is much
faster. DrRacket's footprint is 1% smaller.
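At the Racket level, datum interning is exposed through
`datum-intern-literal`, which relies on such a weak `equal?`-based
table; for example, equal strings are expected to intern to the same
(`eq?`) immutable string:
  (eq? (datum-intern-literal "apple")
       (datum-intern-literal (string-copy "apple")))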
Use the fixnum-mixing idea of dc82685ce0 on pointers, too.
This change provides a significant improvement on the "hash-mem.rkt"
benchmark, for example, because the allocated objects have a size that
triggers a repeating pattern in the low bits of the allocated address.
Expose machine-level operations that stay within the fixnum range,
which can be useful for things like hash-code computations where it's
ok to just lose bits. Operations like `unsafe-fx+` turn out to do that
already in the current implementation, but with no guarantee (and with
no checking of arguments).
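A sketch of the intended use, assuming the wraparound variants in
`racket/fixnum` (such as `fx*/wraparound`) are the exposed operations:
a hash-mixing step where any overflow bits are simply dropped:
  (require racket/fixnum)
  (define (mix-hash h v)
    (fxxor (fx*/wraparound h 31) v))
  (mix-hash 17 42)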
For Racket BC, before this commit, JIT-inlined `fxlshift` incorrectly
handled a negative second argument like `arithmetic-shift` instead of
erroring.
The `DIST_CATALOGS_q` variable was propagated incorrectly, which
mainly affected the `client-compile-any` target that is used for
creating source bundles.
Finally give in and add an option to compile a module as unsafe. This
was going to be easy, since the option already exists at the linklet
level, but it turns out that a lot of plumbing was needed to propagate
the argument, and even more to preserve unsafety with cross-module
inlining.
Macros can effectively conditionalize their expansion in unsafe mode
by generating the pattern
  (if (variable-reference-from-unsafe? (#%variable-reference))
      <unsafe variant>
      <safe variant>)
The compiler will only keep one of the two variants, because it
promises to optimize `(variable-reference-from-unsafe?
(#%variable-reference))` to a literal boolean. The expander will still
expand both variants, however, so avoid putting code in both variants
that itself can have safety variants.
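As a sketch (`checked-ref` is a made-up macro, not part of this
commit), a macro can select an unsafe operation only when the
enclosing module is compiled as unsafe:
  (require racket/unsafe/ops)
  (define-syntax-rule (checked-ref v i)
    (if (variable-reference-from-unsafe? (#%variable-reference))
        (unsafe-vector-ref v i)  ; kept when the module is compiled as unsafe
        (vector-ref v i)))       ; kept otherwise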
To make room in the type encoding, remove immutable fxvectors from
Chez Scheme --- which had been added just to go along with immutable
strings, vectors, and bytevectors, but immutable fxvectors do not seem
useful, and they have no counterpart in Racket.
Change the way boot images are sent to Chez Scheme by the Racket CS
wrapper, especially in the case where boot images are embedded in an
executable (which is always true for a distribution build). The revised
approach avoids a little filesystem work, and it may help Chez Scheme
pull bytes in faster.
The collector tries to use roughly the same amount of parallelism as
the number of active threads, but it now makes sure that it doesn't
fall back to an ownership-mangling non-parallel collection if all but
one place happens to be stalled.
Instead of one big lock, have one big lock and one small lock for just
manipulating allocation state. This separation better reflects the
locking needs of parallel collection, where the main collecting thread
holds the one big lock, and helper threads need to take a different
lock while changing allocation state.
This commit also includes changes to cooperate better with LLVM's
thread sanitizer. The sanitizer doesn't cope with parallel collection,
but it seems to work otherwise, and it helped expose a missing lock
and a questionable use of a global variable.
Move per-thread allocation information to a separate object, so the
needs of the collector are better separated from the thread
representation. This refactoring sets up a change in the collector to
detangle sweeper threads from Scheme threads.
Change the internal parallelism strategy for the GC to record an owner
for each allocated segment of memory, and have the owner be solely
responsible for copying or marking objects of the segment. When
sweeping, a collecting thread handles references to objects that it
owns or that have been copied or marked already, and it asks another
collecting thread to resweep an object that refers to objects owned by
that thread. At worst, an object ends up being swept by all
collecting threads, one at a time, but that's unlikely for a given
object.
The approach seems likely to scale better than a lock-based approach,
even the one that used a lightweight, CAS-based lock and retries on
lock failure.
All allocation is now thread-local, which recovers a small bit of
performance that was lost when adding thread-local allocation
alongside global allocation.
Parallelism uses thread contexts created by Chez Scheme threads (which
correspond to Racket places and future-running threads), but it
creates its own OS-level threads to perform collection. The number of
collection-helper threads is limited to the number of active Chez
Scheme threads. Only the main "sweep" pass runs in parallel --- that
is, after roots have been traversed, and before weak references and
finalization are handled --- but that's the bulk of collection work.
Also, memory-accounting collections always run single-threaded.
Improved support for thread-local allocation (and using more
fine-grained locks in fasl reading) may provide a small direct
benefit, but the change is mainly intended as setup for more
parallelism in the collector.
The `flsingle` function takes a flonum and discards precision that
wouldn't fit in a single-flonum. If single-precision arithmetic is
somehow useful, then combine flonum operations with `flsingle` to
discard precision on the result; even on Racket BC, that's likely to
perform better than using generic arithmetic on single flonums.
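For example (a sketch using `racket/flonum`), a single-precision-style
addition is a flonum addition followed by `flsingle`, which rounds the
result to the nearest single-precision value, still represented as a
flonum:
  (require racket/flonum)
  (flsingle (fl+ 0.1 0.2))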