Includes a new `force-host-out?` argument to `compile-to-file`.
When the host and target machines match during
"cross"-compilation (eg. M1 Mac to iOS), we still need to generate
host .so files so that the build works out.
Includes documentation notes about cross-compiling CS for iOS
and makefile improvements.
The changes also include improvements to `raco exe`.
Racket CS cannot currently read fasl files for platforms other than
the host, but `compiler/embed` has to be able to read compiled code in
order to figure out what code needs to be embedded into an output
image and which runtime paths need to be included. This change makes
it so that host code is used to figure all of that information out,
but that code is then replaced by target machine code before it is
written to the output image. The new logic only applies when the
right cross-compilation flags are set (per `cross-multi-compile?`).
A `semaphore-wait` or `semaphore-post` has a shortcut that uses a CAS
operation, which means that a future could affect a semaphore if it's
allowed to take that shortcut. But futures aren't supposed to succeed
in that way, because thread-level synchronization should generally
suspend a future. Disallow the shortcut when in a future.
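As a rough, hypothetical illustration (not taken from the change itself), a program like the following relies on the semaphore operation suspending the future rather than completing via the CAS shortcut:
```
#lang racket
(require racket/future)
;; Hypothetical example: the `semaphore-wait` inside the future should
;; block the future; the wait is then performed by a real Racket thread
;; when the future is touched, instead of via the lock-free shortcut.
(define s (make-semaphore 1))
(define f (future (lambda ()
                    (semaphore-wait s)
                    'got-it)))
(touch f) ; => 'got-it
```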
When recompiling from machine-independent form to a VM- and
platform-specific form, cross-module inlining could fail due to a
module path index being resolved in the wrong mode (loading versus
non-loading).
As a concrete example, "racket/draw/private/bitmap.rkt" tended to be
recompiled in a way that did not inline `_ubyte` as `_uint8`, which in
turn made `ptr-set!` and `ptr-ref` operations much slower, which would
make certain bitmap operations drastically slower.
Related to racket/drracket#350
Although compiling modules really should not write output, one problem
is that `parser-tools/yacc` has long reported shift/reduce conflicts
to stderr.
Only parallel mode was treating stdout/stderr output as failure, which
made `raco setup` inconsistent. Make parallel mode allow output, like
sequential mode does.
Related to #3457
In other words, follow a note on line 77 of "atomic.rkt":
;; There's a small chance that `end-atomic-callback`
;; was set by the scheduler after the check and
;; before we exit atomic mode. Make sure that rare
;; possibility remains ok.
That note was originally written in the context of changes for
futures, but it also applies to plain old thread scheduling, where
`engine-timeout` can be installed as a callback, but it's not allowed
if an engine isn't running.
This PR in a sense reverts f83cec1b04 and attempts to directly fix
the bug that the commit tries to address with an approach similar to the
original one.
The problem with the aforementioned commit is that Gosper's hack
only works efficiently when the length of `l` is small enough that
the number representation can fit into a fixnum
(so that all bit operations take constant time).
When the length of `l` is large, the number representation could
become a bignum with length proportional to the length of `l`.
This is not ideal because it causes the time complexity of the algorithm
to be `O({|l| choose k} |l|)` instead of `O({|l| choose k} k)`, which
would be a significant performance degradation when `|l|` is much
larger than `k`.
Currently, in Racket BC, the following program:
```
(srcloc->string (make-srcloc "x.rkt" #f #f 90 #f))
(srcloc->string (make-srcloc "x.rkt" #f 80 #f #f))
(srcloc->string (make-srcloc "x.rkt" #f #f #f #f))
(srcloc->string (make-srcloc "x.rkt" #f 80 90 #f))
(srcloc->string (make-srcloc "x.rkt" 70 #f 90 #f))
(srcloc->string (make-srcloc "x.rkt" 70 80 #f #f))
(srcloc->string (make-srcloc "x.rkt" 70 80 90 #f))
```
results in:
```
"x.rkt::90"
"x.rkt::80"
"x.rkt::-1"
"x.rkt::80"
"x.rkt:70:90"
"x.rkt:70:80"
"x.rkt:70:80"
```
This output is very confusing and inconsistent.
When we see "x.rkt::90", we can never be sure if it's a srcloc
whose position is 90, or a srcloc whose column is 90.
The same applies for "x.rkt:70:90".
Moreover, the srcloc "x.rkt::-1" is weird and is arguably incorrect
(see #1371).
For CS, the output would sometimes contain `#f`, and that is fixed
here by not trying to add position information when it's not available.
Non-atomic mode is questionable at best with BC, and more so with CS.
With BC, a thread swap at least copies the fragment of the C stack
that is between a foreign call and a callback; that can work with some
C libraries, but it goes wrong often enough that it really shouldn't
be relied on. On the plus side, if it's going to go wrong, at least it
tends to go wrong right away (i.e., something in the C frame tends to
get broken in normal runs).
With CS, the C stack is not captured as part of a thread, and so
non-atomic mode is even more broken. Non-atomic mode could work if you
know that no other thread is running that can involve foreign calls
and callbacks, but that assumption is usually not a good one. Worse,
when things finally go wrong, it's likely several scheduling steps and
thread interleavings away from the problem (e.g., #3459).
There's a chance that this change will stop some existing programs
from working ok on CS. It's more likely to make existing programs work
as intended on CS; the common case is that atomic mode is technically
needed even though non-atomic mode happened to work in practice with
BC.
I think that in general, `null-terminated?` should admit all patterns
that _could_ match a list, instead of just the ones that _only_ match
a list. That means that `(app f)` patterns and many others should also
be valid, though those are harder to implement, so I haven't tried
those yet.
This change is mostly for making CS consistent with BC, but in the
case of `(object-name #'x)`, it also makes BC consistent with BC
before the macro-expander rewrite.
Closes racket/pconvert#9
Commit 9a3eb15d8b broke atomic-timeout handling. As a result, for
example, using the scroll thumb on Mac OS could freeze DrRacket as
long as something is running for a canvas refresh.
Change the representation of records to keep an ancestor vector
instead of just a parent, so a record-type predicate (for a non-sealed
type) can be constant-time.
Reluctantly, with intentionally oxymoronic names, and with the key
caveat: using these requires making correct assumptions about Racket's
implementation.
With BC, a related assumption was that `unsafe-set-mcar!` and
`unsafe-set-mcdr!` mutate pairs, but that's not the case with CS. So,
adding these functions supports a kind of portability between BC and
CS.
The UTF-8-ish decoder incorrectly allowed a surrogate pair encoded as
two unpaired surrogates, and it treated an unpaired surrogate at the
end of a stream as complete instead of a potential error.
Related to #3578
Related to #1500.
This improves the following aspects of the CI config:
* The config now tracks the current stable version of Racket so
package authors don't have to remember to upgrade the config on new
releases. This is a double-edged sword, because it makes it easy to
use features of a new Racket version and potentially break
backwards-compatibility by accident. Running CI against a set of
static Racket versions doesn't have this problem since any such change
would end up failing. Maybe a better change here would be to
interpolate the current `(version)` into the CI file instead.
* The job is now set to fail on the first error it encounters on the
assumption that most packages fail due to internal issues and not due
to mismatches between Racket versions. The intent with this change is
to use fewer resources overall when possible. Additionally, the
packages' dependencies are now validated during the setup step.
* Lastly, the install step now avoids building documentation for
dependencies, which can shave off significant amounts of time.
Calling `rationalize` with `+nan.0` as the second argument was causing
a "no exact representation error." This commit changes it to produce
`+nan.0`.
There was an unexercised set of tests for `rationalize` in the test
suite which, once run, demonstrate the bug.
Those tests also specify that `rationalize` should produce an exact
result when the first argument is exact and the second is an infinity.
That's not what the implementation does; it coerces the result to
inexact. I changed the test cases to match the implementation, which
is consistent with other Schemes (Chez, MIT) and standards (R6RS).
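A minimal sketch of the resulting behavior (hedged; the exact inexact results may vary):
```
#lang racket
(rationalize 1/3 +nan.0) ; => +nan.0 (previously a "no exact representation" error)
(rationalize 1/3 +inf.0) ; => an inexact result such as 0.0, matching other Schemes
```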
When a {u,s}16vector points to memory that's not a byte string (e.g., from the FFI),
then referencing or setting the memory results in a Chez error:
```
foreign-set!: unrecognized type 'int16
```
The fix is to change the type argument to `'integer-16` and
`'unsigned-16`.
To keep stack alignment correct, the `objc_msgSendSuper_stret`
function needs to be used with a structure return type on i386,
instead of making the implicit return-pointer argument explicit.
(For BC, libffi apparently makes the wrong style work anyway.)
A wrapper to align the stack during activation was dropped if the
return type was `void` for a foreign callable, and a callee-popped
argument was not handled right for a foreign call.
It's not necessarily ok to inline a function wrapper made by
`make-wrapper-procedure`, because the wrapper might get mutated. We
could add immutable wrapper procedures, but we can also just revert to
a previous approach for code that needed the optimization.
Currently, this repair matters only for PPC32 Mac OS, which is the
only place where alignment of some primitive atomic type is not the
same as its size.
The non-standard ARM Mac OS ABI doesn't just use a different
convention if the function has varargs: it puts each vararg in a
different place than a non-vararg argument of the same type and
position. So, `foreign-procedure` and `foreign-callable` need to know
where varargs start. A `__varargs` declaration is shorthand for
`(__varargs_after 1)`.
For PPC32 Mac OS, we retain the trick that makes varargs foreign calls
work without a `__varargs` declaration, but `(__varargs_after <n>)`
fixes up callable support --- in the extremely unlikely case that
someone needs general varargs callables on PPC32 Mac OS.
Since Chez Scheme now performs the kind of closure conversion that
Racket does --- ensuring that a closure is not allocated if it is
bound to an identifier that is used only in application positions ---
the variant in schemify is no longer run. The hacky macro-based
lifter in the "rumble" layer can also go.
The lifting pass is still preserved in schemify, because it is still
useful to cify. It's not clear whether interpreter mode (which is used
during macro expansion for compile-time code that doesn't cross a
module boundary) is better off with or without schemify's lift, but
it's gone for now.
This commit fixes two bugs:
* `sync/enable-break` didn't implement the guarantee that an event is
selected or a break exception raised, but not both; the problem was
in the handling of making the break-enable state ignored after
committing to an event
* `sync` didn't cancel asynchronous pending commits when a break is
received at certain points; the bug in `sync/enable-break` masked
this bug for existing test cases
Closes #3574
A seemingly-unintentional choice made the following not behave
as expected:
```
(define-inline (f x) (+ x 1))
(f (f 2))
```
because the `(f 2)` was not inlined.
Reported by @mflatt and Liwei Chou.
Replace the vfasl writer (which was in C) with a new implementation
(in Scheme). The main result is that the vfasl writer can be used in
cross-build mode.
Racket uses the vfasl format for its boot images, because they can
load faster --- cutting the Chez Scheme plus boot files startup time
in half, which saves about 40msec on a typical machine. That's not
enough to matter for something like DrRacket, but it can matter for
small Racket scripts. Formerly, cross builds disabled vfasl
generation.
A vfasl file is roughly an image of code and data as it will appear in
memory, and a relatively fast linking step makes the image work in a
running process. The old implementation was in C because it reused GC
structures and code, treating fasl creation as copying objects into a
vfasl image instead of a new generation. The new implementation is
more like a fasl reader, loading objects into a vfasl image instead of
the live heap. The two implementations are about the same amount of
code and both involve a certain amount of repeated implementation
(i.e., imitating a collection or fasl load), but the Scheme
implementation is more flexible and works for cross compilation.
Reading `1.0e45` produced a different (and less precise) result than
`1e45`. The problem was in the reader's fast path for simple flonum
conversions, where it converts the mantissa and exponent separately
and then combines them. 10^44 is not represented exactly as a flonum,
so there's imprecision when multiplying it by 10 versus multiplying
1e45 by 1.
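A quick consistency check (a hedged sketch, not from the original report):
```
#lang racket
;; After the fix, both spellings should read as the same flonum.
(= (string->number "1.0e45") (string->number "1e45")) ; expected #t
```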
Closes #3548
Callbacks from C generally need to be in atomic mode, but they don't
need to have interrupts disabled at the Chez Scheme level, because
that disables GC.
Without this change, dragging a scrollbar or resizing the window in
DrRacket would suspend GCs as long as the mouse button is pressed ---
which could allocate arbitrary amounts of memory fairly quickly
meanwhile.
Cross-library inlining is willing to inline a procedure body that
refers to a system primitive, but wasn't willing to propagate a
system primitive directly. Enable that, and use it to simplify
`unsafe-struct` inlining.
Related to #3546
Copy any syntax-original property from the parentheses associated
with a `#%app` made explicit, so that originalness is tracked in
the 'origin property.
New `#%app/no-return` and `#%app/value` functions at the Chez Scheme
level allow schemify to communicate that function calls will not
return or will return a single value. The schemify pass may have this
information because a Racket-level primitive is declared that way
(such as `error` or `raise-argument-error` for no-return, or most
functions for single-valued) or because single-valuedness is inferred.
There's currently no inference for no-return functions, because those
are relatively rare. An `#%app/value` is used by schemify only for
imported, non-inlined functions, since cp0 can already deal with local
functions and primitives.
There's a start here at adapting the "optimize.rktl" test suite for CS
--- and that effort triggered these improvements plus some other
low-hanging fruit. But a lot more is needed to adapt "optimize.rktl"
and to make some additional optimizations happen.
The new encoding of struct constructors and predicates collided with
the encoding of another kind of procedures --- ones that are
unmarshaled on demand in especially large modules. The resulting
symptom was that `object-name` was broken for on-demand procedures.
Avoid a global table to register structure procedures, and instead use
a wrapper procedure. At the same time, adjust schemify to more
aggressively inline structure operations, which can avoid a significant
performance penalty for local structure types.
Closes #3535
A procedure that is a value for a structure-type property was not
always inspected correctly. For example, if such a procedure was the
only one to mutate a module variable, then the variable might not be
detected as mutable.
Since CS doesn't use a secondary hash code internally, the
`equal-secondary-hash-code` function wasn't really implemented.
Implement it reasonably for applications that might use it to
implement other data structures.
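For example, a hypothetical double-hashing scheme (not code from this change) might combine the two hash codes along these lines:
```
#lang racket
;; Sketch: the secondary hash code picks the probe step for an
;; open-addressing table, so keys with colliding primary codes
;; still tend to probe different buckets.
(define (probe-index key table-size attempt)
  (define step (add1 (modulo (equal-secondary-hash-code key) (sub1 table-size))))
  (modulo (+ (equal-hash-code key) (* attempt step)) table-size))
```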
Testing exposed other problems related to error reporting for a broken
hash-function result and for using values within immutable hash
tables.
Closes #3536
Fixes an issue where the distribution assembler expects these to be
`path?`s.
I originally considered making this change in the ctool command in
`cext-lib` instead, but then I noticed the documented contract for
`assemble-distribution` is `(or/c #f path-string?)`.
AArch64 Mac OS will not run an unsigned executable. But it will happily run an
executable with an "ad hoc" signature, which is little more than a sequence of
SHA-256 hashes of the file content.
Meanwhile, new tests highlight how the `pretty-print` family of
functions is inconsistent with the non-`pretty-` variants (worth
changing, considering backward compatibility?).
Currently, the pattern matcher always calls `vector->list`
on the input vector if a ddk is detected.
However, when there is exactly one ddk and users wish not to
bind an identifier to the ddk, the whole conversion is not needed.
Instead, we can selectively `vector-ref` the prefix and the suffix
of the vector and match them directly against patterns.
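A hedged sketch of the kind of pattern affected, where the single `...` binds nothing:
```
#lang racket
;; The prefix and suffix can be checked with `vector-ref` directly,
;; without converting the whole vector via `vector->list`.
(match (vector 1 2 3 4 5)
  [(vector 1 _ ... 5) 'matched]
  [_ 'no-match])
;; => 'matched
```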
Also strengthen an existing test.
They were being converted to fixnum operations in unsafe mode, but
the intent was to use fixnum operations only when the arguments are
known to be fixnums.
The latest Mac OS tools automatically sign code when linking, and
"collect-path.rkt" break that signature by changing the executable.
Avoid that problem by removing the signature, first. (Leave it to
distribution tools to install a new signature.)
Code segments are executable and potentially non-writeable in a thread
--- except when specifically enabled, at which point code segments are
made non-executable for that thread. Non-code segments are always made
non-executable.
Commit 8858b34bd92ac8d2b6511dc9ca17ebfa06a1bd93 ("Add LZ4 support and
use it by default for compressing files") added LZ4 support and the
lz4 submodule, but did not update the NOTICE file, unlike how zlib
support is handled.
Implement weak and ephemeron generic hashtables, and repair weak and
ephemeron `eqv?` hashtables to be weak on numbers.
Racket's implementation of weak `equal?`-based tables now uses weak
generic tables. Datum interning, which needs a weak `equal?`-based
table and was the bottleneck for unfasling literal data, is much
faster. DrRacket's footprint is 1% smaller.
Use the fixnum-mixing idea of dc82685ce0 on pointers, too.
This change provides a significant improvement on the "hash-mem.rkt"
benchmark, for example, because the allocated objects have a size that
triggers a repeating pattern in the low bits of the allocated address.
Use data instead of code to shrink ".zo" sizes by 10-30%.
When Racket code contains a literal that cannot be serialized directly
by Chez Scheme (such as a keyword or an immutable string that should
be datum-interned), the old approach was to generate Scheme code to
construct the literal through a lifted `let` binding. To handle paths
associated with procedures, however, Chez Scheme's `fasl-write` had
been extended to allow arbitrary values to be intercepted during fasl
and passed back in to `fasl-read`. Using that strategy for all Racket
literals simplifies the implementation and reduces compiled code. It
also makes closures smaller, while increasing the number of
relocations. DrRacket's footprint shrinks by about 1%, but the main
effect is on disk space for a Racket installation.
Show the machine code that constructs lifted constants for a linklet.
Also, add a `--partial-fasl` option that shows fasl content in a rawer
form, which is useful for checking how content is presented and that
nothing is getting lost in other reconstructed views.
The 'os* mode is like 'os, but it provides a more specific result for
Unix variants, such as 'linux on Linux.
The 'os* and 'arch modes together are the information that we've
previously accessed indirectly via `(system-library-subpath #f)`.
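For example (results shown are illustrative for a Linux x86_64 machine):
```
#lang racket
(system-type 'os)   ; => 'unix
(system-type 'os*)  ; => 'linux
(system-type 'arch) ; => 'x86_64
```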
Closes #3510
Mixing for sequences did not produce enough variety related to the
length of the sequence. For example, '(0 0) and '(0 0 0) and '(0 0 0
0) had the same hash code.
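A hedged check of the fix:
```
#lang racket
;; Lists that differ only in length should now (almost certainly)
;; get different hash codes.
(= (equal-hash-code '(0 0)) (equal-hash-code '(0 0 0))) ; expected #f
```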
Allow cptypes to convert a `fx{+,-,*}/wraparound` call to unsafe if it
proves that the arguments are fixnums (unlike, say, `fx+`, where the
possibility of overflow remains).
Change hash-code calculations to use `fx+/wraparound` and
`fxsll/wraparound` instead of `#3%fx+` and `#3%fxsll`. The resulting
code should be the same in the default unsafe compilation mode for the
Racket core, but the `wraparound` versions can be checked in safe
mode.
Also, fill in missing tests in Chez Scheme, which exposed cp0 problems
with `fx-/wraparound`.
Expose machine-level operations that stay within the fixnum range,
which can be useful for things like hash-code computations where it's
ok to just lose bits. Operations like `unsafe-fx+` turn out to do that
already in the current implementation, but with no guarantee (and with
no checking of arguments).
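For instance, a hash-mixing step might use them like this (a sketch, not code from this change):
```
#lang racket
(require racket/fixnum)
;; Wraparound operations never leave the fixnum range, so dropping
;; overflow bits is well-defined and remains checkable in safe mode.
(define (mix-hash h v)
  (fx+/wraparound (fx*/wraparound h 31) v))
```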
For Racket BC, before this commit, JIT-inlined `fxlshift` incorrectly
handled a negative second argument like `arithmetic-shift` instead of
erroring.
When the thread-scheduler timer fires while a thread is in atomic
mode, the thread could check for breaks even when it shouldn't. Worse,
if the atomic region was to implement terminating a thread, the path
to check for a break could end up resurrecting the thread from the
perspective of `thread-dead?`.
A mutable hash table only uses the lower few bits of a hash code,
because it masks the hash code by [one less than] the power-of-two
size of the bucket array. That truncation interacts badly with the
hashing function for fixnums, which is the identity function; if the
lower several bits of the fixnum stay the same for many keys and
the upper bits change, then there are many hash collisions --- and
that's a relatively likely distribution.
Fix the mutable hash-table implementations by mixing the hash code to
let higher bits influence the lower bits: xor the high half of the
bits with the lower half (which doesn't lose information, because
xoring again would recover the original number), then do the same for
the high one-fourth of bits in the low half, and then (on a 64-bit
platform) the high one-eighth of the low one-fourth of the bits.
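A hypothetical 64-bit rendering of that mixing step (the actual repair is inside the hash-table implementation, not user code):
```
#lang racket
(require racket/fixnum)
(define (mix-code h)
  (let* ([h (fxxor h (fxrshift h 32))]  ; fold the high half into the low half
         [h (fxxor h (fxrshift h 16))]  ; then the high quarter of the low half
         [h (fxxor h (fxrshift h 8))])  ; then one level further down
    h))
```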
Instead of blaming the way the mutable hash-table implementations only
use the lower bits of a hash code, we could blame the hash function on
fixnums for not performing this kind of mixing. In this patch, though,
we take the view that the hash function's job is to map variation in
its domain to variation in the fixnum hash code, and then the hash
table's job is to use that fixnum effectively. That separation of
responsibilities is now documented with `gen:equal+hash`.
There are also improvements here to the hashing function for bignums
in CS and to the secondary hashing function for fixnums and bignums in
BC.
Thanks to Alex Harsanyi for reporting the problem.
On raw racket clones when compiling with `--enable-racket` (i.e. no
pb), there are `find` warnings that DEST does not exist and
compilation still succeeds.
This change, however, ensures that we do not have build warnings and
that the symlinks are properly created.
Finally give in and add an option to compile a module as unsafe. This
was going to be easy, since the option already exists at the linklet
level, but it turns out that a lot of plumbing was needed to propagate
the argument, and even more to preserve unsafety with cross-module
inlining.
Macros can effectively conditionalize their expansion in unsafe mode
by generating the pattern
```
(if (variable-reference-from-unsafe? (#%variable-reference))
    <unsafe variant>
    <safe variant>)
```
The compiler will only keep one of the two variants, because it
promises to optimize `(variable-reference-from-unsafe?
(#%variable-reference))` to a literal boolean. The expander will still
expand both variants, however, so avoid putting code in both variants
that itself can have safety variants.
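A hedged sketch of a macro that uses this pattern:
```
#lang racket
(require racket/unsafe/ops)
;; Only one branch survives compilation: the unsafe one when the
;; enclosing module is compiled as unsafe, the safe one otherwise.
(define-syntax-rule (my-vector-ref v i)
  (if (variable-reference-from-unsafe? (#%variable-reference))
      (unsafe-vector-ref v i)
      (vector-ref v i)))
```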
For example, if the result of `(when (box? x) (unbox x))` is not used,
then the `(unbox x)` still must be done, because the box might be an
impersonator. In contrast, `(when (box*? x) (unbox* x))` can be
dropped, since `unbox*` is an authentic unbox.
This change applies to unsafe operations like `unsafe-struct-ref`,
too, and applies to struct accessors for non-authentic structure
types.
Racket CS already preserves operations appropriately.
Relevant to #3487
To make room in the type encoding, remove immutable fxvectors from
Chez Scheme --- which had been added just to go along with immutable
strings, vectors, and bytevectors, but immutable fxvectors do not seem
useful, and they have no counterpart in Racket.
Since the reader now treats #\uFEFF as whitespace, adjust the printer to
escape \uFEFF.
Thanks, Xsmith!
Writing an actual #\uFEFF character seems bad, even escaped, but
Racket's symbol syntax doesn't have a kind of escaping that uses
different characters than the one to be represented.
Closes #3486
Streamline rktio byte-result copying (main improvement), use fixnum
arithmetic more consistently (minor improvement), and change
`in-bytes`, etc., to avoid some checks in unsafe mode (intermediate
improvement).
When bytes within a Windows path cannot be converted using
`bytes->string/locale` (i.e., when the bytes do not fit a UTF-8
encoding), then leave the bytes alone, instead of triggering a failure
from `bytes->string/locale`.
Fixing this bug uncovered others: `string-locale-downcase` did not
work on an empty byte string on a little-endian machine, and
`in-bytes` and similar reported range errors in terms of "vectors".
Change the way boot images are sent to Chez Scheme by the Racket CS
wrapper, especially in the case where boot images are embedded in an
executable (which is always true for a distribution build). The revised
approach avoids a little filesystem work, and it may help Chez Scheme
pull bytes in faster.
Building Racket CS on a 64-bit platform requires a little more than 1.5
GB of memory due to whole-program optimization of the Racket core
implementation. Add a `--disable-wpo` configure option, which keeps
memory use below 0.5 GB to provide the option of building Racket in a
more constrained environment.
Instead of selecting flags through a mixture of `configure` and
"c/Mf-*", determine defaults in `configure`. If `CFLAGS` is provided,
then the defaults are not used. Flags needed for threaded mode are an
exception, and `LDFLAGS` and `LIBS` are expanded even if supplied ---
unless `--disable-auto-flags` is specified.
Closes #3467
The `--sync-docs-only` flag is intended for contexts like pkg-build as
a backstop against unintended and time-consuming activity when the
intent is to assemble documentation.
While sending an absolute path to a foreign library is usually the
right idea to ensure that it's relative to `(current-directory)`, the
`_path` FFI type should not do that automatically --- because
sometimes it's useful to send a relative path to a foreign library,
but mostly because it hasn't been defined that way in BC.
In building up conversion tables, information from "SpecialCasing.txt"
data was incorrectly merged with data from "UnicodeData.txt" so that
not-quite-so-special casings were fumbled. For Unicode 7.0, the bug
turned out to affect only string downcasing of U+1E9E.
Closes #3475
Thanks to Xsmith!
Changes to use xlocale fixed problems with places and locale settings,
but they caused the initial process-wide locale to stay with the C
locale, which is bad for things like libedit. To get the good part of
the old behavior back, set the process-wide locale to "" on startup.
When an exact (non-integer) rational is reconstructed from a place
message, normalization could involve bignum operations --- and those
operations might use a cache, but the cache should not refer to the
pages that are specific to a place message being constructed.
Normalization isn't necessary, since the parts were already in a
rational, so the repair is just to skip it.
Relevant to #3456
A table of subprocess handles to finalize was not place-local as it
should have been. The same was true of a table of resolved IP
addresses to finalize.
Related to #3456
The Racket variant of Chez Scheme includes special treatment of deeply
nested structures to avoid a C-stack overflow on unfasl, but the
relevant callbacks had gotten mangled.
Closes #3454
When a would-be loop is called in a would-be loop that turns out not
to be a loop due to an intervening non-loop layer, the outer would-be
loop was not detected as a non-loop.
When the mutability decision on a variable is delayed, but then the
variable is discovered to be mutable before the delay is triggered,
then mutability information could get lost.
Mutability analysis may determine that a `set!` is dead code, and then
the mutated variable might be optimized away, so don't leave the dead
`set!` behind.
When the srcloc-enriched S-expression representation of a function that
is available for cross-module inlining has a source that is not a
path, string, or symbol, then the source has to be dropped in the
module's serialized form.
On a 64-bit machine, the problem was a `<` versus `<=`.
On a 32-bit machine where 4294967087 is not a fixnum, the
problem was in how bignums are handled.
Previously, `error` would raise an `exn:fail` with the following message,
because it treated the first of the three strings as a format string
and the other two as its arguments:
> error: format string requires 0 arguments, given 2;
> arguments were:
> " is a procedure, to always escape when called"
> " (by calling raise-blame-error with the arguments it was given"`
The main change is to disable forced GCs during the documentation
phase in a parallel `raco setup`. Since each GC is global (instead of
place local) in Racket CS, these forced GCs created a scaling problem.
Furthermore, they're not really necessary; peak memory use tends to
remain the same without them in a parallel build, since the
".zo"-building phase doens't have extra collections and tends to
establish peak memory use.
A non-parallel build still includes explicitly forced GCs. In that
mode, the cost is modest while reducing peak use by 30%.
A new environment variable appends timestamps (in process time since
startup) for each status line, which is useful for generating more
detailed build plots.
Since a GC with accounting doesn't run in parallel, it discards
owner-thread information (ironically) that is used for parallel
collection. Losing ownership information could reduce parallelism in
later collections --- but only if they run without accounting.
This commit does not turn on ownership tracking, however, because
experiments suggest that it isn't worthwhile for Racket:
* Preserving ownership information slows down a major collection in
accounting mode by 5-10%.
* Racket uses accounting mode only for full collections, and the loss
of ownership information would mostly affect only later full
collections. Since accounting mode tends to stick around, very
little parallelism is actually lost.
* Without further work, accounting mode cannot benefit from
parallelization, because it works by stepping sequentially through
accounting domains, and that aligns with thread ownership. A
possible improvement is to look ahead in the accounting-domain
object sequence to find one that starts in another ownership
domain, and then constrain concurrent tracing from that object to
keep the count private and not leave the ownership domain until the
accounting sequence has caught up.
Besides making it easier to turn on "parallel" collection for
accounting mode, the small refactorings here make it easier to bring
more things, like guardian and weak-pair handling, into the parallel
part of a collection. So, the changes are probably worth keeping for
that purpose.
The collector tries to use roughly the same amount of parallelism as
the number of active threads, but makes sure that it doesn't fall back
to an ownership-mangling non-parallel collection if all but one place
happens to be stalled.
Make the exchange of objects by parallel sweepers --- which is needed
when an object being swept by one thread refers to an object owner by
another thread --- much simpler and slightly faster.
The original design of exchanging memory regions didn't really work
out, and it had degenerated to essentially exchanging objects.
Explicitly exchanging objects instead of regions makes the GC simpler,
because received objects can be just added to the regular sweep stack,
and less additional space<->type correspondence is needed.
Fix mishandling of an expanded `if` where the test position is a
syntax object. The expander's compiler pass from expanded objects to
linkets knows that the syntax object isn't useful, but it tried to be
helpful by preserving the syntax object's content as quoted --- and
that content turns out not to be available, so the syntax object was
replaced by #f instead.
Closes #3436
Rename definitions that are not exported and that are not used as
inferred function names. The rename is based on a hash of the
right-hand side, instead of being just the position of the definition
in the module, and that should help further reduce diffs in schemified
output after small changes to the source.
Instead of one big lock, have one big lock and one small lock for just
manipulating allocation state. This separation better reflects the
locking needs of parallel collection, where the main collecting thread
holds the one big lock, and helper threads need to take a different
lock while changing allocation state.
This commit also includes changes to cooperate better with LLVM's
thread sanitizer. It doesn't work for parallel collection, but it
seems to work otherwise and helped expose a missing lock and a
questionable use of a global variable.
This ensures that the moveq instruction is inside an IT block.
Backward compatible change - still builds on ARMv6 with Thumb.
This breakage is generally not noticeable on a RPi3 because, although the
CPU is ARMv7, it is generally running an ARMv6 OS.
Disable incremental promotion when backreferences are enabled,
otherwise the backreference list for a generation can have
references to younger-generation objects.
Casting a ptr to an integer of a different size causes a warning; this is only noticeable on 32-bit builds. Fix this by using the `TO_VOIDP` and `TO_PTR` helper macros.
This PR fixes two issues regarding disappeared-use in reqprov.rkt
1. Copy/preserve disappeared-use for all provide subforms instead
of only the first one. The following program doesn't have an arrow from
`struct-out` to `#lang racket` prior to this PR, but it does after this PR.
```
#lang racket
(provide a? (struct-out a))
(struct a ())
```
2. Fix incorrect `disappeared-use` for `struct-out`: it is inappropriate
to check if an identifier is bound to a struct info in provide
pre-transformer because it would not be able to handle backlinks.
This PR moves the attachment from provide pre-transformer to provide
transformer, allowing Check Syntax to draw an arrow for the identifier
`abc` in:
```
#lang racket
(provide (struct-out abc))
(struct abc ())
```
Small fixes to enable LTO compilation
To do this in racket/src:
```
./configure CFLAGS="-O3 -march=native -flto" \
            LDFLAGS="-O3 -march=native -flto" \
            --prefix=...
```
One fix deals with variable `errnum` that might be used uninitialized.
The other is to ensure that `boot_file_data` is marked as used;
otherwise, it is removed by LTO and ends up crashing the build
in `embed-boot.rkt`.
Instead of having leftover Scheme threads swept by the main thread,
distribute Scheme threads among sweeper threads, and swap into each
thread's context to preserve its allocation ownership. This change
makes parallelism more consistent for different timings that end up
assigning sweepers differently.
Move per-thread allocation information to a separate object, so the
needs of the collector are better separated from the thread
representation. This refactoring sets up a change in the collector to
detangle sweeper threads from Scheme threads.
Change the internal parallelism strategy for the GC to record an owner
for each allocated segment of memory, and have the owner be solely
responsible for copying or marking objects of the segment. When
sweeping, a collecting thread handles references to objects that it
owns or that have been copied or marked already, and it asks another
collecting thread to resweep an object that refers to objects owned by
that thread. At worst, an object ends up being swept by all
collecting threads, one at a time, but that's unlikely for a given
object.
The approach seems likely to scale better than a lock-based approach,
even the one that used a lightweight, CAS-based lock and retries on
lock failure.
That is, use pairs on the property in more places, since the pair
was already computed and the value-blame function already handles
a pair on the property.
When the GC needs to copy/mark a record, it previously forced a
copy/mark on the record's type descriptor, since the copy/mark needs
information from the descriptor. But the needed information is not in
danger of being overwritten for forwarding (since it's after the first
two words of the type descriptor), so it's ok to use the old reference
as-is --- at least in non-counting mode. Simplifying record-type
handling and deferring the record-type update to the sweep phase, the
same as for other components of the record type, makes the GC slightly
faster.
Fall back to `current-directory` when `current-load-relative-directory`
is #f.
This change also affects `deserialize` --- not because byte code
loading uses it directly in this case, but because they share a helper
function, which exposes the issue. This implementation change is
worrying (even though it makes the implementation match the
documentation), but unless we discover that some use of serialization
needs absolute paths deserialized as relative, it seems better to be
consistent everywhere about falling back to `current-directory`. This
aspect of the change can be reverted separately (by adding more code)
if needed.
Closes racket/drracket#421
All allocation is now thread-local, which recovers a small bit of
performance that was lost when adding thread-local allocation
alongside global allocation.
Parallelism uses thread contexts created by Chez Scheme threads (which
correspond to Racket places and future-running threads), but it
creates its own OS-level threads to perform collection. The number of
collection-helper threads is limited to the number of active Chez
Scheme threads. Only the main "sweep" pass runs in parallel --- that
is, after roots have been traversed, and before weak references and
finalization are handled --- but that's the bulk of collection work.
Also, memory-accounting collections always run as single-threaded.
Improved support for thread-local allocation (and using more
fine-grained locks in fasl reading) may provide a small direct
benefit, but the change is mainly intended as setup for more
parallelism in the collector.
There's a trade-off between keeping the distribution sizes small and
making ".boot" files available for convenient embedding, even though
embedding is relatively rare. For Unix platforms, since you have to
build from source to get a static library for embedding anyway, we'll
leave out ".boot" files. For Mac OS, the distribution's "Racket"
framework includes ".boot" files --- even though the framework is
itself unused for a normal distribution build, since signing and
notarization are handled by embedding the boot files in an executable,
but the framework was kept for a kind of backward compatibility. For
Windows, the Racket DLL can be used for embedding, so the ".boot"
files would be the only missing piece; also, they were already
included in a cross-built distribution.
Update "Inside" to note that ".boot" files must be built on Unix and
to clarify the location of ".boot" files on Mac OS.
Closes #3377
The A32 instruction set has an interesting encoding of immediate
values where a larger value sometimes fits in a smaller set of
instructions. That turns out to be a bad property for loading a return
address, because it means that, as label computations push code
further away, a contracting return-address calculation can pull code
back nearer, and this push-and-pull can keep the label allocator from
arriving at a fixpoint.
This became a bigger problem with 8834597c1f, which creates
return-label references that go backwards and where the offset can be
much larger than the normal, forward references.
See #3378: The possibility of mutation should be considered in
`:do-in` in some rare cases, while it's not clear that there's
anything better to be done for mutation of the list accumulators in
`for/lists`. At least make the pitfalls clearer in the documentation.
This commit doesn't update nanopass itself, but adapts `rktboot`
so it can be used with the main Chez Scheme branch. It also
adjusts "cpnanopass.ss" to avoid different behavior between the
old and newer versions of nanopass.
The Scheme stack pointer was left in call state when the number of
results is wrong, with the intent of exposing the enclosing function's
frame for debugging purposes, but there's no guarantee that the
result address is still on the stack (i.e., the continuation may have
been captured). Reinstall the return address before calling the
exception handler.
Avoid coercing an integer to a flonum when doing so loses
precision. It's especially helpful to preserve oddness
versus evenness.
Closes #3360
Closes #3100
Experiment with removing built-in Racket functions in stack traces to
make the trace less noisy. "Built-in" is defined as code that exists
in the built-in modules. On CS, built-in code is detected as residing
in the static generation. (Also, on CS, the code must have a name but
no source location or detailed debugging information to be
suppressed.) On BC, a code object has a bit set if it's loaded at boot
time.
This change makes stack traces look more like Racket BC traces before
the macro expander was implemented in Racket. The frames for built-in
functions have been useful for implementing the expander and Racket
CS, but probably they're just noise for most users most of the time.
Set the `PLT_SHOW_BUILTIN_CONTEXT` environment variable to preserve
all available frames in the stack trace.
Some text-editing tools on Windows include a BOM character (encoded)
at the start of a file that is intended as UTF-8. The general
recommendation for UTF-8 is to *not* include a BOM --- but, well,
Windows. When a BOM is there, meanwhile, the recommendation is to
preserve it in the stream, so always discarding an initial BOM at the
file-port level is not a good idea. A new file mode would make sense,
but distinctions like 'text and 'binary mode have turned out to be
best avoided.
Although I'm not sure it's really a good idea, treating a BOM
character as whitespace in the reader (at least in comment positions)
is an easy way around the problem for text files that are intended as
programs.
Closes #1114
The `current-locale` parameter value is transferred to the C library via
`setlocale` on demand by Racket functions that depend on the locale,
but some libraries (like libedit) can also depend on the locale
setting. By default, the C library starts with the "C" locale, so it's
worthwhile to install the default locale on startup, even if that's
not a complete solution to support changes to `current-locale`.
This fixes at least one "potential" bug in the file
`collects/pkg/private/create.rkt`, where `else` in the
`cond` is bound to `else` from `match` instead of `racket/base`.
(though it turns out that the format will always be
truthy, making the program happen to be correct.)
Without this change, the below build fails for me.
This reflects the GRACKETLDFLAGS in the makefile one folder above. However, I have no idea if this has wider implications...
```
mkdir build
cd build
..\configure --enable-bcdefault
make
```
Change the ".zo" format to convert a linklet bundle hash table to a
list, which avoids problems with stencil-vector encodings and cross
compilation between 32-bit and 64-bit platforms. Avoiding records,
stencil values, and hash codes should make the fasled form simpler.
Refine the approach in 91abd020d1 so that it's only used when needed
to work for the combination of custodian management and unsafe
finalization.
Also, improve the documentation to clarify the constraints on
`register-finalizer` due to its implementation in CS by ordered
finalization. This constraint is also reflected in a new `#:ordered?`
argument to `register-custodian-shutdown`. Any existing code that uses
`register-custodian-shutdown` plus `register-finalizer` directly
instead of `register-finalizer-and-custodian-shutdown` would need to
be updated for Racket CS, but code like that should be rare to
nonexistent.
Closes #3352
Reverts the repair attempt in 91abd020d1. The problem with switching
to a "late" reference is that it's based on ordered finalization in
Racket CS, which doesn't work on values that can refer back to
themselves.
Propagate `LIBS` to rktio's configure, and also move some flags in
`LIBS` that should be in `LDFLAGS`. The immediate result is to repair
the detection of iconv for rktio on FreeBSD.
Closes #3353
Change the regular weak reference in a custodian, which allows
`will-executor`-based finalization, to a "late" weak reference, which
allows both `will-executor` and `register-finalizer` finalization.
Closes #3352
- the collector now promotes objects one generation higher at a time
by default. previously, it promoted every live oldspace object to
the selected target generation, which could result in objects
prematurely skipping one or more generations and thus being
retained longer than their ages justify. the biggest cost in
terms of code complexity and performance is the recording of
pointers from older newspace objects to younger newspace objects
that could not previously occur.
gc.c, alloc.c, externs.h
- the collect procedure now takes an additional optional minimum
target generation argument to allow the new default behavior to
be overridden.
7.ss, primdata.ss,
gcwrapper.c,
7.ms, root-experr*,
smgmt.stex, release_notes.stex
- added cn flag to control collect-notify
mats/Mf-base
- resweep_weak_pairs now sets sweep_loc to orig_next_loc rather than
first_loc since the latter could result in unnecessary sweeping of
existing target-generation weak pairs.
gc.c
- added set of S_child_processes[newg] to S_child_processes[oldg]
in S_do_gc code handling decreases in the maximum generation.
gcwrapper.c
- a specialized variant of the collector is now used in the common case
where the max copied generation is 0, the min and max target
generations are 1, and there are no locked generation 0 objects.
with the default collection parameters and no locking
of generation 0 objects, these collections account for 3/4 of all
collections.
gc.c, gc-011.c (new), gcwrapper.c, externs.h, c/Mf-base
- maybe-fire-collector no longer tries to be so precise and instead
just counts the number of generation-bytes allocated since the
last gc. surprisingly, rebuilding the s directory requires about
the same number of collections with this coarser (and less
expensive) measurement. this change also fixes a problem with
too-frequent collections when the maximum-generation is set to
zero. to make the determination even less expensive, a running
total of bytes in each generation is now maintained in a new
bytes_of_generation vector, and maybe-fire-collector is no longer
called when the collector is running.
alloc.c, gc.c, gcwrapper.c, globals.h
- copy now copies two pairs at once only if they are in the same
segment, which saves a few memory references and tests and turns
out not to reduce the number of opportunities significantly in
tested programs.
gc.c
- occupied_segments, first_loc, base_loc, next_loc, bytes_left,
bytes_of_space, sweep_loc, and orig_next_loc are now indexed
by [g][s] rather than [s][g] to improve locality in the default
(and common) case where there are only a handful of active
generations.
globals.h, types.h, segment.c, gc.c, gcwrapper.c, prim5.c
- now maintaining 16-byte architectural stack alignment (if the
incoming stack is so aligned) on all x86 platforms except
i3nt/ti3nt. more recent versions of gcc sometimes generate sse
instructions that require 16-byte stack alignment.
x86.ss
[Merge for Racket includes additional changes to combine with in-place
marking - mflatt]