An optimization relatively late in the BC bytecode compiler pipeline
was wrong for `begin0`. The transformation and bug must be very old,
since the transformation is intended to help the bytecode interpreter.
Thanks to Sage for reporting and Alexis for initial debugging.
When linking with libracket.a or libracket3m.a, librktio.a is needed.
(The instructions in "Inside" have apparently been wrong since rktio
was split out.)
The main (slightly) effective change here is to avoid disturbing loop
patterns within the Rumble layer's implementation.
Most of the commit is a commented-out, updated version of the Scheme
implementation of MRG32k3a `random`. With the latest improvements for
unboxed floating-point arithmetic, performance is relatively good, but
it doesn't catch up to the C compiler's output. On an x86_64 MacBook
(i7 4870HQ) using LLVM or a Raspberry Pi 3 using GCC, it's about 50%
slower compared to C (in contrast to 300% slower before unboxing).
It's almost the same speed on an older x86_64 Linux machine (i7 2600)
using GCC. Where the C compiler wins, maybe it's due to the use of
SIMD instructions in the C output for x86_64 and Arm32. Switching to
the Scheme implementation of `random` would probably be fine, but
aside from the satisfaction of being in Scheme, there's no reason to
pay the sometimes 50% penalty for now.
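
For reference, the MRG32k3a recurrence can be sketched as follows. This is a
minimal illustration in Python using exact integers, not the Scheme or C code
in the commit (which works with floating-point arithmetic). The constants are
the ones published for L'Ecuyer's MRG32k3a. The class name and the
single-integer seeding scheme are assumptions made for the example.

```python
# Moduli and multipliers of L'Ecuyer's MRG32k3a generator.
M1 = 4294967087  # 2**32 - 209
M2 = 4294944443  # 2**32 - 22853
A12, A13N = 1403580, 810728
A21, A23N = 527612, 1370589
NORM = 1.0 / (M1 + 1)


class MRG32k3a:
    """Illustrative generator; the name and seeding are hypothetical."""

    def __init__(self, seed=12345):
        # Assumption: fill all six state words with the same nonzero seed.
        if not 0 < seed < M2:
            raise ValueError("seed must be in (0, M2)")
        self.s1 = [seed, seed, seed]  # x[n-3], x[n-2], x[n-1]
        self.s2 = [seed, seed, seed]  # y[n-3], y[n-2], y[n-1]

    def random(self):
        # First component: x[n] = (A12*x[n-2] - A13N*x[n-3]) mod M1
        x0, x1, x2 = self.s1
        p1 = (A12 * x1 - A13N * x0) % M1
        self.s1 = [x1, x2, p1]
        # Second component: y[n] = (A21*y[n-1] - A23N*y[n-3]) mod M2
        y0, y1, y2 = self.s2
        p2 = (A21 * y2 - A23N * y0) % M2
        self.s2 = [y1, y2, p2]
        # Combine the two components and scale into the open interval (0, 1).
        diff = p1 - p2
        if diff <= 0:
            diff += M1
        return diff * NORM
```

Python integers never overflow, so the sketch avoids the question of how the
products are kept exact in floating point, which is where the unboxing
improvements matter for the Scheme version.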
Caching compiled JIT fragments in a SQLite database did not turn out
to be a viable path, so remove partial support for it. JIT mode in
general is rarely a good option, but it's at least completely worked
out, so it is left in for now.
Update the Guide's performance section with current information for
Racket CS, and also document the Racket CS compilation mode and
inspection environment variables. Make a couple of environment
variables work more consistently: PLTDISABLEGC for CS and PLT_ZO_PATH
for BC.
When the runtime thread `touch`es a future that is blocked on an
atomic action (such as JIT compilation), the runtime thread would
eagerly run the action, but still leave the future on the
atomic-action queue. Atomic actions tend to be ok to run a second time
(including JIT compilation), so a problem may not show up immediately,
but a semaphore can get out of sync and cause problems later.
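
The bookkeeping problem described above can be sketched as follows. This is a
simplified Python model, not the runtime's actual data structures: the
`AtomicActionQueue` class and its method names are hypothetical. The point it
shows is that an eagerly run action must also be removed from the queue, or
it will run again when the queue is drained.

```python
from collections import deque


class AtomicActionQueue:
    """Hypothetical model of futures blocked on atomic actions."""

    def __init__(self):
        self.pending = deque()  # entries are (future_id, action)

    def block(self, future_id, action):
        # A future blocks until the runtime thread runs `action` for it.
        self.pending.append((future_id, action))

    def touch(self, future_id):
        # The runtime thread eagerly runs the action for a touched future.
        for entry in list(self.pending):
            if entry[0] == future_id:
                # Dequeue first, so the action cannot run a second time.
                self.pending.remove(entry)
                entry[1]()
                return True
        return False

    def drain(self):
        # Normal path: run whatever is still queued.
        while self.pending:
            _, action = self.pending.popleft()
            action()
```

With the `remove` call omitted, a `touch` followed by `drain` would run the
same action twice, which is the kind of double execution that lets a
semaphore count drift.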
Change `fl->fx` to truncate as it converts, which is typically done
anyway by a machine instruction to convert from floating-point to
integer values. This makes `fl->fx` different from `inexact->exact`
or `fl->exact-integer`, but it brings BC and CS in line.
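
The difference between a truncating conversion and an exact one can be
illustrated in Python. This is only an analogy: `math.trunc` stands in for
the truncating `fl->fx`, and `Fraction` stands in for an exact conversion in
the style of `inexact->exact`. The function names are hypothetical, and the
sketch ignores the fixnum range limit.

```python
import math
from fractions import Fraction


def truncating_convert(x: float) -> int:
    # Drop the fractional part, rounding toward zero.
    return math.trunc(x)


def exact_convert(x: float) -> Fraction:
    # Preserve the value exactly, fractional part included.
    return Fraction(x)


print(truncating_convert(2.75))   # 2
print(truncating_convert(-2.75))  # -2
print(exact_convert(2.75))        # 11/4
```

The two conversions agree only when the floating-point input is already
integer-valued.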
Follows Chez Scheme and Guile. Turns `(exp 10000.+0.0i)` into
`+inf.0+0.0i` instead of `+inf.0+nan.0i`, which is analogous to
the behavior for exact 0 in the complex part.
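
The source of the `nan` can be sketched in Python. This is an illustration of
the arithmetic only, not the code from the commit: the helper names are
hypothetical, and the special case shown is one assumed way to get the new
result. A direct evaluation of e^x * (cos y + i sin y) multiplies an infinite
magnitude by `sin(0.0) = 0.0`, and infinity times zero is `nan`.

```python
import math


def real_exp(x: float) -> float:
    # math.exp raises OverflowError on overflow; map that to +inf instead.
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def naive_exp(z: complex) -> complex:
    # Direct formula: e^x * (cos y + i sin y).
    m = real_exp(z.real)
    return complex(m * math.cos(z.imag), m * math.sin(z.imag))


def zero_aware_exp(z: complex) -> complex:
    # Assumed special case: a zero imaginary part passes through unchanged.
    if z.imag == 0.0:
        return complex(real_exp(z.real), z.imag)
    return naive_exp(z)


z = complex(10000.0, 0.0)
print(naive_exp(z))       # (inf+nanj)
print(zero_aware_exp(z))  # (inf+0j)
```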
Fixes #3194.