Besides updating for unboxed floating point, the ppc32 build uses a
return register, and the continuation-attachments implementation was
not right for that mode.
original commit: dd2d01fb26ace819c73f258b9b53739f9dda1d34
An optimization relatively late in the BC bytecode compiler pipeline
was wrong for `begin0`. The transformation and bug must be very old,
since the transformation is intended to help the bytecode interpreter.
Thanks to Sage for reporting and Alexis for initial debugging.
When linking with libracket.a or libracket3m.a, librktio.a is needed.
(The instructions in "Inside" have apparently been wrong since rktio
was split out.)
The main (slightly) effective change here is to avoid disturbing loop
patterns within the Rumble layer's implementation.
Most of the commit is a commented-out, updated version of the Scheme
implementation of MRG32k3a `random`. With the latest improvements for
unboxed floating-point arithmetic, performance is relatively good, but
it doesn't catch up to the C compiler's output. On an x86_64 MacBook
(i7 4870HQ) using LLVM or a Raspberry Pi 3 using GCC, it's about 50%
slower compared to C (in contrast to 300% slower before unboxing).
It's almost the same speed on an older x86_64 Linux machine (i7 2600)
using GCC. Where the C compiler wins, maybe it's due to the use of
SIMD instructions in the C output for x86_64 and Arm32. Switching to
the Scheme implementation of `random` would probably be fine, but
aside from the satisfaction of being in Scheme, there's no reason to
pay the sometimes 50% penalty for now.
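
For reference, one step of MRG32k3a can be sketched in C as below, using
L'Ecuyer's published constants. This is a minimal illustration and not the
runtime's actual code; the names `mrg_state`, `mrg_seed`, and `mrg_next`
are hypothetical. Every intermediate value is a double, which is why
unboxed floating-point arithmetic matters so much for the Scheme version.

```c
#include <assert.h>

/* Moduli and multipliers of MRG32k3a (L'Ecuyer's published constants). */
#define M1   4294967087.0
#define M2   4294944443.0
#define A12  1403580.0
#define A13N 810728.0
#define A21  527612.0
#define A23N 1370589.0
#define NORM 2.328306549295727688e-10   /* 1 / (M1 + 1) */

/* Hypothetical state layout: six integer-valued doubles. */
typedef struct { double s10, s11, s12, s20, s21, s22; } mrg_state;

/* Seed every component with the same integer value in [1, M2). */
static void mrg_seed(mrg_state *st, double seed) {
    assert(seed >= 1.0 && seed < M2 && (double)(long)seed == seed);
    st->s10 = st->s11 = st->s12 = seed;
    st->s20 = st->s21 = st->s22 = seed;
}

/* Advance the state and return a double strictly between 0 and 1. */
static double mrg_next(mrg_state *st) {
    double p1, p2;
    long k;

    /* First component: (A12 * s11 - A13N * s10) mod M1. */
    p1 = A12 * st->s11 - A13N * st->s10;
    k = (long)(p1 / M1);
    p1 -= (double)k * M1;
    if (p1 < 0.0) p1 += M1;
    st->s10 = st->s11; st->s11 = st->s12; st->s12 = p1;

    /* Second component: (A21 * s22 - A23N * s20) mod M2. */
    p2 = A21 * st->s22 - A23N * st->s20;
    k = (long)(p2 / M2);
    p2 -= (double)k * M2;
    if (p2 < 0.0) p2 += M2;
    st->s20 = st->s21; st->s21 = st->s22; st->s22 = p2;

    /* Combine the two components into one uniform variate. */
    return (p1 <= p2) ? (p1 - p2 + M1) * NORM : (p1 - p2) * NORM;
}
```

All products stay below 2^53, so the double arithmetic is exact.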
Caching compiled JIT fragments in a SQLite database did not turn out
to be a viable path, so remove partial support for it. JIT mode in
general is rarely a good option, but it's at least completely worked
out, so it is left in for now.
Update the Guide's performance section with current information for
Racket CS, and also document the Racket CS compilation mode and
inspection environment variables. Make a couple of environment
variables work more consistently: PLTDISABLEGC for CS and PLT_ZO_PATH
for BC.
Flonum operations like `fltruncate` and `flsin` are implemented by
calling functions from the C library. Unboxing these involves a
generalization of the `foreign-call` intermediate form to handle unboxing
and to work in a non-tail position (especially by telling the register
allocator that caller-saved registers will be trashed). An internal
'atomic convention on a foreign call indicates that no callback into
Scheme is possible, so some setup/teardown (including stashing
callee-saved registers) can be skipped.
original commit: fd89919634d0d5272e046b47bb81bcc66e22a741
Shift addition of boxing as needed into the main loop, infer unboxed
variables and `mref`s, and centralize lifting of the `unboxed-fp`
declaration.
original commit: ed8ca4b6c77bdd436b0dee467a8350a450a44fb3
When the runtime thread `touch`es a future that is blocked on an
atomic action (such as JIT compilation), the runtime thread would
eagerly run the action, but still leave the future on the
atomic-action queue. Atomic actions tend to be ok to run a second time
(including JIT compilation), so a problem may not show up immediately,
but a semaphore can get out of sync and cause problems later.
Change `fl->fx` to truncate as it converts, which is typically done
anyway by a machine instruction to convert from floating-point to
integer values. This makes `fl->fx` different from `inexact->exact`
or `fl->exact-integer`, but it brings BC and CS in line.
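
The truncating behavior corresponds to what a plain C cast from `double`
to an integer type does. The sketch below is only an illustration: the
name `fl_to_fx_truncating` is hypothetical, and the range guard uses an
assumed 30-bit fixnum width rather than any real platform limit.

```c
#include <assert.h>

/* Hypothetical stand-in for `fl->fx`: drop the fractional part. */
static long fl_to_fx_truncating(double x) {
    /* Converting NaN or an out-of-range double is undefined in C, so
       guard it. The bounds assume an illustrative 30-bit fixnum. */
    assert(x > -536870913.0 && x < 536870912.0);
    return (long)x;   /* truncates toward zero */
}
```

Under these assumptions, `fl_to_fx_truncating(2.7)` is 2 and
`fl_to_fx_truncating(-2.7)` is -2.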
The comparison was off for 32-bit platforms, because it didn't allow
fractional increments. The comparison was off for 64-bit platforms,
because it didn't account for round-trip failure when starting from
the largest fixnum.
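
Both pitfalls can be illustrated with a small C sketch of a range check
that precedes a truncating conversion. This is not the code from the
commit: `fl_fits_fixnum` is a hypothetical helper, and the widths 30 and
61 are assumed fixnum sizes for 32-bit and 64-bit builds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: does x truncate to a fixnum of the given width?
   The widths 30 and 61 used with it are assumptions. */
static int fl_fits_fixnum(double x, int bits) {
    /* 2^(bits-1) is one past the largest fixnum and is exact as a double. */
    double limit = (double)((int64_t)1 << (bits - 1));
    assert(bits >= 2 && bits <= 62);
    /* Upper bound: compare against 2^(bits-1), not against the largest
       fixnum converted to double. For 61 bits that conversion rounds up
       to 2^60, so a <= test would accept an out-of-range value. */
    if (!(x < limit)) return 0;   /* also rejects NaN */
    /* Lower bound: fractions just below the smallest fixnum truncate
       back into range. When -limit - 1.0 is not representable it rounds
       to -limit, so accept -limit explicitly. */
    return x > -limit - 1.0 || x == -limit;
}
```

With 30 bits, `536870911.5` is accepted because it truncates to the largest
fixnum. With 61 bits, `2^60` is rejected even though it is what the largest
fixnum becomes when converted to a double.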
original commit: 74eb0583ae1b6212fbde459d7486c3d4a0498401
Follows Chez Scheme and Guile. Turns `(exp 10000.+0.0i)` into
`+inf.0+0.0i` instead of `+inf.0+nan.0i`, which is analogous to
the behavior for exact 0 in the complex part.
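
The NaN arises because exp(x+yi) = e^x (cos y + i sin y): once e^x
overflows to infinity, multiplying it by sin 0.0 = 0.0 gives NaN. The C
sketch below shows one way to avoid that, assuming the magnitude and the
cosine and sine have already been computed. It is an illustration only;
`cpx`, `polar_naive`, and `polar_zero_imag` are hypothetical names, not
the actual implementation.

```c
#include <assert.h>
#include <math.h>   /* used by callers for the INFINITY macro */

typedef struct { double re, im; } cpx;

/* mag stands for e^x; c and s stand for cos(y) and sin(y). */
static cpx polar_naive(double mag, double c, double s) {
    cpx z;
    z.re = mag * c;
    z.im = mag * s;   /* infinity * 0.0 yields NaN */
    return z;
}

/* Same computation, but a zero sine passes through unchanged. */
static cpx polar_zero_imag(double mag, double c, double s) {
    cpx z;
    assert(mag >= 0.0);   /* e^x is never negative */
    z.re = mag * c;
    z.im = (s == 0.0) ? s : mag * s;   /* also keeps the sign of zero */
    return z;
}
```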
Fixes #3194.
Simplify and normalize backend elements for loading, storing, and
converting floating-point numbers, taking better advantage of
new support for floating-point registers.
original commit: 4066af9cf3799392ef785a77da69f7cfff74d2fe
This is a follow-up to 276f8da076, where `(%tc-ref cp)` was supposed
to be preserved by moving it into %cp, but intrinsics for bytevector
arguments can kill %cp. Use a temporary to expose things properly to
the register allocator.
original commit: 3a29db06a452e46e69ebcde524b3b9acb435dec3
This reverts commit aa230ac79bed1efa02779bb7bbcde5c009818b74, so it
can be replaced with a solution that is less clumsy and less fragile.
original commit: 533940fdc6905d810deabb457d7004a031a3ac05