Allow a library-defined function to be inlined when the inlined
expressions refer to other library-defined functions. Since the
library function's body may already have inlined calls, don't allow
further inlining of calls within the inlined code.
This commit also adds `$app/no-inline`, which can be used to prevent
inlining of a function. For consumers other than Racket on Chez
Scheme, probably it would make sense to provide a nicer-looking
syntactic form that expands to use the internal `$app/no-inline`
function.
original commit: 628d57e1bd2e658aad4da97a3e85bda72c38f6ab
On x86_64, a POPCNT instruction is usually available, and it can speed
up `fxpopcount` operations by a factor of 2-3.
Since POPCNT isn't always available, code using `fxpopcount` is
compiled to a call to a generic implementation. The linker substitutes
a POPCNT instruction when it determines at runtime that POPCNT is
available.
Some measurements on a 2018 MacBook Pro (2.7 GHz Core i7) using the
program below:
popcnt = this implementation, POPCNT discovered
nocnt = this implementation, POPCNT considered unavailable
optcnt = compile to use POPCNT directly (no linker work)
cpcnt = compile to inlined generic (no linker work, no POPCNT)
Since the generic implementation is always a 64-bit popcount, it's not
as good as an inlined version for `fxpopcount32`, but otherwise the
link-edit approach to POPCNT works well:
fxpopcount fxpopcount32
popcnt: 0.098s
nocnt: 0.284s
optcnt 0.109s [slower means noise?]
cpcnt: 0.279s 0.188s
(optimize-level 3)
(time
(let loop ([v #f] [i 100000000])
(if (fx= i 0)
v
(loop (fxpopcount i) (fx- i 1)))))
original commit: 5f090e509f8fe5edc777ed9f0463b20c2e571336
Instead of using `%` to compute the index into an oblist, use a power
of 2 for the oblist length and bit masking to compute an index. (Maybe
the old hashing function was bad; the current hashing function should
produce good hash-code variation at the level of bits.) Also, make the
oblist array a little sparser to reduce bucket chaining.
original commit: fb87fcb8e47902b80654789d059a25bd4a7a8def
After a bignum computation using temporary thread registers W, U, or V
is complete, clear ther register. (The X and Y registers hold only
small bignums, so clearing them doesn't matter in the same way.)
original commit: a9e11fcf9e86aee5d149764476e1fabfeee12f84
Try `fxquotient` with a `fx*` check to implement `/` on fixnums.
That's fast enough to be much faster when it works, and only slows
down a more general `/` a little.
original commit: e91430be9b71f4913965db688a15f6d7206b38f3
It's not available with musl, either, musl intentionally
doesn't provide a preprocessor test, and we're avoiding
(for now) `configure`-time tests in the style of autoconf.
original commit: a9bfb72027fc83ed6bb690d033bc6fed0629dba7
Don't run cptypes, when cp0 is disabled, for example with
(run-cp0 (lamba (cp0 x) x)
This is easier to understand because run-cp0 is a single point to control
all the cp reductions. The reductions in cptypes can be independently disable
using enable-type-recovery.
original commit: b23645e669fbf02806a261a2d87160fdbe06db93
Use the high bit of a byte to continue instead of the low bit.
That way, ASCII strings look like themselves in uncompressed fasl
form.
original commit: 89a8d24cc051123a7b2b6818c5c4aef144d48797
Uninterned symbols are slightly more expensive to allocate than 0- or
1-argument calls to `gensym`, but they're much cheaper to hash (and
print). They're also more consistently distinct when unfasled, and the
fasled form is determinsitic.
original commit: 3167083008031b1f880e76a6f573563c7d9c888c
The result of `mktime` is -1 for an error. The result is also -1 if
the time is 1 second before the epoch. That's not useful, so ignore
it.
original commit: aa8ca31cef223128fd8ed1abdc76beb31a0e077a
With this flag the primitive is not tested in primvars.ms but other
parts of the compiler can use the signature/flags.
Also, add a signature to every system boolean primitive.
primvars.ms, primdata.ss
original commit: ee023c673bda6557bc223de7f8b0e732600619bc
Probably makes no difference right now, since these internal functons
rarely show up in the process of optimizing user programs, but just in
case.
original commit: b54a288b31731368dbcf57c95b78f0a162c29147
Can't simply use a continuation reified by an attachment
operation, because it is probably a 1-shot continuation
that needs to be promoted.
original commit: 8201aff06df8011ffbc41f217d50e4c430d75bb5
Effectively, change a `call/cc` call to `let/cc` when it appears in
the tail position of a function. This change takes advantage of
continuation-reification support that was built for continuation
attachments.
original commit: 4c015a5b55f7d04839a0efd8e5554fc237e4663b
The MRG32k3a generator is fast when using unboxed floating-point
arithemtic. Since the Scheme compiler doesn't yet support that,
build MRG32k3a into the kernel and provide access via
`pseudo-random-generator` functions.
original commit: 3dd74679a6c2705440488d8c07c47852eb50a94b
If normal 1-shot continuations are mixed with opportunistic 1-shot
continuations created by `call-setting-continuation-attachment`, then
promoting an opportunistic 1-shot at a GC is wrong unless the whole
chain is promoted.
original commit: 2dfac475666763b60935e382386af4438f3029e0
Include an extra part in the Chez Scheme version number, which both
helps indicates the Racket fork and versions it.
original commit: 00678e29bb9f05de2ccaec8585126e967cdcc6f4