Commit Graph

209 Commits

Author SHA1 Message Date
Matthew Flatt
27e21e6e7d code inspector: improvements to reloc reporting
Fix 'reloc to avoid a crash on static-generation code, and add
'reloc+offset to report an offset for each entry.

original commit: 4d4195044377f9c619cfb46056e365044069d5bc
2020-01-29 16:22:52 -07:00
Matthew Flatt
26ff90e8e6 more compact return points for function calls
In the general form of a function call, the return point embeds 4
words of information: offset to the start of the enclosing function,
frame size, live-variable mask, and multiple-value return address. In
the common case, however, the multiple-value return address is either
the same as the return address or it is a `values-error` library
function, and the frame size and live-variable mask fit into a word
with bits to spare. This patch implements a more compact return point
for that common case, which shrinks the 4 words to 2 and also avoids a
relocation (= 1 more word).

Multiple-value returns are more complex with this change (i.e.,
require more code), since they must check whether the return point is
compact or not. But multiple-value returns are far less common than
function calls, so saving function-call space is a clear win.

Overall, this change tends to reduce code size by about 10% on x86_64.

original commit: 1f53b5eabef966db01086cb32e544bbf8deacfca
2020-01-24 19:19:32 -07:00
Matthew Flatt
45381612b2 fix popcount support to work on Windows
Avoid RDI, since it's preserved in the Windows ABI.

original commit: 68b2f597ec67ed8752998807bd0c9fc66667c752
2020-01-11 16:41:46 -07:00
Matthew Flatt
540c58bbe8 use POPCNT instruction when available on x86_64
On x86_64, a POPCNT instruction is usually available, and it can speed
up `fxpopcount` operations by a factor of 2-3.

Since POPCNT isn't always available, code using `fxpopcount` is
compiled to a call to a generic implementation. The linker substitutes
a POPCNT instruction when it determines at runtime that POPCNT is
available.
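
As a rough C analogy (not the actual link-edit mechanism), the generic fallback can be a branch-free 64-bit bit count, with a function pointer standing in for the call site that the linker patches. The names `generic_popcount64`, `popcount64`, and `install_popcount` are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* Generic 64-bit population count; needs no POPCNT instruction. */
unsigned generic_popcount64(uint64_t x) {
  x = x - ((x >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* 8-bit sums */
  return (unsigned)((x * 0x0101010101010101ULL) >> 56);                 /* add the bytes */
}

typedef unsigned (*popcount_fn)(uint64_t);

/* Starts out pointing at the generic routine (hypothetical stand-in
   for a call site that can be rewritten later). */
static popcount_fn popcount64 = generic_popcount64;

/* Hypothetical hook: switch to a faster routine once one is known to work. */
void install_popcount(popcount_fn fast, int available) {
  if (available && fast) popcount64 = fast;
}
```

Callers always go through `popcount64`, so the choice of implementation is made once rather than at every call.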

Some measurements on a 2018 MacBook Pro (2.7 GHz Core i7) using the
program below:

 popcnt = this implementation, POPCNT discovered
 nocnt  = this implementation, POPCNT considered unavailable
 optcnt = compile to use POPCNT directly (no linker work)
 cpcnt  = compile to inlined generic (no linker work, no POPCNT)

Since the generic implementation is always a 64-bit popcount, it's not
as good as an inlined version for `fxpopcount32`, but otherwise the
link-edit approach to POPCNT works well:

            fxpopcount      fxpopcount32
 popcnt:       0.098s
 nocnt:        0.284s
 optcnt:       0.109s  [slower means noise?]
 cpcnt:        0.279s         0.188s

 (optimize-level 3)
 (time
  (let loop ([v #f] [i 100000000])
    (if (fx= i 0)
        v
        (loop (fxpopcount i) (fx- i 1)))))

original commit: 5f090e509f8fe5edc777ed9f0463b20c2e571336
2020-01-11 11:04:48 -07:00
Matthew Flatt
81ea967aea add stencil vectors and fxpopcount
original commit: ec766fca869b5e0407c4f54230b72619af73b40b
2020-01-06 05:34:28 -07:00
Matthew Flatt
2efa342323 speed up oblist
Instead of using `%` to compute the index into an oblist, use a power
of 2 for the oblist length and bit masking to compute an index. (Maybe
the old hashing function was bad; the current hashing function should
produce good hash-code variation at the level of bits.) Also, make the
oblist array a little sparser to reduce bucket chaining.
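
A minimal C sketch of the indexing change, assuming a table whose length is a power of 2. The function names are hypothetical and are not taken from the kernel source.

```c
#include <stddef.h>
#include <stdint.h>

/* Index by bit masking; only valid when length is a power of 2. */
size_t bucket_index(uint64_t hash, size_t length) {
  return (size_t)(hash & (uint64_t)(length - 1));
}

/* The older style, for comparison: works for any nonzero length. */
size_t bucket_index_mod(uint64_t hash, size_t length) {
  return (size_t)(hash % length);
}

/* Guard that a caller could use before relying on the mask. */
int is_power_of_2(size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}
```

Masking keeps only the low bits of the hash, which is why it depends on a hash function with good variation at the bit level.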

original commit: fb87fcb8e47902b80654789d059a25bd4a7a8def
2020-01-01 15:08:52 -07:00
Matthew Flatt
bbbda808e5 propagate $AR and $ARFLAGS to submodule builds
original commit: 652aed04f243ce4a7f5c71f18d1754952380a479
2019-12-31 07:58:12 -07:00
Matthew Flatt
69444da5a0 clear temporary bignum registers
After a bignum computation using temporary thread registers W, U, or V
is complete, clear the register. (The X and Y registers hold only
small bignums, so clearing them doesn't matter in the same way.)

original commit: a9e11fcf9e86aee5d149764476e1fabfeee12f84
2019-12-30 07:03:19 -07:00
Matthew Flatt
c8ea435c85 make strings within symbols always immutable
original commit: 7859d16dac7bae6ab836e2200003583dc572deba
2019-12-16 17:11:49 -07:00
Matthew Flatt
f858bec12a skip <xlocale.h> on Linux
It's not available with musl, either; musl intentionally
doesn't provide a preprocessor test, and we're avoiding
(for now) `configure`-time tests in the style of autoconf.

original commit: a9bfb72027fc83ed6bb690d033bc6fed0629dba7
2019-12-11 14:41:07 -07:00
Matthew Flatt
01a40286c2 propagate CC and CPPFLAGS to ZLib and LZ4 builds
original commit: cbb7c5f21a879ee90293c3abf99d344c4fc42b7f
2019-12-09 08:34:50 -07:00
Matthew Flatt
50e529364d fasl: move uptr continue bit from low to high
Use the high bit of a byte to continue instead of the low bit.
That way, ASCII strings look like themselves in uncompressed fasl
form.
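
A sketch of a variable-length integer encoding with a high continue bit, written in C. It assumes 7 payload bits per byte, emitted most significant group first; that ordering is an assumption of this sketch, not something the commit message states. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode n into buf and return the byte count (at most 10 for 64 bits). */
size_t encode_uptr(uint64_t n, unsigned char *buf) {
  unsigned char tmp[10];
  size_t len = 0, i;
  /* Collect 7-bit groups, least significant first. */
  do {
    tmp[len++] = (unsigned char)(n & 0x7F);
    n >>= 7;
  } while (n != 0);
  /* Emit most significant first; the high bit means "more bytes follow". */
  for (i = 0; i < len; i++) {
    unsigned char b = tmp[len - 1 - i];
    buf[i] = (i + 1 < len) ? (unsigned char)(b | 0x80) : b;
  }
  return len;
}

/* Decode one value from buf and store the number of bytes consumed in *used. */
uint64_t decode_uptr(const unsigned char *buf, size_t *used) {
  uint64_t n = 0;
  size_t i = 0;
  unsigned char b;
  do {
    b = buf[i++];
    n = (n << 7) | (uint64_t)(b & 0x7F);
  } while (b & 0x80);
  *used = i;
  return n;
}
```

Any value below 128 encodes as a single byte equal to itself, so an ASCII character code is stored unchanged.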

original commit: 89a8d24cc051123a7b2b6818c5c4aef144d48797
2019-12-06 16:43:26 -07:00
Matthew Flatt
de2dedcdd7 Add uninterned symbols
Uninterned symbols are slightly more expensive to allocate than 0- or
1-argument calls to `gensym`, but they're much cheaper to hash (and
print). They're also more consistently distinct when unfasled, and the
fasled form is deterministic.

original commit: 3167083008031b1f880e76a6f573563c7d9c888c
2019-12-04 12:43:35 -07:00
Matthew Flatt
ddf4322ef2 ignore result of mktime
The result of `mktime` is -1 for an error. The result is also -1 if
the time is 1 second before the epoch. That's not useful, so ignore
it.

original commit: aa8ca31cef223128fd8ed1abdc76beb31a0e077a
2019-11-23 19:54:30 -05:00
Matthew Flatt
18d18b7ff6 add pseudo-random generator API
The MRG32k3a generator is fast when using unboxed floating-point
arithmetic. Since the Scheme compiler doesn't yet support that,
build MRG32k3a into the kernel and provide access via
`pseudo-random-generator` functions.
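
For reference, a compact C sketch of the MRG32k3a recurrence as published by L'Ecuyer, using plain `double` arithmetic. The struct, the function names, and the simplistic seeding are assumptions of this sketch and are not the kernel's implementation.

```c
#include <stdint.h>

/* Constants from L'Ecuyer's MRG32k3a. */
#define MRG_M1   4294967087.0
#define MRG_M2   4294944443.0
#define MRG_A12  1403580.0
#define MRG_A13N 810728.0
#define MRG_A21  527612.0
#define MRG_A23N 1370589.0
#define MRG_NORM 2.328306549295728e-10 /* 1 / (m1 + 1) */

typedef struct {
  double s10, s11, s12; /* first component, values in [0, m1) */
  double s20, s21, s22; /* second component, values in [0, m2) */
} mrg_state;

/* Simplistic seeding (an assumption): one nonzero value below m2 everywhere. */
void mrg_seed(mrg_state *g, uint32_t seed) {
  double s = (double)(seed % 4294944442u) + 1.0;
  g->s10 = g->s11 = g->s12 = s;
  g->s20 = g->s21 = g->s22 = s;
}

/* Return the next value, strictly between 0 and 1. */
double mrg_next(mrg_state *g) {
  double p1, p2;
  int64_t k;

  /* Component 1; every intermediate stays below 2^53, so doubles are exact. */
  p1 = MRG_A12 * g->s11 - MRG_A13N * g->s10;
  k = (int64_t)(p1 / MRG_M1);
  p1 -= (double)k * MRG_M1;
  if (p1 < 0.0) p1 += MRG_M1;
  g->s10 = g->s11; g->s11 = g->s12; g->s12 = p1;

  /* Component 2 */
  p2 = MRG_A21 * g->s22 - MRG_A23N * g->s20;
  k = (int64_t)(p2 / MRG_M2);
  p2 -= (double)k * MRG_M2;
  if (p2 < 0.0) p2 += MRG_M2;
  g->s20 = g->s21; g->s21 = g->s22; g->s22 = p2;

  /* Combine the two components. */
  return (p1 <= p2) ? (p1 - p2 + MRG_M1) * MRG_NORM : (p1 - p2) * MRG_NORM;
}
```

The whole state is six doubles and each step is a handful of floating-point operations, which is why boxing every intermediate result would dominate the cost.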

original commit: 3dd74679a6c2705440488d8c07c47852eb50a94b
2019-10-07 10:58:39 -06:00
Matthew Flatt
174c416f9e repair for opportunistic 1-shot
If normal 1-shot continuations are mixed with opportunistic 1-shot
continuations created by `call-setting-continuation-attachment`, then
promoting an opportunistic 1-shot at a GC is wrong unless the whole
chain is promoted.

original commit: 2dfac475666763b60935e382386af4438f3029e0
2019-09-24 11:41:50 -06:00
Matthew Flatt
4e3b829227 add $app
Using `#3%$app` disables a `procedure?` check in an application.

original commit: d7960da9e3c3a864a4df42cb8bb71d9b205aeb95
2019-09-19 07:30:42 -06:00
Matthew Flatt
c57de26c1d add call-consuming-continuation-attachment
Also, rename `call-with-current-continuation-attachment` to
`call-getting-continuation-attachment`.

original commit: e2a00e6d641b92918c4911c27ba14949748fd291
2019-09-11 17:07:11 -06:00
Matthew Flatt
b842a134fd continuation-attachment performance
Add a shortcut check when reifying the continuation frame in tail
position, which is significantly cheaper when the frame is already
there. We pay down the check by skipping an attachment-lists check
that is not needed if the frame is newly reified.

Also, add a one-shot continuation-frame cache, which makes a shallow
temporary attachment cheaper, as in

 (let loop ([i N])
   (if (zero? i)
       0
       (loop (call-setting-continuation-attachment
              i
              (lambda ()
                (f (sub1 i)))))))

The cache is just one frame. Keeping a chain of allocated-but-not-GCed
frames doesn't pay off.

Meanwhile, remove the leftover `$shift-attachment` library entry.

original commit: 1f454f536b1d7efe20fe9e793cda31e54e31e5f4
2019-09-11 09:34:42 -06:00
Matthew Flatt
502b0b5f50 repair for locked-object handling and multiply-locked values
Weak pairs, ephemeron pairs, some symbols, and some ports were handled
incorrectly when locked multiple times.

original commit: 847fc1c84496f67cd363c8411d0023339f4d6246
2019-09-01 08:57:14 -06:00
Matthew Flatt
2f4d59de0f remove unused binding
original commit: a4732d58666d80e78af5e1cde4c796d3eeae20e7
2019-09-01 07:13:23 -06:00
Matthew Flatt
c195288251 scalable object locking
The `unlock-object` operation was O(N) with N currently locked objects
--- so, O(N^2) to lock N objects and then unlock them --- because
locked objects were stored in and searched in a global list. Also, GC
was O(N) at any generation with N locked objects across generations,
since every locked object was scanned.

Fix these problems so that locking and unlocking is practically O(1)
and GC is not proportional to locked objects. More precisely, locking
and unlocking is now O(C) for locking an individual object C times to
be balanced by C unlocks. (Since multiple locks on a single object
is rare, this performance seems good enough.)

The implementation replaces the global list with segment-specific
lists. Backpointers are managed using the general generational
support, so that unmodified, old-generation locked objects do not
need to be swept during a new-generation collection.

original commit: a57d256ca73a3d507792c471facb7e35afbe88b3
2019-09-01 07:03:16 -06:00
Matthew Flatt
ce9df2f827 Merge github.com:cisco/ChezScheme
original commit: c5d71168eb4315f7e8ec9c0acf615fa0b9a2fc88
2019-07-26 04:29:00 -06:00
Alexander Shopov
3fec9b8bba Try to eliminate dead stores (#444)
Signed-off-by: Alexander Shopov <ash@kambanaria.org>
original commit: 84a6a6ab36294c73dbdc617d19c42fada42c3a15
2019-07-25 15:05:48 -04:00
Alexander Shopov
f3cc313d96 Add additional check to prevent going before start of buffer (#446)
p is a pointer that iterates over path, which is buffer.
We should not try to get to an address preceding its start.
Since there was an execution path that leads to that,
guard against it with an additional check.
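
A hypothetical C illustration of this kind of guard. It is not the patched function, only a backward scan over a path buffer that stops at the start of the buffer.

```c
#include <string.h>

/* Remove trailing '/' characters in place and return the new length.
   (Hypothetical helper; it turns "/" into "", which is fine for a sketch.) */
size_t strip_trailing_slashes(char *path) {
  char *p = path + strlen(path);
  /* The p > path test keeps p[-1] from reading before the buffer start;
     without it, an empty or all-slash path would walk off the front. */
  while (p > path && p[-1] == '/')
    p--;
  *p = '\0';
  return (size_t)(p - path);
}
```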

Signed-off-by: Alexander Shopov <ash@kambanaria.org>
original commit: de8d0e742f44c80735a682bd05019246c2087d56
2019-07-25 15:00:18 -04:00
5pyd3r
0df195f066 fix ee_read_char to handle ^@ properly
original commit: e962a03987470d0a3937446c10af3a94793ffc43
2019-07-25 14:48:38 -04:00
Bob Burger
6ab0111073 Merge branch 'bsd'
# Conflicts:
#	LOG

original commit: b6f861e6266f42f8cb0c4d2db9c3ebed5b98e35c
2019-07-25 14:35:27 -04:00
Alexander Shopov
8c891262a1 Use setenv rather than putenv on non WIN32 environments
Signed-off-by: Alexander Shopov <ash@kambanaria.org>

original commit: 8bf1e18853d5feeb64aadb631c35641cd0ab4748
2019-07-25 16:06:48 +02:00
Matthew Flatt
368d079d24 adjust build for BSDs, MinGW cross-compile, and more configuration
Includes joint work with @abmclin, @pmatos, and @jessealama.

original commit: 2649600c68ff57efb63d6d5d10c9d9f73368f59a
2019-07-06 13:16:57 -06:00
Matthew Flatt
71846161f9 Merge branch 'bsd' of github.com:mflatt/ChezScheme
original commit: 198477a40c2c580924d95491e63d80e1f9a39c0d
2019-07-05 07:30:37 -06:00
Matthew Flatt
c38194c0ca adjust build for BSDs, MinGW cross-compile, and more configuration
Includes joint work with @abmclin, @pmatos, and @jessealama.

original commit: 70559d074f70dcadec5cea3619f75f91fcda77eb
2019-07-03 18:54:04 -06:00
Paulo Matos
a3f325bbea mark functions that never return as NORETURN
original commit: 6377313ecb063273b573139c9e91de263e191e60
2019-07-02 11:30:59 -06:00
Matthew Flatt
dd0fe4ac40 unbreak MSVC build
Move `NORETURN` of 2e3a618b00 to start of function declaration, where
it works for both GCC and MSVC.
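
A sketch of what a portable `NORETURN` macro might look like. The macro definition and the `fatal` and `checked_div` helpers are assumptions for illustration and are not copied from the kernel headers.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(_MSC_VER)
# define NORETURN __declspec(noreturn)
#elif defined(__GNUC__)
# define NORETURN __attribute__((noreturn))
#else
# define NORETURN
#endif

/* The attribute comes first, a position both GCC and MSVC accept. */
NORETURN void fatal(const char *msg);

void fatal(const char *msg) {
  fprintf(stderr, "fatal: %s\n", msg);
  exit(1);
}

/* The compiler knows control never continues past fatal(). */
int checked_div(int a, int b) {
  if (b == 0) fatal("division by zero");
  return a / b;
}
```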

original commit: 10fc4a2406ecd34fa686d9d643ee63d7c12d6f97
2019-06-23 05:57:53 -06:00
Matthew Flatt
9f1fe73797 change build to use archives instead of merging objects
Merging ".o" files to one "kernel.o" can be convenient for further
linking, but it requires running `ld` directly. Running `ld` directly
sometimes runs into a mismatch between the C compiler and the default
`ld`. It's better to use the more typical approach of collecting
objects into an archive.

original commit: 7d5b60c7566570655e567495d86d546101cf8fb4
2019-06-21 18:53:33 -06:00
Matthew Flatt
a043c4b3a8 mark functions that never return as NORETURN
@pmatos did all the work here in racket/ChezScheme#8 and
racket/racket#2344.

original commit: 2e3a618b0072d547b6c5abe6dd8dbac36a98c10e
2019-06-21 14:26:01 -06:00
Matthew Flatt
e8bd9b83cd repair {Free,Open,Net}BSD build
Use OSSP UUID on {Free,Open}BSD and native UUID on NetBSD.

Building on OpenBSD requires a filesystem mounted with wxneeded.

original commit: e964d7d01a6d115e469c01626896b683d421d599
2019-06-11 09:34:09 -06:00
Matthew Flatt
81191397b5 Merge github.com:cisco/ChezScheme
original commit: bb65f1a8e429683e2925cf1678145efe0ade59bb
2019-06-07 08:56:14 -06:00
Paulo Matos
af63b73bad Update shift left that might cause ub
A few shift lefts cause ub because of `(1 << n)` where `n` is 31. 
The constant 1 is signed causing ub. Initially my fix was to do `(1U << n)` however, I have seen the pattern `((U32)1 << n)` elsewhere in the file so decided to follow this.
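
A minimal C sketch of the pattern, with `uint32_t` standing in for the kernel's `U32` type (an assumption) and a hypothetical helper name:

```c
#include <stdint.h>

/* Bit n of a 32-bit word, for n in 0..31.
   With a plain int constant, (1 << 31) overflows a 32-bit signed int,
   which is undefined behaviour; an unsigned operand makes it well defined. */
uint32_t bit32(unsigned n) {
  return (uint32_t)1 << n;
}
```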

Caught by ubsan racketcs.
original commit: a902c9ab67010f521f786e2027d4e197d78975a4
2019-06-06 08:57:25 -06:00
Paulo Matos
4988a45c06 Make variables unsigned to avoid ub in calculation
According to ubsan we get several times into undefined behaviour due to signed overflow:
foreign.c:91:21: runtime error: signed integer overflow: 3291370622602663862 * 3 cannot be represented in type 'long int'

This happens only when the symbol name is relatively long, as in the call:
symhash (s=0x5555558caab8 "(cs)set_enable_object_backreferences")
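
To illustrate why unsigned variables help, here is a hypothetical multiply-and-add string hash in C. It is not the actual `symhash`; only the multiplication by 3 is taken from the ubsan report above.

```c
/* Hypothetical string hash. Unsigned arithmetic wraps modulo 2^N on
   overflow, which is well defined; the same loop with a signed long
   would be undefined behaviour once the value gets large enough. */
unsigned long hash_name(const char *s) {
  unsigned long h = 0;
  while (*s != '\0')
    h = h * 3 + (unsigned char)*s++;
  return h;
}
```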
original commit: 1e1c91869443d8a22beeebfcbe6fa14f9c3e2a6e
2019-06-05 22:49:55 +02:00
Steven Watson
21c7dd839d Added support for building chez with VS2019. (#435)
added support for Microsoft Visual Studio 2019 on Windows
original commit: 549b4468b619a9377332509472a4346ac223b5ae
2019-06-04 16:37:57 -04:00
Matthew Flatt
2cf27c4727 Merge github.com:cisco/ChezScheme
original commit: 8118200e237d756f83be54e8bf3eabb4af2388ed
2019-05-22 10:46:59 -06:00
Bob Burger
62907754b4 fix multiply of -2^30 to itself on 64-bit platforms
original commit: 566c7a98ec4e070a26450781ffc2b9054860e4ed
2019-05-02 15:19:58 -04:00
Matthew Flatt
40ced8629e repair multiply of (- (expt 2 30)) to itself
On a 64-bit platform, the test for "short" arguments to
avoid overflow was incorrect, because `(- (expt 2 30))`
counted as short.
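
One plausible shape of such a bug, sketched in C. It assumes 61-bit fixnums on 64-bit platforms (range -2^60 to 2^60 - 1) and a "short" test meant to guarantee that a product fits in that range. The predicates are hypothetical and are not the kernel's code.

```c
#include <stdint.h>

#define FIXNUM_MAX  (((int64_t)1 << 60) - 1)
#define FIXNUM_MIN  (-((int64_t)1 << 60))
#define SHORT_LIMIT ((int64_t)1 << 30)

/* Too permissive: admits -2^30, whose square is 2^60. */
int is_short_buggy(int64_t x) {
  return x >= -SHORT_LIMIT && x < SHORT_LIMIT;
}

/* Symmetric bound: |x| < 2^30, so any product has magnitude below 2^60. */
int is_short(int64_t x) {
  return x > -SHORT_LIMIT && x < SHORT_LIMIT;
}

int fits_fixnum(int64_t x) {
  return x >= FIXNUM_MIN && x <= FIXNUM_MAX;
}
```

Under these assumptions, -2^30 times itself is 2^60, one more than the largest fixnum, so the fast path has to exclude that operand.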

original commit: 6d05b70e86987c0e7a07f221ba5def492300aaaf
2019-05-01 09:20:35 -06:00
dyb
82b2cda639 compress-level parameter, improvement in lz4 compression, and various other related improvements
- added compress-level parameter to select a compression level for
  file writing and changed the default for lz4 compression to do a
  better job compressing.  finished splitting glz input routines
  apart from glz output routines and did a bit of other restructuring.
  removed gzxfile struct-as-bytevector wrapper and moved its fd
  into glzFile.  moved DEACTIVATE to before glzdopen_input calls
  in S_new_open_input_fd and S_compress_input_fd, since glzdopen_input
  reads from the file and could block.  the compress format and now
  level are now recorded directly in the thread context.  replaced
  as-gz? flag bit in compressed bytevector header word with a small
  number of bits recording the compression format at the bottom of
  the header word.  flushed a couple of bytevector compression mats
  that depended on the old representation.  (these last few changes
  should make adding new compression formats easier.)  added
  s-directory build options to choose whether to compress and, if
  so, the format and level.
    compress-io.h, compress-io.c, new-io.c, equates.h, system.h,
    scheme.c, gc.c,
    io.ss, cmacros.ss, back.ss, bytevector.ss, primdata.ss, s/Mf-base,
    io.ms, mat.ss, bytevector.ms, root-experr*,
    release_notes.stex, io.stex, system.stex, objects.stex
- improved the effectiveness of LZ4 boot-file compression to within
  15% of gzip by increasing the lz4 output-port in_buffer size to
  1<<18.  With the previous size (1<<14) LZ4-compressed boot files
  were about 50% larger.  set the lz4 input-port in_buffer and
  out_buffer sizes to 1<<12 and 1<<14.  there's no clear win at
  present for larger input-port buffer sizes.
    compress-io.c
- To reduce the memory hit for the increased output-port in_buffer
  size and the corresponding increase in computed out_buffer size,
  one output-side out_buffer is now allocated (lazily) per thread
  and stored in the thread context.  The other buffers are now
  directly a part of the lz4File_out and lz4File_in structures
  rather than allocated separately.
    compress-io.c, scheme.c, gc.c,
    cmacros.ss
- split out the buffer emit code from glzwrite_lz4 into a
  separate glzemit_lz4 helper that is now also used by gzclose
  so we can avoid dealing with a NULL buffer in glzwrite_lz4.
  glzwrite_lz4 also uses it to write large buffers directly and
  avoid the memcpy.
    compress-io.c
- replaced lz4File_out and lz4File_in mode enumeration with the
  compress format and inputp boolean.  using switch to check and
  raising exceptions for unexpected values to further simplify
  adding new compression formats in the future.
    compress-io.c
- replaced the never-defined struct lz4File pointer in glzFile
  union with the more specific struct lz4File_in_r and Lz4File_out_r
  pointers.
    compress-io.h, compress-io.c
- added free of lz4 structures to gzclose.  also changed file-close
  logic generally so that (1) port is marked closed before anything is
  freed to avoid dangling pointers in the case of an interrupt or
  error, and (2) structures are freed even in the case of a write
  or close error, before the error is reported.  also now mallocing
  glz and lz4 structures after possibility of errors have passed where
  possible and freeing them when not.
    compress-io.c,
    io.ss
- added return-value checks to malloc calls and to a couple of other
  C-library calls.
    compress-io.c
- corrected EINTR checks to look at errno rather than return codes.
    compress-io.c
- added S_ prefixes to the glz* exports
    externs.h, compress-io.c, new-io.c, scheme.c, fasl.c
- added entries for mutex-name and mutex-thread
    threads.stex
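
The EINTR item in the list above refers to a common POSIX idiom, sketched here in C with a hypothetical helper name. A failing call reports the error through its return value (-1), while the reason, including EINTR, is found in `errno`.

```c
#include <errno.h>
#include <unistd.h>

/* Read from fd, retrying when the call is interrupted by a signal.
   The return value only says that the call failed; errno says why. */
ssize_t read_retry(int fd, void *buf, size_t len) {
  ssize_t n;
  do {
    n = read(fd, buf, len);
  } while (n < 0 && errno == EINTR);
  return n;
}
```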

original commit: 722ffabef4c938bc92c0fe07f789a9ba350dc6c6
2019-04-18 05:47:19 -07:00
Matthew Flatt
b9e43c6e78 add "scheme.h" dependency for "main.c" to Windows makefiles
original commit: 413cf148327345847aa3d1f6b839e77d74a8996e
2019-04-09 17:11:26 -06:00
Matthew Flatt
2da5fd740e Merge branch 'hashmix' of github.com:mflatt/ChezScheme
original commit: b620bd23a962989db5f5b489eb67a1fa45ee123d
2019-04-07 10:14:10 +02:00
Matthew Flatt
ffc02a9877 improve hash mixing
original commit: d7469cedd67a950931a561ce14388fe7e628770d
2019-04-07 09:37:37 +02:00
Matthew Flatt
e622a495b6 Add LZ4 support and use it by default for compressing files
original commit: 8858b34bd92ac8d2b6511dc9ca17ebfa06a1bd93
2019-04-06 07:32:37 +02:00
Bob Burger
19b130e41c update Windows spin-loop count for deleting files and directories
original commit: b597e161fcb8c5ebb8f7f8e1aa27b2f136c13064
2019-03-26 14:16:54 -04:00
Matthew Flatt
c6d3a1dd69 make nul act as a stream terminator for LZ4 sequences
original commit: 06f4aab43a35b3a3f956cf510c76c0edb4f1a866
2019-03-22 13:52:53 -06:00