Rename fields in a page record and split some of them with `union` to
better document the intent of each field.
This change is intended to have no effect on the GC's behavior. One
tricky case is the line dropped around line 3542 of "newgc.c". That
line reset `scan_boundary` (formerly `previous_size`), which on the
surface is inconsistent with leaving objects before the boundary
without `marked` bits set. However, that line is reachable only
when generation-1 objects are being marked (objects newly moved
there would not be unmarked), in which case `scan_boundary` should
already be reset.
Using `--enable-racket=auto` causes a Racket for the current platform
to be built in a "local" subdirectory of the build directory as
support for cross-compilation.
The original idea was to count phantom bytes as "administrative
overhead", but issues discussed in #962 identified problems
with that idea. Finish shifting the accounting to treat
phantom bytes as payload allocation.
The misplacement of `SCHEME_PRIM_SOMETIMES_INLINED` caused the
optimizer to produce different results when the JIT is statically
disabled, for example.
This bug is an old one, in a sense, because traversing fields
in a closure could have moved the prefix with earlier versions
of the collector. It shows up now because we're changing fields
one indirection closer.
Compact fewer blocks by moving them only when other
blocks have room.
Also, fix block protection tracking in the case of a page
count that isn't divisible by 8.
In the common case of a minor GC without a generation 1/2
or a major GC without compaction, a single pass suffices
to both mark and update references.
This change reduces overall GC time by 10%-25% on typical
programs.
The GC supported allocation for an array of objects where
the first one provides a tag, but at this point it was
used only in some corners. Change those corners and simplify
the GC by removing support for arrays of tagged objects.
The main corner to clean up is in the handling of a macro-expansion
observer and inferred names. Move those into the compile-time
environment. It's possible that name inference has been
broken by the changes, but in addition to passing the tests,
the generated bytecode for the base collections is exactly the
same as before the change.
Although a block cache is set up to group most page-protection changes
into a single OS call, allocating new old-generation pages was not
covered. Adjust the block cache to group those.
This change has a small effect on performance, but it seems
better to have a few system calls in place of thousands.
First bug:
When the optimizer converts
(let-values ([(X ...) (values M ...)])
....)
to
(let ([X M] ...)
....)
it incorrectly attached to each `[X M]` binding a virtual timestamp
that corresponds to the timestamp after the whole `(values M ...)`.
The solution is to approximate tracking the timestamp for individual
expressions.
Second bug:
The compiler could reorder a continuation-capturing expression past
an allocation.
The solution is to track allocations with a new virtual clock.
Make `eval-syntax`, `compile-syntax`, and `expand-syntax` more
consistent (with intent and each other) by not installing a fallback
automatically. In particular, a fallback is not installed for a
`module` form, so that different ways of expanding a `module` form
produce consistent results (e.g., for ambiguous bindings).
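For illustration, a sketch of the difference (`eval` adds the
namespace's scopes via `namespace-syntax-introduce`, while
`eval-syntax` does not):

  (parameterize ([current-namespace (make-base-namespace)])
    ;; `eval` adds the namespace's scopes, so `module` is bound:
    (eval (datum->syntax #f '(module m racket/base)))
    ;; `eval-syntax` installs no fallback, so this should fail
    ;; with an unbound-identifier error:
    (eval-syntax (datum->syntax #f '(module m racket/base))))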
Putting <PlatformToolset> in the new places makes the projects
work when more than one version of Visual Studio is installed.
Maybe the old place was always the wrong place, or maybe
VS 2010 wanted it in the old place. Either way, sprinkling
the version in more places seems unlikely to hurt.
User-scope package installation matching the version of
Racket being built could affect the collections visible
during `raco setup` for `make base`. In particular, the
presence of `setup/scribble` could cause all built docs
to be discarded.
Also, add the `--no-user-path` flag to `racket` (which
has long been documented as an alias for `-U`).
The table as a tree is traversed to prune empty branches,
but the traversal is needed only toward branches that
have changed. Skipping the traversal can save several
milliseconds on each collection.
Name handling formerly interned symbols along the
way to allocating a plain string, which takes effort
and causes changes to the symbol table, which forces
a minor GC to traverse the whole symbol table. Skip
unnecessary symbol-interning steps.
Refine the changes in 16c198805b so that `(define id ... id ... )` at
the top level compiles more consistently when `id` is an identifier
whose lexical context does not include `#%top`.
When `compile` is used on a top-level definition, do not
create a binding in the current namespace, but arrange for
a suitable binding to be in place for the target namespace.
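A sketch of the intended behavior:

  (define ns1 (make-base-namespace))
  (define ns2 (make-base-namespace))
  (define c (parameterize ([current-namespace ns1])
              (compile '(define x 42))))  ; no binding created in ns1
  (parameterize ([current-namespace ns2])
    (eval c)     ; `x` is defined in ns2, the target namespace
    (eval 'x))   ; => 42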
Closes #1036
This repair adjusts the bug fix of commit 769ad3e98. That older commit
ensured that `sync/enable-break` doesn't both break and accept a
channel message or semaphore wait. But it effectively disables those
actions if the break is continued.
Instead of (partially!) ending the `sync`, get out of semaphore
and channel queues so that no event can be selected during
the break, and then get back in line if the break is continued.
When a path is made relative for marshaling to bytecode, record
a list of byte strings instead of a platform-specific relative
path.
For syntax-object source locations, convert any non-relative path to a
string that shows just the last couple of path elements preceded by
".../". This conversion avoids embedding absolute paths in bytecode,
but at the cost of some information. A more complete and consistent
solution would involve using a module-path index instead of a path, but
that would be a big change at several layers.
Make room in the bytecode format for source locations and 'paren-shape
property values for syntax objects. Saving source locations increases
bytecode size by about 10% on average.
Also, convert the internal representation of syntax properties to
use immutable hash tables, instead of lists.
The `prop:expansion-contexts` property can control the expansion
of a rename transformer in much the same way that conditionals on
`(syntax-local-context)` can control the expansion of other
transformers.
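For example, a rename transformer that applies only in expression
positions might be set up like this (a sketch; the structure name is
hypothetical):

  (require (for-syntax racket/base))
  (begin-for-syntax
    (struct expression-rename (target) ; `target` holds an identifier
      #:property prop:rename-transformer 0
      #:property prop:expansion-contexts '(expression)))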
All places use the same accounting bit for objects
that are in the shared space. Each place also flips
the bit value it wants on each accounting, so if two
places are accounting at the same time with opposite
bit values and can reach the same objects, they can
interfere. It's even possible for them to race
through cycles and cause each other to loop forever.
Add a lock to ensure that there's only one bit value
in play for the shared space at any given time. A
place must stall if other places are busy with memory
accounting and an opposite bit value.
While a place message is received by a thread but not yet
deserialized, if the message contains references to objects in the
shared space, and if a "master" GC happens (which crosses all places),
make sure that the references in the still-serialized message are
traversed.
Adjust installation tools to support cross-installation (i.e.,
installation for a platform other than the current one) as triggered
by "system.rktd" in "lib" having different information than the
running Racket executable.
Also, change floating-point handling to be like the MSVC build by
default, where the process is left in double-precision mode and
the mode is changed for exfl operations.
Includes repairs for integer-size mismatches in uses of Windows
threads.
The error message for the guard used an incorrect contract.
Also remove an unused line that allows a box value in the
property. I don't think it was possible to trigger this line
anyway, because of the dynamic check.
In a case like
(let-values ([(X ...) (with-continuation-mark M_k M_v
(values M ...))])
....)
where the bytecode compiler cannot convert to a sequence of `let`
bindings, make the JIT implement `values` as delivering argument
results directly to the corresponding variable locations.
Progress toward making the bytecode compiler deterministic, so that a
fresh `make base` always produces exactly the same bytecode from the
same sources. Most changes involve avoiding hash-table order
dependencies and adjusting scope identity. The namespace used to load
a reader extension is also better defined. Plus many other little
changes.
The identity of a scope that is unmarshaled from a bytecode file now
incorporates the hash of the file, and the relative order of scopes is
preserved in a bytecode file. This combination allows compilation to
start with modules that were loaded and compiled in different orders
(including delayed loading of bytecode fragments within one file).
Formerly, a reader extension triggered by `#lang` or `#reader` was
loaded in whatever namespace happened to be current. That's
unpredictable and can pollute a module build at the level of bytecode.
To help make builds deterministic, reader extensions are now loaded in
a root namespace of the current namespace.
Deterministic compilation in general relies on deterministic macros.
The two most common ways for a macro to be non-deterministic are by
using `gensym` (use `generate-temporaries`, instead) and by using an
unsorted hash-table traversal (don't do that).
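A sketch of the two styles in a macro:

  (require (for-syntax racket/base))
  ;; Non-deterministic: the temporary's printed name changes on
  ;; every compilation
  (define-syntax (swap! stx)
    (syntax-case stx ()
      [(_ a b)
       (with-syntax ([tmp (gensym 'tmp)])
         #'(let ([tmp a]) (set! a b) (set! b tmp)))]))
  ;; Deterministic: let the expander pick the temporary
  (define-syntax (swap*! stx)
    (syntax-case stx ()
      [(_ a b)
       (with-syntax ([(tmp) (generate-temporaries '(tmp))])
         #'(let ([tmp a]) (set! a b) (set! b tmp)))]))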
At this point, bytecode generation is unlikely to be completely
deterministic, since I uncovered non-determinism mostly by iterating
attempts over the base collections. For now, the intent is not to
provide guarantees outside of the compilation of the base collections
--- but "more deterministic" is likely to be useful in the short run,
and we can improve further in the long run.
Specialize a
(call-with-immediate-continuation-mark _key (lambda (_arg) _body) _def-val)
call to an internal
(with-immediate-continuation-mark [_arg (#%immediate _key _def-val)] _body)
form, which avoids a closure allocation and more.
This optimization is useful for contracts, which use
`call-with-immediate-continuation-mark` to avoid redundant
contract checks.
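For example, a check along these lines (a sketch of the contract-style
pattern) now avoids the closure allocation:

  (with-continuation-mark 'in-contract? #t
    (call-with-immediate-continuation-mark
     'in-contract?
     (lambda (v) (if v 'skip-check 'check))
     #f))
  ; => 'skip-check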
On OS X, it seems that access() can sometimes fail with EPERM
when checking for execute permission on a file without it.
I've previously seen this result when running as the superuser,
but that's apparently not the only possibility; a long path
may also be relevant.
Re-linking in a new namespace doesn't need the namespace of
compilation.
A "namespac.rktl" test exposed this problem, where the "transfer a
definition of a macro-introduced variable" test could fail if a GC
occurred between compilation in one namespace and evaluation in
another.
Although `eval-syntax` is not supposed to add the current namespace's
"outer edge" scope, it must add the "inner edge" scope to be consistent
with adding the inner edge to every intermediate expansion (as in
other definition contexts).
In addition, `eval`, `eval-syntax`, `expand`, and `expand-syntax`
did not cooperate properly with `local-expand` on the inner edge.
Some failure paths were missing an update before calling failure
code, and the new failure paths need to unconditionally update the
runstack pointer (because the common stub doesn't know whether the
calling context needs an update).
Generating a use-site scope, instead of a macro-introduction scope,
prevents the scope's presence from triggering a #f result from
`syntax-original?`.
This change mostly reverts 1465ff25fc, which turned out to be a hassle
because it created more cyclic structure.
A simpler strategy is to allow a phase-specific scope to be detached
(perhaps temporarily, due to on-demand loading of bytecode) from its
group; when that's possible, the scope is not reachable from a place
where it can be moved to other syntax objects, so it's ok to be
detached. Debugging output needs to handle that gracefully, though.
Also, in case of broken bytecode, fix up a detached scope if it
does end up in an unexpected place.
Formerly, compiling a definition in one namespace and evaluating it in
another would cause the definition to take place in the original
namespace --- unless the compiled code is marshaled to a byte string
and back. Adjust the "linking" process to redirect the variable
definition and any references to the new namespace. (This is a change
relative to the compiler with the old macro expander.)
Also, repair a compiled `require` form along similar lines. (This is
*not* a change relative to the compiler with the old macro expander;
the mismatch is part of the motivation for changing `define`
handling.)
Add the current definition context's scope to any expression that is
produced by macro expansion before trying to expand again, in case the
expansion needs to refer to a definition introduced by a previous
expansion.
Previously, the scope was added before any expansion and after any
expansion, but that misses intermediate points.
The old expander had this bug, too (some of the new tests fail there),
but it showed up less often and was sometimes considered correct, for
various reasons.
I had tried to simplify the "generation 0" allocation function to
always use `GEN0_PAGE_SIZE`, but "generation 0" is also used for place
messages, in which case a much smaller size should be used.
The "place-in-channel-fnl.rkt" test exposed this problem.
A recent GC change (included with the set-of-scopes expander)
allows the GC's marking procedure to recur directly to a limited
depth, instead of always pushing pointers onto a stack. Direct
recursion is not compatible with the ephemeron-resolution process,
so switch to no-recur mode.
This problem was uncovered by an existing test.
The combination of splitting a `letrec` and optimizing
the resulting `(let ([x <proc>]) x)` to just `<proc>`
used a bad coordinate shift, which made property testing
incorrect, etc.
For reasons that are not clear, the new expander triggered
the problem through an existing test.
The `eval-syntax` function (which is used by other functions, such as
loading a module) should not install fallback-binding scopes from
the current namespace.
When `(let ([x ...]) (let ([y x]) ... y ... y ...))` turns into
`(let ([x ...]) ... x ... x ...)`, make sure that `x` is not
still marked as single-use. Incorrect marking as single-use could
cause the optimizer to inline too much, for example.
Thanks to Gustavo for tracking down the problem.
Previously, all the predicates recognized only non-#f values, so `not`
can now be added to the list of disjoint predicates. But many parts of
the code relied on the non-#f property and had to be modified.
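For example, in the true branch of a `not` test, `x` is known to be
#f, which is disjoint from `pair?`:

  (lambda (x) (if (not x) (pair? x) 0))
  ==> (lambda (x) (if (not x) #f 0))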
In (if (eq? x <pred?-expr>) <tbranch> <fbranch>) infer that the type of
x is pred? in the tbranch.
Also, reduce (eq? x y) => #f when the types are different.
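For example:

  (lambda (x) (if (eq? x '()) (null? x) 0))
  ==> (lambda (x) (if (eq? x '()) #t 0))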
The optimizer reduces variables with a known type to #t in a Boolean
context. But some predicates imply that the variable has a definite
value, so such variables can be reduced in a non-Boolean context, too.
For example, in (lambda (x) (if (null? x) x 0)) reduce the last x ==> null.
This fixes the bug in two ways:
* Don't reduce mutable variables with a type to #t in a Boolean context.
* Don't record the type of mutable variables when a predicate is
checked in a test condition.
While reducing some ignored constructors, the optimizer may wrap an
argument <expr> as (values <expr>) to ensure that it's a single-value,
non-cm expression. This change avoids the unnecessary nesting of
(values (values <expr>)).
Similarly, add the cases for `begin` and `begin0` to
single_valued_noncm_expression.
While `#:in-original-place? #t` provides one way to serialize
foreign calls, it acts as a single lock and requires expensive
context switches. Using an explicit lock can be more efficient
for serializing calls across different places.
For example, running "plot.scrbl" takes 70 seconds on my machine
both in the original place and with `#:lock-name` in any place,
while it took 162 seconds in a non-main place with Cairo+Pango
serialization via `#:in-original-place? #t`.
Internally, the named lock combines compare-and-swap with a
place channel. That strategy gives good performance in the case
of no contention, and it cooperates properly with the Racket
scheduler where there is contention.
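A sketch of a binding that uses the named lock (the Cairo function is
real, but the library spec and lock name here are illustrative):

  (require ffi/unsafe ffi/unsafe/define)
  (define-ffi-definer define-cairo (ffi-lib "libcairo"))
  ;; calls to functions declared with the same lock name are
  ;; serialized across all places:
  (define-cairo cairo_image_surface_create
    (_fun #:lock-name "cairo-pango-lock" _int _int _int -> _pointer))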
The optimizer was able to use the type information gained outside
a `let` form to reduce expressions inside it. For example, in
(lambda (z) (car z) (let ([o (random)]) (pair? z)))
it reduces (pair? z) ==> #t.
This change enables the propagation in the other direction, so in
(lambda (z) (let ([o (random)]) (car z)) (pair? z))
it reduces (pair? z) ==> #t too.
Using `(thread-resume t1 t2)` would not prevent a GC of t1, but it
would create an intermediate record to make the link from t1 to t2,
and that intermediate record would leak due to a missing level of
indirection in a table-cleanup traversal. The leak not only accumulated
memory, it also caused ever slower traversals of the table in an
attempt to clean up.
(Since the leak is small and the leaking object is not directly
accessible, I don't have a good idea on how to test this repair
automatically, but see the program in the PR.)
Closes PR 15099.
Modern OS configurations likely use an even larger buffer size, and
making it small can have substantial negative performance effects
(e.g., with PostgreSQL over TCP).
When AC_PROG_CC picks GCC, move its selection of CFLAGS
into CPPFLAGS, so that preprocessing will have the same
optimization and debugging flags as compilation.
Arguably, AC_PROG_CC plus AC_PROG_CPP should do that
somehow, but it's understandable that the autoconf
implementers didn't cover the possibility of
preprocessing that changes with the optimization level.
Closes #945
When `local-require` is used in a non-phase-0 position and it is
`expand`ed (as opposed to compiled directly), the generated
`#%require` form had the wrong binding phase.
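For example, a module shaped like this (a sketch) could get the wrong
phase when `expand`ed:

  (module m racket/base
    (require (for-syntax racket/base))
    (begin-for-syntax
      (local-require racket/list) ; `local-require` at phase 1
      (first '(1 2 3))))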
Merge to v6.2
In many use cases, the length of the vector is fixed and known, so we
can be sure that `make-vector` will not raise an error; these
expressions can then be recognized as omittable and dropped when the
result is ignored.
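For example:

  (begin (make-vector 10) 'done)  ==>  'done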
The result of some procedures is a vector, but they are not omittable
because they may raise an error. With the recent changes to predicate
reduction, these cases are handled correctly.
The optimizer checks the type of the argument of some unary procedures
and uses the gathered information to replace them with the unsafe
version, reduce predicates, and detect type errors. This change extends
the checks to more procedures that have no unsafe version and to
procedures that take more than one argument.
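For example, once `(car p)` has succeeded, `p` is known to be a pair:

  (lambda (p) (car p) (cdr p))  ==>  (lambda (p) (car p) (unsafe-cdr p))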
Use the given readtable more consistently to parse
delimiters in the top-level form. This change particularly
addresses problems with trying to restore the original
`(` when parsing a hash table, but allowing nested
forms to still use a different `(` mapping.
When determining whether expressions can be reordered, a reference to a
module-defined variable was considered unreorderable when it was known
to have a value and no further mutation, but the value isn't
constant across all runs.
The optimizer had some reductions of predicate applications, like (pair? X),
only when X was very simple and the type of X was obvious.
Use expr_implies_predicate and make_discarding_sequence to allow
the reduction of more complex expressions.
Also, the reduction of procedure? and fixnum? were special cases in
optimize_application2. Move the checks to expr_implies_predicate
to take advantage of the reductions in more general cases.
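For example, a reduction along these lines becomes possible, keeping
the argument expressions for their effects:

  (pair? (cons (f) (g)))  ==>  (begin (f) (g) #t)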
Use `syntax-track-origin` and 'disappeared-use properties to
communicate `require` and `provide` form bindings to tools such as
Check Syntax.
Relevant to PR 13186
When a structure type has `prop:impersonator-of`, follow it
when attempting to access impersonator properties.
This change fixes a problem with `impersonate-procedure` as
reported by Scott Moore.
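A minimal sketch of the property's use:

  ;; instances of `wrapper` count as impersonators of their targets,
  ;; so impersonator-property lookups are forwarded to the target:
  (struct wrapper (target)
    #:property prop:impersonator-of
    (lambda (w) (wrapper-target w)))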
The compiler/expander attempted to clear out references in a namespace
used only during macro expansion, but it's possible for references to
be retained (via unusual macros), so get rid of the broken attempt to
help the GC.
Gustavo's tests in de3fa9a855 illustrate the problem. The solution
is simply passing 1 for `optimized_rator` to optimize_for_inline().
Additional changes generalize optimize_for_inline() a little (although
that generality doesn't seem to be useful at the moment) and collapse
some variables that represent the same value.
A new `--enable-ios=<sdk-path>` flag in combination with `--host=...`
sets up the right compiler options for compiling the Racket runtime
system as a framework to use in an iOS application.
I don't know whether the resulting framework actually works, but
compiling and linking is a step forward.
scheme_optimize_apply_values reduces (call-with-values gen proc)
to (#%apply-values proc gen) when it recognizes proc as a procedure.
This change extends the expressions that are recognized as procedures.
Instead of delaying the registration of some constants until a
group of expressions is re-optimized, add constant information as
it is discovered, which can expose some additional optimizations.
The old grouping was probably aimed at avoiding excessive code growth,
but I think that other and better controls are now in place. The
overall size of ".zo" files in an installation did not grow
significantly with this change.
Closes PR 14978
For detecting and debugging accidental dependencies on hash-table
order, it might be helpful to invert the order at the lowest level. To
do that, uncomment `#define REVERSE_HASH_TABLE_ORDER` in "hash.c".
The macro expander formerly put all lifted requires at the start of a
module, but that doesn't work with re-expansion if a module has
submodules and lifted requires that refer to submodules. Put lifted
requires in the right place, instead: just before the form whose
expansion added the lifted require.
Racket wasn't resolving reparse points correctly; the strategy worked ok
for links created by `mklink`, but not with other tools that
leave the "printed name" field blank.
A consequence of various fixes is that reparse points like
"My Documents" (in a typical configuration) correctly resolve
to actual paths like "Documents".
Finally, `directory-exists?` didn't handle root directories like
"C:/" correctly. The query would actually report properties of
the OS-level current working directory, and when junctions are
involved, the current directory can be a link instead of a directory.
Relevant to PR 14950 and PR 14912
Unlike `collapse-module-path`, it makes sense for
`collapse-module-path-index` to convert a relative module path index
to a plain module path. In other words, `collapse-module-path-index`
can convert a module path index to a module path.
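For example (roughly; the exact normalized form may differ):

  (require syntax/modcollapse)
  (define mpi
    (module-path-index-join "b.rkt"
                            (module-path-index-join "a.rkt" #f)))
  (collapse-module-path-index mpi '(lib "p/main.rkt"))
  ; => '(lib "p/b.rkt")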
Optimization can cause a `lambda` that was going to refer to a
top-level variable or syntax object to not refer to it after all.
Ideally, the prefix should be dropped from the closure, but
the change here is more conservative: it fixes the `lambda`'s
annotation that's used by the GC to indicate that nothing will
be used from the prefix.
For GC purposes, a "prefix" (a closure frame that captures
top-level or module-level bindings) may refer to syntax objects
that are not used by any reachable closure, in which case the
syntax objects can be dropped. This pruning of syntax objects
uses the infrastructure already in place to prune variables.
Syntax objects were not included in the original pruning
implementation, because they are unlikely to create
finalization cycles in the way that global-variable
references can. A syntax object can retain a namespace's
table of module imports, however, which can be substantial
and worth releasing if a closure is only held, say, for
a low-level finalization action.
Although names were cleared correctly, the trie used for
the mapping was not pruned correctly, so lots of empty
branches could accumulate (especially in 64-bit mode).
Even when `(variable-reference-constant? (#%variable-reference ....))`
cannot be optimized to a boolean, the expression should not retain a
reference to the enclosing namespace. That space guarantee is
important for the compilation of calls to keyword-accepting functions.
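For example:

  (define x 10)
  (set! x 11)
  ;; cannot be resolved to a boolean at compile time, but must not
  ;; retain the enclosing namespace:
  (variable-reference-constant? (#%variable-reference x)) ; => #f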
The handling of `for-template` imports by `namespace-attach-module`
didn't match the docs. The actual handling was to refrain from
attaching instances of a phase-0 module if the instance was reachable
only through a `for-template`. The rationale had to do with such
modules instances being created only through instantiation of
phase-1 modules, and phase-1 module instances aren't attached;
it doesn't work well that way, though, when different modules
are attached with intervening `namespace-require`s on the target
namespace.
The change includes a documentation correction. Previously and still,
only modules at the same phase as the attached module (as opposed to
the same phase or less) are instantiated in the target namespace.
Closes PR 14938
If a file or directory delete fails, try adjusting the file or directory
permissions to allow writes, then try deleting again. This process should
provide a more Unix-like experience and make programs behave more
consistently.
A new `current-force-delete-permissions` parameter provides access to
the raw native behavior.
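A sketch of opting out for one deletion:

  (parameterize ([current-force-delete-permissions #f])
    (delete-file "some-read-only-file")) ; raw OS behavior; may fail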
Check for an empty path after dropping `"`s, instead of before.
Otherwise, a bad PATH setting interferes with functions like
`find-executable-path`, which in turn can prevent DrRacket from
starting up.
Closes PR 14930
If the slow path has to be taken because the number of
list elements is greater than the stack size, then the
old implementation would copy all the arguments --- which
still might be too much for the available stack space.
Avoid that copy.
Also, add a pad word to the end of the stack to help detect
overflow.
For example, reduce (begin x (error 'e) y) ==> (begin x (error 'e)) and
(f (error 'e) y) ==> (begin f (error 'e)).
Also, reduce (if (error 'e) x y) ==> (error 'e), and propagate the type
information and clocks when only one branch produces an error.
- Modify the features used by OpenBSD (not everything was
tested). Mostly copied from Linux, FreeBSD and NetBSD.
- Add support for Bitrig, a fork of OpenBSD. Eventually
they will differ more and more from OpenBSD.
- Typos and extra trailing spaces.
- Update config.guess and config.sub from GNU.
The implementation of caching stack-trace information in the
stack didn't work right in libunwind mode, with the result that
`(current-continuation-marks)` took O(N) time for a continuation
of size N, when it should be amortized constant time.
A value-printing truncation discovered after a stack overflow was
handled and control returned could go badly, because the truncation
escape wasn't
reset correctly after overflow handling (in contrast to truncation
discovered during the overflow handling, which was handled correctly).
Closes PR 14870