JIT inlines some floating-point ops
svn: r8144
This commit is contained in:
parent
2abe742b98
commit
6ccffdffdb
|
@ -106,15 +106,7 @@ printed output.
|
||||||
@subsection[#:tag "ephemerons"]{Ephemerons}
|
@subsection[#:tag "ephemerons"]{Ephemerons}
|
||||||
|
|
||||||
@; ----------------------------------------------------------------------
|
@; ----------------------------------------------------------------------
|
||||||
@section[#:tag "performance"]{Performance}
|
@include-section["performance.scrbl"]
|
||||||
|
|
||||||
Every definition or expression is compiled to an internal bytecode
|
|
||||||
format. Standard optimizations are applied when compiling the
|
|
||||||
bytecode. For example, in an environment where @scheme[+] has its
|
|
||||||
usual binding, the expression @scheme[(let ([x 1][y (lambda () 4)]) (+
|
|
||||||
1 (y)))] is compiled the same as the constant @scheme[5] due to
|
|
||||||
constant propagation, constant folding, and inlining optimizations.
|
|
||||||
|
|
||||||
|
|
||||||
@; ----------------------------------------------------------------------
|
@; ----------------------------------------------------------------------
|
||||||
@section[#:tag "ffi"]{Foreign-Function Interface@aux-elem{ (FFI)}}
|
@section[#:tag "ffi"]{Foreign-Function Interface@aux-elem{ (FFI)}}
|
||||||
|
|
318
collects/scribblings/guide/performance.scrbl
Normal file
318
collects/scribblings/guide/performance.scrbl
Normal file
|
@ -0,0 +1,318 @@
|
||||||
|
#lang scribble/doc
|
||||||
|
@(require scribble/manual
|
||||||
|
"guide-utils.ss")
|
||||||
|
|
||||||
|
@title[#:tag "performance"]{Performance}
|
||||||
|
|
||||||
|
Alan Perlis famously quipped ``Lisp programmers know the value of
|
||||||
|
everything and the cost of nothing.'' A Scheme programmer knows, for
|
||||||
|
example, that a @scheme[lambda] anywhere in a program produces a value
|
||||||
|
that is closed over it lexical environment---but how much does
|
||||||
|
allocating that value cost? While most programmers have a reasonable
|
||||||
|
grasp of the cost of various operations and data structures at the
|
||||||
|
machine level, the gap between the Scheme language model and the
|
||||||
|
underlying computing machinery can be quite large.
|
||||||
|
|
||||||
|
In this chapter, we narrow the gap by explaining details of the PLT
|
||||||
|
Scheme compiler and run-time system and how they affect the run-time
|
||||||
|
and memory performance of Scheme code.
|
||||||
|
|
||||||
|
@; ----------------------------------------------------------------------
|
||||||
|
|
||||||
|
@section{The Bytecode and Just-in-Time (JIT) Compilers}
|
||||||
|
|
||||||
|
Every definition or expression to be evaluated by Scheme is compiled
|
||||||
|
to an internal bytecode format. In interactive mode, this compilation
|
||||||
|
occurs automatically and on-the-fly. Tools like @exec{setup-plt} and
|
||||||
|
@scheme[compile-file] marshal compiled bytecode to a file. Most of the
|
||||||
|
time required to compile a file is actually in macro expansion;
|
||||||
|
generating bytecode from fully expanded code is relatively fast.
|
||||||
|
|
||||||
|
The bytecode compiler applies all standard optimizations, such as
|
||||||
|
constant propagation, constant folding, inlining, and dead-code
|
||||||
|
elimination. For example, in an environment where @scheme[+] has its
|
||||||
|
usual binding, the expression @scheme[(let ([x 1][y (lambda () 4)]) (+
|
||||||
|
1 (y)))] is compiled the same as the constant @scheme[5].
|
||||||
|
|
||||||
|
On some platforms, bytecode is further compiled to native code via a
|
||||||
|
@deftech{jut-in-time} or @deftech{JIT} compiler. The @tech{JIT}
|
||||||
|
compiler substantially speed programs that execute tight loops,
|
||||||
|
arithmetic on small integers, and arithmetic on inexact real
|
||||||
|
numbers. Currently, @tech{JIT} compilation is supported for x86,
|
||||||
|
x86_64 (a.k.a. AMD64), and 32-bit PowerPC processors. The @tech{JIT}
|
||||||
|
compiler can be disabled via the @scheme[eval-jit-enabled] parameter
|
||||||
|
or the @DFlag{no-jit}/@Flag{j} command-line flag.
|
||||||
|
|
||||||
|
The @tech{JIT} compiler works incrementally as functions are applied,
|
||||||
|
but the @tech{JIT} compiler makes only limited use of run-time
|
||||||
|
information when compiling procedures, since the code for a given
|
||||||
|
module body or @scheme[lambda] abstraction is compiled only once. The
|
||||||
|
@tech{JIT}'s granularity of compilation is a single procedure body,
|
||||||
|
not counting the bodies of any lexically nested procedures. The
|
||||||
|
overhead for @tech{JIT} compilation is normally so small that it is
|
||||||
|
difficult to detect.
|
||||||
|
|
||||||
|
@; ----------------------------------------------------------------------
|
||||||
|
|
||||||
|
@section{Modules and Performance}
|
||||||
|
|
||||||
|
The module system aids optimization by helping to ensure that
|
||||||
|
identifiers have the usual bindings. That is, the @scheme[+] provided
|
||||||
|
by @schememodname[scheme/base] can be recognized by the compiler and
|
||||||
|
inlined, which is especially imported for @tech{JIT}-compiled code.
|
||||||
|
In contrast, in a traditional interactive Scheme system, the top-level
|
||||||
|
@scheme[+] binding might be redefined, so the compiler cannot assume a
|
||||||
|
fixed @scheme[+] binding (unless special flags or declarations
|
||||||
|
act as a poor-man's module system to indicate otherwise).
|
||||||
|
|
||||||
|
Even in the top-level environment, importing with @scheme[require]
|
||||||
|
enables some inlining optimizations. Although a @scheme[+] definition
|
||||||
|
at the top level might shadow an imported @scheme[+], the shadowing
|
||||||
|
definition applies only to expressions evaluated later.
|
||||||
|
|
||||||
|
Within a module, inlining and constant-propagation optimizations take
|
||||||
|
additional advantage of the fact that definitions within a module
|
||||||
|
cannot be mutated when no @scheme[set!] is visable at compile
|
||||||
|
time. Such optimizations are unavailable in the top-level
|
||||||
|
environment. Although this optimization within modules is important
|
||||||
|
for performance, it hinders some forms of interactive development and
|
||||||
|
exploration. The @scheme[compile-enforce-module-constants] parameter
|
||||||
|
disables the @tech{JIT} compiler's assumptions about module
|
||||||
|
definitions when interactive exploration is more important. See
|
||||||
|
@secref["module-set"] for more information.
|
||||||
|
|
||||||
|
Currently, the compiler does not attempt to inline or propagate
|
||||||
|
constant across module boundary, except for exports of the built-in
|
||||||
|
modules (such as the one that originally provides @scheme[+]).
|
||||||
|
|
||||||
|
The later section @secref["letrec-performance"] provides some
|
||||||
|
additional caveats concerning inlining of module bindings.
|
||||||
|
|
||||||
|
@; ----------------------------------------------------------------------
|
||||||
|
|
||||||
|
@section[#:tag "func-call-performance"]{Function-Call Optimizations}
|
||||||
|
|
||||||
|
When the compiler detects a function call to an immediately visible
|
||||||
|
function, it generates more efficient code than for a generic call,
|
||||||
|
especially for tail calls. For example, given the program
|
||||||
|
|
||||||
|
@schemeblock[
|
||||||
|
(letrec ([odd (lambda (x)
|
||||||
|
(if (zero? x)
|
||||||
|
#f
|
||||||
|
(even (sub1 x))))]
|
||||||
|
[even (lambda (x)
|
||||||
|
(if (zero? x)
|
||||||
|
#t
|
||||||
|
(odd (sub1 x))))])
|
||||||
|
(odd 40000000))
|
||||||
|
]
|
||||||
|
|
||||||
|
the compiler can detect the @scheme[odd]--@scheme[even] loop and
|
||||||
|
produce code that runs much faster via loop unrolling and related
|
||||||
|
optimizations.
|
||||||
|
|
||||||
|
Within a module form, @scheme[define]d variables are lexically scoped
|
||||||
|
like @scheme[letrec] bindings, and definitions within a module
|
||||||
|
therefore permit call optimizations, so
|
||||||
|
|
||||||
|
@schemeblock[
|
||||||
|
(define (odd x) ....)
|
||||||
|
(define (even x) ....)
|
||||||
|
]
|
||||||
|
|
||||||
|
within a module would perform the same as the @scheme[letrec] version.
|
||||||
|
|
||||||
|
Primitive operations like @scheme[pair?], @scheme[car], and
|
||||||
|
@scheme[cdr] are inlined at the machine-code level by the @tech{JIT}
|
||||||
|
compiler. See also the later section @secref["fixnums+flonums"] for
|
||||||
|
information about inlined arithmetic operations.
|
||||||
|
|
||||||
|
@; ----------------------------------------------------------------------
|
||||||
|
|
||||||
|
@section{Mutation and Performance}
|
||||||
|
|
||||||
|
Using @scheme[set!] to mutate a variable can lead to bad
|
||||||
|
performance. For example, the microbenchmark
|
||||||
|
|
||||||
|
@schememod[
|
||||||
|
scheme/base
|
||||||
|
|
||||||
|
(define (subtract-one x)
|
||||||
|
(set! x (sub1 x))
|
||||||
|
x)
|
||||||
|
|
||||||
|
(time
|
||||||
|
(let loop ([n 4000000])
|
||||||
|
(if (zero? n)
|
||||||
|
'done
|
||||||
|
(loop (subtract-one n)))))
|
||||||
|
]
|
||||||
|
|
||||||
|
runs much more slowly than the equivalent
|
||||||
|
|
||||||
|
@schememod[
|
||||||
|
scheme/base
|
||||||
|
|
||||||
|
(define (subtract-one x)
|
||||||
|
(sub1 x))
|
||||||
|
|
||||||
|
(time
|
||||||
|
(let loop ([n 4000000])
|
||||||
|
(if (zero? n)
|
||||||
|
'done
|
||||||
|
(loop (subtract-one n)))))
|
||||||
|
]
|
||||||
|
|
||||||
|
In the first variant, a new location is allocated for @scheme[x] on
|
||||||
|
every iteration, leading to poor performance. A more clever compiler
|
||||||
|
could unravel the use of @scheme[set!] in the first example, but since
|
||||||
|
mutation is discouraged (see @secref["using-set!"]), the compiler's
|
||||||
|
effort is spent elsewhere.
|
||||||
|
|
||||||
|
More significantly, mutation can obscure bindings where inlining and
|
||||||
|
constant-propagation might otherwise apply. For example, in
|
||||||
|
|
||||||
|
@schemeblock[
|
||||||
|
(let ([minus1 #f])
|
||||||
|
(set! minus1 sub1)
|
||||||
|
(let loop ([n 4000000])
|
||||||
|
(if (zero? n)
|
||||||
|
'done
|
||||||
|
(loop (minus1 n)))))
|
||||||
|
]
|
||||||
|
|
||||||
|
the @scheme[set!] obscures the fact that @scheme[minus1] is just
|
||||||
|
another name for the built-in @scheme[sub1].
|
||||||
|
|
||||||
|
@; ----------------------------------------------------------------------
|
||||||
|
|
||||||
|
@section[#:tag "letrec-performance"]{@scheme[letrec] Performance}
|
||||||
|
|
||||||
|
When @scheme[letrec] is used to bind only procedures and literals,
|
||||||
|
then the compiler can treat the bindings in an optimal manner,
|
||||||
|
compiling uses of the bindings efficiently. When other kinds of
|
||||||
|
bindings are mixed with procedures, the compiler may be less able to
|
||||||
|
determine the control flow.
|
||||||
|
|
||||||
|
For example,
|
||||||
|
|
||||||
|
@schemeblock[
|
||||||
|
(letrec ([loop (lambda (x)
|
||||||
|
(if (zero? x)
|
||||||
|
'done
|
||||||
|
(loop (next x))))]
|
||||||
|
[junk (display loop)]
|
||||||
|
[next (lambda (x) (sub1 x))])
|
||||||
|
(loop 40000000))
|
||||||
|
]
|
||||||
|
|
||||||
|
likely compiles to less efficient code than
|
||||||
|
|
||||||
|
@schemeblock[
|
||||||
|
(letrec ([loop (lambda (x)
|
||||||
|
(if (zero? x)
|
||||||
|
'done
|
||||||
|
(loop (next x))))]
|
||||||
|
[next (lambda (x) (sub1 x))])
|
||||||
|
(loop 40000000))
|
||||||
|
]
|
||||||
|
|
||||||
|
In the first case, the compiler likely does not know that
|
||||||
|
@scheme[display] does not call @scheme[loop]. If it did, then
|
||||||
|
@scheme[loop] might refer to @scheme[next] before the binding is
|
||||||
|
available.
|
||||||
|
|
||||||
|
This caveat about @scheme[letrec] also applies to definitions of
|
||||||
|
functions and constants within modules. A definition sequence in a
|
||||||
|
module body is analogous to a sequence of @scheme[letrec] bindings,
|
||||||
|
and non-constant expressions in a module body can interfere with the
|
||||||
|
optimization of references to later bindings.
|
||||||
|
|
||||||
|
@; ----------------------------------------------------------------------
|
||||||
|
|
||||||
|
@section[#:tag "fixnums+flonums"]{Fixnum and Flonum Optimizations}
|
||||||
|
|
||||||
|
A @deftech{fixnum} is a small exact integer. In this case, ``small''
|
||||||
|
depends on the platform. For a 32-bit machine, numbers that can be
|
||||||
|
expressed in 30 bits plus a sign bit are represented as fixnums. On a
|
||||||
|
64-bit machine, 62 bits plus a sign bit are available.
|
||||||
|
|
||||||
|
A @deftech{flonum} is used to represent any inexact real number. They
|
||||||
|
correspond to 64-bit IEEE floating-point numbers on all platforms.
|
||||||
|
|
||||||
|
Inlined fixnum and flonum arithmetic operations are among the most
|
||||||
|
important advantages of the @tech{JIT} compiler. For example, when
|
||||||
|
@scheme[+] is applied to two arguments, the generated machine code
|
||||||
|
tests whether the two arguments are fixnums, and if so, it uses the
|
||||||
|
machine's instruction to add the numbers (and check for overflow). If
|
||||||
|
the two numbers are not fixnums, then the next check whether whether
|
||||||
|
both are flonums; in that case, the machine's floating-point
|
||||||
|
operations are used directly. For functions that take any number of
|
||||||
|
arguments, such as @scheme[+], inlining is applied only for the
|
||||||
|
two-argument case (except for @scheme[-], whose one-argument case is
|
||||||
|
also inlined).
|
||||||
|
|
||||||
|
Flonums are @defterm{boxed}, which means that memory is allocated to
|
||||||
|
hold every result of a flonum computation. Fortunately, the
|
||||||
|
generational garbage collector (described later in @secref["gc-perf"])
|
||||||
|
makes allocation for short-lived results reasonably cheap. Fixnums, in
|
||||||
|
contrast are never boxed, so they are especially cheap to use.
|
||||||
|
|
||||||
|
@; ----------------------------------------------------------------------
|
||||||
|
|
||||||
|
@section[#:tag "gc-perf"]{Memory Management}
|
||||||
|
|
||||||
|
PLT Scheme is available in two variants: @deftech{3m} and
|
||||||
|
@deftech{CGC}. The @tech{3m} variant uses a modern,
|
||||||
|
@deftech{generational garbage collector} that makes allocation
|
||||||
|
relatively cheap for short-lived objects. The @tech{CGC} variant uses
|
||||||
|
a @deftech{conservative garbage collector} which facilitates
|
||||||
|
interaction with C code at the expense of both precision and speed for
|
||||||
|
Scheme memory management. The 3m variant is the standard one.
|
||||||
|
|
||||||
|
Although memory allocation is reasonably cheap, avoiding allocation
|
||||||
|
altogether is normally faster. One particular place where allocation
|
||||||
|
can be avoided sometimes is in @deftech{closures}, which are the
|
||||||
|
run-time representation of functions that contain free variables.
|
||||||
|
For example,
|
||||||
|
|
||||||
|
@schemeblock[
|
||||||
|
(let loop ([n 40000000][prev-thunk (lambda () #f)])
|
||||||
|
(if (zero? n)
|
||||||
|
(prev-thunk)
|
||||||
|
(loop (sub1 n)
|
||||||
|
(lambda () n))))
|
||||||
|
]
|
||||||
|
|
||||||
|
allocates a closure on every iteration, since @scheme[(lambda () n)]
|
||||||
|
effectively saves @scheme[n].
|
||||||
|
|
||||||
|
The compiler can eliminate many closures automatically. For example,
|
||||||
|
in
|
||||||
|
|
||||||
|
@schemeblock[
|
||||||
|
(let loop ([n 40000000][prev-val #f])
|
||||||
|
(let ([prev-thunk (lambda () n)])
|
||||||
|
(if (zero? n)
|
||||||
|
prev-val
|
||||||
|
(loop (sub1 n) (prev-thunk)))))
|
||||||
|
]
|
||||||
|
|
||||||
|
no closure is ever allocated for @scheme[prev-thunk], because its only
|
||||||
|
application is visible, and so it is inlined. Similarly, in
|
||||||
|
|
||||||
|
@schemeblock[
|
||||||
|
(let n-loop ([n 400000])
|
||||||
|
(if (zero? n)
|
||||||
|
'done
|
||||||
|
(let m-loop ([m 100])
|
||||||
|
(if (zero? m)
|
||||||
|
(n-loop (sub1 n))
|
||||||
|
(m-loop (sub1 m))))))
|
||||||
|
]
|
||||||
|
|
||||||
|
then the expansion of the @scheme[let] form to implement
|
||||||
|
@scheme[m-loop] involves a closure over @scheme[n], but the compiler
|
||||||
|
automatically converts the closure to pass itself @scheme[n] as an
|
||||||
|
argument instead.
|
|
@ -45,7 +45,7 @@ greeted
|
||||||
]
|
]
|
||||||
|
|
||||||
@;------------------------------------------------------------------------
|
@;------------------------------------------------------------------------
|
||||||
@section{Guidelines for Using Assignment}
|
@section[#:tag "using-set!"]{Guidelines for Using Assignment}
|
||||||
|
|
||||||
Although using @scheme[set!] is sometimes appropriate, Scheme style
|
Although using @scheme[set!] is sometimes appropriate, Scheme style
|
||||||
generally discourages the use of @scheme[set!]. The following
|
generally discourages the use of @scheme[set!]. The following
|
||||||
|
|
|
@ -179,6 +179,13 @@ exec mzscheme -qu "$0" ${1+"$@"}
|
||||||
extract-mzscheme-times
|
extract-mzscheme-times
|
||||||
clean-up-nothing
|
clean-up-nothing
|
||||||
mutable-pair-progs)
|
mutable-pair-progs)
|
||||||
|
(make-impl 'mz-old
|
||||||
|
mk-mzscheme
|
||||||
|
(lambda (bm)
|
||||||
|
(system (format "mz-old -u ~a.ss" bm)))
|
||||||
|
extract-mzscheme-times
|
||||||
|
clean-up-nothing
|
||||||
|
mutable-pair-progs)
|
||||||
(make-impl 'mzschemecgc
|
(make-impl 'mzschemecgc
|
||||||
mk-mzscheme
|
mk-mzscheme
|
||||||
(lambda (bm)
|
(lambda (bm)
|
||||||
|
|
|
@ -39,7 +39,7 @@
|
||||||
(ziq (* zi zi)))
|
(ziq (* zi zi)))
|
||||||
(cond
|
(cond
|
||||||
((> (+ zrq ziq) +limit-sqr+) 0)
|
((> (+ zrq ziq) +limit-sqr+) 0)
|
||||||
(else (loop (add1 i) (+ (- zrq ziq) cr) (+ (* 2.0 zr zi) ci)))))))))
|
(else (loop (add1 i) (+ (- zrq ziq) cr) (+ (* 2.0 (* zr zi)) ci)))))))))
|
||||||
|
|
||||||
;; -------------------------------
|
;; -------------------------------
|
||||||
|
|
||||||
|
|
|
@ -65,12 +65,23 @@
|
||||||
(bin0 v op arg1 arg2))]
|
(bin0 v op arg1 arg2))]
|
||||||
[bin (lambda (v op arg1 arg2)
|
[bin (lambda (v op arg1 arg2)
|
||||||
(bin-exact v op arg1 arg2)
|
(bin-exact v op arg1 arg2)
|
||||||
(let ([iv (if (number? v)
|
(let* ([iv (if (number? v)
|
||||||
(exact->inexact v)
|
(exact->inexact v)
|
||||||
v)])
|
v)]
|
||||||
|
[iv0 (if (and (memq op '(* /)) (zero? iv))
|
||||||
|
0
|
||||||
|
iv)])
|
||||||
(bin0 iv op (exact->inexact arg1) arg2)
|
(bin0 iv op (exact->inexact arg1) arg2)
|
||||||
(bin0 iv op arg1 (exact->inexact arg2))
|
(bin0 iv0 op arg1 (exact->inexact arg2))
|
||||||
(bin0 iv op (exact->inexact arg1) (exact->inexact arg2))))]
|
(bin0 iv op (exact->inexact arg1) (exact->inexact arg2)))
|
||||||
|
(let ([iv (if (number? v)
|
||||||
|
(if (eq? op '*)
|
||||||
|
(/ v (* 33333 33333))
|
||||||
|
(if (eq? op '/)
|
||||||
|
v
|
||||||
|
(/ v 33333)))
|
||||||
|
v)])
|
||||||
|
(bin0 iv op (/ arg1 33333) (/ arg2 33333))))]
|
||||||
[tri0 (lambda (v op get-arg1 arg2 arg3 check-effect)
|
[tri0 (lambda (v op get-arg1 arg2 arg3 check-effect)
|
||||||
;; (printf "Trying ~a ~a ~a\n" op (get-arg1) arg2 arg3);
|
;; (printf "Trying ~a ~a ~a\n" op (get-arg1) arg2 arg3);
|
||||||
(let ([name `(,op ,get-arg1 ,arg2, arg3)])
|
(let ([name `(,op ,get-arg1 ,arg2, arg3)])
|
||||||
|
@ -201,6 +212,19 @@
|
||||||
(un -5 'sub1 -4)
|
(un -5 'sub1 -4)
|
||||||
(un (- (expt 2 30)) 'sub1 (- 1 (expt 2 30)))
|
(un (- (expt 2 30)) 'sub1 (- 1 (expt 2 30)))
|
||||||
|
|
||||||
|
(un -1 '- 1)
|
||||||
|
(un 1 '- -1)
|
||||||
|
(un (- (expt 2 30)) '- (expt 2 30))
|
||||||
|
(un (expt 2 30) '- (- (expt 2 30)))
|
||||||
|
(un -0.0 '- 0.0)
|
||||||
|
(un 0.0 '- -0.0)
|
||||||
|
|
||||||
|
(un 0 'abs 0)
|
||||||
|
(un 1 'abs 1)
|
||||||
|
(un 1 'abs -1)
|
||||||
|
(un (sub1 (expt 2 31)) 'abs (sub1 (expt 2 31)))
|
||||||
|
(un (sub1 (expt 2 31)) 'abs (add1 (expt -2 31)))
|
||||||
|
|
||||||
(bin 11 '+ 4 7)
|
(bin 11 '+ 4 7)
|
||||||
(bin -3 '+ 4 -7)
|
(bin -3 '+ 4 -7)
|
||||||
(bin (expt 2 30) '+ (expt 2 29) (expt 2 29))
|
(bin (expt 2 30) '+ (expt 2 29) (expt 2 29))
|
||||||
|
@ -213,6 +237,21 @@
|
||||||
(bin (- (expt 2 30)) '- (- (expt 2 29)) (expt 2 29))
|
(bin (- (expt 2 30)) '- (- (expt 2 29)) (expt 2 29))
|
||||||
(bin (- 2 (expt 2 31)) '- (- 1 (expt 2 30)) (sub1 (expt 2 30)))
|
(bin (- 2 (expt 2 31)) '- (- 1 (expt 2 30)) (sub1 (expt 2 30)))
|
||||||
|
|
||||||
|
(bin 4 '* 1 4)
|
||||||
|
(bin 0 '* 0 4)
|
||||||
|
(bin 12 '* 3 4)
|
||||||
|
(bin -12 '* -3 4)
|
||||||
|
(bin -12 '* 3 -4)
|
||||||
|
(bin 12 '* -3 -4)
|
||||||
|
|
||||||
|
(bin 0 '/ 0 4)
|
||||||
|
(bin 1/4 '/ 1 4)
|
||||||
|
(bin 4 '/ 4 1)
|
||||||
|
(bin 4 '/ 16 4)
|
||||||
|
(bin -4 '/ -16 4)
|
||||||
|
(bin -4 '/ 16 -4)
|
||||||
|
(bin 4 '/ -16 -4)
|
||||||
|
|
||||||
(bin 3 'min 3 300)
|
(bin 3 'min 3 300)
|
||||||
(bin -300 'min 3 -300)
|
(bin -300 'min 3 -300)
|
||||||
(bin -400 'min -400 -300)
|
(bin -400 'min -400 -300)
|
||||||
|
|
|
@ -59,6 +59,10 @@
|
||||||
END_XFORM_ARITH;
|
END_XFORM_ARITH;
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
#ifdef MZ_USE_JIT_I386
|
||||||
|
# define JIT_USE_FP_OPS
|
||||||
|
#endif
|
||||||
|
|
||||||
#ifdef MZ_USE_JIT_X86_64
|
#ifdef MZ_USE_JIT_X86_64
|
||||||
# define MZ_USE_JIT_I386
|
# define MZ_USE_JIT_I386
|
||||||
# define JIT_X86_64
|
# define JIT_X86_64
|
||||||
|
@ -1099,6 +1103,21 @@ static int inline_alloc(mz_jit_state *jitter, int amt, Scheme_Type ty, int keep_
|
||||||
}
|
}
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
#ifdef JIT_USE_FP_OPS
|
||||||
|
# define INLINE_FP_COMP
|
||||||
|
# ifdef CAN_INLINE_ALLOC
|
||||||
|
# define INLINE_FP_OPS
|
||||||
|
# endif
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#if defined(INLINE_FP_OPS) && !defined(CAN_INLINE_ALLOC)
|
||||||
|
static double double_result;
|
||||||
|
static void *malloc_double(void)
|
||||||
|
{
|
||||||
|
return scheme_make_double(double_result);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
/*========================================================================*/
|
/*========================================================================*/
|
||||||
/* bytecode properties */
|
/* bytecode properties */
|
||||||
/*========================================================================*/
|
/*========================================================================*/
|
||||||
|
@ -2325,6 +2344,9 @@ static jit_insn *generate_arith_slow_path(mz_jit_state *jitter, Scheme_Object *r
|
||||||
jit_insn **_ref, jit_insn **_ref4,
|
jit_insn **_ref, jit_insn **_ref4,
|
||||||
jit_insn **for_branch,
|
jit_insn **for_branch,
|
||||||
int orig_args, int reversed, int arith, int use_v, int v)
|
int orig_args, int reversed, int arith, int use_v, int v)
|
||||||
|
/* *_ref is place to set for where to jump (for true case, if for_branch) after completing;
|
||||||
|
*_ref4 is place to set for where to jump for false if for_branch;
|
||||||
|
result is place to jump to start slow path if fixnum attempt fails */
|
||||||
{
|
{
|
||||||
jit_insn *ref, *ref4, *refslow;
|
jit_insn *ref, *ref4, *refslow;
|
||||||
|
|
||||||
|
@ -2394,11 +2416,210 @@ static jit_insn *generate_arith_slow_path(mz_jit_state *jitter, Scheme_Object *r
|
||||||
# define SCHEME_INT_SMALL_ENOUGH(rand2) 1
|
# define SCHEME_INT_SMALL_ENOUGH(rand2) 1
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
static int can_fast_double(int arith, int cmp, int two_args)
|
||||||
|
{
|
||||||
|
#ifdef INLINE_FP_OPS
|
||||||
|
if ((arith == 1)
|
||||||
|
|| (arith == -1)
|
||||||
|
|| (arith == 2)
|
||||||
|
|| (arith == -2)
|
||||||
|
|| (arith == 11))
|
||||||
|
return 1;
|
||||||
|
#endif
|
||||||
|
#ifdef INLINE_FP_COMP
|
||||||
|
if (!arith
|
||||||
|
|| ((arith == 9) /* min */ && two_args)
|
||||||
|
|| ((arith == 10) /* max */ && two_args))
|
||||||
|
return 1;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
static int generate_double_arith(mz_jit_state *jitter, int arith, int cmp, int reversed, int two_args, int second_const,
|
||||||
|
jit_insn **_refd, jit_insn **_refdt,
|
||||||
|
int branch_short)
|
||||||
|
{
|
||||||
|
#if defined(INLINE_FP_OPS) || defined(INLINE_FP_COMP)
|
||||||
|
GC_CAN_IGNORE jit_insn *ref8, *ref9, *ref10, *refd, *refdt, *refskip = NULL;
|
||||||
|
int no_alloc = 0;
|
||||||
|
|
||||||
|
/* Maybe they're doubles */
|
||||||
|
__START_SHORT_JUMPS__(1);
|
||||||
|
if (two_args) {
|
||||||
|
jit_orr_ul(JIT_R2, JIT_R0, JIT_R1);
|
||||||
|
ref8 = jit_bmsi_ul(jit_forward(), JIT_R2, 0x1);
|
||||||
|
} else
|
||||||
|
ref8 = NULL;
|
||||||
|
jit_ldxi_s(JIT_R2, JIT_R0, &((Scheme_Object *)0x0)->type);
|
||||||
|
ref9 = jit_bnei_p(jit_forward(), JIT_R2, scheme_double_type);
|
||||||
|
if (two_args) {
|
||||||
|
jit_ldxi_s(JIT_R2, JIT_R1, &((Scheme_Object *)0x0)->type);
|
||||||
|
ref10 = jit_bnei_p(jit_forward(), JIT_R2, scheme_double_type);
|
||||||
|
} else
|
||||||
|
ref10 = NULL;
|
||||||
|
CHECK_LIMIT();
|
||||||
|
|
||||||
|
if (!two_args && !second_const && ((arith == 2) || ((arith == -2) && reversed))) {
|
||||||
|
/* Special case: multiplication by exact 0 */
|
||||||
|
jit_movi_p(JIT_R0, scheme_make_integer(0));
|
||||||
|
} else {
|
||||||
|
__END_SHORT_JUMPS__(1);
|
||||||
|
|
||||||
|
/* Yes, they're doubles. */
|
||||||
|
jit_ldxi_d(JIT_FPR0, JIT_R0, &((Scheme_Double *)0x0)->double_val);
|
||||||
|
if (two_args) {
|
||||||
|
jit_ldxi_d(JIT_FPR1, JIT_R1, &((Scheme_Double *)0x0)->double_val);
|
||||||
|
} else if ((arith == -1) && !second_const && reversed) {
|
||||||
|
reversed = 0;
|
||||||
|
} else {
|
||||||
|
double d = second_const;
|
||||||
|
jit_movi_d(JIT_FPR1, d);
|
||||||
|
reversed = !reversed;
|
||||||
|
cmp = -cmp;
|
||||||
|
}
|
||||||
|
if (arith) {
|
||||||
|
switch (arith) {
|
||||||
|
case 1:
|
||||||
|
jit_addr_d(JIT_FPR0, JIT_FPR0, JIT_FPR1);
|
||||||
|
break;
|
||||||
|
case 2:
|
||||||
|
jit_mulr_d(JIT_FPR0, JIT_FPR0, JIT_FPR1);
|
||||||
|
break;
|
||||||
|
case -2:
|
||||||
|
if (reversed)
|
||||||
|
jit_divr_d(JIT_FPR0, JIT_FPR0, JIT_FPR1);
|
||||||
|
else
|
||||||
|
jit_divr_d(JIT_FPR0, JIT_FPR1, JIT_FPR0);
|
||||||
|
break;
|
||||||
|
case -1:
|
||||||
|
{
|
||||||
|
if (!two_args && !second_const && !reversed) {
|
||||||
|
/* Need a special case to make sure that (- 0.0) => -0.0 */
|
||||||
|
jit_negr_d(JIT_FPR0, JIT_FPR0);
|
||||||
|
} else if (reversed)
|
||||||
|
jit_subr_d(JIT_FPR0, JIT_FPR0, JIT_FPR1);
|
||||||
|
else
|
||||||
|
jit_subr_d(JIT_FPR0, JIT_FPR1, JIT_FPR0);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 9: /* min */
|
||||||
|
case 10: /* max */
|
||||||
|
{
|
||||||
|
GC_CAN_IGNORE jit_insn *refc;
|
||||||
|
__START_SHORT_JUMPS__(1);
|
||||||
|
if (arith == 9) {
|
||||||
|
refc = jit_bler_d(jit_forward(), JIT_FPR0, JIT_FPR1);
|
||||||
|
} else {
|
||||||
|
refc = jit_bger_d(jit_forward(), JIT_FPR0, JIT_FPR1);
|
||||||
|
}
|
||||||
|
jit_movr_p(JIT_R0, JIT_R1);
|
||||||
|
mz_patch_branch(refc);
|
||||||
|
__END_SHORT_JUMPS__(1);
|
||||||
|
no_alloc = 1;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
case 11: /* abs */
|
||||||
|
__START_SHORT_JUMPS__(1);
|
||||||
|
refskip = jit_bger_d(jit_forward(), JIT_FPR0, JIT_FPR1);
|
||||||
|
jit_subr_d(JIT_FPR0, JIT_FPR1, JIT_FPR0);
|
||||||
|
__END_SHORT_JUMPS__(1);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
CHECK_LIMIT();
|
||||||
|
|
||||||
|
if (!no_alloc) {
|
||||||
|
#ifdef INLINE_FP_OPS
|
||||||
|
# ifdef CAN_INLINE_ALLOC
|
||||||
|
inline_alloc(jitter, sizeof(Scheme_Double), scheme_double_type, 0);
|
||||||
|
CHECK_LIMIT();
|
||||||
|
jit_addi_p(JIT_R0, JIT_V1, sizeof(long));
|
||||||
|
# else
|
||||||
|
(void)jit_sti_d(&double_result, JIT_FPR0);
|
||||||
|
JIT_UPDATE_THREAD_RSPTR_IF_NEEDED();
|
||||||
|
mz_prepare(0);
|
||||||
|
(void)mz_finish(malloc_double);
|
||||||
|
jit_retval(JIT_R0);
|
||||||
|
# endif
|
||||||
|
#endif
|
||||||
|
CHECK_LIMIT();
|
||||||
|
|
||||||
|
(void)jit_stxi_d(&((Scheme_Double *)0x0)->double_val, JIT_R0, JIT_FPR0);
|
||||||
|
|
||||||
|
if (refskip) {
|
||||||
|
__START_SHORT_JUMPS__(1);
|
||||||
|
mz_patch_branch(refskip);
|
||||||
|
__END_SHORT_JUMPS__(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
__START_SHORT_JUMPS__(branch_short);
|
||||||
|
switch (cmp) {
|
||||||
|
case -2:
|
||||||
|
refdt = jit_bltr_d(jit_forward(), JIT_FPR1, JIT_FPR0);
|
||||||
|
break;
|
||||||
|
case -1:
|
||||||
|
refdt = jit_bler_d(jit_forward(), JIT_FPR1, JIT_FPR0);
|
||||||
|
break;
|
||||||
|
case 0:
|
||||||
|
refdt = jit_beqr_d(jit_forward(), JIT_FPR1, JIT_FPR0);
|
||||||
|
break;
|
||||||
|
case 1:
|
||||||
|
refdt = jit_bger_d(jit_forward(), JIT_FPR1, JIT_FPR0);
|
||||||
|
break;
|
||||||
|
case 2:
|
||||||
|
refdt = jit_bgtr_d(jit_forward(), JIT_FPR1, JIT_FPR0);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
refdt = NULL;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
__END_SHORT_JUMPS__(branch_short);
|
||||||
|
*_refdt = refdt;
|
||||||
|
}
|
||||||
|
|
||||||
|
__START_SHORT_JUMPS__(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Jump to return result or false branch: */
|
||||||
|
refd = jit_jmpi(jit_forward());
|
||||||
|
*_refd = refd;
|
||||||
|
|
||||||
|
/* No, they're not both doubles. */
|
||||||
|
if (two_args) {
|
||||||
|
mz_patch_branch(ref8);
|
||||||
|
mz_patch_branch(ref10);
|
||||||
|
}
|
||||||
|
mz_patch_branch(ref9);
|
||||||
|
__END_SHORT_JUMPS__(1);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Object *rand, Scheme_Object *rand2,
|
static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Object *rand, Scheme_Object *rand2,
|
||||||
int orig_args, int arith, int cmp, int v, jit_insn **for_branch, int branch_short)
|
int orig_args, int arith, int cmp, int v, jit_insn **for_branch, int branch_short)
|
||||||
|
/* Either arith is non-zero or it's a cmp; the value of each determines the operation:
|
||||||
|
arith = 1 -> + or add1 (if !rand2)
|
||||||
|
arith = -1 -> - or sub1
|
||||||
|
arith = 2 -> *
|
||||||
|
arith = 3 -> bitwise-and
|
||||||
|
arith = 4 -> bitwise-ior
|
||||||
|
arith = 5 -> bitwise-xor
|
||||||
|
arith = 6 -> arithmetic-shift
|
||||||
|
arith = 7 -> bitwise-not
|
||||||
|
arith = 9 -> min
|
||||||
|
arith = 10 -> max
|
||||||
|
arith = 11 -> abs
|
||||||
|
cmp = 0 -> = or zero?
|
||||||
|
cmp = +/-1 -> >=/<=
|
||||||
|
cmp = +/-2 -> >/< or positive/negative?
|
||||||
|
*/
|
||||||
{
|
{
|
||||||
GC_CAN_IGNORE jit_insn *ref, *ref2, *ref3, *ref4, *refslow;
|
GC_CAN_IGNORE jit_insn *ref, *ref2, *ref3, *ref4, *refd = NULL, *refdt = NULL, *refslow;
|
||||||
int skipped, simple_rand, reversed = 0;
|
int skipped, simple_rand, reversed = 0, has_fixnum_fast = 1;
|
||||||
|
|
||||||
LOG_IT(("inlined %s\n", ((Scheme_Primitive_Proc *)rator)->name));
|
LOG_IT(("inlined %s\n", ((Scheme_Primitive_Proc *)rator)->name));
|
||||||
|
|
||||||
|
@ -2437,6 +2658,11 @@ static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Obj
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if ((arith == -1) && (orig_args == 1) && !v) {
|
||||||
|
/* Unary subtract */
|
||||||
|
reversed = 1;
|
||||||
|
}
|
||||||
|
|
||||||
if (rand2) {
|
if (rand2) {
|
||||||
simple_rand = (SAME_TYPE(SCHEME_TYPE(rand), scheme_local_type)
|
simple_rand = (SAME_TYPE(SCHEME_TYPE(rand), scheme_local_type)
|
||||||
|| SCHEME_INTP(rand));
|
|| SCHEME_INTP(rand));
|
||||||
|
@ -2462,6 +2688,16 @@ static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Obj
|
||||||
generate_non_tail(rand2 ? rand2 : rand, jitter, 0, 1);
|
generate_non_tail(rand2 ? rand2 : rand, jitter, 0, 1);
|
||||||
CHECK_LIMIT();
|
CHECK_LIMIT();
|
||||||
|
|
||||||
|
if (arith == 2) {
|
||||||
|
if (rand2 || ((v != 0) && (v != 1)))
|
||||||
|
has_fixnum_fast = 0;
|
||||||
|
} else if (arith == -2) {
|
||||||
|
if (rand2 || (v != 1) || reversed)
|
||||||
|
has_fixnum_fast = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* rand2 in R0, and rand in R1 unless it's simple */
|
||||||
|
|
||||||
if (simple_rand) {
|
if (simple_rand) {
|
||||||
int pos, va;
|
int pos, va;
|
||||||
|
|
||||||
|
@ -2471,6 +2707,7 @@ static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Obj
|
||||||
} else {
|
} else {
|
||||||
pos = mz_remap(SCHEME_LOCAL_POS(rand));
|
pos = mz_remap(SCHEME_LOCAL_POS(rand));
|
||||||
jit_ldxi_p(JIT_R1, JIT_RUNSTACK, WORDS_TO_BYTES(pos));
|
jit_ldxi_p(JIT_R1, JIT_RUNSTACK, WORDS_TO_BYTES(pos));
|
||||||
|
/* check both fixnum bits at once by ANDing into R2: */
|
||||||
jit_andr_ul(JIT_R2, JIT_R0, JIT_R1);
|
jit_andr_ul(JIT_R2, JIT_R0, JIT_R1);
|
||||||
va = JIT_R2;
|
va = JIT_R2;
|
||||||
}
|
}
|
||||||
|
@ -2479,41 +2716,92 @@ static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Obj
|
||||||
ref2 = jit_bmsi_ul(jit_forward(), va, 0x1);
|
ref2 = jit_bmsi_ul(jit_forward(), va, 0x1);
|
||||||
__END_SHORT_JUMPS__(1);
|
__END_SHORT_JUMPS__(1);
|
||||||
|
|
||||||
|
if (!SCHEME_INTP(rand) && can_fast_double(arith, cmp, 1)) {
|
||||||
|
/* Maybe they're both doubles... */
|
||||||
|
generate_double_arith(jitter, arith, cmp, reversed, 1, 0, &refd, &refdt, branch_short);
|
||||||
|
CHECK_LIMIT();
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!has_fixnum_fast) {
|
||||||
|
__START_SHORT_JUMPS__(1);
|
||||||
|
mz_patch_branch(ref2);
|
||||||
|
__END_SHORT_JUMPS__(1);
|
||||||
|
}
|
||||||
|
|
||||||
/* Slow path */
|
/* Slow path */
|
||||||
refslow = generate_arith_slow_path(jitter, rator, &ref, &ref4, for_branch, orig_args, reversed, arith, 0, 0);
|
refslow = generate_arith_slow_path(jitter, rator, &ref, &ref4, for_branch, orig_args, reversed, arith, 0, 0);
|
||||||
|
|
||||||
__START_SHORT_JUMPS__(1);
|
if (has_fixnum_fast) {
|
||||||
mz_patch_branch(ref2);
|
__START_SHORT_JUMPS__(1);
|
||||||
__END_SHORT_JUMPS__(1);
|
mz_patch_branch(ref2);
|
||||||
|
__END_SHORT_JUMPS__(1);
|
||||||
|
}
|
||||||
CHECK_LIMIT();
|
CHECK_LIMIT();
|
||||||
} else if (rand2) {
|
} else if (rand2) {
|
||||||
|
/* Move rand result back into R1 */
|
||||||
jit_ldr_p(JIT_R1, JIT_RUNSTACK);
|
jit_ldr_p(JIT_R1, JIT_RUNSTACK);
|
||||||
jit_addi_p(JIT_RUNSTACK, JIT_RUNSTACK, WORDS_TO_BYTES(1));
|
jit_addi_p(JIT_RUNSTACK, JIT_RUNSTACK, WORDS_TO_BYTES(1));
|
||||||
mz_runstack_popped(jitter, 1);
|
mz_runstack_popped(jitter, 1);
|
||||||
|
|
||||||
|
/* check both fixnum bits at once by ANDing into R2: */
|
||||||
jit_andr_ul(JIT_R2, JIT_R0, JIT_R1);
|
jit_andr_ul(JIT_R2, JIT_R0, JIT_R1);
|
||||||
__START_SHORT_JUMPS__(1);
|
__START_SHORT_JUMPS__(1);
|
||||||
ref2 = jit_bmsi_ul(jit_forward(), JIT_R2, 0x1);
|
ref2 = jit_bmsi_ul(jit_forward(), JIT_R2, 0x1);
|
||||||
__END_SHORT_JUMPS__(1);
|
__END_SHORT_JUMPS__(1);
|
||||||
|
CHECK_LIMIT();
|
||||||
|
|
||||||
|
if (can_fast_double(arith, cmp, 1)) {
|
||||||
|
/* Maybe they're both doubles... */
|
||||||
|
generate_double_arith(jitter, arith, cmp, reversed, 1, 0, &refd, &refdt, branch_short);
|
||||||
|
CHECK_LIMIT();
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!has_fixnum_fast) {
|
||||||
|
__START_SHORT_JUMPS__(1);
|
||||||
|
mz_patch_branch(ref2);
|
||||||
|
__END_SHORT_JUMPS__(1);
|
||||||
|
}
|
||||||
|
|
||||||
/* Slow path */
|
/* Slow path */
|
||||||
refslow = generate_arith_slow_path(jitter, rator, &ref, &ref4, for_branch, orig_args, reversed, arith, 0, 0);
|
refslow = generate_arith_slow_path(jitter, rator, &ref, &ref4, for_branch, orig_args, reversed, arith, 0, 0);
|
||||||
|
|
||||||
__START_SHORT_JUMPS__(1);
|
if (has_fixnum_fast) {
|
||||||
mz_patch_branch(ref2);
|
/* Fixnum branch: */
|
||||||
__END_SHORT_JUMPS__(1);
|
__START_SHORT_JUMPS__(1);
|
||||||
|
mz_patch_branch(ref2);
|
||||||
|
__END_SHORT_JUMPS__(1);
|
||||||
|
}
|
||||||
CHECK_LIMIT();
|
CHECK_LIMIT();
|
||||||
} else {
|
} else {
|
||||||
|
/* Only one argument: */
|
||||||
__START_SHORT_JUMPS__(1);
|
__START_SHORT_JUMPS__(1);
|
||||||
ref2 = jit_bmsi_ul(jit_forward(), JIT_R0, 0x1);
|
ref2 = jit_bmsi_ul(jit_forward(), JIT_R0, 0x1);
|
||||||
__END_SHORT_JUMPS__(1);
|
__END_SHORT_JUMPS__(1);
|
||||||
|
|
||||||
|
if ((orig_args != 2) /* <- heuristic: we could generate code when an exact argument is
|
||||||
|
given, but the extra FP code is probably not worthwhile. */
|
||||||
|
&& can_fast_double(arith, cmp, 0)
|
||||||
|
/* watch out: divide by 0 is special: */
|
||||||
|
&& ((arith != -2) || v || reversed)) {
|
||||||
|
/* Maybe it's a double... */
|
||||||
|
generate_double_arith(jitter, arith, cmp, reversed, 0, v, &refd, &refdt, branch_short);
|
||||||
|
CHECK_LIMIT();
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!has_fixnum_fast) {
|
||||||
|
__START_SHORT_JUMPS__(1);
|
||||||
|
mz_patch_branch(ref2);
|
||||||
|
__END_SHORT_JUMPS__(1);
|
||||||
|
}
|
||||||
|
|
||||||
/* Slow path */
|
/* Slow path */
|
||||||
refslow = generate_arith_slow_path(jitter, rator, &ref, &ref4, for_branch, orig_args, reversed, arith, 1, v);
|
refslow = generate_arith_slow_path(jitter, rator, &ref, &ref4, for_branch, orig_args, reversed, arith, 1, v);
|
||||||
|
|
||||||
__START_SHORT_JUMPS__(1);
|
if (has_fixnum_fast) {
|
||||||
mz_patch_branch(ref2);
|
__START_SHORT_JUMPS__(1);
|
||||||
__END_SHORT_JUMPS__(1);
|
mz_patch_branch(ref2);
|
||||||
|
__END_SHORT_JUMPS__(1);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
CHECK_LIMIT();
|
CHECK_LIMIT();
|
||||||
|
@ -2538,6 +2826,16 @@ static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Obj
|
||||||
(void)jit_bosubr_l(refslow, JIT_R2, JIT_R0);
|
(void)jit_bosubr_l(refslow, JIT_R2, JIT_R0);
|
||||||
}
|
}
|
||||||
jit_ori_ul(JIT_R0, JIT_R2, 0x1);
|
jit_ori_ul(JIT_R0, JIT_R2, 0x1);
|
||||||
|
} else if (arith == 2) {
|
||||||
|
if (has_fixnum_fast) {
|
||||||
|
/* No fast path for fixnum multiplication, yet */
|
||||||
|
(void)jit_jmpi(refslow);
|
||||||
|
}
|
||||||
|
} else if (arith == -2) {
|
||||||
|
if (has_fixnum_fast) {
|
||||||
|
/* No fast path for fixnum division, yet */
|
||||||
|
(void)jit_jmpi(refslow);
|
||||||
|
}
|
||||||
} else if (arith == 3) {
|
} else if (arith == 3) {
|
||||||
/* and */
|
/* and */
|
||||||
jit_andr_ul(JIT_R0, JIT_R1, JIT_R0);
|
jit_andr_ul(JIT_R0, JIT_R1, JIT_R0);
|
||||||
|
@ -2563,6 +2861,7 @@ static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Obj
|
||||||
jit_notr_l(JIT_V1, v2);
|
jit_notr_l(JIT_V1, v2);
|
||||||
jit_rshi_l(JIT_V1, JIT_V1, 0x1);
|
jit_rshi_l(JIT_V1, JIT_V1, 0x1);
|
||||||
jit_addi_l(JIT_V1, JIT_V1, 0x1);
|
jit_addi_l(JIT_V1, JIT_V1, 0x1);
|
||||||
|
CHECK_LIMIT();
|
||||||
#ifdef MZ_USE_JIT_I386
|
#ifdef MZ_USE_JIT_I386
|
||||||
/* Can't shift from _ECX */
|
/* Can't shift from _ECX */
|
||||||
jit_movr_l(JIT_R2, v1);
|
jit_movr_l(JIT_R2, v1);
|
||||||
|
@ -2572,6 +2871,7 @@ static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Obj
|
||||||
#endif
|
#endif
|
||||||
jit_ori_l(JIT_R0, JIT_R2, 0x1);
|
jit_ori_l(JIT_R0, JIT_R2, 0x1);
|
||||||
refc = jit_jmpi(jit_forward());
|
refc = jit_jmpi(jit_forward());
|
||||||
|
CHECK_LIMIT();
|
||||||
|
|
||||||
/* Left shift */
|
/* Left shift */
|
||||||
mz_patch_branch(refi);
|
mz_patch_branch(refi);
|
||||||
|
@ -2585,6 +2885,7 @@ static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Obj
|
||||||
#else
|
#else
|
||||||
jit_lshr_l(JIT_R2, v1, JIT_V1);
|
jit_lshr_l(JIT_R2, v1, JIT_V1);
|
||||||
#endif
|
#endif
|
||||||
|
CHECK_LIMIT();
|
||||||
/* If shifting back right produces a different result, that's overflow... */
|
/* If shifting back right produces a different result, that's overflow... */
|
||||||
jit_rshr_l(JIT_V1, JIT_R2, JIT_V1);
|
jit_rshr_l(JIT_V1, JIT_R2, JIT_V1);
|
||||||
/* !! In case we go refslow, it nseed to add back tag to v1 !! */
|
/* !! In case we go refslow, it nseed to add back tag to v1 !! */
|
||||||
|
@ -2622,6 +2923,28 @@ static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Obj
|
||||||
(void)jit_bosubi_l(refslow, JIT_R2, v << 1);
|
(void)jit_bosubi_l(refslow, JIT_R2, v << 1);
|
||||||
jit_movr_p(JIT_R0, JIT_R2);
|
jit_movr_p(JIT_R0, JIT_R2);
|
||||||
}
|
}
|
||||||
|
} else if (arith == 2) {
|
||||||
|
if (v == 1) {
|
||||||
|
/* R0 already is the answer */
|
||||||
|
} else if (v == 0) {
|
||||||
|
jit_movi_p(JIT_R0, scheme_make_integer(0));
|
||||||
|
} else {
|
||||||
|
if (has_fixnum_fast) {
|
||||||
|
/* No general fast path for fixnum multiplication, yet */
|
||||||
|
jit_movi_p(JIT_R1, scheme_make_integer(v));
|
||||||
|
(void)jit_jmpi(refslow);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} else if (arith == -2) {
|
||||||
|
if ((v == 1) && !reversed) {
|
||||||
|
/* R0 already is the answer */
|
||||||
|
} else {
|
||||||
|
if (has_fixnum_fast) {
|
||||||
|
/* No general fast path for fixnum division, yet */
|
||||||
|
jit_movi_p(JIT_R1, scheme_make_integer(v));
|
||||||
|
(void)jit_jmpi(refslow);
|
||||||
|
}
|
||||||
|
}
|
||||||
} else {
|
} else {
|
||||||
if (arith == 3) {
|
if (arith == 3) {
|
||||||
/* and */
|
/* and */
|
||||||
|
@ -2665,9 +2988,22 @@ static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Obj
|
||||||
refc = jit_bgti_l(jit_forward(), JIT_R0, (long)scheme_make_integer(v));
|
refc = jit_bgti_l(jit_forward(), JIT_R0, (long)scheme_make_integer(v));
|
||||||
jit_movi_l(JIT_R0, (long)scheme_make_integer(v));
|
jit_movi_l(JIT_R0, (long)scheme_make_integer(v));
|
||||||
mz_patch_branch(refc);
|
mz_patch_branch(refc);
|
||||||
|
} else if (arith == 11) {
|
||||||
|
/* abs */
|
||||||
|
jit_insn *refc;
|
||||||
|
refc = jit_bgti_l(jit_forward(), JIT_R0, (long)scheme_make_integer(0));
|
||||||
|
jit_rshi_l(JIT_R0, JIT_R0, 1);
|
||||||
|
jit_movi_l(JIT_R1, 0);
|
||||||
|
jit_subr_l(JIT_R0, JIT_R1, JIT_R0);
|
||||||
|
jit_lshi_l(JIT_R0, JIT_R0, 1);
|
||||||
|
jit_ori_l(JIT_R0, JIT_R0, 0x1);
|
||||||
|
mz_patch_branch(refc);
|
||||||
|
CHECK_LIMIT();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
if (refd)
|
||||||
|
mz_patch_ucbranch(refd);
|
||||||
jit_patch_movi(ref, (_jit.x.pc));
|
jit_patch_movi(ref, (_jit.x.pc));
|
||||||
} else {
|
} else {
|
||||||
/* If second is constant, first arg is in JIT_R0. */
|
/* If second is constant, first arg is in JIT_R0. */
|
||||||
|
@ -2711,14 +3047,20 @@ static int generate_arith(mz_jit_state *jitter, Scheme_Object *rator, Scheme_Obj
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (refdt)
|
||||||
|
mz_patch_ucbranch(refdt);
|
||||||
|
|
||||||
if (for_branch) {
|
if (for_branch) {
|
||||||
for_branch[0] = ref3;
|
for_branch[0] = ref3;
|
||||||
|
for_branch[1] = refd;
|
||||||
for_branch[2] = ref;
|
for_branch[2] = ref;
|
||||||
jit_patch_movi(ref4, (_jit.x.pc));
|
jit_patch_movi(ref4, (_jit.x.pc));
|
||||||
} else {
|
} else {
|
||||||
(void)jit_movi_p(JIT_R0, scheme_true);
|
(void)jit_movi_p(JIT_R0, scheme_true);
|
||||||
ref2 = jit_jmpi(jit_forward());
|
ref2 = jit_jmpi(jit_forward());
|
||||||
mz_patch_branch(ref3);
|
mz_patch_branch(ref3);
|
||||||
|
if (refd)
|
||||||
|
mz_patch_ucbranch(refd);
|
||||||
(void)jit_movi_p(JIT_R0, scheme_false);
|
(void)jit_movi_p(JIT_R0, scheme_false);
|
||||||
mz_patch_ucbranch(ref2);
|
mz_patch_ucbranch(ref2);
|
||||||
jit_patch_movi(ref, (_jit.x.pc));
|
jit_patch_movi(ref, (_jit.x.pc));
|
||||||
|
@ -3175,6 +3517,12 @@ static int generate_inlined_unary(mz_jit_state *jitter, Scheme_App2_Rec *app, in
|
||||||
} else if (IS_NAMED_PRIM(rator, "sub1")) {
|
} else if (IS_NAMED_PRIM(rator, "sub1")) {
|
||||||
generate_arith(jitter, rator, app->rand, NULL, 1, -1, 0, 1, NULL, 1);
|
generate_arith(jitter, rator, app->rand, NULL, 1, -1, 0, 1, NULL, 1);
|
||||||
return 1;
|
return 1;
|
||||||
|
} else if (IS_NAMED_PRIM(rator, "-")) {
|
||||||
|
generate_arith(jitter, rator, app->rand, NULL, 1, -1, 0, 0, NULL, 1);
|
||||||
|
return 1;
|
||||||
|
} else if (IS_NAMED_PRIM(rator, "abs")) {
|
||||||
|
generate_arith(jitter, rator, app->rand, NULL, 1, 11, 0, 0, NULL, 1);
|
||||||
|
return 1;
|
||||||
} else if (IS_NAMED_PRIM(rator, "bitwise-not")) {
|
} else if (IS_NAMED_PRIM(rator, "bitwise-not")) {
|
||||||
generate_arith(jitter, rator, app->rand, NULL, 1, 7, 0, 9, NULL, 1);
|
generate_arith(jitter, rator, app->rand, NULL, 1, 7, 0, 9, NULL, 1);
|
||||||
return 1;
|
return 1;
|
||||||
|
@ -3455,6 +3803,12 @@ static int generate_inlined_binary(mz_jit_state *jitter, Scheme_App3_Rec *app, i
|
||||||
} else if (IS_NAMED_PRIM(rator, "-")) {
|
} else if (IS_NAMED_PRIM(rator, "-")) {
|
||||||
generate_arith(jitter, rator, app->rand1, app->rand2, 2, -1, 0, 0, NULL, 1);
|
generate_arith(jitter, rator, app->rand1, app->rand2, 2, -1, 0, 0, NULL, 1);
|
||||||
return 1;
|
return 1;
|
||||||
|
} else if (IS_NAMED_PRIM(rator, "*")) {
|
||||||
|
generate_arith(jitter, rator, app->rand1, app->rand2, 2, 2, 0, 0, NULL, 1);
|
||||||
|
return 1;
|
||||||
|
} else if (IS_NAMED_PRIM(rator, "/")) {
|
||||||
|
generate_arith(jitter, rator, app->rand1, app->rand2, 2, -2, 0, 0, NULL, 1);
|
||||||
|
return 1;
|
||||||
} else if (IS_NAMED_PRIM(rator, "min")) {
|
} else if (IS_NAMED_PRIM(rator, "min")) {
|
||||||
generate_arith(jitter, rator, app->rand1, app->rand2, 2, 9, 0, 0, NULL, 1);
|
generate_arith(jitter, rator, app->rand1, app->rand2, 2, 9, 0, 0, NULL, 1);
|
||||||
return 1;
|
return 1;
|
||||||
|
|
|
@ -583,6 +583,8 @@ typedef _uc jit_insn;
|
||||||
#define JNLEm(D,B,I,S) JCCim(0xf,0xe,D,B,I,S)
|
#define JNLEm(D,B,I,S) JCCim(0xf,0xe,D,B,I,S)
|
||||||
#define JGm(D,B,I,S) JCCim(0xf,0xe,D,B,I,S)
|
#define JGm(D,B,I,S) JCCim(0xf,0xe,D,B,I,S)
|
||||||
|
|
||||||
|
#define JCm(D,B,I,S) JBm(D,B,I,S)
|
||||||
|
#define JNCm(D,B,I,S) JNBm(D,B,I,S)
|
||||||
|
|
||||||
#define JMPSm(D,B,I,S) ((_r0P(B) && _r0P(I)) ? _O_D8 (0xeb ,(long)(D) ) : \
|
#define JMPSm(D,B,I,S) ((_r0P(B) && _r0P(I)) ? _O_D8 (0xeb ,(long)(D) ) : \
|
||||||
JITFAIL("illegal mode in short jump"))
|
JITFAIL("illegal mode in short jump"))
|
||||||
|
|
|
@ -49,8 +49,8 @@
|
||||||
#define JIT_FPR_NUM 6
|
#define JIT_FPR_NUM 6
|
||||||
#define JIT_FPR(i) (i)
|
#define JIT_FPR(i) (i)
|
||||||
|
|
||||||
#define jit_fxch(rs, op) (((rs) != 0 ? FXCHr(rs) : 0), \
|
#define jit_fxch(rs, op) (((rs) != 0 ? FXCHr(rs) : (void)0), \
|
||||||
op, ((rs) != 0 ? FXCHr(rs) : 0))
|
op, ((rs) != 0 ? FXCHr(rs) : (void)0))
|
||||||
|
|
||||||
#define jit_fp_unary(rd, s1, op) \
|
#define jit_fp_unary(rd, s1, op) \
|
||||||
((rd) == (s1) ? jit_fxch ((rd), op) \
|
((rd) == (s1) ? jit_fxch ((rd), op) \
|
||||||
|
@ -62,7 +62,7 @@
|
||||||
((s2) == 0 ? opr(0, (rd)) \
|
((s2) == 0 ? opr(0, (rd)) \
|
||||||
: (s2) == (s1) ? jit_fxch((rd), op(0, 0)) \
|
: (s2) == (s1) ? jit_fxch((rd), op(0, 0)) \
|
||||||
: jit_fxch((rd), op((s2), 0))) \
|
: jit_fxch((rd), op((s2), 0))) \
|
||||||
: (rd) == (s2) ? jit_fxch((s1), opr(0, (rd) == 0 ? (s1) : (rd))) \
|
: (rd) == (s2) ? jit_fxch((s1), op(0, (rd) == 0 ? (s1) : (rd))) \
|
||||||
: (FLDr (s1), op(0, (s2)+1), FSTPr((rd)+1)))
|
: (FLDr (s1), op(0, (s2)+1), FSTPr((rd)+1)))
|
||||||
|
|
||||||
#define jit_addr_d(rd,s1,s2) jit_fp_binary((rd),(s1),(s2),FADDrr,FADDrr)
|
#define jit_addr_d(rd,s1,s2) jit_fp_binary((rd),(s1),(s2),FADDrr,FADDrr)
|
||||||
|
@ -255,7 +255,7 @@ union jit_double_imm {
|
||||||
((_and) ? ANDLir ((_and), _EAX) : 0), \
|
((_and) ? ANDLir ((_and), _EAX) : 0), \
|
||||||
((cmp) ? CMPLir ((cmp), _AL) : 0), \
|
((cmp) ? CMPLir ((cmp), _AL) : 0), \
|
||||||
POPLr(_EAX), \
|
POPLr(_EAX), \
|
||||||
res ((d), 0, 0, 0))
|
res ((d), 0, 0, 0), _jit.x.pc)
|
||||||
|
|
||||||
#define jit_nothing_needed(x)
|
#define jit_nothing_needed(x)
|
||||||
|
|
||||||
|
|
|
@ -54,24 +54,22 @@ void scheme_init_numarith(Scheme_Env *env)
|
||||||
scheme_add_global_constant("+", p, env);
|
scheme_add_global_constant("+", p, env);
|
||||||
|
|
||||||
p = scheme_make_folding_prim(minus, "-", 1, -1, 1);
|
p = scheme_make_folding_prim(minus, "-", 1, -1, 1);
|
||||||
SCHEME_PRIM_PROC_FLAGS(p) |= SCHEME_PRIM_IS_BINARY_INLINED;
|
SCHEME_PRIM_PROC_FLAGS(p) |= (SCHEME_PRIM_IS_BINARY_INLINED
|
||||||
|
| SCHEME_PRIM_IS_UNARY_INLINED);
|
||||||
scheme_add_global_constant("-", p, env);
|
scheme_add_global_constant("-", p, env);
|
||||||
|
|
||||||
scheme_add_global_constant("*",
|
p = scheme_make_folding_prim(mult, "*", 0, -1, 1);
|
||||||
scheme_make_folding_prim(mult,
|
SCHEME_PRIM_PROC_FLAGS(p) |= SCHEME_PRIM_IS_BINARY_INLINED;
|
||||||
"*",
|
scheme_add_global_constant("*", p, env);
|
||||||
0, -1, 1),
|
|
||||||
env);
|
p = scheme_make_folding_prim(div_prim, "/", 1, -1, 1);
|
||||||
scheme_add_global_constant("/",
|
SCHEME_PRIM_PROC_FLAGS(p) |= SCHEME_PRIM_IS_BINARY_INLINED;
|
||||||
scheme_make_folding_prim(div_prim,
|
scheme_add_global_constant("/", p, env);
|
||||||
"/",
|
|
||||||
1, -1, 1),
|
p = scheme_make_folding_prim(scheme_abs, "abs", 1, 1, 1);
|
||||||
env);
|
SCHEME_PRIM_PROC_FLAGS(p) |= SCHEME_PRIM_IS_UNARY_INLINED;
|
||||||
scheme_add_global_constant("abs",
|
scheme_add_global_constant("abs", p, env);
|
||||||
scheme_make_folding_prim(scheme_abs,
|
|
||||||
"abs",
|
|
||||||
1, 1, 1),
|
|
||||||
env);
|
|
||||||
scheme_add_global_constant("quotient",
|
scheme_add_global_constant("quotient",
|
||||||
scheme_make_folding_prim(quotient,
|
scheme_make_folding_prim(quotient,
|
||||||
"quotient",
|
"quotient",
|
||||||
|
|
Loading…
Reference in New Issue
Block a user