Commit Graph

4 Commits

Author SHA1 Message Date
Matthew Flatt
540c58bbe8 use POPCNT instruction when available on x86_64
On x86_64, a POPCNT instruction is usually available, and it can speed
up `fxpopcount` operations by a factor of 2-3.

Since POPCNT isn't always available, code using `fxpopcount` is
compiled to a call to a generic implementation. The linker substitutes
a POPCNT instruction when it determines at runtime that POPCNT is
available.

Some measurements on a 2018 MacBook Pro (2.7 GHz Core i7) using the
program below:

 popcnt = this implementation, POPCNT discovered
 nocnt  = this implementation, POPCNT considered unavailable
 optcnt = compile to use POPCNT directly (no linker work)
 cpcnt  = compile to inlined generic (no linker work, no POPCNT)

Since the generic implementation is always a 64-bit popcount, it's not
as good as an inlined version for `fxpopcount32`, but otherwise the
link-edit approach to POPCNT works well:

            fxpopcount      fxpopcount32
 popcnt:       0.098s
 nocnt:        0.284s
 optcnt        0.109s  [slower means noise?]
 cpcnt:        0.279s         0.188s

 (optimize-level 3)
 (time
  (let loop ([v #f] [i 100000000])
    (if (fx= i 0)
        v
        (loop (fxpopcount i) (fx- i 1)))))

original commit: 5f090e509f8fe5edc777ed9f0463b20c2e571336
2020-01-11 11:04:48 -07:00
Matthew Flatt
81ea967aea add stencil vectors and fxpopcount
original commit: ec766fca869b5e0407c4f54230b72619af73b40b
2020-01-06 05:34:28 -07:00
Bob Burger
831ea8ad18 changed copyright year to 2017
7.ss, scheme.1.in, comments of many files

original commit: 06f858f9a505b9d6fb6ca1ac97234927cb2dc641
2017-04-06 11:41:33 -04:00
dyb
1356af91b3 initial upload of open-source release
original commit: 47a210c15c63ba9677852269447bd2f2598b51fe
2016-04-26 10:04:54 -04:00