About the Benchmarks

The pages linked below show results for a collection of regexp benchmarks.

Tables show relative performance, with the actual time for the fastest run shown on the left. By default, the fastest implementation therefore has a relative time of 1, but you can select any implementation to normalize the table with respect to that implementation's speed. A -- entry appears when a benchmark didn't run in an implementation (because of a missing feature, a very long run time, or a stack overflow).
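As an illustration of the normalization, here is a minimal Python sketch. It assumes that each table row is available as a mapping from implementation name to run time, with None standing for a -- entry. The function name `normalize_row` and the sample times are hypothetical and are not actual results.

```python
from typing import Dict, Optional

def normalize_row(times: Dict[str, Optional[float]],
                  baseline: Optional[str] = None) -> Dict[str, Optional[float]]:
    """Express each implementation's time relative to a baseline.

    `times` maps an implementation name to its run time, or to None when
    the benchmark did not run there (a -- entry in the tables).
    With no baseline, the fastest run becomes 1.
    """
    ran = {impl: t for impl, t in times.items() if t is not None}
    base = min(ran.values()) if baseline is None else ran[baseline]
    # Missing runs stay None; everything else is divided by the baseline time.
    return {impl: None if t is None else t / base for impl, t in times.items()}

# Hypothetical times, for illustration only.
row = {"MzScheme": 120.0, "Perl": 60.0, "Python": None}
print(normalize_row(row))              # {'MzScheme': 2.0, 'Perl': 1.0, 'Python': None}
print(normalize_row(row, "MzScheme"))  # {'MzScheme': 1.0, 'Perl': 0.5, 'Python': None}
```

Selecting an implementation that has a -- entry as the baseline raises a KeyError in this sketch, since there is no time to normalize against.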

Run times are averaged over three runs. All reported times are CPU time (system plus user). The times for MzScheme, Perl, and Python use each language's own timing function to record the time before and after a loop within the language; the PCRE times are based on a timed MzScheme loop that calls PCRE via (lib "foreign.ss").
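The timing scheme just described (CPU time as user plus system, recorded before and after a loop and averaged over three runs) can be sketched in Python 3 as follows. This is not the actual harness, which is part of the benchmark source; the helper names, the pattern, and the input are placeholders chosen for illustration, not one of the benchmarks.

```python
import os
import re
from typing import Callable

def cpu_seconds() -> float:
    """CPU time (user plus system) consumed by this process so far."""
    t = os.times()
    return t.user + t.system

def time_loop(body: Callable[[], object], iterations: int) -> float:
    """CPU seconds spent calling `body` `iterations` times in a loop."""
    start = cpu_seconds()
    for _ in range(iterations):
        body()
    return cpu_seconds() - start

def average_time(body: Callable[[], object], iterations: int, runs: int = 3) -> float:
    """Average CPU time over `runs` separately timed loops."""
    return sum(time_loop(body, iterations) for _ in range(runs)) / runs

# Placeholder pattern and input, not taken from the benchmark suite.
pattern = re.compile(r"a*b")
subject = "a" * 1000 + "b"
avg = average_time(lambda: pattern.search(subject), iterations=1000)
print(f"average CPU time: {avg:.4f} s")
```

Measuring CPU time rather than wall-clock time keeps unrelated activity on the machine from inflating the numbers, although process-level CPU timers typically have coarse resolution, which is one reason to time a whole loop instead of a single match.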

A benchmark label of the form name/N/M means that the input size was roughly 10^N and that roughly 10^M iterations were used. The name part can be matched to the actual patterns and inputs in the source (see the link below).
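To make the naming convention concrete, here is a small Python sketch that splits a name/N/M label into its parts. The label "example/4/2" and the helper name are hypothetical; the sketch only assumes that N and M are the last two slash-separated fields.

```python
from typing import NamedTuple

class BenchmarkName(NamedTuple):
    name: str
    input_size: int   # approximate input size, 10**N
    iterations: int   # approximate iteration count, 10**M

def parse_benchmark_name(label: str) -> BenchmarkName:
    """Split a "name/N/M" label into its name and approximate sizes."""
    # rsplit keeps any "/" that might appear inside the name part itself.
    name, n, m = label.rsplit("/", 2)
    return BenchmarkName(name, 10 ** int(n), 10 ** int(m))

print(parse_benchmark_name("example/4/2"))
# BenchmarkName(name='example', input_size=10000, iterations=100)
```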

The stress- benchmarks were taken from the CL-PPCRE benchmark suite. Most others were written to test specific regexp features, although a few were taken from useful code.

Versions and performance notes:

MzScheme: 352.6: Many of the benchmarks serve as a performance test suite for MzScheme, and the MzScheme implementor also produced the benchmarks, so MzScheme should perform reasonably well!
PCRE: 6.7 (compiled with defaults): PCRE doesn't seem to ignore empty patterns like (?:), which probably don't come up much in practice. Also, the default build mode uses C recursion, so some of the stress tests fail due to stack overflow; recompiling to use heap frames would presumably fix the problem.
Perl: 5.8.6: Perl is especially clever on the even-numbered escape tests (the ones where the input doesn't match). Perl lags significantly only on the stress-nopeci benchmarks, which require lots of backtracking unless the implementation first checks whether a case-insensitive version of a literal string occurs in the input; MzScheme performs this check only because the CL-PPCRE benchmark suite suggested the test.
Python: 2.3.5 (old version!): Conditionals were added to Python's regexp library in version 2.4. The non-backtracking form (?>...) doesn't seem to be supported.
RxMzOld = MzScheme 352.5: This was the last version of MzScheme with the old regexp system.
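The literal check mentioned in the Perl note on the stress-nopeci benchmarks can be illustrated as a prefilter: if a literal that every match must contain is absent from the input (ignoring case), the match must fail, so the backtracking search can be skipped. The Python sketch below shows only the general idea, not MzScheme's implementation. The function name and pattern are hypothetical, and it assumes ASCII text so that str.lower() agrees with re.IGNORECASE.

```python
import re
from typing import Optional

def search_with_literal_prefilter(pattern: re.Pattern, required: str,
                                  subject: str) -> Optional[re.Match]:
    """Run `pattern` only if `required` occurs in `subject`, ignoring case.

    Assumes (hypothetically) that every match of `pattern` must contain
    `required`, and that the text is ASCII, so lower() matches the
    behavior of re.IGNORECASE.
    """
    if required.lower() not in subject.lower():
        return None  # no match is possible, so skip the backtracking search
    return pattern.search(subject)

# (?:a+)+ can backtrack exponentially when the overall match fails; the
# prefilter avoids running it at all on input that lacks "needle".
pattern = re.compile(r"(?:a+)+needle", re.IGNORECASE)
print(search_with_literal_prefilter(pattern, "needle", "a" * 30))  # None
print(search_with_literal_prefilter(pattern, "needle", "aaaNEEDLE").group(0))  # aaaNEEDLE
```

The check costs one linear scan of the input, which is cheap next to the exponential worst case it avoids, but it only helps when a required literal can be identified in the pattern.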

For further details on these benchmarks, see the benchmark source and infrastructure, which are available from the PLT SVN repository:

http://svn.plt-scheme.org/plt/trunk/collects/tests/mzscheme/benchmarks/rx/

Results

machine1 Mac OS X, PowerPC, 1.5GHz, 500MB