#lang scribble/doc
@(require scribble/manual scribble/eval "guide-utils.rkt"
          (for-label racket/flonum racket/future future-visualizer))

@(define future-eval (make-base-eval))
@(interaction-eval
    #:eval future-eval
    (require racket/future
             future-visualizer/private/visualizer-drawing
             future-visualizer/trace))

@title[#:tag "effective-futures"]{Parallelism with Futures}

The @racketmodname[racket/future] library provides support for performance improvement through parallelism with the @racket[future] and @racket[touch] functions. The level of parallelism available from those constructs, however, is limited by several factors, and the current implementation is best suited to numerical tasks.

@margin-note{Other functions, such as @racket[thread], support the creation of reliably concurrent tasks. However, threads never run truly in parallel, even if the hardware and operating system support parallelism.}

As a starting example, the @racket[any-double?] function below takes a list of numbers and determines whether any number in the list has a double that is also in the list:

@racketblock[
(define (any-double? l)
  (for/or ([i (in-list l)])
    (for/or ([i2 (in-list l)])
      (= i2 (* 2 i)))))
]

This function runs in quadratic time, so it can take a long time (on the order of a second) on large lists like @racket[l1] and @racket[l2]:

@racketblock[
(define l1 (for/list ([i (in-range 5000)])
             (+ (* 2 i) 1)))
(define l2 (for/list ([i (in-range 5000)])
             (- (* 2 i) 1)))
(or (any-double? l1)
    (any-double? l2))
]

The best way to speed up @racket[any-double?] is to use a different algorithm. However, on a machine that offers at least two processing units, the example above can run in about half the time using @racket[future] and @racket[touch]:

@racketblock[
(let ([f (future (lambda () (any-double? l2)))])
  (or (any-double? l1)
      (touch f)))
]

The future @racket[f] runs @racket[(any-double? l2)] in parallel with @racket[(any-double? l1)], and the result for @racket[(any-double?
l2)] becomes available at about the same time that it is demanded by @racket[(touch f)].

Futures run in parallel as long as they can do so safely, but the notion of ``future safe'' is inherently tied to the implementation. The distinction between ``future safe'' and ``future unsafe'' operations may be far from apparent at the level of a Racket program. The remainder of this section works through an example to illustrate this distinction and to show how the future visualizer can help shed light on it.

Consider the following core of a Mandelbrot-set computation:

@racketblock[
(define (mandelbrot iterations x y n)
  (let ([ci (- (/ (* 2.0 y) n) 1.0)]
        [cr (- (/ (* 2.0 x) n) 1.5)])
    (let loop ([i 0] [zr 0.0] [zi 0.0])
      (if (> i iterations)
          i
          (let ([zrq (* zr zr)]
                [ziq (* zi zi)])
            (cond
              [(> (+ zrq ziq) 4) i]
              [else (loop (add1 i)
                          (+ (- zrq ziq) cr)
                          (+ (* 2 zr zi) ci))]))))))
]

The expressions @racket[(mandelbrot 10000000 62 500 1000)] and @racket[(mandelbrot 10000000 62 501 1000)] each take a while to produce an answer. Computing them both, of course, takes twice as long:

@racketblock[
(list (mandelbrot 10000000 62 500 1000)
      (mandelbrot 10000000 62 501 1000))
]

Unfortunately, attempting to run the two computations in parallel with @racket[future] does not improve performance:

@racketblock[
(let ([f (future (lambda () (mandelbrot 10000000 62 501 1000)))])
  (list (mandelbrot 10000000 62 500 1000)
        (touch f)))
]

To see why, use the @racketmodname[future-visualizer], like this:

@racketblock[
(require future-visualizer)
(visualize-futures
 (let ([f (future (lambda () (mandelbrot 10000000 62 501 1000)))])
   (list (mandelbrot 10000000 62 500 1000)
         (touch f))))]

This opens a window showing a graphical view of a trace of the computation.
The upper-left portion of the window contains an execution timeline: @(interaction-eval #:eval future-eval (define bad-log (list (indexed-future-event 0 '#s(future-event #f 0 create 1334778390997.936 #f 1)) (indexed-future-event 1 '#s(future-event 1 1 start-work 1334778390998.137 #f #f)) (indexed-future-event 2 '#s(future-event 1 1 sync 1334778390998.145 #f #f)) (indexed-future-event 3 '#s(future-event 1 0 sync 1334778391001.616 [allocate memory] #f)) (indexed-future-event 4 '#s(future-event 1 0 result 1334778391001.629 #f #f)) (indexed-future-event 5 '#s(future-event 1 1 result 1334778391001.643 #f #f)) (indexed-future-event 6 '#s(future-event 1 1 block 1334778391001.653 #f #f)) (indexed-future-event 7 '#s(future-event 1 1 suspend 1334778391001.658 #f #f)) (indexed-future-event 8 '#s(future-event 1 1 end-work 1334778391001.658 #f #f)) (indexed-future-event 9 '#s(future-event 1 0 block 1334778392134.226 > #f)) (indexed-future-event 10 '#s(future-event 1 0 result 1334778392134.241 #f #f)) (indexed-future-event 11 '#s(future-event 1 1 start-work 1334778392134.254 #f #f)) (indexed-future-event 12 '#s(future-event 1 1 sync 1334778392134.339 #f #f)) (indexed-future-event 13 '#s(future-event 1 0 sync 1334778392134.375 [allocate memory] #f)) (indexed-future-event 14 '#s(future-event 1 0 result 1334778392134.38 #f #f)) (indexed-future-event 15 '#s(future-event 1 1 result 1334778392134.387 #f #f)) (indexed-future-event 16 '#s(future-event 1 1 block 1334778392134.39 #f #f)) (indexed-future-event 17 '#s(future-event 1 1 suspend 1334778392134.391 #f #f)) (indexed-future-event 18 '#s(future-event 1 1 end-work 1334778392134.391 #f #f)) (indexed-future-event 19 '#s(future-event 1 0 touch-pause 1334778392134.432 #f #f)) (indexed-future-event 20 '#s(future-event 1 0 touch-resume 1334778392134.433 #f #f)) (indexed-future-event 21 '#s(future-event 1 0 block 1334778392134.533 * #f)) (indexed-future-event 22 '#s(future-event 1 0 result 1334778392134.537 #f #f)) (indexed-future-event 23 
'#s(future-event 1 2 start-work 1334778392134.568 #f #f)) (indexed-future-event 24 '#s(future-event 1 2 sync 1334778392134.57 #f #f)) (indexed-future-event 25 '#s(future-event 1 0 touch-pause 1334778392134.587 #f #f)) (indexed-future-event 26 '#s(future-event 1 0 touch-resume 1334778392134.587 #f #f)) (indexed-future-event 27 '#s(future-event 1 0 block 1334778392134.6 [allocate memory] #f)) (indexed-future-event 28 '#s(future-event 1 0 result 1334778392134.604 #f #f)) (indexed-future-event 29 '#s(future-event 1 2 result 1334778392134.627 #f #f)) (indexed-future-event 30 '#s(future-event 1 2 block 1334778392134.629 #f #f)) (indexed-future-event 31 '#s(future-event 1 2 suspend 1334778392134.632 #f #f)) (indexed-future-event 32 '#s(future-event 1 2 end-work 1334778392134.633 #f #f)) (indexed-future-event 33 '#s(future-event 1 0 touch-pause 1334778392134.64 #f #f)) (indexed-future-event 34 '#s(future-event 1 0 touch-resume 1334778392134.64 #f #f)) (indexed-future-event 35 '#s(future-event 1 0 block 1334778392134.663 > #f)) (indexed-future-event 36 '#s(future-event 1 0 result 1334778392134.666 #f #f)) (indexed-future-event 37 '#s(future-event 1 1 start-work 1334778392134.673 #f #f)) (indexed-future-event 38 '#s(future-event 1 1 block 1334778392134.676 #f #f)) (indexed-future-event 39 '#s(future-event 1 1 suspend 1334778392134.677 #f #f)) (indexed-future-event 40 '#s(future-event 1 1 end-work 1334778392134.677 #f #f)) (indexed-future-event 41 '#s(future-event 1 0 touch-pause 1334778392134.704 #f #f)) (indexed-future-event 42 '#s(future-event 1 0 touch-resume 1334778392134.704 #f #f)) (indexed-future-event 43 '#s(future-event 1 0 block 1334778392134.727 * #f)) (indexed-future-event 44 '#s(future-event 1 0 result 1334778392134.73 #f #f)) (indexed-future-event 45 '#s(future-event 1 2 start-work 1334778392134.737 #f #f)) (indexed-future-event 46 '#s(future-event 1 2 block 1334778392134.739 #f #f)) (indexed-future-event 47 '#s(future-event 1 2 suspend 1334778392134.74 #f #f)) 
(indexed-future-event 48 '#s(future-event 1 2 end-work 1334778392134.741 #f #f)) (indexed-future-event 49 '#s(future-event 1 0 touch-pause 1334778392134.767 #f #f)) (indexed-future-event 50 '#s(future-event 1 0 touch-resume 1334778392134.767 #f #f)) (indexed-future-event 51 '#s(future-event 1 0 block 1334778392134.79 > #f)) (indexed-future-event 52 '#s(future-event 1 0 result 1334778392134.793 #f #f)) (indexed-future-event 53 '#s(future-event 1 1 start-work 1334778392134.799 #f #f)) (indexed-future-event 54 '#s(future-event 1 1 block 1334778392134.801 #f #f)) (indexed-future-event 55 '#s(future-event 1 1 suspend 1334778392134.802 #f #f)) (indexed-future-event 56 '#s(future-event 1 1 end-work 1334778392134.803 #f #f)) (indexed-future-event 57 '#s(future-event 1 0 touch-pause 1334778392134.832 #f #f)) (indexed-future-event 58 '#s(future-event 1 0 touch-resume 1334778392134.832 #f #f)) (indexed-future-event 59 '#s(future-event 1 0 block 1334778392134.854 * #f)) (indexed-future-event 60 '#s(future-event 1 0 result 1334778392134.858 #f #f)) (indexed-future-event 61 '#s(future-event 1 2 start-work 1334778392134.864 #f #f)) (indexed-future-event 62 '#s(future-event 1 2 block 1334778392134.876 #f #f)) (indexed-future-event 63 '#s(future-event 1 2 suspend 1334778392134.877 #f #f)) (indexed-future-event 64 '#s(future-event 1 2 end-work 1334778392134.882 #f #f)) (indexed-future-event 65 '#s(future-event 1 0 touch-pause 1334778392134.918 #f #f)) (indexed-future-event 66 '#s(future-event 1 0 touch-resume 1334778392134.918 #f #f)) (indexed-future-event 67 '#s(future-event 1 0 block 1334778392134.94 > #f)) (indexed-future-event 68 '#s(future-event 1 0 result 1334778392134.943 #f #f)) (indexed-future-event 69 '#s(future-event 1 1 start-work 1334778392134.949 #f #f)) (indexed-future-event 70 '#s(future-event 1 1 block 1334778392134.952 #f #f)) (indexed-future-event 71 '#s(future-event 1 1 suspend 1334778392134.953 #f #f)) (indexed-future-event 72 '#s(future-event 1 1 end-work 
1334778392134.96 #f #f)) (indexed-future-event 73 '#s(future-event 1 0 touch-pause 1334778392134.991 #f #f)) (indexed-future-event 74 '#s(future-event 1 0 touch-resume 1334778392134.991 #f #f)) (indexed-future-event 75 '#s(future-event 1 0 block 1334778392135.013 * #f)) (indexed-future-event 76 '#s(future-event 1 0 result 1334778392135.016 #f #f)) (indexed-future-event 77 '#s(future-event 1 2 start-work 1334778392135.027 #f #f)) (indexed-future-event 78 '#s(future-event 1 2 block 1334778392135.033 #f #f)) (indexed-future-event 79 '#s(future-event 1 2 suspend 1334778392135.034 #f #f)) (indexed-future-event 80 '#s(future-event 1 2 end-work 1334778392135.04 #f #f)) (indexed-future-event 81 '#s(future-event 1 0 touch-pause 1334778392135.075 #f #f)) (indexed-future-event 82 '#s(future-event 1 0 touch-resume 1334778392135.075 #f #f)) (indexed-future-event 83 '#s(future-event 1 0 block 1334778392135.098 > #f)) (indexed-future-event 84 '#s(future-event 1 0 result 1334778392135.101 #f #f)) (indexed-future-event 85 '#s(future-event 1 1 start-work 1334778392135.107 #f #f)) (indexed-future-event 86 '#s(future-event 1 1 block 1334778392135.117 #f #f)) (indexed-future-event 87 '#s(future-event 1 1 suspend 1334778392135.118 #f #f)) (indexed-future-event 88 '#s(future-event 1 1 end-work 1334778392135.123 #f #f)) (indexed-future-event 89 '#s(future-event 1 0 touch-pause 1334778392135.159 #f #f)) (indexed-future-event 90 '#s(future-event 1 0 touch-resume 1334778392135.159 #f #f)) (indexed-future-event 91 '#s(future-event 1 0 block 1334778392135.181 * #f)) (indexed-future-event 92 '#s(future-event 1 0 result 1334778392135.184 #f #f)) (indexed-future-event 93 '#s(future-event 1 2 start-work 1334778392135.19 #f #f)) (indexed-future-event 94 '#s(future-event 1 2 block 1334778392135.191 #f #f)) (indexed-future-event 95 '#s(future-event 1 2 suspend 1334778392135.192 #f #f)) (indexed-future-event 96 '#s(future-event 1 2 end-work 1334778392135.192 #f #f)) (indexed-future-event 97 
'#s(future-event 1 0 touch-pause 1334778392135.221 #f #f)) (indexed-future-event 98 '#s(future-event 1 0 touch-resume 1334778392135.221 #f #f)) (indexed-future-event 99 '#s(future-event 1 0 block 1334778392135.243 > #f)) ))) @interaction-eval-show[ #:eval future-eval (timeline-pict bad-log #:x 0 #:y 0 #:width 600 #:height 300) ] Each horizontal row represents an OS-level thread, and the colored dots represent important events in the execution of the program (they are color-coded to distinguish one event type from another). The upper-left blue dot in the timeline represents the future's creation. The future executes for a brief period (represented by a green bar in the second line) on thread 1, and then pauses to allow the runtime thread to perform a future-unsafe operation. In the Racket implementation, future-unsafe operations fall into one of two categories. A @deftech{blocking} operation halts the evaluation of the future, and will not allow it to continue until it is touched. After the operation completes within @racket[touch], the remainder of the future's work will be evaluated sequentially by the runtime thread. A @deftech{synchronized} operation also halts the future, but the runtime thread may perform the operation at any time and, once completed, the future may continue running in parallel. Memory allocation and JIT compilation are two common examples of synchronized operations. In the timeline, we see an orange dot just to the right of the green bar on thread 1 -- this dot represents a synchronized operation (memory allocation). The first orange dot on thread 0 shows that the runtime thread performed the allocation shortly after the future paused. A short time later, the future halts on a blocking operation (the first red dot) and must wait until the @racket[touch] for it to be evaluated (slightly after the 1049ms mark). 
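
The two categories can also be told apart in the raw trace records, without the graphical view. The following is a minimal sketch rather than part of the visualizer: it copies five records from the log above and tallies, per primitive, how often the runtime thread had to step in. The names @racket[sample-events], @racket[event-field], and @racket[tally] are hypothetical helpers, and the reading of the record fields (future, thread, action, timestamp, primitive, extra data) is an assumption based on the shape of the log.

@racketblock[
;; Records copied from the trace above (events 2, 3, 9, 21, and 35).
;; Assumed field order: future id, OS-level thread, action,
;; timestamp, primitive name, extra data.
(define sample-events
  (list '#s(future-event 1 1 sync 1334778390998.145 #f #f)
        '#s(future-event 1 0 sync 1334778391001.616 [allocate memory] #f)
        '#s(future-event 1 0 block 1334778392134.226 > #f)
        '#s(future-event 1 0 block 1334778392134.533 * #f)
        '#s(future-event 1 0 block 1334778392134.663 > #f)))

;; Hypothetical helper: positional access to a prefab record.
;; struct->vector puts the structure name in slot 0, so the
;; first field sits in slot 1.
(define (event-field e i)
  (vector-ref (struct->vector e) i))

;; Count blocking and synchronized operations per primitive.
(define tally (make-hash))
(for ([e (in-list sample-events)])
  (define action (event-field e 3))
  (define prim (event-field e 5))
  ;; Only records that name a primitive say which operation was unsafe.
  (when (and prim (memq action '(block sync)))
    (hash-update! tally (list action prim) add1 0)))

(for ([(key count) (in-hash tally)])
  (printf "~a: ~a\n" key count))
]

On this sample, the tally holds two blocking calls to @racket[>], one blocking call to @racket[*], and one synchronized memory allocation, which matches the red and orange dots described above.
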
When you move your mouse over an event, the visualizer shows you detailed information about the event and draws arrows connecting all of the events in the corresponding future. This image shows those connections for our future.

@interaction-eval-show[
    #:eval future-eval
    (timeline-pict bad-log
                   #:x 0
                   #:y 0
                   #:width 600
                   #:height 300
                   #:selected-event-index 6)
]

The dotted orange line connects the first event in the future to the future that created it, and the purple lines connect adjacent events within the future.

The reason that we see no parallelism is that the @racket[>] and @racket[*] operations in the lower portion of the loop in @racket[mandelbrot] involve a mixture of floating-point and fixed (integer) values. Such mixtures typically trigger a slow path in execution, and the general slow path will usually be blocking.

Changing constants to be floating-point numbers in @racket[mandelbrot] addresses that first problem:

@racketblock[
(define (mandelbrot iterations x y n)
  (let ([ci (- (/ (* 2.0 y) n) 1.0)]
        [cr (- (/ (* 2.0 x) n) 1.5)])
    (let loop ([i 0] [zr 0.0] [zi 0.0])
      (if (> i iterations)
          i
          (let ([zrq (* zr zr)]
                [ziq (* zi zi)])
            (cond
              [(> (+ zrq ziq) 4.0) i]
              [else (loop (add1 i)
                          (+ (- zrq ziq) cr)
                          (+ (* 2.0 zr zi) ci))]))))))
]

With that change, @racket[mandelbrot] computations can run in parallel.
Nevertheless, we still see a special type of slow-path operation limiting our parallelism (orange dots): @interaction-eval[ #:eval future-eval (define better-log (list (indexed-future-event 0 '#s(future-event #f 0 create 1334779296782.22 #f 2)) (indexed-future-event 1 '#s(future-event 2 2 start-work 1334779296782.265 #f #f)) (indexed-future-event 2 '#s(future-event 2 2 sync 1334779296782.378 #f #f)) (indexed-future-event 3 '#s(future-event 2 0 sync 1334779296795.582 [allocate memory] #f)) (indexed-future-event 4 '#s(future-event 2 0 result 1334779296795.587 #f #f)) (indexed-future-event 5 '#s(future-event 2 2 result 1334779296795.6 #f #f)) (indexed-future-event 6 '#s(future-event 2 2 sync 1334779296795.689 #f #f)) (indexed-future-event 7 '#s(future-event 2 0 sync 1334779296795.807 [allocate memory] #f)) (indexed-future-event 8 '#s(future-event 2 0 result 1334779296795.812 #f #f)) (indexed-future-event 9 '#s(future-event 2 2 result 1334779296795.818 #f #f)) (indexed-future-event 10 '#s(future-event 2 2 sync 1334779296795.827 #f #f)) (indexed-future-event 11 '#s(future-event 2 0 sync 1334779296806.627 [allocate memory] #f)) (indexed-future-event 12 '#s(future-event 2 0 result 1334779296806.635 #f #f)) (indexed-future-event 13 '#s(future-event 2 2 result 1334779296806.646 #f #f)) (indexed-future-event 14 '#s(future-event 2 2 sync 1334779296806.879 #f #f)) (indexed-future-event 15 '#s(future-event 2 0 sync 1334779296806.994 [allocate memory] #f)) (indexed-future-event 16 '#s(future-event 2 0 result 1334779296806.999 #f #f)) (indexed-future-event 17 '#s(future-event 2 2 result 1334779296807.007 #f #f)) (indexed-future-event 18 '#s(future-event 2 2 sync 1334779296807.023 #f #f)) (indexed-future-event 19 '#s(future-event 2 0 sync 1334779296814.198 [allocate memory] #f)) (indexed-future-event 20 '#s(future-event 2 0 result 1334779296814.206 #f #f)) (indexed-future-event 21 '#s(future-event 2 2 result 1334779296814.221 #f #f)) (indexed-future-event 22 '#s(future-event 2 2 
sync 1334779296814.29 #f #f)) (indexed-future-event 23 '#s(future-event 2 0 sync 1334779296820.796 [allocate memory] #f)) (indexed-future-event 24 '#s(future-event 2 0 result 1334779296820.81 #f #f)) (indexed-future-event 25 '#s(future-event 2 2 result 1334779296820.835 #f #f)) (indexed-future-event 26 '#s(future-event 2 2 sync 1334779296821.089 #f #f)) (indexed-future-event 27 '#s(future-event 2 0 sync 1334779296825.217 [allocate memory] #f)) (indexed-future-event 28 '#s(future-event 2 0 result 1334779296825.226 #f #f)) (indexed-future-event 29 '#s(future-event 2 2 result 1334779296825.242 #f #f)) (indexed-future-event 30 '#s(future-event 2 2 sync 1334779296825.305 #f #f)) (indexed-future-event 31 '#s(future-event 2 0 sync 1334779296832.541 [allocate memory] #f)) (indexed-future-event 32 '#s(future-event 2 0 result 1334779296832.549 #f #f)) (indexed-future-event 33 '#s(future-event 2 2 result 1334779296832.562 #f #f)) (indexed-future-event 34 '#s(future-event 2 2 sync 1334779296832.667 #f #f)) (indexed-future-event 35 '#s(future-event 2 0 sync 1334779296836.269 [allocate memory] #f)) (indexed-future-event 36 '#s(future-event 2 0 result 1334779296836.278 #f #f)) (indexed-future-event 37 '#s(future-event 2 2 result 1334779296836.326 #f #f)) (indexed-future-event 38 '#s(future-event 2 2 sync 1334779296836.396 #f #f)) (indexed-future-event 39 '#s(future-event 2 0 sync 1334779296843.481 [allocate memory] #f)) (indexed-future-event 40 '#s(future-event 2 0 result 1334779296843.49 #f #f)) (indexed-future-event 41 '#s(future-event 2 2 result 1334779296843.501 #f #f)) (indexed-future-event 42 '#s(future-event 2 2 sync 1334779296843.807 #f #f)) (indexed-future-event 43 '#s(future-event 2 0 sync 1334779296847.291 [allocate memory] #f)) (indexed-future-event 44 '#s(future-event 2 0 result 1334779296847.3 #f #f)) (indexed-future-event 45 '#s(future-event 2 2 result 1334779296847.312 #f #f)) (indexed-future-event 46 '#s(future-event 2 2 sync 1334779296847.375 #f #f)) 
(indexed-future-event 47 '#s(future-event 2 0 sync 1334779296854.487 [allocate memory] #f)) (indexed-future-event 48 '#s(future-event 2 0 result 1334779296854.495 #f #f)) (indexed-future-event 49 '#s(future-event 2 2 result 1334779296854.507 #f #f)) (indexed-future-event 50 '#s(future-event 2 2 sync 1334779296854.656 #f #f)) (indexed-future-event 51 '#s(future-event 2 0 sync 1334779296857.374 [allocate memory] #f)) (indexed-future-event 52 '#s(future-event 2 0 result 1334779296857.383 #f #f)) (indexed-future-event 53 '#s(future-event 2 2 result 1334779296857.421 #f #f)) (indexed-future-event 54 '#s(future-event 2 2 sync 1334779296857.488 #f #f)) (indexed-future-event 55 '#s(future-event 2 0 sync 1334779296869.919 [allocate memory] #f)) (indexed-future-event 56 '#s(future-event 2 0 result 1334779296869.947 #f #f)) (indexed-future-event 57 '#s(future-event 2 2 result 1334779296869.981 #f #f)) (indexed-future-event 58 '#s(future-event 2 2 sync 1334779296870.32 #f #f)) (indexed-future-event 59 '#s(future-event 2 0 sync 1334779296879.438 [allocate memory] #f)) (indexed-future-event 60 '#s(future-event 2 0 result 1334779296879.446 #f #f)) (indexed-future-event 61 '#s(future-event 2 2 result 1334779296879.463 #f #f)) (indexed-future-event 62 '#s(future-event 2 2 sync 1334779296879.526 #f #f)) (indexed-future-event 63 '#s(future-event 2 0 sync 1334779296882.928 [allocate memory] #f)) (indexed-future-event 64 '#s(future-event 2 0 result 1334779296882.935 #f #f)) (indexed-future-event 65 '#s(future-event 2 2 result 1334779296882.944 #f #f)) (indexed-future-event 66 '#s(future-event 2 2 sync 1334779296883.311 #f #f)) (indexed-future-event 67 '#s(future-event 2 0 sync 1334779296890.471 [allocate memory] #f)) (indexed-future-event 68 '#s(future-event 2 0 result 1334779296890.479 #f #f)) (indexed-future-event 69 '#s(future-event 2 2 result 1334779296890.517 #f #f)) (indexed-future-event 70 '#s(future-event 2 2 sync 1334779296890.581 #f #f)) (indexed-future-event 71 
'#s(future-event 2 0 sync 1334779296894.362 [allocate memory] #f)) (indexed-future-event 72 '#s(future-event 2 0 result 1334779296894.369 #f #f)) (indexed-future-event 73 '#s(future-event 2 2 result 1334779296894.382 #f #f)) (indexed-future-event 74 '#s(future-event 2 2 sync 1334779296894.769 #f #f)) (indexed-future-event 75 '#s(future-event 2 0 sync 1334779296901.501 [allocate memory] #f)) (indexed-future-event 76 '#s(future-event 2 0 result 1334779296901.51 #f #f)) (indexed-future-event 77 '#s(future-event 2 2 result 1334779296901.556 #f #f)) (indexed-future-event 78 '#s(future-event 2 2 sync 1334779296901.62 #f #f)) (indexed-future-event 79 '#s(future-event 2 0 sync 1334779296905.428 [allocate memory] #f)) (indexed-future-event 80 '#s(future-event 2 0 result 1334779296905.434 #f #f)) (indexed-future-event 81 '#s(future-event 2 2 result 1334779296905.447 #f #f)) (indexed-future-event 82 '#s(future-event 2 2 sync 1334779296905.743 #f #f)) (indexed-future-event 83 '#s(future-event 2 0 sync 1334779296912.538 [allocate memory] #f)) (indexed-future-event 84 '#s(future-event 2 0 result 1334779296912.547 #f #f)) (indexed-future-event 85 '#s(future-event 2 2 result 1334779296912.564 #f #f)) (indexed-future-event 86 '#s(future-event 2 2 sync 1334779296912.625 #f #f)) (indexed-future-event 87 '#s(future-event 2 0 sync 1334779296916.094 [allocate memory] #f)) (indexed-future-event 88 '#s(future-event 2 0 result 1334779296916.1 #f #f)) (indexed-future-event 89 '#s(future-event 2 2 result 1334779296916.108 #f #f)) (indexed-future-event 90 '#s(future-event 2 2 sync 1334779296916.243 #f #f)) (indexed-future-event 91 '#s(future-event 2 0 sync 1334779296927.233 [allocate memory] #f)) (indexed-future-event 92 '#s(future-event 2 0 result 1334779296927.242 #f #f)) (indexed-future-event 93 '#s(future-event 2 2 result 1334779296927.262 #f #f)) (indexed-future-event 94 '#s(future-event 2 2 sync 1334779296927.59 #f #f)) (indexed-future-event 95 '#s(future-event 2 0 sync 
1334779296934.603 [allocate memory] #f)) (indexed-future-event 96 '#s(future-event 2 0 result 1334779296934.612 #f #f)) (indexed-future-event 97 '#s(future-event 2 2 result 1334779296934.655 #f #f)) (indexed-future-event 98 '#s(future-event 2 2 sync 1334779296934.72 #f #f)) (indexed-future-event 99 '#s(future-event 2 0 sync 1334779296938.773 [allocate memory] #f)) )) ]

@interaction-eval-show[
    #:eval future-eval
    (timeline-pict better-log
                   #:x 0
                   #:y 0
                   #:width 600
                   #:height 300)
]

The problem is that almost every arithmetic operation in this example produces an inexact number whose storage must be allocated. While some allocation can safely be performed exclusively without the aid of the runtime thread, especially frequent allocation requires synchronized operations, which defeat any performance improvement.

By using @tech{flonum}-specific operations from @racketmodname[racket/flonum] (see @secref["fixnums+flonums"]), we can re-write @racket[mandelbrot] to use much less allocation:

@interaction-eval[
    #:eval future-eval
    (define good-log
      (list (indexed-future-event 0 '#s(future-event #f 0 create 1334778395768.733 #f 3))
            (indexed-future-event 1 '#s(future-event 3 2 start-work 1334778395768.771 #f #f))
            (indexed-future-event 2 '#s(future-event 3 2 complete 1334778395864.648 #f #f))
            (indexed-future-event 3 '#s(future-event 3 2 end-work 1334778395864.652 #f #f))
      ))
]

@racketblock[
(define (mandelbrot iterations x y n)
  (let ([ci (fl- (fl/ (* 2.0 (->fl y)) (->fl n)) 1.0)]
        [cr (fl- (fl/ (* 2.0 (->fl x)) (->fl n)) 1.5)])
    (let loop ([i 0] [zr 0.0] [zi 0.0])
      (if (> i iterations)
          i
          (let ([zrq (fl* zr zr)]
                [ziq (fl* zi zi)])
            (cond
              [(fl> (fl+ zrq ziq) 4.0) i]
              [else (loop (add1 i)
                          (fl+ (fl- zrq ziq) cr)
                          (fl+ (fl* 2.0 (fl* zr zi)) ci))]))))))
]

This conversion can speed up @racket[mandelbrot] by a factor of 8, even in sequential mode, but avoiding allocation also allows @racket[mandelbrot] to run usefully faster in parallel.
Executing this program yields the following in the visualizer:

@interaction-eval-show[
    #:eval future-eval
    (timeline-pict good-log
                   #:x 0
                   #:y 0
                   #:width 600
                   #:height 300)
]

Notice that only one green bar is shown here, because one of the @racket[mandelbrot] computations is evaluated on the runtime thread rather than by a future.

As a general guideline, any operation that is inlined by the @tech{JIT} compiler runs safely in parallel, while other operations that are not inlined (including all operations if the JIT compiler is disabled) are considered unsafe. The @exec{raco decompile} tool annotates operations that can be inlined by the compiler (see @secref[#:doc '(lib "scribblings/raco/raco.scrbl") "decompile"]), so the decompiler can be used to help predict parallel performance.

@close-eval[future-eval]
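
To check the effect on a particular machine, the pieces of this section can be combined into one small program. The following is a sketch under stated assumptions, not a benchmark: it reuses the flonum-specific @racket[mandelbrot] and the arguments from the examples above, and the names @racket[n-iter], @racket[seq], and @racket[par] are introduced only for illustration. Any speedup depends on the machine and on the Racket implementation in use.

@racketblock[
(require racket/flonum racket/future)

;; Flonum-specific core, as in the final version above.
(define (mandelbrot iterations x y n)
  (let ([ci (fl- (fl/ (* 2.0 (->fl y)) (->fl n)) 1.0)]
        [cr (fl- (fl/ (* 2.0 (->fl x)) (->fl n)) 1.5)])
    (let loop ([i 0] [zr 0.0] [zi 0.0])
      (if (> i iterations)
          i
          (let ([zrq (fl* zr zr)]
                [ziq (fl* zi zi)])
            (cond
              [(fl> (fl+ zrq ziq) 4.0) i]
              [else (loop (add1 i)
                          (fl+ (fl- zrq ziq) cr)
                          (fl+ (fl* 2.0 (fl* zr zi)) ci))]))))))

;; Hypothetical name for the iteration limit used in the examples.
(define n-iter 10000000)

;; Without parallel futures or a second processing unit,
;; no speedup should be expected.
(printf "futures enabled: ~a, processing units: ~a\n"
        (futures-enabled?) (processor-count))

;; Sequential baseline: both computations on the runtime thread.
(define seq
  (time (list (mandelbrot n-iter 62 500 1000)
              (mandelbrot n-iter 62 501 1000))))

;; One computation in a future, the other on the runtime thread.
(define par
  (time (let ([f (future (lambda () (mandelbrot n-iter 62 501 1000)))])
          (list (mandelbrot n-iter 62 500 1000)
                (touch f)))))

;; Both strategies must produce the same answers.
(unless (equal? seq par)
  (error "sequential and parallel results differ"))
]

Each @racket[time] form prints the CPU, real, and garbage-collection times for its expression. Parallel execution shows up as a lower real time for the second form, while its CPU time stays roughly the same.
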