Compile a `for[*]/list` form to behave more like `map` by `cons`ing
onto a recursive call, instead of accumulating a list to reverse.
This style of compilation requires a different strategy than before.
A form like
(for*/fold ([v 0]) ([i (in-range M)]
[j (in-range N)])
j)
compiles as nested loops, like
(let i-loop ([v 0] [i 0])
(if (unsafe-fx< i M)
(i-loop (let j-loop ([v v] [j 0])
(if (unsafe-fx< j N)
(j-loop (SEL v j) (unsafe-fx+ j 1))
v))
(unsafe-fx+ i 1))
v))
instead of mutually recursive loops, like
(let i-loop ([v 0] [i 0])
(if (unsafe-fx< i M)
(let j-loop ([v v] [j 0])
(if (unsafe-fx< j N)
(j-loop (SEL v j) (unsafe-fx+ j 1))
(i-loop v (unsafe-fx+ i 1))))
v))
The former runs slightly faster. It's difficult to say why, for
certain, but the reason may be that the JIT can generate more direct
jumps for self-recursion than mutual recursion. (In the case of mutual
recursion, the JIT has to generate one function or the other to get a
known address to jump to.)
Nested loops con't work for `for/list`, though, since each `cons`
needs to be wrapped around the whole continuation of the computation.
So, the `for` compiler adapts, depending on the initial form. (With a
base, CPS-like approach to support `for/list`, it's easy to use the
nested mode when it works by just not fully CPSing.)
Forms that use `#:break` or `#:final` use the mutual-recursion
approach, because `#:break` and #:final` are easier and faster that
way. Internallt, that simplies the imoplementation. Externally, a
`for` loop with `#:break` or `#:final` can be slightly faster than
before.