Some possible optimizations with application:

    If any of the operands are constant (either by being variable
    lookups or literal constants), and if all of them are side-effect
    free, then juggle-operands might not be necessary. I think this is
    similar to the "reorder" optimization described in casey's paper.

    In a self-application, it's not necessary to compute the operator,
    since the value is in the top control frame. A parameterization can
    maintain the current lam in the top of the control frame. Given
    that, there's no need to juggle operands either, since we can grab
    the operator afterwards and put it in place.

    For a kernel primitive call, if all of the operands are constants,
    stack references, or kernel primitive calls, then there's no need
    to push for fresh stack space.

----------------------------------------------------------------------

Multiple values

There's interplay between compile-proc-appl and the linkage compiling
functions compile-linkage and compile-application-linkage.

When we deal with multiple values, we'll have to do something here to
make the values efficient. There's a paper by J. Michael Ashley and
R. Kent Dybvig called "An Efficient Implementation of Multiple Return
Values in Scheme" that I'll need to read.

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.1668&rep=rep1&type=pdf

Basic idea: each return address is actually a pair, where the
secondary address lies at a fixed offset from the first and handles
multiple value return. Multiple values are returned by keeping them on
the stack, and assigning argcount to the number of returned values.

In the context of my compiler: the compiler implicitly defines a
singleton, statement context by using next-linkage. But some uses of
next-linkage ignore the number of values that come back, and others
should raise an error.
Here are the contexts that care:

    app
    let1
    install-value
    toplevel-set (define-values, assign)

For the contexts that don't care, we need to set up a return address
that just pops those values off.

Before introducing the multiple-value jumps
(172b1d9e5de823b53a6705fc87babfdd61152924), test-conform-browser
reports the following times:

    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5248 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5478 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5501 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5853 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5532 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5498 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5351 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5464 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5545 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5405 milliseconds)

After introducing the multiple-value jump targets
(cc1c156df79bab09ca37164e75ae0afe0ac1b0d0), test-conform-browser
reports the following times:

    running test... ok (5281 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5554 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5588 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5509 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5428 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5387 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5539 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5355 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5551 milliseconds)
    fermi ~/work/js-sicp-5-5 $ racket test-conform-browser.rkt
    running test... ok (5331 milliseconds)

At a rough glance, I see no appreciable extra cost for this program,
since it doesn't use multiple-value-return: the ten-run means are
about 5488 ms before and 5452 ms after. Thankfully, it looks like the
JIT in JavaScript isn't significantly hurt when we set the attribute
on the procedure.

What's left to do:

    forms for using the values coming from multiple value returns
    (with-values, define-values, let-values)

    runtime error traps for contexts that must not receive multiple
    values.

    fixing the apply definition so it doesn't return multiple values
    when given a single argument.

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

----------------------------------------------------------------------

Open coding:

I want to be able to write the definitions of kernel primitives once,
and reuse those definitions for both the open-coding as well as the
real runtime. I also need to be able to encode the type checks.

I want to be able to say:

    (make-kernel-primitive '+
       (arity 0 #t)
       (lambda (args)
          (values
             (mapi (lambda (arg i) (test arg i number?)) args)
             (string-join args "+"))))

and have it magically generate the definitions for the open-coding
primitive as well as:

    PRIMITIVES["+"] = function(MACHINE, arity) {
        var result = 0;
        for (var i = 0 ; i < arity; i++) {
            test(isNumber(MACHINE.env[MACHINE.env.length - 1 - i]),
                 i,
                 "number");
            result += MACHINE.env[MACHINE.env.length - 1 - i];
        }
        return result;
    };

Is this completely unrealistic? I have to see how Rabbit and Orbit do
this.

----------------------------------------------------------------------

Runtime values and types are in the plt.runtime namespace.
I need to move types from WeScheme into here.

----------------------------------------------------------------------

Frames and environments.

A CallFrame consists of:

    A return address back to the caller.
    A procedure (the callee).
    A stack.
    A set of continuation marks.

A PromptFrame consists of:

    A return address back to the caller.
    A tag.
    A set of continuation marks.

On exit from a CallFrame,

    MACHINE.env = frame.env

On a regular, generic function call:

    The operator and operands are computed and placed in MACHINE.env's
    scratch space.

    A new call frame is constructed. The frame remembers the
    environment.

    The machine jumps into the procedure entry.

On a tail call,

    The operator and operands are computed and placed in MACHINE.env's
    scratch space.

    The existing call frame is reused.

    The frame's environment consumes those elements from MACHINE.env

    MACHINE.env = the new stack segment

Optimizations with IL

The sequence

    PushEnvironment ...
    AssignImmediateStatement (EnvLexicalAddress ...)

where we're assigning directly to a spot we just allocated, can be
reduced to a single instruction.

We can do some constant folding in operands. e.g.

    MACHINE.env[MACHINE.env.length - 1 - 3] = MACHINE.env[MACHINE.env.length - 1 - 7];

    =>

    MACHINE.env[MACHINE.env.length - 4] = MACHINE.env[MACHINE.env.length - 8];

On tail calls, when we're reusing all of the arguments on the stack,
there's no need to splice, since we won't be popping anything off:

    MACHINE.env.splice(MACHINE.env.length - (MACHINE.argcount + ((10) - MACHINE.argcount)), ((10) - MACHINE.argcount));

removes (10 - MACHINE.argcount) elements, so it is a no-op when all 10
slots are being reused.

In the case where a closure has a prefix, but all the uses of the
prefix are to open-coded primitives, then we don't need to close over
it after all. e.g.

    (test '(begin (letrec ([f (lambda (x)
                                (* x x))]
                           [g (lambda (x)
                                (* x x x))])
                    (- (g (f (+ (g 3) (f 3)))) 1)))
          2176782335
          #:debug? #t)

since (* -) are both open-coded, there's no need to capture the
prefix, and we can reduce some allocation.
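A minimal sketch of the constant folding and the no-op splice
elimination described above, written as a tiny code emitter. The
helper names (emitEnvRef, emitEnvAssign, emitSplice) are hypothetical
and not part of the compiler, and the sketch assumes the slot counts
are known at compile time, which is not always true of
MACHINE.argcount in the real IL.

```javascript
// Hypothetical emitter helpers; none of these names come from the compiler.

// Reference the stack slot `depth` entries below the top.
// The unfolded form is MACHINE.env[MACHINE.env.length - 1 - depth];
// folding the two constants gives a single subtraction.
function emitEnvRef(depth) {
    return "MACHINE.env[MACHINE.env.length - " + (depth + 1) + "]";
}

// Assign one stack slot to another, with both offsets folded.
function emitEnvAssign(targetDepth, sourceDepth) {
    return emitEnvRef(targetDepth) + " = " + emitEnvRef(sourceDepth) + ";";
}

// Emit the splice that drops `dropCount` slots sitting just below the
// top `keepCount` slots. Assumes both counts are compile-time constants.
// When nothing is dropped the splice is a no-op, so emit nothing.
function emitSplice(keepCount, dropCount) {
    if (dropCount === 0) {
        return "";
    }
    return "MACHINE.env.splice(MACHINE.env.length - " +
        (keepCount + dropCount) + ", " + dropCount + ");";
}
```

For instance, emitEnvAssign(3, 7) produces the folded assignment shown
in the constant-folding example, and emitSplice(10, 0) produces no code
at all.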
I can eliminate the first instruction in the pair:

    #(struct:AssignPrimOpStatement val #(struct:GetCompiledProcedureEntry))
    #(struct:GotoStatement #(struct:Label lamEntry259))

since the val isn't even being used here... This is the case when we
statically know the lambda target.

    - this is done now.

I can coalesce

    (PushEnvironment 1 #f)
    (AssignPrimOpStatement (EnvLexicalReference 0 #f)
                           (MakeCompiledProcedure 'lamEntry265 1 '(2 1) 'diff))

into a single statement.

If lambdas don't escape, then we can make their closures empty by
simply explicitly passing in the free arguments.

There's no good reason why the IL has both AssignImmediateStatement
and AssignPrimOpStatement. The distinction is artificial because I'm
allowing the RHS of assignments to use arbitrary expressions, since my
runtime (JavaScript) supports it. I should consolidate these
structures; it may allow me to remove a few more instructions (like
setting ControlLabel to 'val).

flush-output must immediately yield control to the browser, because
the browser needs control back to display changes to the DOM.
Basically, we're simulating an IO interrupt here...

April 17, 2011

The dynamic recomputation for gas is only controlling one parameter:
how many times to run the trampoline before bouncing off to the
browser. But we really have two parameters that need dynamic
computation:

    * FN: the number of function calls before invoking the trampoline.
      FN is necessarily bounded above by the browser. The larger it
      is, the more efficient the trampoline can be.

    * TI: the number of trampoline invocations before yielding to the
      browser.

Both of these should be under some dynamic controller. We want to
optimize the efficiency of the runtime. I don't know what the function
is, but we want to choose FN and TI so as to maximize FN and minimize
TI, while still getting the browser reactivity we want.
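To make the two parameters concrete, here is a minimal sketch of a
trampoline driven by both FN and TI. Everything in it is an
assumption for illustration: the names runWithGas, fnLimit, tiLimit,
yieldToBrowser, and onDone are hypothetical, and each step returns
the next step as a function rather than nesting calls on the
JavaScript stack the way the real runtime does. It also uses fixed
limits; the dynamic controller is not modeled.

```javascript
// Hypothetical sketch; not the runtime's actual trampoline.
//   step           - a zero-argument function that returns either the
//                    next step (a function) or a final non-function value
//   fnLimit        - FN: steps to run per trampoline invocation
//   tiLimit        - TI: trampoline invocations before yielding
//   yieldToBrowser - called with a continuation to resume later
//   onDone         - called with the final value
// Limitation of the sketch: a final value that is itself a function
// would be mistaken for another step.
function runWithGas(step, fnLimit, tiLimit, yieldToBrowser, onDone) {
    var next = step;
    function slice() {
        for (var ti = 0; ti < tiLimit; ti++) {
            // One trampoline invocation: run at most fnLimit steps.
            for (var fn = 0; fn < fnLimit; fn++) {
                if (typeof next !== "function") {
                    onDone(next);
                    return;
                }
                next = next();
            }
        }
        // Out of gas for this slice: hand control back to the browser.
        yieldToBrowser(slice);
    }
    slice();
}
```

In a browser, yieldToBrowser could be something like
function (k) { setTimeout(k, 0); }. A controller would then adjust
fnLimit and tiLimit between slices based on how long each slice took.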