whalesong/NOTES

Some possible optimizations with application:

    If any of the operands are constant (that is, variable lookups or
    literal constants), and if all of them are side-effect free, then
    juggle-operands might not be necessary.

    In a self-application, it's not necessary to compute the operator,
    since its value is already in the top control frame.  A
    parameterization can maintain the current lam in the top of the
    control frame.  Given that, there's no need to juggle operands
    either, since we can grab the operator afterwards and put it in
    place.

    For a kernel primitive call, if all of the operands are constants,
    stack references, or kernel primitive calls, then there's no need
    to push fresh stack space.
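A minimal JavaScript sketch of the operand checks above.  The operand
AST (Constant, LocalRef, ToplevelRef, KernelPrimCall, Lambda) and all
function names are hypothetical, not the compiler's own structures,
and the effect check is deliberately conservative.

```javascript
// Hypothetical operand AST (not the compiler's real structures):
//   { type: "Constant", value }              literal constant
//   { type: "LocalRef", depth }              stack reference
//   { type: "ToplevelRef", name }            variable lookup through the prefix
//   { type: "KernelPrimCall", op, operands }
//   { type: "Lambda", body }
// Any other type is treated as a general expression that may have effects.

// "Constant" in the sense used above: a variable lookup or a literal.
function isConstantOperand(e) {
  return e.type === "Constant" || e.type === "LocalRef" || e.type === "ToplevelRef";
}

// Conservative effect check: only constants and lambda expressions
// (closure creation) are assumed to be free of side effects.
function isSideEffectFree(e) {
  return isConstantOperand(e) || e.type === "Lambda";
}

// Condition for skipping juggle-operands, as stated above: some operand
// is constant and none of them has side effects.
function canSkipJuggle(operands) {
  return operands.some(isConstantOperand) && operands.every(isSideEffectFree);
}

// A kernel primitive call needs no fresh stack space if each operand is a
// constant, a stack reference, or (recursively) such a kernel primitive call.
function needsNoFreshStack(call) {
  return call.operands.every(
    (e) => isConstantOperand(e) || (e.type === "KernelPrimCall" && needsNoFreshStack(e))
  );
}
```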


----------------------------------------------------------------------


Multiple values

There's interplay between compile-proc-appl and the linkage-compiling
functions compile-linkage and compile-application-linkage.  When we
deal with multiple values, we'll have to do something here to make
multiple-value returns efficient.  There's a paper by J. Michael Ashley
and R. Kent Dybvig called "An Efficient Implementation of Multiple
Return Values in Scheme" that I'll need to read.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.1668&rep=rep1&type=pdf


Basic idea: each return address is actually a pair, where the
secondary address lies at a fixed offset from the first and handles
multiple-value returns.  Multiple values are returned by leaving them
on the stack and setting argcount to the number of returned values.
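A toy JavaScript model of that protocol, as a sketch only: the return
address is a function, and its multiple-value entry is attached as a
hypothetical mvr property standing in for the fixed-offset secondary
address.  MACHINE, returnValues, and makeReturnAddress are
illustrative names.

```javascript
// Toy machine state: `env` is the value stack, `val` the single-value
// register, `argcount` the number of values on a multiple-value return.
const MACHINE = { env: [], val: undefined, argcount: 0 };

// A return address is modeled as a function. Its multiple-value entry is
// attached as the property `mvr`, standing in for the secondary address at
// a fixed offset. The name `mvr` is hypothetical.
function makeReturnAddress(singleEntry, multipleEntry) {
  singleEntry.mvr = multipleEntry;
  return singleEntry;
}

// Callee side: exactly one value goes through `val` and the primary entry.
// Any other count stays on the stack, argcount records how many, and
// control goes to the secondary entry.
function returnValues(ret, values) {
  if (values.length === 1) {
    MACHINE.val = values[0];
    return ret();
  }
  for (const v of values) MACHINE.env.push(v);
  MACHINE.argcount = values.length;
  return ret.mvr();
}

// Example caller: collects whatever comes back into an array, popping
// multiple returned values off the stack.
const collect = makeReturnAddress(
  () => [MACHINE.val],
  () => MACHINE.env.splice(MACHINE.env.length - MACHINE.argcount, MACHINE.argcount)
);
```

Keeping the values on the stack means the common single-value path
never pays for the multiple-value case.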


In the context of my compiler: the compiler implicitly defines a
single-valued (singleton) statement context by using next-linkage.
But some uses of next-linkage ignore the number of values that come
back, and others should raise an error.  Here are the contexts that
care:

    app
    let1
    install-value
    toplevel-set (define-values, assign)


For the contexts that don't care, we need to set up a return address
that just pops those values off.
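One way such a return address might look, in the same toy style (the
mvr property and makeIgnoringReturn are hypothetical): the
single-value entry continues directly, and the multiple-value entry
first drops argcount values from the stack.

```javascript
// Toy machine state: returned values stay on `env`, and `argcount`
// says how many there are.
const MACHINE = { env: [], val: undefined, argcount: 0 };

// Return address for a context that ignores how many values come back.
// The single-value entry just continues; the multiple-value entry
// (attached as the hypothetical property `mvr`) pops the returned values
// off the stack first.
function makeIgnoringReturn(next) {
  const ret = () => next();
  ret.mvr = () => {
    MACHINE.env.splice(MACHINE.env.length - MACHINE.argcount, MACHINE.argcount);
    return next();
  };
  return ret;
}
```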


----------------------------------------------------------------------


Open coding:

I want to be able to write the definitions of kernel primitives once,
and reuse those definitions for both the open-coding as well as the
real runtime.  I also need to be able to encode the type checks.  I
want to be able to say:


(make-kernel-primitive '+
                       (arity 0 #t)

                       (lambda (args)
                         (values (mapi (lambda (arg i)
                                         (test arg i number?))
                                       args)
                                 (string-join args "+"))))

and have it magically generate the definitions for the open-coding
primitive as well as:

    PRIMITIVES["+"] = function(MACHINE, arity) {
        // Sum the top `arity` values on the stack, checking that each
        // one is a number.
        var result = 0;
        for (var i = 0; i < arity; i++) {
            test(isNumber(MACHINE.env[MACHINE.env.length - 1 - i]),
                 i,
                 "number");
            result += MACHINE.env[MACHINE.env.length - 1 - i];
        }
        return result;
    };

Is this completely unrealistic?  I have to see how Rabbit and Orbit do this.
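As a rough illustration of the "define once" idea (not how Rabbit or
Orbit do it), a single hypothetical spec object can drive both a
runtime definition shaped like the PRIMITIVES["+"] function above and
an open-coder that emits inline JavaScript.  plusSpec,
makeRuntimePrimitive, openCode, and the test/isNumber helpers are
assumptions for the sketch.

```javascript
// Runtime helpers the generated code relies on (names assumed).
const isNumber = (x) => typeof x === "number";
function test(ok, position, expected) {
  if (!ok) throw new TypeError(`argument ${position}: expected ${expected}`);
}

// One hypothetical description of a variadic numeric primitive.
const plusSpec = {
  identity: 0,                               // result with zero arguments
  check: { fn: isNumber, name: "isNumber" }, // predicate and its source name
  typeName: "number",
  combine: (a, b) => a + b,                  // used by the runtime definition
  infix: "+",                                // used by the open-coder
};

// Runtime definition: reads `arity` arguments off MACHINE.env, in the
// same shape as the PRIMITIVES["+"] function above.
function makeRuntimePrimitive(spec) {
  return function (MACHINE, arity) {
    let result = spec.identity;
    for (let i = 0; i < arity; i++) {
      const arg = MACHINE.env[MACHINE.env.length - 1 - i];
      test(spec.check.fn(arg), i, spec.typeName);
      result = spec.combine(result, arg);
    }
    return result;
  };
}

// Open-coded definition: given operand source expressions, emit an inline
// JavaScript expression. Each operand appears twice (check and use), so
// this assumes operands are simple, side-effect-free expressions.
function openCode(spec, operandExprs) {
  if (operandExprs.length === 0) return String(spec.identity);
  const checks = operandExprs.map(
    (e, i) => `test(${spec.check.name}(${e}), ${i}, ${JSON.stringify(spec.typeName)})`
  );
  return `(${checks.join(", ")}, ${operandExprs.join(` ${spec.infix} `)})`;
}

const PRIMITIVES = { "+": makeRuntimePrimitive(plusSpec) };
```

For example, openCode(plusSpec, ["a", "b"]) yields the string
(test(isNumber(a), 0, "number"), test(isNumber(b), 1, "number"), a + b).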


----------------------------------------------------------------------

Runtime values and types are in the plt.runtime namespace.  I need
to move the types from WeScheme into it.


----------------------------------------------------------------------


Frames and environments.


A CallFrame consists of:

   A return address back to the caller.
   A procedure (the callee).
   A stack.
   A set of continuation marks.


A PromptFrame consists of:

   A return address back to the caller.
   A tag.
   A set of continuation marks.
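The two frame kinds could be represented as plain JavaScript classes.
This is a sketch with assumed field names; it stores continuation
marks in a Map keyed by mark key.

```javascript
// Field names are illustrative only.
class CallFrame {
  constructor(returnAddress, proc, env) {
    this.returnAddress = returnAddress; // where to resume in the caller
    this.proc = proc;                   // the callee
    this.env = env;                     // stack to restore on exit
    this.marks = new Map();             // continuation marks: key -> value
  }
}

class PromptFrame {
  constructor(returnAddress, tag) {
    this.returnAddress = returnAddress; // where to resume in the caller
    this.tag = tag;                     // prompt tag
    this.marks = new Map();             // continuation marks: key -> value
  }
}
```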


On exit from a CallFrame,

    MACHINE.env = frame.env


On a regular, generic function call:

    The operator and operands are computed and placed in MACHINE.env's
    scratch space.

    A new call frame is constructed.  The frame remembers the environment.

    The machine jumps into the procedure entry.


On a tail call,

    The operator and operands are computed and placed in MACHINE.env's
    scratch space.

    The existing call frame is reused.
        The frame's environment consumes those elements from MACHINE.env
        MACHINE.env = the new stack segment
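A toy model of these three transitions, assuming each call gets a
fresh stack segment holding just its operands (one possible reading of
the notes above).  MACHINE.control, call, tailCall, and exitCallFrame
are hypothetical names, and the operator is passed in directly rather
than read from scratch space.

```javascript
// Toy machine: `env` is the current stack segment, `control` the frame stack.
const MACHINE = { env: [], control: [], proc: null };

// Take the top n values (the operands sitting in scratch space) off the
// current segment; they become the callee's new segment.
function takeOperands(n) {
  return MACHINE.env.splice(MACHINE.env.length - n, n);
}

// Generic call: push a frame that remembers the caller's segment, then
// jump to the callee's entry.
function call(proc, argcount, returnAddress) {
  const operands = takeOperands(argcount);
  MACHINE.control.push({ returnAddress, proc, env: MACHINE.env, marks: new Map() });
  MACHINE.env = operands;
  MACHINE.proc = proc;
  return proc.entry;
}

// Tail call: reuse the top frame, so the control stack does not grow;
// the current segment is simply replaced by the new operands.
function tailCall(proc, argcount) {
  const operands = takeOperands(argcount);
  MACHINE.control[MACHINE.control.length - 1].proc = proc;
  MACHINE.env = operands;
  MACHINE.proc = proc;
  return proc.entry;
}

// Exit from a CallFrame: MACHINE.env = frame.env, then resume the caller.
function exitCallFrame() {
  const frame = MACHINE.control.pop();
  MACHINE.env = frame.env;
  return frame.returnAddress;
}
```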


Optimizations with IL

The sequence PushEnvironment ... AssignImmediateStatement (EnvLexicalAddress ...)
where we're assigning directly to a spot we just allocated, can be reduced to
a single instruction.
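A peephole pass for this could look like the sketch below, over
hypothetical plain-object statements; it also handles the analogous
PushEnvironment / AssignPrimOpStatement pair.  The fused statement
types (PushImmediateOntoEnvironment, PushPrimOpOntoEnvironment) are
made-up names, and only the single-slot case is handled.

```javascript
// Hypothetical IL statements as plain objects:
//   { type: "PushEnvironment", n }
//   { type: "AssignImmediateStatement", target, value }
//   { type: "AssignPrimOpStatement", target, op }
// where target is e.g. { type: "EnvLexicalReference", depth }.

// True when `target` is the single slot that `push` just allocated.
function isFreshSlot(push, target) {
  return push.n === 1 && target.type === "EnvLexicalReference" && target.depth === 0;
}

// Peephole pass: replace "push one slot, then assign into it" with a
// single statement that pushes the value directly.
function fusePushAssign(stmts) {
  const out = [];
  for (let i = 0; i < stmts.length; i++) {
    const s = stmts[i];
    const next = stmts[i + 1];
    if (s.type === "PushEnvironment" && next && isFreshSlot(s, next.target || {})) {
      if (next.type === "AssignImmediateStatement") {
        out.push({ type: "PushImmediateOntoEnvironment", value: next.value });
        i++; // the assignment has been folded in
        continue;
      }
      if (next.type === "AssignPrimOpStatement") {
        out.push({ type: "PushPrimOpOntoEnvironment", op: next.op });
        i++; // the assignment has been folded in
        continue;
      }
    }
    out.push(s);
  }
  return out;
}
```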

We can do some constant folding in operands.  For example:

    MACHINE.env[MACHINE.env.length - 1 - 3] = MACHINE.env[MACHINE.env.length - 1 - 7];

=>

    MACHINE.env[MACHINE.env.length - 4] = MACHINE.env[MACHINE.env.length - 8];
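The folding can happen in the code emitter itself.  A minimal sketch,
where envRef and emitEnvCopy are hypothetical helper names:

```javascript
// Emit a stack reference with the offset folded at compile time:
// lexical depth d becomes MACHINE.env.length - (d + 1), computed here
// rather than left as "- 1 - d" in the generated JavaScript.
function envRef(depth) {
  return `MACHINE.env[MACHINE.env.length - ${depth + 1}]`;
}

// Emit an assignment from one stack slot to another.
function emitEnvCopy(targetDepth, sourceDepth) {
  return `${envRef(targetDepth)} = ${envRef(sourceDepth)};`;
}
```

With this, emitEnvCopy(3, 7) produces exactly the folded line above.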


On tail calls, when we're reusing all of the arguments on the stack,
there's no need to splice, since we won't be popping anything off:

   MACHINE.env.splice(MACHINE.env.length - (MACHINE.argcount + ((10) - MACHINE.argcount)), ((10) - MACHINE.argcount));

is a no-op.
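A sketch of an emitter that applies both simplifications, assuming the
compiler sometimes knows the argument count statically and that
"reusing all of the arguments" means that count equals the slot span
(10 in the example).  emitTailCallPop is a hypothetical name.

```javascript
// Emit the statement that discards the old frame's slots on a tail call.
// `span` is the number of slots from the frame base to the top of the
// stack (the 10 above); the top argcount of them are the new arguments.
// The original start index,
//   MACHINE.env.length - (MACHINE.argcount + (span - MACHINE.argcount)),
// simplifies to MACHINE.env.length - span. When the argument count is
// known and equals span, the delete count is zero and nothing is emitted.
// Pass null for knownArgcount when it is only known at run time.
function emitTailCallPop(span, knownArgcount) {
  if (knownArgcount === span) return "";
  const count = knownArgcount === null
    ? `(${span} - MACHINE.argcount)`
    : String(span - knownArgcount);
  return `MACHINE.env.splice(MACHINE.env.length - ${span}, ${count});`;
}
```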


If a closure has a prefix, but every use of the prefix is a reference to an open-coded primitive, then we don't need to close over the prefix after all.  For example:

    (test '(begin (letrec ([f (lambda (x) (* x x))]
                           [g (lambda (x) (* x x x))])
                    (- (g (f (+ (g 3) (f 3)))) 1)))
          2176782335
          #:debug? #t)

since * and - are both open-coded, there's no need to capture the
prefix, and we can avoid some allocation.
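A sketch of the check over a hypothetical expression AST: a lambda
needs the prefix only if it references a toplevel binding in some way
other than calling an open-coded primitive.  The OPEN_CODED set and
the node shapes are assumptions.

```javascript
// Hypothetical expression AST:
//   { type: "ToplevelRef", name }          reference through the prefix
//   { type: "LocalRef", depth }, { type: "Constant", value }
//   { type: "App", operator, operands }
//   { type: "Lambda", body }
// The set of open-coded primitives is assumed for this example.
const OPEN_CODED = new Set(["+", "-", "*"]);

// Does `expr` use the prefix for anything other than calling an open-coded
// primitive in operator position? If not, a closure over `expr` can skip
// capturing the prefix.
function needsPrefix(expr) {
  switch (expr.type) {
    case "ToplevelRef":
      return true; // a first-class reference really does need the prefix
    case "App": {
      const op = expr.operator;
      const opNeedsPrefix =
        op.type === "ToplevelRef" ? !OPEN_CODED.has(op.name) : needsPrefix(op);
      return opNeedsPrefix || expr.operands.some(needsPrefix);
    }
    case "Lambda":
      return needsPrefix(expr.body);
    default:
      return false;
  }
}
```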


I can eliminate the first instruction in the pair:


    #(struct:AssignPrimOpStatement val #(struct:GetCompiledProcedureEntry))
    #(struct:GotoStatement #(struct:Label lamEntry259))


since val isn't even used here.  This is the case when we
statically know the lambda target.


    - this is done now.


I can coalesce

     (PushEnvironment 1 #f)
     (AssignPrimOpStatement (EnvLexicalReference 0 #f) (MakeCompiledProcedure 'lamEntry265 1 '(2 1) 'diff))

into a single statement.


If lambdas don't escape, then we can make their closures empty by
explicitly passing in their free variables as extra arguments.
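This is essentially lambda lifting.  A small JavaScript before/after
illustration (the function names are made up): because scale never
escapes, every call site is known, so its free variable k can be
passed as an argument instead of captured.

```javascript
// Before: `scale` is a closure that captures the free variable `k`.
function sumScaledWithClosure(xs, k) {
  const scale = (x) => x * k;
  let total = 0;
  for (const x of xs) total += scale(x);
  return total;
}

// After: `k` is passed explicitly, so the lifted function closes over
// nothing and no closure has to be built for it.
function scaleLifted(x, k) {
  return x * k;
}

function sumScaledLifted(xs, k) {
  let total = 0;
  for (const x of xs) total += scaleLifted(x, k);
  return total;
}
```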