racket/doc/release-notes/stepper/DESIGN-NOTES

[Ed. Note: Much of this refers to the Zodiac version of the stepper, which
I am busily replacing at this very moment.  jbc, 2001-12-04]

variable references: there are three kinds of variable references:
1) bound variable refs
2) unit-bound variable refs
3) top-level variable refs

You might be forgiven for some confusion: these three appear to overlap
heavily.  Here are more accurate definitions for each one:

unit-bound variable references are those which occur as the left-hand
sides of top-level definitions within a unit.

bound variable references are those which occur within the scope of a
lambda, case-lambda, let, let*, letrec, or other form which introduces a
limited lexical scope.  This includes `local', but not the unit-bound
variables mentioned above.

top-level references are the rest of the references.

One difference between top-level and bound varrefs is the way that they
are handled at runtime.  Top-level varrefs are looked up in a table; if
they are not found in this table, a runtime error is signalled.  Note
that this lookup occurs only when the varref is evaluated, not when it
is first `encountered' (e.g., in the body of a closure). One reason that
this mechanism is necessary is that a Scheme REPL permits top-level
references to variables that have not yet been defined.

Bound varrefs have a known lexical binding location, and they can be
looked up directly, rather than going through the indirection of
checking a table.  These variables may be introduced by forms like
`letrec' or `local', and they may furthermore be used before their
binding definition has been evaluated.  In this case, they have the
`<undefined>' value.  In most language levels, a reference to a variable
which contains the `<undefined>' value is an error.  In such a language
level, any variable which may have this value must be checked on every
evaluated reference.

So here's the problem: unit-bound varrefs are similar to those inside a
`local'.  Syntactically, their bindings are introduced by `define', and
their scope extends in both directions. Semantically they are similar to
bound variables, in that the interpreter can lexically fix the binding
of the variable.  In both of these regards they are similar to the
bindings in a `local'.  However, zodiac does not parse them like those
in a `local'.  Rather, it parses them as `top-level-varref's.  Why? I
forget, and I'm about to ask Matthew yet again.  Then I'll record the
answer here.

Now things get a bit more complicated.  Top-level varrefs never need to
be checked for the `<undefined>' value; before they are bound, they have
no runtime lookup location at all.  Bound varrefs and unit varrefs, on
the other hand, may contain the `<undefined>' value.  In particular,
those bound by letrec, local, and units may contain this value.  Others,
like those bound by lambda, let, and let*, will not.  For the first and
third categories (top-level varrefs, and those bound by lambda, let, or
let*), we do not need to check for the undefined value at
runtime.  Only when we are looking at a bound or unit varref which may
contain the `<undefined>' value do we need to insert a runtime check.

*******

Another topic entirely is that of sharing.  When a break occurs, the
stepper reconstructs the state of memory. However, two closures may
refer to the same binding. For instance,

(define-values (setter getter)
  (let ([a '*undefined*])
    (values
     (lambda (x) (set! a x))
     (lambda () a))))

If each closure is linked to a record of the form (lambda ()
values-of-free-vars), there's no way to tell whether the first and
second closure refer to the same binding of a or not.  So in this case,
we must devise some other technique to detect sharing.  A simple one
suggested by Matthew is to store mutators in the closure record; then,
sharing can be detected by the old bang-one-and-see-if-the-other-changes
technique.
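
A minimal sketch of the bang-one-and-see-if-the-other-changes test, in
present-day Racket.  The binding-record struct and the three example
records are made up for illustration; the stepper's actual closure
records are not shown in these notes.

```racket
#lang racket/base

;; Hypothetical closure record: a getter and a mutator for one free variable.
(struct binding-record (getter mutator))

;; rec-a1 and rec-a2 close over the same binding of a; rec-b closes over b.
(define-values (rec-a1 rec-a2 rec-b)
  (let ([a 'initial] [b 'initial])
    (values
     (binding-record (lambda () a) (lambda (v) (set! a v)))
     (binding-record (lambda () a) (lambda (v) (set! a v)))
     (binding-record (lambda () b) (lambda (v) (set! b v))))))

;; Bang one binding with a fresh value and see whether the other one changes.
(define (same-binding? r1 r2)
  (define saved ((binding-record-getter r1)))
  (define probe (gensym 'probe))
  ((binding-record-mutator r1) probe)
  (define shared? (eq? ((binding-record-getter r2)) probe))
  ((binding-record-mutator r1) saved) ; restore the original value
  shared?)
```

Comparing values alone would not work here, since a and b hold the same
value; the probe is a fresh uninterned symbol so that it cannot collide
with anything already stored.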

*********

A note about source locations: I'm using the "start" locations of sexps
(assigned by Zodiac) to uniquely identify those expressions: I don't
believe there are any instances where two expressions share a start
location.

Later: this is now obsolete: I'm just storing the parsed zodiac
expressions.  Forget all of this source correlation crap. Zodiac does it
for me.

[Ed. Note: this observation turned out to be completely wrong. cf. later
notes.]

*********

Robby has a good point: Matthew's technique for detecting gaps in the
continuation-mark chain (look for applications whose arguments are fully
evaluated but are still on the list of current marks) depends on the
assumption that every "jump site" has the jump as its tail action.  In
other words, what about things like "invoke-unit/open", which jumps to
some code, evaluates it, >then comes back and binds unit values in the
environment<.  In this case, the "invoke-unit/open" continuation will
not be handed directly to the evaluation of the unit, because work
remains to be done after the evaluation of the unit's definitions.
Therefore, it will be impossible to tell when un-annotated code is
appearing on the stack in uses of "invoke-unit/open."  Problem.

*********

So what the heck does a mark contain for the stepper? it looks like this:

(lambda () (list <source-expr> <var-list>))

with

var-list = (list-of var)

and

var = (list <val> z:varref)
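
For concreteness, here is a small sketch of a mark of this shape being
installed and read back, using present-day Racket's continuation-mark
primitives.  The key stepper-key and the function add-one are
hypothetical, and a plain symbol stands in for the z:varref.

```racket
#lang racket/base

;; Hypothetical key; the notes do not say which key the stepper uses.
(define stepper-key (gensym 'stepper-key))

;; Force every stepper mark on the current continuation, innermost first.
(define (current-stepper-marks)
  (for/list ([thunk (continuation-mark-set->list
                     (current-continuation-marks) stepper-key)])
    (thunk)))

(define (add-one x)
  (with-continuation-mark stepper-key
    ;; The mark: a thunk producing (list <source-expr> <var-list>).
    (lambda () (list '(+ 1 x) (list (list x 'x))))
    ;; The call below is an argument to car, hence not in tail position,
    ;; so the mark installed above is still on the continuation.
    (car (current-stepper-marks))))
```

Wrapping the data in a thunk means the list is only built if somebody
actually inspects the mark.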

*********

Let me say a few words here about the overall structure of the
annotator/stepper combination. We have a choice when rebuilding the
source: we can follow the source itself, or we can follow the parsed
expression emitted by zodiac.  If our task is simply to spit out source
code, then it's clear that we should simply follow the source.  However,
we need to replace certain variables with the values of their bindings
(in particular, lambda-bound ones). Well, in beginner mode anyway...

*******

Okay, I'm about to extend the stepper significantly, and I want to do at
least a little bit of design work first.  The concept is this: I want
the stepper to stop _after_ each reduction, as well as before it.  One
principal difference between the new and old step types is that in the
new one, the continuation cannot be rectified entirely based upon the
continuation marks; the value that is produced by the expression in
question is also needed.

Here's a question: can I prove, for the setup I put together, that the
part of the continuation _outside_ the highlighted region does not
change? This should be the case; after all, the continuation itself does
not change.

Of course, there are some reductions which do not immediately produce a
value; procedure applications, and ... uh oh, what about cond and if
expressions?  We want the stepper to use the appropriate "answer" as the
"result" of the step.  So there's some context sensitivity here.

Wait, maybe not. It seems like _every_ expression will have to have a
"stop on entry" step. Further, these types of steps will _not_ have
values associated with them. Hmmm....

Okay, this isn't that hard.  Yes, it's true that every expression that
becomes ... no, it's not obvious that the expression which is
substituted ... jesus, it's not even always the case that a
"substitution" occurs in the simplistic sense I'm imagining.  Damn, I
wish my reduction semantics were finished.

(Much later): The real issue is that the "stop-on-enter" code is
inserted based on the surrounding code, and


So, here's the next macro we need to handle: define-struct.


*********

Don't forget a test like

(cond [blah]
      [else cond [blah] [blah]])


**********

Okay, I'm a complete moron.  In particular, I threw out all of the
source correlation code a week ago because I somehow convinced myself
that the parsed expressions retained references to the read expressions.
That's not true; all that's kept is a "location" structure, which
records the file and offset and all that jazz.

So I tried to fix that by inserting these source expressions into the
marks, along with the parsed expressions.  This doesn't work because I
need to find the read expressions for expressions that don't get
marks...  or do I?  Yes, I do.  In particular, to unparse (define a 3),
I need to see the read expression to know that it wasn't really
(define-values (a) (values 3)).

Maybe I can add a field to zodiac structures a la maybe-undefined?

************

That worked great!

************

Man, there's a lot of shared code in here.

************

Okay, back to the drawing board on a lot of things.

1) Matthias and Robby are of the opinion that the break for an expression
should be triggered only when that expression becomes the redex.  For
example, the breakpoint for an if expression is triggered _after_ the
test expression is evaluated.

2) I've realized that I need a more general approach in the annotater to
handle binding constructs other than lambda.  In particular, the new
scheme handles top-level variables differently than lexically bound ones.
In particular, the mark for an expression contains the value of a
top-level variable if (1) the variable occurs free in the expression, and
(2) the expression is on the spine of the current procedure or definition.
Lexically bound variables are placed in the mark if (1) they occur free in
the expression, and (2) they are in tail position relative to the innermost
binding expression for the variable.

*** Wait, no.  This is crap, because the bodies of lambdas need to store
all free variables, regardless of whether they're lexically tail w.r.t.
the binding occurrence. Maybe it really would just be easier to do this
in two passes.  How would this work?  One pass would attach the free
variables to each expression.  Then, the variables you must store in the
mark for an expression are those which (1) occur free and (2) are not
contained in some lexically enclosing expression. I guess we can use the
register-client ability of zodiac for this...

We're helped out in the lexical variables by the fact that zodiac
renames all lexically bound variables, so no two bindings have the same
name. Of course, that's not the case for the special variables inserted
by the annotator.  Most of these ... well, no, all of these will have to
appear in marks now.  The question is whether they'll ever fight with
each other.  In the case of applications, I'm okay, because the only
expressions which appear in tail ... wait, wait, the only problem that I
could have here arises when top-level variables have the same names as
lexically bound ones, and since all of the special ones are lexically
bound, this is fine.


************

I'm taking these comments out of the program file.  They just clutter
things up.

           ; make-debug-info takes a list of variables and an expression and
           ; creates a thunk closed over the expression and (if bindings-needed is true)
           ; the following information for each variable in kept-vars:
           ; 1) the name of the variable (could actually be inferred)
           ; 2) the value of the variable
           ; 3) a mutator for the variable, if it appears in mutated-vars.
           ; (The reason for the third of these is actually that it can be used
           ;  in the stepper to determine which bindings refer to the same location,
           ;  as per Matthew's suggestion.)
           ;
           ; as an optimization:
           ; note that the mutators are needed only for the bindings which appear in
           ; closures; no location ambiguity can occur in the 'currently-live' bindings,
           ; since at most one location can exist for any given stack binding.  That is,
           ; using the source, I can tell whether variables referenced directly in the
           ; continuation chain refer to the same location.

           ; okay, things have changed a bit.  For this iteration, I'm simply not going to
           ; store mutators.  later, I'll add them in.


************

Okay, I'm back to the one-pass scheme, and here's how it's going to
work.  Top-level variables are handled differently from lexically bound
ones.  Annotate/inner takes an expression to annotate, and a list of
variables whose bindings the current expression is in tail position to.
This list may optionally also hold the symbol 'all, which indicates that
all variables which occur free should be placed in the mark.
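
A rough sketch of the selection rule just described.  The name
vars-for-mark is invented here, and variables are represented as plain
symbols rather than zodiac structures.

```racket
#lang racket/base

;; free-vars:  variables occurring free in the expression being annotated.
;; tail-bound: variables whose bindings the expression is in tail position
;;             to; may also contain the symbol 'all.
(define (vars-for-mark free-vars tail-bound)
  (if (memq 'all tail-bound)
      free-vars
      (filter (lambda (v) (memq v tail-bound)) free-vars)))
```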


***********

Regarding the question: what the heck is this lexically-bound-vars
argument to annotate-source-expr?  The answer is that if we're
displaying a lambda, we do not have values for the variables whose
bindings are the arguments to the lambda.  For instance, suppose we
have:

(define my-top-level 13)

(define my-closure
  (lambda (x) (x my-top-level)))

When we're displaying my-closure, we better not try to find a value for
x when reconstructing the body, as there isn't one.

*************

This may come back to haunt me: the temporary variables I'm introducing
for applications and 'if's are funny: they have no bindings.  They have
no orig-name's.  They _must_ be expanded, always. This may be a problem
when I stop displaying the values of lambda-bound variables.

***************

currently on the stack:

yank all of that 'comes-from-blah' crap if read->raw works.

*************

annotater philosophy: don't look at the source; just expand based on the
parsed expression.  The information you need to reconstruct the

*************

for savings, I could elide the guard-marks on all but the top level.

***********

months later; October 99.

major reorganization, along a model-view-controller philosophy. Here's
how it works:

The view and controller (for the regular stepper) are combined in a gui
unit.  This unit takes a text%, handles all gui stuff, and invokes the
model unit (one for each stepping).

The model unit is a compound unit.  It consists of the annotater, the
reconstructor, and the model unit itself.

Gee whiz; there's so much stuff I haven't talked about.  Like for
instance the fact that the stepper now has before and after steps.  The
point of this reorganization is to permit a natural test suite.  Jesus,
that's been a long time coming. At some point, I'm also hoping to
combine the stepper into the main DrScheme frame.

Oh yes, another major change was that evaluation is now strictly on a
one-expression-at-a-time basis.  The read, parse, and step are now done
individually for each expression.  This has the ancillary benefit that
there's no longer any need to reconstruct _all_ of the old expressions
at every step.

************

You know, I should never have started that ******** divider.  I have no
idea how many stars are supposed to be there.  Oh well.

************

The version for DrS-101 is out, and I've restructured the stepper into a
"model/view/controller" architecture, primarily to ease testing.  Of
course, I haven't actually written the tester yet. So now, the view and
controller are combined in stepper-view-controller.ss, and the model
(instantiated once per step-process) is in stepper-model.ss.  In fact,
the view-controller is also instantiated once per step-process, so I'm
not utilizing the division in that way, but the tester will definitely
want to instantiate the model repeatedly.

***********

I also want to comment a little bit on some severe ugliness regarding
pretty-printing.  The real problem is how to use the existing
pretty-print code, while still having enough control to highlight in the
right locations.

Okay, let me explain this one step at a time.

The way the pretty-printer currently works is this: there are four hooks
into the pretty-printing process.  The first one is used to determine
the width of an element.  The result of this procedure is used to decide
whether a line break is necessary.  However, this hook is _also_ used to
determine whether or not the pretty-printer will try to print the string
itself or hand off responsibility to the display-handler hook.  In other
words, if the width-hook procedure returns a non-false value, then the
display-handler will be called to print the actual string.  The other
pair of hook procedures is first, a procedure which is called _before_
display of any subexpression, and one which is called _after_ display of
any subexpression.

So how does the stepper use this to do its work?  Well, the stepper has
two tricky tasks to accomplish.  First, it must highlight some
subexpression.  Second, it must manually insert elements (i.e. images)
which the pretty-printer does not handle.

Let's talk about images first.  In order to display images, the
width-hook procedure detects images and (if one is encountered) returns
a width explicitly. (Currently that width is always one, which can lead
to display errors, but let's leave that for later.)  Remember, whenever
the width returned by this hook is non-false, the display handler will
be called to insert the object.  That's perfect: the display handler
inserts the image just fine.

One down, one to go.

The stepper needs to detect the beginning of the (let's call it the)
redex.  The obvious way to do this is (almost) the right way: the
before-printing handler checks to see whether the element about to be
printed is the redex (by an eq?-test).  If so, it sets the beginning of
the highlight region.  A corresponding test determines the end of the
highlight region.  When the pretty-printing is complete, we highlight
the desired region.  Fine.

BUT, sometimes we want to highlight things like numbers and symbols; in
other words, non-heap values.  For instance, suppose I tell you that the
expression that we're printing is (if #t #t #t) and that you're supposed
to be highlighting the #t.  Well, I can't tell which of the #t's you
want to highlight.  So this isn't enough information.

To solve this problem, the result of the reconstructor is split up into
two pieces: the reconstructed stuff outside the box, with a special
gensym occurring where the redex should be, and a separate expression
containing the redex.  Now at least the displayer has enough information
to do its job.

Now, what happens is that when the width-hook runs into the special
gensym, it knows that it must insert the redex.  Well, that's fine, but
remember, if this procedure wants to take control of the printing
process, it must do so by returning the width of the printed object, and
then this object must be printed by the display-hook.  The problem here
is that neither of these procedures has the faintest idea about
line-breaks; that's the pretty-printer's job.  In other words, this
solution only works for things (like numbers, symbols and booleans)
which cannot be split across lines. What do we do?

Well, the solution is ugly.  Remember, the only reason we had to resort
to this baroque solution in the first place is that values like numbers,
symbols, and booleans couldn't be identified uniquely by eq?.  So we
take a two-pronged approach.  For non-confusable values, we insert them
in place of the gensym before doing the printing.  For confusable
values, we leave the placeholder in and take control of the printing
process manually.

In other words, the _only_ reason this solution works is because of the
chance overlap between confusable values and non-breakable values.  To
be more precise, it just so happens that all confusable values are
non-line-breakable.

Lucky.

And Ugly.
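
The two-pronged approach can be sketched roughly as follows.  This
works on plain s-expressions, and the names confusable?, substitute,
and prepare-for-printing are invented for the sketch; it only covers
the step before printing, not the width-hook or display-hook side.

```racket
#lang racket/base

;; Values for which two separate occurrences may be eq? to each other.
;; These also happen to be values that never break across lines.
(define (confusable? v)
  (or (number? v) (symbol? v) (boolean? v)))

;; Replace the placeholder with the redex wherever it occurs.
(define (substitute context placeholder redex)
  (cond
    [(eq? context placeholder) redex]
    [(pair? context)
     (cons (substitute (car context) placeholder redex)
           (substitute (cdr context) placeholder redex))]
    [else context]))

;; Non-confusable redexes go back into the context, so the before- and
;; after-printing hooks can find them by eq?.  Confusable ones stay out,
;; and the placeholder is left for the width-hook to catch.
(define (prepare-for-printing context placeholder redex)
  (if (confusable? redex)
      context
      (substitute context placeholder redex)))
```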

*****

January, 2000

I'm working on the debugger, now, and in particular extending the
annotater to handle all of the Zodiac forms. Let and Letrec turn out to
be quite ugly.  I'm still a little unsure about certain aspects of
variable references, like for example whether or not they stay renamed,
or whether they return to their original names. [ed. note: they get new
uninterned symbols that print like their original names]

But that's not what I'm here to talk about.  No, the topic of the day is
'floating variables.'  A floating variable is one whose value must be
captured in a continuation mark even though it doesn't occur free in the
expression that the wcm wraps.  Let me give an example:

(unit/sig some-sig^
  (import)

  (define a 13)
  (define b (wcm <must grab a> (+ 3 4))))

In this case, the continuation-mark must hold the value of a, even
though a does not occur free in the rhs of b's definition.  Floating
variables are stored in a parameter of annotate/inner.  In other words,
they propagate downward.  Furthermore, they're subject to the same
potential elision as all other variables; you only need to store the
ones which are also contained in the set tail-bound.  Also note that
(thank God) Zodiac standardizes names apart, so we don't need to worry
about duplications.  Also note that floating variables may only be
bound-varrefs.

********

Okay, well that doesn't work at all; dynamic scope blows it away
completely.  For instance, imagine the following unit:

(unit/sig some-sig^
  (import sig-that-includes-c^)

  (define a 13)
  (define b (c)))

Now, during the execution of c, there's no mark on the stack which holds
the bindings of a. DUH! I can't believe I didn't think of this before.
Okay, one possible solution for this would be to use _different keys_
for the marks, so that a mark on the unit-evaluation-continuation could
be retained.


*********


Okay, time to do units.  Compound units are dead easy.  Just wrap them
in a wcm that captures all free vars.  No problemo.  Normal units are
more tricky, because of their scoping rules.  Here's my canonical
translation:

(unit
  (import vars)
  (export vars)

  (define a a-exp)

  b

  (define c c-exp)

  d

  etc.)

... goes to ...

(unit
  (import vars)
  (export vars)

  (wcm blah ; including imported vars
    (begin
      (set! a a-exp)
      b
      (set! c c-exp)
      d))

   (define a a)
   (define c c)
   ...)

************

Well, I still haven't written the code to annotate units, so it's a damn
good thing I wrote down the transformation.  I'm here today (thank you
very much) to talk about annotation schemes.

I just (okay, a month ago --- it's now 2000-05-23) folded aries into the
stepper. the upshot of this is that aries now supports two different
annotation modes: "cheap-wrap," which is what aries used to do, and the
regular annotation, used for the algebraic stepper.

However, I'm beginning to see a need for a third annotation, to be used
for (non-algebraic) debugging.  In particular, much of the bulk
involved in annotating the program source is due to the strict algebraic
nature of the stepper.  For instance, I'm now annotating lets.  The
actual step taken by the let is after the evaluation of all bindings.
So we need a break there.  However, the body expression is _also_ going
to have a mark and a break around it, for the "result-break" of the let.
I thought I could leave out the outer break, but it doesn't work.
Actually, maybe I could leave out the inner one.  Gee whiz.  This stuff
is really complicated.

*************

Okay, well, I figured all that stuff out, but now I've got to
restructure the reconstructor to handle lifting---PRE-lifting, that
is---on let/letrec/local.  In particular, the reconstruct-inner function
will now return four things: the free bindings, the reconstructed expr,
the "before" definitions, and the "after" definitions.  These before and
after definitions are wrapped around the current set of generated
definitions.  Case in point; I'm about to execute the (+ 7 8) in the
following expression:

(let ([a 4]
      [b (let ([h 3]
               [i (+ 7 8)]
               [j 9])
            (+ h i j))]
      [c 19])
   (+ a b c))

How do we reconstruct this?  Well, first we reconstruct the (+ 7 8) itself, that's
easy.  Then, we encounter a let.  The return value of this will be the _before_
expressions:
(define ~h~0 3)
the _after_ expressions:
(define ~i~0 (+ 7 8))
(define ~j~0 9)
and the reconstructed expression:
(+ ~h~0 ~i~0 ~j~0)

Now, we recur, using the reconstructed expression.  The next step outward is _also_
a let, so we get the following before expressions:
(define ~a~0 4)
the following after expressions:
(define ~b~0 (+ ~h~0 ~i~0 ~j~0))  <---here is where the reconstructed expr appears
(define ~c~0 19)
and the reconstructed expression:
(+ ~a~0 ~b~0 ~c~0)

So then, the final assembly occurs when the "before" expressions are
slapped together, last first, then the "after" expressions, first first,
and then whatever reconstructed expression is left over.

Ugh.
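
A sketch of just the final assembly step, applied to the example
above.  The representation is an assumption made for the sketch: each
level is a two-element list of "before" and "after" definitions, and
the levels are listed innermost first, in the order they are
encountered.

```racket
#lang racket/base

;; levels: list of (list befores afters), innermost let first.
;; "before" definitions go last first, then "after" definitions first first,
;; then whatever reconstructed expression is left over.
(define (assemble levels final-expr)
  (append (apply append (map car (reverse levels)))
          (apply append (map cadr levels))
          (list final-expr)))

;; The two lets from the example, innermost first.
(define example-levels
  (list (list '((define ~h~0 3))
              '((define ~i~0 (+ 7 8)) (define ~j~0 9)))
        (list '((define ~a~0 4))
              '((define ~b~0 (+ ~h~0 ~i~0 ~j~0)) (define ~c~0 19)))))

(define example-result
  (assemble example-levels '(+ ~a~0 ~b~0 ~c~0)))
```

With these inputs the definitions come out in source order: a, h, i,
j, b, c, followed by the body (+ ~a~0 ~b~0 ~c~0).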

***********

Wow. more complications.  Here's the new problem.  Let's say I have an
expression like this:

(define (make-thunk)
  (let ([lexical-binding 14]
        [returned-thunk (lambda () lexical-binding)])
     returned-thunk))

(define first-thunk (make-thunk))
(define second-thunk (make-thunk))

(first-thunk)

Now, when I'm just inside the body of first-thunk, and trying to
reconstruct "lexical-binding", I need to know what lifted name it got.

There are a bunch of ways to try to do this, but I'm going to take the
most straightforward approach (which came to me after about a day of
thought), which is to expand every lexical binding into a pair of
bindings; one which refers to the bound value (with the same name as the
original binding), and a new, gensym'ed one, which indicates what index
number this binding has received.

2000-06-05

***********

So here's the new format of a full mark:

(make-mark label source bindings)

where label is a symbol, source is a zodiac:parsed, and bindings is an
association list from bindings to values.  Note, however, that every
let-type binding now has _two_ entries in this list.  The first one
supplies the binding's value, and the second one supplies the lifted
name's index.

[ed note.: see note for 2000-09-26]

2000-06-06

***********

How do we guarantee that lifted names do not clash? Well, for
each binding we use the original name, with two numbers appended
to it, separated by zeros; the first one indicates which binding
it is (more than one binding may have the same original name),
and the second one indicates which dynamic occurrence of this
binding it is.

So, for instance, if a program contains one binding named 'foo', and
it's evaluated three times, the third evaluation would result in the
lifted name 'foo0002'.  I personally guarantee that no namespace clashes
can occur in this scheme. Yep.

2000-06-06

***********

Oh.. Well, Matthias prefers a naming scheme whereby all bindings are
assigned sequential numbers, regardless of the binding name. So this
name clash isn't really an issue anymore.

2000-09-09

***********

To handle units, marks must now contain "top-level" (actually,
unit-bound) variables.  For this reason, the datatype for a full mark
must change.  a full mark is now:

(make-full-mark location label bindings)

where location is a zodiac:location,
      label    is a symbol,
  and bindings is an association list containing <binding>s and values

a <binding> is either a zodiac:binding (for bound vars), or a slot (for
unit-bound vars in the zodiac:top-level-varref/bind/unit struct).

***********

Ooookay.  We're in Boston now, and I'm rewriting the stepper completely
to work with version 200.  In other words, we're scrapping Zodiac
completely.  This is an interesting SE task, because from a data-driven
design standpoint, the code is starting from zero again; all of my data
have different shapes now.

Another change is that with the demise of DrScheme Jr and the
institution of the static-compilation module mechanism, there's no
longer a need for two separate collections.  I've therefore scrapped
stepper-graphical, and moved everything back into stepper.

Also, the stepper no longer needs to be tightly integrated with DrScheme
itself; it can now be simply a tool.  I've already done the front-end
work to tie in to the new tool interface; I think this stuff is all
done.

So, here's the plan.  The major pain is in the annotater, and that's
what I'm tackling now.  I'm proceeding along an iterative refinement
path; first, I want to get a bare-bones annotation working, without any
macro-reversal (hence source-correlation) stuff.

Bindings.  What's a binding?  It looks to me like the syntax object
representing the binding occurrence of the variable should serve
admirably as a 'binding' for our purposes.

***********

I'm dumping the tracking of the 'never-undefined property.  It was
originally used for two purposes; first, varrefs had to be wrapped with
an undefined check.  Second, varrefs in ankle- and cheap-wrap were not
wrapped if the variables were known never to be undefined. Now, the
undefined check is (at last) inserted by the language's elaborator, so
the first use is obsolete.  The second one is more or less obsolete as
well, because I'm not sure that cheap- or ankle-wrap are ever going to
be used again.

Also, the 'lambda-bound-var' property is going away; in v200, I don't
see a good way to get from a bound variable to its binding, which makes
it more or less impossible to keep track of things by attaching
properties to bindings.  In fact, it doesn't really even make sense to
try and find the binding for an occurrence in v200, because it's not
even known.  Instead, I've just added another recursion argument called
'let-bound-variables', which is basically what the property was anyway.

2002-01-08

***********

Why, for the love of God, do we need to put a wcm around a quote? I can
see how we need one if there's a pre-break there, but otherwise, it
seems totally useless.

Ditto for quote-syntax

2002-01-08

[Later Note: This is preposterous.  Of course I need a wcm there, to
replace an existing one if necessary.  Maybe if it's in non-tail
position...]

***********

Here's a nice optimization I'm not taking advantage of: the application
of all lambda-bound vars doesn't need all those temp vars.  OTOH, this
won't help much with beginner/intermediate, because you never have a
lexical var in the application position.  I suppose you can generalize
this to say that you only need arg-temps for things that are not
lambda-bound vars.  Well, maybe some other day...

2002-01-08

***********

Okay, as much as I hate to admit it, reconstruct is not just getting a
face lift; it's being largely rewritten.  The major change is this: I'm
going to delay macro unwinding until the end.  Toward this end, the
recon (formerly "rectify") procedures will produce syntax objects with
attached properties that record the macro expansions and the primary
origin of the form.  After all reconstruction is done, we go through
again and look for things that need to be rewritten.  This will separate
the macro unwinding from the basic reconstruction of the expression.
Hopefully, at the end we can just use (syntax-object->datum) to discard
all of the side information.

Please, let this work. Yikes.

2002-01-12

*******

There's a problem with the reconstruction of let-values, which only
surfaces in the presence of multiple-values.  This is okay for now,
because beginner and intermediate do not allow multiple values.  The
problem is that if you allow expressions like this --- (let-values ([()
(values)]) 3) --- that is, where there can be an empty set of variables
in a lhs position, you may not be able to tell at runtime what
expression you're in the middle of.  The problem is that when we stop
during the evaluation of a rhs in a let, we figure out which rhs we're
evaluating by which lhs-vars have been changed from their original
values.  Oh, dang.  This is totally broken for letrec's in which the rhs
evaluates to the undefined value.

Well, I guess I'm going to have to fix this the right way, by adding a
counter to every let which is incremented explicitly after the
evaluation of each rhs. Yikes.

********

Ha! Did I actually say "right way?" This is totally the _wrong_ way;
keeping information about the continuation by mutating the store is
guaranteed to fail when continuations are invoked.

2002-06-21

*********

Well, another year has passed.  How swiftly they fly!  Nathan is almost
walking, Alex is almost three, and I'm about to graduate.  But I'd
better get the Intermediate stepper working first.

A note about lifting; I keep looking for the right idiom in which to
code the search for the highlight. In fact, the real problem is my
inability to cleanly express the location of the highlight.  The
representation I've settled on as the least egregious is this: a
location in a syntax
object is expressed as a list of context records, where each one
contains an index indicating the location of the subterm.  This index
makes coding the search less pleasant than it might otherwise be; right
now, I'm searching by constructing a list of subterms paired with
indices, and then iterating through these.
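
A rough sketch of the idea, simplified to plain s-expressions and to
bare indices in place of full context records.  The names `find-path'
and `subterm-at' are hypothetical.

```racket
#lang racket

;; Search for the first subterm satisfying pred.  Returns the list of
;; indices leading to it ('() for the term itself), or #f if there is
;; no such subterm.  Subterms are paired with their indices by
;; iterating over both in parallel.
(define (find-path expr pred)
  (cond
    [(pred expr) '()]
    [(list? expr)
     (for/or ([sub (in-list expr)]
              [i (in-naturals)])
       (let ([rest-of-path (find-path sub pred)])
         (and rest-of-path (cons i rest-of-path))))]
    [else #f]))

;; Follow a path of indices back down to the subterm it names.
(define (subterm-at expr path)
  (for/fold ([e expr]) ([i (in-list path)])
    (list-ref e i)))
```

For example, searching '(+ 1 (* 2 3)) for the subterm 3 gives the path
'(2 2), and following that path from the root gives back 3.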

2003-07-13

**********

Intermediate stepper now working.  I developed a much better way of
specifying the highlight: the reconstruct engine now delivers a syntax
object to the display engine, which allows me to use syntax properties.
Much much better.

2004-01-15

************

A year and a half has passed since I've thought about this file, and I'm
now in the midst of a Google Summer of Code (SoC) grant which is
supposed to get me to support mutation, and make the corresponding
changes to the interface.

A thought I had while walking the California mountainsides (BTW: I've
just graduated, and gotten a job at Cal Poly)--why do I do the
reconstruction from the inside out?  Wouldn't it be much much easier to
do from the outside in?  Feh.

2005-08-02


*************

Well, the dang summer is almost over, and I've still got a long, long
way to go.

The basic change to the model is that instead of storing completed
definitions as pre-formatted s-expressions, I'm now storing them as
2-element lists containing the syntax object associated with the
definition and a 'getter' which returns the value that the binding
refers to.  The actual definition is reformatted for each step.  This is
a bit silly, but it would be easy to cache the definitions along with
the present values if this is actually a performance bottleneck. I
suspect it won't matter a bit.
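
A small sketch of what such an entry could look like, assuming a
module-level variable `a' and a hypothetical `render-definition'
helper; the real model's data and formatting code are more involved.

```racket
#lang racket

(define a 3)

;; A mutation, so that the stored definition can go stale.
(define (bump-a!) (set! a (add1 a)))

;; Each completed definition is a 2-element list: the syntax object
;; of the definition and a getter for the binding's present value.
(define finished-definitions
  (list (list #'(define a 3) (lambda () a))))

;; Reformat one entry for the current step: keep the name from the
;; syntax object, but ask the getter for the value.
(define (render-definition entry)
  (define stx (first entry))
  (define getter (second entry))
  (define name (second (syntax->datum stx)))
  `(define ,name ,(getter)))
```

Rendering the entry before the mutation gives (define a 3); after
(bump-a!) the same entry renders as (define a 4), with no change to
the stored list.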

In the presence of mutation, the existing separators don't make sense,
either.  I'm scrapping them, for the moment.  A nice interface change
would be to separate only the definitions that had changed. For the
moment, they'll all be separated.

The first order of business, after mucking around in the model for some
time to get the flavor of how things will work, is to go and set up the
interface so I can get things running.

*************

Okay, I've "completed" the Google project, but there are still things to
wrap up.  Right now I'm working on the highlighting for mutated
bindings, which is inferred from differences in the rendered steps.

So, for instance, if the left-hand-side has (define a 3), and the
right-hand-side has (define a 4), well then we'd better highlight the 3
on the left and the 4 on the right, because this binding was mutated.

Now, this kind of highlighting--reconstructing highlighting from
observed differences, rather than obtaining direct evidence of the
mutation--clearly has some shortcomings.  For instance, concurrent
code... well, concurrent code is all messed up to begin with; a more
interesting problem occurs when you have mutations that share structure.
So, what if a is mutated from (list 3 4) to (list 4 5)?  Should the
whole thing be highlighted?  Certainly that's what you'd get from a
normal reduction semantics.  In some sense, though, highlighting _just_
the 3 and the 4 (and the 4 and the 5) corresponds to a smaller set of
changes that produces the same result.
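
One way to picture the finer-grained choice, as a sketch over plain
s-expressions: report the smallest differing subterms, each as a path
of indices.  The `diff-paths' procedure is hypothetical and is not
the stepper's actual algorithm.

```racket
#lang racket

;; Return the paths (lists of indices) of the smallest subterms at
;; which two rendered terms differ.  Lists of equal length are
;; compared element by element; anything else that differs is
;; reported as a whole, with the empty path.
(define (diff-paths before after)
  (cond
    [(equal? before after) '()]
    [(and (list? before) (list? after)
          (= (length before) (length after)))
     (append*
      (for/list ([b (in-list before)]
                 [a (in-list after)]
                 [i (in-naturals)])
        (map (lambda (path) (cons i path))
             (diff-paths b a))))]
    [else '(())]))
```

Comparing (define a (list 3 4)) with (define a (list 4 5)) this way
yields the two paths '(2 1) and '(2 2), that is, just the two numbers,
where a whole-value comparison would mark all of position 2.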

Another problem that's coming out is the problem of "intermediate"
completed expressions that arise from partially evaluated letrecs (and
all the things that expand into them).  These should also be scanned for
mutation, right?  What about the "future" ones?  There are other
rendering problems with forward mutation in letrecs which I haven't
tackled, as well.  I find myself leaning toward depending on the
"user-source" syntax property.  As I've observed before, though, the
syntax properties form a sort of creeping mush; they don't need to be
explicitly expressed as arguments or return values, and errors of
omission in the syntax properties are hard to catch. A lot can hide in
the "syntax?" contract.

2005-09-21

**************

Time to clean up for v300.  Let's see if we can get begin and begin0
working.

2005-11-14

*************

Okay, it turns out that begin expands into a let-values with empty
bindings, so I'm working on getting this going.  With this addition, the
annotation for 'let' is a complete monster, chewing up a substantial
fraction of the annotation code all by itself.

Also, I've come across a design optimization that improves if & set!,
which is this: there's no reason to have if-temp & set!-temp.  Putting
these inline is a great improvement: it reduces code in the
reconstructor, in the annotator, all over the place.  The caveat: I
haven't finished it yet, so who knows what kind of horrible thing will
crop up.

The architecture change here is that we need a new kind of break that's
like a normal-break (blecch! terrible name!) but carries a value along
with it.  I'm going to call this the normal-break/value break.  Blurrch!

2006-01-12

*************

Begin STILL isn't working.  My last plan, to wrap each 'begin' body with
a mark that indicates the source, is naturally broken because an inner
mark can destroy that one.  The solution is (hopefully) easy; just
eta-expand to prevent the mark from being lost.  Since it's a non-tail
call, this doesn't destroy tail-calling.
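
The effect can be seen with ordinary continuation marks.  This is
only a sketch: the key 'source and the procedure names are made up
for illustration, and the stepper's real marks carry much more.

```racket
#lang racket

;; List the 'source marks on the current continuation, most recent
;; first.
(define (current-sources)
  (continuation-mark-set->list (current-continuation-marks) 'source))

;; Stands in for an annotated subexpression that installs its own
;; mark.
(define (inner)
  (with-continuation-mark 'source 'inner
    (current-sources)))

;; The body is in tail position, so the inner mark lands on the same
;; frame and replaces the outer one.
(define (unprotected)
  (with-continuation-mark 'source 'begin-body
    (inner)))

;; Wrapping the body so that (inner) is a non-tail call gives it a
;; frame of its own, and the outer mark survives.
(define (protected)
  (with-continuation-mark 'source 'begin-body
    (let ([v (inner)]) v)))
```

Here (unprotected) reports only '(inner), while (protected) reports
'(inner begin-body).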

**************

Okay, begin now works.  I decided that one of my implicit
invariants--that the source expressions linked to by the marks always be
actual parts of the source--was too restrictive.  I've now introduced a
"fake-exp" which signals an artificially constructed expression.  This
made all my problems go away, and now I'm a happy man.  If only begin0
worked, too...

2006-11-13

**************

Yeah, I think begin0 works now, too... supporting check- forms turned
out to be MUCH harder than I expected.  Don't ask me about lazy scheme.
Or Advanced.  Grr!

2008-05-08

**************

Okay, you can ask me about lazy scheme now. Err, lazy racket. The answer
is simple: Stephen Chang is working on it :). I'm back in here trying to
figure out why I changed to stepper-syntax-property, and I think the
answer is that Back In The Day, it wasn't possible to extract all of the
syntax properties associated with the stepper.  Hmm... actually, maybe I
should leave this alone; do I want programs to be able to mangle their
own stepper marks? No, probably not.

Other changes on the GUI side: the stepper now eagerly precomputes
steps, rather than doing a complex back-and-forth locking; this is less
code, and works better, too.

Nice to be here again...

2010-12-04