[Ed. Note: Much of this refers to the Zodiac version of the stepper, which
I am busily replacing at this very moment. jbc, 2001-12-04]

variable references: there are three kinds of variable references:
1) bound variable refs
2) unit-bound variable refs
3) top-level variable refs

You might be forgiven for some confusion: these three appear to overlap
heavily. Here are more accurate definitions for each one:

unit-bound variable references are those which occur as the left-hand
sides of top-level definitions within a unit.

bound variable references are those which occur within the scope of a
lambda, case-lambda, let, let*, letrec, or other form which introduces a
limited lexical scope. This includes `local', but not the unit-bound
variables mentioned above.

top-level references are the rest of the references.

One difference between top-level and bound varrefs is the way that they
are handled at runtime. Top-level varrefs are looked up in a table; if
they are not found in this table, a runtime error is signalled. Note
that this lookup occurs only when the varref is evaluated, not when it
is first `encountered' (e.g., in the body of a closure). One reason that
this mechanism is necessary is that a Scheme REPL permits top-level
references to variables that have not yet been defined.

Bound varrefs have a known lexical binding location, and they can be
looked up directly, rather than going through the indirection of
checking a table. These variables may be introduced by forms like
`letrec' or `local', and they may furthermore be used before their
binding definition has been evaluated. In this case, they have the
`<undefined>' value. In most language levels, a reference to a variable
which contains the `<undefined>' value is an error. In such a language
level, any variable which may have this value must be checked on every
evaluated reference.

So here's the problem: unit-bound varrefs are similar to those inside a
`local'. Syntactically, their bindings are introduced by `define', and
their scope extends in both directions. Semantically they are similar to
bound variables, in that the interpreter can lexically fix the binding
of the variable. In both of these regards they are similar to the
bindings in a `local'. However, zodiac does not parse them like those
in a `local'. Rather, it parses them as `top-level-varref's. Why? I
forget, and I'm about to ask Matthew yet again. Then I'll record the
answer here.

Now things get a bit more complicated. Top-level varrefs never need to
be checked for the `<undefined>' value; before they are bound, they have
no runtime lookup location at all. Bound varrefs and unit varrefs, on
the other hand, may contain the `<undefined>' value. In particular,
those bound by letrec, local, and units may contain this value. Others,
like those bound by lambda, let, and let*, will not. For the first and
third categories (top-level varrefs, and those bound by lambda, let, or
let*), we do not need to check for the undefined value at runtime. Only
when we are looking at a bound or unit varref which may contain the
`<undefined>' value do we need to insert a runtime check.

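
The case analysis above boils down to a small predicate. What follows is
only a sketch in present-day Racket, not the stepper's code: the
binder-kind symbols and the name needs-undefined-check? are made up for
illustration.

```racket
#lang racket

;; Hypothetical classification of a varref by the form that binds it.
;;   'top-level          : looked up in the table; an unbound name is a
;;                         runtime error, never <undefined>
;;   'lambda 'let 'let*  : always initialized before the body runs
;;   'letrec 'local 'unit: may be referenced before the definition runs
(define (needs-undefined-check? binder)
  (and (memq binder '(letrec local unit)) #t))
```

Only the three binder kinds in the list would get a runtime check wrapped
around each evaluated reference.
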
*******

Another topic entirely is that of sharing. When a break occurs, the
stepper reconstructs the state of memory. However, two closures may
refer to the same binding. For instance,

(define-values (setter getter)
  (let ([a '*undefined*])
    (values
     (lambda (x) (set! a x))
     (lambda () a))))

If each closure is linked to a record of the form (lambda ()
values-of-free-vars), there's no way to tell whether the first and
second closure refer to the same binding of a or not. So in this case,
we must devise some other technique to detect sharing. A simple one
suggested by Matthew is to store mutators in the closure record; then,
sharing can be detected by the old bang-one-and-see-if-the-other-changes
technique.

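
Here is a rough sketch of the bang-one-and-see-if-the-other-changes
test, in present-day Racket. The access struct and same-binding? are
hypothetical names; the real closure records look different.

```racket
#lang racket

;; A hypothetical closure-record entry: a way to read and a way to write
;; one free variable of a closure.
(struct access (getter setter))

;; Bang one and see if the other changes: write a fresh sentinel through
;; `a`, look for it through `b`, then restore the old value.
(define (same-binding? a b)
  (define saved ((access-getter a)))
  (define sentinel (gensym 'probe))
  ((access-setter a) sentinel)
  (define shared? (eq? ((access-getter b)) sentinel))
  ((access-setter a) saved)
  shared?)

;; Two records over the same `a`, as in the setter/getter example.
(define (make-pair)
  (let ([a '*undefined*])
    (values (access (lambda () a) (lambda (x) (set! a x)))
            (access (lambda () a) (lambda (x) (set! a x))))))

(define-values (p q) (make-pair))
(define-values (r s) (make-pair))
(same-binding? p q) ; two views of one binding
(same-binding? p r) ; bindings from separate calls
```

The sentinel is a fresh uninterned symbol, so it cannot be mistaken for
a value the program already stored.
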
*********

A note about source locations: I'm using the "start" locations of sexps
(assigned by Zodiac) to uniquely identify those expressions: I don't
believe there are any instances where two expressions share a start
location.

Later: this is now obsolete: I'm just storing the parsed zodiac
expressions. Forget all of this source correlation crap. Zodiac does it
for me.

[Ed. Note: this observation turned out to be completely wrong. cf. later
notes.]

*********

Robby has a good point: Matthew's technique for detecting gaps in the
continuation-mark chain (look for applications whose arguments are fully
evaluated but are still on the list of current marks) depends on the
assumption that every "jump site" has the jump as its tail action. In
other words, what about things like "invoke-unit/open", which jumps to
some code, evaluates it, >then comes back and binds unit values in the
environment<. In this case, the "invoke-unit/open" continuation will
not be handed directly to the evaluation of the unit, because work
remains to be done after the evaluation of the unit's definitions.
Therefore, it will be impossible to tell when un-annotated code is
appearing on the stack in uses of "invoke-unit/open." Problem.

*********

So what the heck does a mark contain for the stepper? It looks like this:

(lambda () (list <source-expr> <var-list>))

with

var-list = (list-of var)

and

var = (list <val> z:varref)

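
To make that shape concrete, here is a hand-annotated toy in present-day
Racket. It is a sketch under assumptions: stepper-key, annotated-add1 and
on-break are made-up names, and a plain symbol stands in for the
z:varref.

```racket
#lang racket

;; An arbitrary key, private to this sketch.
(define stepper-key (gensym 'stepper-mark))

;; Force every mark thunk on the current continuation, innermost first.
(define (current-stepper-marks)
  (for/list ([thunk (in-list (continuation-mark-set->list
                              (current-continuation-marks)
                              stepper-key))])
    (thunk)))

;; Hand-annotated version of (+ x 1): the mark is a thunk that yields the
;; source expression and a var-list of (value name) entries.
(define (annotated-add1 x on-break)
  (with-continuation-mark stepper-key
    (lambda () (list '(+ x 1) (list (list x 'x))))
    (begin
      (on-break (current-stepper-marks))
      (+ x 1))))
```

Because the mark is a thunk, nothing is consed up unless a break
actually asks for the marks.
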
*********

Let me say a few words here about the overall structure of the
annotator/stepper combination. We have a choice when rebuilding the
source: we can follow the source itself, or we can follow the parsed
expression emitted by zodiac. If our task is simply to spit out source
code, then it's clear that we should simply follow the source. However,
we need to replace certain variables with the values of their bindings
(in particular, lambda-bound ones). Well, in beginner mode anyway...

*******

Okay, I'm about to extend the stepper significantly, and I want to do at
least a little bit of design work first. The concept is this: I want
the stepper to stop _after_ each reduction, as well as before it. One
principal difference between the new and old step types is that in the
new one, the continuation cannot be rectified entirely based upon the
continuation marks; the value that is produced by the expression in
question is also needed.

Here's a question: can I prove, for the setup I put together, that the
part of the continuation _outside_ the highlighted region does not
change? This should be the case; after all, the continuation itself does
not change.

Of course, there are some reductions which do not immediately produce a
value; procedure applications, and ... uh oh, what about cond and if
expressions? We want the stepper to use the appropriate "answer" as the
"result" of the step. So there's some context sensitivity here.

Wait, maybe not. It seems like _every_ expression will have to have a
"stop on entry" step. Further, these types of steps will _not_ have
values associated with them. Hmmm....

Okay, this isn't that hard. Yes, it's true that every expression that
becomes ... no, it's not obvious that the expression which is
substituted ... jesus, it's not even always the case that a
"substitution" occurs in the simplistic sense I'm imagining. Damn, I
wish my reduction semantics were finished.

(Much later): The real issue is that the "stop-on-enter" code is
inserted based on the surrounding code, and


So, here's the next macro we need to handle: define-struct.

*********

Don't forget a test like

(cond [blah]
      [else cond [blah] [blah]])

**********

Okay, I'm a complete moron. In particular, I threw out all of the
source correlation code a week ago because I somehow convinced myself
that the parsed expressions retained references to the read expressions.
That's not true; all that's kept is a "location" structure, which
records the file and offset and all that jazz.

So I tried to fix that by inserting these source expressions into the
marks, along with the parsed expressions. This doesn't work because I
need to find the read expressions for expressions that don't get
marks... or do I? Yes, I do. In particular, to unparse (define a 3),
I need to see the read expression to know that it wasn't really
(define-values (a) (values 3)).

Maybe I can add a field to zodiac structures a la maybe-undefined?

************

That worked great!

************

Man, there's a lot of shared code in here.

************

Okay, back to the drawing board on a lot of things.

1) Matthias and Robby are of the opinion that the break for an expression
should be triggered only when that expression becomes the redex. For
example, the breakpoint for an if expression is triggered _after_ the
test expression is evaluated.

2) I've realized that I need a more general approach in the annotator to
handle binding constructs other than lambda. In particular, the new
scheme handles top-level variables differently than lexically bound ones.
Specifically, the mark for an expression contains the value of a
top-level variable if (1) the variable occurs free in the expression, and
(2) the expression is on the spine of the current procedure or definition.
Lexically bound variables are placed in the mark if (1) they occur free in
the expression, and (2) they are in tail position relative to the innermost
binding expression for the variable.

*** Wait, no. This is crap, because the bodies of lambdas need to store
all free variables, regardless of whether they're lexically tail w.r.t.
the binding occurrence. Maybe it really would just be easier to do this
in two passes. How would this work? One pass would attach the free
variables to each expression. Then, the variables you must store in the
mark for an expression are those which (1) occur free and (2) are not
contained in some lexically enclosing expression. I guess we can use the
register-client ability of zodiac for this...

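
The first of those two passes might look roughly like this. It is a
sketch in present-day Racket over a toy s-expression language
(variables, lambda, application, literals), not over zodiac's parsed
structures, and free-vars is a made-up name.

```racket
#lang racket

;; First pass of the two-pass idea: compute the set of variables that
;; occur free in an expression.
(define (free-vars expr)
  (match expr
    [(? symbol? x) (set x)]
    [`(lambda (,params ...) ,body)
     (set-subtract (free-vars body) (list->set params))]
    [`(,rator ,rands ...)
     (apply set-union (free-vars rator) (map free-vars rands))]
    [_ (set)]))
```

The second pass would then filter each expression's set against what the
enclosing expressions already store; that part is left out here.
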
We're helped out in the lexical variables by the fact that zodiac
renames all lexically bound variables, so no two bindings have the same
name. Of course, that's not the case for the special variables inserted
by the annotator. Most of these ... well, no, all of these will have to
appear in marks now. The question is whether they'll ever fight with
each other. In the case of applications, I'm okay, because the only
expressions which appear in tail ... wait, wait, the only problem that I
could have here arises when top-level variables have the same names as
lexically bound ones, and since all of the special ones are lexically
bound, this is fine.

************

I'm taking these comments out of the program file. They just clutter
things up.

; make-debug-info takes a list of variables and an expression and
; creates a thunk closed over the expression and (if bindings-needed is true)
; the following information for each variable in kept-vars:
; 1) the name of the variable (could actually be inferred)
; 2) the value of the variable
; 3) a mutator for the variable, if it appears in mutated-vars.
; (The reason for the third of these is actually that it can be used
; in the stepper to determine which bindings refer to the same location,
; as per Matthew's suggestion.)
;
; as an optimization:
; note that the mutators are needed only for the bindings which appear in
; closures; no location ambiguity can occur in the 'currently-live' bindings,
; since at most one location can exist for any given stack binding. That is,
; using the source, I can tell whether variables referenced directly in the
; continuation chain refer to the same location.

; okay, things have changed a bit. For this iteration, I'm simply not going to
; store mutators. Later, I'll add them in.

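
For the record, here is roughly what such a procedure could generate,
sketched in present-day Racket. This is not the original
make-debug-info: it works on plain s-expressions, always keeps the
bindings, and omits mutators, as in the iteration described above.

```racket
#lang racket

;; Build the source text of a mark thunk. `kept-vars` is a list of
;; variable names and `source` is the expression being annotated. Each
;; variable contributes a (value name) entry.
(define (make-debug-info kept-vars source)
  `(lambda ()
     (list ',source
           (list ,@(for/list ([v (in-list kept-vars)])
                     `(list ,v ',v))))))

(make-debug-info '(a b) '(+ a b))
```

The generated thunk mentions each variable once unquoted (its value at
the time the thunk runs) and once quoted (its name).
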
************

Okay, I'm back to the one-pass scheme, and here's how it's going to
work. Top-level variables are handled differently from lexically bound
ones. Annotate/inner takes an expression to annotate, and a list of
variables whose bindings the current expression is in tail position to.
This list may optionally also hold the symbol 'all, which indicates that
all variables which occur free should be placed in the mark.

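
The selection rule can be written down in a few lines. This is a sketch
in present-day Racket; vars-for-mark is a hypothetical helper, not a
function in the annotator.

```racket
#lang racket

;; `free` is the list of variables occurring free in the expression.
;; `tail-bound` lists the variables whose bindings the expression is in
;; tail position to, and may also contain the symbol 'all.
(define (vars-for-mark free tail-bound)
  (if (memq 'all tail-bound)
      free
      (filter (lambda (v) (memq v tail-bound)) free)))
```

Note that a program variable actually named all would be
indistinguishable from the flag in this encoding.
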
***********

Regarding the question: what the heck is this lexically-bound-vars
argument to annotate-source-expr? The answer is that if we're
displaying a lambda, we do not have values for the variables whose
bindings are the arguments to the lambda. For instance, suppose we
have:

(define my-top-level 13)

(define my-closure
  (lambda (x) (x my-top-level)))

When we're displaying my-closure, we'd better not try to find a value for
x when reconstructing the body, as there isn't one.

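
A toy version of that rule, in present-day Racket. It is only a sketch:
recon and its env hash are made-up stand-ins for the real
reconstruction code and the mark's bindings.

```racket
#lang racket

;; Replace variables by their values from `env` (a hash from names to
;; values), except for names in `lexically-bound`, which have no value.
(define (recon expr env lexically-bound)
  (match expr
    [(? symbol? x)
     (if (and (hash-has-key? env x) (not (memq x lexically-bound)))
         (hash-ref env x)
         x)]
    [`(lambda (,params ...) ,body)
     `(lambda ,params ,(recon body env (append params lexically-bound)))]
    [(list parts ...)
     (for/list ([p (in-list parts)]) (recon p env lexically-bound))]
    [_ expr]))

(recon '(lambda (x) (x my-top-level))
       (hash 'my-top-level 13 'x 'stale)
       '())
```

Even though the environment happens to contain an x, the lambda's own x
is left alone.
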
*************

This may come back to haunt me: the temporary variables I'm introducing
for applications and 'if's are funny: they have no bindings. They have
no orig-name's. They _must_ be expanded, always. This may be a problem
when I stop displaying the values of lambda-bound variables.

***************

currently on the stack:

yank all of that 'comes-from-blah' crap if read->raw works.

*************

annotator philosophy: don't look at the source; just expand based on the
parsed expression. The information you need to reconstruct the

*************

for savings, I could elide the guard-marks on all but the top level.

***********

months later; October 99.

major reorganization, along a model-view-controller philosophy. Here's
how it works:

The view and controller (for the regular stepper) are combined in a gui
unit. This unit takes a text%, handles all gui stuff, and invokes the
model unit (one for each stepping).

The model unit is a compound unit. It consists of the annotator, the
reconstructor, and the model unit itself.

Gee whiz; there's so much stuff I haven't talked about. Like for
instance the fact that the stepper now has before and after steps. The
point of this reorganization is to permit a natural test suite. Jesus,
that's been a long time coming. At some point, I'm also hoping to
combine the stepper into the main DrScheme frame.

Oh yes, another major change was that evaluation is now strictly on a
one-expression-at-a-time basis. The read, parse, and step are now done
individually for each expression. This has the ancillary benefit that
there's no longer any need to reconstruct _all_ of the old expressions
at every step.

************

You know, I should never have started that ******** divider. I have no
idea how many stars are supposed to be there. Oh well.

************

The version for DrS-101 is out, and I've restructured the stepper into a
"model/view/controller" architecture, primarily to ease testing. Of
course, I haven't actually written the tester yet. So now, the view and
controller are combined in stepper-view-controller.ss, and the model
(instantiated once per step-process) is in stepper-model.ss. In fact,
the view-controller is also instantiated once per step-process, so I'm
not utilizing the division in that way, but the tester will definitely
want to instantiate the model repeatedly.

***********

I also want to comment a little bit on some severe ugliness regarding
pretty-printing. The real problem is how to use the existing
pretty-print code, while still having enough control to highlight in the
right locations.

Okay, let me explain this one step at a time.

The way the pretty-printer currently works is this: there are four hooks
into the pretty-printing process. The first one is used to determine
the width of an element. The result of this procedure is used to decide
whether a line break is necessary. However, this hook is _also_ used to
determine whether or not the pretty-printer will try to print the string
itself or hand off responsibility to the display-handler hook. In other
words, if the width-hook procedure returns a non-false value, then the
display-handler will be called to print the actual string. The other
pair of hook procedures consists of one which is called _before_
display of any subexpression, and one which is called _after_ display of
any subexpression.

So how does the stepper use this to do its work? Well, the stepper has
two tricky tasks to accomplish. First, it must highlight some
subexpression. Second, it must manually insert elements (i.e. images)
which the pretty-printer does not handle.

Let's talk about images first. In order to display images, the
width-hook procedure detects images and (if one is encountered) returns
a width explicitly. (Currently that width is always one, which can lead
to display errors, but let's leave that for later.) Remember, whenever
the width returned by this hook is non-false, the display handler will
be called to insert the object. That's perfect: the display handler
inserts the image just fine.

One down, one to go.

The stepper needs to detect the beginning of the (let's call it the)
redex. The obvious way to do this is (almost) the right way: the
before-printing handler checks to see whether the element about to be
printed is the redex (by an eq?-test). If so, it sets the beginning of
the highlight region. A corresponding test determines the end of the
highlight region. When the pretty-printing is complete, we highlight
the desired region. Fine.

BUT, sometimes we want to highlight things like numbers and symbols; in
other words, non-heap values. For instance, suppose I tell you that the
expression that we're printing is (if #t #t #t) and that you're supposed
to be highlighting the #t. Well, I can't tell which of the #t's you
want to highlight. So this isn't enough information.

To solve this problem, the result of the reconstructor is split up into
two pieces: the reconstructed stuff outside the box, with a special
gensym occurring where the redex should be, and a separate expression
containing the redex. Now at least the displayer has enough information
to do its job.

Now, what happens is that when the width-hook runs into the special
gensym, it knows that it must insert the redex. Well, that's fine, but
remember, if this procedure wants to take control of the printing
process, it must do so by returning the width of the printed object, and
then this object must be printed by the display-hook. The problem here
is that neither of these procedures has the faintest idea about
line-breaks; that's the pretty-printer's job. In other words, this
solution only works for things (like numbers, symbols and booleans)
which cannot be split across lines. What do we do?

Well, the solution is ugly. Remember, the only reason we had to resort
to this baroque solution in the first place is that values like numbers,
symbols, and booleans couldn't be identified uniquely by eq?. So we
take a two-pronged approach. For non-confusable values, we insert them
in place of the gensym before doing the printing. For confusable
values, we leave the placeholder in and take control of the printing
process manually.

In other words, the _only_ reason this solution works is because of the
chance overlap between confusable values and non-breakable values. To
be more precise, it just so happens that all confusable values are
non-line-breakable.

Lucky.

And Ugly.

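
The two-pronged decision can be sketched without touching the
pretty-printer at all. This is present-day Racket with made-up names
(confusable?, plug, prepare-for-printing); the actual hook wiring is
deliberately left out.

```racket
#lang racket

;; Values that eq? cannot tell apart from other occurrences of themselves.
(define (confusable? v)
  (or (number? v) (symbol? v) (boolean? v)))

;; Put `redex` wherever the placeholder `hole` occurs in `context`.
(define (plug context hole redex)
  (cond [(eq? context hole) redex]
        [(pair? context) (cons (plug (car context) hole redex)
                               (plug (cdr context) hole redex))]
        [else context]))

;; Returns the tree to hand to the pretty-printer, plus a flag saying
;; whether the placeholder is still in it (so the hooks must print the
;; redex themselves).
(define (prepare-for-printing context hole redex)
  (if (confusable? redex)
      (values context #t)
      (values (plug context hole redex) #f)))
```

In the plugged case the redex keeps its identity, so the before-printing
hook can still find it with an eq? test.
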
*****

January, 2000

I'm working on the debugger, now, and in particular extending the
annotator to handle all of the Zodiac forms. Let and Letrec turn out to
be quite ugly. I'm still a little unsure about certain aspects of
variable references, like for example whether or not they stay renamed,
or whether they return to their original names. [ed. note: they get new
uninterned symbols that print like their original names]

But that's not what I'm here to talk about. No, the topic of the day is
'floating variables.' A floating variable is one whose value must be
captured in a continuation mark even though it doesn't occur free in the
expression that the wcm wraps. Let me give an example:

(unit/sig some-sig^
  (import)

  (define a 13)
  (define b (wcm <must grab a> (+ 3 4))))

In this case, the continuation-mark must hold the value of a, even
though a does not occur free in the rhs of b's definition. Floating
variables are stored in a parameter of annotate/inner. In other words,
they propagate downward. Furthermore, they're subject to the same
potential elision as all other variables; you only need to store the
ones which are also contained in the set tail-bound. Also note that
(thank God) Zodiac standardizes names apart, so we don't need to worry
about duplications. Also note that floating variables may only be
bound-varrefs.

********

Okay, well that doesn't work at all; dynamic scope blows it away
completely. For instance, imagine the following unit:

(unit/sig some-sig^
  (import sig-that-includes-c^)

  (define a 13)
  (define b (c)))

Now, during the execution of c, there's no mark on the stack which holds
the bindings of a. DUH! I can't believe I didn't think of this before.
Okay, one possible solution for this would be to use _different keys_
for the marks, so that a mark on the unit-evaluation-continuation could
be retained.

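
The effect of the key choice is easy to see in a small experiment. This
is a sketch in present-day Racket; the keys 'expr and 'unit and the
procedures probe, c and rhs-of-b are invented for the demonstration.

```racket
#lang racket

(define (marks-for key)
  (continuation-mark-set->list (current-continuation-marks) key))

;; Reports which marks are visible under each key.
(define (probe)
  (let ([seen (list (marks-for 'expr) (marks-for 'unit))])
    seen))

;; Stand-in for the imported, annotated procedure c: its body installs
;; its own mark, in tail position with respect to its caller's mark.
(define (c)
  (with-continuation-mark 'expr 'body-of-c
    (probe)))

;; Stand-in for the rhs of b: a mark holding a's binding, then the tail
;; call to c.
(define (rhs-of-b key)
  (with-continuation-mark key '((a 13))
    (c)))

(rhs-of-b 'expr) ; same key: c's mark replaces the one holding a
(rhs-of-b 'unit) ; different key: the mark holding a is retained
```

With a single key the binding of a is gone by the time c's body runs;
with a separate key both marks sit on the same frame.
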
*********


Okay, time to do units. Compound units are dead easy. Just wrap them
in a wcm that captures all free vars. No problemo. Normal units are
more tricky, because of their scoping rules. Here's my canonical
translation:

(unit
  (import vars)
  (export vars)

  (define a a-exp)

  b

  (define c c-exp)

  d

  etc.)

... goes to ...

(unit
  (import vars)
  (export vars)

  (wcm blah ; including imported vars
    (begin
      (set! a a-exp)
      b
      (set! c c-exp)
      d))

  (define a a)
  (define c c)
  ...)

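
As a sanity check on the translation, here it is as a function over
s-expressions, in present-day Racket. This is a sketch only:
translate-unit-body is a made-up name, it handles just the body forms,
and it reproduces the translation exactly as written above.

```racket
#lang racket

;; Rewrite the body forms of a unit following the translation above:
;; definitions become set!s inside one wcm, and one define per defined
;; name follows. `mark` stands for whatever the wcm should hold.
(define (translate-unit-body body mark)
  (define (definition? form)
    (match form
      [`(define ,_ ,_) #t]
      [_ #f]))
  (define defined (map cadr (filter definition? body)))
  `((wcm ,mark
         (begin
           ,@(for/list ([form (in-list body)])
               (match form
                 [`(define ,name ,rhs) `(set! ,name ,rhs)]
                 [_ form]))))
    ,@(for/list ([name (in-list defined)])
        `(define ,name ,name))))

(translate-unit-body '((define a a-exp) b (define c c-exp) d) 'blah)
```
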
************

Well, I still haven't written the code to annotate units, so it's a damn
good thing I wrote down the transformation. I'm here today (thank you
very much) to talk about annotation schemes.

I just (okay, a month ago --- it's now 2000-05-23) folded aries into the
stepper. The upshot of this is that aries now supports two different
annotation modes: "cheap-wrap," which is what aries used to do, and the
regular annotation, used for the algebraic stepper.

However, I'm beginning to see a need for a third annotation, to be used
for (non-algebraic) debugging. In particular, much of the bulk
involved in annotating the program source is due to the strict algebraic
nature of the stepper. For instance, I'm now annotating lets. The
actual step taken by the let is after the evaluation of all bindings.
So we need a break there. However, the body expression is _also_ going
to have a mark and a break around it, for the "result-break" of the let.
I thought I could leave out the outer break, but it doesn't work.
Actually, maybe I could leave out the inner one. Gee whiz. This stuff
is really complicated.

*************

Okay, well, I figured all that stuff out, but now I've got to
restructure the reconstructor to handle lifting---PRE-lifting, that
is---on let/letrec/local. In particular, the reconstruct-inner function
will now return four things: the free bindings, the reconstructed expr,
the "before" definitions, and the "after" definitions. These before and
after definitions are wrapped around the current set of generated
definitions. Case in point; I'm about to execute the (+ 7 8) in the
following expression:

(let ([a 4]
      [b (let ([h 3]
               [i (+ 7 8)]
               [j 9])
           (+ h i j))]
      [c 19])
  (+ a b c))

How do we reconstruct this? Well, first we reconstruct the (+ 7 8)
itself, that's easy. Then, we encounter a let. The return value of this
will be the _before_ expressions:
  (define ~h~0 3)
the _after_ expressions:
  (define ~i~0 (+ 7 8))
  (define ~j~0 9)
and the reconstructed expression:
  (+ ~h~0 ~i~0 ~j~0)

Now, we recur, using the reconstructed expression. The next step outward
is _also_ a let, so we get the following before expressions:
  (define ~a~0 4)
the following after expressions:
  (define ~b~0 (+ ~h~0 ~i~0 ~j~0)) <---here is where the reconstructed expr appears
  (define ~c~0 19)
and the reconstructed expression:
  (+ ~a~0 ~b~0 ~c~0)

So then, the final assembly occurs when the "before" expressions are
slapped together, last first, then the "after" expressions, first first,
and then whatever reconstructed expression is left over.

Ugh.

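
Spelled out for the example above, and reading "last first" as
"outermost let's befores first", the final assembly might look like
this. It is a sketch in present-day Racket; the lifted struct and
assemble are hypothetical names.

```racket
#lang racket

;; One enclosing let, as seen by the reconstructor: the definitions
;; already evaluated ("before") and those at or after the current point
;; ("after").
(struct lifted (before after))

;; `frames` runs from the innermost let outward.
(define (assemble frames final-expr)
  (append (append-map lifted-before (reverse frames)) ; outermost befores first
          (append-map lifted-after frames)            ; innermost afters first
          (list final-expr)))

(define inner
  (lifted '((define ~h~0 3))
          '((define ~i~0 (+ 7 8)) (define ~j~0 9))))
(define outer
  (lifted '((define ~a~0 4))
          '((define ~b~0 (+ ~h~0 ~i~0 ~j~0)) (define ~c~0 19))))

(assemble (list inner outer) '(+ ~a~0 ~b~0 ~c~0))
```

The result lists ~a~0 and ~h~0, then ~i~0, ~j~0, ~b~0 and ~c~0, then the
leftover expression, so every definition precedes its uses.
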
***********

Wow. More complications. Here's the new problem. Let's say I have an
expression like this:

(define (make-thunk)
  (let* ([lexical-binding 14]
         [returned-thunk (lambda () lexical-binding)])
    returned-thunk))

(define first-thunk (make-thunk))
(define second-thunk (make-thunk))

(first-thunk)

Now, when I'm just inside the body of first-thunk, and trying to
reconstruct "lexical-binding", I need to know what lifted name it got.

There are a bunch of ways to try to do this, but I'm going to take the
most straightforward approach (which came to me after about a day of
thought), which is to expand every lexical binding into a pair of
bindings; one which refers to the bound value (with the same name as the
original binding), and a new, gensym'ed one, which indicates what index
number this binding has received.

2000-06-05

***********

So here's the new format of a full mark:

(make-mark label source bindings)

where label is a symbol, source is a zodiac:parsed, and bindings is an
association list from bindings to values. Note, however, that every
let-type binding now has _two_ entries in this list. The first one
supplies the binding's value, and the second one supplies the lifted
name's index.

[ed. note: see note for 2000-09-26]

2000-06-06

***********

How do we guarantee that lifted names do not clash? Well, for
each binding we use the original name, with two numbers appended
to it, separated by zeros; the first one indicates which binding
it is (more than one binding may have the same original name),
and the second one indicates which dynamic occurrence of this
binding it is.

So, for instance, if a program contains one binding named 'foo', and
it's evaluated three times, the third evaluation would result in the
lifted name 'foo0002'. I personally guarantee that no namespace clashes
can occur in this scheme. Yep.

2000-06-06

***********

Oh... Well, Matthias prefers a naming scheme whereby all bindings are
assigned sequential numbers, regardless of the binding name. So this
name clash isn't really an issue anymore.

2000-09-09

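
A sequential scheme of that kind takes very little code. This is a
sketch in present-day Racket; make-lifter is a made-up name and the
"name_N" spelling is purely an assumption, since the notes don't say how
the number is attached.

```racket
#lang racket

;; Returns a procedure that hands out lifted names numbered by one shared
;; counter, regardless of the binding's original name.
(define (make-lifter)
  (define next 0)
  (lambda (name)
    (begin0 (string->symbol (format "~a_~a" name next))
            (set! next (add1 next)))))
```

Since the number alone is unique, two bindings with the same original
name can never receive the same lifted name.
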
***********

To handle units, marks must now contain "top-level" (actually,
unit-bound) variables. For this reason, the datatype for a full mark
must change. A full mark is now:

(make-full-mark location label bindings)

where location is a zodiac:location,
label is a symbol,
and bindings is an association list containing <binding>s and values

a <binding> is either a zodiac:binding (for bound vars), or a slot (for
unit-bound vars in the zodiac:top-level-varref/bind/unit struct).

***********

Ooookay. We're in Boston now, and I'm rewriting the stepper completely
to work with version 200. In other words, we're scrapping Zodiac
completely. This is an interesting SE task, because from a data-driven
design standpoint, the code is starting from zero again; all of my data
have different shapes now.

Another change is that with the demise of DrScheme Jr and the
institution of the static-compilation module mechanism, there's no
longer a need for two separate collections. I've therefore scrapped
stepper-graphical, and moved everything back into stepper.

Also, the stepper no longer needs to be tightly integrated with DrScheme
itself; it can now be simply a tool. I've already done the front-end
work to tie in to the new tool interface; I think this stuff is all
done.

So, here's the plan. The major pain is in the annotator, and that's
what I'm tackling now. I'm proceeding along an iterative refinement
path; first, I want to get a bare-bones annotation working, without any
macro-reversal (hence source-correlation) stuff.

Bindings. What's a binding? It looks to me like the syntax object
representing the binding occurrence of the variable should serve
admirably as a 'binding' for our purposes.

***********

I'm dumping the tracking of the 'never-undefined property. It was
originally used for two purposes; first, varrefs had to be wrapped with
an undefined check. Second, varrefs in ankle- and cheap-wrap were not
wrapped if the variables were known never to be undefined. Now, the
undefined check is (at last) inserted by the language's elaborator, so
the first use is obsolete. The second one is more or less obsolete as
well, because I'm not sure that cheap- or ankle-wrap are ever going to
be used again.

Also, the 'lambda-bound-var' property is going away; in v200, I don't
see a good way to get from a bound variable to its binding, which makes
it more or less impossible to keep track of things by attaching
properties to bindings. In fact, it doesn't really even make sense to
try and find the binding for an occurrence in v200, because it's not
even known. Instead, I've just added another recursion argument called
'let-bound-variables', which is basically what the property was anyway.

2002-01-08

***********

Why, for the love of God, do we need to put a wcm around a quote? I can
see how we need one if there's a pre-break there, but otherwise, it
seems totally useless.

Ditto for quote-syntax

2002-01-08

[Later Note: This is preposterous. Of course I need a wcm there, to
replace an existing one if necessary. Maybe if it's in non-tail
position...]

***********
|
|
|
|
Here's a nice optimization I'm not taking advantage of: the application
|
|
of all lambda-bound vars doesn't need all those temp vars. OTOH, this
|
|
won't help much with beginner/intermediate, because you never have a
|
|
lexical var in the application position. I suppose you can generalize
|
|
this to say that you only need arg-temps for things that are not
|
|
lambda-bound vars. Well, maybe some other day...
|
|
|
|
2002-01-08
|
|
|
|
***********
|
|
|
|
Okay, as much as I hate to admit it, reconstruct is not just getting a
|
|
face lift; it's being largely rewritten. The major change is this: I'm
|
|
going to delay macro unwinding until the end. Toward this end, the
|
|
recon (formerly "rectify") procedures will produce syntax objects with
|
|
attached properties that record the macro expansions and the primary
|
|
origin of the form. After all reconstruction is done, we go through
|
|
again and look for things that need to be rewritten. This will separate
|
|
the macro unwinding from the basic reconstruction of the expression.
|
|
Hopefully, at the end we can just use (syntax-object->datum) to discard
|
|
all of the side information.
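
A rough sketch of the two-pass idea, not the stepper's actual code. The
property key (origin-key), the choice of `and' as the macro, and the
shape of its expansion are all assumptions made for illustration. Note
that syntax->datum is the current name for what this entry calls
syntax-object->datum.

```racket
#lang racket/base

;; Hypothetical property key recording which macro a form came from.
(define origin-key 'stepper-origin)

;; Pass 1 (reconstruction): produce the expanded form, tagged with its
;; origin. Assumes `(and a b)` expanded to `(if a b #f)`.
(define reconstructed
  (syntax-property #'(if a b #f) origin-key 'and))

;; Pass 2 (unwinding): walk the result and rewrite tagged forms.
(define (unwind stx)
  (define parts (syntax->list stx)) ; #f for non-list syntax
  (cond
    [(and parts (eq? (syntax-property stx origin-key) 'and))
     ;; (if test then #f)  ==>  (and test then)
     (datum->syntax #f (list 'and
                             (unwind (cadr parts))
                             (unwind (caddr parts))))]
    [parts (datum->syntax #f (map unwind parts))]
    [else stx]))

;; Final step: discard all side information.
(define displayed (syntax->datum (unwind reconstructed)))
```

The point of the split is that pass 1 never needs to know how any macro
is displayed; it only has to carry the property along.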
|
|
|
|
Please, let this work. Yikes.
|
|
|
|
2002-01-12
|
|
|
|
*******
|
|
|
|
There's a problem with the reconstruction of let-values, which only
|
|
surfaces in the presence of multiple-values. This is okay for now,
|
|
because beginner and intermediate do not allow multiple values. The
|
|
problem is that if you allow expressions like this --- (let-values ([()
|
|
(values)]) 3) --- that is, where there can be an empty set of variables
|
|
in a lhs position, you may not be able to tell at runtime what
|
|
expression you're in the middle of. The problem is that when we stop
|
|
during the evaluation of a rhs in a let, we figure out which rhs we're
|
|
evaluating by which lhs-vars have been changed from their original
|
|
values. Oh, dang. This is totally broken for letrec's in which the rhs
|
|
evaluates to the undefined value.
|
|
|
|
Well, I guess I'm going to have to fix this the right way, by adding a
|
|
counter to every let which is incremented explicitly after the
|
|
evaluation of each rhs. Yikes.
|
|
|
|
********
|
|
|
|
Ha! Did I actually say "right way?" This is totally the _wrong_ way;
|
|
keeping information about the continuation by mutating the store is
|
|
guaranteed to fail when continuations are invoked.
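
To make the failure concrete, here is a small sketch with a hand-written
stand-in for an annotated let (the names counter, observed and saved-k
are invented for the example). The counter is read just after rhs 0
finishes. The first time through it reads 0, as intended. After the
continuation of rhs 0 is re-entered it reads 2, because the store was
not rewound along with the control state.

```racket
#lang racket/base

(define (demo)
  (let ([saved-k #f]    ; continuation captured inside rhs 0
        [counter 0]     ; "how many rhs's have finished" counter
        [observed '()]  ; counter values seen right after rhs 0
        [passes 0])
    (let ([a (call/cc (lambda (k) (set! saved-k k) 1))])
      ;; rhs 0 has just finished; a stepper would consult the counter
      ;; here to decide which rhs was being evaluated.
      (set! observed (cons counter observed))
      (set! counter (add1 counter))
      (let ([b 2])
        (set! counter (add1 counter))
        (set! passes (add1 passes))
        (if (= passes 1)
            (saved-k 10)           ; re-enter the continuation of rhs 0
            (reverse observed))))))
```

Calling (demo) returns the two observations in order; the second one is
wrong for exactly the reason given above.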
|
|
|
|
2002-06-21
|
|
|
|
*********
|
|
|
|
Well, another year has passed. How swiftly they fly! Nathan is almost
|
|
walking, Alex is almost three, and I'm about to graduate. But I'd
|
|
better get the Intermediate stepper working first.
|
|
|
|
A note about lifting; I keep looking for the right idiom in which to
|
|
code the search for the highlight. In fact, the real problem is my
|
|
inability to cleanly express the location of the highlight. The one
|
|
I've settled on as the least egregious is this: a location in a syntax
|
|
object is expressed as a list of context records, where each one
|
|
contains an index indicating the location of the subterm. This index
|
|
makes coding the search less pleasant than it might otherwise be; right
|
|
now, I'm searching by constructing a list of subterms paired with
|
|
indices, and then iterating through these.
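
A simplified sketch of that search over plain s-expressions rather than
syntax objects. Two assumptions: the highlighted subterm is wrapped as
(highlight <term>), and each context record is reduced to its bare
index, so a location is just a list of child indices.

```racket
#lang racket/base

;; Hypothetical marker: a highlighted subterm looks like (highlight t).
(define (highlighted? t)
  (and (pair? t) (eq? (car t) 'highlight)))

;; Return the list of child indices leading to the highlight, or #f if
;; there is none. The empty path '() means "this term itself".
(define (find-highlight term)
  (cond
    [(highlighted? term) '()]
    [(list? term)
     ;; Pair each subterm with its index and iterate through the pairs.
     (for/or ([sub (in-list term)]
              [i (in-naturals)])
       (let ([rest-path (find-highlight sub)])
         (and rest-path (cons i rest-path))))]
    [else #f]))
```

For example, in (+ 1 (* (highlight (- 3 2)) 4)) the highlight is child 1
of child 2, so the location is the list (2 1).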
|
|
|
|
2003-07-13
|
|
|
|
**********
|
|
|
|
Intermediate stepper now working. I developed a much better way of
|
|
specifying the highlight: the reconstruct engine now delivers a syntax
|
|
object to the display engine, which allows me to use syntax properties.
|
|
Much much better.
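
For illustration only, here is roughly what marking a highlight with a
syntax property can look like. The key name (highlight-key) is an
assumption; the stepper's real property name is not recorded in this
entry.

```racket
#lang racket/base

;; Hypothetical property key for the highlight.
(define highlight-key 'stepper-highlight)

;; Attach the highlight to a syntax object.
(define (mark-highlight stx)
  (syntax-property stx highlight-key #t))

;; Does this syntax object carry the highlight?
(define (highlight? stx)
  (and (syntax? stx) (syntax-property stx highlight-key) #t))

;; The reconstructor builds the term; the marked subterm keeps its
;; property when it is spliced into a larger syntax object.
(define inner (mark-highlight #'(- 3 2)))
(define whole #`(+ 1 #,inner))

;; The display side can test each subterm directly, with no indices.
(define flags (map highlight? (syntax->list whole)))
```

Compared with the index-based location, nothing has to be threaded
through the traversal: the mark travels with the subterm.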
|
|
|
|
2004-01-15
|
|
|
|
************
|
|
|
|
A year and a half has passed since I've thought about this file, and I'm
|
|
now in the midst of a Google Summer of Code (SoC) grant which is
|
|
supposed to get me to support mutation, and make the corresponding
|
|
changes to the interface.
|
|
|
|
A thought I had while walking the California mountainsides (BTW: I've
|
|
just graduated, and gotten a job at Cal Poly)--why do I do the
|
|
reconstruction from the inside out? Wouldn't it be much much easier to
|
|
do from the outside in? Feh.
|
|
|
|
2005-08-02
|
|
|
|
|
|
*************
|
|
|
|
Well, the dang summer is almost over, and I've still got a long, long
|
|
way to go.
|
|
|
|
The basic change to the model is that instead of storing completed
|
|
definitions as pre-formatted s-expressions, I'm now storing them as
|
|
2-element lists containing the syntax object associated with the
|
|
definition and a 'getter' which returns the value that the binding
|
|
refers to. The actual definition is reformatted for each step. This is
|
|
a bit silly, but it would be easy to cache the definitions along with
|
|
the present values if this is actually a performance bottleneck. I
|
|
suspect it won't matter a bit.
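
A small sketch of that representation, with invented names
(completed-def, render-def) and a single variable standing in for a
user definition. The syntax object is stored once; the getter is called
again each time a step is rendered.

```racket
#lang racket/base

;; A mutable binding standing in for a user's top-level definition.
(define a 3)

;; Completed definition: the definition's syntax plus a getter that
;; returns whatever the binding currently refers to.
(define completed-def
  (list #'(define a 3) (lambda () a)))

;; Reformat the definition for a step, using the present value.
(define (render-def def)
  (define name (cadr (syntax->datum (car def))))
  `(define ,name ,((cadr def))))

(define before (render-def completed-def))
(set! a 4)
(define after (render-def completed-def))
```

Rendering before and after the set! gives two different s-expressions
from the same stored record, which is what the pre-formatted version
could not do.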
|
|
|
|
In the presence of mutation, the existing separators don't make sense,
|
|
either. I'm scrapping them, for the moment. A nice interface change
|
|
would be to separate only the definitions that had changed. For the
|
|
moment, they'll all be separated.
|
|
|
|
The first order of business, after mucking around in the model for some
|
|
time to get the flavor of how things will work, is to go and set up the
|
|
interface so I can get things running.
|
|
|
|
*************
|
|
|
|
Okay, I've "completed" the Google project, but there are still things to
|
|
wrap up. Right now I'm working on the highlighting for mutated
|
|
bindings, which is inferred from differences in the rendered steps.
|
|
|
|
So, for instance, if the left-hand-side has (define a 3), and the
|
|
right-hand-side has (define a 4), well then we'd better highlight the 3
|
|
on the left and the 4 on the right, because this binding was mutated.
|
|
|
|
Now, this kind of highlighting--reconstructing highlighting from
|
|
observed differences, rather than obtaining direct evidence of the
|
|
mutation--clearly has some shortcomings. For instance, concurrent
|
|
code... well, concurrent code is all messed up to begin with; a more
|
|
interesting problem occurs when you have mutations that share structure.
|
|
So, what if a is mutated from (list 3 4) to (list 4 5)? Should the
|
|
whole thing be highlighted? Certainly that's what you'd get from a
|
|
normal reduction semantics. In some sense, though, highlighting _just_
|
|
the 3 and the 4 (and the 4 and the 5) corresponds to a smaller set of
|
|
changes that produces the same result.
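
The "smaller set of changes" reading can be sketched as a structural
diff over the two rendered steps. This is only an illustration of that
one option (diff-terms is an invented name), working on plain
s-expressions: when both sides are lists of the same length it descends
into them, otherwise it reports the whole subterm as changed.

```racket
#lang racket/base

;; Return a list of (old . new) pairs, one for each smallest subterm
;; at which the two renderings differ.
(define (diff-terms old new)
  (cond
    [(equal? old new) '()]
    [(and (list? old) (list? new) (= (length old) (length new)))
     (apply append (map diff-terms old new))]
    [else (list (cons old new))]))
```

On (define a 3) versus (define a 4) this yields the single pair of 3
and 4. On the (list 3 4) versus (list 4 5) example it yields two pairs,
one per element, rather than the whole list. Highlighting the whole
right-hand side instead would correspond to stopping the descent at the
definition body.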
|
|
|
|
Another problem that's coming out is the problem of "intermediate"
|
|
completed expressions that arise from partially evaluated letrecs (and
|
|
all the things that expand into them). These should also be scanned for
|
|
mutation, right? What about the "future" ones? There are other
|
|
rendering problems with forward mutation in letrecs which I haven't
|
|
tackled, as well. I find myself leaning toward depending on the
|
|
"user-source" syntax property. As I've observed before, though, the
|
|
syntax properties form a sort of creeping mush; they don't need to be
|
|
explicitly expressed as arguments or return values, and errors of
|
|
omission in the syntax properties are hard to catch. A lot can hide in
|
|
the "syntax?" contract.
|
|
|
|
2005-09-21
|
|
|
|
**************
|
|
|
|
Time to clean up for v300. Let's see if we can get begin and begin0
|
|
working.
|
|
|
|
2005-11-14
|
|
|
|
*************
|
|
|
|
Okay, it turns out that begin expands into a let-values with empty
|
|
bindings, so I'm working on getting this going. With this addition, the
|
|
annotation for 'let' is a complete monster, chewing up a substantial
|
|
fraction of the annotation code all by itself.
|
|
|
|
Also, I've come across a design optimization that improves if & set!,
|
|
which is this: there's no reason to have if-temp & set!-temp. Putting
|
|
these inline is a great improvement: it reduces code in the
|
|
reconstructor, in the annotator, all over the place. The caveat: I
|
|
haven't finished it yet, so who knows what kind of horrible thing will
|
|
crop up.
|
|
|
|
The architecture change here is that we need a new kind of break that's
|
|
like a normal-break (blecch! terrible name!) but carries a value along
|
|
with it. I'm going to call this the normal-break/value break. Blurrch!
|
|
|
|
2006-01-12
|
|
|
|
*************
|
|
|
|
Begin STILL isn't working. My last plan, to wrap each 'begin' body with
|
|
a mark that indicates the source, is naturally broken because an inner
|
|
mark can destroy that one. The solution is (hopefully) easy; just
|
|
eta-expand to prevent the mark from being lost. Since it's a non-tail
|
|
call, this doesn't destroy tail-calling.
|
|
|
|
**************
|
|
|
|
Okay, begin now works. I decided that one of my implicit
|
|
invariants--that the source expressions linked to by the marks always be
|
|
actual parts of the source--was too restrictive. I've now introduced a
|
|
"fake-exp" which signals an artificially constructed expression. This
|
|
made all my problems go away, and now I'm a happy man. If only begin0
|
|
worked, too...
|
|
|
|
2006-11-13
|
|
|
|
**************
|
|
|
|
Yeah, I think begin0 works now, too... supporting check- forms turned
|
|
out to be MUCH harder than I expected. Don't ask me about lazy scheme.
|
|
Or Advanced. Grr!
|
|
|
|
2008-05-08
|
|
|
|
**************
|
|
|
|
Okay, you can ask me about lazy scheme now. Err, lazy racket. The answer
|
|
is simple: Stephen Chang is working on it :). I'm back in here trying to
|
|
figure out why I changed to stepper-syntax-property, and I think the
|
|
answer is that Back In The Day, it wasn't possible to extract all of the
|
|
syntax properties associated with the stepper. Hmm... actually, maybe I
|
|
should leave this alone; do I want programs to be able to mangle their
|
|
own stepper marks? No, probably not.
|
|
|
|
Other changes on the GUI side: the stepper now eagerly precomputes
|
|
steps, rather than doing a complex back-and-forth locking; this is less
|
|
code, and works better, too.
|
|
|
|
Nice to be here again...
|
|
|
|
2010-12-04
|