[Ed. Note: Much of this refers to the Zodiac version of the stepper, which
I am busily replacing at this very moment. jbc, 2001-12-04]

variable references: there are three kinds of variable references:
1) bound variable refs
2) unit-bound variable refs
3) top-level variable refs

You might be forgiven for some confusion: these three appear to overlap
heavily. Here are more accurate definitions for each one:

unit-bound variable references are those which occur as the left-hand sides of
top-level definitions within a unit.

bound variable references are those which occur within the scope of a
lambda, case-lambda, let, let*, letrec, or other form which introduces a
limited lexical scope. This includes `local', but not the unit-bound
variables mentioned above.

top-level references are the rest of the references.

One difference between top-level and bound varrefs is the way that they
are handled at runtime. Top-level varrefs are looked up in a table; if
they are not found in this table, a runtime error is signalled. Note that
this lookup occurs only when the varref is evaluated, not when it is first
`encountered' (e.g., in the body of a closure). One reason that this
mechanism is necessary is that a Scheme REPL permits top-level references
to variables that have not yet been defined.

Bound varrefs have a known lexical binding location, and they can be looked
up directly, rather than going through the indirection of checking a table.
These variables may be introduced by forms like `letrec' or `local', and
they may furthermore be used before their binding definition has been
evaluated. In this case, they have the `<undefined>' value. In most
language levels, a reference to a variable which contains the `<undefined>'
value is an error. In such a language level, any variable which may have
this value must be checked on every evaluated reference.

So here's the problem: unit-bound varrefs are similar to those inside a
`local'. Syntactically, their bindings are introduced by `define', and their
scope extends in both directions. Semantically they are similar to
bound variables, in that the interpreter can lexically fix the binding of
the variable. In both of these regards they are similar to the bindings
in a `local'. However, zodiac does not parse them like those in a
`local'. Rather, it parses them as `top-level-varref's. Why? I forget,
and I'm about to ask Matthew yet again. Then I'll record the answer here.

Now things get a bit more complicated. Top-level varrefs never need to be
checked for the `<undefined>' value; before they are bound, they have no
runtime lookup location at all. Bound varrefs and unit varrefs, on the
other hand, may contain the `<undefined>' value. In particular, those
bound by letrec, local, and units may contain this value. Others, like
those bound by lambda, let, and let*, will not. For the first and third
categories (top-level varrefs, and varrefs bound by lambda, let, and let*),
we do not need to check for the undefined value at runtime.
Only when we are looking at a bound or unit varref which may contain the
`<undefined>' value do we need to insert a runtime check.

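A minimal sketch of the runtime check described above, written for modern
Racket rather than the PLT Scheme of these notes. The names
`undefined-marker' and `check-defined' are hypothetical stand-ins for the
real `<undefined>' value and for whatever check the annotater inserts. The
letrec-like scope is written out by hand so that the check is visible.

```racket
#lang racket
;; Hypothetical stand-in for the `<undefined>' value.
(define undefined-marker (string->uninterned-symbol "undefined"))

;; The check wrapped around a varref that may still be undefined.
(define (check-defined name val)
  (if (eq? val undefined-marker)
      (error name "variable used before its definition")
      val))

;; A letrec-like scope written out by hand: both variables start out
;; undefined and are then assigned in order.
(define (demo)
  (let ([a undefined-marker]
        [b undefined-marker])
    ;; b is letrec-style, so the reference to it is checked.
    (set! a (lambda () (check-defined 'b b)))
    ;; Calling a before b is assigned trips the check.
    (let ([early (with-handlers ([exn:fail? (lambda (e) 'error)])
                   (a))])
      (set! b 5)
      (list early (a)))))
```

A lambda-bound or let-bound variable would be referenced directly, with no
call to `check-defined', since it can never hold the marker.
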
*******

Another topic entirely is that of sharing. When a break occurs, the
stepper reconstructs the state of memory. However, two closures may refer
to the same binding. For instance,

(define-values (setter getter)
  (let ([a '*undefined*])
    (values
      (lambda (x) (set! a x))
      (lambda () a))))

If each closure is linked to a record of the form (lambda ()
values-of-free-vars), there's no way to tell whether the first and second
closure refer to the same binding of a or not. So in this case, we must
devise some other technique to detect sharing. A simple one suggested by
Matthew is to store mutators in the closure record; then, sharing can be
detected by the old bang-one-and-see-if-the-other-changes technique.

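The bang-one-and-see-if-the-other-changes technique can be sketched as
follows, in modern Racket. The `binding-record' struct and the name
`same-location?' are hypothetical; the notes only say that mutators are
stored in the closure record.

```racket
#lang racket
;; Hypothetical closure record entry: a getter and a mutator for one
;; free variable.
(struct binding-record (getter setter))

;; rec1 and rec2 describe the same binding of `a'; rec3 describes a
;; different variable that happens to hold an equal value.
(define-values (rec1 rec2 rec3)
  (let ([a 'first] [other 'first])
    (values (binding-record (lambda () a) (lambda (v) (set! a v)))
            (binding-record (lambda () a) (lambda (v) (set! a v)))
            (binding-record (lambda () other) (lambda (v) (set! other v))))))

;; Bang one and see if the other changes, then restore the old value.
(define (same-location? r1 r2)
  (let ([old ((binding-record-getter r1))]
        [probe (gensym 'probe)])          ; fresh value, eq? to nothing else
    ((binding-record-setter r1) probe)
    (let ([shared? (eq? ((binding-record-getter r2)) probe)])
      ((binding-record-setter r1) old)
      shared?)))
```

Comparing the stored values alone could not separate rec2 from rec3, since
all three records report 'first; only the mutation probe can.
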
*********

A note about source locations: I'm using the "start" locations of sexps
(assigned by Zodiac) to uniquely identify those expressions: I don't
believe there are any instances where two expressions share a start
location.

Later: this is now obsolete: I'm just storing the parsed zodiac
expressions. Forget all of this source correlation crap. Zodiac does it
for me.

[Ed. Note: this observation turned out to be completely wrong. cf. later
notes.]

*********

Robby has a good point: Matthew's technique for detecting gaps in the
continuation-mark chain (look for applications whose arguments are fully
evaluated but are still on the list of current marks) depends on the
assumption that every "jump site" has the jump as its tail action. In
other words, what about things like "invoke-unit/open", which jumps to some
code, evaluates it, >then comes back and binds unit values in the
environment<. In this case, the "invoke-unit/open" continuation will not
be handed directly to the evaluation of the unit, because work remains to
be done after the evaluation of the unit's definitions. Therefore, it will
be impossible to tell when un-annotated code is appearing on the stack in
uses of "invoke-unit/open." Problem.

*********

So what the heck does a mark contain for the stepper? It looks like this:

(lambda () (list <source-expr> <var-list>))

with

var-list = (list-of var)

and

var = (list <val> z:varref)

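A minimal sketch of a mark of this shape, in modern Racket. The key
`stepper-key' and the helper `current-marks' are hypothetical, and a plain
symbol stands in for the z:varref structure, which no longer exists.

```racket
#lang racket
;; Hypothetical key under which the stepper's marks are stored.
(define stepper-key (gensym 'stepper))

;; Read back every stepper mark on the current continuation and force
;; each thunk, giving (list <source-expr> <var-list>) per mark.
(define (current-marks)
  (map (lambda (thunk) (thunk))
       (continuation-mark-set->list (current-continuation-marks)
                                    stepper-key)))

(define (demo x)
  (with-continuation-mark stepper-key
    ;; the mark: a thunk closed over the source and the variable values
    (lambda () (list '(+ x 1) (list (list x 'x))))
    ;; the marks are read in non-tail position, while the mark is live
    (car (list (current-marks)))))
```

Storing a thunk rather than the list itself defers building the list until a
break actually needs it.
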
*********

Let me say a few words here about the overall structure of the
annotator/stepper combination. We have a choice when rebuilding the source:
we can follow the source itself, or we can follow the parsed expression
emitted by zodiac. If our task is simply to spit out source code, then
it's clear that we should simply follow the source. However, we need to
replace certain variables with the values of their bindings (in
particular, lambda-bound ones). Well, in beginner mode anyway...

*******

Okay, I'm about to extend the stepper significantly, and I want to do at
least a little bit of design work first. The concept is this: I want the
stepper to stop _after_ each reduction, as well as before it. One principal
difference between the new and old step types is that in the new one,
the continuation cannot be rectified entirely based upon the continuation
marks; the value that is produced by the expression in question is also
needed.

Here's a question: can I prove, for the setup I put together, that the part
of the continuation _outside_ the highlighted region does not change? This
should be the case; after all, the continuation itself does not change.

Of course, there are some reductions which do not immediately produce a value;
procedure applications, and ... uh oh, what about cond and if expressions?
We want the stepper to use the appropriate "answer" as the "result" of
the step. So there's some context sensitivity here.

Wait, maybe not. It seems like _every_ expression will have to have a "stop
on entry" step. Further, these types of steps will _not_ have values associated
with them. Hmmm....

Okay, this isn't that hard. Yes, it's true that every expression that becomes
... no, it's not obvious that the expression which is substituted ... jesus,
it's not even always the case that a "substitution" occurs in the simplistic
sense I'm imagining. Damn, I wish my reduction semantics were finished.

(Much later): The real issue is that the "stop-on-enter" code is inserted based
on the surrounding code, and


So, here's the next macro we need to handle: define-struct.


*********

Don't forget a test like

(cond [blah]
      [else cond [blah] [blah]])

**********

Okay, I'm a complete moron. In particular, I threw out all of the source
correlation code a week ago because I somehow convinced myself that the
parsed expressions retained references to the read expressions. That's
not true; all that's kept is a "location" structure, which records the file
and offset and all that jazz.

So I tried to fix that by inserting these source expressions into the
marks, along with the parsed expressions. This doesn't work because I
need to find the read expressions for expressions that don't get marks...
or do I? Yes, I do. In particular, to unparse (define a 3), I need to see
the read expression to know that it wasn't really (define-values (a)
(values 3)).

Maybe I can add a field to zodiac structures a la maybe-undefined?

************

That worked great!

************

Man, there's a lot of shared code in here.

************

Okay, back to the drawing board on a lot of things.

1) Matthias and Robby are of the opinion that the break for an expression
should be triggered only when that expression becomes the redex. For
example, the breakpoint for an if expression is triggered _after_ the test
expression is evaluated.

2) I've realized that I need a more general approach in the annotater to
handle binding constructs other than lambda. In particular, the new
scheme handles top-level variables differently than lexically bound ones.
In particular, the mark for an expression contains the value of a
top-level variable if (1) the variable occurs free in the expression, and
(2) the expression is on the spine of the current procedure or definition.
Lexically bound variables are placed in the mark if (1) they occur free in
the expression, and (2) they are in tail position relative to the innermost
binding expression for the variable.

*** Wait, no. This is crap, because the bodies of lambdas need to store
all free variables, regardless of whether they're lexically tail w.r.t.
the binding occurrence. Maybe it really would just be easier to do this in
two passes. How would this work? One pass would attach the free variables
to each expression. Then, the variables you must store in the mark for an
expression are those which (1) occur free and (2) are not contained in
some lexically enclosing expression. I guess we can use the
register-client ability of zodiac for this...

We're helped out in the lexical variables by the fact that zodiac renames
all lexically bound variables, so no two bindings have the same name. Of
course, that's not the case for the special variables inserted by the
annotator. Most of these ... well, no, all of these will have to appear
in marks now. The question is whether they'll ever fight with each other.
In the case of applications, I'm okay, because the only expressions which
appear in tail ... wait, wait, the only problem that I could have here
arises when top-level variables have the same names as lexically bound
ones, and since all of the special ones are lexically bound, this is fine.

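The two-pass idea floated above can be sketched roughly as follows, in
modern Racket over plain s-expressions rather than Zodiac structures. Both
function names are hypothetical, the toy language has only constants,
variables, single-body lambda, and application, and `vars-for-mark' is one
reading of condition (2): skip what an enclosing mark already stores.

```racket
#lang racket
;; Pass one: the free variables of an expression in the toy language.
(define (free-vars expr)
  (match expr
    [(? symbol? x) (list x)]
    [`(lambda (,params ...) ,body) (remove* params (free-vars body))]
    [`(,subexprs ...) (remove-duplicates (append-map free-vars subexprs))]
    [_ '()]))

;; Pass two (one interpretation): a mark stores the variables that occur
;; free in its expression and are not already stored by an enclosing mark.
(define (vars-for-mark expr enclosing-stored)
  (remove* enclosing-stored (free-vars expr)))
```

For example, (free-vars '(lambda (x) (f x y))) leaves out x because the
lambda binds it, but keeps f and y.
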
************

I'm taking these comments out of the program file. They just clutter
things up.

; make-debug-info takes a list of variables and an expression and
; creates a thunk closed over the expression and (if bindings-needed is true)
; the following information for each variable in kept-vars:
; 1) the name of the variable (could actually be inferred)
; 2) the value of the variable
; 3) a mutator for the variable, if it appears in mutated-vars.
; (The reason for the third of these is actually that it can be used
; in the stepper to determine which bindings refer to the same location,
; as per Matthew's suggestion.)
;
; as an optimization:
; note that the mutators are needed only for the bindings which appear in
; closures; no location ambiguity can occur in the 'currently-live' bindings,
; since at most one location can exist for any given stack binding. That is,
; using the source, I can tell whether variables referenced directly in the
; continuation chain refer to the same location.

; okay, things have changed a bit. For this iteration, I'm simply not going to
; store mutators. later, I'll add them in.


************

Okay, I'm back to the one-pass scheme, and here's how it's going to work.
Top-level variables are handled differently from lexically bound ones.
Annotate/inner takes an expression to annotate, and a list of variables whose
bindings the current expression is in tail position to. This list may
optionally also hold the symbol 'all, which indicates that all variables
which occur free should be placed in the mark.

***********

Regarding the question: what the heck is this lexically-bound-vars argument
to annotate-source-expr? The answer is that if we're displaying a lambda,
we do not have values for the variables whose bindings are the arguments
to the lambda. For instance, suppose we have:

(define my-top-level 13)

(define my-closure
  (lambda (x) (x my-top-level)))

When we're displaying my-closure, we better not try to find a value for x
when reconstructing the body, as there isn't one.

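A minimal sketch of why the argument is needed, in modern Racket over plain
s-expressions. The function `recon' and its environment table are
hypothetical simplifications, not the real annotate-source-expr.

```racket
#lang racket
;; Replace each variable by its known value, except variables bound by an
;; enclosing lambda: no value exists for them when a closure is displayed.
(define (recon expr env lexically-bound-vars)
  (cond
    [(symbol? expr)
     (if (memq expr lexically-bound-vars)
         expr                                ; lambda-bound: leave the name
         (hash-ref env expr expr))]          ; unknown names stay as they are
    [(and (pair? expr) (eq? (car expr) 'lambda))
     (let ([params (cadr expr)])
       `(lambda ,params
          ,@(map (lambda (e)
                   (recon e env (append params lexically-bound-vars)))
                 (cddr expr))))]
    [(pair? expr)
     (map (lambda (e) (recon e env lexically-bound-vars)) expr)]
    [else expr]))
```

Even when the table holds a stale entry for x, the lambda-bound x in the
body is left alone while my-top-level is replaced by its value.
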
*************

This may come back to haunt me: the temporary variables I'm introducing for
applications and 'if's are funny: they have no bindings. They have no
orig-name's. They _must_ be expanded, always. This may be a problem when
I stop displaying the values of lambda-bound variables.

***************

currently on the stack:

yank all of that 'comes-from-blah' crap if read->raw works.

*************

annotater philosophy: don't look at the source; just expand based on the
parsed expression. The information you need to reconstruct the

*************

for savings, I could elide the guard-marks on all but the top level.

***********

months later; October 99.

major reorganization, along a model-view-controller philosophy. Here's how it
works:

The view and controller (for the regular stepper) are combined in a gui unit.
This unit takes a text%, handles all gui stuff, and invokes the model unit
(one for each stepping).

The model unit is a compound unit. It consists of the annotater, the
reconstructor, and the model unit itself.

Gee whiz; there's so much stuff I haven't talked about. Like for instance the
fact that the stepper now has before and after steps. The point of this
reorganization is to permit a natural test suite. Jesus, that's been a long
time coming. At some point, I'm also hoping to combine the stepper into the
main DrScheme frame.

Oh yes, another major change was that evaluation is now strictly on a
one-expression-at-a-time basis. The read, parse, and step are now done
individually for each expression. This has the ancillary benefit that there's
no longer any need to reconstruct _all_ of the old expressions at every step.

************

You know, I should never have started that ******** divider. I have no idea how
many stars are supposed to be there. Oh well.

************

The version for DrS-101 is out, and I've restructured the stepper into a
"model/view/controller" architecture, primarily to ease testing. Of course,
I haven't actually written the tester yet. So now, the view and controller are
combined in stepper-view-controller.ss, and the model (instantiated once per
step-process) is in stepper-model.ss. In fact, the view-controller is also
instantiated once per step-process, so I'm not utilizing the division in that
way, but the tester will definitely want to instantiate the model repeatedly.

***********

I also want to comment a little bit on some severe ugliness regarding
pretty-printing. The real problem is how to use the existing pretty-print
code, while still having enough control to highlight in the right locations.

Okay, let me explain this one step at a time.

The way the pretty-printer currently works is this: there are four hooks into
the pretty-printing process. The first one is used to determine the width of
an element. The result of this procedure is used to decide whether a line
break is necessary. However, this hook is _also_ used to determine whether
or not the pretty-printer will try to print the string itself or hand off
responsibility to the display-handler hook. In other words, if the
width-hook procedure returns a non-false value, then the display-handler
will be called to print the actual string. The other pair of hook
procedures is first, a procedure which is called _before_ display of any
subexpression, and one which is called _after_ display of any subexpression.

So how does the stepper use this to do its work? Well, the stepper has two
tricky tasks to accomplish. First, it must highlight some subexpression.
Second, it must manually insert elements (i.e. images) which the pretty-printer
does not handle.

Let's talk about images first. In order to display images, the width-hook
procedure detects images and (if one is encountered) returns a width
explicitly. (Currently that width is always one, which can lead to display
errors, but let's leave that for later.) Remember, whenever the width returned
by this hook is non-false, the display handler will be called to insert the
object. That's perfect: the display handler inserts the image just fine.

One down, one to go.

The stepper needs to detect the beginning of the (let's call it the) redex.
The obvious way to do this is (almost) the right way: the before-printing
handler checks to see whether the element about to be printed is the redex
(by an eq?-test). If so, it sets the beginning of the highlight region.
A corresponding test determines the end of the highlight region. When the
pretty-printing is complete, we highlight the desired region. Fine.

BUT, sometimes we want to highlight things like numbers and symbols; in other
words, non-heap values. For instance, suppose I tell you that the expression
that we're printing is (if #t #t #t) and that you're supposed to be
highlighting the #t. Well, I can't tell which of the #t's you want to
highlight. So this isn't enough information.

To solve this problem, the result of the reconstructor is split up into two
pieces: the reconstructed stuff outside the box, with a special gensym
occurring where the redex should be, and a separate expression containing
the redex. Now at least the displayer has enough information to do its job.

Now, what happens is that when the width-hook runs into the special gensym,
it knows that it must insert the redex. Well, that's fine, but remember,
if this procedure wants to take control of the printing process, it must do
so by returning the width of the printed object, and then this object must
be printed by the display-hook. The problem here is that neither of these
procedures has the faintest idea about line-breaks; that's the
pretty-printer's job. In other words, this solution only works for things
(like numbers, symbols and booleans) which cannot be split across lines.
What do we do?

Well, the solution is ugly. Remember, the only reason we had to resort to
this baroque solution in the first place is that values like numbers, symbols,
and booleans couldn't be identified uniquely by eq?. So we take a
two-pronged approach. For non-confusable values, we insert them in place
of the gensym before doing the printing. For confusable values, we leave
the placeholder in and take control of the printing process manually.

In other words, the _only_ reason this solution works is because of the
chance overlap between confusable values and non-breakable values. To
be more precise, it just so happens that all confusable values are
non-line-breakable.

Lucky.

And Ugly.

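The placeholder trick for confusable values can be sketched with the
pretty-printer hooks of modern Racket's racket/pretty library
(`pretty-print-size-hook' and `pretty-print-print-hook'), which may differ
in detail from the hooks available when these notes were written. The
function `render' is hypothetical, and the << >> brackets stand in for the
stepper's highlighting.

```racket
#lang racket
(require racket/pretty)

;; Placeholder standing where the redex belongs in the reconstructed context.
(define placeholder (gensym 'redex))

;; Print `context', letting the size and print hooks take over at the
;; placeholder and emit the (unbreakable) redex there instead.
(define (render context redex)
  (define shown (format "<<~s>>" redex))
  (define out (open-output-string))
  (parameterize ([pretty-print-size-hook
                  ;; a non-false width means "the print hook prints this"
                  (lambda (v display? port)
                    (and (eq? v placeholder) (string-length shown)))]
                 [pretty-print-print-hook
                  (lambda (v display? port)
                    (write-string shown port))])
    (pretty-write context out))
  (get-output-string out))
```

Calling (render `(if ,placeholder #t #t) #t) marks only the test position,
which an eq?-test on #t itself could not single out.
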
*****

January, 2000

I'm working on the debugger, now, and in particular extending the annotater
to handle all of the Zodiac forms. Let and Letrec turn out to be quite ugly.
I'm still a little unsure about certain aspects of variable references, like
for example whether or not they stay renamed, or whether they return to their
original names. [ed. note: they get new uninterned symbols that print like
their original names]

But that's not what I'm here to talk about. No, the topic of the day is
'floating variables.' A floating variable is one whose value must be captured
in a continuation mark even though it doesn't occur free in the expression that
the wcm wraps. Let me give an example:

(unit/sig some-sig^
  (import)

  (define a 13)
  (define b (wcm <must grab a> (+ 3 4))))

In this case, the continuation-mark must hold the value of a, even though a
does not occur free in the rhs of b's definition. Floating variables are
stored in a parameter of annotate/inner. In other words, they propagate
downward. Furthermore, they're subject to the same potential elision as all
other variables; you only need to store the ones which are also contained in
the set tail-bound. Also note that (thank God) Zodiac standardizes names
apart, so we don't need to worry about duplications. Also note that
floating variables may only be bound-varrefs.

********

Okay, well that doesn't work at all; dynamic scope blows it away completely.
For instance, imagine the following unit:

(unit/sig some-sig^
  (import sig-that-includes-c^)

  (define a 13)
  (define b (c)))

Now, during the execution of c, there's no mark on the stack which holds the
bindings of a. DUH! I can't believe I didn't think of this before. Okay, one
possible solution for this would be to use _different keys_ for the marks, so
that a mark on the unit-evaluation-continuation could be retained.

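The effect of using different keys can be seen in a small modern Racket
sketch. The keys 'step and 'unit and the mark values are hypothetical; the
point is only that a mark is replaced by a later mark with the same key on
the same frame, while marks under distinct keys coexist.

```racket
#lang racket
(define (marks key)
  (continuation-mark-set->list (current-continuation-marks) key))

;; Same key, same frame: the inner mark replaces the outer one, so the
;; unit's bindings are lost.
(define same-key
  (with-continuation-mark 'step 'unit-bindings
    (with-continuation-mark 'step 'inner-expr
      (marks 'step))))

;; Different keys: both marks survive on the same frame.
(define different-keys
  (with-continuation-mark 'unit 'unit-bindings
    (with-continuation-mark 'step 'inner-expr
      (list (marks 'unit) (marks 'step)))))
```
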
*********


Okay, time to do units. Compound units are dead easy. Just wrap them in a
wcm that captures all free vars. No problemo. Normal units are more tricky,
because of their scoping rules. Here's my canonical translation:

(unit
  (import vars)
  (export vars)

  (define a a-exp)

  b

  (define c c-exp)

  d

  etc.)

... goes to ...

(unit
  (import vars)
  (export vars)

  (wcm blah ; including imported vars
    (begin
      (set! a a-exp)
      b
      (set! c c-exp)
      d))

  (define a a)
  (define c c)
  ...)

************

Well, I still haven't written the code to annotate units, so it's a damn good thing
I wrote down the transformation. I'm here today (thank you very much) to talk about
annotation schemes.

I just (okay, a month ago --- it's now 2000-05-23) folded aries into the stepper. The
upshot of this is that aries now supports two different annotation modes: "cheap-wrap,"
which is what aries used to do, and the regular annotation, used for the algebraic
stepper.

However, I'm beginning to see a need for a third annotation, to be used for
(non-algebraic) debugging. In particular, much of the bulk involved in annotating the
program source is due to the strict algebraic nature of the stepper. For instance,
I'm now annotating lets. The actual step taken by the let is after the evaluation
of all bindings. So we need a break there. However, the body expression is
_also_ going to have a mark and a break around it, for the "result-break" of the
let. I thought I could leave out the outer break, but it doesn't work. Actually,
maybe I could leave out the inner one. Gee whiz. This stuff is really complicated.

*************

Okay, well, I figured all that stuff out, but now I've got to restructure the
reconstructor to handle lifting---PRE-lifting, that is---on let/letrec/local.
In particular, the reconstruct-inner function will now return four things: the
free bindings, the reconstructed expr, the "before" definitions, and the "after"
definitions. These before and after definitions are wrapped around the current
set of generated definitions. Case in point; I'm about to execute the (+ 7 8)
in the following expression:

(let ([a 4]
      [b (let ([h 3]
               [i (+ 7 8)]
               [j 9])
           (+ h i j))]
      [c 19])
  (+ a b c))

How do we reconstruct this? Well, first we reconstruct the (+ 7 8) itself, that's
easy. Then, we encounter a let. The return value of this will be the _before_
expressions:
  (define ~h~0 3)
the _after_ expressions:
  (define ~i~0 (+ 7 8))
  (define ~j~0 9)
and the reconstructed expression:
  (+ ~h~0 ~i~0 ~j~0)

Now, we recur, using the reconstructed expression. The next step outward is _also_
a let, so we get the following before expressions:
  (define ~a~0 4)
the following after expressions:
  (define ~b~0 (+ ~h~0 ~i~0 ~j~0)) <---here is where the reconstructed expr appears
  (define ~c~0 19)
and the reconstructed expression:
  (+ ~a~0 ~b~0 ~c~0)

So then, the final assembly occurs when the "before" expressions are slapped together,
last first, then the "after" expressions, first first, and then whatever reconstructed
expression is left over.

Ugh.

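The final assembly step can be sketched in modern Racket as follows. The
function `assemble' and the representation of each level as a two-element
list are hypothetical; the data is the example above, with levels listed
innermost first, in the order the reconstructor meets them walking outward.

```racket
#lang racket
;; Each level is (list befores afters), innermost level first.
(define (assemble levels final-expr)
  (append
   (append-map first (reverse levels))   ; "before" definitions, last first
   (append-map second levels)            ; "after" definitions, first first
   (list final-expr)))                   ; the leftover reconstructed expr

(define example
  (assemble
   (list (list '((define ~h~0 3))
               '((define ~i~0 (+ 7 8)) (define ~j~0 9)))
         (list '((define ~a~0 4))
               '((define ~b~0 (+ ~h~0 ~i~0 ~j~0)) (define ~c~0 19))))
   '(+ ~a~0 ~b~0 ~c~0)))
```

The result lists the definitions of a, h, i, j, b, and c in that order,
followed by (+ ~a~0 ~b~0 ~c~0), so every lifted name is defined before the
definition or expression that uses it.
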
***********

Wow. More complications. Here's the new problem. Let's say I have an expression like
this:

(define (make-thunk)
  (let* ([lexical-binding 14]
         [returned-thunk (lambda () lexical-binding)])
    returned-thunk))

(define first-thunk (make-thunk))
(define second-thunk (make-thunk))

(first-thunk)

Now, when I'm just inside the body of first-thunk, and trying to reconstruct
"lexical-binding", I need to know what lifted name it got.

There are a bunch of ways to try to do this, but I'm going to take the most
straightforward approach (which came to me after about a day of thought),
which is to expand every lexical binding into a pair of bindings; one which
refers to the bound value (with the same name as the original binding), and
a new, gensym'ed one, which indicates what index number this binding has
received.

2000-06-05

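A hand-annotated sketch of the pair-of-bindings idea, in modern Racket. The
counter `fresh-index!' and the readable name `lexical-binding-index' are
hypothetical (the notes call for a gensym'ed name), and the thunk returns
both the value and the index only so that the effect can be observed.

```racket
#lang racket
;; Hypothetical global counter handing out lifting indices.
(define next-index 0)
(define (fresh-index!)
  (begin0 next-index (set! next-index (add1 next-index))))

;; Each evaluation of the binding form gives `lexical-binding' a companion
;; binding that records which index this dynamic occurrence received.
(define (make-thunk)
  (let* ([lexical-binding 14]
         [lexical-binding-index (fresh-index!)]
         [returned-thunk
          (lambda () (list lexical-binding lexical-binding-index))])
    returned-thunk))

(define first-thunk (make-thunk))
(define second-thunk (make-thunk))
```

Because the index lives in the closure alongside the value, each thunk can
report its own lifted name no matter when it is finally called.
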
***********

So here's the new format of a full mark:
  (make-mark label source bindings)
where label is a symbol, source is a zodiac:parsed, and bindings is an association
list from bindings to values. Note, however, that every let-type binding now has
_two_ entries in this list. The first one supplies the binding's value, and the
second one supplies the lifted name's index.

[ed note.: see note for 2000-09-26]

2000-06-06

***********

How do we guarantee that lifted names do not clash? Well, for
each binding we use the original name, with two numbers appended
to it, separated by zeros; the first one indicates which binding
it is (more than one binding may have the same original name),
and the second one indicates which dynamic occurrence of this
binding it is.

So, for instance, if a program contains one binding named 'foo', and it's
evaluated three times, the third evaluation would result in the lifted name
'foo0002'. I personally guarantee that no namespace clashes can occur
in this scheme. Yep.

2000-06-06

***********

Oh.. Well, Matthias prefers a naming scheme whereby all bindings are assigned
sequential numbers, regardless of the binding name. So this name clash isn't
really an issue anymore.

2000-09-09

***********

To handle units, marks must now contain "top-level" (actually, unit-bound) variables.
For this reason, the datatype for a full mark must change. A full mark is now:

(make-full-mark location label bindings)

where location is a zodiac:location,
label is a symbol,
and bindings is an association list containing <binding>s and values

a <binding> is either a zodiac:binding (for bound vars), or a slot (for
unit-bound vars in the zodiac:top-level-varref/bind/unit struct).

***********

Ooookay. We're in Boston now, and I'm rewriting the stepper
completely to work with version 200. In other words, we're scrapping
Zodiac completely. This is an interesting SE task, because from a
data-driven design standpoint, the code is starting from zero again;
all of my data have different shapes now.

Another change is that with the demise of DrScheme Jr and the
institution of the static-compilation module mechanism, there's no
longer a need for two separate collections. I've therefore scrapped
stepper-graphical, and moved everything back into stepper.

Also, the stepper no longer needs to be tightly integrated with
DrScheme itself; it can now be simply a tool. I've already done the
front-end work to tie in to the new tool interface; I think this stuff
is all done.

So, here's the plan. The major pain is in the annotater, and that's
what I'm tackling now. I'm proceeding along an iterative refinement
path; first, I want to get a bare-bones annotation working, without
any macro-reversal (hence source-correlation) stuff.

Bindings. What's a binding? It looks to me like the syntax object
representing the binding occurrence of the variable should serve
admirably as a 'binding' for our purposes.

***********

I'm dumping the tracking of the 'never-undefined property. It was
originally used for two purposes; first, varrefs had to be wrapped
with an undefined check. Second, varrefs in ankle- and cheap-wrap
were not wrapped if the variables were known never to be
undefined. Now, the undefined check is (at last) inserted by the
language's elaborator, so the first use is obsolete. The second one
is more or less obsolete as well, because I'm not sure that cheap- or
ankle-wrap are ever going to be used again.

Also, the 'lambda-bound-var' property is going away; in v200, I don't
see a good way to get from a bound variable to its binding, which
makes it more or less impossible to keep track of things by attaching
properties to bindings. In fact, it doesn't really even make sense
to try and find the binding for an occurrence in v200, because it's
not even known. Instead, I've just added another recursion argument
called 'let-bound-variables', which is basically what the property
was anyway.

2002-01-08

***********
|
|
|
|
Why, for the love of God, do we need to put a wcm around a quote? I
|
|
can see how we need one if there's a pre-break there, but otherwise,
|
|
it seems totally useless.
|
|
|
|
Ditto for quote-syntax
|
|
|
|
2002-01-08
|
|
|
|
[Later Note: This is preposterous. Of course I need a wcm there,
|
|
to replace an existing one if necessary. Maybe if it's in
|
|
non-tail position...]

***********

Here's a nice optimization I'm not taking advantage of: an application
whose subterms are all lambda-bound vars doesn't need all those temp
vars. OTOH, this won't help much with beginner/intermediate, because you
never have a lexical var in the application position. I suppose you can
generalize this to say that you only need arg-temps for things that are
not lambda-bound vars. Well, maybe some other day...

2002-01-08

***********

Okay, as much as I hate to admit it, reconstruct is not just getting a
face lift; it's being largely rewritten. The major change is this:
I'm going to delay macro unwinding until the end. Toward this end,
the recon (formerly "rectify") procedures will produce syntax objects
with attached properties that record the macro expansions and the
primary origin of the form. After all reconstruction is done, we go
through again and look for things that need to be rewritten. This
will separate the macro unwinding from the basic reconstruction of the
expression. Hopefully, at the end we can just use
(syntax-object->datum) to discard all of the side information.
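A rough Python sketch of the two-pass plan described above, under heavy
assumptions: the `Stx` class, the "origin" key, and the single `and` rule
are hypothetical stand-ins, not the stepper's data structures. The first pass
(not shown) would build property-carrying terms; `unwind` is the later
rewriting pass, and `to_datum` plays the role of discarding the side
information at the end.

```python
from dataclasses import dataclass, field

@dataclass
class Stx:
    datum: object                       # a symbol (str), a constant, or a list of Stx
    props: dict = field(default_factory=dict)

def unwind(stx):
    """Second pass: rewrite expanded forms back to the macro named in 'origin'."""
    if not isinstance(stx.datum, list):
        return stx
    parts = [unwind(s) for s in stx.datum]
    if (stx.props.get("origin") == "and"
            and len(parts) == 4 and parts[0].datum == "if"):
        # assumed expansion: (and a b) became (if a b #f)
        return Stx([Stx("and"), parts[1], parts[2]], stx.props)
    return Stx(parts, stx.props)

def to_datum(stx):
    """Final step: drop all properties, keeping only the s-expression."""
    if isinstance(stx.datum, list):
        return [to_datum(s) for s in stx.datum]
    return stx.datum
```

Because the rewriting only reads properties, the basic reconstruction never
has to know which macros exist.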

Please, let this work. Yikes.

2002-01-12

*******

There's a problem with the reconstruction of let-values, which only
surfaces in the presence of multiple values. This is okay for now,
because beginner and intermediate do not allow multiple values. The
problem is that if you allow expressions like this --- (let-values
([() (values)]) 3) --- that is, where there can be an empty set of
variables in a lhs position, you may not be able to tell at runtime
what expression you're in the middle of. The reason is that when we
stop during the evaluation of a rhs in a let, we figure out which rhs
we're evaluating by which lhs-vars have been changed from their
original values. Oh, dang. This is totally broken for letrecs in
which the rhs evaluates to the undefined value.

Well, I guess I'm going to have to fix this the right way, by adding a
counter to every let which is incremented explicitly after the
evaluation of each rhs. Yikes.
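To make the two failure cases concrete, here is a small Python model of the
"which lhs-vars have changed" inference. The names `UNDEFINED` and
`infer_rhs_index` and the dict-based store are assumptions for illustration
only; the real annotation works on Scheme bindings, not dictionaries.

```python
UNDEFINED = object()   # stand-in for the <undefined> value

def infer_rhs_index(bindings, store):
    """Guess which right-hand side is currently being evaluated.

    `bindings` is a list of lhs variable-name lists, one per clause;
    `store` maps each variable to its current value. The guess is the
    first clause that still has a variable holding UNDEFINED.
    """
    for i, names in enumerate(bindings):
        if any(store[n] is UNDEFINED for n in names):
            return i
    return len(bindings)   # every clause looks finished: assume the body

# (let-values ([() (values)]) 3): the only clause has no variables, so
# while its rhs is running the guess already points past it, at the body.
print(infer_rhs_index([[]], {}))                                         # prints 1

# letrec whose first rhs produced the undefined value: clause 0 is done,
# clause 1 is running, but clause 0 still looks unfinished.
print(infer_rhs_index([["a"], ["b"]], {"a": UNDEFINED, "b": UNDEFINED}))  # prints 0
```

In both cases the store simply does not contain enough information to
recover the position in the let.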

********

Ha! Did I actually say "right way"? This is totally the _wrong_
way; keeping information about the continuation by mutating the
store is guaranteed to fail when continuations are invoked.

2002-06-21

*********

Well, another year has passed. How swiftly they fly! Nathan is
almost walking, Alex is almost three, and I'm about to graduate.
But I'd better get the Intermediate stepper working first.

A note about lifting: I keep looking for the right idiom in which
to code the search for the highlight. In fact, the real problem
is my inability to cleanly express the location of the highlight.
The representation I've settled on as the least egregious is this: a
location in a syntax object is expressed as a list of context records,
where each one contains an index indicating the location of the subterm.
This index makes coding the search less pleasant than it might
otherwise be; right now, I'm searching by constructing a list
of subterms paired with indices, and then iterating through these.
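The representation just described can be sketched in Python as follows. This
is an illustration under assumptions, not the stepper's code: terms are
nested lists, a context record is reduced to a dict with one "index" field,
and the highlight test is passed in as a predicate.

```python
def find_highlight(term, is_highlight):
    """Return the path to the highlighted subterm, or None if there is none.

    The path is a list of context records, outermost first; each record
    holds the index of the subterm to descend into.
    """
    if is_highlight(term):
        return []
    if isinstance(term, list):
        for index, sub in enumerate(term):   # subterms paired with indices
            path = find_highlight(sub, is_highlight)
            if path is not None:
                return [{"index": index}] + path
    return None

def subterm_at(term, path):
    """Follow a list of context records back down to the subterm."""
    for record in path:
        term = term[record["index"]]
    return term
```

For example, with `["+", 1, ["*", ["hl", 2], 3]]` and a predicate that
recognizes lists starting with `"hl"`, the path has two records, with
indices 2 and then 1.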

2003-07-13

**********

Intermediate stepper now working. I developed a much better
way of specifying the highlight: the reconstruct engine now delivers
a syntax object to the display engine, which allows me to use
syntax properties. Much much better.

2004-01-15

************

A year and a half has passed since I've thought about this file, and
I'm now in the midst of a Google Summer of Code (SoC) grant which is
supposed to get me to support mutation, and make the corresponding changes
to the interface.

A thought I had while walking the California mountainsides (BTW: I've just
graduated, and gotten a job at Cal Poly)--why do I do the reconstruction
from the inside out? Wouldn't it be much much easier to do from the
outside in? Feh.

2005-08-02

*************

Well, the dang summer is almost over, and I've still got a long, long way to
go.

The basic change to the model is that instead of storing completed definitions
as pre-formatted s-expressions, I'm now storing them as 2-element lists
containing the syntax object associated with the definition and a 'getter'
which returns the value that the binding refers to. The actual definition is
reformatted for each step. This is a bit silly, but it would be easy to cache
the definitions along with the present values if this is actually a performance
bottleneck. I suspect it won't matter a bit.
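A small Python sketch of the (syntax, getter) representation, assuming a
definition is just `["define", name, rhs]` and that a plain dict stands in for
the store a set! would mutate; `render_definitions` is a hypothetical name.

```python
def render_definitions(completed):
    """Reformat every completed definition using the binding's current value.

    `completed` is a list of 2-element lists [syntax, getter], where
    `syntax` is the definition's s-expression and `getter` is a thunk
    returning the value the binding refers to right now.
    """
    rendered = []
    for syntax, getter in completed:
        keyword, name, _original_rhs = syntax
        rendered.append([keyword, name, getter()])
    return rendered

# A toy store that a mutation changes between steps.
store = {"a": 3}
completed = [[["define", "a", 3], lambda: store["a"]]]

before = render_definitions(completed)   # rendered for the step before the mutation
store["a"] = 4                           # simulates (set! a 4)
after = render_definitions(completed)    # same entry, re-rendered with the new value
```

Since the getter is consulted at render time, the same stored entry yields
(define a 3) for one step and (define a 4) for the next.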

In the presence of mutation, the existing separators don't make sense, either.
I'm scrapping them, for the moment. A nice interface change would be to
separate only the definitions that had changed. For the moment, they'll all be
separated.

The first order of business, after mucking around in the model for some time to
get the flavor of how things will work, is to go and set up the interface so I
can get things running.

*************

Okay, I've "completed" the Google project, but there are still things
to wrap up. Right now I'm working on the highlighting for mutated
bindings, which is inferred from differences in the rendered steps.

So, for instance, if the left-hand-side has (define a 3), and the
right-hand-side has (define a 4), well then we'd better highlight the
3 on the left and the 4 on the right, because this binding was
mutated.

Now, this kind of highlighting--reconstructing highlighting from
observed differences, rather than obtaining direct evidence of the
mutation--clearly has some shortcomings. For instance, concurrent
code... well, concurrent code is all messed up to begin with; a
more interesting problem occurs when you have mutations that share
structure. So, what if a is mutated from (list 3 4) to (list 4 5)?
Should the whole thing be highlighted? Certainly that's what you'd
get from a normal reduction semantics. In some sense, though,
highlighting _just_ the 3 and the 4 (and the 4 and the 5) corresponds
to a smaller set of changes that produces the same result.
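The "smaller set of changes" reading can be sketched as a structural diff
over the two rendered steps. This Python version is only an illustration:
rendered definitions are modelled as nested lists, and `diff_paths` is a
hypothetical helper, not part of the stepper.

```python
def diff_paths(before, after, path=()):
    """Return index paths of the smallest subterms that differ.

    Lists of equal length are compared element by element; anything
    else that differs is reported as one changed position.
    """
    if before == after:
        return []
    if (isinstance(before, list) and isinstance(after, list)
            and len(before) == len(after)):
        changed = []
        for i, (b, a) in enumerate(zip(before, after)):
            changed += diff_paths(b, a, path + (i,))
        return changed
    return [path]
```

For (define a 3) versus (define a 4) this reports the single position of the
right-hand side. For (list 3 4) versus (list 4 5) it reports the two elements
separately rather than the whole list; a reduction-semantics style highlight
would instead stop at the right-hand side and report that one position.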

Another problem that's coming out is the problem of "intermediate"
completed expressions that arise from partially evaluated letrecs
(and all the things that expand into them). These should also be
scanned for mutation, right? What about the "future" ones? There
are other rendering problems with forward mutation in letrecs which
I haven't tackled, as well. I find myself leaning toward depending
on the "user-source" syntax property. As I've observed before, though,
the syntax properties form a sort of creeping mush; they don't need
to be explicitly expressed as arguments or return values, and errors
of omission in the syntax properties are hard to catch. A lot can
hide in the "syntax?" contract.

2005-09-21

**************

Time to clean up for v300. Let's see if we can get begin and begin0 working.

2005-11-14

*************

Okay, it turns out that begin expands into a let-values with empty bindings,
so I'm working on getting this going. With this addition, the annotation for
'let' is a complete monster, chewing up a substantial fraction of the annotation
code all by itself.

Also, I've come across a design optimization that improves if & set!, which is
this: there's no reason to have if-temp & set!-temp. Putting these inline
is a great improvement: it reduces code in the reconstructor, in the annotator,
all over the place. The caveat: I haven't finished it yet, so who knows what
kind of horrible thing will crop up.

The architecture change here is that we need a new kind of break that's like
a normal-break (blecch! terrible name!) but carries a value along with it.
I'm going to call this the normal-break/value break. Blurrch!

2006-01-12

*************