variable references: there are three kinds of variable references:
1) bound variable refs
2) unit-bound variable refs
3) top-level variable refs

You might be forgiven for some confusion: these three appear to overlap
heavily. Here are more accurate definitions for each one:

unit-bound variable references are those which occur as the left-hand sides of
top-level definitions within a unit.

bound variable references are those which occur within the scope of a
lambda, case-lambda, let, let*, letrec, or other form which introduces a
limited lexical scope. This includes `local', but not the unit-bound
variables mentioned above.

top-level references are the rest of the references.

One difference between top-level and bound varrefs is the way that they
are handled at runtime. Top-level varrefs are looked up in a table; if
they are not found in this table, a runtime error is signalled. Note that
this lookup occurs only when the varref is evaluated, not when it is first
`encountered' (e.g., in the body of a closure). One reason that this
mechanism is necessary is that a Scheme REPL permits top-level references
to variables that have not yet been defined.

Bound varrefs have a known lexical binding location, and they can be looked
up directly, rather than going through the indirection of checking a table.
These variables may be introduced by forms like `letrec' or `local', and
they may furthermore be used before their binding definition has been
evaluated. In this case, they have the `<undefined>' value. In most
language levels, a reference to a variable which contains the `<undefined>'
value is an error. In such a language level, any variable which may have
this value must be checked on every evaluated reference.

So here's the problem: unit-bound varrefs are similar to those inside a
`local'. Syntactically, their bindings are introduced by `define', and their
scope extends in both directions. Semantically they are similar to
bound variables, in that the interpreter can lexically fix the binding of
the variable. In both of these regards they are similar to the bindings
in a `local'. However, zodiac does not parse them like those in a
`local'. Rather, it parses them as `top-level-varref's. Why? I forget,
and I'm about to ask Matthew yet again. Then I'll record the answer here.

Now things get a bit more complicated. Top-level varrefs never need to be
checked for the `<undefined>' value; before they are bound, they have no
runtime lookup location at all. Bound varrefs and unit varrefs, on the
other hand, may contain the `<undefined>' value. In particular, those
bound by letrec, local, and units may contain this value. Others, like
those bound by lambda, let, and let*, will not. For top-level varrefs and
for varrefs bound by lambda, let, and let*, we do not need to check for the
undefined value at runtime. Only when we are looking at a bound or unit
varref which may contain the `<undefined>' value do we need to insert a
runtime check.

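The rule above can be sketched as a small predicate. This is only an
illustration: the name needs-undefined-check? and the symbols used to tag
each kind of reference are assumptions, not part of zodiac or the stepper.

```racket
#lang racket/base

;; Hypothetical classifier (not zodiac's API).
;; kind   : 'top-level, 'unit-bound, or 'bound
;; binder : symbol naming the binding form; ignored unless kind is 'bound
(define (needs-undefined-check? kind binder)
  (case kind
    ;; top-level varrefs have no location before they are bound
    [(top-level) #f]
    ;; unit definitions may be referenced before they are evaluated
    [(unit-bound) #t]
    ;; among lexical binders, only letrec and local can expose <undefined>
    [(bound) (and (memq binder '(letrec local)) #t)]
    [else (error 'needs-undefined-check? "unknown kind: ~e" kind)]))
```

For instance, (needs-undefined-check? 'bound 'letrec) asks for a check, while
(needs-undefined-check? 'bound 'lambda) does not.
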
*******

Another topic entirely is that of sharing. When a break occurs, the
stepper reconstructs the state of memory. However, two closures may refer
to the same binding. For instance,

(define-values (setter getter)
  (let ([a '*undefined*])
    (values
      (lambda (x) (set! a x))
      (lambda () a))))

If each closure is linked to a record of the form (lambda ()
values-of-free-vars), there's no way to tell whether the first and second
closure refer to the same binding of a or not. So in this case, we must
devise some other technique to detect sharing. A simple one suggested by
Matthew is to store mutators in the closure record; then, sharing can be
detected by the old bang-one-and-see-if-the-other-changes technique.

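A minimal sketch of the bang-one-and-see-if-the-other-changes test, assuming
each closure record carries a getter and a mutator for the captured binding.
The struct binding-record and the helper same-binding? are hypothetical names
made up for this illustration.

```racket
#lang racket/base

;; Hypothetical closure record: a getter and a mutator for one binding.
(struct binding-record (get set))

;; Bang r1 with a fresh value, see whether r2 observes it, then restore.
(define (same-binding? r1 r2)
  (define old ((binding-record-get r1)))
  (define probe (gensym 'probe)) ; fresh, so it cannot already be stored
  ((binding-record-set r1) probe)
  (define shared? (eq? ((binding-record-get r2)) probe))
  ((binding-record-set r1) old)  ; put the original value back
  shared?)

;; Two records closed over the same binding of a ...
(define-values (rec-1 rec-2)
  (let ([a 'initial])
    (values (binding-record (lambda () a) (lambda (x) (set! a x)))
            (binding-record (lambda () a) (lambda (x) (set! a x))))))

;; ... and one closed over a different binding holding an equal value.
(define rec-3
  (let ([a 'initial])
    (binding-record (lambda () a) (lambda (x) (set! a x)))))
```

Comparing the stored values alone cannot separate rec-2 from rec-3, since
both hold 'initial; the probe can.
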
*********

A note about source locations: I'm using the "start" locations of sexps
(assigned by Zodiac) to uniquely identify those expressions: I don't
believe there are any instances where two expressions share a start
location.

Later: this is now obsolete: I'm just storing the parsed zodiac
expressions. Forget all of this source correlation crap. Zodiac does it
for me.

*********

Robby has a good point: Matthew's technique for detecting gaps in the
continuation-mark chain (look for applications whose arguments are fully
evaluated but are still on the list of current marks) depends on the
assumption that every "jump site" has the jump as its tail action. In
other words, what about things like "invoke-unit/open", which jumps to some
code, evaluates it, >then comes back and binds unit values in the
environment<. In this case, the "invoke-unit/open" continuation will not
be handed directly to the evaluation of the unit, because work remains to
be done after the evaluation of the unit's definitions. Therefore, it will
be impossible to tell when un-annotated code is appearing on the stack in
uses of "invoke-unit/open." Problem.

*********

So what the heck does a mark contain for the stepper? It looks like this:

(lambda () (list <source-expr> <var-list>))

with

var-list = (list-of var)

and

var = (list <val> z:varref)

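A small sketch of a mark of this shape, written against current Racket's
with-continuation-mark. The key stepper-key and the helpers make-mark and
current-marks are assumptions for illustration, and plain symbols stand in
for the z:varref structures.

```racket
#lang racket/base

;; Hypothetical key; the notes do not say which key the stepper uses.
(define stepper-key (gensym 'stepper-mark))

;; A mark is a thunk returning (list source-expr var-list), where each
;; var is (list value varref). Symbols stand in for z:varref here.
(define (make-mark source-expr vars)
  (lambda () (list source-expr vars)))

;; Force every stepper mark on the current continuation, innermost first.
(define (current-marks)
  (map (lambda (thunk) (thunk))
       (continuation-mark-set->list (current-continuation-marks)
                                    stepper-key)))

;; An "annotated" expression: the mark records the source and the value of x.
(define (example x)
  (with-continuation-mark stepper-key
    (make-mark '(+ x 1) (list (list x 'x)))
    (current-marks)))
```

Wrapping the data in a thunk means the list is only built when a break
actually inspects the mark.
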
*********

Let me say a few words here about the overall structure of the
annotator/stepper combination. We have a choice when rebuilding the source:
we can follow the source itself, or we can follow the parsed expression
emitted by zodiac. If our task is simply to spit out source code, then
it's clear that we should simply follow the source. However, we need to
replace certain variables with the values of their bindings (in
particular, lambda-bound ones). Well, in beginner mode anyway...

*******

Okay, I'm about to extend the stepper significantly, and I want to do at
least a little bit of design work first. The concept is this: I want the
stepper to stop _after_ each reduction, as well as before it. One principal
difference between the new and old step types is that in the new one,
the continuation cannot be rectified entirely based upon the continuation
marks; the value that is produced by the expression in question is also
needed.

Here's a question: can I prove, for the setup I put together, that the part
of the continuation _outside_ the highlighted region does not change? This
should be the case; after all, the continuation itself does not change.

Of course, there are some reductions which do not immediately produce a value;
procedure applications, and ... uh oh, what about cond and if expressions?
We want the stepper to use the appropriate "answer" as the "result" of
the step. So there's some context sensitivity here.

Wait, maybe not. It seems like _every_ expression will have to have a "stop
on entry" step. Further, these types of steps will _not_ have values associated
with them. Hmmm....

Okay, this isn't that hard. Yes, it's true that every expression that becomes
... no, it's not obvious that the expression which is substituted ... jesus,
it's not even always the case that a "substitution" occurs in the simplistic
sense I'm imagining. Damn, I wish my reduction semantics were finished.

(Much later): The real issue is that the "stop-on-enter" code is inserted based
on the surrounding code, and

So, here's the next macro we need to handle: define-struct.

*********

Don't forget a test like

(cond [blah]
      [else cond [blah] [blah]])

**********

Okay, I'm a complete moron. In particular, I threw out all of the source
correlation code a week ago because I somehow convinced myself that the
parsed expressions retained references to the read expressions. That's
not true; all that's kept is a "location" structure, which records the file
and offset and all that jazz.

So I tried to fix that by inserting these source expressions into the
marks, along with the parsed expressions. This doesn't work because I
need to find the read expressions for expressions that don't get marks...
or do I? Yes, I do. In particular, to unparse (define a 3), I need to see
the read expression to know that it wasn't really (define-values (a)
(values 3)).

Maybe I can add a field to zodiac structures a la maybe-undefined?

************

That worked great!

************

Man, there's a lot of shared code in here.

************

Okay, back to the drawing board on a lot of things.

1) Matthias and Robby are of the opinion that the break for an expression
should be triggered only when that expression becomes the redex. For
example, the breakpoint for an if expression is triggered _after_ the test
expression is evaluated.

2) I've realized that I need a more general approach in the annotater to
handle binding constructs other than lambda. In particular, the new
scheme handles top-level variables differently than lexically bound ones.
In particular, the mark for an expression contains the value of a
top-level variable if (1) the variable occurs free in the expression, and
(2) the expression is on the spine of the current procedure or definition.
Lexically bound variables are placed in the mark if (1) they occur free in
the expression, and (2) they are in tail position relative to the innermost
binding expression for the variable.

*** Wait, no. This is crap, because the bodies of lambdas need to store
all free variables, regardless of whether they're lexically tail w.r.t.
the binding occurrence. Maybe it really would just be easier to do this in
two passes. How would this work? One pass would attach the free variables
to each expression. Then, the variables you must store in the mark for an
expression are those which (1) occur free and (2) are not contained in
some lexically enclosing expression. I guess we can use the
register-client ability of zodiac for this...

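As a rough sketch of the first of those two passes, here is a free-variable
computation over a tiny s-expression language. This is an assumption-laden
stand-in: the real pass would walk zodiac's parsed structures, and the names
free-vars and union-all are invented for the example.

```racket
#lang racket/base
(require racket/list)

;; Union of several variable lists, keeping first occurrences.
(define (union-all lists)
  (remove-duplicates (apply append lists) eq?))

;; Free variables of a toy language: symbols, literals, (quote d),
;; (lambda (x ...) body), (if e1 e2 e3), and applications.
(define (free-vars expr)
  (cond
    [(symbol? expr) (list expr)]
    [(not (pair? expr)) '()]             ; numbers, booleans, '()
    [(eq? (car expr) 'quote) '()]
    [(eq? (car expr) 'lambda)
     ;; the parameters are bound inside the body
     (remove* (cadr expr) (free-vars (caddr expr)))]
    [(eq? (car expr) 'if)
     (union-all (map free-vars (cdr expr)))]
    [else (union-all (map free-vars expr))]))
```

A second pass could then subtract, at each expression, the variables already
recorded by an enclosing expression.
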
We're helped out in the lexical variables by the fact that zodiac renames
all lexically bound variables, so no two bindings have the same name. Of
course, that's not the case for the special variables inserted by the
annotator. Most of these ... well, no, all of these will have to appear
in marks now. The question is whether they'll ever fight with each other.
In the case of applications, I'm okay, because the only expressions which
appear in tail ... wait, wait, the only problem that I could have here
arises when top-level variables have the same names as lexically bound
ones, and since all of the special ones are lexically bound, this is fine.

************

I'm taking these comments out of the program file. They just clutter
things up.

; make-debug-info takes a list of variables and an expression and
; creates a thunk closed over the expression and (if bindings-needed is true)
; the following information for each variable in kept-vars:
; 1) the name of the variable (could actually be inferred)
; 2) the value of the variable
; 3) a mutator for the variable, if it appears in mutated-vars.
; (The reason for the third of these is actually that it can be used
; in the stepper to determine which bindings refer to the same location,
; as per Matthew's suggestion.)
;
; as an optimization:
; note that the mutators are needed only for the bindings which appear in
; closures; no location ambiguity can occur in the 'currently-live' bindings,
; since at most one location can exist for any given stack binding. That is,
; using the source, I can tell whether variables referenced directly in the
; continuation chain refer to the same location.

; okay, things have changed a bit. For this iteration, I'm simply not going to
; store mutators. Later, I'll add them in.

************

Okay, I'm back to the one-pass scheme, and here's how it's going to work.
Top-level variables are handled differently from lexically bound ones.
Annotate/inner takes an expression to annotate, and a list of variables whose
bindings the current expression is in tail position to. This list may
optionally also hold the symbol 'all, which indicates that all variables
which occur free should be placed in the mark.

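The selection rule for lexically bound variables might look like the
following. It is a sketch only: vars-for-mark is a hypothetical helper, and
it assumes the free variables of the expression have already been computed.

```racket
#lang racket/base

;; free       : list of variables occurring free in the expression
;; tail-bound : list of variables whose bindings the expression is in
;;              tail position to; may also contain the symbol 'all
(define (vars-for-mark free tail-bound)
  (if (memq 'all tail-bound)
      free                                           ; keep every free variable
      (filter (lambda (v) (memq v tail-bound)) free)))
```

So (vars-for-mark '(x y z) '(y)) keeps only y, while a tail-bound list
containing 'all keeps x, y, and z.
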
***********

Regarding the question: what the heck is this lexically-bound-vars argument
to annotate-source-expr? The answer is that if we're displaying a lambda,
we do not have values for the variables whose bindings are the arguments
to the lambda. For instance, suppose we have:

(define my-top-level 13)

(define my-closure
  (lambda (x) (x my-top-level)))

When we're displaying my-closure, we had better not try to find a value for x
when reconstructing the body, as there isn't one.

*************

This may come back to haunt me: the temporary variables I'm introducing for
applications and 'if's are funny: they have no bindings. They have no
orig-name's. They _must_ be expanded, always. This may be a problem when
I stop displaying the values of lambda-bound variables.

***************

currently on the stack:

yank all of that 'comes-from-blah' crap if read->raw works.

*************

annotater philosophy: don't look at the source; just expand based on the
parsed expression. The information you need to reconstruct the

*************

for savings, I could elide the guard-marks on all but the top level.

***********

months later; October 99.

major reorganization, along a model-view-controller philosophy. Here's how it
works:

The view and controller (for the regular stepper) are combined in a gui unit.
This unit takes a text%, handles all gui stuff, and invokes the model unit
(one for each stepping).

The model unit is a compound unit. It consists of the annotater, the
reconstructor, and the model unit itself.

Gee whiz; there's so much stuff I haven't talked about. Like for instance the
fact that the stepper now has before and after steps. The point of this
reorganization is to permit a natural test suite. Jesus, that's been a long
time coming. At some point, I'm also hoping to combine the stepper into the
main DrScheme frame.

Oh yes, another major change was that evaluation is now strictly on a
one-expression-at-a-time basis. The read, parse, and step are now done
individually for each expression. This has the ancillary benefit that there's
no longer any need to reconstruct _all_ of the old expressions at every step.

************

You know, I should never have started that ******** divider. I have no idea how
many stars are supposed to be there. Oh well.

************

The version for DrS-101 is out, and I've restructured the stepper into a
"model/view/controller" architecture, primarily to ease testing. Of course,
I haven't actually written the tester yet. So now, the view and controller are
combined in stepper-view-controller.ss, and the model (instantiated once per
step-process) is in stepper-model.ss. In fact, the view-controller is also
instantiated once per step-process, so I'm not utilizing the division in that
way, but the tester will definitely want to instantiate the model repeatedly.

***********

I also want to comment a little bit on some severe ugliness regarding
pretty-printing. The real problem is how to use the existing pretty-print
code, while still having enough control to highlight in the right locations.

Okay, let me explain this one step at a time.

The way the pretty-printer currently works is this: there are four hooks into
the pretty-printing process. The first one is used to determine the width of
an element. The result of this procedure is used to decide whether a line
break is necessary. However, this hook is _also_ used to determine whether
or not the pretty-printer will try to print the string itself or hand off
responsibility to the display-handler hook. In other words, if the
width-hook procedure returns a non-false value, then the display-handler will
be called to print the actual string. The other pair of hook procedures is
first, a procedure which is called _before_ display of any subexpression,
and one which is called _after_ display of any subexpression.

So how does the stepper use this to do its work? Well, the stepper has two
tricky tasks to accomplish. First, it must highlight some subexpression.
Second, it must manually insert elements (i.e. images) which the pretty-printer
does not handle.

Let's talk about images first. In order to display images, the width-hook
procedure detects images and (if one is encountered) returns a width
explicitly. (Currently that width is always one, which can lead to display
errors, but let's leave that for later.) Remember, whenever the width returned
by this hook is non-false, the display handler will be called to insert the
object. That's perfect: the display handler inserts the image just fine.

One down, one to go.

The stepper needs to detect the beginning of the (let's call it the) redex.
The obvious way to do this is (almost) the right way: the before-printing
handler checks to see whether the element about to be printed is the redex
(by an eq?-test). If so, it sets the beginning of the highlight region.
A corresponding test determines the end of the highlight region. When the
pretty-printing is complete, we highlight the desired region. Fine.

BUT, sometimes we want to highlight things like numbers and symbols; in other
words, non-heap values. For instance, suppose I tell you that the expression
that we're printing is (if #t #t #t) and that you're supposed to be
highlighting the #t. Well, I can't tell which of the #t's you want to
highlight. So this isn't enough information.

To solve this problem, the result of the reconstructor is split up into two
pieces: the reconstructed stuff outside the box, with a special gensym
occurring where the redex should be, and a separate expression containing
the redex. Now at least the displayer has enough information to do its job.

Now, what happens is that when the width-hook runs into the special gensym,
it knows that it must insert the redex. Well, that's fine, but remember,
if this procedure wants to take control of the printing process, it must do
so by returning the width of the printed object, and then this object must
be printed by the display-hook. The problem here is that neither of these
procedures has the faintest idea about line-breaks; that's the
pretty-printer's job. In other words, this solution only works for things
(like numbers, symbols and booleans) which cannot be split across lines. What
do we do?

Well, the solution is ugly. Remember, the only reason we had to resort to
this baroque solution in the first place is that values like numbers, symbols,
and booleans couldn't be identified uniquely by eq?. So we take a
two-pronged approach. For non-confusable values, we insert them in place
of the gensym before doing the printing. For confusable values, we leave
the placeholder in and take control of the printing process manually.

In other words, the _only_ reason this solution works is because of the
chance overlap between confusable values and non-breakable values. To
be more precise, it just so happens that all confusable values are
non-line-breakable.

Lucky.

And Ugly.

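A sketch of the placeholder trick for confusable values, using the hook
parameters of current Racket's racket/pretty library. Whether these match
the four hooks described above one-for-one is an assumption, and the names
placeholder and print-with-redex are invented for the example.

```racket
#lang racket/base
(require racket/pretty racket/port)

;; Hypothetical placeholder marking where the redex belongs.
(define placeholder (gensym 'redex))

;; Print context, taking over from the pretty-printer only at the
;; placeholder. Suitable only for redexes that never need a line break
;; (numbers, symbols, booleans).
(define (print-with-redex context redex)
  (define redex-text (format "~s" redex))
  (parameterize ([pretty-print-size-hook
                  (lambda (v display? port)
                    ;; a non-#f width hands this value to the print hook
                    (and (eq? v placeholder)
                         (string-length redex-text)))]
                 [pretty-print-print-hook
                  (lambda (v display? port)
                    (write-string redex-text port)
                    (void))])
    (with-output-to-string
      (lambda () (pretty-write context)))))
```

Calling (print-with-redex (list 'if placeholder #t #t) #t) prints the redex
in the test position. A real displayer would also record the output position
inside the print hook so that the region can be highlighted afterwards.
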
*****

January, 2000

I'm working on the debugger, now, and in particular extending the annotater to handle all of the Zodiac forms. Let and Letrec turn out to be quite ugly. I'm still a little unsure about certain aspects of variable references, like for example whether or not they stay renamed, or whether they return to their original names.

But that's not what I'm here to talk about. No, the topic of the day is 'floating variables.' A floating variable is one whose value must be captured in a continuation mark even though it doesn't occur free in the expression that the wcm wraps. Let me give an example:

(unit/sig some-sig^
  (import)

  (define a 13)
  (define b (wcm <must grab a> (+ 3 4))))

In this case, the continuation-mark must hold the value of a, even though a does not occur free in the rhs of b's definition. Floating variables are stored in a parameter of annotate/inner. In other words, they propagate downward. Furthermore, they're subject to the same potential elision as all other variables; you only need to store the ones which are also contained in the set tail-bound. Also note that (thank God) Zodiac standardizes names apart, so we don't need to worry about duplications. Also note that floating variables may only be bound-varrefs.

********

Okay, well that doesn't work at all; dynamic scope blows it away completely. For instance, imagine the following unit:

(unit/sig some-sig^
  (import sig-that-includes-c^)

  (define a 13)
  (define b (c)))

Now, during the execution of c, there's no mark on the stack which holds the bindings of a. DUH! I can't believe I didn't think of this before. Okay, one possible solution for this would be to use _different keys_ for the marks, so that a mark on the unit-evaluation-continuation could be retained.

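To illustrate why distinct keys help, here is a sketch in current Racket
showing that two marks installed on the same continuation frame overwrite
each other only when they use the same key. The key names 'stepper and
'unit-bindings are assumptions, and this shows the underlying mechanism
rather than the stepper's actual annotation.

```racket
#lang racket/base

;; All mark values stored under key on the current continuation.
(define (marks-for key)
  (continuation-mark-set->list (current-continuation-marks) key))

;; Same key: the inner mark replaces the outer one, because the inner
;; with-continuation-mark is in tail position with respect to the outer.
(define (same-key)
  (with-continuation-mark 'stepper '((a . 13))
    (with-continuation-mark 'stepper 'inner-expr
      (marks-for 'stepper))))

;; Different keys: both marks survive on the same frame.
(define (different-keys)
  (with-continuation-mark 'unit-bindings '((a . 13))
    (with-continuation-mark 'stepper 'inner-expr
      (list (marks-for 'unit-bindings) (marks-for 'stepper)))))
```

With one key, the binding of a is lost as soon as an inner expression
installs its own mark; with a separate key for the unit's bindings, it stays
available for as long as the unit body's frame is on the stack.
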
*********

Okay, time to do units. Compound units are dead easy. Just wrap them in a wcm that captures all free vars. No problemo. Normal units are more tricky, because of their scoping rules. Here's my canonical translation:

(unit
  (import vars)
  (export vars)

  (define a a-exp)

  b

  (define c c-exp)

  d

  etc.)

... goes to ...

(unit
  (import vars)
  (export vars)

  (wcm blah ; including imported vars
    (begin
      (set! a a-exp)
      b
      (set! c c-exp)
      d))

  (define a a)
  (define c c)
  ...)

************

Well, I still haven't written the code to annotate units, so it's a damn good thing
I wrote down the transformation. I'm here today (thank you very much) to talk about
annotation schemes.

I just (okay, a month ago --- it's now 2000-05-23) folded aries into the stepper. The
upshot of this is that aries now supports two different annotation modes: "cheap-wrap,"
which is what aries used to do, and the regular annotation, used for the algebraic
stepper.

However, I'm beginning to see a need for a third annotation, to be used for
(non-algebraic) debugging. In particular, much of the bulk involved in annotating the
program source is due to the strict algebraic nature of the stepper. For instance,
I'm now annotating lets. The actual step taken by the let is after the evaluation
of all bindings. So we need a break there. However, the body expression is
_also_ going to have a mark and a break around it, for the "result-break" of the
let. I thought I could leave out the outer break, but it doesn't work. Actually,
maybe I could leave out the inner one. Gee whiz. This stuff is really complicated.