diff --git a/features-and-implementation b/features-and-implementation new file mode 100644 index 0000000..b29019a --- /dev/null +++ b/features-and-implementation @@ -0,0 +1,40 @@ +* gc +* automatic generation of mappings Old → New + + +* safe and automatic generation of 1↔1, 1↔many and many↔1 backward references + +** global directives which apply to all nodes types: insert a pointer to the root, insert a backward pointer for every link (i.e. .back brings back to the previous node) + +** the backward references specify the link that should be reversed, as well as a path from there. That way, methods can have a backward pointer to the "call" instructions referencing them, but the backward pointer should not + directly point to the "call" instruction, but instead to the instruction's containing method, accessed with ".method". Maybe some syntactic sugar can allow this to be specified as #:backward-reference method.instructions.*[call] + +* try to think about PHOAS + +** Strong HOAS article: https://www.schoolofhaskell.com/user/edwardk/phoas + + + +* mapping-placeholder = Placeholder type, one for each mapping +* mapping-incomplete-result = (~> mapping) types within the mapping's result type are replaced by (U mapping-placeholder mapping-incomplete-result), and node types are replaced by (U xxx old-node) +* mapping-with-promises-result = node types within the mapping's result type are replaced by node-promise + +* node-promise = (Promise node-with-promises) +* node-with-promises = node types within the node's fields are replaced by node-promise + + + + + + + + + + + +To discuss: +* maybe add a "self" variable, so that a mapping can give to others a placeholder for the node it is currently generating? That wouldn't play well with mappings which generate multiple nodes and/or part of a node. +** self is a opaque "SELF" token, not an actual placeholder + +Minor TODOs: +* If we can guarantee that any two nodes within the same graph are (not (equal? …)), then we can implement a fast negative equality test, in addition to the existing fast equality test (which compares the "raw" field of nodes). diff --git a/features-and-implementation2 b/features-and-implementation2 new file mode 100644 index 0000000..331ce53 --- /dev/null +++ b/features-and-implementation2 @@ -0,0 +1,142 @@ +On verification +=============== + +* Some guarantees can be enforced statically using the type system. + They are in a way proven to be enforeced in the expanded code, + but in order to write verified compilers, it would be necessary to + prove that the type indeed enforces the desired property, for all + possible macro inputs. + + Advantage: these proofs need to be written only once for the graph macro, + and can be relied on by all compilers written using these static guarantees. +* Some guarantees can be enforced statically "by construction", + and the design patterns used to enforce them are guaranteed to be + correctly applied because the code is macro-generated. + In order to write verified compilers, it would be necessary to + prove that the macro-generated code always enforces the desired property. + + Advantage: these proofs need to be written only once for the graph macro, + and can be relied on by all compilers written using these static guarantees. +* Some guarantees can be enforced at run-time. + This allows catching bugs early, and ensures no incorrect output can occur + when these guarantees are broken. It does not however ensure that the compiler + will never break these guarantees. It is therefore still possible to ship a + broken compiler which fails on valid inputs. + + Drawback: it is necessary for every compiler written using such guards to prove + that the guards are never raised. + + +Features +======== + +* GC: + * Group nodes by SCC + * Two kinds of "indices" pointers: + * index within the current strongly-connected component + * direct pointer to a descendant SCC + index within it + * To limit the waste when there is an SCC not containing nodes of a given type, + do not use a fixed tuple of vectors of nodes. Instead, use a "list with possible omissions". + Problem: the size of the fully-expanded type in TR is O(2ⁿ) for these. + Either: + * use casts + * pay the compile-time price of a huge type (not feasible with large N) + * pay the run-time price of bloated components (feasible but costly with large N) + * use a case→ function, which returns the empty vector as a fallback. With pure functions, this could be doable. + * Problem: generating all the possible functions means O(2^n) templates for closures that may be used at + run-time + * Otherwise, generate one closure over 1 array, one closure over 2 arrays, one over 3 arrays, etc. + and also close over some markers indicating to which "real" arrays these correspond. + This means O(2n) space usage worst-case. + Problem again: the construction function for such a case→ while minimizing the number of values closed on + has a large O(2^n) type too. + * typed clojure has typed heterogeneous maps https://github.com/clojure/core.typed/wiki/Types#heterogeneous-maps + * In theory it could be possible to give hints to the GC, + but Racket's GC does not offer that possibility. + +GC Implementation: +* Must be optional +* Initially, all nodes within a single component, all indices are of the "slef + index" kind. +* Use tarjan's algorithm +* group nodes by SCC + Update references to nodes in the same SCC, using just an index + Update references to nodes in "lower" SCCs, using a pointer to the SCC + an index. +* Due to the higher memory cost (+ 1 reference for references to other SCCs), + and the poor set of options for implementation (cast, exponential size of type, or large memory usage) + we will not implement the GC. + + We argue that in most cases it would not be useful, as compilers will normally work on the whole graph, and + not on a subgraph. In the rare case where only part of a graph is needed, applying the graph-level identity + transformation to the nodes of interest would effectively copy them and the nodes reachable from these new roots, + thereby allowing the original graph to be freed. + +* Invariants + * Scope of the invariants: + * Input contracts, types and structural properties + * Output contracts, types and structural properties + * Transformation invariants (relate the input and output graphs) + * Time of verification: + * Run-time + Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check + them at run-time. Implemented as some sorts of contracts. + * Compile-time + * Node types + * Statically-enforced structural invariants at the level of node types + (e.g. disallow cycles in the types, to ensure there are no cycles in the instance) + * Macros can be used to generate code which is known to be correct + e.g. paths + * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes, + e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link" + is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother + and about the potential interactions. + * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check + that the output contains that flag. + * PHOAS + * Specification + * Invariants specified in the graph type + * Transformation invariants specified on the graph creation code + * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code + The graph creation code must enforce all invariants within the type + That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by + construction, or as a guard at the end of the construction. + +* Automatic generation of mappings + * when there is no mapping taking an old node as an input, a mapping is automatically generated. + The mapping simply translates the old node to the new node type, and recursively transforms its fields, + traversing lists etc. + * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type, + and instead calls to the mapping must be explicit (i.e. can't return the old node type). + * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings + to auto-generate. + * We also have a mechanism for auto-calling transformations on input node types + * Possibility to escape this, in order to actually insert a reference to the old graph? + * Some notation in the type, to indicate that it's "protected" in some way? + * Some wrapper indicating that it should merely be unwrapped, not processed? (I think that's better). + +* Structural node equality + * Would be nice to coalesce nodes which are equal? (i.e. hash consing) + so that any two nodes which are equal? within the same graph have the same index. + I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially + costly, in principle. If we cached results, we could achieve a better run-time in practice, and + perhaps a better theoretical complexity if we handled cycles ourselves. + * The general algorithm when there are no unordered sets is deterministic finite automaton minimization + * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization + * We could cache all calls to equal? on two nodes for a limited dynamic scope. + * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq? + * Alpha-equivalence comparisons + (i.e. ignore the actual values of some fields, except for constraining the shape of the graph) + Possibility: create values which are unique within the graph, but are ignored when comparing + nodes from different graphs. + I'm not sure whether plainly erasing the information won't be enough, for a weaker form of alpha-equivalence? + * Alpha-equivalence can't easily be implemented atop equal?, dropping it for now. + It's not extremely useful anyway, as thanks to the full-blown graph representation, we are likely to + discard names early on (keeping them just for printing / debuggin purposes). + + + + + + +* Coloring: the existing graph library for Racket has some coloring algorithms: + http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29 + Maybe we can build a wrapper for those? \ No newline at end of file diff --git a/features-and-implementation3 b/features-and-implementation3 new file mode 100644 index 0000000..5744da7 --- /dev/null +++ b/features-and-implementation3 @@ -0,0 +1,99 @@ +* Invariants + * Scope of the invariants: + * Input contracts, types and structural properties + * Output contracts, types and structural properties + * Transformation invariants (relate the input and output graphs) + * Time of verification: + * Run-time + Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check + them at run-time. Implemented as some sorts of contracts. + * Compile-time + * Node types + * Statically-enforced structural invariants at the level of node types + (e.g. disallow cycles in the types, to ensure there are no cycles in the instance) + * Macros can be used to generate code which is known to be correct + e.g. paths + * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes, + e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link" + is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother + and about the potential interactions. + * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check + that the output contains that flag. Potentially out-of-scope fields in the input do not have the flag. + * PHOAS + * Specification + * Invariants specified in the graph type + * Transformation invariants specified on the graph creation code + * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code + The graph creation code must enforce all invariants within the type + That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by + construction, or as a guard at the end of the construction. + +; The body should produce a function of type (→ (Listof Nodeᵢ) … Boolean) +; The body should also return an indication about which invariants it ensures. Maybe use {~seq #:ensures (invariant-name inariant-arg …)} … + however this does not allow the body to perform some minimal amount of rewriting on the "ensures" options. +(define-syntax/parse (define-graph-contract (name g-descriptor arg …) . body) + #'(define-syntax name + (graph-contract + (λ (g-descriptor arg …) . body)))) + +;; TODO: find a way to translate this to a type, with subtyping for weaker invariants. +;; The type only serves to testify that the invariant was statically enforced or dynamically checked at construction, it does not actually encode the property using the type system. +(struct invariants-wrapper ()) ;; Use a private struct to prevent forging of the invariants aggregated in a case→ (since it is never executed, any non-terminating λ could otherwise be supplied). +(define-for-syntax invariant-introducer + (make-syntax-introducer)) +;; the body should return the syntax for a type, such that less precise invariants are supertypes of that type. +(define-syntax/parse (define-graph-invariant (name g-descriptor arg …) . body) + #'(define-syntax name + (graph-invariant + (λ (g-descriptor arg …) . body)))) + +The type of a graph node with the invariants inv₁ … invₙ on the graph must include an extra dummy field with type +(invariants-wrapper + (case (→ inv₁ inv-arg … #t) + … + (→ invₙ inv-arg … #t))) + +The invariant arguments can be symbols, to indicate node types or field names. + +Concern: invariants on a graph may not have the same semantics if the graph has more or fewer nodes? + +* Automatic generation of mappings + * when there is no mapping taking an old node as an input, a mapping is automatically generated. + The mapping simply translates the old node to the new node type, and recursively transforms its fields, + traversing lists etc. + * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type, + and instead calls to the mapping must be explicit (i.e. can't return the old node type). + * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings + to auto-generate. + * We also have a mechanism for auto-calling transformations on input node types + * Possibility to escape this, in order to actually insert a reference to the old graph? + * Some notation in the type, to indicate that the output should be an "old" node here + * In the rare case where we have an (U (Old nd) nd), in the mapping use some (old v) wrapper + indicating that the returned it should merely be unwrapped, not processed. + +(define-syntax define-graph-type + (syntax-parser + [(_ _name [_nodeᵢ [_fieldᵢⱼ :colon _τᵢⱼ] …] …) + ;; save the graph type metadata + ])) + +(define-syntax define-graph-transformer + (syntax-parser + [(_ _name graph-type) + ;; TODO + ])) + +* Structural node equality + * Would be nice to coalesce nodes which are equal? (i.e. hash consing) + so that any two nodes which are equal? within the same graph have the same index. + I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially + costly, in principle. If we cached results, we could achieve a better run-time in practice, and + perhaps a better theoretical complexity if we handled cycles ourselves. + * The general algorithm when there are no unordered sets is deterministic finite automaton minimization + * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization + * We could cache all calls to equal? on two nodes for a limited dynamic scope. + * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq? + +* Coloring: the existing graph library for Racket has some coloring algorithms: + http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29 + Maybe we can build a wrapper for those? \ No newline at end of file