notes

2016-12-16 17:34:06 +01:00 · 2016-12-16 17:34:06 +01:00 · 7bc0b80356
commit 7bc0b80356
parent 60f8915c59
3 changed files with 281 additions and 0 deletions
--- a/40
+++ b/40
@ -0,0 +1,40 @@
 * gc
 * automatic generation of mappings Old → New
 * safe and automatic generation of 1↔1, 1↔many and many↔1 backward references
 ** global directives which apply to all nodes types: insert a pointer to the root, insert a backward pointer for every link (i.e. .back brings back to the previous node)
 ** the backward references specify the link that should be reversed, as well as a path from there. That way, methods can have a backward pointer to the "call" instructions referencing them, but the backward pointer should not
   directly point to the "call" instruction, but instead to the instruction's containing method, accessed with ".method". Maybe some syntactic sugar can allow this to be specified as #:backward-reference method.instructions.*[call]
 * try to think about PHOAS
 ** Strong HOAS article: https://www.schoolofhaskell.com/user/edwardk/phoas
 * mapping-placeholder          = Placeholder type, one for each mapping
 * mapping-incomplete-result    = (~> mapping) types within the mapping's result type are replaced by (U mapping-placeholder mapping-incomplete-result), and node types are replaced by (U xxx old-node)
 * mapping-with-promises-result = node types within the mapping's result type are replaced by node-promise
 * node-promise                 = (Promise node-with-promises)
 * node-with-promises           = node types within the node's fields are replaced by node-promise
 To discuss:
 * maybe add a "self" variable, so that a mapping can give to others a placeholder for the node it is currently generating? That wouldn't play well with mappings which generate multiple nodes and/or part of a node.
 ** self is a opaque "SELF" token, not an actual placeholder
 Minor TODOs:
 * If we can guarantee that any two nodes within the same graph are (not (equal? …)), then we can implement a fast negative equality test, in addition to the existing fast equality test (which compares the "raw" field of nodes).
--- a/142
+++ b/142
@ -0,0 +1,142 @@
 On verification
 ===============
 * Some guarantees can be enforced statically using the type system.
  They are in a way proven to be enforeced in the expanded code,
  but in order to write verified compilers, it would be necessary to
  prove that the type indeed enforces the desired property, for all
  possible macro inputs.
  Advantage: these proofs need to be written only once for the graph macro,
  and can be relied on by all compilers written using these static guarantees.
 * Some guarantees can be enforced statically "by construction",
  and the design patterns used to enforce them are guaranteed to be
  correctly applied because the code is macro-generated.
  In order to write verified compilers, it would be necessary to
  prove that the macro-generated code always enforces the desired property.
  Advantage: these proofs need to be written only once for the graph macro,
  and can be relied on by all compilers written using these static guarantees.
 * Some guarantees can be enforced at run-time.
  This allows catching bugs early, and ensures no incorrect output can occur
  when these guarantees are broken. It does not however ensure that the compiler
  will never break these guarantees. It is therefore still possible to ship a
  broken compiler which fails on valid inputs.
  Drawback: it is necessary for every compiler written using such guards to prove
  that the guards are never raised.
 Features
 ========
 * GC:
  * Group nodes by SCC
  * Two kinds of "indices" pointers:
    * index within the current strongly-connected component
    * direct pointer to a descendant SCC + index within it
  * To limit the waste when there is an SCC not containing nodes of a given type,
    do not use a fixed tuple of vectors of nodes. Instead, use a "list with possible omissions".
    Problem: the size of the fully-expanded type in TR is O(2ⁿ) for these.
    Either:
    * use casts
    * pay the compile-time price of a huge type (not feasible with large N)
    * pay the run-time price of bloated components (feasible but costly with large N)
    * use a case→ function, which returns the empty vector as a fallback. With pure functions, this could be doable.
      * Problem: generating all the possible functions means O(2^n) templates for closures that may be used at
        run-time
      * Otherwise, generate one closure over 1 array, one closure over 2 arrays, one over 3 arrays, etc.
        and also close over some markers indicating to which "real" arrays these correspond.
        This means O(2n) space usage worst-case.
        Problem again: the construction function for such a case→ while minimizing the number of values closed on
        has a large O(2^n) type too.
    * typed clojure has typed heterogeneous maps https://github.com/clojure/core.typed/wiki/Types#heterogeneous-maps
  * In theory it could be possible to give hints to the GC,
    but Racket's GC does not offer that possibility.
 GC Implementation:
 * Must be optional
 * Initially, all nodes within a single component, all indices are of the "slef + index" kind.
 * Use tarjan's algorithm
 * group nodes by SCC
  Update references to nodes in the same SCC, using just an index
  Update references to nodes in "lower" SCCs, using a pointer to the SCC + an index.
 * Due to the higher memory cost (+ 1 reference for references to other SCCs),
  and the poor set of options for implementation (cast, exponential size of type, or large memory usage)
  we will not implement the GC.
  We argue that in most cases it would not be useful, as compilers will normally work on the whole graph, and
  not on a subgraph. In the rare case where only part of a graph is needed, applying the graph-level identity
  transformation to the nodes of interest would effectively copy them and the nodes reachable from these new roots,
  thereby allowing the original graph to be freed.
 * Invariants
  * Scope of the invariants:
    * Input contracts, types and structural properties
    * Output contracts, types and structural properties
    * Transformation invariants (relate the input and output graphs)
  * Time of verification:
    * Run-time
      Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check
      them at run-time. Implemented as some sorts of contracts.
    * Compile-time
      * Node types
      * Statically-enforced structural invariants at the level of node types
        (e.g. disallow cycles in the types, to ensure there are no cycles in the instance)
      * Macros can be used to generate code which is known to be correct
        e.g. paths 
        * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes,
          e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link"
          is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother
          and about the potential interactions.
      * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check
        that the output contains that flag.
      * PHOAS
  * Specification
    * Invariants specified in the graph type
    * Transformation invariants specified on the graph creation code
    * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code
      The graph creation code must enforce all invariants within the type
      That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by
      construction, or as a guard at the end of the construction.
 * Automatic generation of mappings
  * when there is no mapping taking an old node as an input, a mapping is automatically generated.
    The mapping simply translates the old node to the new node type, and recursively transforms its fields,
    traversing lists etc.
  * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type,
    and instead calls to the mapping must be explicit (i.e. can't return the old node type).
  * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings
    to auto-generate.
  * We also have a mechanism for auto-calling transformations on input node types
  * Possibility to escape this, in order to actually insert a reference to the old graph?
    * Some notation in the type, to indicate that it's "protected" in some way?
    * Some wrapper indicating that it should merely be unwrapped, not processed? (I think that's better).
 * Structural node equality
  * Would be nice to coalesce nodes which are equal? (i.e. hash consing)
    so that any two nodes which are equal? within the same graph have the same index.
    I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially
    costly, in principle. If we cached results, we could achieve a better run-time in practice, and
    perhaps a better theoretical complexity if we handled cycles ourselves.
    * The general algorithm when there are no unordered sets is deterministic finite automaton minimization
    * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization
    * We could cache all calls to equal? on two nodes for a limited dynamic scope.
    * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq?
  * Alpha-equivalence comparisons
    (i.e. ignore the actual values of some fields, except for constraining the shape of the graph)
    Possibility: create values which are unique within the graph, but are ignored when comparing
                 nodes from different graphs.
    I'm not sure whether plainly erasing the information won't be enough, for a weaker form of alpha-equivalence?
    * Alpha-equivalence can't easily be implemented atop equal?, dropping it for now.
      It's not extremely useful anyway, as thanks to the full-blown graph representation, we are likely to
      discard names early on (keeping them just for printing / debuggin purposes).
 * Coloring: the existing graph library for Racket has some coloring algorithms:
  http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29
  Maybe we can build a wrapper for those?
--- a/99
+++ b/99
@ -0,0 +1,99 @@
 * Invariants
  * Scope of the invariants:
    * Input contracts, types and structural properties
    * Output contracts, types and structural properties
    * Transformation invariants (relate the input and output graphs)
  * Time of verification:
    * Run-time
      Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check
      them at run-time. Implemented as some sorts of contracts.
    * Compile-time
      * Node types
      * Statically-enforced structural invariants at the level of node types
        (e.g. disallow cycles in the types, to ensure there are no cycles in the instance)
      * Macros can be used to generate code which is known to be correct
        e.g. paths 
        * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes,
          e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link"
          is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother
          and about the potential interactions.
      * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check
        that the output contains that flag. Potentially out-of-scope fields in the input do not have the flag.
      * PHOAS
  * Specification
    * Invariants specified in the graph type
    * Transformation invariants specified on the graph creation code
    * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code
      The graph creation code must enforce all invariants within the type
      That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by
      construction, or as a guard at the end of the construction.
 ; The body should produce a function of type (→ (Listof Nodeᵢ) … Boolean)
 ; The body should also return an indication about which invariants it ensures. Maybe use {~seq #:ensures (invariant-name inariant-arg …)} …
  however this does not allow the body to perform some minimal amount of rewriting on the "ensures" options.
 (define-syntax/parse (define-graph-contract (name g-descriptor arg …) . body)
  #'(define-syntax name
      (graph-contract
        (λ (g-descriptor arg …) . body))))
 ;; TODO: find a way to translate this to a type, with subtyping for weaker invariants.
 ;; The type only serves to testify that the invariant was statically enforced or dynamically checked at construction, it does not actually encode the property using the type system.
 (struct invariants-wrapper ()) ;; Use a private struct to prevent forging of the invariants aggregated in a case→ (since it is never executed, any non-terminating λ could otherwise be supplied).
 (define-for-syntax invariant-introducer
  (make-syntax-introducer))
 ;; the body should return the syntax for a type, such that less precise invariants are supertypes of that type.
 (define-syntax/parse (define-graph-invariant (name g-descriptor arg …) . body)
  #'(define-syntax name
      (graph-invariant
        (λ (g-descriptor arg …) . body))))
 The type of a graph node with the invariants inv₁ … invₙ on the graph must include an extra dummy field with type
 (invariants-wrapper
  (case (→ inv₁ inv-arg … #t)
        …
        (→ invₙ inv-arg … #t)))
 The invariant arguments can be symbols, to indicate node types or field names.
 Concern: invariants on a graph may not have the same semantics if the graph has more or fewer nodes?
 * Automatic generation of mappings
  * when there is no mapping taking an old node as an input, a mapping is automatically generated.
    The mapping simply translates the old node to the new node type, and recursively transforms its fields,
    traversing lists etc.
  * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type,
    and instead calls to the mapping must be explicit (i.e. can't return the old node type).
  * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings
    to auto-generate.
  * We also have a mechanism for auto-calling transformations on input node types
  * Possibility to escape this, in order to actually insert a reference to the old graph?
    * Some notation in the type, to indicate that the output should be an "old" node here
    * In the rare case where we have an (U (Old nd) nd), in the mapping use some (old v) wrapper
      indicating that the returned it should merely be unwrapped, not processed.
 (define-syntax define-graph-type
  (syntax-parser
   [(_ _name [_nodeᵢ [_fieldᵢⱼ :colon _τᵢⱼ] …] …)
    ;; save the graph type metadata
    ]))
 (define-syntax define-graph-transformer
  (syntax-parser
   [(_ _name graph-type)
    ;; TODO
    ]))
 * Structural node equality
  * Would be nice to coalesce nodes which are equal? (i.e. hash consing)
    so that any two nodes which are equal? within the same graph have the same index.
    I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially
    costly, in principle. If we cached results, we could achieve a better run-time in practice, and
    perhaps a better theoretical complexity if we handled cycles ourselves.
    * The general algorithm when there are no unordered sets is deterministic finite automaton minimization
    * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization
    * We could cache all calls to equal? on two nodes for a limited dynamic scope.
    * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq?
 * Coloring: the existing graph library for Racket has some coloring algorithms:
  http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29
  Maybe we can build a wrapper for those?