phc-graph/features-and-implementation2

On verification
===============

* Some guarantees can be enforced statically using the type system.
  They are in a way proven to be enforeced in the expanded code,
  but in order to write verified compilers, it would be necessary to
  prove that the type indeed enforces the desired property, for all
  possible macro inputs.

  Advantage: these proofs need to be written only once for the graph macro,
  and can be relied on by all compilers written using these static guarantees.
* Some guarantees can be enforced statically "by construction",
  and the design patterns used to enforce them are guaranteed to be
  correctly applied because the code is macro-generated.
  In order to write verified compilers, it would be necessary to
  prove that the macro-generated code always enforces the desired property.

  Advantage: these proofs need to be written only once for the graph macro,
  and can be relied on by all compilers written using these static guarantees.
* Some guarantees can be enforced at run-time.
  This allows catching bugs early, and ensures no incorrect output can occur
  when these guarantees are broken. It does not however ensure that the compiler
  will never break these guarantees. It is therefore still possible to ship a
  broken compiler which fails on valid inputs.

  Drawback: it is necessary for every compiler written using such guards to prove
  that the guards are never raised.


Features
========

* GC:
  * Group nodes by SCC
  * Two kinds of "indices" pointers:
    * index within the current strongly-connected component
    * direct pointer to a descendant SCC + index within it
  * To limit the waste when there is an SCC not containing nodes of a given type,
    do not use a fixed tuple of vectors of nodes. Instead, use a "list with possible omissions".
    Problem: the size of the fully-expanded type in TR is O(2ⁿ) for these.
    Either:
    * use casts
    * pay the compile-time price of a huge type (not feasible with large N)
    * pay the run-time price of bloated components (feasible but costly with large N)
    * use a case→ function, which returns the empty vector as a fallback. With pure functions, this could be doable.
      * Problem: generating all the possible functions means O(2^n) templates for closures that may be used at
        run-time
      * Otherwise, generate one closure over 1 array, one closure over 2 arrays, one over 3 arrays, etc.
        and also close over some markers indicating to which "real" arrays these correspond.
        This means O(2n) space usage worst-case.
        Problem again: the construction function for such a case→ while minimizing the number of values closed on
        has a large O(2^n) type too.
    * typed clojure has typed heterogeneous maps https://github.com/clojure/core.typed/wiki/Types#heterogeneous-maps
  * In theory it could be possible to give hints to the GC,
    but Racket's GC does not offer that possibility.

GC Implementation:
* Must be optional
* Initially, all nodes within a single component, all indices are of the "slef + index" kind.
* Use tarjan's algorithm
* group nodes by SCC
  Update references to nodes in the same SCC, using just an index
  Update references to nodes in "lower" SCCs, using a pointer to the SCC + an index.
* Due to the higher memory cost (+ 1 reference for references to other SCCs),
  and the poor set of options for implementation (cast, exponential size of type, or large memory usage)
  we will not implement the GC.

  We argue that in most cases it would not be useful, as compilers will normally work on the whole graph, and
  not on a subgraph. In the rare case where only part of a graph is needed, applying the graph-level identity
  transformation to the nodes of interest would effectively copy them and the nodes reachable from these new roots,
  thereby allowing the original graph to be freed.

* Invariants
  * Scope of the invariants:
    * Input contracts, types and structural properties
    * Output contracts, types and structural properties
    * Transformation invariants (relate the input and output graphs)
  * Time of verification:
    * Run-time
      Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check
      them at run-time. Implemented as some sorts of contracts.
    * Compile-time
      * Node types
      * Statically-enforced structural invariants at the level of node types
        (e.g. disallow cycles in the types, to ensure there are no cycles in the instance)
      * Macros can be used to generate code which is known to be correct
        e.g. paths
        * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes,
          e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link"
          is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother
          and about the potential interactions.
      * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check
        that the output contains that flag.
      * PHOAS
  * Specification
    * Invariants specified in the graph type
    * Transformation invariants specified on the graph creation code
    * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code
      The graph creation code must enforce all invariants within the type
      That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by
      construction, or as a guard at the end of the construction.

* Automatic generation of mappings
  * when there is no mapping taking an old node as an input, a mapping is automatically generated.
    The mapping simply translates the old node to the new node type, and recursively transforms its fields,
    traversing lists etc.
  * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type,
    and instead calls to the mapping must be explicit (i.e. can't return the old node type).
  * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings
    to auto-generate.
  * We also have a mechanism for auto-calling transformations on input node types
  * Possibility to escape this, in order to actually insert a reference to the old graph?
    * Some notation in the type, to indicate that it's "protected" in some way?
    * Some wrapper indicating that it should merely be unwrapped, not processed? (I think that's better).

* Structural node equality
  * Would be nice to coalesce nodes which are equal? (i.e. hash consing)
    so that any two nodes which are equal? within the same graph have the same index.
    I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially
    costly, in principle. If we cached results, we could achieve a better run-time in practice, and
    perhaps a better theoretical complexity if we handled cycles ourselves.
    * The general algorithm when there are no unordered sets is deterministic finite automaton minimization
    * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization
    * We could cache all calls to equal? on two nodes for a limited dynamic scope.
    * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq?
  * Alpha-equivalence comparisons
    (i.e. ignore the actual values of some fields, except for constraining the shape of the graph)
    Possibility: create values which are unique within the graph, but are ignored when comparing
                 nodes from different graphs.
    I'm not sure whether plainly erasing the information won't be enough, for a weaker form of alpha-equivalence?
    * Alpha-equivalence can't easily be implemented atop equal?, dropping it for now.
      It's not extremely useful anyway, as thanks to the full-blown graph representation, we are likely to
      discard names early on (keeping them just for printing / debuggin purposes).


* Coloring: the existing graph library for Racket has some coloring algorithms:
  http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29
  Maybe we can build a wrapper for those?