
Now that these are type checked by Typed Racket in 7.5, it's a problem that the documentation doesn't promise that these will be a Fixnum. Switch to using Integer, and change the use of fxxor to match.
413 lines
18 KiB
Racket
413 lines
18 KiB
Racket
#lang hyper-literate typed/racket/base #:no-require-lang #:no-auto-require
|
||
@(require scribble-enhanced/doc
|
||
racket/require
|
||
hyper-literate
|
||
(for-label racket/format
|
||
racket/promise
|
||
racket/list
|
||
type-expander
|
||
(except-in (subtract-in typed/racket/base type-expander)
|
||
values)
|
||
(only-in racket/base values)
|
||
(subtract-in racket/contract typed/racket/base)
|
||
phc-toolkit
|
||
phc-toolkit/untyped-only
|
||
remember))
|
||
@(unless-preexpanding
|
||
(require (for-label (submod ".."))))
|
||
@doc-lib-setup
|
||
|
||
@title[#:style manual-doc-style
|
||
#:tag "node-low-level"
|
||
#:tag-prefix "phc-adt/node-low-level"
|
||
]{Implementation of nodes: printing and equality}
|
||
|
||
@(chunks-toc-prefix
|
||
'("(lib phc-adt/scribblings/phc-adt-implementation.scrbl)"
|
||
"phc-adt/node-low-level"))
|
||
|
||
This section discusses the implementation of @tc[prop:custom-write] and
|
||
@tc[prop:equal+hash] for nodes.
|
||
|
||
@(table-of-contents)
|
||
|
||
@section{Printing nodes}
|
||
|
||
To avoid printing large and confusing swathes of data when a node is displayed,
|
||
we only print its constituents up to a certain depth. The
|
||
@tech[#:doc '(lib "scribblings/guide/guide.scrbl")]{parameter}
|
||
@tc[write-node-depth] controls the depth for printing nested nodes.
|
||
|
||
@CHUNK[<write-node-depth>
|
||
(define write-node-depth (make-parameter 1))]
|
||
|
||
The @tc[make-node-writer] macro expands to a procedure which prints a node with
|
||
the given name and fields. If the @racket[write-node-depth] is @racket[0], then
|
||
the contents of the node are elided, and only its name is printed, so that the
|
||
resulting printed representation is @racket["(node name …)"] with an actual
|
||
ellipsis character.
|
||
|
||
@CHUNK[<node-custom-write>
|
||
(define-syntax/parse (make-node-writer pid name fieldᵢ …)
|
||
#'(λ (self out mode)
|
||
(if (> (write-node-depth) 0)
|
||
(parameterize ([write-node-depth (sub1 (write-node-depth))])
|
||
(fprintf out
|
||
"(node ~a ~a)"
|
||
'name
|
||
(string-join (list <format-field> …) " ")))
|
||
(fprintf out "(node ~a …)" 'name))))]
|
||
|
||
Each field is formatted as @tc[[fieldᵢ valueᵢ]]. Copy-pasting the whole printed
|
||
form will not form a valid expression which would be @tc[equal?] to the
|
||
original. This limitation is deliberate: a node will often refer to many other
|
||
nodes, and a stand-alone representation of such a node would result in a very
|
||
large printed form. Instead, the user should call the @tc[serialize-graph]
|
||
macro, which will produce a complete, canonical @note{The representation is
|
||
canonical so long as unordered sets or hash tables are not used as part of the
|
||
node's contents. In that case, the printed form is canonical modulo the order
|
||
of elements within the set or hash table. Once executed, it will nevertheless
|
||
produce a node which is @racket[equal?] to the original.} and self-contained
|
||
representation of the node.
|
||
|
||
@chunk[<format-field>
|
||
(format "[~a ~a]" 'fieldᵢ (force ((struct-accessor pid fieldᵢ) self)))]
|
||
|
||
@section{Comparing and hashing nodes}
|
||
|
||
Nodes are represented like tagged structures, but contain an extra @tc[raw]
|
||
field. The @tc[raw] field contains a low-level representation of the node, which
|
||
is used to implement node equality. The low-level representation uses the
|
||
@tc[raw-node] Racket @racket[struct]. It contains two fields, @tc[database] and
|
||
@tc[index]. The first is the database of nodes, as created by the graph
|
||
construction macro. It contains one vector of nodes per node type. The second is
|
||
a logical pointer into that database, consisting of the node's type's name,
|
||
represented as a symbol, and an offset within the corresponding vector,
|
||
represented as an @tc[Index].
|
||
|
||
@chunk[<raw-node>
|
||
(struct/props (D I) raw-node ([database : D] [index : I]) #:transparent
|
||
<raw-node-equality>)]
|
||
|
||
A regular with-promises node can have several in-memory representations which
|
||
are not pointer-equal. This is due to the fact that the contents of node fields
|
||
are wrapped with promises, and accessing the node via two distinct paths will
|
||
yield two copies, each with fresh promises. We therefore use the @tc[raw-node]
|
||
as a proxy for pointer equality: we know for sure that two nodes are exactly the
|
||
same if the @tc[database] and @tc[index] is the same for both nodes.
|
||
|
||
@chunk[<raw-node-equality>
|
||
#:property prop:equal+hash
|
||
(list (λ (a b r)
|
||
(and (raw-node? a)
|
||
(raw-node? b)
|
||
(eq? (raw-node-database a) (raw-node-database b))
|
||
(equal? (raw-node-index a) (raw-node-index b))))
|
||
(λ (a r)
|
||
(fxxor (eq-hash-code (raw-node-database a))
|
||
(r (raw-node-index a))))
|
||
(λ (a r)
|
||
(fxxor (eq-hash-code (raw-node-database a))
|
||
(r (raw-node-index a)))))]
|
||
|
||
The following function can then be used to test if two nodes are the same, based
|
||
on the contents of their @tc[raw] field:
|
||
|
||
@chunk[<same-node?>
|
||
(define (same-node? a b)
|
||
(and ((struct-predicate node-id) a)
|
||
((struct-predicate node-id) b)
|
||
(equal? ((struct-accessor node-id raw) a)
|
||
((struct-accessor node-id raw) b))))]
|
||
|
||
To detect cycles within the graph while implementing node equality, we use a
|
||
global hash table tracking which nodes have already been visited.
|
||
|
||
@chunk[<seen-hash-table>
|
||
(define seen-nodes
|
||
: (Parameterof (U #f (HashTable (raw-node Any Any) Any)))
|
||
(make-parameter #f))]
|
||
|
||
The current implementation uses a mutable hash table. It is only initialised
|
||
when @tc[equal?] starts comparing two nodes, so that references to nodes are not
|
||
kept once @tc[equal?] finished running. However, in theory, an immutable hash
|
||
table could be threaded through all the recursive calls to @tc[equal?].
|
||
Unfortunately, the recursive equality function supplied by Racket when
|
||
implementing @tc[prop:equal+hash] does not accept an extra parameter to thread
|
||
state throughout the recursion. It would therefore be necessary to re-implement
|
||
the algorithm used by Racket's @tc[equal?] as described by
|
||
@cite[adams2008scheme-equality] tailored to the comparison of data structures
|
||
with high-level logical cycles. To be correct, such a re-implementation would
|
||
however need to access the @tc[prop:equal+hash] property of other structs, but
|
||
Racket provides no public predicate or accessor for that property. Therefore,
|
||
although it would, in theory, be possible to implement node equality without
|
||
mutable state, Racket's library does not offer the primitives needed to build
|
||
it. We therefore settle on using a global, mutable hash table, which will exist
|
||
only during the execution of @tc[equal?].
|
||
|
||
@chunk[<node-equal+hash>
|
||
(define-syntax/parse
|
||
(make-node-comparer common-id node-id name fieldᵢ …)
|
||
(define-temp-ids "~a/τ" (fieldᵢ …))
|
||
#'(let ()
|
||
<same-node?>
|
||
<find-in-table>
|
||
<node-hash>
|
||
(list <node-equal?>
|
||
<node-equal-hash-code>
|
||
<node-equal-secondary-hash-code>)))]
|
||
|
||
@subsection{Hashing nodes}
|
||
|
||
@tc[equal-hash-code] and @tc[equal-secondary-hash-code] are implemented via
|
||
a single function @tc[node-hash], the only difference being the function used to
|
||
recursively compute the hash of sub-elements.
|
||
|
||
@chunk[<node-equal-hash-code>
|
||
(λ (a rec-equal-hash-code)
|
||
(node-hash a rec-equal-hash-code))]
|
||
|
||
@chunk[<node-equal-secondary-hash-code>
|
||
(λ (a rec-equal-secondary-hash-code)
|
||
(node-hash a rec-equal-secondary-hash-code))]
|
||
|
||
It would be desirable to implement hashing in the following way: if the current
|
||
node is already present in a hash table of seen nodes, but is not @tc[eq?] to
|
||
that copy, then the racket hash function is called on the already-seen node.
|
||
Otherwise, if the node has never been seen, or if it is @tc[eq?] to the seen
|
||
node, the hash code is computed.
|
||
|
||
The problem with this approach is that it introduces an intermediate recursive
|
||
call to Racket's hashing function. When Racket's hashing function is applied to
|
||
a structure with the @tc[prop:equal+hash] property, it does @emph{not}
|
||
return the result of the struct's hash implementation unmodified.
|
||
|
||
For example, the code below implements a struct @tc[s] with no fields, which
|
||
computes its hash code by recursively calling Racket's hashing function on other
|
||
(unique) instances of @tc[s], and returns the constant @tc[1] at a certain
|
||
depth. The custom hashing function does not alter in any way the result returned
|
||
by Racket's hashing function, however we can see that the hash for the same
|
||
instance of @tc[s] depends on the number of recursive calls to Racket's hashing
|
||
function @tc[r]. This simple experiment seems to suggest that Racket adds
|
||
@tc[50] at each step, but this is not something that can be relied upon.
|
||
|
||
@(require scribble/eval)
|
||
@defs+int[
|
||
{(define recursion-depth (make-parameter #f))
|
||
(struct s (x) #:transparent
|
||
#:property prop:equal+hash
|
||
(list (λ (a b r) (error "Not implemented"))
|
||
(λ (a r)
|
||
(if (= 0 (recursion-depth))
|
||
1
|
||
(parameterize ([recursion-depth (sub1 (recursion-depth))])
|
||
(r (s (gensym))))))
|
||
(λ (a r) (error "Not implemented"))))
|
||
(define s-instance (s 'x))}
|
||
(parameterize ([recursion-depth 0])
|
||
(equal-hash-code s-instance))
|
||
(parameterize ([recursion-depth 1])
|
||
(equal-hash-code s-instance))
|
||
(parameterize ([recursion-depth 2])
|
||
(equal-hash-code s-instance))]
|
||
|
||
Since the order of traversal of the nodes is not fixed in the presence of sets
|
||
and hash tables, we need to make sure that the recursion depth at which a node's
|
||
hash is computed is constant. We achieve this by @emph{always} calling Racket's
|
||
hash function on the already-seen node from the hash table, even if was inserted
|
||
just now. To distinguish between the current node and the already-seen node from
|
||
the hash table, we remove the contents of the node's @tc[raw] field, and replace
|
||
them with a special marker.
|
||
|
||
@chunk[<node-hash>
|
||
(: node-hash (∀ (fieldᵢ/τ …)
|
||
(→ (node-id fieldᵢ/τ … Any Any) (→ Any Integer) Integer)))
|
||
(define (node-hash nd racket-recur-hash)
|
||
(if (eq? (raw-node-database ((struct-accessor node-id raw) nd))
|
||
'unique-copy)
|
||
<compute-hash>
|
||
<hash-init-table-and-recur>))]
|
||
|
||
When the node's @tc[raw] field does not indicate @tc['unique-copy], we first
|
||
initialise the hash table if needed, then recursively call
|
||
@tc[racket-recur-hash] on the unique copy of the node:
|
||
|
||
@chunk[<hash-init-table-and-recur>
|
||
(let ([seen-table (or (seen-nodes)
|
||
((inst make-hash (raw-node Any Any) Any)))])
|
||
(parameterize ([seen-nodes seen-table])
|
||
(racket-recur-hash (find-in-table seen-table nd))))]
|
||
|
||
To obtain the unique copy of the node, we look it up in the hash table, creating
|
||
it and adding it to the hash table if the current node is not already present
|
||
there:
|
||
|
||
@chunk[<find-in-table>
|
||
(: find-in-table (∀ (fieldᵢ/τ …)
|
||
(→ (HashTable (raw-node Any Any) Any)
|
||
(node-id fieldᵢ/τ … Any Any)
|
||
Any)))
|
||
(define (find-in-table seen-table nd)
|
||
(hash-ref! seen-table
|
||
((struct-accessor node-id raw) nd)
|
||
(λ () <make-unique-copy-node>)))]
|
||
|
||
To create a unique copy of the node, we create a new instance of the node's
|
||
struct, and copy over all the fields except for the @tc[raw] field, whose value
|
||
becomes @tc['unique-copy].
|
||
|
||
@chunk[<make-unique-copy-node>
|
||
((struct-constructor node-id) ((struct-accessor common-id fieldᵢ) nd)
|
||
…
|
||
(raw-node 'unique-copy 'unique-copy))]
|
||
|
||
The hash code is finally computed by combining the hash code for each field's
|
||
contents (after forcing it). The node's tag name is also hashed, and added to
|
||
the mix.
|
||
|
||
@chunk[<compute-hash>
|
||
(combine-hash-codes
|
||
(racket-recur-hash 'name)
|
||
(racket-recur-hash (force ((struct-accessor common-id fieldᵢ) nd)))
|
||
…)]
|
||
|
||
To combine hash codes, we simply compute their @elem[#:style 'tt]{xor}. Later
|
||
versions of this library may use more sophisticated mechanisms.
|
||
|
||
@chunk[<combine-hash-codes>
|
||
(: combine-hash-codes (→ Integer * Integer))
|
||
(define (combine-hash-codes . fixnums)
|
||
(apply bitwise-xor fixnums))]
|
||
|
||
@subsection{Caching node equality}
|
||
|
||
We provide a mechanism at run-time to cache the result of equality tests
|
||
within a limited dynamic scope. The graph generation procedure can coalesce
|
||
nodes which are @racket[equal?], which means that it needs to perform a
|
||
significant number of equality comparisons, and can therefore benefit from
|
||
caching the result of inner equality tests during the execution of the
|
||
coalescing operation.
|
||
|
||
@chunk[<equality-cache>
|
||
(define equality-cache
|
||
: (Parameterof (U #f (HashTable (Pairof (raw-node Any Any)
|
||
(raw-node Any Any))
|
||
Boolean)))
|
||
(make-parameter #f))]
|
||
|
||
The @racket[with-node-equality-cache] form executes its body while enabling
|
||
caching of the result of direct and recursive calls to @racket[equal?] on
|
||
nodes.
|
||
|
||
@chunk[<with-node-equality-cache>
|
||
(define-syntax-rule (with-node-equality-cache . body)
|
||
(parameterize ([equality-cache (or (equality-cache)
|
||
<make-equality-cache>)])
|
||
. body))]
|
||
|
||
If necessary, a new equality cache is created, unless
|
||
@racket[with-node-equality-cache] is used within the dynamic extent of another
|
||
use of itself.
|
||
|
||
@chunk[<make-equality-cache>
|
||
((inst make-hash (Pairof (raw-node Any Any) (raw-node Any Any)) Any))]
|
||
|
||
When comparing two nodes, we first check whether an equality cache is
|
||
installed. If so, we attempt to query the cache, and memoize the result of the
|
||
comparison when the pair of values is not already in the cache.
|
||
|
||
@chunk[<memoize-equality>
|
||
(λ (result-thunk)
|
||
(let ([e-cache (equality-cache)])
|
||
(if e-cache
|
||
(cond
|
||
[(hash-has-key? e-cache (cons a-raw b-raw))
|
||
(hash-ref e-cache (cons a-raw b-raw))]
|
||
[(hash-has-key? e-cache (cons b-raw a-raw))
|
||
(hash-ref e-cache (cons b-raw a-raw))]
|
||
[else
|
||
(let ([result (result-thunk)])
|
||
(hash-set! e-cache (cons a-raw b-raw) result)
|
||
result)])
|
||
(result-thunk))))]
|
||
|
||
@subsection{Comparing nodes for equality}
|
||
|
||
We implement equality following the same architecture as for hash codes, but
|
||
check that both nodes are already unique copies. In addition, the implementation
|
||
of @tc[equal?] checks that both values are of the node's type.
|
||
|
||
@chunk[<node-equal?>
|
||
(λ (a b racket-recur-equal?)
|
||
(and ((struct-predicate node-id) a)
|
||
((struct-predicate node-id) b)
|
||
(let ([a-raw ((struct-accessor node-id raw) a)]
|
||
[b-raw ((struct-accessor node-id raw) b)])
|
||
(if (and (eq? (raw-node-database a-raw) 'unique-copy)
|
||
(eq? (raw-node-database b-raw) 'unique-copy))
|
||
<compare>
|
||
(or (same-node? a b)
|
||
(<memoize-equality>
|
||
(λ () <equality-init-table-and-recur>)))))))]
|
||
|
||
When either or both of the node's @tc[raw] field do not indicate
|
||
@tc['unique-copy], we first initialise the hash table if needed, then
|
||
recursively call @tc[racket-recur-hash] on the unique copy of each node:
|
||
|
||
@chunk[<equality-init-table-and-recur>
|
||
(let ([seen-table (or (seen-nodes)
|
||
((inst make-hash (raw-node Any Any) Any)))])
|
||
(parameterize ([seen-nodes seen-table])
|
||
(racket-recur-equal? (find-in-table seen-table a)
|
||
(find-in-table seen-table b))))]
|
||
|
||
The nodes are compared pointwise, checking each pair of fields for equality,
|
||
after forcing both:
|
||
|
||
@chunk[<compare>
|
||
(and (racket-recur-equal? (force ((struct-accessor common-id fieldᵢ) a))
|
||
(force ((struct-accessor common-id fieldᵢ) b)))
|
||
…)]
|
||
|
||
@chunk[<*>
|
||
(require racket/promise
|
||
racket/string
|
||
racket/require
|
||
phc-toolkit
|
||
remember
|
||
typed-struct-props
|
||
(for-syntax racket/base
|
||
racket/syntax
|
||
racket/list
|
||
racket/set
|
||
racket/format
|
||
(subtract-in syntax/stx phc-toolkit/untyped)
|
||
syntax/parse
|
||
phc-toolkit/untyped))
|
||
|
||
(provide make-node-comparer
|
||
make-node-writer
|
||
raw-node
|
||
write-node-depth
|
||
with-node-equality-cache)
|
||
|
||
<equality-cache>
|
||
<with-node-equality-cache>
|
||
<seen-hash-table>
|
||
<write-node-depth>
|
||
<node-custom-write>
|
||
<raw-node>
|
||
<combine-hash-codes>
|
||
<node-equal+hash>]
|
||
|
||
@define[adams2008scheme-equality
|
||
(string-append "Efficient nondestructive equality checking for trees"
|
||
" and graphs, Adams and Dybvig, 2008")]
|
||
@bibliography[
|
||
@bib-entry[#:key adams2008scheme-equality
|
||
#:title @list{Efficient nondestructive equality checking for trees
|
||
and graphs in @emph{ACM Sigplan Notices} (Vol. 43, No. 9)
|
||
pp. 179–188}
|
||
#:date "2008"
|
||
#:author "Michael D. Adams and R. Kent Dybvig"
|
||
#:url "http://www.cs.indiana.edu/~dyb/pubs/equal.pdf"]] |