racket/pkgs/racket-doc/scribblings/reference/hashes.scrbl
Matthew Flatt 3fbb384604 add hash-keys-subset?
This function exposes the fast subset operation that is built in for
immutable hash tables (and used by the set-of-scopes implementation).

Also, make the space optimization implicit for `eq?`-based hash tables
that contain only #t values (instead of explicit and only available
internally). It turns out to be easy and efficient to make the
representation automatic, because the HAMT implementation can support
a mixture of nodes with some containing explicit values and some
containing implicit #t values.
2016-06-30 08:22:18 -06:00

634 lines
24 KiB
Racket

#lang scribble/doc
@(require "mz.rkt")
@title[#:tag "hashtables"]{Hash Tables}
@(define (concurrency-caveat)
@elemref['(caveat "concurrency")]{caveats concerning concurrent modification})
@(define (mutable-key-caveat)
@elemref['(caveat "mutable-keys")]{caveat concerning mutable keys})
@(define (see-also-caveats)
@t{See also the @concurrency-caveat[] and the @mutable-key-caveat[] above.})
@(define (see-also-concurrency-caveat)
@t{See also the @concurrency-caveat[] above.})
@(define (see-also-mutable-key-caveat)
@t{See also the @mutable-key-caveat[] above.})
@guideintro["hash-tables"]{hash tables}
A @deftech{hash table} (or simply @deftech{hash}) maps each of its
keys to a single value. For a given hash table, keys are equivalent
via @racket[equal?], @racket[eqv?], or @racket[eq?], and keys are
retained either strongly or weakly (see @secref["weakbox"]). A hash
table is also either mutable or immutable. Immutable hash tables
support effectively constant-time access and update, just like mutable
hash tables; the constant on immutable operations is usually larger,
but the functional nature of immutable hash tables can pay off in
certain algorithms.
@margin-note{Immutable hash tables actually provide @math{O(log N)}
access and update. Since @math{N} is limited by the address space so
that @math{log N} is limited to less than 30 or 62 (depending on the
platform), @math{log N} can be treated reasonably as a constant.}
For @racket[equal?]-based hashing, the built-in hash functions on
@tech{strings}, @tech{pairs}, @tech{lists}, @tech{vectors},
@tech{prefab} or transparent @tech{structures}, @|etc|, take time
proportional to the size of the value. The hash code for a compound
data structure, such as a list or vector, depends on hashing each item
of the container, but the depth of such recursive hashing is
limited (to avoid potential problems with cyclic data). For a
non-@tech{list} @tech{pair}, both @racket[car] and @racket[cdr]
hashing is treated as a deeper hash, but the @racket[cdr] of a
@tech{list} is treated as having the same hashing depth as the list.
A hash table can be used as a two-valued @tech{sequence} (see
@secref["sequences"]). The keys and values of the hash table serve as
elements of the sequence (i.e., each element is a key and its
associated value). If a mapping is added to or removed from the hash
table during iteration, then an iteration step may fail with
@racket[exn:fail:contract], or the iteration may skip or duplicate
keys and values. See also @racket[in-hash], @racket[in-hash-keys],
@racket[in-hash-values], and @racket[in-hash-pairs].
Two hash tables cannot be @racket[equal?] unless they use the same
key-comparison procedure (@racket[equal?], @racket[eqv?], or
@racket[eq?]), both hold keys strongly or weakly, and have the same
mutability.
@elemtag['(caveat "concurrency")]{@bold{Caveats concerning concurrent
modification:}} A mutable hash table can be manipulated with
@racket[hash-ref], @racket[hash-set!], and @racket[hash-remove!]
concurrently by multiple threads, and the operations are protected by
a table-specific semaphore as needed. Three caveats apply, however:
@itemize[
@item{If a thread is terminated while applying @racket[hash-ref],
@racket[hash-set!], @racket[hash-remove!], @racket[hash-ref!],
or @racket[hash-update!] to a hash table that
uses @racket[equal?] or @racket[eqv?] key comparisons, all current
and future operations on the hash table may block indefinitely.}
@item{The @racket[hash-map], @racket[hash-for-each], and @racket[hash-clear!] procedures do
not use the table's semaphore to guard the traversal as a whole.
Changes by one thread to a hash table can affect the keys and values
seen by another thread part-way through its traversal of the same
hash table.}
@item{The @racket[hash-update!] and @racket[hash-ref!] functions
use a table's semaphore
independently for the @racket[hash-ref] and @racket[hash-set!] parts
of their functionality, which means that the update as a whole is not
``atomic.''}
]
@elemtag['(caveat "mutable-keys")]{@bold{Caveat concerning mutable
keys:}} If a key in an @racket[equal?]-based hash table is mutated
(e.g., a key string is modified with @racket[string-set!]), then the
hash table's behavior for insertion and lookup operations becomes
unpredictable.
A literal or printed hash table starts with @litchar{#hash},
@litchar{#hasheqv}, or
@litchar{#hasheq}. @see-read-print["hashtable"]{hash tables}
@defproc[(hash? [v any/c]) boolean?]{
Returns @racket[#t] if @racket[v] is a @tech{hash table}, @racket[#f]
otherwise.}
@defproc[(hash-equal? [hash hash?]) boolean?]{
Returns @racket[#t] if @racket[hash] compares keys with @racket[equal?],
@racket[#f] if it compares with @racket[eq?] or @racket[eqv?].}
@defproc[(hash-eqv? [hash hash?]) boolean?]{
Returns @racket[#t] if @racket[hash] compares keys with @racket[eqv?],
@racket[#f] if it compares with @racket[equal?] or @racket[eq?].}
@defproc[(hash-eq? [hash hash?]) boolean?]{
Returns @racket[#t] if @racket[hash] compares keys with @racket[eq?],
@racket[#f] if it compares with @racket[equal?] or @racket[eqv?].}
@defproc[(hash-weak? [hash hash?]) boolean?]{
Returns @racket[#t] if @racket[hash] retains its keys weakly,
@racket[#f] if it retains keys strongly.}
@deftogether[(
@defproc[(hash [key any/c] [val any/c] ... ...) (and/c hash? hash-equal? immutable?)]
@defproc[(hasheq [key any/c] [val any/c] ... ...) (and/c hash? hash-eq? immutable?)]
@defproc[(hasheqv [key any/c] [val any/c] ... ...) (and/c hash? hash-eqv? immutable?)]
)]{
Creates an immutable hash table with each given @racket[key] mapped to
the following @racket[val]; each @racket[key] must have a @racket[val],
so the total number of arguments to @racket[hash] must be even.
The @racket[hash] procedure creates a table where keys are compared
with @racket[equal?], @racket[hasheq] procedure creates a table where
keys are compared with @racket[eq?], and @racket[hasheqv] procedure
creates a table where keys are compared with @racket[eqv?].
The @racket[key] to @racket[val] mappings are added to the table in
the order that they appear in the argument list, so later mappings can
hide earlier mappings if the @racket[key]s are equal.}
@deftogether[(
@defproc[(make-hash [assocs (listof pair?) null]) (and/c hash? hash-equal?)]
@defproc[(make-hasheqv [assocs (listof pair?) null]) (and/c hash? hash-eqv?)]
@defproc[(make-hasheq [assocs (listof pair?) null]) (and/c hash? hash-eq?)]
)]{
Creates a mutable hash table that holds keys strongly.
The @racket[make-hash] procedure creates a table where keys are
compared with @racket[equal?], @racket[make-hasheq] procedure creates
a table where keys are compared with @racket[eq?], and
@racket[make-hasheqv] procedure creates a table where keys are
compared with @racket[eqv?].
The table is initialized with the content of @racket[assocs]. In each
element of @racket[assocs], the @racket[car] is a key, and the
@racket[cdr] is the corresponding value. The mappings are added to the
table in the order that they appear in @racket[assocs], so later
mappings can hide earlier mappings.
See also @racket[make-custom-hash].}
@deftogether[(
@defproc[(make-weak-hash [assocs (listof pair?) null]) (and/c hash? hash-equal? hash-weak?)]
@defproc[(make-weak-hasheqv [assocs (listof pair?) null]) (and/c hash? hash-eqv? hash-weak?)]
@defproc[(make-weak-hasheq [assocs (listof pair?) null]) (and/c hash? hash-eq? hash-weak?)]
)]{
Like @racket[make-hash], @racket[make-hasheq], and
@racket[make-hasheqv], but creates a mutable hash table that holds
keys weakly.
Beware that values in the table are retained normally. If a value in
the table refers back to its key, then the table will retain the value
and therefore the key; the mapping will never be removed from the
table even if the key becomes otherwise inaccessible. To avoid that
problem, instead of mapping the key to the value, map the key to an
@tech{ephemeron} that pairs the key and value.}
@deftogether[(
@defproc[(make-immutable-hash [assocs (listof pair?) null])
(and/c hash? hash-equal? immutable?)]
@defproc[(make-immutable-hasheqv [assocs (listof pair?) null])
(and/c hash? hash-eqv? immutable?)]
@defproc[(make-immutable-hasheq [assocs (listof pair?) null])
(and/c hash? hash-eq? immutable?)]
)]{
Like @racket[hash], @racket[hasheq], and @racket[hasheqv], but accepts
the key--value mapping in association-list form like
@racket[make-hash], @racket[make-hasheq], and @racket[make-hasheqv].}
@defproc[(hash-set! [hash (and/c hash? (not/c immutable?))]
[key any/c]
[v any/c]) void?]{
Maps @racket[key] to @racket[v] in @racket[hash], overwriting
any existing mapping for @racket[key].
@see-also-caveats[]}
@defproc[(hash-set*! [hash (and/c hash? (not/c immutable?))]
[key any/c]
[v any/c]
...
...) void?]{
Maps each @racket[key] to each @racket[v] in @racket[hash], overwriting
any existing mapping for each @racket[key]. Mappings are added from the left, so
later mappings overwrite earlier mappings.
@see-also-caveats[]}
@defproc[(hash-set [hash (and/c hash? immutable?)]
[key any/c]
[v any/c])
(and/c hash? immutable?)]{
Functionally extends @racket[hash] by mapping @racket[key] to
@racket[v], overwriting any existing mapping for @racket[key], and
returning the extended hash table.
@see-also-mutable-key-caveat[]}
@defproc[(hash-set* [hash (and/c hash? immutable?)]
[key any/c]
[v any/c]
...
...)
(and/c hash? immutable?)]{
Functionally extends @racket[hash] by mapping each @racket[key] to
@racket[v], overwriting any existing mapping for each @racket[key], and
returning the extended hash table. Mappings are added from the left, so
later mappings overwrite earlier mappings.
@see-also-mutable-key-caveat[]}
@defproc[(hash-ref [hash hash?]
[key any/c]
[failure-result (failure-result/c any/c)
(lambda ()
(raise (make-exn:fail:contract ....)))])
any]{
Returns the value for @racket[key] in @racket[hash]. If no value
is found for @racket[key], then @racket[failure-result] determines the
result:
@itemize[
@item{If @racket[failure-result] is a procedure, it is called
(through a tail call) with no arguments to produce the result.}
@item{Otherwise, @racket[failure-result] is returned as the result.}
]
@see-also-caveats[]}
@defproc[(hash-ref! [hash hash?] [key any/c] [to-set any/c])
any]{
Returns the value for @racket[key] in @racket[hash]. If no value is
found for @racket[key], then @racket[to-set] determines the result as
in @racket[hash-ref] (i.e., it is either a thunk that computes a value
or a plain value), and this result is stored in @racket[hash] for the
@racket[key]. (Note that if @racket[to-set] is a thunk, it is not
invoked in tail position.)
@see-also-caveats[]}
@defproc[(hash-has-key? [hash hash?] [key any/c])
boolean?]{
Returns @racket[#t] if @racket[hash] contains a value for the given
@racket[key], @racket[#f] otherwise.}
@defproc[(hash-update! [hash (and/c hash? (not/c immutable?))]
[key any/c]
[updater (any/c . -> . any/c)]
[failure-result (failure-result/c any/c)
(lambda ()
(raise (make-exn:fail:contract ....)))])
void?]{
Composes @racket[hash-ref] and @racket[hash-set!] to update an
existing mapping in @racket[hash], where the optional
@racket[failure-result] argument is used as in @racket[hash-ref] when
no mapping exists for @racket[key] already. See the caveat above about
concurrent updates.
@see-also-caveats[]}
@defproc[(hash-update [hash (and/c hash? immutable?)]
[key any/c]
[updater (any/c . -> . any/c)]
[failure-result (failure-result/c any/c)
(lambda ()
(raise (make-exn:fail:contract ....)))])
(and/c hash? immutable?)]{
Composes @racket[hash-ref] and @racket[hash-set] to functionally
update an existing mapping in @racket[hash], where the optional
@racket[failure-result] argument is used as in @racket[hash-ref] when
no mapping exists for @racket[key] already.
@see-also-mutable-key-caveat[]}
@defproc[(hash-remove! [hash (and/c hash? (not/c immutable?))]
[key any/c])
void?]{
Removes any existing mapping for @racket[key] in @racket[hash].
@see-also-caveats[]}
@defproc[(hash-remove [hash (and/c hash? immutable?)]
[key any/c])
(and/c hash? immutable?)]{
Functionally removes any existing mapping for @racket[key] in
@racket[hash], returning the fresh hash table.
@see-also-mutable-key-caveat[]}
@defproc[(hash-clear! [hash (and/c hash? (not/c immutable?))])
void?]{
Removes all mappings from @racket[hash].
If @racket[hash] is not an @tech{impersonator}, then all mappings are
removed in constant time. If @racket[hash] is an @tech{impersonator},
then each key is removed one-by-one using @racket[hash-remove!].
@see-also-caveats[]}
@defproc[(hash-clear [hash (and/c hash? immutable?)])
(and/c hash? immutable?)]{
Functionally removes all mappings from @racket[hash].
If @racket[hash] is not a @tech{chaperone}, then clearing is
equivalent to creating a new @tech{hash table}, and the operation is
performed in constant time. If @racket[hash] is a @tech{chaperone},
then each key is removed one-by-one using @racket[hash-remove].}
@defproc[(hash-copy-clear [hash hash?]) hash?]{
Produces an empty @tech{hash table} with the same key-comparison
procedure and mutability of @racket[hash].}
@defproc[(hash-map [hash hash?]
[proc (any/c any/c . -> . any/c)]
[try-order? any/c #f])
(listof any/c)]{
Applies the procedure @racket[proc] to each element in
@racket[hash] in an unspecified order, accumulating the results
into a list. The procedure @racket[proc] is called each time with a
key and its value, and the procedure's individual results appear in
order in the result list.
If a hash table is extended with new keys (either through
@racket[proc] or by another thread) while a @racket[hash-map] or
@racket[hash-for-each] traversal is in process, arbitrary key--value
pairs can be dropped or duplicated in the traversal. Key mappings can
be deleted or remapped (by any thread) with no adverse affects; the
change does not affect a traversal if the key has been seen already,
otherwise the traversal skips a deleted key or uses the remapped key's
new value.
If @racket[try-order?] is true, then the order of keys and values
passed to @racket[proc] is normalized under certain circumstances,
such as when the keys are all symbols and @racket[hash] is not an
@tech{impersonator}.
@see-also-concurrency-caveat[]
@history[#:changed "6.3" @elem{Added the @racket[try-order?] argument.}]}
@defproc[(hash-keys [hash hash?])
(listof any/c)]{
Returns a list of the keys of @racket[hash] in an unspecified order.
See @racket[hash-map] for information about modifying @racket[hash]
during @racket[hash-keys]. @see-also-concurrency-caveat[]}
@defproc[(hash-values [hash hash?])
(listof any/c)]{
Returns a list of the values of @racket[hash] in an unspecified order.
See @racket[hash-map] for information about modifying @racket[hash]
during @racket[hash-values]. @see-also-concurrency-caveat[]}
@defproc[(hash->list [hash hash?])
(listof (cons/c any/c any/c))]{
Returns a list of the key--value pairs of @racket[hash] in an unspecified order.
See @racket[hash-map] for information about modifying @racket[hash]
during @racket[hash->list]. @see-also-concurrency-caveat[]}
@defproc[(hash-keys-subset? [hash1 hash?] [hash2 hash?])
boolean?]{
Returns @racket[#t] if the keys of @racket[hash1] are a subset of or
the same as the keys of @racket[hash2]. The hash tables must both use
the same key-comparison function (@racket[equal?], @racket[eqv?], or
@racket[eq?]), otherwise the @exnraise[exn:fail:contract].
Using @racket[hash-keys-subset?] on immutable hash tables can be much
faster than iterating through the keys of @racket[hash1] to make sure
that each is in @racket[hash2].
@history[#:added "6.5.0.8"]}
@defproc[(hash-for-each [hash hash?]
[proc (any/c any/c . -> . any)]
[try-order? any/c #f])
void?]{
Applies @racket[proc] to each element in @racket[hash] (for the
side-effects of @racket[proc]) in an unspecified order. The procedure
@racket[proc] is called each time with a key and its value.
See @racket[hash-map] for information about @racket[try-order?] and
about modifying @racket[hash] within @racket[proc].
@see-also-concurrency-caveat[]
@history[#:changed "6.3" @elem{Added the @racket[try-order?] argument.}]}
@defproc[(hash-count [hash hash?])
exact-nonnegative-integer?]{
Returns the number of keys mapped by @racket[hash]. Unless @racket[hash]
retains keys weakly, the result is computed in
constant time and atomically. If @racket[hash] retains it keys weakly, a
traversal is required to count the keys.}
@defproc[(hash-empty? [hash hash?]) boolean?]{
Equivalent to @racket[(zero? (hash-count hash))].}
@defproc[(hash-iterate-first [hash hash?])
(or/c #f exact-nonnegative-integer?)]{
Returns @racket[#f] if @racket[hash] contains no elements, otherwise
it returns an integer that is an index to the first element in the hash
table; ``first'' refers to an unspecified ordering of the table
elements, and the index values are not necessarily consecutive
integers. For a mutable @racket[hash], this index is guaranteed to
refer to the first item only as long as no items are added to or
removed from @racket[hash].}
@defproc[(hash-iterate-next [hash hash?]
[pos exact-nonnegative-integer?])
(or/c #f exact-nonnegative-integer?)]{
Returns either an integer that is an index to the element in
@racket[hash] after the element indexed by @racket[pos] (which is not
necessarily one more than @racket[pos]) or @racket[#f] if @racket[pos]
refers to the last element in @racket[hash]. If @racket[pos] is not a
valid index, then the @exnraise[exn:fail:contract]. For a mutable
@racket[hash], the result index is guaranteed to refer to its item
only as long as no items are added to or removed from @racket[hash].}
@defproc[(hash-iterate-key [hash hash?]
[pos exact-nonnegative-integer?])
any]{
Returns the key for the element in @racket[hash] at index
@racket[pos]. If @racket[pos] is not a valid index for
@racket[hash], the @exnraise[exn:fail:contract].}
@defproc[(hash-iterate-value [hash hash?]
[pos exact-nonnegative-integer?])
any]{
Returns the value for the element in @racket[hash] at index
@racket[pos]. If @racket[pos] is not a valid index for
@racket[hash], the @exnraise[exn:fail:contract].}
@defproc[(hash-iterate-pair [hash hash?]
[pos exact-nonnegative-integer?])
(cons any any)]{
Returns a pair containing the key and value for the element
in @racket[hash] at index
@racket[pos]. If @racket[pos] is not a valid index for
@racket[hash], the @exnraise[exn:fail:contract].}
@history[#:added "6.4.0.5"]
@defproc[(hash-iterate-key+value [hash hash?]
[pos exact-nonnegative-integer?])
(values any any)]{
Returns the key and value for the element in @racket[hash] at index
@racket[pos]. If @racket[pos] is not a valid index for
@racket[hash], the @exnraise[exn:fail:contract].}
@history[#:added "6.4.0.5"]
@defproc[(hash-copy [hash hash?])
(and/c hash? (not/c immutable?))]{
Returns a mutable hash table with the same mappings, same
key-comparison mode, and same key-holding strength as @racket[hash].}
@defproc[(eq-hash-code [v any/c]) fixnum?]{
Returns a @tech{fixnum}; for any two calls with @racket[eq?] values,
the returned number is the same.
@margin-note{Equal @tech{fixnums} are always @racket[eq?].}}
@defproc[(eqv-hash-code [v any/c]) fixnum?]{
Returns a @tech{fixnum}; for any two calls with @racket[eqv?] values,
the returned number is the same.}
@defproc[(equal-hash-code [v any/c]) fixnum?]{
Returns a @tech{fixnum}; for any two calls with @racket[equal?] values,
the returned number is the same. A hash code is computed even when
@racket[v] contains a cycle through pairs, vectors, boxes, and/or
inspectable structure fields. See also @racket[gen:equal+hash].
For any @racket[v] that could be produced by @racket[read], if
@racket[v2] is produced by @racket[read] for the same input
characters, the @racket[(equal-hash-code v)] is the same as
@racket[(equal-hash-code v2)] --- even if @racket[v] and @racket[v2]
do not exist at the same time (and therefore could not be compared by
calling @racket[equal?]).
@history[#:changed "6.4.0.12" @elem{Strengthened guarantee for @racket[read]able values.}]}
@defproc[(equal-secondary-hash-code [v any/c]) fixnum?]{
Like @racket[equal-hash-code], but computes a secondary value suitable
for use in double hashing.}
@;------------------------------------------------------------------------
@section{Additional Hash Table Functions}
@note-lib-only[racket/hash]
@(require (for-label racket/hash))
@(define the-eval (make-base-eval))
@(the-eval '(require racket/hash))
@defproc[(hash-union [h0 (and/c hash? hash-can-functional-set?)]
[h hash?] ...
[#:combine combine
(-> any/c any/c any/c)
(lambda _ (error 'hash-union ....))]
[#:combine/key combine/key
(-> any/c any/c any/c any/c)
(lambda (k a b) (combine a b))])
(and/c hash? hash-can-functional-set?)]{
Computes the union of @racket[h0] with each hash table @racket[h] by functional
update, adding each element of each @racket[h] to @racket[h0] in turn. For each
key @racket[k] and value @racket[v], if a mapping from @racket[k] to some value
@racket[v0] already exists, it is replaced with a mapping from @racket[k] to
@racket[(combine/key k v0 v)].
@examples[
#:eval the-eval
(hash-union (make-immutable-hash '([1 . one]))
(make-immutable-hash '([2 . two]))
(make-immutable-hash '([3 . three])))
(hash-union (make-immutable-hash '([1 . (one uno)] [2 . (two dos)]))
(make-immutable-hash '([1 . (ein une)] [2 . (zwei deux)]))
#:combine/key (lambda (k v1 v2) (append v1 v2)))
]
}
@defproc[(hash-union! [h0 (and/c hash? hash-mutable?)]
[h hash?] ...
[#:combine combine
(-> any/c any/c any/c)
(lambda _ (error 'hash-union ....))]
[#:combine/key combine/key
(-> any/c any/c any/c any/c)
(lambda (k a b) (combine a b))])
void?]{
Computes the union of @racket[h0] with each hash table @racket[h] by mutable
update, adding each element of each @racket[h] to @racket[h0] in turn. For each
key @racket[k] and value @racket[v], if a mapping from @racket[k] to some value
@racket[v0] already exists, it is replaced with a mapping from @racket[k] to
@racket[(combine/key k v0 v)].
@examples[
#:eval the-eval
(define h (make-hash))
h
(hash-union! h (make-immutable-hash '([1 . (one uno)] [2 . (two dos)])))
h
(hash-union! h
(make-immutable-hash '([1 . (ein une)] [2 . (zwei deux)]))
#:combine/key (lambda (k v1 v2) (append v1 v2)))
h
]
}
@(close-eval the-eval)