correct some hash-table locking cases

BC did not take a lock for `hash-clear!`. Maybe the intent was that a lock isn't needed from the perspective of `hash-clear!` when it isn't traversing the table, but failing to take a lock before modifying the table can break other operations (that do take a lock) in progress. Furthermore, some iterations that intentionally did not take a lock also did not guard against changes well enough. Various repairs here avoid crashes, and now `hash-clear!` consistently takes a lock. CS had a similar (but more subtle and more limited) instance of the problem in `hash-map` and `hash-for-each`. That problem is corrected (without adding any new lock acquisitions). The existing CS implementation of `equal-hash-code` for hash tables takes a lock more than the BC implementation. A possibly surprising result: when attempting to add a mutable hash table to itself as part of a key, the `hash-set!` can block forever on the hash table's lock, instead of doing something more random as a result of having a mutated key. The documentation now notes that possibility. Related to #3738
2021-03-21 12:39:12 -06:00 · 2021-03-21 12:39:12 -06:00 · ef0ff679d0
commit ef0ff679d0
parent bcd16840da
7 changed files with 142 additions and 37 deletions
--- a/pkgs/racket-doc/scribblings/reference/hashes.scrbl
+++ b/pkgs/racket-doc/scribblings/reference/hashes.scrbl
@ -3,11 +3,6 @@

@title[#:tag "hashtables"]{Hash Tables}

-@(define (concurrency-caveat)
-  @elemref['(caveat "concurrency")]{caveats concerning concurrent modification})
-@(define (mutable-key-caveat)
-  @elemref['(caveat "mutable-keys")]{caveat concerning mutable keys})
-
@(define (see-also-caveats)
   @t{See also the @concurrency-caveat[] and the @mutable-key-caveat[] above.})
@(define (see-also-concurrency-caveat)
@ -67,18 +62,20 @@ when they are @racket[equal?].
 modification:}} A mutable hash table can be manipulated with
@racket[hash-ref], @racket[hash-set!], and @racket[hash-remove!]
 concurrently by multiple threads, and the operations are protected by
-a table-specific semaphore as needed. Three caveats apply, however:
+a table-specific semaphore as needed. Several caveats apply, however:

 @itemize[

  @item{If a thread is terminated while applying @racket[hash-ref],
  @racket[hash-ref-key], @racket[hash-set!], @racket[hash-remove!],
-  @racket[hash-ref!], or @racket[hash-update!] to a hash table that
+  @racket[hash-ref!], @racket[hash-update!], or @racket[hash-clear!]
+  to a hash table that
  uses @racket[equal?] or @racket[eqv?] key comparisons, all current
  and future operations on the hash table may block indefinitely.}

  @item{The @racket[hash-map], @racket[hash-for-each], and @racket[hash-clear!] procedures do
-  not use the table's semaphore to guard the traversal as a whole.
+  not use the table's semaphore to guard the traversal as a whole
+  (if a traversal is needed, in the case of @racket[hash-clear!]).
  Changes by one thread to a hash table can affect the keys and values
  seen by another thread part-way through its traversal of the same
  hash table.}
@ -89,6 +86,13 @@ a table-specific semaphore as needed. Three caveats apply, however:
 of their functionality, which means that the update as a whole is not
 ``atomic.''}

+ @item{Adding a mutable hash table as a key in itself is trouble on
+  the grounds that the key is being mutated (see the caveat below),
+  but it is also a kind of concurrent use of the hash table: computing
+  a hash table's hash code may require waiting on the table's
+  semaphore, but the semaphore is already held for modifying the hash
+  table, so the hash-table addition can block indefinitely.}
+
 ]

@elemtag['(caveat "mutable-keys")]{@bold{Caveat concerning mutable
--- a/pkgs/racket-doc/scribblings/reference/mz.rkt
+++ b/pkgs/racket-doc/scribblings/reference/mz.rkt
@ -175,3 +175,10 @@
 (provide envvar-indexed)
 (define (envvar-indexed s)
  (as-index (envvar s)))
+
+(provide concurrency-caveat
+         mutable-key-caveat)
+@(define (concurrency-caveat)
+  @elemref['(caveat "concurrency")]{caveats concerning concurrent modification})
+@(define (mutable-key-caveat)
+  @elemref['(caveat "mutable-keys")]{caveat concerning mutable keys})
--- a/pkgs/racket-doc/scribblings/reference/sets.scrbl
+++ b/pkgs/racket-doc/scribblings/reference/sets.scrbl
@ -5,6 +5,11 @@
@(define set-eval (make-base-eval))
@examples[#:hidden #:eval set-eval (require racket/set)]

+@(define (hash-set-caveats)
+   @elem{For @tech{hash sets}, see also the @concurrency-caveat[]
+         for hash tables, which applies to
+         hash sets.})
+
 A @deftech{set} represents a collection of distinct elements.  The following
 datatypes are all sets:

@ -354,7 +359,8 @@ Produces a set that includes @racket[v] plus all elements of
 Adds the element @racket[v] to @racket[st].  This operation runs in constant
 time for @tech{hash sets}. Has no fallback.

-}
+@hash-set-caveats[]}
+

@defproc[(set-remove [st generic-set?] [v any/c]) generic-set?]{

@ -368,7 +374,8 @@ Produces a set that includes all elements of @racket[st] except
 Removes the element @racket[v] from @racket[st].  This operation runs in constant
 time for @tech{hash sets}. Has no fallback.

-}
+@hash-set-caveats[]}
+

@defproc[(set-empty? [st generic-set?]) boolean?]{

@ -462,7 +469,8 @@ Removes all elements from @racket[st].
 Supported for any @racket[st] that @impl{implements} @racket[set-remove!] and either
@supp{supports} @racket[set->stream] or @impl{implements} @racket[set-first] and either @racket[set-count] or @racket[set-empty?].

-}
+@hash-set-caveats[]}
+

@defproc[(set-union [st0 generic-set?] [st generic-set?] ...) generic-set?]{

@ -506,7 +514,7 @@ total size of the @racket[st]s.

 Supported for any @racket[st] that @impl{implements} @racket[set-add!] and @supp{supports} @racket[set->stream].

-}
+@hash-set-caveats[]}

@defproc[(set-intersect [st0 generic-set?] [st generic-set?] ...) generic-set?]{

@ -541,7 +549,8 @@ size of @racket[st0].

 Supported for any @racket[st] that @impl{implements} @racket[set-remove!] and @supp{supports} @racket[set->stream].

-}
+@hash-set-caveats[]}
+

@defproc[(set-subtract [st0 generic-set?] [st generic-set?] ...) generic-set?]{

@ -576,7 +585,8 @@ size of @racket[st0].

 Supported for any @racket[st] that @impl{implements} @racket[set-remove!] and @supp{supports} @racket[set->stream].

-}
+@hash-set-caveats[]}
+

@defproc[(set-symmetric-difference [st0 generic-set?] [st generic-set?] ...) generic-set?]{

@ -616,7 +626,8 @@ total size of the @racket[st]s.

 Supported for any @racket[st] that @impl{implements} @racket[set-remove!] and @supp{supports} @racket[set->stream].

-}
+@hash-set-caveats[]}
+

@defproc[(set=? [st generic-set?] [st2 generic-set?]) boolean?]{

--- a/pkgs/racket-test-core/tests/racket/hash.rktl
+++ b/pkgs/racket-test-core/tests/racket/hash.rktl
@ -704,4 +704,60 @@

 ;; ----------------------------------------

+(for ([make-hash (in-list (list make-hash make-weak-hash make-ephemeron-hash))]
+      [hash-clear! (in-list (list hash-clear!
+                                  (lambda (ht)
+                                    (hash-for-each ht (lambda (k v) (hash-remove! ht k))))))]
+      [op (in-list (list
+                    (lambda (ht ht2) (hash-set! ht ht #t))
+                    (lambda (ht ht2) (equal? ht ht2))
+                    (lambda (ht ht2) (equal-hash-code ht))
+                    (lambda (ht ht2) (equal-secondary-hash-code ht))
+                    (lambda (ht ht2) (hash-map ht (lambda (k v) (hash-clear! ht) k)))
+                    (lambda (ht ht2) (hash-for-each ht (lambda (k v) (hash-clear! ht) k)))))])
+  (define amok? #f)
+
+  (define ht (make-hash))
+  (define ht2 (make-hash))
+
+  (struct a (x)
+    #:property prop:equal+hash (list (lambda (a1 a2 eql?)
+                                       (when amok?
+                                         (hash-clear! ht))
+                                       (eql? (a-x a1) (a-x a2)))
+                                     (lambda (a1 hc)
+                                       (when amok?
+                                         (hash-clear! ht))
+                                       (a-x a1))
+                                     (lambda (a2 hc)
+                                       (when amok?
+                                         (hash-clear! ht))
+                                       (a-x a2))))
+
+  (define saved null)
+  (define (save v)
+    (set! saved (cons v saved))
+    v)
+
+  (for ([i (in-range 1000)])
+    (hash-set! ht (save (a i)) #t)
+    (hash-set! ht2 (save (a i)) #t))
+
+  (set! amok? #t)
+
+  ;; This operation can get stuck or raise an exception,
+  ;; but it should not crash
+  (let* ([fail? #f]
+         [t (thread
+             (lambda ()
+               (with-handlers ([exn:fail:contract? void]
+                               [exn:fail? (lambda (x)
+                                            (set! fail? #t)
+                                            (raise x))])
+                 (op ht ht2))))])
+    (sync (system-idle-evt))
+    (test #f `(no-crash? ,op) fail?)))
+
+;; ----------------------------------------
+
 (report-errs)
--- a/racket/src/bc/src/hash.c
+++ b/racket/src/bc/src/hash.c
@ -647,6 +647,9 @@ int scheme_hash_table_equal_rec(Scheme_Hash_Table *t1, Scheme_Object *orig_t1,
 	return 0;
      if (!scheme_recur_equal(val1, val2, eql))
 	return 0;
+
+      /* since we didn't take a lock, the size could have changed */
+      if (i > t1->size) i = t1->size;
    }
  }

@ -1823,6 +1826,8 @@ static uintptr_t equal_hash_key(Scheme_Object *o, uintptr_t k, Hash_Info *hi)
          MZ_MIX(vk);
          k += vk;  /* can't mix k, because the key order shouldn't matter */
          hi->depth = old_depth; /* also needed to avoid order-sensitivity */
+          /* since we didn't take a lock, the size could have changed */
+          if (i > ht->size) i = ht->size;
 	}
      }
      
@ -1904,6 +1909,8 @@ static uintptr_t equal_hash_key(Scheme_Object *o, uintptr_t k, Hash_Info *hi)
            MZ_MIX(vk);
            k += vk; /* can't mix k, because the key order shouldn't matter */
            hi->depth = old_depth; /* also needed to avoid order-sensitivity */
+            /* since we didn't take a lock, the size could have changed */
+            if (i > ht->size) i = ht->size;
 	  }
 	}
      }
@ -2313,6 +2320,8 @@ static uintptr_t equal_hash_key2(Scheme_Object *o, Hash_Info *hi)
 	  k += equal_hash_key2(key, hi);
 	  k += equal_hash_key2(val, hi);
          hi->depth = old_depth;
+          /* since we didn't take a lock, the size could have changed */
+          if (i > ht->size) i = ht->size;
 	}
      }
      
@ -2392,6 +2401,8 @@ static uintptr_t equal_hash_key2(Scheme_Object *o, Hash_Info *hi)
 	    k += equal_hash_key2(key, hi);
            hi->depth = old_depth;
 	  }
+          /* since we didn't take a lock, the size could have changed */
+          if (i > ht->size) i = ht->size;
 	}
      }
    
--- a/racket/src/bc/src/list.c
+++ b/racket/src/bc/src/list.c
@ -2949,9 +2949,15 @@ static Scheme_Object *hash_table_clear_bang(int argc, Scheme_Object *argv[])
  }

  if (SCHEME_BUCKTP(v)) {
-    scheme_clear_bucket_table((Scheme_Bucket_Table *)v);
+    Scheme_Bucket_Table *t = (Scheme_Bucket_Table *)v;
+    if (t->mutex) scheme_wait_sema(t->mutex, 0);
+    scheme_clear_bucket_table(t);
+    if (t->mutex) scheme_post_sema(t->mutex);
  } else{
-    scheme_clear_hash_table((Scheme_Hash_Table *)v);
+    Scheme_Hash_Table *t = (Scheme_Hash_Table *)v;
+    if (t->mutex) scheme_wait_sema(t->mutex, 0);
+    scheme_clear_hash_table(t);
+    if (t->mutex) scheme_post_sema(t->mutex);
  }

  return scheme_void;
@ -3100,6 +3106,8 @@ static Scheme_Object *do_map_hash_table(int argc,
          } else
            _scheme_apply_multi(f, 2, p);
        }
+        /* since we didn't take a lock, the size could have changed */
+        if (i > hash->size) i = hash->size;
      }
    }
  } else if (SCHEME_HASHTP(obj)) {
@ -3132,6 +3140,8 @@ static Scheme_Object *do_map_hash_table(int argc,
          } else
            _scheme_apply_multi(f, 2, p);
        }
+        /* since we didn't take a lock, the size could have changed */
+        if (i > hash->size) i = hash->size;
      }
    }
  } else {
--- a/racket/src/cs/rumble/hash.ss
+++ b/racket/src/cs/rumble/hash.ss
@ -470,26 +470,32 @@
  ;; of calling `hash-iterate-...` for each step
  (let vec-loop ([old-n 0] [try? #t])
    (let ([vec (prepare-iterate! ht old-n)])
-      (let loop ([i old-n])
-        (cond
-         [(= i (#%vector-length vec))
-          (if try?
-              (vec-loop i (> i  old-n))
-              (if map? '() (void)))]
-         [else
-          (let ([p (#%vector-ref vec i)])
-            (let ([key (car p)]
-                  [val (cdr p)])
-              (cond
-               [(or (eq? key #!bwp)
-                    (eq? val #!bwp))
-                (loop (fx+ i 1))]
-               [map?
-                (cons (|#%app| proc key val)
-                      (loop (fx+ i 1)))]
-               [else
-                (|#%app| proc key val)
-                (loop (fx+ i 1))])))])))))
+      (cond
+        [(fx>= old-n (#%vector-length vec))
+         ;; If `old-n` is not zero, the hash table changed while we
+         ;; iterated, which is possible since we haven't taken a lock
+         (if map? '() (void))]
+        [else
+         (let loop ([i old-n])
+           (cond
+             [(fx= i (#%vector-length vec))
+              (if try?
+                  (vec-loop i (fx> i old-n))
+                  (if map? '() (void)))]
+             [else
+              (let ([p (#%vector-ref vec i)])
+                (let ([key (car p)]
+                      [val (cdr p)])
+                  (cond
+                    [(or (eq? key #!bwp)
+                         (eq? val #!bwp))
+                     (loop (fx+ i 1))]
+                    [map?
+                     (cons (|#%app| proc key val)
+                           (loop (fx+ i 1)))]
+                    [else
+                     (|#%app| proc key val)
+                     (loop (fx+ i 1))])))]))]))))

 ;; In sorted hash-table travesals, make some effort to sort the key.
 ;; This attempt is useful for making hash-table traversals more