Boost logo

Boost Users :

From: Ray Whitmer (ray_at_[hidden])
Date: 2006-05-05 11:48:30

> Personally, I read this and said, "eeek!" (Well, not literally. My
> office mates would wonder what was up.) I think any design like
> this is going to be (at best) very non-portable, and (more likely)
> subject to breaking at the worst possible moment (i.e. when
> demonstrating to a customer). Is "relatively" thread-safe really
> good enough here?

Let me make it clear. I do not want relatively thread-safe. I want
completely thread safe.

It seems to me that boost already relies on an atomic counter for
shared_ptr. I do not think the solution would be less portable than
this. I certainly want it to run on Windows and Mac, 32 and 64 bit
as well as major Unix and Linux including certain 32-bit-or-more
embedded situations.

There may be hopefully-rare C++ threading situations that may be
difficult to recover from, and I will simply have to gather more
information on these and see if they can be dealt with.

> As a constructive suggestion, if you want to reduce lock conflicts
> in hash tables, have you considered some form of finer-grained
> locking? You could, for example, have one lock per "slot" in your
> hash table. Computing the hash and finding the slot is thread-safe
> (since it doesn't depend on the state of the hash table), so you
> could then lock only the slot itself while performing list
> maintenance or whatever you need to do to read/write the data.

For the more-common reader of the hash table, it is not my intent to
reduce lock conflicts, but, rather, to generally eliminate them, and
replace them with something lighter-weight, such as the atomic
counter used by shared_ptr, that in best cases may only take a few
instructions to execute. I may have thousands of threads reading
info from the hashtable at once and there may never be a natural time
when there are no readers so I can update the table. The purpose of
the threads is many operations that rely heavily on the hashtables --
every other instruction they are looking up something.

For what it is worth, finding the slot is not thread-safe, because
the table size and pointer can change. List maintenance involves
multiple slots and I do not think it is worthwhile doing finer-
grained locks because of the complexity, deadlock potential, and
likely worse performance it would introduce.

> This would be truly thread-safe but might perform better if you
> have lots of simultaneous readers/writers. I haven't tested this,
> but if you do, I'd be interested to hear the results.

I don't think it even begins to be thread-safe, because it doesn't
begin to deal with reallocation of the array itself, and if it did,
your fine-grained locks appear to become worthless because you need a
high-level lock protecting that allocation assuming you are going to
use locks for readers, which seems like a bad idea from the start for
me. There is a reason shared_ptr does not use locks (at least on the
best-supported platforms), but uses an atomic count.

> PS - I'd be even more worried in Java, btw, since Java opcodes
> should map to CPU operations less well than a compiled binary. Not
> to mention that in Java, the portability assumption becomes really
> critical. Java classes are *supposed* to be portable.

I do not want to get into Java wars here, but Java has to solve the
deallocation problem in a thread-safe manner. When the garbage
collector actually reclaims an allocation, the possibility cannot
exist that someone else still happens to be using it or the language
is badly broken. This is the hard thing to be solved in C++ to make
a thread-safe hashtable that you do not have to deal with in Java.

The deallocation problem is, from everything I have analyzed, the big
problem in C++. When you attempt to deallocate something, for all
you know there could be another thread that just barely loaded its
pointer into shared memory and as soon as it unstalls will want to
increment its reference count, but it has not yet been incremented,
so even a smart pointer using an atomic reference count fails because
the object is sharable in a multithreaded environment, but not the
reference itself.

Here is my first stab at a solution:

Extra data (for each Hashtable):

1. Array of two atomic counts, one active and one inactive.
2. Index that flops between 0 and 1, so there is an active and
inactive count.
3. A mutex that can be used for exclusive writers

Each reader behaves as follows:

1. Fetch index.
2. Increment corresponding atomic count.
3. If fetched index is not current index, decrement atomic count and
loop to step 1. This is rare and safely eliminates rare bad cases

4. Now the reader has free access to the entire hashtable structure,
and is guaranteed by cooperation of the writer that the pointers it
sees are not stale. It must only guarantee that any pointer
referenced in step 4 is either destroyed or it's ref count is
incremented before step 5 -- this includes the pointer to the
allocated resizable array itself.

5. Decrement the same atomic count (of fetched index) incremented in
step 3.

Note that in Java, none of this is necessary for readers because the
language safety assumes any reference is not deallocated.

Each writer behaves as follows:

1. Obtain an exclusive lock on the hashtable. This lock only
excludes other writers, not other readers.
2. Perform changes to the hashtable in a manner that, given atomic
loads or stores of aligned pointers, never leaving the table in a
state that simultaneous readers cannot use it. This is very easy
compared to the allocation issues, and I have been doing this in Java
for years. I would be happy to expand on this. This write operation
may include releasing objects with ref-count 1, if this is supposed
to be a weak hashtable that does not keep unused objects in the index.
3. This is really part of 2: any step in (2) which destroys a
reference a reference keeps a private local copy to make sure it is
not deallocated before all reading threads are finished with it. If
the references are ref-counted, a ref-counted private reference is kept.
4. If there are no private local pointers/references being kept, go
to step 9
5. Wait for inactive (1-index) atomic count to go to 0. This is rare
and safely eliminates rare bad cases.
6. Change index (index = 1 - index).
7. Wait for inactive (1-index) atomic count to go to 0. This means
that all readers who might have a copy of a stale pointer are finished.
8. Release all private local references with a reference count of 1.
Non-ref-counted references always get released, due to single owner.
If this is a weak hashtable, reinsert any references that have a
reference count greater than one since these were found and
incremented while we were waiting for readers to exit.
9. Release lock. Done.

More notes:

Although we don't want to go out of our way to make the writer
inefficient, there is a single writer at any one time with an
exclusive lock. But we do not have to wait for all readers to exit
to perform the update, we only need to wait for current readers to
exit, during which time new readers may enter, which we do not care
about because we have already removed the candidates from public
places in the list. This includes the pointer to the array itself,
which is generally treated like any other allocated pointer, but does
not require a reference count.

There are many details I left out that did not directly pertain to
the allocation problem, but only with atomically updating the rest of
the table so that the reader view is always consistent.

Ray Whitmer

Boost-users list run by williamkempf at, kalb at, bjorn.karlsson at, gregod at, wekempf at