From: Sean Parent (sparent_at_[hidden])
Date: 2005-09-15 19:08:43
I don't regularly read the boost mailing list - so please reply
directly to me or Mike.
Sean
Begin forwarded message:
> From: "Mike Schuster" <schuster_at_[hidden]>
> Date: September 15, 2005 3:42:26 PM PDT
> Subject: Boost thread library bugs
>
>
> Here is a summary of several bugs I've discovered over the past few
> months in the Boost thread library (version 1_32_0). Sean, please
> forward this email to the Boost thread developers. Thanks.
>
>
>
> 1) On the PowerPC, the sequence of memory write operations executed
> by one processor may be seen by another processor or device in a
> different order. This weak write ordering property implies that
> when modifying a shared resource, the modifying processor must
> execute a sync instruction to make these modifications visible to
> all other processors before releasing the lock. I discovered
> several situations in the Boost thread library where a sync call is
> missing.
>
>
>
> call_once: Immediately after the client function returns, a lock
> variable is set to one. Other processors may see this lock equal to
> one before all memory write operations performed by the client
> function are completed. A call to __sync() should be made
> immediately prior to setting the lock to one.
>
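A minimal sketch of the ordering requirement described for call_once, written with the C++11 memory model rather than the PowerPC __sync() intrinsic. This is not the Boost.Thread source: shared_data, done_flag, client_function, run_once, wait_and_read and run_demo are hypothetical names chosen for illustration. The release fence plays roughly the role the message assigns to __sync(): it orders the client function's writes before the store that sets the flag to one.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names throughout; this is not the Boost.Thread implementation.
static int shared_data = 0;            // state written by the client function
static std::atomic<int> done_flag{0};  // the "lock variable" that is set to one

void client_function() { shared_data = 42; }

void run_once() {
    client_function();
    // Stands in for __sync(): writes made above are ordered before the
    // flag store below, as seen by a thread that observes the flag.
    std::atomic_thread_fence(std::memory_order_release);
    done_flag.store(1, std::memory_order_relaxed);
}

int wait_and_read() {
    // Spin until the flag is observed as set.
    while (done_flag.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();
    }
    // Pairs with the release fence in run_once().
    std::atomic_thread_fence(std::memory_order_acquire);
    return shared_data;
}

int run_demo() {
    int seen = 0;
    std::thread reader([&seen] { seen = wait_and_read(); });
    std::thread writer(run_once);
    writer.join();
    reader.join();
    return seen;  // value of shared_data observed by the reader thread
}
```

Without the release fence, a reader on a weakly ordered machine could observe done_flag equal to one while shared_data still holds its old value, which is the failure described above.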
>
>
> synchronization class constructors (mutex, read_write_mutex,
> condition, etc): Once the class constructor returns, Boost provides
> an API where other threads are free to call the synchronization
> member functions. However, the memory write operations performed by
> the constructor may not have been completed when the member
> functions are executed by a different processor. So a call to
> __sync() should be made immediately prior to returning from the
> constructor.
>
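The constructor case can be sketched the same way. In the C++11 memory model the ordering is usually placed on the handoff that makes the object reachable from another thread, rather than inside the constructor itself, so the sketch below publishes a pointer with a release store. Gate, published, publish, use_from_other_thread and run_publish_demo are hypothetical names; nothing here is a Boost class or API.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for a synchronization object; not a Boost class.
struct Gate {
    int state;
    Gate() : state(7) {}  // plain memory writes performed by the constructor
};

static std::atomic<Gate*> published{nullptr};

void publish(Gate* g) {
    // Release store: the constructor's writes are ordered before the
    // pointer becomes visible to other threads.
    published.store(g, std::memory_order_release);
}

int use_from_other_thread() {
    Gate* g = nullptr;
    // Acquire load pairs with the release store in publish().
    while ((g = published.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return g->state;
}

int run_publish_demo() {
    Gate gate;  // constructed on the calling thread
    int seen = 0;
    std::thread user([&seen] { seen = use_from_other_thread(); });
    publish(&gate);
    user.join();
    published.store(nullptr, std::memory_order_relaxed);  // reset for reuse
    return seen;
}
```

If the pointer were stored with relaxed ordering and no fence, the second thread could in principle see the pointer before the write to state, which mirrors the constructor problem described above.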
>
>
> Note that a similar situation occurs between member function calls.
> However the MacOS synchronization primitives used by Boost do
> perform a sync, so correct operation is guaranteed implicitly as
> long as the last operation performed by a member function involves an
> OS synchronization primitive call. This appears to be the situation
> in many places, but there may be places in the Boost library where
> this requirement is not met. So someone needs to review all of the
> source code for problems of this sort.
>
>
>
> 2) On the PowerPC, I have seen situations where call_once deadlocks
> in the MPRemoteCall function. I have not been able to diagnose the
> problem. Deadlocks occur when call_once is executed by non-main
> threads. I believe I have a solution to the problem which uses a
> completely different implementation similar to that of the Win32
> version and avoids all calls to MPRemoteCall. Maybe I should submit
> this solution to the Boost developers for consideration.
>
>
>
> 3) I discovered a deadlock in read_write_mutex. If either of the
> alternating scheduling policies is used, the implementation will
> deadlock the first reader to arrive when no writers are active. The
> deadlock occurs in the function void
> read_write_mutex_impl<Mutex>::do_read_lock. If m_state == 0 and
> m_num_readers_to_wait == 0 (this holds immediately on
> construction), then an arriving reader will hang indefinitely on
> m_waiting_readers. There are other related situations where the
> BOOST_ASSERT on loop_count fails.
>
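To illustrate the invariant at stake in item 3, here is a minimal reader/writer lock sketch in which a reader waits only while a writer is active or queued, so the first reader on a freshly constructed lock cannot block. It is not Boost's read_write_mutex_impl and it uses a simple writer-preference policy rather than the alternating policies mentioned above; SketchRwLock and first_reader_completes are hypothetical names.

```cpp
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>

// Hypothetical minimal reader/writer lock; not Boost's read_write_mutex_impl.
class SketchRwLock {
public:
    void read_lock() {
        std::unique_lock<std::mutex> lk(m_);
        // Wait only while a writer is active or queued. On a freshly
        // constructed lock the predicate is already true, so the first
        // reader proceeds without blocking.
        readers_cv_.wait(lk, [this] {
            return !writer_active_ && waiting_writers_ == 0;
        });
        ++active_readers_;
    }
    void read_unlock() {
        std::lock_guard<std::mutex> lk(m_);
        if (--active_readers_ == 0) writers_cv_.notify_one();
    }
    void write_lock() {
        std::unique_lock<std::mutex> lk(m_);
        ++waiting_writers_;
        writers_cv_.wait(lk, [this] {
            return !writer_active_ && active_readers_ == 0;
        });
        --waiting_writers_;
        writer_active_ = true;
    }
    void write_unlock() {
        std::lock_guard<std::mutex> lk(m_);
        writer_active_ = false;
        // Prefer a queued writer; otherwise release all waiting readers.
        if (waiting_writers_ > 0) writers_cv_.notify_one();
        else readers_cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable readers_cv_, writers_cv_;
    int active_readers_ = 0;
    int waiting_writers_ = 0;
    bool writer_active_ = false;
};

// Returns true if a lone reader acquires and releases the lock within limit.
bool first_reader_completes(std::chrono::milliseconds limit) {
    SketchRwLock lock;
    std::future<void> done = std::async(std::launch::async, [&lock] {
        lock.read_lock();
        lock.read_unlock();
    });
    return done.wait_for(limit) == std::future_status::ready;
}
```

A check like first_reader_completes is the kind of single-reader smoke test that would expose the hang reported above. Because every wait uses a predicate over the lock state, a reader arriving when nothing is held never sleeps waiting for a notification that no thread will send.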
>
>
> Note I am concerned that such a blatant flaw is present in the
> library. This implies that the library has not been very well
> tested. This is worrisome especially for a thread library where
> threading bugs can be extremely frustrating and hard for users of
> the library to reproduce and diagnose.
>
>
>
> -Mike Schuster
>
>
>
>