From: Chris Fairles (chris.fairles_at_[hidden])
Date: 2008-05-14 16:29:56


On Wed, May 14, 2008 at 4:07 PM, Phil Endecott
<spam_from_boost_dev_at_[hidden]> wrote:
> James Sutherland wrote:
>> I have been testing thread performance on Linux and Mac. My Linux
>> system has two dual-core processors and my Mac has one dual-core
>> processor. Both are Intel chips.
>>
>> For the code snippet given below, the execution time should ideally
>> decrease as the number of threads increases. However, the opposite
>> trend is observed. For example, using -O3 flags on my Linux desktop
>> produces the following timings:
>> 1 Thread: 0.66 sec
>> 2 Threads: 0.9 sec
>> 3 Threads: 1.2 sec
>> 4 Threads: 1.4 sec
>>
>> I do not have a lot of experience with threads, and was wondering if
>> this result surprises anyone?
>
> Hi James,
>
> Quoting your code out of order:
>> for( int itask=0; itask<nTasks; ++itask ){
>>   boost::thread_group threads;
>>   for( int i=0; i<nThreads; ++i ){
>>     threads.create_thread( MyStruct(itask++ + 100) );
>>   }
>>   threads.join_all();
>> }
>
> Did you really want the ++itask in the first for() ? Isn't it being
> incremented enough in the create_thread line?
>
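To make the double increment concrete, here is a small sketch that swaps the thread creation for a plain counter, so it only shows which tags the loop hands out. The function names are made up for illustration, and the single-increment version is just one possible fix, not necessarily what James intended.

```cpp
#include <cassert>
#include <vector>

// Tags handed out by the loop as posted: itask is bumped once per thread
// created and once more by the outer for().
std::vector<int> tags_as_posted(int nTasks, int nThreads) {
    std::vector<int> tags;
    for (int itask = 0; itask < nTasks; ++itask) {
        for (int i = 0; i < nThreads; ++i) {
            // Stands in for threads.create_thread( MyStruct(itask++ + 100) ).
            tags.push_back(itask++ + 100);
        }
        // threads.join_all() would go here.
    }
    return tags;
}

// One possible fix (an assumption about the intent): only the inner loop
// advances itask, and it also stops at nTasks. Assumes nThreads >= 1.
std::vector<int> tags_single_increment(int nTasks, int nThreads) {
    std::vector<int> tags;
    for (int itask = 0; itask < nTasks; ) {
        for (int i = 0; i < nThreads && itask < nTasks; ++i) {
            tags.push_back(itask++ + 100);
        }
    }
    return tags;
}
```

With a hypothetical nTasks of 8 (the real value isn't shown in the post), the loop as posted hands out 4, 6, 6 and 8 tasks for 1, 2, 3 and 4 threads, so the runs may not be doing the same amount of work. The single-increment version hands out 8 tasks in every case.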
>> struct MyStruct
>> {
>>   explicit MyStruct(const int i) : tag(i) {}
>>   void operator()() const
>>   {
>>     const int n = 100;
>>     std::vector<int> nums(n,0);
>>     for( int j=0; j<1000000; ++j )
>>       for( int i=0; i<n; ++i )
>>         nums[i] = i+tag;
>>   }
>> private:
>>   int tag;
>> };
>
> So sizeof(MyStruct)==sizeof(int) [for the tag]. Now, if you were
> creating the MyStruct objects like this:
>
> MyStruct my_structs[n];
>
> then I would say that they are all sharing a cache line, and that cache
> line is being fought over by the different processors when they read
> tag, and that you should add some padding. But you're not; you're
> passing a temporary MyStruct to create_thread which presumably stores a
> copy of it. How does boost::thread_group store the functors that are
> passed to it? If it is storing them in some sort of array or vector
> then that could still be the problem - and it could be fixed by adding
> padding inside boost.thread, or by copying the functor onto the new
> thread's stack.
>
> Also, I would imagine that the compiler would keep tag in a register.
> What happens if you declare it as const?
>
> I suggest that you try adding some padding and see what happens.
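For anyone who wants to try the padding idea, a minimal sketch follows. It assumes a 64-byte cache line, which is common on x86 but not guaranteed, and it uses C++11 `alignas`; with older compilers a manual `char` pad member would serve the same purpose. `PlainTag`, `PaddedTag` and `stride_in_bytes` are illustrative names, not part of the original code or of Boost.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache line size; 64 bytes is typical on x86 but not guaranteed.
const std::size_t kCacheLine = 64;

// Unpadded functor state: several of these fit in one cache line.
struct PlainTag {
    int tag;
};

// alignas rounds sizeof up to a multiple of the alignment, so neighbouring
// array elements start on different (assumed) cache lines.
struct alignas(64) PaddedTag {
    int tag;
};

static_assert(sizeof(PaddedTag) == kCacheLine,
              "one element per assumed cache line");

// Distance in bytes between two neighbouring elements of an array of T.
template <typename T>
std::size_t stride_in_bytes() {
    T items[2] = {};
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&items[1]) -
        reinterpret_cast<const char*>(&items[0]));
}
```

Calling `stride_in_bytes<PlainTag>()` gives the size of a bare `int`, while `stride_in_bytes<PaddedTag>()` gives 64, so two padded objects in an array can no longer share a line of that size. Whether this changes the timings depends on how the functors are actually stored, which is the open question above.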
>
>
> Phil.

Ran your test on Windows for kicks, MSVC 9 with full optimizations.
Dual-core AMD Athlon X2 4800 (2 x 2.5 GHz, running in 32-bit
compatibility mode).

1 thread = 1.50 s
2 threads = 1.05 s
3 threads = 1.30 s
4 threads = 1.55 s

So, roughly what you'd expect on a dual-core box: two threads beat
one, and going past two threads doesn't buy anything further.

Chris

