Subject: [Boost-users] Understanding Threads: Slow on Win32 with DLL
From: Simon Adler (boost_at_[hidden])
Date: 2010-11-30 16:38:03


Hello Boost-Users,

I have a strange problem and also a theory about it. I do not really
understand the reason and hope that somebody can explain the problem.

I am using Boost.Thread to execute some algorithms in parallel. I can post
the source code in detail if required.
I have some work to do which is the same for different data. Concretely: I
compute forces on a deformation model made of geometrical edges. For n
available processors I create n-1 threads, so with, say, 10,000 edges,
3 chunks of 2,500 are processed by boost threads and 2,500 are processed by
the main thread. Because this is a worker crew, I join the threads and am
then finished.
All this is done on Win32 using Visual Studio Express and Boost.

At the moment I use a small wrapper structure:

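struct Wrappy
{
    // Processes `count` consecutive elements starting at `array`.
    void operator()(EdgeProcessor* array, int count)
    {
        // array[i] is an object, not a pointer, so use '.' here.
        for (int i = 0; i < count; i++) array[i].compute();
    }
};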

My main application creates an EdgeProcessor array of 10,000 elements. In
the boost::thread constructor I pass a pointer to the first element the
thread should compute, and count is 2,500 for every thread. As you can see,
everything is straightforward, and it has worked fine in a lot of
situations.
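
The launch scheme described above could be sketched roughly as follows. This
is only an illustration, not the original code: it uses std::thread as a
stand-in for boost::thread so that it is self-contained, the EdgeProcessor
shown is a hypothetical placeholder for the real class in the DLL, and
process_in_parallel is an assumed helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical placeholder for the real EdgeProcessor from the DLL.
struct EdgeProcessor {
    double force = 0.0;
    void compute() { force += 1.0; }  // dummy work, one unit per call
};

// Functor in the spirit of Wrappy: processes `count` elements from `first`.
struct Wrappy {
    void operator()(EdgeProcessor* first, std::size_t count) const {
        for (std::size_t i = 0; i < count; ++i) first[i].compute();
    }
};

// Splits `edges` into `parts` contiguous chunks. parts-1 chunks run on
// worker threads, the last one on the calling thread, then all are joined.
void process_in_parallel(std::vector<EdgeProcessor>& edges, std::size_t parts) {
    assert(parts >= 1);
    const std::size_t chunk = edges.size() / parts;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t + 1 < parts; ++t)
        workers.emplace_back(Wrappy(), edges.data() + t * chunk, chunk);
    // The calling thread takes the final chunk plus any remainder.
    const std::size_t done = (parts - 1) * chunk;
    Wrappy()(edges.data() + done, edges.size() - done);
    for (std::thread& w : workers) w.join();
}
```

With 10,000 elements and parts = 4, this gives three worker threads and the
calling thread 2,500 elements each.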

Now, EdgeProcessor is a class compiled in a separate DLL (Multithreaded DLL)
that makes a lot of calls, some of them recursive. These are just basic C++
calls; no std containers are in use.
If I process all elements in the main application without threads, it takes
0.07 s and I see 100% usage of one CPU (I have an i5, so 25% in total).
If I use threads, it takes 0.19 s, more than twice the time.
If I use just one boost::thread, with no processing in the main thread, it
is again 0.07 s.
If I use all threads, all four CPUs are at 100% usage. They are all working
together but need more than twice the time. Yes, I am sure that every thread
works on just 2,500 elements.
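
For reference, wall-clock timings like the ones above could be taken with a
small helper along these lines. This is a minimal sketch, not the
measurement code actually used; seconds_taken is an assumed name, and it
measures elapsed time only, not CPU usage.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
// steady_clock is used because it never jumps backwards.
template <class Work>
double seconds_taken(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wrapping the single-threaded run and the threaded run in the same helper
keeps the two numbers comparable.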

As soon as more than one thread calls into EdgeProcessor, the task runs very
slowly.
The data for the EdgeProcessors is, by the way, fully parallel: every
EdgeProcessor has its own data, and there is no overlap or synchronisation
at all.

I assume that there is a really big overhead because of the DLL. Maybe
access to the class cannot be done in parallel, or not as fast as in the
static case. Maybe there is a hidden synchronisation.
Do you have an explanation for this?
I use the same approach in other parts of the application without problems.
The only differences are a) the use of a class instance from within a DLL
and b) that compute results in a couple of recursive calls (traversing a
tree).

I have experience with this, but this behavior is strange. I am sorry if
this is a little Win32 / DLL / Visual Studio specific, but I am not sure
where else to ask.

Thanks for your ideas!

Simon


