From: Langeland, Stian (GE Healthcare) (Stian.Langeland_at_[hidden])
Date: 2007-09-27 09:46:03
Hi,
It is a bit unclear where the
#pragma omp parallel
region ends; there is a closing bracket missing somewhere. If the bracket
ending the
#pragma omp parallel
region comes after the collecting for-loop, you are in trouble: the for-loop
will then be run by all four threads at once, each adding into output.
#pragma omp barrier only synchronizes the threads. It does not merge them.
What you want is to end the
#pragma omp parallel
region before the loop.
This should work:
#pragma omp parallel sections
{
    #pragma omp section
    { outputVec[0] = doSomething(); }
    #pragma omp section
    { outputVec[1] = doSomething(); }
    #pragma omp section
    { outputVec[2] = doSomething(); }
    #pragma omp section
    { outputVec[3] = doSomething(); }
} // all parallel threads end here
for (unsigned i = 0; i < 4; ++i)
    output += outputVec[i];
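If the accumulation has to stay inside the parallel region for some reason, an alternative is to wrap it in a single construct so that only one thread runs it. The following is a minimal sketch of that idea, not a tested ublas program: Matrix is a plain std::vector<double> standing in for ublas::matrix<double>, and doSomething(n, k) and addInto() are hypothetical placeholders invented for the illustration.

```cpp
#include <cstddef>
#include <vector>

// Assumption: a flat std::vector stands in for ublas::matrix<double>.
using Matrix = std::vector<double>;

// Hypothetical placeholder for doSomething(): every element is k + 1.
Matrix doSomething(std::size_t n, int k) {
    return Matrix(n, static_cast<double>(k + 1));
}

// Element-wise output += part (hypothetical helper).
void addInto(Matrix& output, const Matrix& part) {
    for (std::size_t j = 0; j < output.size(); ++j)
        output[j] += part[j];
}

// Fills four partial results in parallel sections, then accumulates
// them inside the same parallel region, but on one thread only.
Matrix sumInsideRegion(std::size_t n) {
    std::vector<Matrix> outputVec(4, Matrix(n, 0.0));
    Matrix output(n, 0.0);

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            { outputVec[0] = doSomething(n, 0); }
            #pragma omp section
            { outputVec[1] = doSomething(n, 1); }
            #pragma omp section
            { outputVec[2] = doSomething(n, 2); }
            #pragma omp section
            { outputVec[3] = doSomething(n, 3); }
        } // implicit barrier: all sections are finished here

        // Only one thread executes this block; the others wait at
        // the implicit barrier that ends the single construct.
        #pragma omp single
        {
            for (unsigned i = 0; i < 4; ++i)
                addInto(output, outputVec[i]);
        }
    } // parallel region ends here

    return output;
}
```

With these placeholder values every element of the result is 1 + 2 + 3 + 4. The sketch also compiles without OpenMP enabled, in which case the pragmas are ignored and everything runs sequentially.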
Stian
________________________________
From: ublas-bounces_at_[hidden]
[mailto:ublas-bounces_at_[hidden]] On Behalf Of Tan, Sarah (FID)
Sent: 24 September 2007 16:02
To: ublas_at_[hidden]
Subject: [ublas] ublas and ICC
Hi
I am wondering whether anyone has experience with stability
issues when compiling ublas with the Intel C++ Compiler for Windows (ICC
9.1).
The problem arises when parallel code is generated (Boost library
1.33.1). The parallelization scheme is quite simple. There are four
threads for a four-core machine. Each thread writes its output to a ublas
matrix. These are then accumulated into a final matrix once all four
threads have returned.
ublas::matrix<double> output(n1, n2);
output.clear();
std::vector< ublas::matrix<double> > outputVec;
for (unsigned i = 0; i < 4; ++i)
{
    ublas::matrix<double> um(n1, n2);
    um.clear();
    outputVec.push_back(um);
}
#pragma omp parallel
{
    #pragma omp sections
    {
        #pragma omp section
        { outputVec[0] = doSomething(); }
        #pragma omp section
        { outputVec[1] = doSomething(); }
        #pragma omp section
        { outputVec[2] = doSomething(); }
        #pragma omp section
        { outputVec[3] = doSomething(); }
    }
    #pragma omp barrier
    for (unsigned i = 0; i < 4; ++i)
        output += outputVec[i];
If the ICC "Generate parallel code /QOpenmp" option is selected, the
output is unstable. If the ICC "Generate sequential code /QOpenmp-stubs"
option is selected, the output is stable, but of course I no longer get
the speed boost because the omp directives are ignored.
The whole parallelization scheme is perfectly stable if a
std::vector based matrix is used rather than ublas, but that is
considerably slower than ublas.
How can I get a stable parallelized ublas-based implementation? Any
input is much appreciated.
Thanks
Sarah