Hi,

It is a bit unclear where the "#pragma omp parallel" region ends; there is a closing bracket missing somewhere. If the bracket that closes "#pragma omp parallel" comes after the accumulating for loop, you are in trouble: the for loop will then be run by all four threads. "#pragma omp barrier" only synchronizes the threads; it does not merge them. What you want is to end the "#pragma omp parallel" region before the loop.

This should work:

 

#pragma omp parallel sections
{
#pragma omp section
    {outputVec[0] = doSomething();}
#pragma omp section
    {outputVec[1] = doSomething();}
#pragma omp section
    {outputVec[2] = doSomething();}
#pragma omp section
    {outputVec[3] = doSomething();}
} // all parallel threads end here

for (unsigned i = 0; i < 4; ++i)
    output += outputVec[i];
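
For reference, here is the same structure as a self-contained sketch that compiles on its own. It only illustrates where the parallel region ends; I have not checked whether it changes the ICC behaviour described below. The assumptions are mine: std::vector<double> stands in for ublas::matrix<double>, and doSomething(n, value) is a hypothetical placeholder with an invented signature that just fills a buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for doSomething(): returns an n-element buffer
// filled with 'value'. The real function and its signature are not known.
std::vector<double> doSomething(std::size_t n, double value)
{
    return std::vector<double>(n, value);
}

// Runs four independent sections, then sums the partial results serially.
std::vector<double> accumulateSections(std::size_t n)
{
    // One slot per section, allocated before the parallel region so that
    // no thread ever resizes the outer vector.
    std::vector< std::vector<double> > outputVec(4, std::vector<double>(n, 0.0));

    #pragma omp parallel sections
    {
        #pragma omp section
        { outputVec[0] = doSomething(n, 1.0); }
        #pragma omp section
        { outputVec[1] = doSomething(n, 2.0); }
        #pragma omp section
        { outputVec[2] = doSomething(n, 3.0); }
        #pragma omp section
        { outputVec[3] = doSomething(n, 4.0); }
    } // parallel region ends here (implicit barrier); one thread continues

    // Serial accumulation: executed exactly once, outside the region.
    std::vector<double> output(n, 0.0);
    for (std::size_t i = 0; i < outputVec.size(); ++i)
    {
        assert(outputVec[i].size() == n); // each section returned a full buffer
        for (std::size_t j = 0; j < n; ++j)
            output[j] += outputVec[i][j];
    }
    return output;
}
```

Each section writes only to its own element of outputVec, so the sections do not touch shared data, and the sum is formed after the closing bracket of the parallel region. If the pragmas are ignored (compiling without OpenMP), the sections simply run one after another and the result is the same.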

 

Stian

 


From: ublas-bounces@lists.boost.org [mailto:ublas-bounces@lists.boost.org] On Behalf Of Tan, Sarah (FID)
Sent: 24. september 2007 16:02
To: ublas@lists.boost.org
Subject: [ublas] ublas and ICC

Hi
 
I am wondering whether anyone has experienced stability issues when compiling ublas code with the Intel C++ Compiler for Windows (ICC 9.1).
 
The problem arises when parallel code is generated (Boost library 1.33.1). The parallelization scheme is quite simple: there are four threads for a four-core machine, and each thread writes its output to a ublas matrix. These matrices are then accumulated into a final matrix once all four threads have returned.
 

ublas::matrix<double> output(n1, n2);
output.clear();
std::vector< ublas::matrix<double> > outputVec;
for (unsigned i = 0; i < 4; ++i)
{
      ublas::matrix<double> um(n1, n2);
      um.clear();
      outputVec.push_back(um);
}
#pragma omp parallel
{
#pragma omp sections
{
#pragma omp section
{     outputVec[0] = doSomething();}
#pragma omp section
{     outputVec[1] = doSomething();}
#pragma omp section
{     outputVec[2] = doSomething();}
#pragma omp section
{     outputVec[3] = doSomething();}
}
#pragma omp barrier
for (unsigned i = 0; i < 4; ++i)
      output += outputVec[i];

 
If ICC "Generate parallel code /QOpenmp" option is selected, output is unstable. If ICC "Generate sequential code /QOpenmp-stubs" option is selected, output is STABLE, but of course I no longer enjoy the speed boost because omp directives are ignored.
 
The whole parallelization scheme is perfectly stable if a std::vector-based matrix is used instead of ublas, but that is considerably slower than ublas.
 
How can I get a stable parallelized ublas-based implementation? Any input is much appreciated.
Thanks
Sarah
 
