Hello all,
Since this issue was raised on this list, I ran some tests with uBlas and OpenMP. For anyone interested in adding OpenMP functionality to uBlas, I think the findings below will be a good starting point.

uBlas forwards the assignment operator to a set of assignment functions, so what needs to change are the for loops in those functions. In this test I added an OpenMP directive at line 264 of uBlas' vector_assign.hpp (rev. 70666):

From:

#ifndef BOOST_UBLAS_USE_DUFF_DEVICE
        for (size_type i = 0; i < size; ++ i)
            functor_type::apply (v (i), e () (i));
#else

To:

#ifndef BOOST_UBLAS_USE_DUFF_DEVICE
        #pragma omp parallel for
        for (size_type i = 0; i < size; ++ i)
            functor_type::apply (v (i), e () (i));
#else
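
For anyone who wants to try the pattern outside uBlas first, here is a minimal stand-alone sketch of the same idea: an element-wise assignment loop with the same directive in front of it. The function name assign_sum and the use of std::vector are illustrative assumptions, not uBlas code. The index is signed because OpenMP versions before 3.0 require a signed loop variable; the uBlas loop above uses the unsigned size_type, which needs OpenMP 3.0 (supported by GCC 4.4 and later).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of uBlas): c[i] = a[i] + b[i].
// Each iteration writes a distinct element, so there are no
// cross-iteration dependencies and the loop can be split across threads.
void assign_sum(std::vector<double>& c,
                const std::vector<double>& a,
                const std::vector<double>& b)
{
    assert(a.size() == b.size() && c.size() == a.size());
    const long n = static_cast<long>(c.size());
    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

The result should be identical with and without -fopenmp, which makes this a convenient sanity check before patching the uBlas header itself.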

The test program was:

#include <boost/numeric/ublas/vector.hpp>
#include <iostream>
int main() {
    const std::size_t N=20000000;
    boost::numeric::ublas::vector<double> a(N), b(N), c(N);
    for (std::size_t i=0; i!=500; i++)
        c=a+b;
    std::cout << c(1) << std::endl;
    return 0;
}

It was compiled with the following gcc flags (among the usual ones):
 -O2 -DNDEBUG -fopenmp -msse3 -mfpmath=sse -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=1
and linked with (among the usual ones):
-fopenmp

Before running the test, set the OMP_NUM_THREADS environment variable to the number of threads you want, e.g. for 4 cores on an Ubuntu machine (bash):
export OMP_NUM_THREADS=4
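
To confirm that the setting is picked up, a small check like the following can be called at the start of main. This is only a sketch: available_threads is a made-up helper name, while omp_get_max_threads() is the standard OpenMP runtime call. The _OPENMP guard lets the same file compile without -fopenmp.

```cpp
#ifdef _OPENMP
#include <omp.h>   // OpenMP runtime API, present when building with -fopenmp
#endif

// Hypothetical helper: upper bound on the number of threads OpenMP would
// use for the next parallel region, or 1 if built without OpenMP.
int available_threads()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

With the export above and an OpenMP-enabled build, printing available_threads() should report 4.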

I didn't run any benchmarks, but top shows 400% CPU utilization on a 4-core i7, so all 4 cores are working.
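
Note that CPU utilization in top only shows that the threads are busy, not that the run is faster. A wall-clock measurement would settle that. Below is a rough sketch of such a measurement using plain std::vector and std::chrono (C++11); the helper name time_vector_add is hypothetical and the loop merely stands in for the uBlas assignment. Since a vector addition does very little arithmetic per byte loaded, the speedup may turn out well below the number of cores.

```cpp
#include <chrono>
#include <vector>

// Hypothetical helper: wall-clock seconds spent on `repeats` passes of
// c = a + b over vectors of length n (n must be at least 1).
double time_vector_add(long n, int repeats)
{
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        // Without -fopenmp the pragma is ignored and the loop runs serially.
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile double sink = c[n / 2];   // read one element so the result is used
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

Running it once with OMP_NUM_THREADS=1 and once with OMP_NUM_THREADS=4 gives a first estimate of the actual speedup.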

Furthermore, inspecting the assembly code shows that the above compilation produces packed SSE instructions for the addition (GCC 4.4.3):
    movsd    (%rbp,%rax), %xmm1
    movsd    (%rbx,%rax), %xmm2
    movhpd    8(%rbp,%rax), %xmm1
    movhpd    8(%rbx,%rax), %xmm2
    movapd    %xmm1, %xmm0
    addpd    %xmm2, %xmm0
    movapd    %xmm0, (%r12,%rax)
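
For readers less used to reading assembly: the sequence loads two doubles from each operand, adds both pairs with a single addpd, and stores the two results. Written by hand with SSE2 intrinsics, it corresponds roughly to the sketch below. The function name add_pairs is hypothetical, and this is not what uBlas or GCC generate literally. Unlike the movapd store above, the sketch uses unaligned stores so that it makes no alignment assumptions.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>   // SSE2 intrinsics
#endif

// Hypothetical helper: c[i] = a[i] + b[i] for i in [0, n).
void add_pairs(double* c, const double* a, const double* b, std::size_t n)
{
    std::size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= n; i += 2) {
        __m128d x = _mm_loadu_pd(a + i);          // a[i], a[i+1]
        __m128d y = _mm_loadu_pd(b + i);          // b[i], b[i+1]
        _mm_storeu_pd(c + i, _mm_add_pd(x, y));   // addpd: two sums at once
    }
#endif
    for (; i < n; ++i)   // leftover element, or every element without SSE2
        c[i] = a[i] + b[i];
}
```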

I hope this gives a clue to someone with some free time to make some uBlas patches. Please let me know if these instructions are enough to get a basic parallel example running in uBlas.

Best,
Nasos