Boost logo

Ublas :

From: Vardan Akopian (vakopian_at_[hidden])
Date: 2007-03-07 07:14:23


My first thought was that incrementing
compressed_vector::const_iterator and calling index() on it must be
more costly. But by looking at the code (of a bit older version of
boost), I can't see why it should be so. The way I read the code,
incrementing the iterator just increments a pointer into the
index_data array, and calling index() dereferences that pointer (and
subtracts the base index from it). So here are the differences that I
see

1) pathgatevector.end () is actually not trivial, and is not clear if
it's being fully inlined. So it could be that it's being called on
each iteration. The obvious solution is to move it out of the loop:
compressed_vector<double>::const_iterator it = pathgatevector.begin
(), itEnd = pathgatevector.end ();
In my test, this change reduces the runtime from 1 to 0.76.
2) it.index() does one additional subtraction, which may or may not be
optimized out by the compiler (considering that it's 0 passed as a
template argument, a good compiler should optimize it out).
3) it.index() also does an additional dereference back to the
container, which again should normally be inlined.
4) your loop increments 2 object, while mine increments only 1.
5) it != itEnd, might be more expensive than comparing integers on a
64-bit machine since it ends up comparing pointers. This most likely
is not true for 32 bit machines though.

That's all I see as of now. I think 4) and 5) are not significant, so
after moving end() out of the loop, the main cost comes from
it.index() I think. Maybe someone knowing assembly could look at the
generated code and see why is there such a difference?

BTW, to answer your question about why would
densevector (i) = (it.index ());
be more expensive than
densevector (i) = othervector (it.index ());
i think this is because of the conversion from integer to double in
the former case.

-Vardan

On 3/7/07, Sourabh <sourabh_at_[hidden]> wrote:
>
>
> On Wed, 7 Mar 2007, Vardan Akopian wrote:
>
> > I think this is a perfect case to use index_data as Gunter suggested
> > before. So the loop in your FillOmegas() would become:
> > for (unsigned i = 0, iEnd = pathgatevector.filled(); i < iEnd; ++i) {
> > omegas (i) = othervector(pathgatevector.index_data()[i]);
> > }
> > On my machine, this reduces the runtime from 1 minute to 0.26 (using
> > the arguments 100000 1000 to your program)
> >
> > -Vardan
> >
>
> Can you please tell me what is the reason by which this approach is
> getting performance improvement over the following way ?
>
>
> compressed_vector<double>::const_iterator it = pathgatevector.begin ();
> for (unsigned i = 0; it != pathgatevector.end (); ++it, ++i) {
> omegas (i) = othervector(it.index ());
> }
>
> In both the cases, <omegas> is resized to pathgatevector.nnz(), already.
>
> Can we do a comparative study of these two examples and say why one
> approach is running in less time ?
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/ublas
>