Boost logo

Boost :

Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented iteratorsand alternatives
From: Jeffrey Lee Hellrung, Jr. (jhellrung_at_[hidden])
Date: 2010-10-15 16:48:09


On 10/15/2010 11:06 AM, Robert Ramey wrote:
> David Abrahams wrote:
>>> By the way, I did run a comparison between a segmented iteration,
>>> with nested for-loops, and a flattened iteration. I was very much
>>> surprised in the runtime difference. It obviously depends on the
>>> operation you do per iteration, but for just incrementing a counter,
>>> the difference was 2 or 3 orders of magnitude. Needless to say, I'm
>>> more convinced of the value of exposing the segmentation of data
>>> structures now.
>
> 2 or 3 orders of magnitude? 100 to 1000 times faster? is that right?
>
> Robert Ramey

No, it's not right; upon looking at the assembly output (which I
should've done prior to making hasty conclusions), I think the original
nested for-loop test I tried was too easy for the compiler to optimize.
  I changed the test to actually iterate over a segmented data structure
(I was just doing the looping constructs originally, without any actual
data structure), and the results are much less dramatic: the nested
for-loop iteration is about 20% faster. I think this is in line with
the Austern paper. Here is the code:

#include <iostream>
#include <vector>

#include <boost/timer.hpp>

int main(int argc, char* argv[])
{
     typedef std::vector< int > v_type;
     typedef v_type::iterator v_it_type;
     typedef std::vector< v_type > vv_type;
     typedef vv_type::iterator vv_it_type;
     vv_type vv(1000);
     for(int i = 0; i != 1000; ++i)
         vv[i].resize(1000);
     {
         boost::timer timer;
         for(int i = 0; i != 1000; ++i)
             for(vv_it_type vv_it = vv.begin(), vv_end = vv.end(); vv_it
!= vv_end; ++vv_it) {
                 v_type& v = *vv_it;
                 for(v_it_type v_it = vv_it->begin(), v_end =
vv_it->end(); v_it != v_end; ++v_it)
                     *v_it = 42;
             }
         const double elapsed = timer.elapsed();
         std::cout << elapsed << std::endl;
     }
     {
         boost::timer timer;
         for(int i = 0; i != 1000; ++i) {
             vv_it_type vv_it = vv.begin(), vv_end = vv.end();
             v_it_type v_it = vv_it->begin(), v_end = vv_it->end();
             while(vv_it != vv_end) {
                 *v_it = 42;
                 ++v_it;
                 if(v_it == v_end) {
                     ++vv_it;
                     if(vv_it != vv_end)
                         v_it = vv_it->begin(), v_end = vv_it->end();
                 }
             }
         }
         const double elapsed = timer.elapsed();
         std::cout << elapsed << std::endl;
     }
     return 0;
}

Still faster, and if you replace the std::vector's with, e.g.,
boost::iterator_range< boost::counting_iterator< ... > >'s, you might
get a better measure of the actual loop overhead, as you'd avoid the
memory accesses (speculation).

- Jeff


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk