|
Boost : |
Subject: Re: [boost] Accelerating algorithms with SIMD - Segmented iteratorsand alternatives
From: Jeffrey Lee Hellrung, Jr. (jhellrung_at_[hidden])
Date: 2010-10-15 16:48:09
On 10/15/2010 11:06 AM, Robert Ramey wrote:
> David Abrahams wrote:
>>> By the way, I did run a comparison between a segmented iteration,
>>> with nested for-loops, and a flattened iteration. I was very much
>>> surprised in the runtime difference. It obviously depends on the
>>> operation you do per iteration, but for just incrementing a counter,
>>> the difference was 2 or 3 orders of magnitude. Needless to say, I'm
>>> more convinced of the value of exposing the segmentation of data
>>> structures now.
>
> 2 or 3 orders of magnitude? 100 to 1000 times faster? is that right?
>
> Robert Ramey
No, it's not right; upon looking at the assembly output (which I
should've done prior to making hasty conclusions), I think the original
nested for-loop test I tried was too easy for the compiler to optimize.
I changed the test to actually iterate over a segmented data structure
(I was just doing the looping constructs originally, without any actual
data structure), and the results are much less dramatic: the nested
for-loop iteration is about 20% faster. I think this is in line with
the Austern paper. Here is the code:
#include <iostream>
#include <vector>
#include <boost/timer.hpp>
int main(int argc, char* argv[])
{
typedef std::vector< int > v_type;
typedef v_type::iterator v_it_type;
typedef std::vector< v_type > vv_type;
typedef vv_type::iterator vv_it_type;
vv_type vv(1000);
for(int i = 0; i != 1000; ++i)
vv[i].resize(1000);
{
boost::timer timer;
for(int i = 0; i != 1000; ++i)
for(vv_it_type vv_it = vv.begin(), vv_end = vv.end(); vv_it
!= vv_end; ++vv_it) {
v_type& v = *vv_it;
for(v_it_type v_it = vv_it->begin(), v_end =
vv_it->end(); v_it != v_end; ++v_it)
*v_it = 42;
}
const double elapsed = timer.elapsed();
std::cout << elapsed << std::endl;
}
{
boost::timer timer;
for(int i = 0; i != 1000; ++i) {
vv_it_type vv_it = vv.begin(), vv_end = vv.end();
v_it_type v_it = vv_it->begin(), v_end = vv_it->end();
while(vv_it != vv_end) {
*v_it = 42;
++v_it;
if(v_it == v_end) {
++vv_it;
if(vv_it != vv_end)
v_it = vv_it->begin(), v_end = vv_it->end();
}
}
}
const double elapsed = timer.elapsed();
std::cout << elapsed << std::endl;
}
return 0;
}
Still faster, and if you replace the std::vector's with, e.g.,
boost::iterator_range< boost::counting_iterator< ... > >'s, you might
get a better measure of the actual loop overhead, as you'd avoid the
memory accesses (speculation).
- Jeff
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk