Subject: Re: [Boost-users] [GIL] Some questions about 2D iteration
From: Tim Day (timday_at_[hidden])
Date: 2010-04-11 19:24:36


On Fri, 2010-04-09 at 00:56 -0400, Brett Gmoser wrote:
> The documentation clearly says that v(x, y) is slower than iteration.

Interestingly, I've just been benchmarking the various GIL image
traversal methods, as I find the coordinate access method very
convenient, but I wanted to get some idea of how much improvement I
should realistically expect from converting code to use iterators.
Results were obtained on standard Debian Lenny on an Intel Core i7, compiled
with -march=core2 -mfpmath=sse -msse4.1 -O3 -DNDEBUG; the main bits of
code are appended to this email.

GIL coord access            1120.9697 Megapixels/s
GIL row iterator access     1293.0218 Megapixels/s
GIL image iterator access     77.9477 Megapixels/s

I was pretty surprised just how efficient the v(x,y) accessor is,
and even more surprised by how inefficient the whole-image iterator is!

Inspecting the assembler, the inner loop of v(x,y) looks like:
.L768:
        xorb (%rcx,%rdx), %sil
        movq %rax, %rdx
        incq %rax
        cmpq %r8, %rax
        jne .L768
which is very lean, but not quite as good as the inner loop of the row
iterator:
.L734:
        xorb (%rdx,%rax), %cl
        incq %rax
        cmpq %rax, %r8
        jg .L734

However, the inner loop of the all-image iterator is:
.L696:
        movzbl (%rcx), %eax
        incq %rdx
        leaq 1(%rcx,%rbp), %rcx
        xorl %eax, %r10d
.L708:
        testq %rdx, %rdx
        jne .L696
        cmpq %rcx, %rbx
        jne .L696
which is a bit more complicated, although it seems remarkable it runs
~15 times slower than the other methods.
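
As a rough illustration of how the three traversal styles differ in per-pixel work, here is a simplified, Boost-free model. It assumes that a whole-image iterator has to track its column and wrap at the end of every row, so each increment carries an extra test; that is a guess at the general shape of the problem, not GIL's actual implementation, and it is not meant to account for the full slowdown measured above. The names gray_view, flat_iterator and the hash_* functions are hypothetical, and width is assumed to be greater than zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an 8-bit grayscale view; not a GIL type.
struct gray_view {
  const std::uint8_t* data;  // first pixel of the first row
  std::ptrdiff_t width;      // pixels per row (assumed > 0)
  std::ptrdiff_t height;     // number of rows
  std::ptrdiff_t stride;     // bytes between row starts (>= width)
};

// Coordinate access: the address is recomputed from (x, y) on every read.
inline std::uint8_t hash_coord(const gray_view& v) {
  std::uint8_t h = 0;
  for (std::ptrdiff_t y = 0; y < v.height; ++y)
    for (std::ptrdiff_t x = 0; x < v.width; ++x)
      h ^= v.data[y * v.stride + x];
  return h;
}

// Row access: fetch the row pointer once, then just step it.
inline std::uint8_t hash_rows(const gray_view& v) {
  std::uint8_t h = 0;
  for (std::ptrdiff_t y = 0; y < v.height; ++y) {
    const std::uint8_t* p = v.data + y * v.stride;
    for (std::ptrdiff_t x = 0; x < v.width; ++x, ++p)
      h ^= *p;
  }
  return h;
}

// Model of a 1D iterator over a 2D view: every increment must test for
// the end of the row and skip any padding bytes.
struct flat_iterator {
  const std::uint8_t* p;
  std::ptrdiff_t x;       // column within the current row
  std::ptrdiff_t width;
  std::ptrdiff_t stride;

  std::uint8_t operator*() const { return *p; }
  flat_iterator& operator++() {
    ++p;
    ++x;
    if (x == width) {     // end of row: wrap to the start of the next one
      x = 0;
      p += stride - width;
    }
    return *this;
  }
  bool operator!=(const flat_iterator& o) const { return p != o.p; }
};

// Whole-view traversal using the wrapping iterator above.
inline std::uint8_t hash_flat(const gray_view& v) {
  flat_iterator it{v.data, 0, v.width, v.stride};
  flat_iterator end{v.data + v.height * v.stride, 0, v.width, v.stride};
  std::uint8_t h = 0;
  for (; it != end; ++it)
    h ^= *it;
  return h;
}
```

All three functions visit the same pixels and return the same hash; only the amount of bookkeeping per step differs, which is the part a compiler may or may not manage to hoist out of the inner loop.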

What I took away from this:
 - Avoid the all-image iterator like the plague (although I don't really
understand how it manages to be quite so spectacularly slow).
 - You need to be pretty desperate for performance to convert working
and basically fast enough coordinate-access based code to iterators.
 - Compilers can do a pretty nice job with GIL classes. I've used other
image classes which leave far more to run-time (e.g. virtual function
calls) and you have to basically "unload" the class information to
pointers and ints and do it all yourself to get performant inner loops.
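
To make the last point concrete, here is a minimal sketch of the kind of image class that leaves pixel access to run-time, next to the "unloaded" loop that copies everything into plain pointers and ints first. The abstract_image and buffer_image classes are hypothetical and are not taken from any particular library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image interface that resolves every pixel read at run time.
struct abstract_image {
  virtual ~abstract_image() = default;
  virtual std::ptrdiff_t width() const = 0;
  virtual std::ptrdiff_t height() const = 0;
  virtual std::uint8_t pixel(std::ptrdiff_t x, std::ptrdiff_t y) const = 0;
};

// One concrete implementation backed by a contiguous, unpadded buffer.
class buffer_image : public abstract_image {
 public:
  buffer_image(std::ptrdiff_t w, std::ptrdiff_t h)
      : w_(w), h_(h), data_(static_cast<std::size_t>(w * h), 0) {}

  std::ptrdiff_t width() const override { return w_; }
  std::ptrdiff_t height() const override { return h_; }
  std::uint8_t pixel(std::ptrdiff_t x, std::ptrdiff_t y) const override {
    return data_[static_cast<std::size_t>(y * w_ + x)];
  }

  std::uint8_t* raw() { return data_.data(); }
  const std::uint8_t* raw() const { return data_.data(); }

 private:
  std::ptrdiff_t w_;
  std::ptrdiff_t h_;
  std::vector<std::uint8_t> data_;
};

// Straightforward loop: virtual calls for each pixel read and loop test.
inline std::uint8_t hash_virtual(const abstract_image& img) {
  std::uint8_t h = 0;
  for (std::ptrdiff_t y = 0; y < img.height(); ++y)
    for (std::ptrdiff_t x = 0; x < img.width(); ++x)
      h ^= img.pixel(x, y);
  return h;
}

// "Unloaded" loop: copy the pointer and dimensions into locals once,
// then run the inner loop on plain pointers and ints.
inline std::uint8_t hash_unloaded(const buffer_image& img) {
  const std::uint8_t* p = img.raw();
  const std::ptrdiff_t n = img.width() * img.height();
  std::uint8_t h = 0;
  for (std::ptrdiff_t i = 0; i < n; ++i)
    h ^= p[i];
  return h;
}
```

Both functions compute the same hash. With an interface like this, the second form has to be written by hand, whereas the GIL loops benchmarked here reached a similarly lean inner loop without that step.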

-----

BOOST_AUTO_TEST_CASE(coord_access_benchmark)
{
  unsigned char hash=0;
  scoped_timer t("GIL coord access",images().size(),"Megapixels");
  for (images_t::const_iterator it=images().begin();it!=images().end();++it)
    {
      const boost::gil::gray8c_view_t v=boost::gil::const_view(**it);
      for (int y=0;y<v.height();++y)
        for (int x=0;x<v.width();++x)
          hash=(hash^v(x,y));
    }
  force_result=hash;
}

BOOST_AUTO_TEST_CASE(row_iterator_access_benchmark)
{
  unsigned char hash=0;
  scoped_timer t("GIL row iterator access",images().size(),"Megapixels");
  for (images_t::const_iterator it=images().begin();it!=images().end();++it)
    {
      const boost::gil::gray8c_view_t v=boost::gil::const_view(**it);
      for (int y=0;y<v.height();++y)
        {
          boost::gil::gray8c_view_t::x_iterator p=v.row_begin(y);
          for (int x=0;x<v.width();++x,++p)
            hash=(hash^*p);
        }
    }
  force_result=hash;
}

BOOST_AUTO_TEST_CASE(image_iterator_access_benchmark)
{
  unsigned char hash=0;
  scoped_timer t("GIL image iterator access",images().size(),"Megapixels");
  for (images_t::const_iterator it=images().begin();it!=images().end();++it)
    {
      const boost::gil::gray8c_view_t v=boost::gil::const_view(**it);
      for (boost::gil::gray8c_view_t::iterator p=v.begin();p!=v.end();++p)
        {
          hash=(hash^*p);
        }
    }
  force_result=hash;
}
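
The scoped_timer helper used by the benchmarks is not included in the post. The following is only a guess at what such a helper might look like, assuming it takes a label, a count of work units and a unit name, and reports units per second when it goes out of scope. The rate_per_second function is a hypothetical addition for illustration.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// Units processed per second; returns 0 when no measurable time elapsed.
inline double rate_per_second(double count, double seconds) {
  return seconds > 0.0 ? count / seconds : 0.0;
}

// Hypothetical RAII timer: starts timing on construction and prints
// "<label> <rate> <units>/s" on destruction.
class scoped_timer {
 public:
  scoped_timer(std::string label, double count, std::string units)
      : label_(std::move(label)),
        count_(count),
        units_(std::move(units)),
        start_(std::chrono::steady_clock::now()) {}

  scoped_timer(const scoped_timer&) = delete;
  scoped_timer& operator=(const scoped_timer&) = delete;

  ~scoped_timer() {
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start_;
    std::cout << label_ << ' '
              << rate_per_second(count_, elapsed.count()) << ' '
              << units_ << "/s\n";
  }

 private:
  std::string label_;
  double count_;
  std::string units_;
  std::chrono::steady_clock::time_point start_;
};
```

Declaring the timer as the first local in a test case, as the benchmarks do, means the measured interval covers everything up to the closing brace of that test case.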

-----
Hope that's of interest to some
Tim

