
Subject: Re: [boost] Geometry and spatial indexes, my opinion
From: Brandon Kohn (blkohn_at_[hidden])
Date: 2008-10-11 09:47:25

From: "Patrick Mihelich" <patrick.mihelich_at_[hidden]>
Sent: Friday, October 10, 2008 10:55 PM
To: <boost_at_[hidden]>
Subject: Re: [boost] Geometry and spatial indexes, my opinion

> On Thu, Oct 9, 2008 at 7:25 AM, Brandon Kohn <blkohn_at_[hidden]> wrote:
>>> Second, there are efficiency and code size concerns. Consider the
>>> Euclidean distance function. For 2D/3D points, it is nice to use
>>> compile-time indexing and metaprogramming to guarantee tight, loop-less
>>> code. For high-dimensional points, however, beyond some dimension
>>> (BOOST_GEOMETRY_MAX_UNROLL_DIMENSION?) unrolling the loop is no longer
>>> desirable, and we would rather just do the runtime iteration. If the
>>> points are RuntimeIndexable, we can choose an appropriate distance
>>> algorithm at compile time depending on the points' dimensionality.
>> I presume the reason one would not want loop-less code at high dimensions
>> is due to bloat? I do not understand why runtime looping would be faster
>> in this case. It would seem that even in the largest cases the bloat
>> would still be small compared to memory sizes today. Runtime indexing can
>> certainly be accommodated using a tag dispatch on the access traits. I
>> think more research needs to be done to quantify the differences between
>> the two to see if it is worth the maintenance overhead.
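The tag dispatch Brandon mentions could be sketched roughly as follows. This is a minimal C++03-style illustration, not code from any proposed library; the threshold constant and every identifier here are invented for the example:

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical unroll threshold (illustrative name only).
const std::size_t MAX_UNROLL_DIMENSION = 4;

// Compile-time recursion: fully unrolled sum of squared differences.
template <std::size_t I>
struct squared_sum {
    template <typename Point>
    static double apply(const Point& a, const Point& b) {
        double d = a[I - 1] - b[I - 1];
        return d * d + squared_sum<I - 1>::apply(a, b);
    }
};

template <>
struct squared_sum<0> {
    template <typename Point>
    static double apply(const Point&, const Point&) { return 0.0; }
};

// Plain runtime loop for higher dimensions.
template <std::size_t N, typename Point>
double distance_runtime(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Dispatch: the bool is computed at compile time from the dimension.
template <bool Unroll>
struct distance_impl;

template <>
struct distance_impl<true> {     // low dimension: unrolled
    template <std::size_t N, typename Point>
    static double apply(const Point& a, const Point& b) {
        return std::sqrt(squared_sum<N>::apply(a, b));
    }
};

template <>
struct distance_impl<false> {    // high dimension: runtime loop
    template <std::size_t N, typename Point>
    static double apply(const Point& a, const Point& b) {
        return distance_runtime<N>(a, b);
    }
};

template <std::size_t N, typename Point>
double distance(const Point& a, const Point& b) {
    return distance_impl<(N <= MAX_UNROLL_DIMENSION)>
        ::template apply<N>(a, b);
}
```

Callers always write `distance<N>(a, b)`; the choice between the unrolled and looped implementation is made entirely at compile time, so the uniform interface costs nothing at runtime.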
> Sure, I need to back up my claim. The fundamental problem with loop-less
> code at high dimensions is bloat, yes. Remember that bloat also affects
> efficiency. Suppose we're calculating distances from one point to a set
> of other points. If the size of the distance function makes the inner
> loop code larger than the L1 instruction cache, we take a performance
> hit.
> I'm attaching a simple benchmark that calculates the distance between
> two points some number of times, using both compile-time and runtime
> access. Note that this is awfully friendly to the compile-time access
> code, as we're doing nothing else in the inner loop. Here are some
> results on my machine (Intel Core Duo 2.4 GHz, gcc 4.2.3):
> $ g++ -DITERATIONS=1000000000 -DDIMENSIONS=2 -O3 runtime_test.cpp -o runtime_test
> $ ./runtime_test
> Distance = 2.000000
> Compile-time access: 5.460000s
> Runtime access: 9.070000s
> $ g++ -DITERATIONS=1000000000 -DDIMENSIONS=3 -O3 runtime_test.cpp -o runtime_test
> $ ./runtime_test
> Distance = 3.000000
> Compile-time access: 7.980000s
> Runtime access: 10.970000s
> So for 2D/3D we see the benefit of using compile-time access, although
> it wouldn't surprise me if the gap could be closed by better
> compile-flag twiddling. Let's see what happens at high dimensions.
> Let's also add in a partially unrolled runtime access version that
> operates on 4 elements at a time (ideally the compiler would take care
> of this).
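The partially unrolled runtime version Patrick describes might look roughly like this. This is a sketch, not the attached benchmark code: four squared differences per iteration, accumulated into independent partial sums, with a scalar tail loop for the leftover elements:

```cpp
#include <cmath>
#include <cstddef>

// Runtime-access Euclidean distance, unrolled 4 elements per iteration.
double distance_unrolled4(const double* a, const double* b, std::size_t n)
{
    // Four independent accumulators break the add dependency chain.
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {        // main body: 4 at a time
        double d0 = a[i]     - b[i];
        double d1 = a[i + 1] - b[i + 1];
        double d2 = a[i + 2] - b[i + 2];
        double d3 = a[i + 3] - b[i + 3];
        s0 += d0 * d0;
        s1 += d1 * d1;
        s2 += d2 * d2;
        s3 += d3 * d3;
    }
    for (; i < n; ++i) {                // tail: remaining 0-3 elements
        double d = a[i] - b[i];
        s0 += d * d;
    }
    return std::sqrt(s0 + s1 + s2 + s3);
}
```

Besides amortizing loop overhead, the independent accumulators let pipelined floating-point units overlap the additions, which is plausibly part of the speedup Patrick measures below. (Note this changes the order of the floating-point additions, so results can differ in the last bits from a straight loop.)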
> $ g++ -DITERATIONS=10000000 -DDIMENSIONS=36 -O3 runtime_test.cpp -o runtime_test
> $ ./runtime_test
> Distance = 36.000000
> Compile-time access: 1.620000s
> Runtime access: 1.280000s
> Runtime access (unrolled): 0.740000s
> $ g++ -I /u/mihelich/packages/boost-spacial-index -I /u/mihelich/packages/boost/include -DITERATIONS=10000000 -DDIMENSIONS=128 -O3 runtime_test.cpp -o runtime_test
> $ ./runtime_test
> Distance = 128.000000
> Compile-time access: 5.960000s
> Runtime access: 4.900000s
> Runtime access (unrolled): 3.160000s
> Here runtime access is faster. 128 dimensions may sound excessive, but
> I did not choose these numbers arbitrarily. In vision applications it
> is very common to represent image patches by SIFT descriptors, which
> are points in 128-dimensional space. PCA-SIFT reduces the
> dimensionality to only 36, but runtime access is still faster. In
> either case partial unrolling is a big win. In other applications we
> may go to even higher dimensions...
> $ g++ -DITERATIONS=1000000 -DDIMENSIONS=1000 -O3 -ftemplate-depth-1024 runtime_test.cpp -o runtime_test
> $ ./runtime_test
> Distance = 1000.000000
> Compile-time access: 8.150000s
> Runtime access: 4.090000s
> Runtime access (unrolled): 2.460000s
> Now runtime access is much faster. And once we get up this far I have
> to crank up the template depth just to calculate a distance. 1000
> dimensions is high but not ridiculous. Spectra of galaxies from the
> Sloan Digital Sky Survey have 4000 dimensions, for example.
> I understand if the authors do not want to concern themselves much with
> optimizing for non-2D/3D cases, as users who really care can always
> substitute their own highly optimized distance function using SSE, etc.
> But it would be nice to have something decent out of the box, and there
> is plenty of reason for a RuntimeIndexable concept. If the geometry
> library is to be used as a base for spatial index and perhaps
> clustering libraries, some consideration has to be taken for high
> dimensions.
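A point type modeling both access styles under discussion, compile-time indexing and RuntimeIndexable, might look like the following sketch. All names are illustrative, not from any proposed library:

```cpp
#include <cstddef>

// A fixed-dimension point offering both access models.
template <typename T, std::size_t N>
struct point {
    T coords[N];

    // Compile-time access: the index is a template parameter, so each
    // call resolves to a fixed offset with no runtime index arithmetic,
    // and an out-of-range index can be rejected at compile time.
    template <std::size_t I>
    T get() const { return coords[I]; }

    // Runtime access ("RuntimeIndexable"): the index is an ordinary
    // value, enabling plain loops at any dimension.
    T operator[](std::size_t i) const { return coords[i]; }
};
```

Generic algorithms could then pick `get<I>()` with metaprogrammed unrolling at low dimension and the `operator[]` loop at high dimension, which is exactly the kind of compile-time choice Patrick argues for.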

Thank you for running some numbers, Patrick; this certainly helps to clarify
the issues. I'm going to investigate using Fusion as the underlying access
mechanism, as I hear it supports both models. Out of curiosity, are these
accesses generally done iteratively? If so, how does direct iteration compare
with index-operator access? In my own measurements of std::vector access,
iteration was significantly faster than operator[] when traversing the whole
container (presumably due to bounds checking in that implementation).



> Cheers,
> Patrick

Boost list run by bdawes at, gregod at, cpdaniel at, john at