Boost logo

Boost :

Subject: Re: [boost] Geometry and spatial indexes, my opinion
From: Patrick Mihelich (patrick.mihelich_at_[hidden])
Date: 2008-10-10 23:55:49


On Thu, Oct 9, 2008 at 7:25 AM, Brandon Kohn <blkohn_at_[hidden]> wrote:

> Second, there are efficiency and code size concerns. Consider the Euclidean
>> distance function. For 2D/3D points, it is nice to use compile-time
>> indexing
>> and metaprogramming to guarantee tight, loop-less code. For high
>> dimensional
>> points, however, beyond some dimension
>> (BOOST_GEOMETRY_MAX_UNROLL_DIMENSION?) unrolling the loop is no longer
>> desirable, and we would rather just do the runtime iteration. If the
>> points
>> are RuntimeIndexable, we can choose an appropriate distance algorithm at
>> compile time depending on the points' dimensionality.
>>
>>
> I presume the reason one would not want loop-less code at high dimensions
> is due to bloat? I do not understand why runtime looping would be faster in
> this case. It would seem that even in the largest cases the bloat would
> still be small compared to memory sizes today. Runtime indexing can
> certainly be accommodated using a tag dispatch on the access traits. I think
> more research needs to be done to quantify the differences between the two
> to see if it is worth the maintenance overhead.

Sure, I need to back up my claim. The fundamental problem with loop-less
code at high dimensions is bloat, yes. Remember that bloat also affects
efficiency. Suppose we're calculating distances from one point to a set of
other points. If the size of the distance function makes the inner loop code
larger than the L1 instruction cache, we take a performance hit.

I'm attaching a simple benchmark that calculates the distance between two
points some number of times, using both compile-time and runtime access.
Note that this is awfully friendly to the compile-time access code, as we're
doing nothing else in the inner loop. Here are some results on my machine
(Intel Core Duo 2.4 GHz, gcc 4.2.3):

$ g++ -DITERATIONS=1000000000 -DDIMENSIONS=2 -O3 runtime_test.cpp -o
runtime_test
$ ./runtime_test
Distance = 2.000000
Compile-time access: 5.460000s
Runtime access: 9.070000s
$ g++ -DITERATIONS=1000000000 -DDIMENSIONS=3 -O3 runtime_test.cpp -o
runtime_test
$ ./runtime_test
Distance = 3.000000
Compile-time access: 7.980000s
Runtime access: 10.970000s

So for 2D/3D we see the benefit to using compile-time access, although it
wouldn't surprise me if the gap could be closed by better compile-flag
twiddling. Let's see what happens at high dimensions. Let's also add in a
partially unrolled runtime access version that operates 4 elements at a time
(ideally the compiler would take care of this).

$ g++ -DITERATIONS=10000000 -DDIMENSIONS=36 -O3 runtime_test.cpp -o
runtime_test
mihelich_at_adt:~/projects/stuff$ ./runtime_test
Distance = 36.000000
Compile-time access: 1.620000s
Runtime access: 1.280000s
Runtime access (unrolled): 0.740000s
$ g++ -I /u/mihelich/packages/boost-spacial-index -I
/u/mihelich/packages/boost/include -DITERATIONS=10000000 -DDIMENSIONS=128
-O3 runtime_test.cpp -o runtime_test
$ ./runtime_test
Distance = 128.000000
Compile-time access: 5.960000s
Runtime access: 4.900000s
Runtime access (unrolled): 3.160000s

Here runtime access is faster. 128 dimensions may sound excessive, but I did
not choose these numbers arbitrarily. In vision applications it is very
common to represent image patches by SIFT descriptors, which are points in
128-dimensional space. PCA-SIFT reduces the dimensionality to only 36, but
runtime access is still faster. In either case partial unrolling is a big
win. In other applications we may go to even higher dimensions...

$ g++ -DITERATIONS=1000000 -DDIMENSIONS=1000 -O3 -ftemplate-depth-1024
runtime_test.cpp -o runtime_test
$ ./runtime_test
Distance = 1000.000000
Compile-time access: 8.150000s
Runtime access: 4.090000s
Runtime access (unrolled): 2.460000s

Now runtime access is much faster. And once we get up this far I have to
crank up the template depth just to calculate a distance. 1000 dimensions is
high but not ridiculous. Spectra of galaxies from the Sloan Digital Sky
Survey have 4000 dimensions, for example.

I understand if the authors do not want to concern themselves much with
optimizing for non 2D/3D cases, as users who really care can always
substitute their own highly optimized distance function using SSE, etc. But
it would be nice to have something decent out of the box, and there is
plenty of reason for a RuntimeIndexable concept. If the geometry library is
to be used as a base for spatial index and perhaps clustering libraries,
some consideration has to be taken for high dimensions.

Cheers,
Patrick




Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk