Boost logo

Boost :

Subject: Re: [boost] Geometry and spatial indexes, my opinion
From: Brandon Kohn (blkohn_at_[hidden])
Date: 2008-10-11 09:47:25


--------------------------------------------------
From: "Patrick Mihelich" <patrick.mihelich_at_[hidden]>
Sent: Friday, October 10, 2008 10:55 PM
To: <boost_at_[hidden]>
Subject: Re: [boost] Geometry and spatial indexes, my opinion

> On Thu, Oct 9, 2008 at 7:25 AM, Brandon Kohn <blkohn_at_[hidden]> wrote:
>
>> Second, there are efficiency and code size concerns. Consider the
>> Euclidean
>>> distance function. For 2D/3D points, it is nice to use compile-time
>>> indexing
>>> and metaprogramming to guarantee tight, loop-less code. For high
>>> dimensional
>>> points, however, beyond some dimension
>>> (BOOST_GEOMETRY_MAX_UNROLL_DIMENSION?) unrolling the loop is no longer
>>> desirable, and we would rather just do the runtime iteration. If the
>>> points
>>> are RuntimeIndexable, we can choose an appropriate distance algorithm at
>>> compile time depending on the points' dimensionality.
>>>
>>>
>> I presume the reason one would not want loop-less code at high dimensions
>> is due to bloat? I do not understand why runtime looping would be faster
>> in
>> this case. It would seem that even in the largest cases the bloat would
>> still be small compared to memory sizes today. Runtime indexing can
>> certainly be accommodated using a tag dispatch on the access traits. I
>> think
>> more research needs to be done to quantify the differences between the
>> two
>> to see if it is worth the maintenance overhead.
>
>
> Sure, I need to back up my claim. The fundamental problem with loop-less
> code at high dimensions is bloat, yes. Remember that bloat also affects
> efficiency. Suppose we're calculating distances from one point to a set of
> other points. If the size of the distance function makes the inner loop
> code
> larger than the L1 instruction cache, we take a performance hit.
>
> I'm attaching a simple benchmark that calculates the distance between two
> points some number of times, using both compile-time and runtime access.
> Note that this is awfully friendly to the compile-time access code, as
> we're
> doing nothing else in the inner loop. Here are some results on my machine
> (Intel Core Duo 2.4 GHz, gcc 4.2.3):
>
> $ g++ -DITERATIONS=1000000000 -DDIMENSIONS=2 -O3 runtime_test.cpp -o
> runtime_test
> $ ./runtime_test
> Distance = 2.000000
> Compile-time access: 5.460000s
> Runtime access: 9.070000s
> $ g++ -DITERATIONS=1000000000 -DDIMENSIONS=3 -O3 runtime_test.cpp -o
> runtime_test
> $ ./runtime_test
> Distance = 3.000000
> Compile-time access: 7.980000s
> Runtime access: 10.970000s
>
> So for 2D/3D we see the benefit to using compile-time access, although it
> wouldn't surprise me if the gap could be closed by better compile-flag
> twiddling. Let's see what happens at high dimensions. Let's also add in a
> partially unrolled runtime access version that operates 4 elements at a
> time
> (ideally the compiler would take care of this).
>
> $ g++ -DITERATIONS=10000000 -DDIMENSIONS=36 -O3 runtime_test.cpp -o
> runtime_test
> mihelich_at_adt:~/projects/stuff$ ./runtime_test
> Distance = 36.000000
> Compile-time access: 1.620000s
> Runtime access: 1.280000s
> Runtime access (unrolled): 0.740000s
> $ g++ -I /u/mihelich/packages/boost-spacial-index -I
> /u/mihelich/packages/boost/include -DITERATIONS=10000000 -DDIMENSIONS=128
> -O3 runtime_test.cpp -o runtime_test
> $ ./runtime_test
> Distance = 128.000000
> Compile-time access: 5.960000s
> Runtime access: 4.900000s
> Runtime access (unrolled): 3.160000s
>
> Here runtime access is faster. 128 dimensions may sound excessive, but I
> did
> not choose these numbers arbitrarily. In vision applications it is very
> common to represent image patches by SIFT descriptors, which are points in
> 128-dimensional space. PCA-SIFT reduces the dimensionality to only 36, but
> runtime access is still faster. In either case partial unrolling is a big
> win. In other applications we may go to even higher dimensions...
>
> $ g++ -DITERATIONS=1000000 -DDIMENSIONS=1000 -O3 -ftemplate-depth-1024
> runtime_test.cpp -o runtime_test
> $ ./runtime_test
> Distance = 1000.000000
> Compile-time access: 8.150000s
> Runtime access: 4.090000s
> Runtime access (unrolled): 2.460000s
>
> Now runtime access is much faster. And once we get up this far I have to
> crank up the template depth just to calculate a distance. 1000 dimensions
> is
> high but not ridiculous. Spectra of galaxies from the Sloan Digital Sky
> Survey have 4000 dimensions, for example.
>
> I understand if the authors do not want to concern themselves much with
> optimizing for non 2D/3D cases, as users who really care can always
> substitute their own highly optimized distance function using SSE, etc.
> But
> it would be nice to have something decent out of the box, and there is
> plenty of reason for a RuntimeIndexable concept. If the geometry library
> is
> to be used as a base for spatial index and perhaps clustering libraries,
> some consideration has to be taken for high dimensions.
>

Thank you for running some numbers Patrick, this certainly helps to clarify
the issues. I'm going to investigate using Fusion as the underlying
mechanism for access as I hear it supports both models. Out of curiosity, I
wonder are these random accesses generally done iteratively? If so, how does
direct iteration compare with index operator accesses? I've noticed in my
own tests on measuring std::vector accesses that iteration is significantly
faster than operator [ ] when you are iterating over the container
(presumably due to bounds checking).

Cheers,

Brandon

> Cheers,
> Patrick
>

> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk