Boost logo

Geometry :

Subject: Re: [geometry] Adapters at the point and/or box abstraction level?
From: Patrick J. LoPresti (lopresti_at_[hidden])
Date: 2014-07-15 13:37:37


I apologize for the long delay in responding. Other tasks intervened,
but I am finally coming back to it...

Adam Wulkiewicz <adam.wulkiewicz_at_[hidden]> writes:

> Correct me if I'm wrong. The coordinates must be loaded to the SSE
> register to perform some vector operation but normally they should be
> stored in memory (in C-array) so you should expect no penalty
> accessing single coordinate WRT e.g. the geometry::model::point<>.

Well, it depends.

Right now, my class looks something like this (simplified):

class Vec2 {
  ...
private:
  __m128i rep_;
};

If I write a function like add(Vec2, Vec2), the ABI will pass each
argument in a single SSE register. Most operations on my Vec2 --
including vector arithmetic and component-wise min/max, as occurs in
bbox union/intersection -- happen entirely in such registers.

The cost comes when extracting a component of the vector (as in
get<0>). That either requires dumping it to memory and reading it back,
or using _mm_extract_epi64. Either way can wind up slower than not
bothering with SSE at all.

I could change my representation like so:

class Vec2 {
  ...
private:
  union {
    __m128i v;
    int64_t a[2];
  } rep_;
};

That would allow direct access to the components when the object is in
memory, avoiding the shuffle through an SSE register...

...but then add(Vec2, Vec2) will have its arguments passed in non-SSE
registers, and therefore will need to do some shuffling itself before it
can perform SSE operations.

That said, most (all?) of the interesting operations are inlined, so
perhaps the calling convention ABI is irrelevant.

I will do some benchmarking with these approaches and let you know how
it goes. Although it may be a while because I have to get my application
working first.

> As Barend mentioned in an other email you could:
> 1. specialize algorithms on the geometry::dispatch level

For my application, I only care about intersection() queries, and I care
a lot more about the speed of queries than the speed of index
generation. I would expect a query to spend most of its time in the
leaves, so specializing intersection() for my Box type will likely give
me everything I want.

I would still like to participate in a broader effort if and when that
happens.

> 2. you could overload the algorithm's main function for your Geometry
> type only
>
> If done properly 1. could be used with any user-defined type to
> perform SIMD calculation. We'd probably need to develop SIMD Point
> Concept. This could be something like:
> - geometry::use_simd<Point>::value equal to true
> - and/or geometry::is_contigeous<Point>::value equal to true - this
> could give use benefits even if SIMD were not used
> - some access to the vectorised data - e.g. data() and size() (similar
> to the C++11 std::vector, std::array), they should probably be free
> functions

Interesting. So the idea is for the representation to just be a pair of
contiguous elements, but the operations can use SIMD if available?

That might work. The trick is to convince the compiler to emit sequences
like:

 - load memory to SIMD register(s)
 - perform sequence of operations on SIMD register(s)
 - store SIMD register(s) back

The danger is that the compiler might emit loads and stores between each
SIMD operation if the representation of the point is just a
vector/array.

Anyway, thank you for your reply, and for all of the great work on
boost::geometry::index itself.

 - Pat


Geometry list run by mateusz at loskot.net