Boost logo

Geometry :

Subject: Re: [geometry] 1.57 Storing indices as values to save space.
From: Adam Wulkiewicz (adam.wulkiewicz_at_[hidden])
Date: 2014-11-27 07:43:56


Hi Tim,

Tim Finer wrote:
> On 11/26/14, 11:49 AM, Adam Wulkiewicz wrote:
>> Tim Finer wrote:
>>> Actually, I think what rtree is doing with the temporary vector in
>>> the packing constructor should be considered a bug.
>>>
>>> That problem is that the vector is created without regard to the
>>> rtree's allocator - so even if a user gives the rtree a memory
>>> mapped file allocator, the rtree creates a copy of the incoming data
>>> in memory, ignoring the allocator.
>>>
>>
>> Not really a copy of incoming data. A calculated centroids and
>> iterators to the original data.
> Ah I see. IndexableGetter must be the source for the centroid though,
> right? Can you use the IndexableGetter to calculate and sort on the
> centroid without allocating a temporary vector of them?
>

I see no other way of effectively sorting a range of elements without a
temporary container, in general. Do you?

Instead of creating a container for all elements there could be M heaps
for elements.size() / M but this wouldn't solve anything.

FYI, actually elements aren't sorted. std::nth_element() is called M
times for each tree level.
M = MAX-1.

>>
>> Yes, rtree's allocator could be passed to the vector (as mentioned in
>> the other email). Then iterators shouldn't be stored since they might
>> be not suitable for storing in shared memory. So probably a copy of
>> actual objects should be made here. And this could be done only if an
>> allocator was different than std::allocator<> so at least in this
>> case it wouldn't harm the performance. Also in the case of passing
>> RandomAccessIterators elements indexes could be stored instead of
>> iterators.
> Except I think the word "should" belongs there instead of "could". :)
> The whole point of passing in an allocator is for naught if the class
> selectively uses it - it doesn't matter about shared memory allocators
> or not. All of the classes that I'm familiar with that use an
> allocator use it completely (like the stdlib containers).
>

For (all) temporary containers? In STL? Could you give some example?

> Step back from your knowledge of the implementation and think about
> this as a user of the rtree. Would you really expect the rtree to
> allocate space from the heap, after giving it an allocator?
>
> The rtree shouldn't second guess the intentions of the users here and
> allocate memory from the heap for a temporary structure. There are
> plenty of important uses cases to use other kinds of allocators that
> don't have anything to do with shared or mapped memory: debugging
> allocators, small object allocators, memory pools, etc. What makes
> this a bug is that it is a surprise. It probably would have gone
> unnoticed by me but for the operating system killing my testbed. :)

It definietly requires more thinking. In general, should all temporary
containers be using passed allocator? Also some temporary containers
used during nodes splitting (they're smaller). I think not but I may be
wrong.
The problem with this one is that it can be very big so probably should
be handled differently.

>
> The need for this heap allocated vector appears to be packing related,
> would it make sense to expose the packing algorithm as a free function
> like the std::transform in <algorithm> instead? It could even take a
> callable object, just like transform does (it'd be nice to reuse the
> IndexableGetter there, if that's what it needs).
>
> rtree::pack( srcBegin, srcEnd, packedBegin, packedEnd );
>
> where srcBegin and srcEnd are random access, const_iterators and
> std::distance(srcBegin,srcEnd) == std::distance(packedBegin, packedEnd).
>
> Or an output iterator:
> rtree::pack<Type, Type, Type>( srcBegin, srcEnd, packedIter );
>
> Then the user would be responsible for allocating the packed container
> and the resultant packing what would be passed in as a ctor (or a
> factory method that has some sort of name that says "packed"):
>
> auto rtree =
> bgi::rtree<ValueType,Partitioner,Indexer>make_packed_rtree(packedBegin, packedEnd);
>
>
>
> // Or, the constructor
> auto rtree = bgi::rtree<ValueType,Partitioner,Indexer>(packedBegin,
> packedEnd);
>

Isn't it the same how it's implemented?

But sure, additional method responsible for packing, allowing to
explicitly express that packing is desired could also be added. But I'm
thinking about non-static member function, similar to assign().

> That gives the flexibility of allocation without compromising the API.
> I think it is cleaner to make a bulk loaded rtree construction more
> explicit (vs. insert) and it removes the "surprise" of allocation that
> packing requires.
>

Then do you also propose to change the constructor?

>>
>> But would it solve your problem?
>
> Yes. My test of 50,000,000 points died because it ran out of memory
> (this is a "small" test). It was a big surprise to see that the rtree
> needed to allocate outside of the allocator and not use the functor to
> derive what it needed. I'm not focused on persistence, that was a
> means to an end. My overall goal is to give my users the query
> ability of data too large to fit in memory. If the rtree only stored
> the ValueType as implied in the documentation and only used the
> allocater, as implied by the API...
>>
>> Alterntively a second allocator could be passed for this temporary
>> container, so a second, temporary file could be used. This would
>> probably be the most convenient for you, am I right? However this
>> would require to complicate the interface (constructors) so I don't
>> like it.
> I agree that putting yet another allocator in the constructor makes it
> even more complex than it should be. Does the rtree really need to
> keep the centroids of all of the data while it is packing? Couldn't
> those be derived from calling the passed in IndexableGetter? If not,
> then maybe getting the centroid is fundamental enough to deserve its
> own functor, or a some other way to model that supports indexing,
> finding the centroid and equality?
>

Of course it is "derived", first IndexableGetter is used, then a
centroid of an indexable is calculated. Do you expect to do it every
time two values need to be compared during sorting/nth_element?
Furthermore it doesn't solve the problem because you still can have so
many values that a container of indexes will be too big.

> If the packing algorithm really needs the temporary storage, then I
> think that the constructor is doing too much and the functionality
> that is packing should be made a free function and the rtree's
> constructor made simpler.
>

So I guess this is an answer for the above question, how would you
propose to change the ctor. I fear that users already using it and
expecting packing could dissagree.

>>
>> We should definietly think about it when implementing a "proper"
>> persistance support.
> Yeah, persistence support is another thing entirely. I'm talking
> about the happy case of memory based allocation right now - rtree is
> not honoring its API contract with regard to allocators. The harder
> problem of persistence is another story, but yes, that would be grand
> to have all that too!
>
> I'll take a look to see if I can't figure out if any of my ideas make
> any sense. I don't know the domain well, but I do have a lot of C++
> experience. I'll let you know either way.

Ok sure, all suggestions for making the library better are appreciated.

Currently I'm concerned the most about the allocator for this big
temporary container. I'm wondering if there is some kind of good practice.

Regards,
Adam


Geometry list run by mateusz at loskot.net