Boost logo

Geometry :

Subject: Re: [geometry] 1.57 Storing indices as values to save space.
From: Adam Wulkiewicz (adam.wulkiewicz_at_[hidden])
Date: 2014-11-26 14:32:44


Hi Tim,

Tim Finer wrote:
> Hi Adam,
>
> I'm trying yet another approach, heh. This time with 1.57.
>
> I'm going to see if I can't just store the spatial data in a memory
> mapped vector and indices into it in an in-memory rtree, as long as I
> relax my requirement for the size of the data a bit.
>

How would this solve your problem?

> What I thought I'd try is storing the smallest sized unsigned integer
> index possible (given the data) and reference a shared memory that I
> control the layout of. As a test, I set up an rtree with the value of
> std::uint32_t and passed in an IndexableGetter functor that returns an
> indexable double triplet when it receives the std::uint32_t value. I
> used the packing constructor to minimize the memory footprint. This
> all works, except, it looks like the rtree really stores a pair of an
> indexable point and a value and not just the values.
>
> I did some before and after memory snapshots and determined the size
> of the memory used is the same regardless if I set the value to an
> index, or if I fill the rtree with the double triplets directly. My
> expectation was that the functor would be used each time that the
> indexable is needed, because the documentation said "Important: The
> translation is done quite frequently inside the container - each time
> the rtree needs it".
>
> Except, that isn't really the case. What really happens is that the
> rtree caches a temporary vector of the entire size incoming data.
>
> pack_create.hpp, around line 155:
> std::vector<entry_type> entries;
> entries.reserve(values_count);
>
> Where entry_type is a std::pair<point_type, InIt>, where
> sizeof(point_type) is 24 and InIt is the incoming value iterator.
>

The IndexableGetter allows to retrieve an Indexable (Point, Box or
Segment) bound with the Value stored in an rtree. So in your case to
translate std::uint32_t to some Point or Box. It's used e.g. in line 211
indirectly by the translator(). A Value stored in the rtree is of a type
passed as the first template parameter so uint32_t.

Packing algorithm works in a top-down manner. It must have access to all
values and in the process it is sorting the entire set on each level.
This is why it must use some intermediate container to store data since
it can't sort an input range. The std::vector<entry_type> is this
temporary container storing pairs of a point (which is a centroid of
Indexable) and an original iterator to access the value later (line
210). The Point is stored to avoid retrieving an Indexable and
calculating its centroid each time an entry is compared during sorting.
Does it take too much memory in your case?

Unfortunately to fully support pack-creation of a persistent rtree we
should use a temporary persistent storage/file for all of the temporary
elements and use this storage/file for sorting.

I'm guessing that using the same allocator as in the case of an rtree
(and boost::container::vector<>) could somehow solve the problem as long
as not the whole mapped file must be loaded in memory, I'm not sure if
this is the case. Still, a lot more free space would be needed in a
mapped file.

> I also tried dynamically inserting nodes too, but that used even more
> memory, although I didn't spend too much time looking at why.

Inserting elements using balancing algorithm will result in creation of
more nodes than in the case of packing algorithm which creates a minimum
number of nodes for given Max and Min. This is why the tree is bigger.

> So my question is, what is the purpose of the IndexableGetter? Is
> there a way to trade off speed for space such that the rtree uses an
> IndexableGetter all the way through? Was this by design?

As explained above IndexableGetter is used as intended.

Regards,
Adam


Geometry list run by mateusz at loskot.net