|
Geometry : |
Subject: Re: [geometry] Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55
From: Adam Wulkiewicz (adam.wulkiewicz_at_[hidden])
Date: 2014-11-25 09:04:03
Hi Tim,
Tim Finer wrote:
> On 11/21/14, 4:59 PM, Adam Wulkiewicz wrote:
>> Tim Finer wrote:
>>> It's been a few months, but I finally got back to trying out the bgl
>>> rtree with a resizable memory mapped file. Things are looking up
>>> and I have another question.
>>>
>>> I've edited our correspondence down to the essentials:
>>>
>>> On 7/26/14, 4:08 AM, Adam Wulkiewicz wrote:
>>>> Tim Finer wrote:
>>>>>> On Jul 25, 2014, at 6:16 PM, Adam Wulkiewicz
>>>>>> <adam.wulkiewicz_at_[hidden]> wrote:
>>>>>>
>>>>>> In order to work the way you want the rtree requires an allocator
>>>>>> which will be able to increase the size of shared memory if needed.
>>>>>> I'm not sure if this is possible in general and/or with
>>>>>> Interprocess.
>>>>>> I'll try to play with it in my free time.
>>>>>> If you were able to come up with something before me, please let
>>>>>> me know.
>>>>> Some other options I'm looking at are:
>>>>>
>>>>> 1 Scanning all the data first (it isn't in a convenient format, so
>>>>> I'll need to store it in another mapped file), then allocating
>>>>> enough space for the rtree.
>>>>
>>>> FYI, the packing algorithm done in the constructor (when you're
>>>> creating the rtree from e.g. a pair of iterators) should construct
>>>> an rtree having nodes "packed" as tightly as possible, you could
>>>> actually predict what'll be the actual number of nodes created. At
>>>> least for currently used algorithm. If you're curious see line 109
>>>> of
>>>> https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp.
>>>> So maybe it'd be a good idea to first load the objects to a vector,
>>>> calculate their size, then create a shared memory and construct an
>>>> rtree using pack-create.
>>>> Though I fear that if the memory could be fragmented this might not
>>>> work in all cases.
>>>>
>>> I want to avoid a first pass and would prefer to trade space for
>>> time. Is there a way that I could "actually predict what'll be the
>>> actual number of nodes created" given a live rtree? I'd like to do
>>> this based upon the currently populated rtree so that I could find
>>> the worst case scenario and allocate at least that much memory in a
>>> node insertion wrapper.
>>>
>>
>> I think it could be possible. Since 1.56 or 1.57 (I don't remember)
>> no temporary containers should be created using passed allocator,
>> only nodes. In the worst case the rtree would have to create 1 leaf
>> node + N = depth new internal nodes per insert. Unfortunately there
>> is no official way of checking the depth. You should be able to do it
>> like this:
>>
>> bgi::detail::rtree::utilities::view<rtree_type>(rtree_obj).depth();
>>
>> but it's non-official and could be removed/changed in the future.
> Thank you for the hint - having a non-official function is better than
> nothing. I'm using 1.55, and depth() is in there.
>
> Loading time is a lot faster when I create a memory mapped vector,
> insert all the points, then create an rtree with the vector's
> iterators (using the packing you mentioned before). Not only does
> this generate a smaller rtree, it also is faster at returning queries!
>
The above is for one call of a single-value insert().
In the case of packing algorithm you can predict a number of all nodes
(as I probably wrote earlier), see:
https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp
line 109.
> Your mention about the implementation change brings up a question: how
> stable is the internal arrangement of nodes from boost version to
> version?
>
> If I create persistent memory mapped rtrees with version 1.55, will
> allocators from 1.56 or 1.57 still map them correctly into memory? I'm
> in the middle of implementing all this so my customers can use as a
> means of loading very large spatial datasets. If the memory layout
> changes (in a breaking way) from boost version to boost version,
> that'll mean a lot more work that I can live with but only if it is
> easy to detect and mitigate robustly.
>
The safest approach would be to never use previously created shared
files with the new version of the library. Some bugs might be fixed,
features added, not only in Geometry but also in Interprocess or any
other library on which it depend, etc. And we're talking about raw data
here stored in a native format. This isn't serialization.
In 1.55 polymorphic nodes was used, and they didn't work properly with
shared memory. You shouldn't use this version.
Since 1.56 variant-based nodes are used.
In the future I plan to replace them with lighter type of nodes, so less
memory should be needed, but the representation would change.
>>
>>>>
>>>>> 3. Monitor the capacity of the shared file and proactively
>>>>> resize given a threshold (say 90%). I just thought of this one and
>>>>> I like it because it sidesteps the entire exception problem
>>>>> altogether. The more I think about it, the more I like this one.
>>>>
>>>> Yes, this is also worth trying.
>>>> It seems that there is a get_free_memory() method in
>>>> managed_shared_memory so the amount of free space could be easily
>>>> returned.
>>>> In case if the memory could be fragmented you could try to allocate
>>>> some bigger block and see if it succeeds. Then resize the memory if
>>>> necessary.
>>>> When the exception is thrown get_free_memory() returns 968 in my
>>>> test and I'm starting from 1024 * 64.
>>>> The author of Boost.Interprocess could probably give us some
>>>> pointers about how big the threshold should be and if the
>>>> fragmentation should be taken into account.
>>>>
>>>
>>> Hey! This worked. I arbitrarily chose 90% and resized at that
>>> point and my initial tests work! What I'd like to do know is figure
>>> out how much memory would require. I'm not terribly worried about
>>> fragmentation (yet), but will get to that after I run some tests
>>> with finding appropriate thresholds.
>>
>> So you probably would also need a size of nodes for the esstimate.
>> Unfortunately it depends on the node type and there is no official
>> way of obtaining it.
>>
>> So again, everything described below is non-official and may be
>> changed in the future.
>>
>> Currently (1.57) Boost.Variant-based nodes are used by default (this
>> may change in the future, in fact I plan to do it at soem point).
>> Static nodes stores varrays/static_vectors, dynamic nodes uses
>> boost::container::vector<>. So in the case of the container::vector<>
>> the ammount of memory would also depend on the allocation strategy of
>> this container. AFAIR this implementation of vector<> allocates only
>> as much as it needs but I'm not sure.
>>
>> For Variant-based static nodes you should be able to check the size
>> like this:
>> sizeof(
>> bgi::detail::rtree::utilities::view<rtree_type>::allocators_type::node_allocator_type::value_type
>> )
>>
>> For dynamic nodes it'd be more complicated since you should add the
>> container::vector<> storage which'd be in the worst case equal to
>> N*sizeof(Value) for leafs and around N*sizeof(std::pair<Box,
>> NodePointer>) for internal nodes. AFAIR in the worst case N should be
>> equal to MAX+1-MIN in the newly created nodes. Box would be of type
>> model::box<model::point<>> using the same coordinate type and
>> dimension as the Indexable. NodePointer would be a type returned by
>> allocator which you could get like this:
>> bgi::detail::rtree::utilities::view<rtree_type>::allocators_type::node_pointer
>>
>>
>>
>> Because of all of this non-official stuff I'd consider it rather a
>> workaround than a solution.
>>
>> Have you thought about the implementation of an allocator managing
>> arbitrary number of Boost.Interprocess allocators/shared files? It
>> could own some number of BI allocators and try to create a block of
>> memory using them and if it wasn't able to do it it could create
>> another file.
>
> I'm hoping that my using of a fixed node type means that I can predict
> the size of the nodes using the space they actually take (by keeping
> track of the allocator's statistics). So far, the overhead is roughly
> 8 bytes per node over the size of the data stored when I use the
> method above (create vector, use packing algorithm to create an rtree).
Btw, above I wrote about a single-value insert(). If you're using only
pack-create the size could be calculated with more precision, probably
as you're doing already.
Yes, static-size nodes should always take the same amount of memory and
you can get the size of a node using the method described above.
Regards,
Adam
Geometry list run by mateusz at loskot.net