
Subject: Re: [geometry] Crash in rtree after growing shared memory, VS2012 x64, Boost 1.55
From: Adam Wulkiewicz (adam.wulkiewicz_at_[hidden])
Date: 2014-11-25 19:25:44


Hi Tim,

Tim Finer wrote:
> On 11/25/14, 6:04 AM, Adam Wulkiewicz wrote:
>> Tim Finer wrote:
>>> Your mention of the implementation change brings up a question:
>>> how stable is the internal arrangement of nodes from Boost version
>>> to version?
>>>
>>> If I create persistent memory-mapped rtrees with version 1.55, will
>>> allocators from 1.56 or 1.57 still map them correctly into memory?
>>> I'm in the middle of implementing all this so my customers can use it
>>> as a means of loading very large spatial datasets. If the memory
>>> layout changes (in a breaking way) from Boost version to Boost
>>> version, that'll mean a lot more work, which I can live with, but only
>>> if it is easy to detect and mitigate robustly.
>>>
>>
>> The safest approach would be to never use previously created shared
>> files with the new version of the library. Some bugs might be fixed,
>> features added, not only in Geometry but also in Interprocess or any
>> other library on which it depends, etc. And we're talking about raw
>> data here stored in a native format. This isn't serialization.
>>
>> In 1.55 polymorphic nodes were used, and they didn't work properly
>> with shared memory. You shouldn't use this version.
>> Since 1.56 variant-based nodes are used.
>>
>> In the future I plan to replace them with a lighter type of node, so
>> less memory should be needed, but the representation would change.
> I see. I understand that this isn't serialization, but isn't this an
> obvious use case? My understanding is that rtrees work really well as
> disk-based storage, implying memory-mapped files (along with samples
> that explain how to do this). The data sets my customers have are too
> large to fit in memory. It takes several minutes on fast
> systems (with SSDs and lots of RAM) to create an rtree of millions of
> points. I don't want to make my users do this every time they open a
> document. My goal was to optimize the data once in an rtree and then
> memory map the rtree many times after that. If Boost.Geometry
> rearranges the rtree internally with every release, this makes it much
> more difficult to reuse.
>
> Could you please add a warning about this to the rtree documentation,
> in the memory-mapped samples? I can't be the only one who can't afford
> to regenerate rtrees on the fly and is thinking of or attempting to
> reuse a memory-mapped version.

Of course you're right. The R-tree was designed with persistent storage in
mind, e.g. as an index in a database, and this implementation should
certainly allow this.

Boost.Interprocess makes it possible to work around the lack of explicit
persistence, but it's a very raw/native mechanism. Its main purpose is to
allow sharing data between different processes. The memory is mapped
directly and, AFAIU, we shouldn't expect this mapping to stay the same
in another version of Boost. What if some internals of Interprocess are
optimized and some additional data is stored in the header of such shared
memory?

Do you expect to release new versions of your program and bump the Boost
version so often that this is really a problem?
I mean, your users would have to rebuild the tree only if needed.
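One way to make "rebuild only if needed" detectable, sketched under assumptions: keep a small stamp in a separate sidecar file (deliberately outside the Interprocess segment, since the segment's own header might change between versions) recording a magic tag, an application-defined format revision and the BOOST_VERSION the index was built with, and rebuild whenever any of them differs. The IndexStamp struct, the magic tag and both helper functions are hypothetical. The stamp is written as raw bytes, so this assumes it is read back on the same platform that wrote it.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical sidecar stamp stored next to the mapped R-tree file.
struct IndexStamp {
    char magic[8];                // identifies the application's own file type
    std::uint32_t format;         // application-defined index format revision
    std::uint32_t boost_version;  // BOOST_VERSION the index was built with
};

const char kMagic[8] = {'R', 'T', 'R', 'E', 'E', 'I', 'D', 'X'};

// Writes the stamp; returns false if the file couldn't be written.
bool write_stamp(const std::string& path, std::uint32_t format,
                 std::uint32_t boost_version) {
    IndexStamp s;
    std::memcpy(s.magic, kMagic, sizeof kMagic);
    s.format = format;
    s.boost_version = boost_version;
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(&s), sizeof s);
    return static_cast<bool>(out);
}

// True when the index must be rebuilt: stamp missing or truncated,
// foreign magic tag, or a different format revision or Boost version.
bool needs_rebuild(const std::string& path, std::uint32_t format,
                   std::uint32_t boost_version) {
    std::ifstream in(path, std::ios::binary);
    IndexStamp s;
    if (!in.read(reinterpret_cast<char*>(&s), sizeof s)) return true;
    return std::memcmp(s.magic, kMagic, sizeof kMagic) != 0
        || s.format != format
        || s.boost_version != boost_version;
}
```

In an application you would pass BOOST_VERSION from <boost/version.hpp> (105500 for Boost 1.55.0) and, when needs_rebuild() returns true, delete the mapped file, rebuild the tree, and write the stamp only after the rebuild has completed.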

>
> I see that you actually added experimental serialization support over
> a year ago, but it looks like it supports trees that are read
> completely into memory, and not read on demand? I'm more concerned
> with persistence than serialization.

Yes, serialization's purpose is to save the whole content to disk or to
load it back. It's not related to live storage.

Btw, it could e.g. be used along with raw mapped-file storage as an
intermediate format for the transition from one Boost version to
another. Of course, only if the serialization support were finished, which
isn't the case.

>
> Thanks for the heads up about the polymorphic node bug; where is that
> documented? If it isn't, that would also be something really helpful
> to include. I don't know anything about how Boost handles this, but
> it would be helpful.
>

If you're asking about information on the change of the internal structure,
it isn't mentioned anywhere. There is only a note about the fixed bug
with Interprocess.

I disagree that this should be mentioned in the docs; it's an internal
change. Furthermore, AFAIU Interprocess doesn't guarantee that the
representation of data will be the same across versions of Boost, nor
was it designed to support versioning, so we shouldn't rely on this.
Btw, Serialization supports versioning.

> As much as I like the performance of the bg rtree, I'll probably
> switch to an rtree implementation that has persistence as part of its
> design. Boost isn't written so that different versions can easily be
> used by an application, so this doubly compounds the problem.

Sure, whatever works for you. Thanks for your interest and
suggestions. Maybe some day it'll meet your needs.

Regards,
Adam


Geometry list run by mateusz at loskot.net