Boost logo

Geometry :

Subject: Re: [geometry] Approach(es) to creating persistent index::rtree?
From: Adam Wulkiewicz (adam.wulkiewicz_at_[hidden])
Date: 2014-07-17 08:11:26


Hi Patrick,

Patrick J. Lopresti wrote:
> (Lots of questions from me this week, I know...)
>
> For my application, I need to create a spatial index once, save it to
> disk, and use it (read-only) for queries many times in future
> processes. So I need an approach for creating a persistent index on
> disk.
>
> Some of my data sets might have ~1 billion rectangles today and more
> in the long run. I have a lot more time to generate the index than to
> read and query it.
>
> I am aware of three approaches:
>
> Approach #1: Save the raw Value data in whatever format I like. When I
> need the index, load the from disk and create the rtree on the fly via
> the bulk loading (packing) algorithm.
>
> Pros: Very easy to implement, and this will probably be my approach
> for my prototype.
>
> Cons: Too slow for large data sets.
>
>
> Approach #2: Use boost::serialization. I do not see this described in
> the Boost.Geometry documentation -- because it is still experimental?
> -- but I do see the example in geometry/index/example/serialize.cpp.
>
> Pros: Looks pretty easy to implement, even though I know essentially
> nothing about boost::serialization.
>
> Cons: I know essentially nothing about boost::serialization :-). But I
> do know it is not "header only", so I presume Boost.Geometry would
> inherit this unfortunate property. Also I suppose I will have to learn
> how to tell boost::serialization about my Value classes (?). I do not
> know what performance to expect.

Yes, the support is there as experimental. I hope the example still works...
It has some flaws, the greatest one is indeed that it would be included
with the rtree so this would be a dependency.
I planned to reimplement it but due to a lack of time I wasn't able to
do it.

And yes, you'd be forced to enable serialization for your Values. AFAIR
if you don't want to do something unusual it's as simple as overloading
operators << and >>.

>
> Approach #3: Use Boost.Interprocess with a memory-mapped file on disk
>
> Pros: Is more along the lines of what R-trees were designed for; i.e.
> can handle indexes that do not even fit in memory. (Although even
> billions of rectangles can fit in memory on the sort of system I am
> targeting...) Can still be header-only.
>
> Cons: Possible performance penalty (?) from many many indirections
> through offset_ptr. Complex implementation; since I do not know in
> advance how large the index needs to be, I will have to write some
> sort of custom (re-)allocator and invoke it as needed.

I think your assumtions about the performance are correct but to know
exactly you'd be forced to do some benchmarking.
There is probably some cacheing involved however then probably also some
synchronization.

> My question is simple, if somewhat broad: What options and
> considerations am I missing?

Currently that's all that's implemented.

If your data fits into memory, serialization would be a good choice. Of
course assuming you don't mind using experimental code.
Another possibility is that if you had some time to spare you could
learn more about Boost.Serialization and help finishing the support.

Regards,
Adam


Geometry list run by mateusz at loskot.net