Boost logo

Geometry :

Subject: [geometry] Approach(es) to creating persistent index::rtree?
From: Patrick J. LoPresti (lopresti_at_[hidden])
Date: 2014-07-16 20:41:55


(Lots of questions from me this week, I know...)

For my application, I need to create a spatial index once, save it to
disk, and use it (read-only) for queries many times in future
processes. So I need an approach for creating a persistent index on
disk.

Some of my data sets might have ~1 billion rectangles today and more
in the long run. I have a lot more time to generate the index than to
read and query it.

I am aware of three approaches:

Approach #1: Save the raw Value data in whatever format I like. When I
need the index, load the from disk and create the rtree on the fly via
the bulk loading (packing) algorithm.

Pros: Very easy to implement, and this will probably be my approach
for my prototype.

Cons: Too slow for large data sets.

Approach #2: Use boost::serialization. I do not see this described in
the Boost.Geometry documentation -- because it is still experimental?
-- but I do see the example in geometry/index/example/serialize.cpp.

Pros: Looks pretty easy to implement, even though I know essentially
nothing about boost::serialization.

Cons: I know essentially nothing about boost::serialization :-). But I
do know it is not "header only", so I presume Boost.Geometry would
inherit this unfortunate property. Also I suppose I will have to learn
how to tell boost::serialization about my Value classes (?). I do not
know what performance to expect.

Approach #3: Use Boost.Interprocess with a memory-mapped file on disk

Pros: Is more along the lines of what R-trees were designed for; i.e.
can handle indexes that do not even fit in memory. (Although even
billions of rectangles can fit in memory on the sort of system I am
targeting...) Can still be header-only.

Cons: Possible performance penalty (?) from many many indirections
through offset_ptr. Complex implementation; since I do not know in
advance how large the index needs to be, I will have to write some
sort of custom (re-)allocator and invoke it as needed.

My question is simple, if somewhat broad: What options and
considerations am I missing?

Thank you!

 - Pat


Geometry list run by mateusz at loskot.net