Boost logo

Boost :

Subject: Re: [boost] Interest in B-tree library for Boost?
From: Beman Dawes (bdawes_at_[hidden])
Date: 2010-09-21 18:25:16


On Mon, Sep 20, 2010 at 7:48 PM, Phil Endecott
<spam_from_boost_dev_at_[hidden]> wrote:
> Hi Beman,
>
> Beman Dawes wrote:
>>
>> A Boost B-tree library would provide disk-based associative containers
>> that scale all the way from really, really, small to really, really,
>> large.
>
>> Any interest?
>
> Yes, of course!  A couple of questions:
>
> Does anyone have any thoughts about how atomic inserts can be implemented on
> top of this?  I'm not too concerned about recovering from random disk
> corruption, but I'd like to survive in the case where the program dies (e.g.
> another thread segfaults!) while inserting a new record.  Or, is the
> implementation "not quite atomic but safe enough" ?

The btree classes have a flush() function that forces writing out of
internal buffers. That is intended to help provide "not quite atomic
but safe enough".

Beyond that, I'd like to hear ideas. Is some form of journaling the
preferred approach?

>  I have various lumps of
> code where the data is either read-only or append-only and this is not a
> problem.  I'm now faced with something where I need proper safe writes; I
> don't want to have to use SQLlite....

How do SQLlite or other libraries provide safety against another
thread crashing? Do they actually stop the other threads until the
b-tree modification completes, have some form of recovery, or some
other approach? Do they have protection against another process
corrupting the B-tree?

>
> I'm also interested in spatial applications.  I have successfully used
> Z-curve techniques to adapt 1D containers for storing 2D (or higher) values
> and I'm sure this could also be used here (I guess that's what you mean by
> "Z or N tree").

Yes, at least as I'm using the terms. But I suspect that Bayer's
UB-trees may have gone further.

> Harder is the implementation of range storage (1D or e.g.
> bounding rectangles); Beman, you mentioned that someone had implemented an
> R-tree on top of your previous B-tree; can you say something about how that
> is done?

It has been 10 or 15 years. I can remember the general principles,
which are that keys represent minimum bounding rectangles (MBR's), and
that as the search down the tree proceeds you may have visit multiple
child nodes if the shape you are looking for spans several MBR's. Very
similar to an N-tree or Z-tree. But beyond that I'd have to dredge
around in my memory to come up with more details.

>  I have sometimes thought about how one could implement an interval
> tree as a layer on top of a simpler associative container, but have not
> found a solution.

Interesting stuff, that's for sure!

--Beman


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk