Boost logo

Boost :

Subject: Re: [boost] Interest in B-tree library for Boost?
From: Beman Dawes (bdawes_at_[hidden])
Date: 2010-09-16 09:22:01


On Wed, Sep 15, 2010 at 10:40 PM, Cory Nelson <phrosty_at_[hidden]> wrote:
> On Wed, Sep 15, 2010 at 6:08 PM, Beman Dawes <bdawes_at_[hidden]> wrote:
>> On Wed, Sep 15, 2010 at 7:26 PM, Cory Nelson <phrosty_at_[hidden]> wrote:
>>> - Will this support multiple B-trees in one file?
>>
>> Not currently. That would be relatively easy to support. What is the
>> motivating use case?
>
> Just ease of distribution, really.
>
> I'm currently using a SQLite database to distribute a few tables of
> read-only data that doesn't need SQL, relational, or ACID properties
> -- it's just the best thing available in terms of ease of distribution
> and portability.  I'd love to remove all the unneeded layers and just
> use a B-tree directly.

Sounds quite useful. I can think of several past apps that would have benefited.

I've added this to the Wish List:

Multiple B-trees in one file. Requested by Cory Nelson. Possible
design: Initial btree_map in the file is an index of its children.
Key: name, Data: page id of the child's header page. Recursion
allowed; the header page pointed could itself be an index header.
Current interface might gain additional constructor and open()
overload taking a ref to the parent index and the desired name.

>
>>> - Will this implement ACID properties?
>>
>> No. The model is a standard library associative container, not a
>> database. ACID properties can be built on top of B-trees, but the
>> B-trees themselves don't have ACID properties, other than a few
>> aspects like atomic writes.
>
> Alright, that clears up my understanding about the scope of this.
> Still sounds great!
>
>>> - Can this be adapted for in-memory use as well, with full non-POD support?
>>
>> No current plans for that. Why wouldn't you just a standard library
>> associative container for that?
>
> Storing keys together improves cache usage a lot for small keys, and
> allows the CPU to detect ordered iteration to prefetch the next cache
> line.  It also reduces fragmentation and the maximum number of pages
> that must be touched to find a result, which in turn reduces TLB
> misses and page faults.  The tests here indicate that this can be very
> significant with large collections:

That's also the point made by the article Thorsten referenced,
http://queue.acm.org/detail.cfm?id=1814327, although it is claiming
much greater speedups IIRC.

>
> http://idlebox.net/2007/stx-btree/stx-btree-0.8.3/doxygen-html/speedtest.html

Interesting. So someone has already done the work for drop in STL
associative container replacements.

The timings only covered small trees. 16,000 was the number of
elements mentioned. I'm testing with up to 100 million elements, and
plan to expand that to a few billion elements.

Thanks,

--Beman


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk