Boost logo

Boost :

Subject: Re: [boost] [BTREE] Get BTREE library into boost - Market Research
From: Beman Dawes (bdawes_at_[hidden])
Date: 2012-12-20 10:45:00


On Thu, Dec 20, 2012 at 6:30 AM, Antony Polukhin <antoshkka_at_[hidden]> wrote:
...
>
> I did not look through Boost.BTREE implementation, but I worked with
> some proprietary btree implementations and following features are
> quite useful or provide good performance:
>
> 1) CRC for page (extremely helpful for data corruption detection and for debug)

That would be a useful option. The tricky part is how to specify it so
that there is zero cost if you don't want invoke the option. The
implementation already allows recovery of non-corrupt pages in the
event of a failure.

> 2) Data compression for pages (less file sizes, less memory usage,
> ordered data can be compressed very good)

I'm strongly against that. The full rationale for not doing
compression is lengthy research paper, but the bottom line is what
Rudolf Bayer said so many years ago with regard to prefix and suffix
key compression - the increased complexity and reduced reliability
makes compression very unattractive.

The problems of compression are closely related to the problems of
variable length data. If the application is willing to tolerate
sequential (rather than binary) search at the page level, or is
willing to tolerate an indexed organization on pages, or even (gasp!)
an additional disk access per comparison, these problems aren't
necessarily showstoppers. But if some applications won't tolerate even
a dispatch (either a virtual function or a hand-rolled dispatch) to
select the approach being employed, then the only choice I can see is
to provide essentially multiple sets of classes, and that gets complex
and messy.

> 3) Ability for user to provide custom read/write mutexes (fake mutex,
> interprocess mutex, std::mutex)

There is a spectrum of needs. I've seen various designs that are
optimal for various points in that spectrum. Can you point to any
design that is optimal across the spectrum from single thread, single
process, single machine, on up through multi-thread, multi-process,
multi-machine?

> 4) Ability for user to provide stateful allocators (might be usefull
> for those, who use interprocess)

Agreed, and I'm hoping that C++11 allocators will ease that problem. I
haven't looked at it yet, however.

> 5) Ability for user to provide comparators

Yes. That's a requirement for an STL-like approach, so was built in
right from the start.

> 6) Search methods can be speed up, if a `position hint` iterator is
> provided (just like in ` iterator std::set::insert (iterator position,
> const value_type& val);`)

The implementation already does a lot of page caching; if there is an
outstanding iterator the page is already in memory, as are all pages
above it in the Btree. Thus it isn't obvious that a hint would
actually have much effect. But it would be interesting to add such a
search function and run some timings.

Thanks for the suggestions!

--Beman


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk