Boost logo

Boost :

Subject: Re: [boost] [BTREE] Get BTREE library into boost - Market Research
From: Cory Nelson (phrosty_at_[hidden])
Date: 2012-12-21 11:56:06


On Fri, Dec 21, 2012 at 2:26 AM, Rutger ter Borg <rutger_at_[hidden]> wrote:
>
> On 2012-12-20 16:45, Beman Dawes wrote:
>>
>>
>>> 2) Data compression for pages (less file sizes, less memory usage,
>>> ordered data can be compressed very good)
>>
>>
>> I'm strongly against that. The full rationale for not doing
>> compression is lengthy research paper, but the bottom line is what
>> Rudolf Bayer said so many years ago with regard to prefix and suffix
>> key compression - the increased complexity and reduced reliability
>> makes compression very unattractive.
>>
>> The problems of compression are closely related to the problems of
>> variable length data. If the application is willing to tolerate
>> sequential (rather than binary) search at the page level, or is
>> willing to tolerate an indexed organization on pages, or even (gasp!)
>> an additional disk access per comparison, these problems aren't
>> necessarily showstoppers. But if some applications won't tolerate even
>> a dispatch (either a virtual function or a hand-rolled dispatch) to
>> select the approach being employed, then the only choice I can see is
>> to provide essentially multiple sets of classes, and that gets complex
>> and messy.
>
>
> At what point do you expect serialization to be executed? Do pages need to
> be kept in a serialized state in memory? Is it wrong to, say, represent a
> page with a std::map in memory, and serialize to a binary page when writing?
> In the latter case, compression seems to be less difficult.
>

It's been some time since I've taken a look at Beman's btree code, so
if I'm wrong about something please correct me.

There is no support for variable-length keys at the moment. The code
is structured to always have the same number of keys in one page.
Compression would throw this off, adding a good bit of complexity.

Also, I believe the pages are kept in a blittable format (i.e. you can
mmap it and use it right away without any deserializing), so there's a
relatively minor complexity of changing that.

I do see a use for compression, and I'd definitely like the option.
Not necessarily prefix/suffix folding, but a general (ie. zlib) page
compression could come in handy on e.g. phones and SD cards, or over a
network -- scenarios where I/O is particularly expensive and CPU is
plentiful.

-- 
Cory Nelson
http://int64.org

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk