Boost logo

Boost :

Subject: Re: [boost] [BTREE] Get BTREE library into boost - Market Research
From: Antony Polukhin (antoshkka_at_[hidden])
Date: 2012-12-22 10:38:18


2012/12/22 Beman Dawes <bdawes_at_[hidden]>:
> On Fri, Dec 21, 2012 at 11:56 AM, Cory Nelson <phrosty_at_[hidden]> wrote:
>> I do see a use for compression, and I'd definitely like the option.
>> Not necessarily prefix/suffix folding, but a general (ie. zlib) page
>> compression could come in handy on e.g. phones and SD cards, or over a
>> network -- scenarios where I/O is particularly expensive and CPU is
>> plentiful.
>
> That's an interesting point!
>
> If the cost of data transmission + cost of decompression of a
> compressed page was less that the cost of data transmission of an
> uncompressed page, compression would appear to make sense. But if
> transmission is that slow, an even better approach might be to just
> send the search key, do the search on the server, and return the found
> data only.

It is also faster to read multiple compressed pages from disk at once
and uncompress them, rather than reading multiple uncompressed pages
(As I remember guys from UPX measured that)

Also, page with ordered data can be compressed extremely effectively,
leading up to 20 times less files (remember, that big parts of btree
can be filled with zeros because they are empty)

2012/12/20 Beman Dawes <bdawes_at_[hidden]>:
> On Thu, Dec 20, 2012 at 6:30 AM, Antony Polukhin <antoshkka_at_[hidden]> wrote:
>> 6) Search methods can be speed up, if a `position hint` iterator is
>> provided (just like in ` iterator std::set::insert (iterator position,
>> const value_type& val);`)
>
> The implementation already does a lot of page caching; if there is an
> outstanding iterator the page is already in memory, as are all pages
> above it in the Btree. Thus it isn't obvious that a hint would
> actually have much effect. But it would be interesting to add such a
> search function and run some timings.

For big enough trees there is a chance, that lots of iterators iterate
through different leaf nodes, so the pages above are not in memory any
more. In that case searches with hint can give some advantage, however
it is a rare use case. For cases, when all the pages are already in
memory this optimization is doubtful.

--
Best regards,
Antony Polukhin

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk