Boost logo

Boost :

Subject: Re: [boost] [BTREE] Get BTREE library into boost - Market Research
From: Rutger ter Borg (rutger_at_[hidden])
Date: 2012-12-22 04:09:08


On 21-12-2012 23:13, Beman Dawes wrote:
>
> It is a btree. That's a very specific data structure that is the same
> in memory and on disk. There is no serialization.

I always thought a B-tree is an (abstract) data-structure. As such,
implementation details, such as what is on disk and what is in memory,
are irrelevant.

Could you please point to the wording where it says that it has to be
identical on disk and memory? Maybe I misunderstood Wikipedia and
several other in-depth research papers discussing them.

>> Do pages need to
>> be kept in a serialized state in memory?
>
> It makes no sense to talk about serializing a btree. If you serialized
> it, it would no longer be a btree.

I was talking about the serialization of the nodes, or pages, of a
b-tree. I am claiming it will still be a B-tree if you do so. Store a
bunch of keys and { data, references } in nodes. No matter what the
storage scheme of the node is (e.g., by serialization), IMHO it will
still be a B-tree.

> Is it wrong to, say, represent a
>> page with a std::map in memory, and serialize to a binary page when writing?
>
> That would be very, very, slow because of serialization. Since Bayer
> invented both the btree and also the red-black tree used to implement
> maps, presumably he would have combined them if there was any
> advantage.

I guess it will be slower for writes, but couldn't it be faster for
doing (many) lookups? What operation happens more often?

>
>> In the latter case, compression seems to be less difficult.
>
> One commonly used approach to deal with large volumes of variable
> length data, such as compressed data, is to put the data in a
> sequential file that can be accessed randomly, and put the key into a
> btree that as a payload has only the position (and possibly length) of
> the actual data in the sequential file. In other words, use the btree
> as an index to the data rather than a container for the data itself.
>

The problem of variable-length keys still has to be solved, then.

Cheers,

Rutger


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk