Boost logo

Boost :

Subject: Re: [boost] [stl_ext_adv] ] Interest in advanced data structures
From: Vadim Stadnik (vadimstdk_at_[hidden])
Date: 2011-12-04 02:23:09


On Sun, Dec 4, 2011 at 10:43 AM, Nathan Ridge <zeratul976_at_[hidden]>wrote:

>
> > > > > > * Question #1: why augmenting on B+ trees? The augmenting idea
> > > > > > can be as easily built on top of RB-trees, which are
> > > > > > the weapon of choice for implementing associative containers.
> > > > > > Unless proved otherwise, RB-trees are assumed to be faster than
> B+
> > > > > > trees for in-memory structures (which your own tests hint at, cf.
> > > > > > "insert N random elements" table.)
> > > > >
> > > > > Could a B+ tree library be a basis for implementing disk-backed B+
> > > > > trees? I think that would be useful. In a clever implementation
> > > > > the memory-backed and disk-backed B+ tree might even be the same
> > > > > class, with the difference factored out into a policy template
> > > > > parameter.
> > > > >
> > > > > Regards,
> > > > > Nate
> > > > >
> > > > > I understood this comment is about adding persistence or
> serialization
> > > > support to the proposed B+ trees. Please correct if this is not the
> case.
> > > > There are two levels of complexity in this area.
> > > > A trivial one is just saving and reading data as a list, since these
> B+
> > > > trees are build on top of doubly linked lists. This might be
> sufficient
> > > for
> > > > many practical applications, but certainly not for all.
> > > > Hence there is the second and less trivial problem of saving and
> reading
> > > > operations which preserve topology of these B+ trees that are
> designed to
> > > > work in RAM. (Note: "topology" has been used here as a state of
> > > > connectivity of nodes with data stored in the nodes).
> > > > In principle, the problem can be solved for the submitted B+ trees,
> but I
> > > > will need some Boost specific requirements to implement this feature.
> > >
> > > What I mean by a disk-backed B+ tree is a B+ tree where only the nodes
> that
> > > are needed at a given time are loaded into memory from disk - the
> entire
> > > tree on disk might be too large to even fit into memory.
> > >
> > > Thank you for this clarification.
> > This is certainly the third level of complexity, this is why below only
> > ideas about possible solutions.
> > Assuming that problem #2 has been already solved the reading operation
> can
> > be implemented through reading a sub-tree from the highest internal node
> > which represents the root node of the sub-tree.
> > Direct writing back the same sub-tree, which has been modified while
> > working in RAM, is not allowed, since it can destroy invariants of the
> > whole structure saved on disk. It seems to me the most reasonable option
> is
> > to implement writing as efficient splice operations (in theory they have
> > logarithmic cost), which are applied to the structure stored on disk. I
> > know the efficient splice operations are possible for B+ trees in RAM,
> but
> > I am not sure if these algorithms are also suitable for disk systems.
> >
> > Hope this quick consideration makes some useful sense.
>
> How is it done by relational database indexes? Aren't those commonly
> implemented as disk-backed B+ trees?
>
> The proposed B+ trees have not been designed for these applications. They
are implemented as dynamically allocated data structures, such as linked
lists and red black trees. One node stores only one element. Thus, they do
not store data contiguously in blocks (also named nodes!) as B+ trees for
external storage. For more details please have a look at the section:

http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/doc/bp_trees.html

Please send me a reference if you know methods of disk backing for
dynamically allocated B+ trees. This can be useful in future development.

Regards,
Vadim


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk