
Boost : 
Subject: Re: [boost] [stl_ext_adv] ] Interest in advanced data structures
From: Vadim Stadnik (vadimstdk_at_[hidden])
Date: 20111203 18:05:05
On Sun, Dec 4, 2011 at 9:28 AM, Nathan Ridge <zeratul976_at_[hidden]> wrote:
>
> > > > * Question #1: why augmenting on B+ trees? The augmenting idea
> > > > can be as easily built on top of RBtrees, which are
> > > > the weapon of choice for implementing associative containers.
> > > > Unless proved otherwise, RBtrees are assumed to be faster than B+
> > > > trees for inmemory structures (which your own tests hint at, cf.
> > > > "insert N random elements" table.)
> > >
> > > Could a B+ tree library be a basis for implementing diskbacked B+
> > > trees? I think that would be useful. In a clever implementation
> > > the memorybacked and diskbacked B+ tree might even be the same
> > > class, with the difference factored out into a policy template
> > > parameter.
> > >
> > > Regards,
> > > Nate
> > >
> > > I understood this comment is about adding persistence or serialization
> > support to the proposed B+ trees. Please correct if this is not the case.
> > There are two levels of complexity in this area.
> > A trivial one is just saving and reading data as a list, since these B+
> > trees are build on top of doubly linked lists. This might be sufficient
> for
> > many practical applications, but certainly not for all.
> > Hence there is the second and less trivial problem of saving and reading
> > operations which preserve topology of these B+ trees that are designed to
> > work in RAM. (Note: "topology" has been used here as a state of
> > connectivity of nodes with data stored in the nodes).
> > In principle, the problem can be solved for the submitted B+ trees, but I
> > will need some Boost specific requirements to implement this feature.
>
> What I mean by a diskbacked B+ tree is a B+ tree where only the nodes that
> are needed at a given time are loaded into memory from disk  the entire
> tree on disk might be too large to even fit into memory.
>
> Thank you for this clarification.
This is certainly the third level of complexity, this is why below only
ideas about possible solutions.
Assuming that problem #2 has been already solved the reading operation can
be implemented through reading a subtree from the highest internal node
which represents the root node of the subtree.
Direct writing back the same subtree, which has been modified while
working in RAM, is not allowed, since it can destroy invariants of the
whole structure saved on disk. It seems to me the most reasonable option is
to implement writing as efficient splice operations (in theory they have
logarithmic cost), which are applied to the structure stored on disk. I
know the efficient splice operations are possible for B+ trees in RAM, but
I am not sure if these algorithms are also suitable for disk systems.
Hope this quick consideration makes some useful sense.
Regards,
Vadim
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk