Boost logo

Boost :

Subject: Re: [boost] Determining interest in container with efficient random access insert and erase
From: Vadim Stadnik (vadimstdk_at_[hidden])
Date: 2015-09-12 09:39:42


On Sat, Sep 12, 2015 at 8:46 PM, Ion Gaztañaga <igaztanaga_at_[hidden]> wrote:

> On 12/09/2015 1:10, Glen Fernandes wrote:
>
>> Useful links:
>>> - Source code: https://github.com/det/segmented_tree_seq
>>> - Documentation: http://det.github.io/segmented_tree_seq/
>>> - Benchmarks:
>>> https://github.com/det/segmented_tree_seq/tree/benchmarks
>>>
>>> Any feedback is greatly appreciated.
>>>
>>
>> The benchmarks are impressive. I hope to take a deeper look this
>> weekend, but from my cursory glance: the implementation looks very
>> clean too.
>>
>> Ion: Is this something you would be interested in reviewing?
>>
>
> I'd like to separate the proposal to put that data structure into Boost
> and the proposal to put it into Boost.Container.
>
> The implementation looks good although I have no experience with B+Trees
> or augmented data structures. I think we should have more innovative
> container in Boost and there have been more proposals to similar data
> structures that were not.
>
> Regarding Boost.Container, one of the biggest problems with adding new
> stuff into Boost.Container is that it should be consistent with the rest of
> the library. The library supports C++03, supports Interprocess
> requirements, recursive types, SCARY iterators, etc... it must be
> maintained and documented as the rest of the library. I wouldn't like to
> maintain data structures I really don't fully understand or just left them
> unmaintained. We added stable_vector and static_vector but those were
> easier data structures.
>
> Just like we have Boost.CircularBuffer or Boost.Unordered I'd like to
> propose adding adding new containers in its own library. Maybe an
> (optionally augmented) B+Tree library would be nice, and it could also
> contain a sequence container based on that data structure.
>
> Just my 2 cents, does it make sense?
>
>
All of your considerations make sense. I understand technical problems that
can be created by adding other (potentially quite different) types of
containers to Boost libraries. But I am sure that there will be new
proposals and new requests for advanced data structures. Sequences are
everywhere in real life applications, not only texts and strings. In
addition to this, there is an interesting lesson that I have learned since
written "STL extensions" library.

These days big interest and selling point for augmented data structures is
not so much single augmenting by count of container elements, but multiple
augmenting, which supports fast summation algorithms (more generally fast
computations using binary operations). These data structure can be used to
compute important statistical parameters, such as mean value and standard
deviation, in logarithmic running time on cheap single processor systems.
The algorithms are general and can be applied to both ordered and unordered
data sets. Alternative parallel computations require supercomputers to
achieve the same performance.

The application area - big data and data science.

Regards,
Vadim Stadnik


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk