Boost logo

Boost :

Subject: Re: [boost] Proposal: Vector-like container, which takes O(log(N)) for insert/erase (Glen Fernandes)
From: Adam Wulkiewicz (adam.wulkiewicz_at_[hidden])
Date: 2014-04-04 07:41:04


Hi,

Alexander Kuprijanov wrote:
> Hi, again,
> I've implemented a container which should replace vector and have
> O(log(N)) for insert/delete.
> Please review.
> https://github.com/alexkupri/array/blob/master/description/description.txt
>
> The code is small, the description is verbose.

AFAIU just like a B-tree can be seen as a generalization of a binary
search tree in that a node can have more than two children, your
container can be seen as a similar generalization of a Rope. Or it can
be seen as an augumented B-tree (not really because the values are no
longer stored in internal nodes and the tree isn't sorted, correct me if
I'm wrong). Your proposal is very similar to Counted B-tree
(http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html),
which is a B-tree with random access. The difference is that your
container can't be used to search for a stored value in O(logN) without
sorting and binary search. This may be ok if someone require only random
access, the size of a node is smaller which probably improves the cacheing.

In general, B-tree was introduced to improve the performance of the
access to data stored in persistent memory (lower performance). In this
case it was better to load the whole page into memory (and cache it)
than load many small, scatered parts. But in the case of
RAM-only-access, a binary search tree should be faster. As far as we
don't consider spatial indexing, interval trees, etc. where values/nodes
may "overlap" somehow and we must search in more than one subtree.
Therefore I'm wondering if a classic Rope would be faster than your
container in the use-case you're testing - simple insert and random
access to a in-memory container. Have you test it? I'd perform 2 tests:
1. Very big number of values + truly random insert/access to prevent
cacheing
2. insert/access in the same area

IMHO the benefits related to the use of this container should be more
noticable if the persistent storage were used.

The times measured in benchmarks are very small ~e-04. You can't rely on
such small values. I'd keep them above 1s.

As for the name, it's definietly not an array. I'd call it B-rope.

Regards,
Adam


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk