|
Boost : |
Subject: [boost] Proposal: Vector-like container, which takes O(log(N)) for insert/erase, feedback
From: Alexander Kuprijanov (alexkupri_at_[hidden])
Date: 2014-04-05 09:45:51
Hi, everybody!
The most important feature of my container is performance. However,
other features will appear later.
https://github.com/alexkupri/array/blob/master/description/description.txt
Today I compared my container with rope. My container outperformed it 10
times.
> Date: Thu, 3 Apr 2014 16:39:29 -0700
> From: Glen Fernandes <glen.fernandes_at_[hidden]>
> I wasn't sure at first why my name was in parenthesis in the e-mail
> subject. :-)
> Glen
It's my fault, sorry about it.
> Date: Fri, 04 Apr 2014 13:41:04 +0200
> From: Adam Wulkiewicz <adam.wulkiewicz_at_[hidden]>
> Subject: Re: [boost] Proposal: Vector-like container, which takes
> O(log(N)) for insert/erase
> AFAIU just like a B-tree can be seen as a generalization of a binary
> search tree in that a node can have more than two children, your
> container can be seen as a similar generalization of a Rope. Or it can
> be seen as an augumented B-tree (not really because the values are no
> longer stored in internal nodes and the tree isn't sorted, correct me
> if I'm wrong). Your
You understood right.
> proposal is very similar to Counted B-tree
> (http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html),
> which is a B-tree with random access. The difference is that your
> container can't be used to search for a stored value in O(logN)
> without sorting and binary search. This may be ok if someone require
> only random access, the size of a node is smaller which probably
> improves the cacheing.
Again, right. There are a lot of situations when only random access is
necessary. You probably will not sort characters in string.
>
> In general, B-tree was introduced to improve the performance of the
> access to data stored in persistent memory (lower performance). In
> this case it was better to load the whole page into memory (and cache
> it) than load many small, scatered parts. But in the case of
> RAM-only-access, a binary search tree should be faster. As far as we
> don't consider spatial indexing, interval trees, etc. where
> values/nodes may "overlap" somehow and we must search in more than one
> subtree. Therefore I'm wondering if a classic Rope would be faster
> than your container in the use-case you're testing - simple insert and
> random
It outperformed Rope 10 times. It's expected result. Please visit my
link, I added verbose explanation under "ROPE TEST" section. Shortly
speaking, four reasons exist: cache, pages, balancing and malloc/free.
> access to a in-memory container. Have you test it? I'd perform 2 tests:
> 1. Very big number of values + truly random insert/access to prevent
> cacheing
> 2. insert/access in the same area
I've tested with truly random tests. Please see results and code of
SingleOperationPerformanceCheck and MultipleOperationsTest functions.
Tell me, if I'm wrong.
>
> IMHO the benefits related to the use of this container should be more
> noticable if the persistent storage were used.
In memory/cache there are the same relationship as in HD/memory.
>
> As for the name, it's definietly not an array. I'd call it B-rope.
Thanks, your option is accepted. I'll collect more options and we'll
vote. I like brope.
>
> Regards,
> Adam
> Message: 9
From: Lars Viklund <zao_at_[hidden]>
> Date: Fri, 4 Apr 2014 14:10:06 +0200
> If it doesn't have contiguous storage, it's not a replacement for
> vector, however much you want it to be the universal container that
> everyone uses for everything.
> It may be a better general-purpose container compared to the
> usual default containers, deque and vector, but it isn't a replacement
> for either of them.
Well, I would see it as first choise container for non-sorted data. It's
comparable with vector/deque
on small arrays and outperforms them on large sizes. It's more
preferrable than list at least
in terms of malloc/free calls. And it's probably better for strings.
So one should use it by default in all cases with unsorted data and use
list, vector, deque and string only
if he/she really understands what he/she does.
> You use the term 'intrusive' in the documentation, but it doesn't seem
> to carry the usual meaning that it requires the types involved to
> participate in the container structure.
You are right. I've corrected the documentation. Now I call "fat leaves"
leaves which contain elements, and
"thin leaves" leaves which contain pointers to elements.
> Your tests doesn't seem to exercise much else than container<int>
> either.
Suggest your tests.
> I'm not quite sure I like the explicit L/M non-type template parameters
> either, considering that it leaks implementation detail into every
> single user of the type.
Well, you are right in principle. It must have default parameters. But
it's platform dependent,
they should be chosen so that branch and leaf would take roughly 1 cache
line, minus information
required for malloc overhead.
> As for the Copy-on-Write semantics, isn't the general consensus that CoW
> is a horrible horrible thing and is very hard to get right+fast in the
> modern world with concurrency?
I disagree with your estimation. There are different situations,
different machines. Embedded and mobile machines
want to save memory. Large machines can also want to save memory, if
they amount of data is large. Finally, they
can even speed-up by avoiding copying the whole array.
Imagine you are writing text editor. (DNA editor, sound editor, etc.)
You make backup copy and continue editing,
again backup copy and continue editing. And you keep only forward
commands (without undo). You can easily rollback
to any state you need.
Ofcourse, user should decide, whether CoW option is necessary for
him/her, because it wastes resources when enabled
and not used.
>
> Message: 11
> Date: Fri, 4 Apr 2014 23:23:09 +1000
> From: Vadim Stadnik <vadimstdk_at_[hidden]>
> Could you please provide a figure that shows data association with
> nodes in
> your data structure?
I updated the doc, follow the link above.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk