Boost logo

Boost :

From: Jooseong Lee (frozenca91_at_[hidden])
Date: 2022-07-26 08:04:28


Dear Phil, FYI,

>I would be interested to see a comparison with Beman Dawes'
>B-Tree library from 2010:

https://github.com/Beman/btree seems to depend on an older version of
Boost.Endian,
It doesn't compile with the recent versions of boost. I can't find a proper
version of Boost.Endian which is compatible with.

To compare with Google's 2011 B-Tree implementation, (
https://code.google.com/archive/p/cpp-btree/)
I saw the following results:
https://github.com/frozenca/BTree/tree/295a9ec4b5495a83c546fd3e300d54d4c78d8ad8#Performance

Regards,
Jooseong

2022년 7월 26일 (화) 오전 12:03, Jooseong Lee <frozenca91_at_[hidden]>님이 작성:

> >I see that you have O(log N) n-th element access, that it is an
> >interesting feature. Does this mean that every non-leaf node has a
> >count of its descendants?
>
> Yes, every non-leaf (and leaf) node has a count of keys in the subtree
> where that node is the root.
>
> > I don't see how erase(a,b) can be O(log N), is that a typo?
>
> No, it's not.
> In order to explain this, I have to explain split() first:
> split(Tree&& tree, key a, key b) splits a tree to two trees, say tree_l
> and tree_r, so that
> tree_l contains all keys less than a, tree_r contains all keys greater
> than b.
> By B-Tree's nature, this is O(log N).
>
> erase(a, b) works like this:
> 1. compute lb = lower_bound(a), ub = upper_bound(b) pair. These are
> iterators. O(log N)
> 2. compute order(lb), order(ub). Therefore, order(ub) - order(lb) is the
> number of elements that will be erased. O(log N)
> 3-1. When order(ub) - order(lb) < 30, the tree erases element one by one
> 3-2. When it exceeds 30, the algorithm first splits the tree to two trees
> by calling spit(std::move(*this), lb, ub)
> and joins the two split trees. split() and join() is O(log N), and there
> is one split() call and join() call,
> so the whole complexity is O(log N)
>
>
> I plan to benchmark mine with abseil's B-Tree soon.
> Thank you for pointing out Beman's implementation and sqlite, I'll list
> them too.
>
> I'll add Boost license too, thanks for that
>
>
> Best,
> Jooseong
>
> 2022년 7월 25일 (월) 오후 11:29, Phil Endecott via Boost <boost_at_[hidden]>님이
> 작성:
>
>> Jooseong Lee wrote:
>> > I recently wrote a general-purpose STL-like ordered associative B-Tree
>> > based container in C++.
>> > .
>> > Its API and usage is very similar to std::set, std::map, std::multiset,
>> > std::multimap,
>> > and when the number of elements is relatively large, my benchmark shows
>> > that it is 2.5~3 times faster than std::set and its friends.
>> >
>> > Repo link: https://github.com/frozenca/BTree
>>
>> I would be interested to see a comparison with Beman Dawes'
>> B-Tree library from 2010:
>>
>> https://lists.boost.org/Archives/boost//2010/09/170995.php
>> https://github.com/Beman/btree
>>
>> Personally, I often use flat sets/maps. Of course these are best
>> suited when lookups dominate over modifications. It would be interesting
>> to know how your library compares for lookups.
>>
>> I see that you have O(log N) n-th element access, that it is an
>> interesting feature. Does this mean that every non-leaf node has a
>> count of its descendants?
>>
>> I don't see how erase(a,b) can be O(log N), is that a typo?
>>
>> In the past I have used a lot of memory-mapped, mostly read-only,
>> binary data structures but these days I'm more likely to use
>> sqlite. Have you tried to benchmark against that?
>>
>> I note it is not Boost licensed.
>>
>>
>> Regards, Phil.
>>
>>
>>
>>
>>
>> _______________________________________________
>> Unsubscribe & other changes:
>> http://lists.boost.org/mailman/listinfo.cgi/boost
>>
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk