Subject: [boost] [btree] In-memory B-tree experimental work
From: Beman Dawes (bdawes_at_[hidden])
Date: 2011-07-12 12:01:25
There was quite a bit of speculation at BoostCon about the performance
of an in-memory B-tree based ordered associative container. I've done
enough experimental work to report some real numbers for an in-memory
B-tree map implementation vs std::map:
inserting 10000000 elements...
btree: 1985589.4 per sec, 81.7% faster than stl
stl: 1092714.3 per sec, 45.0% slower than btree
iterating over 6321389 elements...
btree: 588200694.1 per sec, 2323.5% faster than stl
stl: 24270489.8 per sec, 95.9% slower than btree
finding 10000000 elements...
btree: 2468384.9 per sec, 109.3% faster than stl
stl: 1179177.8 per sec, 52.2% slower than btree
erasing 10000000 elements...
btree: 2444250.3 per sec, 150.7% faster than stl
stl: 974793.1 per sec, 60.1% slower than btree
Those timings are a simple map<long, int>, and are probably about the
best case for the B-tree implementation. In a similar test of a
map<char*, int>, i.e. an indirect map using pointers to strings
containing the key, the memory B-tree was only slightly faster than
A typical red-black container has an overhead of something like 3
pointers per node, IIRC, so memory use for the B-tree container is
presumably a lot less that the STL container, even if the B-tree
container has not been packed.
Unlike the disk-resident B-tree implementation, the in-memory tree is
much closer to the STL interface. The only significant differences are
that insert/emplace/erase always invalidates iterators and Key must be
Assignable, not just CopyConstructible.
The code can be viewed or downloaded at https://github.com/Beman/memory-btree
It is currently C++0x only, and tested only with VC++ 10 and GCC 4.6.
Since this is the first time I've used move semantics, I may not be
getting full mileage out of r-value references yet.
There is no documentation yet.
I need to set this work aside for awhile to get on with other
projects. But I thought there might be interest in the timings. It
would also be interesting to push the same set of random numbers into
a std::vector, sort it, then run find, iterate, and erase, then look
at those timings.