Subject: Re: [boost] Proposal: Monotonic Containers - Comparison with boost::pool, boost::fast_pool and TBB
From: Christian Schladetsch (christian.schladetsch_at_[hidden])
Date: 2009-06-21 23:21:52
> Looking at your MSVC data it seems that the performance advantage of
> monotonic over other allocators diminishes as the benchmark size increases.
> I see this trend in most of the MSVC benchmarks. I'd like to see the
> benchmarks extended to see if monotonic becomes slower than the other
> allocators, as the trends suggest it might.
I have added three sections to the benchmark:
Small: ~100 elements
The results are here for GCC and MSVC: http://tinyurl.com/lj6nab
As one would expect, doing less work means getting it done faster, so in
general monotonic is faster than the other allocation schemes measured.
There are exceptions and other vagaries that have yet to be explained. Even
though I've added a pool system, list_sort<int> is still faster with
boost::fast_pool and TBB on MSVC. This is surprising, because list_create<int>
is faster with monotonic. It suggests that the sort itself - and not the
allocation - is slower when using a 'monotonic data-structure'. I have spent
some time investigating this and have yet to find a good answer.
OTOH, vector_create and vector_sort are both faster with monotonic. This is
also surprising, for different reasons. Specifically, the results for
vector_sort at large container size (500,000) are consistently 1.2x faster
with monotonic on MSVC, and up to 2.9x faster on GCC, than with the other
schemes. Given that sorting 500,000 elements will swamp the cost of reserving
space for them, the question is: why? I don't really know. It may be because
monotonic allocation uses the BSS first, but I cannot say with any certainty.
> Have you considered that extending the buffer instead of reusing the buffer
> would lead to loss of cache performance because reuse of in cache memory
> will be faster than bringing in new memory all the time with cache misses?
It's a good point, and that is why I added monotonic::chain<> to the
proposal. It is not part of the current benchmarks because I wanted to
compare just the allocation schemes and the data-structures they produce.
> If you are on a system with limited cache size this would be a more serious
Indeed. Given that I am predominantly interested in efficiency and size, I
am acutely aware of this. There is a case to be made for adding an
introspection system to the debug build that warns the user about misuse.
Consider for example the numbers for boost::fast_pool for the map_vector
test, which are pathological.
Thanks for your thoughts again Luke. I am interested in what you think of
the latest numbers that include large datasets, and especially the numbers
for list_sort and vector_sort.
As suggested earlier, I have yet to add measurements for total memory used
(is there a portable way to measure this?). I also need to add some
statistical analysis and summaries, and to compare against the Google
allocator (which I have been unable to find with a quick search). Any ideas
for other tests to add would be appreciated as well.
I also want to raise again the issue of boost::pool_allocator and
boost::fast_pool_allocator leaking. AFAICT, when these are used with a
system that rebind<>'s the allocator, the result is a persistent memory leak
(unless you happen to know the sizes of the allocations subsequently made).
These allocators (in 1.38) also fail to compile when specialised for void
on GCC. I'll supply a patch for that soon.
Boost list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk