Subject: Re: [boost] Proposal: Monotonic Containers - Comparison with boost::pool, boost::fast_pool and TBB
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2009-06-22 12:12:04
Christian Schladetsch wrote:
> There are exceptions and other vagaries that are yet to be explained.
> Even though I've added a pool system, list_sort<int> is still faster
> with boost::fast_pool and TBB on MSVC. This is surprising, as
> list_create<int> is faster for monotonic. So this means that the sort
> itself - and not the allocation - is slower when using a 'monotonic
> data-structure'. I have spent some time investigating this and have
> yet to find a good answer.
My guess is that the algorithm is allocating and deallocating in an n^2 loop. By using n^2 memory for O(n) elements you end up with elements spread throughout a large address range, though weighted more toward the recently allocated addresses. Peak memory will be a good metric. Do you have access to VTune? You seem to struggle to identify the cause of performance loss and are reduced to guesswork. There also exist memory profiling tools. In linux you can poll /proc to get memory usage. There are little linux utilities out there called memtime and such that work this way.
I'd like to suggest another benchmark. This is an example of something I think would be very bad for your allocator. Build a std::map of size n. Loop for O(n^2) iterations. In each iteration insert one random element and lookup with lower_bound one random element and remove it. Precompute the random numbers and don't include the rand() calls in the time measurement of the benchmark. Your allocator will use O(n^2) memory since it never recovers deallocated memory, so if you set n close to the size of the L1 or L2 cache you should see dramatic performance loss. This is my typical use case for a std::map.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk