Boost :

Date view	Thread view	Subject view	Author view

Subject: Re: [boost] Proposal: Monotonic Containers - Comparison with boost::pool, boost::fast_pool and TBB
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2009-06-22 12:12:04

Next message: Ion Gaztañaga: "Re: [boost] [Interprocess::Semaphore] Deadlock on more producers - one consumer problem"
Previous message: Ion Gaztañaga: "Re: [boost] [Interprocess::Semaphore] Deadlock on more producers - one consumer problem"
In reply to: Christian Schladetsch: "Re: [boost] Proposal: Monotonic Containers - Comparison with boost::pool, boost::fast_pool and TBB"
Next in thread: Christian Schladetsch: "Re: [boost] Proposal: Monotonic Containers - Comparison with boost::pool, boost::fast_pool and TBB"
Reply: Christian Schladetsch: "Re: [boost] Proposal: Monotonic Containers - Comparison with boost::pool, boost::fast_pool and TBB"

Christian Schladetsch wrote:

> There are exceptions and other vagaries that are yet to be explained.
> Even though I've added a pool system, list_sort<int> is still faster
> with boost::fast_pool and TBB on MSVC. This is surprising, as
> list_create<int> is faster for monotonic. So this means that the sort
> itself - and not the allocation - is slower when using a 'monotonic
> data-structure'. I have spent some time investigating this and have
> yet to find a good answer.

My guess is that the algorithm is allocating and deallocating in an n^2 loop. By using n^2 memory for O(n) elements you end up with elements spread throughout a large address range, though weighted more toward the recently allocated addresses. Peak memory will be a good metric. Do you have access to VTune? You seem to struggle to identify the cause of performance loss and are reduced to guesswork. There also exist memory profiling tools. In linux you can poll /proc to get memory usage. There are little linux utilities out there called memtime and such that work this way.

I'd like to suggest another benchmark. This is an example of something I think would be very bad for your allocator. Build a std::map of size n. Loop for O(n^2) iterations. In each iteration insert one random element and lookup with lower_bound one random element and remove it. Precompute the random numbers and don't include the rand() calls in the time measurement of the benchmark. Your allocator will use O(n^2) memory since it never recovers deallocated memory, so if you set n close to the size of the L1 or L2 cache you should see dramatic performance loss. This is my typical use case for a std::map.

Regards,
Luke

Date view	Thread view	Subject view	Author view

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk