Subject: Re: [boost] [Heaps] benchmarks
From: Tim Blechmann (tim_at_[hidden])
Date: 2011-06-08 03:38:54


hi phil,

> http://chezphil.org/tmp/bench.tgz
>
> This is a "work queue", i.e. it's a queue of
> boost::function<void(void)> "work items" with float priorities. It's
> similar to a real application with fast (burst) fill and gradual drain;
> the pattern here is 10 pushes and 1 pop, repeated until all the work
> has been added, and then all of the work is popped in sequence. The
> priorities are essentially random, and there are 100 000 items in total.

thanks for your benchmark ... the pattern is different from the ones that i have
tested.

> I was unable to try the proposed d_ary_heap because I couldn't find a
> working syntax for the arity template parameter - any ideas anyone?

:) the scope operator uses two colons:

boost::heap::d_ary_heap< value_t,
                         boost:heap::arity<4>, // This doesn't work
                         boost::heap::compare<compare_priorities> >

boost::heap::d_ary_heap< value_t,
                         boost::heap::arity<4>,
                         boost::heap::compare<compare_priorities> >

> I was expecting the b_heap to be faster, but it is not; it is
> substantially slower. What is going on there? Does it all come down
> to size of data, cache size, etc? Ideally, an "improved"
> implementation like this would be able to fall back to the naive
> implementation in cases where it does not function any better. Or
> maybe I'm using it wrongly.

well, benchmarking this can be tricky ... first, your work item is boost::function<>
(iirc 3 pointers) and a float ... so 16 bytes on 32bit or 28 bytes on 64bit ...
on 32bit systems you will have a better match with the actual physical cache
lines ...
second ... synthetic benchmarks are not necessarily the best way to test
cache-aware data structures: when stressing the heap, all its content will be
located in the cpu caches ... so you won't necessarily see the benefit of the
cache-awareness, since the data of the non-cache-aware data structure is still
in the cpu cache, unless you manually add some work to thrash the caches ... in
your case, you will have a maximum of 90000 items in the queue, on 32bit
platforms this is roughly 1.4mb ... you won't see a big effect if this all fits
into the caches ...

not sure, if there is a good way to artificially thrash the caches in order to
observe the effect in a microbenchmark ...

> My understanding is that the fibonacci, binomial and other heaps only
> improve things when other operations e.g. merging are needed - is that
> correct?

more or less ... unlike the container adaptors, the fibonacci heap is the only
implementation that provides O(1) insert and increase ...

cheers, tim




Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk