|
Boost : |
Subject: Re: [boost] [review] Heaps
From: Tim Blechmann (tim_at_[hidden])
Date: 2011-06-02 13:15:12
> > "VE: How the objects per page impact the behavior/performances of the
> > b_heap?
> > AS: I think that there may be a serious design flaw here. The number
> > of objects per page should depend on the size of the object--much like
> > the implementation of deque.
> >
> > I think that you should be able to find a function that derives this
> > value based on the size of the object and the cache properties of the
> > host machine. A non-trivial, but I think important feature.
> >
> > This discussion should be elevated to the mailing list.
>
> In retrospect, "serious design flaw" is not a good description for
> this issue :) but I think that this is important to address.
>
> I was going to bring this up today. My suspicion is that allowing
> users to freely specify the number of objects per page could lead to
> performance degradation if they choose unwisely. I further suspect
> that you can formulate some heuristic that optimizes choice size of
> the object.
well, there are a number of constraints:
* is our container aligned to cache boundaries
... the allocator needs to do a good job for this
* what is the size of a cache line?
... can we get this at runtime? do we want to get this at runtime? for
boost.lockfree i am using a hardcoded compile-time constant, to compute the
padding size to avoid false sharing
* what is the size of our objects?
in general, this data structure is only reasonable for nodes that are
reasonably small
> I'd look at GCC's deque and see what what's going on
> there. LibC++ takes employs a similar heuristic.
i suppose, i is implemented as a linked list of nodes of a certain size?
> I'm not fundamentally opposed to allowing customization of this
> parameter (in order to support experimentation), but I think that the
> default value should be derived differently. Also, the documentation
> must explain why the parameter might be varied, and in general why it
> should not.
well, the idea is to hold multiple levels of the tree in a single cache line or
vm page. so i think it is pretty important to expose this to the interface ...
but maybe a better discussion of the underlying principle is important.
also it may be better to remove any notion about `cache line size' or `page
size' but replace it with `memory locality', which is a broader term and implies
that experimenting with different `objects_per_group<>' may be interesting.
although my feeling says me, that the default value of 8 will be the first and
the last answer for 99% of all use cases ...
tim
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk