Boost logo

Boost :

From: Fredrik Blomqvist (fredrik_blomqvist_at_[hidden])
Date: 2004-12-02 15:03:37


Perhaps this could be of some interest in this discussion.
http://www.codeproject.com/cpp/FastDelegate.asp
The article talks about low-level optimization of delegates on a wide range
of platforms.

To quote: "Initially, I used boost::function, but I found that the memory
allocation for the delegates was consuming over a third of the entire
program running time!"

Regards
// Fredrik Blomqvist

Doug Gregor wrote:
> On Dec 2, 2004, at 9:32 AM, Robert Zeh wrote:
>> At home I decided to try replacing the shared_ptr in
>> slot_call_iterator with an instance variable of the result_type.
>> Since slot_call_iterator uses the shared_ptr to determine if the
>> cache is valid, I had to add a bool indicating if the cached value
>> is valid or not. These changes made the benchmark run about two to
>> four times faster (on my home system, a 650 MHz Duron with gcc 3.3.2
>> running Debian). I expect better results with my SPARC machine,
>> because the malloc implementation seems to be slower.
>
> Interesting! I never would have expected that to be such a bottleneck.
>
>> There are several drawbacks to my modifications. It's harder to
>> maintain because of the added bool. The result_type used by
>> slot_call_iterator must now have a default constructor. If it is
>> expensive to copy the result_type, and slot_call_iterator is copied a
>> lot, replacing the shared_ptr with an instance variable will actually
>>
>> make things slower.
>> I don't know enough about the internals to weigh how important these
>> issues are.
>
> Unfortunately, some of them are pretty important. For instance, if one
> tried to have a result_type that is an iterator, copying a
> slot_call_iterator that has not been dereferenced could result in
> undefined behavior, because the iterator would have been
> default-constructed and therefore singular. That doesn't make the
> optimization invalid, of course, but it does limit where we can apply
> the optimization.
>
>> I believe the correct way to do things is to create a cache interface
>> that encapsulates the behavior the slot_call_iterator needs, and to
>> then choose the appropriate cache implementation at runtime using the
>> mpl.
>
> We could do it at compile time using some combination of type traits,
> probably. Unfortunately, those that come to
> mind--is_default_constructible and is_assignable--don't give us all
> the information we need, because iterators will return "true" for
> both but
> will have the wrong behavior. We would need to introduce a new trait
> for this.
>
> Actually, there is one other corner case that's really ugly. The
> combiner could theoretically do something like this:
>
> template<typename InputIterator>
> result_type operator()(InputIterator first, InputIterator last)
> {
> while (first != last) {
> InputIterator annoying = first;
> *first;
> if (*annoying) /* ... */
> }
> }
>
> I *think* that's valid code according to the Input Iterator
> requirements (but not 100% sure!), and without the pointer "*annoying"
> would call the slot again...
>
> Oh, I know a solution now... we stick an "optional<R> slot_result"
> into
> the signal's operator() and keep a pointer to that variable in the
> slot_call_iterator. The optional<R> has nearly the same effect as the
> cache interface you mentioned, but it doesn't require
> default-constructibility; storing a pointer to it ensures that
> slot_iterator copies are fast and that all slot_iterators see the
> updates (for those truly pathological combiners).
>
> I'm really interested in this thread but, as seems to happen, it's
> come
> at a time when I can't dedicate a whole lot of time to it. After
> December 10th, I'm all over it :) The timing tests, "lite" versions,
> performance results and performance analyses are greatly
> appreciated...
> with any luck we'll be able to improve Signals performance or come up
> with its replacement :)
>
> On the technical side, I expect the the orders-of-magnitude speed lag
> in Signals comes from the named_slot_map. Jody's "lite" version does
> not support named slots. Perhaps that feature is too painful to have,
> and that we should just drop back to using an std::list as Jody and
> perhaps provide an alternative signal type that allows named slots.
>
> Doug
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk