Boost logo

Boost :

From: Doug Gregor (dgregor_at_[hidden])
Date: 2004-12-02 11:25:50


On Dec 2, 2004, at 9:32 AM, Robert Zeh wrote:
> At home I decided to try replacing the shared_ptr in
> slot_call_iterator with an instance variable of the result_type.
> Since slot_call_iterator uses the shared_ptr to determine if the cache
> is valid, I had to add a bool indicating if the cached value is valid
> or not. These changes made the benchmark run about two to four times
> faster (on my home system, a 650 MHz Duron with gcc 3.3.2 running
> Debian). I expect better results with my SPARC machine, because the
> malloc implementation seems to be slower.

Interesting! I never would have expected that to be such a bottleneck.

> There are several drawbacks to my modifications. It's harder to
> maintain because of the added bool. The result_type used by
> slot_call_iterator must now have a default constructor. If it is
> expensive to copy the result_type, and slot_call_iterator is copied a
> lot, replacing the shared_ptr with an instance variable will actually
>
> make things slower.
> I don't know enough about the internals to weigh how important these
> issues are.

Unfortunately, some of them are pretty important. For instance, if one
tried to have a result_type that is an iterator, copying a
slot_call_iterator that has not been dereferenced could result in
undefined behavior, because the iterator would have been
default-constructed and therefore singular. That doesn't make the
optimization invalid, of course, but it does limit where we can apply
the optimization.

> I believe the correct way to do things is to create a cache interface
> that encapsulates the behavior the slot_call_iterator needs, and to
> then choose the appropriate cache implementation at runtime using the
> mpl.

We could do it at compile time using some combination of type traits,
probably. Unfortunately, those that come to
mind--is_default_constructible and is_assignable--don't give us all the
information we need, because iterators will return "true" for both but
will have the wrong behavior. We would need to introduce a new trait
for this.

Actually, there is one other corner case that's really ugly. The
combiner could theoretically do something like this:

template<typename InputIterator>
result_type operator()(InputIterator first, InputIterator last)
{
   while (first != last) {
     InputIterator annoying = first;
     *first;
     if (*annoying) /* ... */
   }
}

I *think* that's valid code according to the Input Iterator
requirements (but not 100% sure!), and without the pointer "*annoying"
would call the slot again...

Oh, I know a solution now... we stick an "optional<R> slot_result" into
the signal's operator() and keep a pointer to that variable in the
slot_call_iterator. The optional<R> has nearly the same effect as the
cache interface you mentioned, but it doesn't require
default-constructibility; storing a pointer to it ensures that
slot_iterator copies are fast and that all slot_iterators see the
updates (for those truly pathological combiners).

I'm really interested in this thread but, as seems to happen, it's come
at a time when I can't dedicate a whole lot of time to it. After
December 10th, I'm all over it :) The timing tests, "lite" versions,
performance results and performance analyses are greatly appreciated...
with any luck we'll be able to improve Signals performance or come up
with its replacement :)

On the technical side, I expect the the orders-of-magnitude speed lag
in Signals comes from the named_slot_map. Jody's "lite" version does
not support named slots. Perhaps that feature is too painful to have,
and that we should just drop back to using an std::list as Jody and
perhaps provide an alternative signal type that allows named slots.

        Doug


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk