Boost logo

Boost :

From: williamkempf_at_[hidden]
Date: 2001-07-01 11:16:18


--- In boost_at_y..., "Peter Dimov" <pdimov_at_m...> wrote:
> From: <williamkempf_at_h...>
>
> > > Using a generic thread-safe memory allocator and a generic
thread-
> > safe
> > > reference counted smart pointer can never be more efficient than
> > using
> > > tailored versions.
> >
> > I'm not so sure about that.
>
> Why? It's simple: if a generic thing is better than your custom
thing, you
> are free to simply typedef your custom thing to the generic thing.
>
> So the generic thing can never win. It can only break even.

All it has to do is break even.

> > > Of course I expect better performance. The implementation is
free to
> > > optimize, or remove, the allocation.
> >
> > *If* it's even possible.
>
> I think that on Win32 a thread::ref can contain only a HANDLE (and
a DWORD
> if you insist on being able to compare thread::refs for equality -
you can't
> compare HANDLEs.)

Yes, but this requires calls to DuplicateHandle which *do the exact
same thing that a ref-counted implementation would do*. The overhead
is going to be identical.
 
> If you think that this isn't possible, I'll be interested to hear
the
> reasons. Since you've been working on Boost.Threads (and I've not)
it's
> possible that I'm missing some critical point of the picture.

It may not be possible if extra data must be maintained above and
beyond what a HANDLE maintains. For instance, if we add clean up
stacks (a good idea, though the implementation will be tricky) you'll
have to maintain extra data and we'll wind up using an explicit ref-
counting mechanism any way.
 
> I don't see how you can come up with a more efficient noncopyable
thread
> than this. Again, it may be just me.

The noncopyable implementation *NEVER* does ref-counting, so by very
definition it's more efficient.
 
> > > If you had a noncopyable design you could never copy nor
destroy a
> > > (non-unique) object, right? So thread::ref has no overhead
> > _provided_ that
> > > you use it like you would use a noncopyable object.
> >
> > It still has overhead, you're just claiming the overhead is
dwarfed
> > by the overhead of thread creation and destruction, which from a
high
> > level view is at least some what accurate. However, with a
> > noncopyable design you can use explicit management techniques
which
> > will always be *MUCH* more efficient than any ref implementation.
> > Yes, this suffers from overhead at construction and destruction
since
> > new/delete must be used, but as you point out this overhead will
be
> > dwarfed by the cost of thread creation and destruction any way.
>
> I don't follow your logic. thread::ref suffers - potentially! - from
> overhead in construction and destruction and this is not acceptable;
> explicit management techniques suffer from overhead at construction
and
> destruction since new/delete must be used BUT this is acceptable?

Yes, because the overhead is incurred ONLY when copies must be made.
You pay only for what you use. With a thread ref design you pay the
penalty no matter what. However, the real difference between the two
doesn't come during construction and destruction but during copying.
With a noncopyable design and new/delete you can employ explicit
management which will be faster than ref-counting.
 
> > > [ And if you are in the same building as the pthreads
implementor
> > for your
> > > OS, you could do some truly amazing things with the thread::ref
> > design. :-)
> > > This is our long term goal, right? ]
> >
> > No. Our goals have nothing to do with pthreads. Now if you
change
> > this to say C++ standard library implementor you get a little
closer
> > to what you mean.
>
> Hm. I said "if you are in the same building as the pthreads
implementor"
> meaning that you - the std::thread implementor - are in the same
building
> with the pthreads implementor.
>
> Does this clarify my point?

Ignoring my point, then, yes. However, it's still irrelevant to the
overall issue of overhead. Let me illustrate the main usage patterns
that effect all of this to prove the point.

1) Simple creation:

void foo()
{
   thread thrd(&bar); // Create the thread
}

2) Simple creation, wait for thread:

void foo()
{
   thread thrd(&bar);
   thrd.join();
}

3) Creation within a loop.

void foo()
{
   for (int i=0; i<10; ++i)
      thread thrd(&bar);
}

4) Creation within a loop, wait for all threads:

void foo()
{
   thread* threads[10];
   for (int i=0; i<10; ++i)
      threads[i] = new thread(&bar);
   for (int i=0; i<10; ++i)
   {
      threads[i]->join();
      delete threads[i];
   }
}

5) Creation and passage to another manager:

void foo()
{
   thread_manager.add(new thread(&bar));
}

With (1) and (2) it's obvious that the non-ref design wins out. It's
as fast as possible in the creation and destruction and no copies are
done.

With (3) since we don't join the case is identical to (1) and (2).

With (4) we illustrate that the noncopyable version is still the
fastest, though it suffers from difficult usage. However, with the
use of a thread_group concept the usage becomes no different than
with ref-counting, but there are no copies being made so we're still
as fast as possible. Example code using this:

void foo()
{
   thread_group threads;
   for (int i=0; i<10; ++i)
      threads.create(&bar);
   threads.join();
}

With (5) a ref-counted design becomes easier and less error prone,
but explicit memory management can still give us much better
performance. For example, in the above code the thread_group is such
an example. If instead of create() we had an add() you'd have
explicit passage of ownership with no ref-counting.

void foo()
{
   thread_group threads;
   for (int i=0; i<10; ++i)
      threads.add(new thread(&bar));
   threads.join();
}

Only (4) and (5) incur any overhead from new/delete (which is likely
to be identical to a ref implementation any way and may be hidden by
the cost of thread creation/destruction), while I've shown how all of
them can incur NO overhead from copies.

I see no way that a claim can be made that the ref-counting design
shall incur no overhead in comparison to a noncopyable design, while
the reverse claim can be made for at least some usage, and for others
we can make a claim of less overhead.

Bill Kempf


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk