Boost logo

Boost Users :

From: Peter Dimov (pdimov_at_[hidden])
Date: 2008-03-17 12:13:17


Hal Vaughan:
...

> It's supposed to fill a buffer with data on one thread and remove data
> on the other thread. I would expect it to add data and remove at
> almost the same time but when I run it, it adds 10 items, then removes
> them, then adds ten again, and so on. It's never adding and removing
> in an "interleaved" fashion. I've also tried the same with simple
> counters where both threads count to ten, but it doesn't seem to make a
> difference. It seems no matter how I set it up, one thread either
> completes, or runs until it has to stop for a mutex, then the other
> thread runs. It seems that there is never a case of both threads
> running at the same time. I've never had trouble doing this in Perl
> with fork() so I'm not at all sure what could be going wrong.

There are several possible answers to this one, from worst to best:

A1. Unlock the mutex before notify_one and then yield.

A2. The threads are typically scheduled to run for a certain timesice; 10ms
is one possible value. A context switch from one thread to another takes
time, and the fact that the new thread doesn't have its data in the cache is
an additional performance hit. Context switches are pure waste of time, so
the less context switches your program performs, the faster it will appear
to run. In your case, the pushing thread can do many iterations in its
timeslice, so it runs until the buffer is full, then blocks.

A3. In a "real" multithreaded program, you almost never want threads to run
interleaved. You want them to run at the same time on different CPUs in
order to achieve maximum throughput. Amdahl's law says that the time it
takes for a program on K CPUs to finish is S + P / k, where S is the
sequential portion and P is the parallelizable portion. The less S you have,
the better. It follows that benchmarks that have P == 0, such as one in
which the threads repeatedly fight for the same mutex, are generally
useless; they would take much less time if rewritten to run single-threaded.

You want your benchmark to measure something significant; for that you need
something along the lines of:

void writer()
{
    for (int n = 0; n < ITERS; ++n)
    {
        // create an item "it", taking time
        buf.put( it );
    }
}

void reader()
{
    for (int x = 0; x < ITERS; ++x)
    {
        Item it = buf.get();
        // process the item "it", taking time
    }
}

where the time it takes to create an item and to process an item is
significant compared to the queue overhead (or a single thread creating and
processing items will be faster).

I see in your later post that an item is "created" by reading from a serial
port. In this case, if you use blocking reads, you won't need to do
anything; the "writer" thread will just block until there's something to
read from the port, and the "reader" thread will block when there's no data
in the queue.


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net