
Boost Users :

Subject: Re: [Boost-users] the performance of boost::lock_free is slow in centos 6 and 7
From: james (dirtydroog_at_[hidden])
Date: 2016-07-08 09:20:37


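I suspect the main issue with your test is false sharing, whose effects are
magnified when the number of cores is >= the number of threads, because all
of your threads (8 here) can then run truly in parallel and fight over the
same cache lines.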
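
A minimal sketch of the false-sharing point, not a claim about what Boost.Lockfree does internally: producer_count and consumer_count in the test are declared next to each other, so they may well end up on the same cache line. One way to rule that out is to give each counter a line of its own. The 64-byte line size, the names PaddedCounter and Counters, and the use of std::atomic in place of boost::atomic_int are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cstddef>

// Assumption: 64-byte cache lines, which is typical for x86-64 CPUs.
constexpr std::size_t kCacheLine = 64;

// Each counter is aligned (and therefore padded) to a full cache line,
// so a write to one counter cannot invalidate the line holding the other.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<int> value{0};
};

// Hypothetical replacement for the two adjacent global counters in the test.
struct Counters {
    PaddedCounter produced;
    PaddedCounter consumed;
};

// Compile-time checks that the padding really took effect.
static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each counter should occupy exactly one cache line");
static_assert(alignof(PaddedCounter) == kCacheLine,
              "each counter should start on a cache-line boundary");
```

In the test this would mean writing ++counters.produced.value instead of ++producer_count. Note that this only removes false sharing between the two counters; all producers still increment the same counter, which is true sharing and stays expensive.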

I'd also suggest using the _mm_pause intrinsic when busy spinning if your
CPU supports it. (It's a real shame there's no official spinlock class in
C++.)
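
A sketch of what spinning with _mm_pause might look like, assuming an x86-64 target and C++11. The names cpu_relax, spin_until and SpinLock are hypothetical helpers, not part of Boost or the standard library, and the yield() fallback for other targets is an assumption as well.

```cpp
#include <atomic>
#include <cassert>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
// On x86-64, PAUSE tells the core that this is a spin-wait loop.
inline void cpu_relax() { _mm_pause(); }
#else
#include <thread>
// Fallback assumption for other targets: give up the time slice instead.
inline void cpu_relax() { std::this_thread::yield(); }
#endif

// Hypothetical helper: retry try_once() until it returns true,
// pausing between attempts.
template <typename TryOnce>
void spin_until(TryOnce try_once) {
    while (!try_once())
        cpu_relax();
}

// Minimal test-and-test-and-set spinlock (illustrative only).
class SpinLock {
public:
    void lock() {
        for (;;) {
            // exchange() returns the previous value: false means we got the lock.
            if (!locked_.exchange(true, std::memory_order_acquire))
                return;
            // Wait on a plain load so waiters do not keep writing the cache line.
            while (locked_.load(std::memory_order_relaxed))
                cpu_relax();
        }
    }
    bool try_lock() {
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Single-threaded self-check of the helpers above.
inline void spin_self_check() {
    int attempts = 0;
    spin_until([&attempts] { return ++attempts == 3; });
    assert(attempts == 3);

    SpinLock lock;
    lock.lock();
    assert(!lock.try_lock());  // already held
    lock.unlock();
    assert(lock.try_lock());   // free again
    lock.unlock();
}
```

In the test, the producer's retry loop would then read while (!queue.push(value)) cpu_relax(); instead of spinning on an empty statement.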

Also, use something like a countdown latch to make sure your threads all
start the actual work at the same time.
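
A rough sketch of such a start latch, assuming C++11 and a spinning wait (reasonable for a short benchmark, less so for general use). StartLatch and its member names are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical start gate: every worker calls arrive_and_wait() once, and
// nobody proceeds until `count` workers have arrived.
class StartLatch {
public:
    explicit StartLatch(int count) : remaining_(count) {}

    void arrive_and_wait() {
        // Announce arrival, then wait until the counter has reached zero.
        remaining_.fetch_sub(1, std::memory_order_acq_rel);
        while (remaining_.load(std::memory_order_acquire) > 0)
            std::this_thread::yield();
    }

    // True once all expected workers have arrived.
    bool released() const {
        return remaining_.load(std::memory_order_acquire) <= 0;
    }

private:
    std::atomic<int> remaining_;
};

// Single-threaded self-check: a latch of one releases on the first arrival.
inline void start_latch_self_check() {
    StartLatch latch(1);
    assert(!latch.released());
    latch.arrive_and_wait();
    assert(latch.released());
}
```

In the test you would create one StartLatch with a count of producer_thread_count + consumer_thread_count and call arrive_and_wait() as the first statement of both producer() and consumer(), so that thread creation time is not part of what gets measured.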

On Fri, Jul 8, 2016 at 1:50 PM, Michael <mwpowellhtx_at_[hidden]> wrote:

>
>
> On July 8, 2016 1:39:13 AM EDT, gao1738_at_[hidden] wrote:
> > Hi all,
> >
> >I tried boost::lockfree::queue and found a performance issue:
> >
> >I use the following test programs:
> >
> >lock_free_test.cc
> >
> >#include <boost/thread/thread.hpp>
> >#include <boost/lockfree/queue.hpp>
> >#include <iostream>
> >#include <cstdio>
> >
> >#include <boost/atomic.hpp>
> >
> >boost::atomic_int producer_count(0);
> >boost::atomic_int consumer_count(0);
> >
> >boost::lockfree::queue<int> queue(128);
> >
> >const int iterations = 1000000;
> >const int producer_thread_count = 4;
> >const int consumer_thread_count = 4;
> >
> >void producer(void)
> >{
> >    for (int i = 0; i != iterations; ++i) {
> >        int value = ++producer_count;
> >        while (!queue.push(value))
> >            ;
> >    }
> >}
> >
> >boost::atomic<bool> done (false);
> >void consumer(void)
> >{
> >    int value;
> >    while (!done) {
> >        while (queue.pop(value))
> >            ++consumer_count;
> >    }
> >
> >    while (queue.pop(value))
> >        ++consumer_count;
> >}
> >
> >int main(int argc, char* argv[])
> >{
> >    using namespace std;
> >    cout << "boost::lockfree::queue is ";
> >    if (!queue.is_lock_free())
> >        cout << "not ";
> >    cout << "lockfree" << endl;
> >
> >    boost::thread_group producer_threads, consumer_threads; // thread groups
> >
> >    for (int i = 0; i != producer_thread_count; ++i)
> >        producer_threads.create_thread(producer);
> >
> >    for (int i = 0; i != consumer_thread_count; ++i)
> >        consumer_threads.create_thread(consumer);
> >
> >    producer_threads.join_all();
> >    done = true;
> >
> >    consumer_threads.join_all();
> >
> >    cout << "produced " << producer_count << " objects." << endl;
> >    cout << "consumed " << consumer_count << " objects." << endl;
> >}
> >
> >
> >lock_test.cc
> >
> >#include <boost/thread/thread.hpp>
> >#include <boost/lockfree/queue.hpp>
> >#include <iostream>
> >#include <cstdio>
> >#include <queue>
> >
> >#include <boost/atomic.hpp>
> >using namespace std;
> >
> >boost::mutex producer_count_mu;
> >boost::mutex consumer_count_mu;
> >int producer_count = 0;
> >int consumer_count = 0;
> >
> >std::queue<int> message_queue;
> >
> >boost::mutex queue_mutex;
> >
> >const int iterations = 1000000;
> >const int producer_thread_count = 4;
> >const int consumer_thread_count = 4;
> >
> >void producer(void)
> >{
> >    for (int i = 0; i != iterations; ++i) {
> >        queue_mutex.lock();
> >        int value = ++producer_count;
> >        message_queue.push(value);
> >        queue_mutex.unlock();
> >    }
> >}
>
> I haven't used lockfree myself, but my understanding is that it does what
> its name says: it avoids locks.
>
> My guess is that most of the time is spent contending for the mutex.
> Incidentally, why not use one of the proper lock classes, such as
> boost::lock_guard? You are already using Boost, so it is available. That'll
> save you from having to lock and unlock by hand, at least.
>
> I haven't explored lockfree that much and could be wrong, but I thought the
> whole point of going lock-free was to avoid expensive locks, not to absolve
> you of handling the case where the queue is empty.
>
> Also, what is a test like this really asserting? Lock-free does not mean
> cost-free. There are no free lunches, now less than ever.
>
> Anyhow, HTH
>
> Regards,
>
> Michael Powell
>
> >bool done(false);
> >void consumer(void)
> >{
> >    int value;
> >    while (!done) {
> >        queue_mutex.lock();
> >        while (!message_queue.empty()) {
> >            message_queue.pop();
> >            ++consumer_count;
> >        }
> >        queue_mutex.unlock();
> >    }
> >
> >    queue_mutex.lock();
> >    while (!message_queue.empty()) {
> >        message_queue.pop();
> >        ++consumer_count;
> >    }
> >    queue_mutex.unlock();
> >}
> >int main(int argc, char* argv[])
> >{
> >    using namespace std;
> >    cout << "boost::lockfree::queue is ";
> >//  if (!queue.is_lock_free())
> >    cout << "not ";
> >    cout << "lockfree" << endl;
> >
> >    boost::thread_group producer_threads, consumer_threads; // thread groups
> >
> >    for (int i = 0; i != producer_thread_count; ++i)
> >        producer_threads.create_thread(producer);
> >
> >    for (int i = 0; i != consumer_thread_count; ++i)
> >        consumer_threads.create_thread(consumer);
> >
> >    producer_threads.join_all();
> >    done = true;
> >
> >    consumer_threads.join_all();
> >
> >    cout << "produced " << producer_count << " objects." << endl;
> >    cout << "consumed " << consumer_count << " objects." << endl;
> >}
> >
> >
> >The compile commands are:
> >g++ -I/usr/local/include -L/usr/local/lib lock_free_test.cc -lboost_thread -lboost_system -o lock_free_test
> >g++ -I/usr/local/include -L/usr/local/lib lock_test.cc -lboost_thread -lboost_system -o lock_test
> >
> >1. I first tested it on my work computer, which runs Ubuntu 14.04 on a
> >2-core i5, with
> >boost version: 1.54
> >gcc version: 4.8.4
> >g++ version: 4.8.4
> >
> >The test result is that:
> >
> >time ./lock_test
> >boost::lockfree::queue is not lockfree
> >produced 4000000 objects.
> >consumed 4000000 objects.
> >
> >real 0m3.844s
> >user 0m1.800s
> >sys 0m12.308s
> >
> >time ./lock_free_test
> >boost::lockfree::queue is lockfree
> >produced 4000000 objects.
> >consumed 4000000 objects.
> >
> >real 0m1.745s
> >user 0m6.886s
> >sys 0m0.000s
> >
> >We can see that the lock-free solution performs better here: it finishes
> >in roughly half the time (1.7 s versus 3.8 s).
> >
> >2. Then I tested it on a PC server with CentOS 6.4 and 8 cores
> >(Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz)
> >
> >boost version: 1.54
> >gcc version: 4.4.7
> >g++ version: 4.4.7
> >
> >The test result is that:
> >
> >time ./lock_test
> >boost::lockfree::queue is not lockfree
> >produced 4000000 objects.
> >consumed 4000000 objects.
> >
> >real 0m3.900s
> >user 0m2.593s
> >sys 0m27.282s
> >
> > time ./lock_free_test
> >boost::lockfree::queue is lockfree
> >produced 4000000 objects.
> >consumed 4000000 objects.
> >
> >real 0m5.470s
> >user 0m43.105s
> >sys 0m0.000s
> >
> >
> >Here the non-lock-free solution is faster than the lock-free solution.
> >
> >3. I then tested it on a better PC server with CentOS 7.1 and a 32-core
> >CPU (Intel(R) Xeon(R) CPU E7-4820 v2 @ 2.00GHz)
> >boost version: 1.53
> >gcc version: 4.8.3
> >g++ version: 4.8.3
> >
> >time ./lock_test
> >boost::lockfree::queue is not lockfree
> >produced 4000000 objects.
> >consumed 4000000 objects.
> >
> >real 0m3.023s
> >user 0m1.929s
> >sys 0m20.706s
> >
> >time ./lock_free_test
> >boost::lockfree::queue is lockfree
> >produced 4000000 objects.
> >consumed 4000000 objects.
> >
> >real 0m9.804s
> >user 1m14.900s
> >sys 0m0.100s
> >
> >The lock-free solution is about 3 times slower than the non-lock-free
> >solution!
> >
> >
> >My questions are:
> >1. Why does the lock-free solution perform better on Ubuntu but much
> >worse on CentOS 6 and 7?
> >   Is it an issue with the kernel, the gcc version, or the Boost version?
> >Does the lock-free solution get slower the more CPUs the machine has?
> >
> >2. In which cases should we use the Boost lock-free solution to get
> >better performance?
> >
> >
> >
> >Best Regards!
> >
> >dennis
> >
> >
> >
> >
> >------------------------------------------------------------------------
> >
> >_______________________________________________
> >Boost-users mailing list
> >Boost-users_at_[hidden]
> >http://lists.boost.org/mailman/listinfo.cgi/boost-users
>



Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net