Boost logo

Boost :

Subject: Re: [boost] [asio] Race conditions detected with valgrind drd in example/http/server2
From: Jorge Moraleda (moraleda_at_[hidden])
Date: 2010-06-08 14:50:29


On Fri, Jun 4, 2010 at 6:46 PM, Jorge Moraleda <moraleda_at_[hidden]> wrote:
> When testing boost.asio (example/http/server2) for race conditions
> using valgrind drd, some
> errors are reported right after the first connection to the server is
> made. I am using Valgrind-3.6.0.SVN and gcc version 4.4.4 (Debian
> 4.4.4-4). This is an up to date Debian sid 64 bit system.
>
> To reproduce:
>
> g++ -l boost_thread -l boost_system
> /opt/boost/boost/doc/html/boost_asio/example/http/server2/*cpp
>
> valgrind --tool=drd ./a.out 127.0.0.1 31174 8 /tmp
>
> While it is running type "http://localhost:31174/main.cpp" in your
> browser (where main.cpp is a file in /tmp). Then several race
> conditions occur. These are the valgrind error messages:
>
> ==7737== drd, a thread error detector
> ==7737== Copyright (C) 2006-2010, and GNU GPL'd, by Bart Van Assche.
> ==7737== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
> ==7737== Command: ./a.out 127.0.0.1 31174 8 /tmp
> ==7737==
> ==7737== Thread 4:
> ==7737== Conflicting load by thread 4 at 0x061875f0 size 8
> ==7737==    at 0x40DB72:
> boost::asio::detail::op_queue<boost::asio::detail::reactor_op>::front()
> (in /tmp/a.out)
> ==7737==    by 0x417974: boost::asio::detail::epoll_reactor::run(bool,
> boost::asio::detail::op_queue<boost::asio::detail::task_io_service_operation<boost::asio::detail::epoll_reactor>
>>&) (in /tmp/a.out)
> ==7737==    by 0x41C809:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::do_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&,
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::idle_thread_info*)
> (in /tmp/a.out)
> ==7737==    by 0x41A026:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::run(boost::system::error_code&)
> (in /tmp/a.out)
> ==7737==    by 0x417D26: boost::asio::io_service::run() (in /tmp/a.out)
> ==7737==    by 0x430358: unsigned long boost::_mfi::mf0<unsigned long,
> boost::asio::io_service>::call<boost::shared_ptr<boost::asio::io_service>
>>(boost::shared_ptr<boost::asio::io_service>&, void const*) const (in
> /tmp/a.out)
> ==7737==    by 0x42FF74: unsigned long boost::_mfi::mf0<unsigned long,
> boost::asio::io_service>::operator()<boost::shared_ptr<boost::asio::io_service>
>>(boost::shared_ptr<boost::asio::io_service>&) const (in /tmp/a.out)
> ==7737==    by 0x42FE1E: unsigned long
> boost::_bi::list1<boost::_bi::value<boost::shared_ptr<boost::asio::io_service>
>> >::operator()<unsigned long, boost::_mfi::mf0<unsigned long,
> boost::asio::io_service>, boost::_bi::list0>(boost::_bi::type<unsigned
> long>, boost::_mfi::mf0<unsigned long, boost::asio::io_service>&,
> boost::_bi::list0&, long) (in /tmp/a.out)
> ==7737==    by 0x42FD66: boost::_bi::bind_t<unsigned long,
> boost::_mfi::mf0<unsigned long, boost::asio::io_service>,
> boost::_bi::list1<boost::_bi::value<boost::shared_ptr<boost::asio::io_service>
>> > >::operator()() (in /tmp/a.out)
> ==7737==    by 0x42EE79:
> boost::detail::thread_data<boost::_bi::bind_t<unsigned long,
> boost::_mfi::mf0<unsigned long, boost::asio::io_service>,
> boost::_bi::list1<boost::_bi::value<boost::shared_ptr<boost::asio::io_service>
>> > > >::run() (in /tmp/a.out)
> ==7737==    by 0x4E41F5F: thread_proxy (in
> /usr/local/lib/libboost_thread.so.1.43.0)
> ==7737==    by 0x4C2EA60: vgDrd_thread_wrapper (drd_pthread_intercepts.c:272)
> ==7737== Address 0x61875f0 is at offset 80 from 0x61875a0. Allocation context:
> ==7737==    at 0x4C26FD1: operator new(unsigned long) (vg_replace_malloc.c:261)
> ==7737==    by 0x43BDC8:
> __gnu_cxx::new_allocator<std::_List_node<std::pair<int,
> boost::asio::detail::epoll_reactor::descriptor_state> >
>>::allocate(unsigned long, void const*) (in /tmp/a.out)
> ==7737==    by 0x43B72D: std::_List_base<std::pair<int,
> boost::asio::detail::epoll_reactor::descriptor_state>,
> std::allocator<std::pair<int,
> boost::asio::detail::epoll_reactor::descriptor_state> >
>>::_M_get_node() (in /tmp/a.out)
> ==7737==    by 0x43AD64: std::list<std::pair<int,
> boost::asio::detail::epoll_reactor::descriptor_state>,
> std::allocator<std::pair<int,
> boost::asio::detail::epoll_reactor::descriptor_state> >
>>::_M_create_node(std::pair<int,
> boost::asio::detail::epoll_reactor::descriptor_state> const&) (in
> /tmp/a.out)
> ==7737==    by 0x43A0FC: std::list<std::pair<int,
> boost::asio::detail::epoll_reactor::descriptor_state>,
> std::allocator<std::pair<int,
> boost::asio::detail::epoll_reactor::descriptor_state> >
>>::insert(std::_List_iterator<std::pair<int,
> boost::asio::detail::epoll_reactor::descriptor_state> >,
> std::pair<int, boost::asio::detail::epoll_reactor::descriptor_state>
> const&) (in /tmp/a.out)
> ==7737==    by 0x439A77: boost::asio::detail::hash_map<int,
> boost::asio::detail::epoll_reactor::descriptor_state>::values_insert(std::_List_iterator<std::pair<int,
> boost::asio::detail::epoll_reactor::descriptor_state> >,
> std::pair<int, boost::asio::detail::epoll_reactor::descriptor_state>
> const&) (in /tmp/a.out)
> ==7737==    by 0x438B60: boost::asio::detail::hash_map<int,
> boost::asio::detail::epoll_reactor::descriptor_state>::insert(std::pair<int,
> boost::asio::detail::epoll_reactor::descriptor_state> const&) (in
> /tmp/a.out)
> ==7737==    by 0x43811B:
> boost::asio::detail::epoll_reactor::register_descriptor(int,
> boost::asio::detail::epoll_reactor::descriptor_state*&) (in
> /tmp/a.out)
> ==7737==    by 0x43DE25:
> boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::assign(boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::implementation_type&,
> boost::asio::ip::tcp const&, int const&, boost::system::error_code&)
> (in /tmp/a.out)
> ==7737==    by 0x43D954:
> boost::asio::stream_socket_service<boost::asio::ip::tcp>::assign(boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::implementation_type&,
> boost::asio::ip::tcp const&, int const&, boost::system::error_code&)
> (in /tmp/a.out)
> ==7737==    by 0x43D689:
> boost::asio::basic_socket<boost::asio::ip::tcp,
> boost::asio::stream_socket_service<boost::asio::ip::tcp>
>>::assign(boost::asio::ip::tcp const&, int const&,
> boost::system::error_code&) (in /tmp/a.out)
> ==7737==    by 0x43D274:
> boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::accept_op_base<boost::asio::basic_socket<boost::asio::ip::tcp,
> boost::asio::stream_socket_service<boost::asio::ip::tcp> >
>>::do_perform(boost::asio::detail::reactor_op*) (in /tmp/a.out)
> ==7737== Other segment start (thread 3)
> ==7737==    at 0x4C2F283: pthread_mutex_lock (drd_pthread_intercepts.c:578)
> ==7737==    by 0x40AC2B: boost::asio::detail::posix_mutex::lock() (in
> /tmp/a.out)
> ==7737==    by 0x40D4CF:
> boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>::scoped_lock(boost::asio::detail::posix_mutex&)
> (in /tmp/a.out)
> ==7737==    by 0x4380D4:
> boost::asio::detail::epoll_reactor::register_descriptor(int,
> boost::asio::detail::epoll_reactor::descriptor_state*&) (in
> /tmp/a.out)
> ==7737==    by 0x43DE25:
> boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::assign(boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::implementation_type&,
> boost::asio::ip::tcp const&, int const&, boost::system::error_code&)
> (in /tmp/a.out)
> ==7737==    by 0x43D954:
> boost::asio::stream_socket_service<boost::asio::ip::tcp>::assign(boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::implementation_type&,
> boost::asio::ip::tcp const&, int const&, boost::system::error_code&)
> (in /tmp/a.out)
> ==7737==    by 0x43D689:
> boost::asio::basic_socket<boost::asio::ip::tcp,
> boost::asio::stream_socket_service<boost::asio::ip::tcp>
>>::assign(boost::asio::ip::tcp const&, int const&,
> boost::system::error_code&) (in /tmp/a.out)
> ==7737==    by 0x43D274:
> boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::accept_op_base<boost::asio::basic_socket<boost::asio::ip::tcp,
> boost::asio::stream_socket_service<boost::asio::ip::tcp> >
>>::do_perform(boost::asio::detail::reactor_op*) (in /tmp/a.out)
> ==7737==    by 0x40AF7A: boost::asio::detail::reactor_op::perform()
> (in /tmp/a.out)
> ==7737==    by 0x41798B: boost::asio::detail::epoll_reactor::run(bool,
> boost::asio::detail::op_queue<boost::asio::detail::task_io_service_operation<boost::asio::detail::epoll_reactor>
>>&) (in /tmp/a.out)
> ==7737==    by 0x41C809:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::do_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&,
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::idle_thread_info*)
> (in /tmp/a.out)
> ==7737==    by 0x41A026:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::run(boost::system::error_code&)
> (in /tmp/a.out)
> ==7737== Other segment end (thread 3)
> ==7737==    at 0x4C2FE7F: pthread_mutex_unlock (drd_pthread_intercepts.c:631)
> ==7737==    by 0x40AC45: boost::asio::detail::posix_mutex::unlock()
> (in /tmp/a.out)
> ==7737==    by 0x40D500:
> boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>::~scoped_lock()
> (in /tmp/a.out)
> ==7737==    by 0x438273:
> boost::asio::detail::epoll_reactor::register_descriptor(int,
> boost::asio::detail::epoll_reactor::descriptor_state*&) (in
> /tmp/a.out)
> ==7737==    by 0x43DE25:
> boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::assign(boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::implementation_type&,
> boost::asio::ip::tcp const&, int const&, boost::system::error_code&)
> (in /tmp/a.out)
> ==7737==    by 0x43D954:
> boost::asio::stream_socket_service<boost::asio::ip::tcp>::assign(boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::implementation_type&,
> boost::asio::ip::tcp const&, int const&, boost::system::error_code&)
> (in /tmp/a.out)
> ==7737==    by 0x43D689:
> boost::asio::basic_socket<boost::asio::ip::tcp,
> boost::asio::stream_socket_service<boost::asio::ip::tcp>
>>::assign(boost::asio::ip::tcp const&, int const&,
> boost::system::error_code&) (in /tmp/a.out)
> ==7737==    by 0x43D274:
> boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::accept_op_base<boost::asio::basic_socket<boost::asio::ip::tcp,
> boost::asio::stream_socket_service<boost::asio::ip::tcp> >
>>::do_perform(boost::asio::detail::reactor_op*) (in /tmp/a.out)
> ==7737==    by 0x40AF7A: boost::asio::detail::reactor_op::perform()
> (in /tmp/a.out)
> ==7737==    by 0x41798B: boost::asio::detail::epoll_reactor::run(bool,
> boost::asio::detail::op_queue<boost::asio::detail::task_io_service_operation<boost::asio::detail::epoll_reactor>
>>&) (in /tmp/a.out)
> ==7737==    by 0x41C809:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::do_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&,
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::idle_thread_info*)
> (in /tmp/a.out)
> ==7737==    by 0x41A026:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::run(boost::system::error_code&)
> (in /tmp/a.out)
> ==7737==
> ==7737== Thread 3:
> ==7737== Probably a race condition: condition variable 0x8583c40 has
> been signaled but the associated mutex 0x61820f8 is not locked by the
> signalling thread.
> ==7737==    at 0x4C283F9: pthread_cond_signal@* (drd_pthread_intercepts.c:736)
> ==7737==    by 0x4119FE: void
> boost::asio::detail::posix_event::signal_and_unlock<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>
>>(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&)
> (in /tmp/a.out)
> ==7737==    by 0x4108F2:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::wake_one_idle_thread_and_unlock(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&)
> (in /tmp/a.out)
> ==7737==    by 0x40F1C2:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::wake_one_thread_and_unlock(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&)
> (in /tmp/a.out)
> ==7737==    by 0x40DA56:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::init_task()
> (in /tmp/a.out)
> ==7737==    by 0x40B941:
> boost::asio::detail::epoll_reactor::init_task() (in /tmp/a.out)
> ==7737==    by 0x4148D8:
> boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::reactive_socket_service(boost::asio::io_service&)
> (in /tmp/a.out)
> ==7737==    by 0x414035:
> boost::asio::stream_socket_service<boost::asio::ip::tcp>::stream_socket_service(boost::asio::io_service&)
> (in /tmp/a.out)
> ==7737==    by 0x413893: boost::asio::io_service::service*
> boost::asio::detail::service_registry::create<boost::asio::stream_socket_service<boost::asio::ip::tcp>
>>(boost::asio::io_service&) (in /tmp/a.out)
> ==7737==    by 0x40ADB8:
> boost::asio::detail::service_registry::do_use_service(boost::asio::io_service::service::key
> const&, boost::asio::io_service::service*
> (*)(boost::asio::io_service&)) (in /tmp/a.out)
> ==7737==    by 0x412B7E:
> boost::asio::stream_socket_service<boost::asio::ip::tcp>&
> boost::asio::detail::service_registry::use_service<boost::asio::stream_socket_service<boost::asio::ip::tcp>
>>() (in /tmp/a.out)
> ==7737==    by 0x411B3A:
> boost::asio::stream_socket_service<boost::asio::ip::tcp>&
> boost::asio::use_service<boost::asio::stream_socket_service<boost::asio::ip::tcp>
>>(boost::asio::io_service&) (in /tmp/a.out)
> ==7737== cond 0x8583c40 was first observed at:
> ==7737==    at 0x4C27C59: pthread_cond_init@* (drd_pthread_intercepts.c:652)
> ==7737==    by 0x417573:
> boost::asio::detail::posix_event::posix_event() (in /tmp/a.out)
> ==7737==    by 0x419F07:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::idle_thread_info::idle_thread_info()
> (in /tmp/a.out)
> ==7737==    by 0x419FBC:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::run(boost::system::error_code&)
> (in /tmp/a.out)
> ==7737==    by 0x417D26: boost::asio::io_service::run() (in /tmp/a.out)
> ==7737==    by 0x430358: unsigned long boost::_mfi::mf0<unsigned long,
> boost::asio::io_service>::call<boost::shared_ptr<boost::asio::io_service>
>>(boost::shared_ptr<boost::asio::io_service>&, void const*) const (in
> /tmp/a.out)
> ==7737==    by 0x42FF74: unsigned long boost::_mfi::mf0<unsigned long,
> boost::asio::io_service>::operator()<boost::shared_ptr<boost::asio::io_service>
>>(boost::shared_ptr<boost::asio::io_service>&) const (in /tmp/a.out)
> ==7737==    by 0x42FE1E: unsigned long
> boost::_bi::list1<boost::_bi::value<boost::shared_ptr<boost::asio::io_service>
>> >::operator()<unsigned long, boost::_mfi::mf0<unsigned long,
> boost::asio::io_service>, boost::_bi::list0>(boost::_bi::type<unsigned
> long>, boost::_mfi::mf0<unsigned long, boost::asio::io_service>&,
> boost::_bi::list0&, long) (in /tmp/a.out)
> ==7737==    by 0x42FD66: boost::_bi::bind_t<unsigned long,
> boost::_mfi::mf0<unsigned long, boost::asio::io_service>,
> boost::_bi::list1<boost::_bi::value<boost::shared_ptr<boost::asio::io_service>
>> > >::operator()() (in /tmp/a.out)
> ==7737==    by 0x42EE79:
> boost::detail::thread_data<boost::_bi::bind_t<unsigned long,
> boost::_mfi::mf0<unsigned long, boost::asio::io_service>,
> boost::_bi::list1<boost::_bi::value<boost::shared_ptr<boost::asio::io_service>
>> > > >::run() (in /tmp/a.out)
> ==7737==    by 0x4E41F5F: thread_proxy (in
> /usr/local/lib/libboost_thread.so.1.43.0)
> ==7737==    by 0x4C2EA60: vgDrd_thread_wrapper (drd_pthread_intercepts.c:272)
> ==7737== mutex 0x61820f8 was first observed at:
> ==7737==    at 0x4C2EC6B: pthread_mutex_init (drd_pthread_intercepts.c:537)
> ==7737==    by 0x40AB63:
> boost::asio::detail::posix_mutex::posix_mutex() (in /tmp/a.out)
> ==7737==    by 0x410B72:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>::task_io_service(boost::asio::io_service&)
> (in /tmp/a.out)
> ==7737==    by 0x40F679: boost::asio::io_service::service*
> boost::asio::detail::service_registry::create<boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>
>>(boost::asio::io_service&) (in /tmp/a.out)
> ==7737==    by 0x40ADB8:
> boost::asio::detail::service_registry::do_use_service(boost::asio::io_service::service::key
> const&, boost::asio::io_service::service*
> (*)(boost::asio::io_service&)) (in /tmp/a.out)
> ==7737==    by 0x40DDDE:
> boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>&
> boost::asio::detail::service_registry::use_service<boost::asio::detail::task_io_service<boost::asio::detail::epoll_reactor>
>>() (in /tmp/a.out)
> ==7737==    by 0x417C48: boost::asio::io_service::io_service() (in /tmp/a.out)
> ==7737==    by 0x416921:
> http::server2::io_service_pool::io_service_pool(unsigned long) (in
> /tmp/a.out)
> ==7737==    by 0x437816: http::server2::server::server(std::string
> const&, std::string const&, std::string const&, unsigned long) (in
> /tmp/a.out)
> ==7737==    by 0x43064D: main (in /tmp/a.out)
> ==7737==
> ^C==7737==
> ==7737== For counts of detected and suppressed errors, rerun with: -v
> ==7737== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 225 from 194)
>

I asked the same question in the valgrind list and I append the reply
I got. They seem to suggest that there indeed may be some problems in
asio. Will someone who is familiar with boost.asio design comment? I
am getting the same warnings in my own program, that I modeled after
the example.

Response from valgrind list ----------------------------------------

This looks like a race on descriptor_state::op_queue_, so I had a look
at the following source code:
http://www.boost.org/doc/libs/1_43_0/boost/asio/detail/epoll_reactor.hpp.

My comments on this source code are as follows:
* The comments at the bottom of class epoll_reactor say that any
access of registered_descriptors_ should be protected by
registered_descriptors_mutex_. However, the method shutdown_service()
modifies the container registered_descriptors_ but doesn't lock
registered_descriptors_mutex_.
* The method epoll_reactor::register_descriptor() modifies its second
argument (descriptor_data) such that it points to the newly created
descriptor_state object. All data members of the struct
descriptor_state are public, but all accesses must be guarded by a
lock on descriptor_state::mutex_. So all callers of
register_descriptor() must be checked in order to verify whether or
not there are any thread-unsafe accesses of
descriptor_state::op_queue_ or descriptor_state::shutdown_. Personally
I never recommended such a class design.
* While all accesses of the members of struct descriptor_state should
be protected by locking descriptor_state::mutex_, no lock is held on
this last mutex by register_descriptor() when it sets
descriptor_data::shutdown_ nor by shutdown_service() while it modifies
descriptor_state::op_queue_ and descriptor_state::shutdown_. The
former is easy to fix: move the "descriptor_data->shutdown_ = false"
statement to somewhere before the epoll_ctl() system call.

Does one of the above scenarios explain the race report you have observed ?


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk