
Subject: [Boost-users] [ASIO] random crashes
From: Axel (axel.azerty_at_[hidden])
Date: 2009-01-28 09:59:10


Hello

I'm using the ASIO library (from Boost 1.35) in a network daemon running on
a Debian Etch system. The code structure is almost the same as the one
described in the HTTP Server example
(http://tenermerx.com/Asio/boost_asio_1_3_1/doc/html/boost_asio/examples.html#boost_asio.examples.http_server):
one io_service is used, async_accept() creates new "sessions", and so on.

The daemon can run flawlessly for weeks (or only a few hours) and then
crash randomly with a segmentation fault. The network load isn't really
high: every 5 minutes, a few megabytes of data (characters) are sent to
the daemon. I haven't noticed any memory leaks or null pointer accesses.
Among the many crashes, I found 2 kinds.

Since core files are dumped, I tried to debug them, but I don't know how
to interpret the results. The binary wasn't linked against debug
libraries, so I can only get debug data from the binary itself.

Here's the gdb output of the "bt full" command. The daemon has 4 threads;
here is the output for the thread that causes the crash (I assume it is
this one).

First kind of crash (only the relevant part is pasted; the full output is
huge). It seems to be related to the way I handle the timeout on a timer.

In my daemon, the io_service thread may call close() on the socket
object while another thread may call cancel() on the same socket object.
Could this lead to a crash? (I can post source code if needed.)
What is the best way to handle a receive timeout *and* close the socket
and the timer from another thread?
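For reference: the Asio documentation marks a shared socket object as unsafe for concurrent use, so close() on one thread racing with cancel() on another is a data race on that object. One commonly used approach is never to touch the socket from a foreign thread and instead hand the request to the thread that runs the io_service, via io_service::post() or a strand. The sketch below models only that idea in standard C++; FakeSocket, IoLoop, DemoResult and run_demo are hypothetical stand-ins, not Asio types.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical socket: deliberately not thread-safe, like an Asio socket object.
struct FakeSocket {
    bool open = true;
    int close_calls = 0;
    std::vector<std::thread::id> touched_by;  // thread that ran each operation
    void cancel() { touched_by.push_back(std::this_thread::get_id()); }
    void close() {
        touched_by.push_back(std::this_thread::get_id());
        if (open) { open = false; ++close_calls; }
    }
};

// Hypothetical single-threaded task queue playing the role of io_service.
class IoLoop {
public:
    void post(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(mutex_); tasks_.push_back(std::move(task)); }
        cv_.notify_one();
    }
    void stop() {
        { std::lock_guard<std::mutex> lock(mutex_); stopped_ = true; }
        cv_.notify_one();
    }
    // Runs tasks on the calling thread until stop() was called and the queue is empty.
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopped_ || !tasks_.empty(); });
                if (tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();  // executed outside the lock, always on the loop thread
        }
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool stopped_ = false;
};

struct DemoResult { int close_calls; std::size_t ops; bool all_on_loop_thread; };

// Two non-loop threads want cancel() and close(); both only post requests.
DemoResult run_demo() {
    FakeSocket socket;
    IoLoop loop;
    std::thread::id loop_id;
    std::thread io_thread([&] { loop_id = std::this_thread::get_id(); loop.run(); });
    loop.post([&] { socket.cancel(); });                             // from this thread
    std::thread other([&] { loop.post([&] { socket.close(); }); });  // from another thread
    other.join();
    loop.stop();
    io_thread.join();
    bool all_on_loop = true;
    for (std::thread::id id : socket.touched_by) all_on_loop = all_on_loop && id == loop_id;
    return DemoResult{socket.close_calls, socket.touched_by.size(), all_on_loop};
}
```

In Asio terms, the two post() calls would correspond to posting a handler to the io_service (or wrapping it in a strand) instead of calling cancel() or close() directly.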

Thread 1 (process 27120):
Program terminated with signal 11, Segmentation fault.
#0 0xb7c7c024 in pthread_mutex_lock () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#1 0xb7d5f0c6 in pthread_mutex_lock () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2 0x08060438 in boost::asio::detail::posix_mutex::lock (this=0x16) at /usr/include/boost/asio/detail/posix_mutex.hpp:71
        error = 0
#3 0x08060559 in scoped_lock (this=0xb796807c, m=@0x16) at /usr/include/boost/asio/detail/scoped_lock.hpp:36
No locals.
#4 0x08060ad0 in boost::asio::detail::epoll_reactor<false>::close_descriptor (this=0x2, descriptor=-1291829448)
     at /usr/include/boost/asio/detail/epoll_reactor.hpp:297
        lock = {<boost::noncopyable_::noncopyable> = {<No data fields>}, mutex_ = @0x16, locked_ = 212}
        ev = {events = 6, data = {ptr = 0x25, fd = 37, u32 = 37, u64 = 8589934629}}
#5 0x0809b189 in boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp, boost::asio::detail::epoll_reactor<false> >::close (this=0xb3004358, impl=@0xb3037f1c, ec=@0xb7968128) at /usr/include/boost/asio/detail/reactive_socket_service.hpp:210
No locals.
#6 0x0809b277 in boost::asio::stream_socket_service<boost::asio::ip::tcp>::close (this=0xb3000098, impl=@0xb3037f1c,
     ec=@0xb7968128) at /usr/include/boost/asio/stream_socket_service.hpp:145
No locals.
#7 0x0809b2c5 in boost::asio::basic_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >::close (this=0xb3037f18) at /usr/include/boost/asio/basic_socket.hpp:253
        ec = {m_val = 0, m_cat = 0xb7c74c10}
#8 0x08091129 in Session::handleTimeout (this=0xb3037f18, error=@0xb796821c) at Session.cc:80

Could someone give me some explanation of this crash?
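One possible reading of this backtrace, not confirmed: in frame #4 the reactor has this=0x2 and its mutex is at @0x16, which look like garbage rather than valid pointers. That is consistent with Session::handleTimeout() running on a Session whose memory had already been freed or overwritten. The Asio HTTP server example guards against this by deriving its connection class from enable_shared_from_this and binding shared_from_this() into every handler, so each pending handler keeps its object alive. The standard C++ sketch below models that; TimedSession, Stats and run_lifetime_demo are hypothetical, the vector stands in for the io_service's pending handlers, and the bool aborted parameter stands in for checking error == boost::asio::error::operation_aborted.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

struct Stats {
    int destroyed = 0;
    int timeouts_handled = 0;
};

// Hypothetical stand-in for the daemon's Session class.
class TimedSession : public std::enable_shared_from_this<TimedSession> {
public:
    explicit TimedSession(Stats* stats) : stats_(stats) {}
    ~TimedSession() { ++stats_->destroyed; }

    // "Arming the timer": the returned handler owns a shared_ptr to this
    // session, so the session cannot be destroyed while the handler is pending.
    std::function<void(bool)> make_timeout_handler() {
        std::shared_ptr<TimedSession> self = shared_from_this();
        return [self](bool aborted) { self->handle_timeout(aborted); };
    }

    void close() { open_ = false; }

private:
    void handle_timeout(bool aborted) {
        // A cancelled wait, or a socket that is already closed, means "do nothing".
        if (aborted || !open_) return;
        ++stats_->timeouts_handled;
        close();
    }

    Stats* stats_;
    bool open_ = true;
};

Stats run_lifetime_demo(int* destroyed_while_pending) {
    Stats stats;
    std::vector<std::function<void(bool)>> pending;  // plays the handler queue
    {
        std::shared_ptr<TimedSession> session = std::make_shared<TimedSession>(&stats);
        pending.push_back(session->make_timeout_handler());  // timer wait
        pending.push_back(session->make_timeout_handler());  // a wait that gets cancelled
    }  // last external reference dropped while handlers are still queued
    *destroyed_while_pending = stats.destroyed;  // still alive here
    pending[0](false);  // timer expires: the session closes its socket
    pending[1](true);   // cancelled wait: ignored
    pending.clear();    // handlers released, the session is destroyed now
    return stats;
}
```

The design point is that the object's lifetime is tied to the outstanding handlers rather than to whichever thread last held a raw pointer to it.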

Second kind of crash (I really have no idea where this one could come
from; the backtrace isn't very useful):

(gdb) bt full
#0 boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>::unlock (this=0xb5631dac)
     at /usr/include/boost/asio/detail/scoped_lock.hpp:58
No locals.
#1 0x00000000 in ?? ()
No symbol table info available.

Could someone give me some advice on how to handle these crashes?
How can I get more data into the backtrace?
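A complement to the core file, sketched under assumptions: on glibc, a process can print its own call stack when it receives SIGSEGV, using backtrace() and backtrace_symbols_fd() from <execinfo.h>. That header is a GNU extension, not standard C++, and function names only appear if the binary is linked with -rdynamic. backtrace_symbols_fd() is not formally async-signal-safe, and neither approach helps much if the stack itself is already corrupted, as in the second crash. install_crash_handler and dump_backtrace are hypothetical names.

```cpp
#include <cassert>
#include <csignal>
#include <execinfo.h>  // glibc extension
#include <unistd.h>

namespace {

// Writes the current call stack to fd and returns the number of frames captured.
int dump_backtrace(int fd) {
    void* frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, fd);  // writes directly to fd, no malloc
    return n;
}

void on_fatal_signal(int sig) {
    dump_backtrace(STDERR_FILENO);
    // Restore the default action and re-raise so a core file is still produced.
    std::signal(sig, SIG_DFL);
    std::raise(sig);
}

}  // namespace

void install_crash_handler() {
    // Per the glibc backtrace(3) page, the first call may load libgcc and
    // allocate; doing it once here avoids that inside the signal handler.
    void* warmup[1];
    backtrace(warmup, 1);
    std::signal(SIGSEGV, on_fatal_signal);
    std::signal(SIGABRT, on_fatal_signal);
}
```

install_crash_handler() would be called once at the start of main(), before the io_service threads are started.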

Thanks in advance for your help.
Regards

