Hi Gentlemen,
I have an application that uses Boost.Log. The application runs long-term (days, weeks, maybe months). Currently I am experiencing random crashes in my application that seem related to Boost.Log. I don't say that it is a bug in Boost.Log; probably I am doing something wrong. I am using version 1.55.0, and I also tried upgrading to version 1.61.0, but the crash still happens there.
I am currently able to reproduce the crash by forcing logs at a pace of 3 records per second; it then takes up to 3 days for the crash to happen. The application is multi-threaded, so several threads may be logging at the same time. For that reason I chose "severity_logger_mt".
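To make the usage pattern concrete, here is a minimal sketch of the kind of concurrent logging the application does (simplified; plain int severities stand in for my own severity enum, and the real code goes through the global logger shown further below):

#include <boost/log/sources/severity_logger.hpp>
#include <boost/log/sources/record_ostream.hpp>
#include <thread>
#include <vector>

// Shared between threads; the _mt variant of the logger source is thread-safe.
static boost::log::sources::severity_logger_mt< int > g_lg;

void worker(int id)
{
    for (int i = 0; i < 1000; ++i)
        BOOST_LOG_SEV(g_lg, 0) << "worker " << id << " iteration " << i;
}

int main()
{
    std::vector< std::thread > threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back(worker, i);
    for (auto& t : threads)
        t.join();
}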
The crash seems related to the destruction of objects of type attribute_value_set. This is a gdb backtrace of the crash (gcc -O0 build):
(gdb) bt
#0 0xb6bdbed8 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:66
#1 0xb6bdf960 in __GI_abort () at abort.c:90
#2 0xb6c139fc in __libc_message (do_abort=2, fmt=0xb6cc8c44 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:200
#3 0xb6c1e17c in malloc_printerr (action=3, str=0xb6cc8d80 "double free or corruption (!prev)", ptr=<optimized out>) at malloc.c:4949
#4 0xb6b3fcb8 in boost::log::v2_mt_posix::attribute_value_set::~attribute_value_set() () from /usr/lib/libboost_log.so.1.55.0
#5 0xb6b42688 in boost::log::v2_mt_posix::record_view::public_data::destroy(boost::log::v2_mt_posix::record_view::public_data const*) ()
from /usr/lib/libboost_log.so.1.55.0
#6 0xb6b44c7c in boost::log::v2_mt_posix::core::push_record_move(boost::log::v2_mt_posix::record&) () from /usr/lib/libboost_log.so.1.55.0
#7 0xb6e66814 in Logger::addLog(Logger::LEVEL, char const*, char const*, int) () from /usr/lib/libtslog.so.1
#8 0x0005ba50 in addLogRusp (iMessageId=2, sMessage=<optimized out>, sProcess=0x8dee8 "Application", eLevel=Logger::LEVEL_INFO)
at ../src/RLS-V/Platform/System.inl.h:303
#9 RLS::Application::fnThreadRuspOut (thread=...) at ../src/RLS-V/Application.cpp:1447
#10 0xb6e7aa4c in Thread::startFunction(void*) () from /usr/lib/libtscommon.so.1
#11 0xb6edae38 in start_thread (arg=0xb3eff460) at pthread_create.c:313
#12 0xb6c7c318 in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
This is the global logger initialization (using some custom attributes):
BOOST_LOG_GLOBAL_LOGGER_INIT(appLogger, boost::log::sources::severity_logger_mt< robot::log::Severity >)
{
    core::get()->add_global_attribute("TimeStamp", attributes::local_clock());
    core::get()->add_global_attribute("ProcessID", AttrProcessId());
    core::get()->add_global_attribute("LineCounter", attributes::counter< unsigned int >());
    core::get()->add_global_attribute("Process", attributes::current_process_name());
    core::get()->add_global_attribute("Scopes", attributes::named_scope());
    core::get()->add_global_attribute("LineID", attributes::counter< unsigned int >(1));
    core::get()->add_global_attribute("SimpleTimeStamp", AttrSimpleTime());

    Logger::init();

    sources::severity_logger_mt< Severity > lg;
    return lg;
}
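For completeness, the matching declaration lives in a header and looks roughly like this (the exact header layout in my project may differ slightly):

#include <boost/log/sources/global_logger_storage.hpp>
#include <boost/log/sources/severity_logger.hpp>

namespace robot { namespace log {

// Declares the global logger so that appLogger::get() can be used from any translation unit.
BOOST_LOG_GLOBAL_LOGGER(appLogger, boost::log::sources::severity_logger_mt< Severity >)

}} // namespace robot::log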
These are my macros for logging:
// These are needed to expand a macro and convert the result into a string
#define STRINGIFY(x) #x
#define TOSTRING(x) STRINGIFY(x)
/* NOTE:
 * __FUNCTION__ is not a macro but a char pointer, so we start from an empty
 * std::string and concatenate the char* onto it.
 */
#define R_LOG_ID(x,y) BOOST_LOG_SEV(robot::log::appLogger::get(), y) << boost::log::add_value("LogID", x) << boost::log::add_value("FLF", ROBOT_FLF)
#define R_LOG(x) R_LOG_ID(0, x)
#define ROBOT_FLF std::string() + TOSTRING(__FILE__) + ":" + TOSTRING(__LINE__) + " (" + __FUNCTION__ + ")"
While "R_LOG_ID" is the only one that is used currently.
There are currently three sinks active: syslog, console and a custom sink that sends log records via an IPC mechanism (http://www.cs.cmu.edu/~./IPC/). I am now testing whether the crash still happens if I deactivate one or more of those sinks.
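For context, the three sinks are registered roughly like this (a simplified sketch; the real IPC backend is more involved and "IpcSinkBackend" is just a placeholder name):

#include <iostream>
#include <boost/make_shared.hpp>
#include <boost/log/core.hpp>
#include <boost/log/core/record_view.hpp>
#include <boost/log/sinks/sync_frontend.hpp>
#include <boost/log/sinks/syslog_backend.hpp>
#include <boost/log/sinks/basic_sink_backend.hpp>
#include <boost/log/utility/setup/console.hpp>

namespace logging = boost::log;
namespace sinks = boost::log::sinks;
namespace keywords = boost::log::keywords;

// Placeholder for the real backend that forwards log records over IPC.
struct IpcSinkBackend : sinks::basic_sink_backend< sinks::synchronized_feeding >
{
    void consume(logging::record_view const& rec)
    {
        // serialize the record and hand it over to the IPC library (omitted)
    }
};

void initSinks()
{
    // Console sink
    logging::add_console_log(std::clog);

    // Syslog sink
    boost::shared_ptr< sinks::syslog_backend > syslogBackend =
        boost::make_shared< sinks::syslog_backend >(keywords::facility = sinks::syslog::user);
    logging::core::get()->add_sink(
        boost::make_shared< sinks::synchronous_sink< sinks::syslog_backend > >(syslogBackend));

    // Custom IPC sink
    logging::core::get()->add_sink(
        boost::make_shared< sinks::synchronous_sink< IpcSinkBackend > >());
}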
I imagine it will be hard to tell the exact problem from the information above, but maybe you guys could point me in the right direction as to where to dig deeper. I will happily provide any missing information and/or code snippets that could help.
Are there well-known common mistakes that can lead to this kind of memory corruption in attribute_value_set? Maybe something like lazy evaluation of log expressions or multi-threading issues?
Many thanks in advance for any help!
Robin Lorke