
Subject: Re: [boost] [Backtrace] Any interest in portable stack trace?
From: Lassi Tuura (lat_at_[hidden])
Date: 2010-10-26 18:30:51


Hi,

>> or unstable. It actually crashes sometimes.
>> [...]
>> Scary.
>
>
> Can you give me more information about it? Do you have
> any references? If backtrace crashes, I would probably
> just not use this feature, and this library would not actually be created.

As I mentioned before, there are uses for which backtrace() is a perfectly
valid choice and will give a fine user experience. It has limitations, and
you need to decide whether your library can live with them. It will almost
certainly be useful to some set of users, and almost certainly will not work
reliably for others. Such is life :-)

We have applications that perform up to about 250k stack walks per second,
with an average stack depth of 33 and perhaps 700 loaded shared libraries.
The maximum stack depth is around 200-300. We would do a million walks per
second if we found a library capable of it. So, fairly heavy use.

I can't really quote many specific examples of how Linux's backtrace()
fails, mostly because we moved to something else fairly early on, as Linux's
backtrace() wasn't at all viable for our use.

From what I recall, the main limitations of backtrace() were: a) it could be
very slow on a large binary, so a single stack trace could take more than 5 ms
on a reasonably modern x86_64 system; b) it can call malloc(), so it is unsafe
to use inside malloc() itself; c) it makes other calls that are unsafe inside
an asynchronous signal handler (result: deadlocks and crashes); d) for the
same reasons, it is not re-entrant _in the same thread_, which is a problem if
you are unlucky enough to get nested signals and call backtrace() again while
one call is already running (NB: in the same thread!).
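Two of these points can be partly mitigated in a wrapper. The sketch below is illustrative only; `backtrace_preload()` and `guarded_backtrace()` are hypothetical names, not part of glibc or of any proposed Boost API. The preload call follows the note in the Linux backtrace(3) man page that the first call may dynamically load libgcc, which usually calls malloc(). The per-thread flag addresses only point d): it refuses a nested walk in the same thread. It does NOT make backtrace() async-signal-safe, so point c) still applies.

```c
#define _GNU_SOURCE
#include <execinfo.h>   /* backtrace() (glibc) */
#include <signal.h>     /* sig_atomic_t */

/* Per-thread flag: nonzero while this thread is inside guarded_backtrace().
   Hypothetical helper state, not part of any library. */
static _Thread_local volatile sig_atomic_t in_backtrace = 0;

/* Call once at startup, before installing signal handlers, so that the
   lazy load of libgcc (and its malloc() calls) happens in normal flow. */
void backtrace_preload(void)
{
    void *dummy[1];
    (void)backtrace(dummy, 1);
}

/* Returns the number of frames captured, or -1 if this thread is already
   inside a stack walk (e.g. a nested signal arrived mid-walk). */
int guarded_backtrace(void **frames, int max_frames)
{
    if (in_backtrace)
        return -1;              /* refuse to re-enter in the same thread */
    in_backtrace = 1;
    int n = backtrace(frames, max_frames);
    in_backtrace = 0;
    return n;
}
```

A nested call that arrives between the check and the set simply completes before the outer call resumes, so the two walks never overlap.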

About 10% of our x86_64 stack walks on a 64-bit RHEL5-derived system hit
incorrect or inaccurate unwind info or various creative corner cases. The
issues fall into perhaps 5-10 different categories. I don't know which of
them backtrace() is protected against, but my experience suggests it is
probably largely unprotected. Which is fine, as long as you don't trigger a
stack walk that requires those corner cases to be handled!

If you limit yourself to calling backtrace() only in normal program flow
(not in signal handlers, not in instrumentation the compiler wasn't aware of,
and not from global constructors/destructors), and you don't need high
performance, it should work just fine for you.
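For that ordinary-flow case, a minimal capture-and-print helper might look like the sketch below. `print_stack()` is a hypothetical name; backtrace() and backtrace_symbols() are the real glibc calls. Because backtrace_symbols() allocates its result with malloc(), this helper belongs strictly in normal program flow.

```c
#define _GNU_SOURCE
#include <execinfo.h>   /* backtrace(), backtrace_symbols() (glibc) */
#include <stdio.h>
#include <stdlib.h>

/* Print the current call stack to `out` and return the frame count,
   or -1 if symbolization failed. Not for signal handlers or malloc hooks. */
int print_stack(FILE *out)
{
    void *frames[128];
    int n = backtrace(frames, 128);
    char **names = backtrace_symbols(frames, n);   /* allocates via malloc() */
    if (names == NULL)
        return -1;
    for (int i = 0; i < n; ++i)
        fprintf(out, "#%-3d %s\n", i, names[i]);
    free(names);   /* one free() releases the whole array of strings */
    return n;
}
```

On Linux, function names for the main executable generally only appear if it is linked with `-rdynamic`; otherwise you mostly see raw addresses.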

Note that the backtrace() implementation has varied significantly over time
and across platforms. If you are doing benchmarks, make sure you test a wide
enough variety of systems. Something old like RHEL4 or RHEL5 may behave
differently from newer systems such as a recent Ubuntu. I don't know much
about backtrace() on non-Linux systems, so I can't speak for them.
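To compare systems, a small timing harness helps. As a budget reference, 250k walks per second leaves 10^9 / 250000 = 4000 ns per walk, while a 5 ms trace is 1250 times over that budget. The sketch below (`backtrace_ns_per_call()` is a hypothetical name) measures the average cost at whatever call depth it is invoked from; to approximate a realistic depth such as 33 frames, call it from inside a recursion of that depth. On glibc older than 2.17, clock_gettime() may additionally require `-lrt`.

```c
#define _GNU_SOURCE
#include <execinfo.h>   /* backtrace() (glibc) */
#include <time.h>       /* clock_gettime(), CLOCK_MONOTONIC */

/* Average wall-clock cost of one backtrace() call, in nanoseconds,
   measured at the caller's current stack depth over `iterations` calls. */
double backtrace_ns_per_call(int iterations)
{
    void *frames[256];
    struct timespec t0, t1;

    (void)backtrace(frames, 256);   /* warm-up: first call may load libgcc */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; ++i)
        (void)backtrace(frames, 256);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / iterations;
}
```

Dividing 1e9 by the result gives walks per second, which can be compared directly against the rates quoted above.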

Regards,
Lassi

