Subject: Re: [boost] [Backtrace] Any interest in portable stack trace?
From: Lassi Tuura (lat_at_[hidden])
Date: 2010-10-21 14:05:58


Hi,

> Very Good idea:
>
> Limitations of use:
>
> 1. No, it is not safe against asynchronous interrupts
> 2. It would likely not work when the application is unstable or has serious
> issues
> 3. The trace collection is expected to be very fast (like walking a linked list)
> 4. The trace printing may be quite slow and costly, expected to be used quite
> rarely.

Good :-)

FWIW, backtrace() on recent-ish Linux is fairly heavyweight. It's definitely not a frame pointer walk, as your point 3 above seems to suggest. For example, on most x86_64 systems there's no frame pointer chain to walk to begin with, so it's a full-blown DWARF unwind, which among other things can allocate memory. Recent x86 glibc on Linux will also do the full unwind on 32-bit, not just a simple frame pointer walk.

For a few random stack walks it's very acceptable, but you wouldn't want to call it thousands of times a second on deep stacks with lots of shared libraries. Exactly where the performance becomes an issue you'd have to benchmark and decide for yourself.
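As a rough starting point for such a benchmark, here is a minimal sketch. It assumes a platform that ships <execinfo.h> (glibc does; it is not ISO C++). The function name average_backtrace_ns and the 64-frame buffer are made up for illustration.

```cpp
#include <execinfo.h>  // backtrace(): a glibc extension, not ISO C++
#include <chrono>

// Average cost of one backtrace() call from the current stack depth, in
// nanoseconds. The name and the 64-frame buffer are illustrative choices.
double average_backtrace_ns(int iterations) {
    if (iterations <= 0) return 0.0;
    void* frames[64];
    volatile int sink = 0;  // keeps the calls from being optimized away
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink = sink + backtrace(frames, 64);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

To get numbers that mean anything, call it from a stack depth and with a set of loaded shared libraries that resemble the real application. The first call may also be slower than later ones if the C library sets up its unwinder lazily.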

> Notes:
>
> - Generally trace collection should be interrupt safe, but that is up to the
> specific implementation of the libc backtrace function on each platform.

I doubt any libc backtrace() which uses unwind info is async safe. The only async-safe unwind library I know is libunwind.

> - It is possible to create a degraded trace printer for the case where the process is
> unstable, such that it would be safe to use in signal handlers, but it is unlikely to
> provide full information: only pointer data and not symbols.

Of course the very act of printing gets complicated inside signal handlers and/or when the app is unstable. For one, you can't really allocate memory, so things like iostreams, std::string and other high-level interfaces (including __cxa_demangle) are out. You are pretty much reduced to making system calls only (e.g. Unix write()).
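To illustrate what "system calls only" can look like, here is a minimal sketch that prints raw frame addresses using nothing but a stack buffer and write(). The helper names format_hex and write_frames are hypothetical, and handling of short writes and EINTR is left out for brevity.

```cpp
#include <unistd.h>   // write()
#include <cstddef>
#include <cstdint>

// Format `value` as "0x..." into `out`, which must hold at least
// 2 + 2 * sizeof(value) bytes. No allocation, no stdio, no locale.
// Returns the number of bytes stored (no terminating NUL is added).
std::size_t format_hex(std::uintptr_t value, char* out) {
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(value)];
    std::size_t n = 0;
    do {                          // collect digits, least significant first
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);
    std::size_t len = 0;
    out[len++] = '0';
    out[len++] = 'x';
    while (n > 0) out[len++] = tmp[--n];  // reverse into the output
    return len;
}

// Write one frame address per line straight to a file descriptor.
void write_frames(int fd, void* const* frames, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        char line[2 + 2 * sizeof(std::uintptr_t) + 1];
        std::size_t len =
            format_hex(reinterpret_cast<std::uintptr_t>(frames[i]), line);
        line[len++] = '\n';
        // Short writes and EINTR are ignored in this sketch.
        ssize_t ignored = write(fd, line, len);
        (void)ignored;
    }
}
```

This only yields addresses. Turning them into symbol names is exactly the part that needs the heavier, non-async-safe machinery.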

Calls to dladdr() are not async signal safe, but they are mostly ok in a crash.

As I noted, most backtrace() implementations are sufficiently complex to sometimes fail when the application has become unstable. Fortunately the failures will be rare, so the tool can still be useful most of the time. Unfortunately the failure modes can be nasty: either recursive signals (crash handlers doing stack dumps should have a static counter and give up after N recursive signals), or deadlocks (no fix that I know of, nor any way to know whether it's unsafe to call - you'll just be unlucky).
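The static counter idea could look roughly like the sketch below. All names here (kMaxNestedCrashes, g_crash_depth, enter_crash_handler, crash_handler, install_crash_handler) and the limit of 2 are made up for illustration, and it does nothing about the deadlock case.

```cpp
#include <signal.h>   // sigaction(), signal(), raise(), sig_atomic_t
#include <unistd.h>   // write(), _exit()

namespace {
const int kMaxNestedCrashes = 2;           // illustrative limit ("N")
volatile sig_atomic_t g_crash_depth = 0;   // static recursion counter
}

// Count one more entry into the crash handler. Returns true while a
// stack dump should still be attempted, false once the limit is exceeded.
bool enter_crash_handler() {
    g_crash_depth = g_crash_depth + 1;
    return g_crash_depth <= kMaxNestedCrashes;
}

extern "C" void crash_handler(int signo) {
    if (!enter_crash_handler()) {
        _exit(128 + signo);                // the dump itself keeps crashing
    }
    static const char msg[] = "crash: dumping stack\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    // ... the actual stack dump would go here ...
    signal(signo, SIG_DFL);                // restore the default action
    raise(signo);                          // and die with the original signal
}

void install_crash_handler() {
    struct sigaction sa = {};
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER;              // let a nested fault re-enter us
    sigaction(SIGSEGV, &sa, nullptr);
    sigaction(SIGBUS, &sa, nullptr);
    sigaction(SIGABRT, &sa, nullptr);
}
```

With SA_NODEFER a fault inside the dump re-enters the handler, so the counter decides when to stop trying and exit quietly instead of looping.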

Where you draw the line on "useful even if not perfect" is really up to you, which is why I suggest spelling out the limitations above, as you did.

> Limitations:
>
> - On ELF platforms - the program should be compiled with -rdynamic
> - On Windows/MSVC - debug information should be provided.
> - Function inlining and omitting frame pointers would significantly reduce
> the visible call frames.
> - Static functions would not be resolved on ELF platforms.
>
> Bottom line:
>
> - It can't be as powerful as Java's printStackTrace, as C/C++ generally
> has less runtime information
> - It can be very useful in top-level catch blocks to see what happened.
>
>
> Artyom
>
> P.S.: The implementation exists; take a look at it and see if it is useful enough
> P.P.S.: I don't think the implementation should include much more than it does now.

Sure, I did. I'm not really a client for it, just making recommendations from experience.

FWIW, I regularly use apps that dump their own stack trace on crash. One of the most annoying failure modes is... the stack printing getting out of control, creating a fork bomb, or deadlocking forever, leaving hung processes on the system.

My experience is that it's relatively easy to make a stack dump that works often - and really quite hard to make one that Really Just Works(tm). It's surprisingly annoying to end up with side effects from the stack tracer itself in one run in 100, or 200, or even 1000. It's really up to you where on that continuum you want to position your library; pretty much no matter what you choose it will still be useful for some people :-)

Regards,
Lassi

