Subject: Re: [boost] [Stacktrace] review, please stop discussing non-Stacktrace issues
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2016-12-18 08:56:00


On 17 Dec 2016 at 20:32, Peter Dimov wrote:

> > In an exception handler you cannot call any async unsafe routine such as
> > anything in MSVCRT nor anything implemented by kernel32.dll in userspace.
> > As on POSIX, almost all syscalls implemented entirely in kernel space are
> > safe.
>
> Thanks Niall. Do you know which Windows API functions are safe and which
> aren't? I couldn't find a list anywhere.

I don't think there is even a list internal to Microsoft. There are
various user-compiled lists around the internet, and the ReactOS team
also have a good list somewhere on their mailing list.

I've seen code of mine which worked perfectly on Win7 deadlock rarely
and randomly on Win10 and, one time, vice versa. WOW64 also has a very
different safe list from native Win32.

There are some obvious things not to call: anything which obviously
runs code in userspace.

> > Antony makes the valid point that on Windows there are race problems with
> > the DbgHelp library, in fact not only is it not async-safe, it's also
> > thread-unsafe.
>
> He doesn't use DbgHelp in the Windows backend though, he uses Dbgeng.h. This
> is not the same thing, I think?

Woohoo!

That is amazing news, and congrats to Antony for getting Dbgeng
working. Last month when I prereviewed Stacktrace I mentioned that
DbgHelp was a steaming pile of poo and that I had had much more
reliable experience with the thoroughly superior Dbgeng. Antony asked
for some example code because Dbgeng is barely documented, but I no
longer had access to the code I wrote many years ago which used it. I
tried to cobble something together and I made some progress over what
Antony had, but I ran out of time due to needing to mind the recent
new baby. I'm not actually sure what he changed from what I sent him:
mine and his look very close, so I must have missed something very
small.

I hadn't realised Antony had figured it out and had assumed he was
still on DbgHelp. The fact he's using Dbgeng makes Stacktrace much
superior to 99% of the Windows stack trace implementations out there.
Particular benefits include:

* Dbgeng understands non-native stack frames, so mixed .NET, WinRT
and C++/CLI stack traces just work.

* Dbgeng is threadsafe.

* Dbgeng doesn't randomly fail and randomly work next time you call
it.

> > Of course Windows has signals, as already referred to by myself earlier
> > it's called vectored exception handling which is exactly the same as a
> > signal implementation.
>
> Not quite. A signal immediately suspends the thread and calls the handler in
> it. Windows exception handling, in contrast, unwinds the stack. So if the
> kernel crashes somewhere deep, it can unwind itself to a usable state before
> the program gets to handle the exception. Or at least that's my
> understanding.

You're missing a few steps.

1. RaiseException() is like kill() and starts the signal handling
process with parameters. Hardware exceptions raise an exception at
the immediate point of occurrence, i.e. inside any locks etc.

2. Any installed vectored exception handlers are like sigaction()
except they are called for all exception codes. The handler returns
whether it handled it or whether to keep searching. Vectored
exception handlers are process wide.

3. If still unhandled, the Thread Information Block (TIB) for the
thread where the exception occurred is asked for the current
thread-local TEH (Table Exception Handling) on x64 or SEH (Structured
Exception Handling) on x86. The handlers installed by all code on the
stack up to the point of exception are searched and called in reverse
order. Each handler may handle the exception, or say to keep
searching.

4. Every thread is always begun with a default exception handler
which opens that famous dialog box and terminates the process, so if
your exception reaches this the right thing happens.

5. C++ exceptions are implemented 100% as a client of the same TEH and
SEH framework. In fact they are simply a RaiseException(0xE06D7363,
...). If you dive into the implementation of __CxxThrowException()
you'll see that.
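
The search order in steps 2 to 4 can be sketched as a toy model. This
is portable C++ and deliberately not the Windows API: ToyProcess,
ToyThread, Disposition and dispatch() are hypothetical names invented
for illustration, and "reverse order" is assumed to mean most recently
installed (innermost) frame handler first.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy model only: none of these names are Windows APIs.
enum class Disposition { ContinueSearch, Handled };

using Handler = std::function<Disposition(std::uint32_t code)>;

struct ToyProcess {
    // Process wide, consulted first, and shown every exception code.
    std::vector<Handler> vectored_handlers;
};

struct ToyThread {
    // Per thread, pushed as frames are entered, so back() is innermost.
    std::vector<Handler> frame_handlers;
};

// Returns a label for whoever handled the exception. Nothing is unwound
// while searching: handlers are only asked, in order, whether they want it.
std::string dispatch(const ToyProcess& proc, const ToyThread& thread,
                     std::uint32_t code) {
    // Step 2: vectored handlers, in installation order.
    for (std::size_t i = 0; i < proc.vectored_handlers.size(); ++i) {
        if (proc.vectored_handlers[i](code) == Disposition::Handled)
            return "vectored#" + std::to_string(i);
    }
    // Step 3: frame-based handlers, innermost first.
    for (std::size_t i = thread.frame_handlers.size(); i-- > 0;) {
        if (thread.frame_handlers[i](code) == Disposition::Handled)
            return "frame#" + std::to_string(i);
    }
    // Step 4: nobody wanted it, so the default handler gets it.
    return "default";
}
```

With no handlers installed, dispatch() falls through to "default". A
vectored handler which returns Handled for a given code wins over any
frame handler for that code, which matches the ordering described
above.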

What's very important to note is that all this occurs without
unwinding the stack and at the point of the exception with any locks
still held. This is the source of the reentrancy which causes the
deadlocks if you call any userspace-implemented code from an
exception handler. The reason it doesn't unwind is that a handler
may choose to restart execution of the failed operation. There was a
very clever commercial C++ object-to-disk system a very long time ago
which serialised C++ objects out to disk and removed the RAM storage.
When the C++ program faulted on accessing them, the SEH handler would
deserialise the object back into RAM and restart the failed
instruction. Worked beautifully.
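
A minimal sketch of the restart idea, and not a reconstruction of that
commercial product: a vectored handler commits a reserved page when it
is first touched and asks Windows to re-run the faulting instruction.
The names g_region, g_region_size, commit_on_fault and restart_demo
are hypothetical. It assumes VirtualAlloc() is close enough to a plain
syscall to be callable from a handler, which no list mentioned here
confirms. The non-Windows branch is only a stub so the file compiles
elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>

// Hypothetical globals for this sketch: a reserved but uncommitted region.
static char*  g_region = nullptr;
static SIZE_T g_region_size = 0;

// Vectored handler: for an access violation inside our region, commit the
// page and restart the faulting instruction. Anything else keeps searching.
static LONG CALLBACK commit_on_fault(PEXCEPTION_POINTERS info) {
    const EXCEPTION_RECORD* rec = info->ExceptionRecord;
    if (rec->ExceptionCode != EXCEPTION_ACCESS_VIOLATION ||
        rec->NumberParameters < 2)
        return EXCEPTION_CONTINUE_SEARCH;
    // For access violations, ExceptionInformation[1] is the faulting address.
    char* addr = reinterpret_cast<char*>(rec->ExceptionInformation[1]);
    if (addr < g_region || addr >= g_region + g_region_size)
        return EXCEPTION_CONTINUE_SEARCH;
    // VirtualAlloc rounds the address down to its page when committing.
    if (!VirtualAlloc(addr, 1, MEM_COMMIT, PAGE_READWRITE))
        return EXCEPTION_CONTINUE_SEARCH;
    return EXCEPTION_CONTINUE_EXECUTION;  // no unwind: retry the instruction
}

bool restart_demo() {
    g_region_size = 1 << 16;
    g_region = static_cast<char*>(
        VirtualAlloc(nullptr, g_region_size, MEM_RESERVE, PAGE_NOACCESS));
    if (!g_region) return false;
    PVOID cookie = AddVectoredExceptionHandler(1, commit_on_fault);
    volatile char* p = g_region;
    p[100] = 42;                 // faults, handler commits, store is retried
    bool ok = (p[100] == 42);
    RemoveVectoredExceptionHandler(cookie);
    VirtualFree(g_region, 0, MEM_RELEASE);
    return ok;
}
#else
// Not applicable off Windows; stub so the sketch still compiles.
bool restart_demo() { return true; }
#endif
```

The store to p[100] is written only once in the source. It completes
because the handler returns EXCEPTION_CONTINUE_EXECUTION rather than
unwinding, which is the behaviour described above.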

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/
