Boost logo

Boost :

Subject: Re: [boost] [Boost.Serialization] Crash in current master (1.68)
From: Robert Ramey (ramey_at_[hidden])
Date: 2018-07-11 19:20:06


On 7/11/18 9:32 AM, Alexander Grund via Boost wrote:
>
>> The problem is that I don't see any connection between your proposed
>> change and the failed test.  I'm not disputing that your change makes
>> a problematic symptom go away.  But I haven't been able to invest
>> enough time to really understand why that might be so.
> Please do not focus on that currently checked in test (test_dll_load or
> what it was) because that one is to complex to reason about.

it's test_dll_export. It's the simplest test I could figure out.
There's another one that tests dynamic loading of DLL test_dll_plugin.
I don't run that test as I would have to dive a lot more into bjam than
I can aford to do.

> Please check the very basic singleton interface tests I even added as a
> separate PR (https://github.com/boostorg/serialization/pull/110) which
> clearly shows, that the current implementation is flawed.

I think I did that once. On this basis I added a test_singleton to the
test suite. I'm pretty
https://github.com/boostorg/serialization/blob/master/test/test_singleton.cpp
. I does check that all singletons are destroyed in the reverse order.
Note: this test doesn't appear in any of the "teeks99" tests. I have no
idea why that might be.

Also note that the current singleton implementation does no allocation
from the heap. If there is a memory leak - it's not obvious why
singleton might be causing it.

OK - i've looked again at your pr. It replaces my test with your own.
Rather than doing that, it's been my practice to add new tests. This
helps prevent turning into a process of whack-a-mole. I'll add these
two new tests and see if I can re-produce the results on my own machine.

> Additionally I have (empiric) evidence that destruction order is
> unreliable (see the test I sent around in this ML).
I looked as one or two of your tests and as a result updated the test
suite as above. So I think things are covered. I'll double check though.

> So what my proposed change is, is simply:
>      a) fix the bug in the implementation shown by #110 (should be
> indisputably the right thing to do) and

LOL - sorry it's not indiputable to me.

>      b) check if the singleton accessed by a singleton-destructor is
> still alive (equivalent of `if(foo) foo->bar()`)

I will check this.

> I don't see how this fixes a symptom only. It tackles the root causes: A
> defect in the code and a use-after-free caused by the unexpected
> destruction order

I do not believe there is any freeing going on here.

(although the cause for that is unclear and only know
> to be related to shared libraries).

Right. I do not think its possible to avoid problems if libs are not
unloaded in the reverse order that they are loaded. I believe that this
will be something that has to be enforced at the application level. It
would be nice if it were possible to detect this from the library, but
without any standard defined behavior, this is going to be difficult.

> I do acknowledge your lack of time and because the fix is so
> logical/straight forward I asked for another reviewer to relieve you
> from that or help out.

I don't always agree with the PRs and the analysis of other
participants. I try to give all such suggestions the attention they
deserve and recognize the hard work that they entail - even if I decline
to include them. On the other hand. Most of the suggestions do
eventually get incorporated - perhaps after some back and forth to
refine them, add/update tests, etc. It is only because of efforts by
persons such as yourself, that I have been able to maintain the library
after all these years. And it only because of these efforts that I keep
motivated to keep it as perfect as possible. You make me a better man.

>> The whole issue of dynamically loaded DLLS/shared libraries has been a
>> festering sore for at least 20 years now.  It's undefined by the
>> standard and the committee has not seen fit to address it. Sorry it
>> just takes time to get this right.

> I also do understand this. But I have proof that your latest change
> causes a (possible) crash instead of a (certain) memory leak and hence
> ask you to revert that change for 1.68 or the library will be unusable
> for some users. Once there is more time, the issue can be fully resolved.

This is the crux. I do not this can be resolved definitively without a
real understanding of the source of the problem.

>
> Alex Grund
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk