Boost logo

Boost :

Subject: Re: [boost] [Boost.Serialization] Crash in current master (1.68)
From: Alexander Grund (alexander.grund_at_[hidden])
Date: 2018-07-12 07:45:27


> it's test_dll_export.  It's the simplest test I could figure out.
> There's another one that tests dynamic loading of DLL test_dll_plugin.
> I don't run that test as I would have to dive a lot more into bjam
> than I can aford to do.
Did you test what I sent around in this ML? As already explained I could
not get it into bjam either, but it does show the crash(!) on all linux
systems without further fiddling highlighting the need of an action.
> Also note that the current singleton implementation does no allocation
> from the heap.  If there is a memory leak - it's not obvious why
> singleton might be causing it.
The current implementation does NOT have a leak but a CRASH. Thats why I
want you to reverse that as the former is less severe (see above).
> OK - i've looked again at your pr.  It replaces my test with your own.
> Rather than doing that, it's been my practice to add new tests.  This
> helps prevent turning into a process of whack-a-mole.  I'll add these
> two new tests and see if I can re-produce the results on my own machine.
My test was developed before yours, so I had a merge conflict. Looking
at your test I found that it tests the wrong thing: Instead of testing
that the interface of singleton is correct, it tests the
construction/destruction order. That is guaranteed by the compiler,
hence you actually test the compiler. As that order only changes in the
context of shared libraries I did not see any added value of the test,
hence the complete overwrite (yeah, sorry, but was the easiest and as
explained a test of the interface is better than a test of the compiler)
>> Additionally I have (empiric) evidence that destruction order is
>> unreliable (see the test I sent around in this ML).
> I looked as one or two of your tests and as a result updated the test
> suite as above. So I think things are covered.  I'll double check though.
If you had run the test I sent around you'd see that this is not
covered. Destruction order is guaranteed except in the case of shared
libraries which is hard to get into bjam (well actually not, but it
requires building with fPIC. If that's possible/allowed, I'll add a test
to bjam that will crash on travis although it is very valid.
>> So what my proposed change is, is simply:
>>       a) fix the bug in the implementation shown by #110 (should be
>> indisputably the right thing to do) and
>
> LOL - sorry it's not indiputable to me.
Can you explain that a bit more? #110 shows that `is_destroyed` does
return a wrong value. Why would it be not right to fix that?
>> I don't see how this fixes a symptom only. It tackles the root
>> causes: A defect in the code and a use-after-free caused by the
>> unexpected destruction order
>
> I do not believe there is any freeing going on here.
An object is used after it is destroyed. That object has a map which is
freed on destruction but accessed afterwards. Try valgrind on the code I
sent around and it will show you the use-after-free.
> Right.  I do not think its possible to avoid problems if libs are not
> unloaded in the reverse order that they are loaded. I believe that
> this will be something that has to be enforced at the application
> level.  It would be nice if it were possible to detect this from the
> library, but without any standard defined behavior, this is going to
> be difficult.
How can this be enforced if there is no (manual) dynamic loading going
on? You simply link 2 shared libraries and the application crashes on
exit. Nothing a user can do about that.
>> I also do understand this. But I have proof that your latest change
>> causes a (possible) crash instead of a (certain) memory leak and
>> hence ask you to revert that change for 1.68 or the library will be
>> unusable for some users. Once there is more time, the issue can be
>> fully resolved.
>
> This is the crux.  I do not this can be resolved definitively without
> a real understanding of the source of the problem.
Well the source is indeterminate destruction order. You could try to
find the source for that or take it as given and handle it properly
which is easily possible.

If you have some time we can discuss this in IRC or another platform
directly. But please test what I sent around first. Otherwise we are
running around in circles as you don't seem to believe me that there is
a crash in a valid use case without anything a user can do.

Alex Grund


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk