Subject: Re: [boost] [serialization] [libstdc++] [detail] utf8_codecvt_facet fixes broke serialization test_array_xml_warchive
From: Beman Dawes (bdawes_at_[hidden])
Date: 2014-09-05 11:15:25


On Thu, Sep 4, 2014 at 11:59 AM, Robert Ramey <ramey_at_[hidden]> wrote:

> Beman Dawes wrote
> > The specific crash message is:
> >
> > *** Error in
> >
> `../../../bin.v2/libs/serialization/test/test_array_xml_warchive.test/clang-linux-libstdcpp/debug/test_array_xml_warchive':
> > double free or corruption (!prev): 0x00000000015f6f90 ***
> >
> > It occurs for clang, gcc, and intel compilers, using libstdc++. It does
> > not
> > occur with clang using libc++. It does not occur with msvc 10.0, 11.0, or
> > 12.0.
>

Although the regression tests are only showing failures on non-Windows
systems, the failure is also easy to reproduce using cygwin/gcc on Windows.
It occurs in both C++03 and C++11 modes.

> >
> > None of the other libraries (filesystem, log, program_options,
> > property_tree) that use utf8_codecvt_facet are failing on develop.
> >
> > This is mainly a heads up to let people know that the serialization
> > problem
> > in develop is being worked on, but it may be a day or two before I have a
> > fix.
>
> Hmmmm - this looks like new behavior.

Actually, this is the same problem Marshall ran into a year or so ago when
he fixed boost/detail/utf8_codecvt_facet.hpp:

--- C:\Users\Beman\AppData\Local\Temp\TortoiseGit\utf253F.tmp\utf8_codecvt_facet-5ef03bf-left.hpp 2014-09-05 10:23:25.000000000 -0400
+++ C:\boost\modular\develop\libs\detail\include\boost\detail\utf8_codecvt_facet.hpp 2014-09-05 08:43:11.000000000 -0400
@@ -89,13 +89,13 @@
 namespace std {
     using ::mbstate_t;
     using ::size_t;
 }
 #endif
-#if !defined(__MSL_CPP__) && !defined(__LIBCOMO__)
+#if defined(_CPPLIB_VER) && (_CPPLIB_VER < 540)
     #define BOOST_CODECVT_DO_LENGTH_CONST const
 #else
     #define BOOST_CODECVT_DO_LENGTH_CONST
 #endif
 // maximum lenght of a multibyte string

>   I don't remember changing anything
> that might provoke this.  Am I wrong or is there some other change (perhaps
> in another library) which provokes this? Since C++11 we had
> some problems with utf8_codecvt_facet due to confusion between the
> now "built-in" implementation and the original "home grown" version.

AFAIK, serialization is the only library that tries to switch between the
std:: version and the boost:: version. It is quite clear the bug is in
serialization (or even in the libstdc++ codecvt) rather than in the
boost::detail code.

>   It
> took some time to sort out because it varied according to which
> combinations
> of compiler version and compiler switches were selected and no one has
> all combinations on their desktop.

The bug is showing up regardless of the compiler version or switches. It is
easy to demonstrate; just switch back and forth between the two versions of
the #if line.
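
For readers unfamiliar with what the macro selected by that #if controls, here
is a minimal sketch of the general mechanism. It is not taken from the Boost
sources: state_t, facet_base, matched_facet, mismatched_facet and measure are
hypothetical names, and the sketch assumes a base class whose do_length takes
a non-const state reference. It only shows that a const mismatch on that
parameter silently produces a function that does not override the base
version. It makes no claim about the cause of the double free.

```cpp
#include <cstddef>

// Hypothetical stand-in for the conversion state (the real facet uses std::mbstate_t).
struct state_t {};

// Shared helper: number of bytes that would be consumed, capped at max.
inline int count_bytes(const char* from, const char* end, std::size_t max) {
    std::size_t avail = static_cast<std::size_t>(end - from);
    return static_cast<int>(avail < max ? avail : max);
}

// Hypothetical base class, assumed to declare do_length with a NON-const state.
struct facet_base {
    virtual ~facet_base() {}
    int length(state_t& s, const char* from, const char* end, std::size_t max) const {
        return do_length(s, from, end, max);  // virtual dispatch
    }
protected:
    virtual int do_length(state_t&, const char*, const char*, std::size_t) const {
        return -1;  // marker value: the base implementation ran
    }
};

// Same signature as the base: this is a real override.
struct matched_facet : facet_base {
protected:
    virtual int do_length(state_t&, const char* from, const char* end,
                          std::size_t max) const {
        return count_bytes(from, end, max);
    }
};

// State parameter is const here: a different signature, so this function
// hides the base version instead of overriding it, and it is never reached
// through facet_base::length.
struct mismatched_facet : facet_base {
protected:
    virtual int do_length(const state_t&, const char* from, const char* end,
                          std::size_t max) const {
        return count_bytes(from, end, max);
    }
};

// Calls through the base interface, as library code would.
inline int measure(const facet_base& f, const char* text, std::size_t n) {
    state_t s;
    return f.length(s, text, text + n, n);
}
```

Calling measure on a matched_facet reaches the derived do_length, while
calling it on a mismatched_facet falls back to the base implementation. That
is why the constness chosen by the macro has to agree with the standard
library in use.
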
>   So fair warning about being too hasty
> about fixing this or declaring it fixed.  I got trapped several times this
> way.
>

The #if bug and several other bugs in boost::detail were introduced while
trying to make serialization work, around the time Marshall introduced his
original patch. While those changes papered over the problem in
serialization, they are causing bug reports to be posted against other
libraries, particularly filesystem.

>
> Also note that it seems that is only used on wide character strings and
> lots
> of other libraries don't require these.  So it might be wrong and our tests
> might not be sufficiently exhaustive to detect this.
>

On all BSD-based operating systems (such as Mac OS X), filesystem uses the
boost::detail code.

>
> This raises another interesting question.  For many years we've been
> relying
> on Ron Garcia's original codecvt facet which has worked fine.  This in
> spite
> of the
> fact that it was never reviewed and attempts to include in boost outside of
> the detail directories were rebuffed.  I snuck the documentation and tests
> of it into the serialization library as I needed it and had no other
> choice.
>
> But now it's sort of intertwined with the std implementation (IIRC) which is
> part of the problem.

Does any Boost library other than serialization try to switch between
boost:: and std:: versions?

>   A better solution might be a new library for codecvt facets.
> There
> is a rich opportunity here.

Why? Microsoft, for example, ships codecvt facets for 79 character sets,
including the difficult Asian character sets. Why should Boost try to
duplicate the work that vendors have already done, particularly as
Unicode becomes predominant?

--Beman
