

Subject: Re: [boost] [serialization] [libstdc++] [detail] utf8_codecvt_facet fixes broke serialization test_array_xml_warchive
From: Robert Ramey (ramey_at_[hidden])
Date: 2014-09-04 11:59:53

Beman Dawes wrote
> The specific crash message is:
> *** Error in
> `../../../bin.v2/libs/serialization/test/test_array_xml_warchive.test/clang-linux-libstdcpp/debug/test_array_xml_warchive':
> double free or corruption (!prev): 0x00000000015f6f90 ***
> It occurs for clang, gcc, and intel compilers, using libstdc++. It does not
> occur with clang using libc++. It does not occur with msvc 10.0, 11.0, or
> 12.0.
> None of the other libraries (filesystem, log, program_options,
> property_tree) that use utf8_codecvt_facet are failing on develop.
> This is mainly a heads up to let people know that the serialization problem
> in develop is being worked on, but it may be a day or two before I have a
> fix.

Hmmmm, this looks like new behavior. I don't remember changing anything
that might provoke this. Am I wrong, or is there some other change (perhaps
in another library) which provokes this? Since C++11 we have had some
problems with utf8_codecvt_facet due to confusion between the now
"built-in" implementation and the original "home grown" version. It took
some time to sort out because the behavior varied with the combination of
compiler version and compiler switches selected, and no one has all
combinations on their desktop. So, fair warning about being too hasty in
fixing this or declaring it fixed. I got trapped by this several times.

Also note that utf8_codecvt_facet seems to be used only with wide-character
strings, which lots of other libraries don't require. So it might be wrong,
and our tests might not be exhaustive enough to detect this.

This raises another interesting question. For many years we've been relying
on Ron Garcia's original codecvt facet, which has worked fine. This is in
spite of the fact that it was never reviewed, and attempts to include it in
Boost outside of the detail directories were rebuffed. I snuck its
documentation and tests into the serialization library because I needed it
and had no other choice.

But now it's sort of intertwined with the std implementation (IIRC), which
is part of the problem. A better solution might be a new library for codecvt
facets. There is a rich opportunity here. The codecvt interface is actually
quite general: codecvt facets can be used for translating text from one
encoding to another even without any I/O involved. This new library would
consist of:

a) a codecvt "construction kit" consisting of code from the dataflow
iterators of the serialization library and/or implementations from the
Boost Range library.
b) This "construction kit" would permit one to compose a "codecvt stack"
at compile time.
c) Any such "codecvt stack" could be used as a stream facet or as a
stand-alone way to translate one character stream to another.

Having invested some time learning about how codecvt facets work, I've come
to the conclusion that they are largely unappreciated. I'm guessing billions
of lines of hand-rolled code (BLOCs) which implement conversions on a
pair-by-pair basis could be replaced by such a library. Making such a
library would be more or less straightforward, but it would require a lot
of care with related issues, such as its documentation, in order to make it
more widely used. But the person who does this will likely become as famous
as I am.
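The "codecvt stack" idea from items b) and c) can be illustrated with a small compile-time composition. This is a hypothetical sketch only: Stack, AsciiUpper and Utf32ToUtf8 are illustrative names invented here, the stages are plain function objects rather than std::codecvt facets, and it assumes C++14 (for deduced return types). It shows only how conversion stages might be chained so that each pair-by-pair conversion need not be hand-rolled.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stage 1: ASCII-only upper-casing of UTF-32 text.
struct AsciiUpper {
    std::u32string operator()(std::u32string s) const {
        for (char32_t& c : s)
            if (c >= U'a' && c <= U'z')
                c = static_cast<char32_t>(c - U'a' + U'A');
        return s;
    }
};

// Hypothetical stage 2: hand-written UTF-32 -> UTF-8 encoder.
// Assumes valid scalar values (no surrogates, nothing above U+10FFFF).
struct Utf32ToUtf8 {
    std::string operator()(const std::u32string& s) const {
        std::string out;
        for (char32_t c : s) {
            assert(c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF));
            if (c < 0x80) {
                out += static_cast<char>(c);
            } else if (c < 0x800) {
                out += static_cast<char>(0xC0 | (c >> 6));
                out += static_cast<char>(0x80 | (c & 0x3F));
            } else if (c < 0x10000) {
                out += static_cast<char>(0xE0 | (c >> 12));
                out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (c & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (c >> 18));
                out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
        return out;
    }
};

// A "stack" of stages composed at compile time: input flows through the
// stages from left to right, each stage's output feeding the next.
template <typename... Stages>
struct Stack;

template <typename Last>
struct Stack<Last> {
    template <typename In>
    auto operator()(In&& in) const { return Last{}(std::forward<In>(in)); }
};

template <typename First, typename... Rest>
struct Stack<First, Rest...> {
    template <typename In>
    auto operator()(In&& in) const {
        return Stack<Rest...>{}(First{}(std::forward<In>(in)));
    }
};
```

For example, Stack<AsciiUpper, Utf32ToUtf8>{}(U"a\u00e9\u20ac") upper-cases the ASCII letter and then encodes the result, giving the bytes 41 C3 A9 E2 82 AC. Because the stages are template arguments, the composition is fixed at compile time. Wrapping such a stack in a real std::codecvt subclass, so that it could be imbued into a stream as item c) suggests, would need buffering and conversion-state handling not shown here.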

Robert Ramey
