Re: [Boost-bugs] [Boost C++ Libraries] #13402: Log format JUNIT generates invalid XML files with incorrect encoding


Subject: Re: [Boost-bugs] [Boost C++ Libraries] #13402: Log format JUNIT generates invalid XML files with incorrect encoding
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2018-04-07 09:18:59


#13402: Log format JUNIT generates invalid XML files with incorrect encoding
-------------------------------+-------------------------------
  Reporter: gallien@… | Owner: Gennadiy Rozental
      Type: Bugs | Status: new
Milestone: To Be Determined | Component: test
   Version: Boost 1.66.0 | Severity: Problem
Resolution: | Keywords:
-------------------------------+-------------------------------

Comment (by Raffi Enficiaud):

> I realize how difficult it is to guarantee that everything printed to
> the junit XML is valid utf-8.

Indeed. The problem you are facing, as I understand it, is that you
are comparing strings encoded in cp1252 that are not pure ASCII, while
`std::string` does not carry any encoding information. Such a cp1252
string is output **as is** to the JUnit file, because Boost.Test does
not interpret its contents.

This is a shortcoming that I believe Boost.Test should address at some
point, but on the other hand, Boost.Test does not interpret any character
it outputs, simply because it knows nothing about encodings.
I do not know if I should support this at some point: Unicode and code-point
transformations are natively supported on Windows, while on other
operating systems I would need to include an external library, which I do
not want. I haven't looked into the C++11 encoding facilities; maybe it is
easier now.

The idea would be to let users declare which encoding is used for
strings, and to transform them to UTF-8. Transforming to UTF-8 is also
something you need to do to be correct: if you say that your source
code is UTF-8, it is likely that at some point you will output a string
that is UTF-8 encoded, whereas here you want to turn everything into
cp1252 because the input is cp1252. This approach does not scale well,
as different encodings will get mixed in the resulting log. The right
approach would be to transform everything to, e.g., UTF-8 (or at least
to the encoding declared in the XML file).

For now, I would suggest converting the strings to UTF-8 until I
come up with correct handling in Boost.Test. After all, there are not
that many characters in cp1252 that need to be transformed (and that
you actually need).

-- 
Ticket URL: <https://svn.boost.org/trac10/ticket/13402#comment:13>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.


This archive was generated by hypermail 2.1.7 : 2018-04-07 09:25:20 UTC