Re: [Boost-bugs] [Boost C++ Libraries] #13402: Log format JUNIT generates invalid XML files with incorrect encoding

From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2018-04-07 09:18:59


#13402: Log format JUNIT generates invalid XML files with incorrect encoding
-------------------------------+-------------------------------
  Reporter: gallien@… | Owner: Gennadiy Rozental
      Type: Bugs | Status: new
 Milestone: To Be Determined | Component: test
   Version: Boost 1.66.0 | Severity: Problem
Resolution: | Keywords:
-------------------------------+-------------------------------

Comment (by Raffi Enficiaud):

> I realize how difficult it is to guarantee that everything printed to
 the junit XML is valid utf-8.

 Indeed. The problem that you are facing, as I understand it, is that you
 are comparing a string in the cp1252 domain that is not pure ASCII, while
 a `std::string` does not carry any encoding information. This cp1252
 string is output **as is** to the JUNIT file, because boost.test does
 not interpret anything.
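
 To illustrate why bytes written as is can break the log, here is a minimal
 sketch of a UTF-8 well-formedness check. `is_valid_utf8` is a hypothetical
 helper written for this example, not something boost.test provides. The
 cp1252 encoding of "é" is the single byte 0xE9, which UTF-8 treats as the
 start of a three-byte sequence, so a cp1252 string containing it is rejected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a boost.test API): returns true if `s` is
// well-formed UTF-8. Rejects overlong forms, surrogates and values
// above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;     // continuation bytes expected after c
        unsigned long cp = 0;      // decoded code point
        unsigned long min_cp = 0;  // smallest code point valid for this length
        if (c < 0x80)                { extra = 0; cp = c;        min_cp = 0; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min_cp = 0x10000; }
        else return false;         // stray continuation or invalid lead byte
        if (extra > n - i - 1) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // outside Unicode
        i += extra + 1;
    }
    return true;
}
```

 With this sketch, `is_valid_utf8("caf\xC3\xA9")` (UTF-8 "café") holds, while
 `is_valid_utf8("caf\xE9")` (the same word in cp1252) does not.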

 This is a shortcoming that I believe boost.test should address at some
 point, but OTOH boost.test does not interpret any char that is outputted,
 simply because boost.test does not know anything about encoding.
 I do not know if I should support this at some point: Unicode and
 code-point transformations are natively supported on Windows, while on
 other operating systems I would need to include an external library, which
 I do not want. I haven't looked into the C++11 encoding facilities; maybe
 it is easier now.

 The idea would be to be able to declare what encoding is being used for
 strings, and to transform them to utf-8. Transforming to utf-8 is also
 something that you have to do to be correct: if you say that your source
 code is utf-8, it is likely that at some point you will output a string
 that is utf-8 encoded, while here you want to turn everything into
 cp1252 because the input is cp1252. This approach will not scale very well,
 as several encodings will end up mixed in the resulting log. The right
 approach would be to transform everything to e.g. utf-8 (or at least to
 the encoding that is declared in the XML file).

 For now, I would just suggest transforming the strings to utf-8 until I
 come up with a correct handling in boost.test. After all, there are not
 that many cp1252 chars that need to be transformed (and that you need).
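
 As a rough sketch of that workaround, the conversion can be done by hand
 without any external library. `cp1252_to_utf8` and the `cp1252_high` table
 are hypothetical names for this example, not part of boost.test. One
 assumption to note: the five bytes that cp1252 leaves undefined (0x81, 0x8D,
 0x8F, 0x90, 0x9D) are mapped here to the C1 control code of the same value;
 a stricter version might reject them instead.

```cpp
#include <cstdint>
#include <string>

// Code points for cp1252 bytes 0x80..0x9F. Bytes 0xA0..0xFF map to the
// code point of the same value, so they need no table.
// Assumption: undefined bytes are mapped to the matching C1 control code.
static const std::uint16_t cp1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

// Hypothetical helper: re-encodes a cp1252 byte string as UTF-8.
std::string cp1252_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        std::uint32_t cp = c;
        if (c >= 0x80 && c <= 0x9F) cp = cp1252_high[c - 0x80];
        if (cp < 0x80) {
            // ASCII is copied unchanged.
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // Two-byte UTF-8 sequence.
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // Three-byte sequence; every cp1252 code point fits in 16 bits.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

 Converting both operands with such a helper before they are compared means
 that any message ending up in the JUNIT log is already utf-8, e.g.
 `cp1252_to_utf8("caf\xE9")` yields the bytes `"caf\xC3\xA9"`.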

-- 
Ticket URL: <https://svn.boost.org/trac10/ticket/13402#comment:13>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.
