Subject: Re: [Boost-bugs] [Boost C++ Libraries] #8883: property_tree JSON reader does not parse unicode characters properly
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2013-09-04 09:05:41
#8883: property_tree JSON reader does not parse unicode characters properly
----------------------------------+----------------------------------------
  Reporter:  Ronny Krueger <rk@…> |     Owner:  cornedbee
                                  |    Status:  new
      Type:  Bugs                 | Component:  property_tree
 Milestone:  To Be Determined     |  Severity:  Problem
   Version:  Boost 1.54.0         |  Keywords:  property_tree JSON unicode
Resolution:                       |
----------------------------------+----------------------------------------
Comment (by ecotax@…):
@Lettort: There is a difference between Unicode, which specifies that 'ä'
maps to code point U+00E4, and the various ways to encode that code point
in bits or bytes. UTF-16 encodes it as 00E4 (16 bits, which fits in a wide
char), while UTF-8 encodes it as two bytes, C3 A4.
When parsing \u00E4, the correct way to handle it depends on the encoding
you want for your string.
If you have a wide string and expect UTF-16, then yes, you'd expect the
wide char 00E4.
If you have a regular string and expect UTF-8, you'd expect the two bytes
C3 A4.
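To make the distinction concrete, here is a minimal C++ sketch (not property_tree's actual code; `parse_u_escape` and `append_code_point` are hypothetical names) that decodes the escape \u00E4 to the code point U+00E4 and then stores it according to the target string type: as the UTF-8 bytes C3 A4 in a `std::string`, or as the single UTF-16 code unit 00E4 in a `std::u16string`, which stands in here for a UTF-16 wide string. It assumes code points in the Basic Multilingual Plane only; surrogate pairs are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Parse a six-character JSON escape (backslash, 'u', four hex digits)
// into the code point it names. Only the escape itself is expected in `s`.
char32_t parse_u_escape(const std::string& s) {
    if (s.size() != 6 || s[0] != '\\' || s[1] != 'u')
        throw std::invalid_argument("expected a backslash-u escape");
    char32_t cp = 0;
    for (std::size_t i = 2; i < 6; ++i) {
        const char c = s[i];
        cp <<= 4;
        if (c >= '0' && c <= '9')      cp |= static_cast<char32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') cp |= static_cast<char32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') cp |= static_cast<char32_t>(c - 'A' + 10);
        else throw std::invalid_argument("bad hex digit");
    }
    return cp;
}

// Narrow target: store the code point as UTF-8 bytes.
// A single escape is at most 0xFFFF, so at most three bytes are needed;
// surrogate pairs are out of scope for this sketch.
void append_code_point(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));          // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));        // 10xxxxxx
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));         // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F)); // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));        // 10xxxxxx
    }
}

// UTF-16 target: a code point below 0x10000 is one 16-bit code unit.
void append_code_point(std::u16string& out, char32_t cp) {
    out += static_cast<char16_t>(cp);
}
```

Overloading on the output string type leaves the choice of encoding with the caller: the escape names a code point, and the string type decides which bytes or code units represent it.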
The original bug report states that when 'ä' is written and then read
back, the writer (defensively?) writes it as two \u-escaped characters,
each holding one byte of the UTF-8 encoding. Regardless of whether this is
the best choice, you'd want the reader to handle it so that values
'round-trip' as far as possible, which is currently not the case.
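One way a reader could round-trip with a writer like the one described is to turn each \u escape whose value fits in one byte back into that raw byte. The sketch below assumes that writer behaviour; `write_escaped`, `read_escaped` and `hex4` are hypothetical names, only the \u and \\ escapes are handled, and this is one possible design rather than how property_tree actually behaves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical writer modelled on the behaviour described in the report:
// each byte >= 0x80 of a narrow (UTF-8) string becomes its own escape,
// so the two bytes C3 A4 of U+00E4 are written as two escapes.
// Backslashes are escaped too so the output can be read back unambiguously.
std::string write_escaped(const std::string& in) {
    std::string out;
    for (const unsigned char b : in) {
        if (b == '\\') {
            out += "\\\\";
        } else if (b >= 0x80) {
            char buf[7];  // backslash, 'u', four hex digits, terminator
            std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(b));
            out += buf;
        } else {
            out += static_cast<char>(b);
        }
    }
    return out;
}

// Read exactly four hex digits starting at in[pos].
unsigned hex4(const std::string& in, std::size_t pos) {
    unsigned v = 0;
    for (std::size_t k = pos; k < pos + 4; ++k) {
        const char c = in.at(k);
        v <<= 4;
        if (c >= '0' && c <= '9')      v |= static_cast<unsigned>(c - '0');
        else if (c >= 'a' && c <= 'f') v |= static_cast<unsigned>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') v |= static_cast<unsigned>(c - 'A' + 10);
        else throw std::invalid_argument("bad hex digit");
    }
    return v;
}

// Hypothetical matching reader: an escape whose value fits in one byte is
// turned back into that raw byte (one escape == one original byte), which
// makes write_escaped/read_escaped round-trip. Larger values are treated as
// code points and stored as UTF-8.
std::string read_escaped(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '\\') {
            out += in[i];
            continue;
        }
        const char kind = in.at(i + 1);
        if (kind == '\\') {
            out += '\\';
            i += 1;
            continue;
        }
        if (kind != 'u') throw std::invalid_argument("unsupported escape");
        const unsigned v = hex4(in, i + 2);
        i += 5;  // skip the rest of the six-character escape
        if (v <= 0xFF) {
            out += static_cast<char>(v);
        } else if (v < 0x800) {
            out += static_cast<char>(0xC0 | (v >> 6));
            out += static_cast<char>(0x80 | (v & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (v >> 12));
            out += static_cast<char>(0x80 | ((v >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (v & 0x3F));
        }
    }
    return out;
}
```

With this choice `read_escaped(write_escaped(s)) == s` holds for any narrow string `s`, but the cost shows up elsewhere: a \u00E4 written by some other JSON producer to mean 'ä' comes back as the single byte E4 rather than the UTF-8 bytes C3 A4, which is exactly the encoding question raised above.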
BTW, for future questions and discussions, I guess a site like
stackoverflow.com is more appropriate.
--
Ticket URL: <https://svn.boost.org/trac/boost/ticket/8883#comment:3>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.
This archive was generated by hypermail 2.1.7 : 2017-02-16 18:50:14 UTC