[Boost-bugs] [Boost C++ Libraries] #4340: property_tree xml parser do not handle wstream with i18n char well

Subject: [Boost-bugs] [Boost C++ Libraries] #4340: property_tree xml parser do not handle wstream with i18n char well
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2010-06-13 15:01:51


#4340: property_tree xml parser do not handle wstream with i18n char well
----------------------------------------+-----------------------------------
 Reporter: zhuo.qiang@… | Owner: cornedbee
     Type: Patches | Status: new
Milestone: Boost 1.43.0 | Component: property_tree
  Version: Boost Development Trunk | Severity: Problem
 Keywords: property_tree i18n unicode |
----------------------------------------+-----------------------------------
 boost::property_tree::read_xml(wistream&, wptree&) throws an exception if
 a tag name, attribute, or value contains a Unicode character whose wide
 character code has a zero byte inside.

 For example, for the Chinese character L'\u4E00' (low byte 0x00), an
 exception is thrown because the parser treats L'\u4E00' as '\0'.
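
The collision described above can be illustrated with a minimal sketch. It assumes the parser ends up looking only at the low 8 bits of each wide character; the ticket does not show the parser internals, so `low_byte` and `collides_with_nul` are hypothetical helpers, not Boost code.

```cpp
#include <cassert>

// Hypothetical helper (not from the Boost sources): keeps only the low
// 8 bits of a wide character, as a byte-indexed lookup would.
inline unsigned char low_byte(wchar_t ch)
{
    const unsigned long value = static_cast<unsigned long>(ch);
    return static_cast<unsigned char>(value & 0xFFul);
}

// True when a real (non-null) character becomes indistinguishable
// from '\0' once it is reduced to its low byte.
inline bool collides_with_nul(wchar_t ch)
{
    return ch != L'\0' && low_byte(ch) == 0;
}

// Small self-check covering the character named in the ticket.
inline void check_collision_examples()
{
    const wchar_t han_one = static_cast<wchar_t>(0x4E00); // L'\u4E00'
    assert(low_byte(han_one) == 0);        // 0x4E00 & 0xFF == 0x00
    assert(collides_with_nul(han_one));    // mistaken for a terminator
    assert(!collides_with_nul(L'a'));      // plain ASCII is unaffected
}
```

Under this assumption any character of the form 0xNN00 would trigger the same failure, not only L'\u4E00'.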

 The following test cases fail:

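 // Assumed preamble: the ticket shows only the test cases themselves.
 #include <sstream>
 #include <string>
 #include <boost/property_tree/ptree.hpp>
 #include <boost/property_tree/xml_parser.hpp>
 #include <boost/test/unit_test.hpp>

 using namespace std;
 using namespace boost;

 BOOST_AUTO_TEST_CASE(test_i18n_xml_tag_name)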
 {
     wistringstream in(L"<\u4E00>abc</\u4E00>");
     property_tree::wptree pt;
     BOOST_REQUIRE_NO_THROW(property_tree::read_xml(in, pt));
     BOOST_CHECK(L"abc" == pt.get<wstring>(L"\u4E00"));
 }

 BOOST_AUTO_TEST_CASE(test_i18n_xml_attribute_name)
 {
     wistringstream in(L"<tag \u4E00=\"abc\">def</tag>");
     property_tree::wptree pt;
     BOOST_REQUIRE_NO_THROW(property_tree::read_xml(in, pt));
     BOOST_CHECK(L"abc" == pt.get<wstring>(L"tag.<xmlattr>.\u4E00"));
 }

 BOOST_AUTO_TEST_CASE(test_i18n_xml_attribute_value)
 {
     wistringstream in(L"<tag attribute=\"\u4E00\">def</tag>");
     property_tree::wptree pt;
     BOOST_REQUIRE_NO_THROW(property_tree::read_xml(in, pt));
     BOOST_CHECK(L"\u4E00" == pt.get<wstring>(L"tag.<xmlattr>.attribute"));
 }

 BOOST_AUTO_TEST_CASE(test_i18n_xml_tag_value)
 {
     wistringstream in(L"<tag>\u4E00</tag>");
     property_tree::wptree pt;
     BOOST_REQUIRE_NO_THROW(property_tree::read_xml(in, pt));
     BOOST_CHECK(L"\u4E00" == pt.get<wstring>(L"tag"));
 }


 The fix is to treat every character whose value is above 255 the same as
 'z' when performing semantic actions.
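
The idea behind the fix can be sketched as follows. This is an illustration only, not the attached patch: it assumes the parser classifies characters through a 256-entry lookup table, and `table_index` is a hypothetical name introduced here.

```cpp
#include <cassert>

// Hypothetical helper (not the actual patch): picks the index used for
// a 256-entry character classification table.
inline unsigned char table_index(wchar_t ch)
{
    // Convert to an unsigned type first so that a signed wchar_t with a
    // negative value also lands in the "above 255" branch.
    const unsigned long value = static_cast<unsigned long>(ch);
    if (value > 255ul)
        return static_cast<unsigned char>('z'); // classify like an ordinary letter
    assert(value <= 255ul);                     // safe to narrow from here on
    return static_cast<unsigned char>(value);
}
```

With this mapping, L'\u4E00' is classified like 'z', so it is accepted wherever an ordinary letter is and is no longer confused with '\0'. Characters in the range 0 to 255 keep their own table entry.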

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/4340>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.
