[Boost-bugs] [Boost C++ Libraries] #11600: boost property_tree exponential newline growth

Subject: [Boost-bugs] [Boost C++ Libraries] #11600: boost property_tree exponential newline growth
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2015-08-30 14:04:50


#11600: boost property_tree exponential newline growth
-----------------------------------------+---------------------------
 Reporter: Timo Strunk <Timo.Strunk@…> | Owner: cornedbee
     Type: Bugs | Status: new
Milestone: To Be Determined | Component: property_tree
  Version: Boost 1.59.0 | Severity: Problem
 Keywords: |
-----------------------------------------+---------------------------
 == Problem ==

 A read/write round trip through the Boost property_tree XML parser adds
 many newlines when it is used without the trim_whitespace option.
 This makes ptree unusable without trim_whitespace, so ptree is not an
 option when whitespace in XML text actually has to be preserved.

 == Example ==
 in.xml:
 {{{
     <simona_input>
       <simona_configuration>
         <coord residue_name="LI1" residue_id="0" chain_id="0" name="C" id="0">
           <X>0.0</X>
           <Y>0.0</Y>
 }}}
 rewritten.xml:
 {{{
     <simona_input>
       &#10; &#10; &#10; &#10; &#10; &#10; &#10;
       <simona_configuration>
         &#10; &#10; &#10; &#10;
         <coord residue_name="LI1" residue_id="0" chain_id="0" name="C" id="0">
 }}}

 == Cause ==
 The problem is due to the interpretation of strings in the following XML
 example:

 {{{
     <element1>
         <subelement/>
         <subelement/>
         <subelement/>
     </element1>
 }}}

 rapidxml interprets this as element1 with three child elements plus a text
 element made up of a bunch of newlines and whitespace.

 == Fix ==
 Fixing this is easy, because the logic for handling this case is already
 present in trim_whitespace. The solution is to remove all whitespace in an
 XML element if that element consists ONLY of whitespace. The diff (which
 applies against the 1.59 rapidxml.hpp) is attached.
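
 The rule described above can be sketched in standard C++. This is only an
 illustration of the whitespace-only check, not the attached diff and not
 rapidxml code; the names is_whitespace_only and drop_whitespace_only are
 hypothetical.

```cpp
#include <string>

// Hypothetical helper (not part of rapidxml or Boost): returns true when
// the text contains nothing but XML whitespace (space, tab, CR, LF).
// An empty string also counts as whitespace-only.
bool is_whitespace_only(const std::string& text) {
    return text.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Hypothetical helper: discard the text when it is whitespace-only and
// return it unchanged otherwise, so whitespace around real content survives.
std::string drop_whitespace_only(const std::string& text) {
    return is_whitespace_only(text) ? std::string() : text;
}
```

 Under this rule the indentation between child elements is dropped, while
 text such as "  keep me  " is kept with its surrounding whitespace intact.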

 == Reproduction ==
 I attached a test case, which includes a clean 1.59 boost property tree,
 a fixed 1.59 boost property tree and a big test XML file.
 To reproduce: run ./build.sh, which will generate a fixed and a broken
 executable. The executable will read in.xml and write it as stage1.xml.
 Then it will read stage1.xml and write it as stage2.xml, and then do the
 same again for stage3.xml.

 Open in.xml and compare the first lines against stage3.xml and you will
 see that the round trip added many newlines, which actually got encoded
 in the text as &#10;.
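
 Instead of comparing the files by eye, the encoded newlines can be counted.
 The following is a small sketch in standard C++; count_occurrences is a
 hypothetical helper and is not part of the attached test case.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts non-overlapping occurrences of `needle`
// in `text`. An empty needle is defined to occur zero times.
std::size_t count_occurrences(const std::string& text,
                              const std::string& needle) {
    if (needle.empty()) return 0;
    std::size_t count = 0;
    for (std::size_t pos = text.find(needle); pos != std::string::npos;
         pos = text.find(needle, pos + needle.size())) {
        ++count;
    }
    return count;
}
```

 Reading each stage file into a string and calling
 count_occurrences(contents, "&#10;") gives one number per stage. Going by
 the description above, that number should grow from stage to stage with
 the broken executable.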

 == Remarks ==
 This is a regression and did not happen before. The problem has existed
 for at least four years:
 http://stackoverflow.com/questions/6572550/boostproperty-tree-xml-pretty-printing

 == Change of existing behaviour - second bug ==
 The stackoverflow answer also shows a difference between the current and
 the earlier behaviour: newlines are now encoded. This is a separate bug,
 but in my opinion there is no reason to encode \n or \t, as these are not
 reserved XML characters. The encodings should be removed from
 detail/xml_parser_utils.hpp:73-74, as they change existing behaviour
 without a reason.
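
 As an illustration of the behaviour asked for here, the following sketch
 escapes only the five characters that have predefined XML entities and
 passes \n and \t through unchanged. It is standard C++ only, it is not the
 code in detail/xml_parser_utils.hpp, and escape_xml_text is a hypothetical
 name.

```cpp
#include <string>

// Hypothetical escaping routine for element text: encodes only the five
// characters with predefined XML entities and leaves everything else,
// including '\n' and '\t', untouched.
std::string escape_xml_text(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;  // newline and tab pass through
        }
    }
    return out;
}
```

 This applies to element text. In attribute values an XML parser normalizes
 literal tabs and newlines to spaces, so encoding them there can still
 matter.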

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/11600>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.