Subject: [Boost-bugs] [Boost C++ Libraries] #11600: boost property_tree exponential newline growth
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2015-08-30 14:04:50
#11600: boost property_tree exponential newline growth
-----------------------------------------+---------------------------
Reporter: Timo Strunk <Timo.Strunk@…> | Owner: cornedbee
Type: Bugs | Status: new
Milestone: To Be Determined | Component: property_tree
Version: Boost 1.59.0 | Severity: Problem
Keywords: |
-----------------------------------------+---------------------------
== Problem ==
Boost property_tree's XML output gains many extra newlines on every round
trip (read, then write) when it is used without the trim_whitespace option.
This makes ptree unusable without trim_whitespace, so ptree is not an
option when whitespace in XML text actually has to be preserved.
== Example ==
in.xml:
{{{
<simona_input>
<simona_configuration>
<coord residue_name="LI1" residue_id="0" chain_id="0" name="C"
id="0">
<X>0.0</X>
<Y>0.0</Y>
}}}
rewritten.xml:
{{{
<simona_input>
<simona_configuration>
<coord residue_name="LI1" residue_id="0" chain_id="0" name="C"
id="0">
}}}
== Cause ==
The problem is due to the interpretation of strings in the following XML
example:
{{{
<element1>
<subelement/>
<subelement/>
<subelement/>
</element1>
}}}
rapidxml interprets this as element1 with three children plus a text
element that holds the newlines and whitespace found between them.
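To make the effect concrete, here is a minimal standard-library sketch of what ends up in that text element. It is not rapidxml code: `collect_text` is a hypothetical helper that simply skips every `<...>` tag, and it assumes all children are empty elements such as `<subelement/>`.

```cpp
#include <string>

// Hypothetical helper, not part of rapidxml or Boost: returns the character
// data found directly inside an element body, skipping every <...> tag.
// Simplification: assumes all children are empty elements like <subelement/>.
std::string collect_text(const std::string& body) {
    std::string text;
    bool in_tag = false;
    for (char c : body) {
        if (c == '<') {
            in_tag = true;          // start of a child tag
        } else if (c == '>') {
            in_tag = false;         // end of a child tag
        } else if (!in_tag) {
            text += c;              // whitespace between children is kept
        }
    }
    return text;
}
```

For the body of element1 above, the collected text consists of nothing but newlines and indentation, which is the data that gets written back out on the next save.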
== Fix ==
Fixing this is easy, because the logic for handling this case is
already present in trim_whitespace. The solution is to discard the
text of an XML element when that text consists ONLY of whitespace, and
to leave all other text untouched. The diff (which applies against the
1.59 rapidxml.hpp) is attached.
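The rule itself can be sketched in a few lines of standard C++. This is not the attached diff, which is not reproduced here; `is_whitespace_only` and `normalize_element_text` are hypothetical names used only for illustration, and "whitespace" is assumed to mean the four XML whitespace characters.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true when s is empty or holds only XML whitespace
// (space, tab, line feed, carriage return).
bool is_whitespace_only(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    });
}

// Drop element text that is pure whitespace; keep everything else verbatim,
// including leading and trailing whitespace around real content.
std::string normalize_element_text(const std::string& text) {
    return is_whitespace_only(text) ? std::string() : text;
}
```

Unlike trim_whitespace, a rule of this shape leaves text such as `"  0.0 \n"` exactly as it was read, so meaningful whitespace survives the round trip.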
== Reproduction ==
I attached a testcase, which includes a clean 1.59 boost property tree,
a fixed 1.59 boost property tree and a big test xml file.
To reproduce: Run ./build.sh, which will generate a fixed and a broken
executable. Each executable reads in.xml and writes it as stage1.xml.
Then it reads stage1.xml and writes it as stage2.xml, and then does the
same again for stage3.xml.
Open in.xml and compare the first lines against stage3.xml and you will
see that the round trips added many newlines, which were written into
the text in encoded form.
== Remarks ==
This is a regression; it did not happen in earlier versions. The problem
has existed for at least four years, see
http://stackoverflow.com/questions/6572550/boostproperty-tree-xml-pretty-printing
== Change of existing behaviour - second bug ==
The stackoverflow answer also shows a difference between the current and
the earlier behaviour, as newlines are now encoded. This is a separate bug,
but in my opinion there is no reason to encode \n or \t, as these are not
reserved XML characters. Their encoding should be removed from
detail/xml_parser_utils.hpp:73-74, as it changes existing behaviour
without a reason.
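As an illustration of the requested behaviour, the sketch below escapes only the five characters that have predefined XML entities and copies \n and \t through unchanged. It is not the code in xml_parser_utils.hpp; `encode_reserved` is a hypothetical name, and the function works on plain `std::string` for brevity.

```cpp
#include <string>

// Hypothetical encoder, not the Boost implementation: escapes only the five
// characters with predefined XML entities and copies '\n' and '\t' as-is.
std::string encode_reserved(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;  // includes '\n' and '\t'
        }
    }
    return out;
}
```

With an encoder like this, text that contains line breaks or tabs is written back literally, so the file stays readable after a round trip.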
-- Ticket URL: <https://svn.boost.org/trac/boost/ticket/11600>