From: Sebastian Redl (sebastian.redl_at_[hidden])
Date: 2006-04-26 13:48:39


Jose wrote:

>I am currently testing only the read_xml parsing, and although it is only
>meant for very simple xml files I find its xml support very sketchy.
><snip>
>1. parsing the artima.com spotlight feed
>
>Result: FAILED
>
>The path is rdf:RDF.item.title and I get invalid character entity.
>I think the parser should support the colon within the tag name, given
>that in many cases the config files might be generated by real xml programs
>which use namespaces and it should be able to read them even if it does not
>support save.
>
>
The problem here is not the colon. The problem is the &quot; entity,
which is a required part of XML but is not supported by
boost::property_tree::xml_parser::decode_char_entities() in
detail/xml_parser_utils.hpp, lines 62-87. Also not supported is the
&apos; entity, which is also required.
Definitely a bug in PropTree.
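
For illustration, here is a minimal sketch of a decoder that handles all five
entities XML predefines (&lt; &gt; &amp; &quot; &apos;). This is not the
PropertyTree code and not the attached patch; the function name
decode_predefined_entities is hypothetical, numeric character references
(&#...;) are deliberately left out, and reporting the byte offset in the
error is just one possible way to make the message more useful.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, NOT the PropertyTree implementation.
// Replaces the five predefined XML entities with their characters.
// Throws std::runtime_error, including the byte offset, on anything else.
std::string decode_predefined_entities(const std::string& in)
{
    struct Entity { const char* name; char value; };
    static const Entity table[] = {
        {"lt", '<'}, {"gt", '>'}, {"amp", '&'}, {"quot", '"'}, {"apos", '\''}
    };

    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '&') {          // ordinary character, copy through
            out += in[i];
            continue;
        }
        const std::size_t semi = in.find(';', i + 1);
        if (semi == std::string::npos)
            throw std::runtime_error(
                "unterminated entity at offset " + std::to_string(i));

        const std::string name = in.substr(i + 1, semi - i - 1);
        bool found = false;
        for (const Entity& e : table) {
            if (name == e.name) {
                out += e.value;
                found = true;
                break;
            }
        }
        if (!found)
            throw std::runtime_error(
                "unknown entity '&" + name + ";' at offset " + std::to_string(i));
        i = semi;                    // resume after the ';'
    }
    return out;
}
```

Calling decode_predefined_entities("&quot;Hi&quot; &amp; &apos;bye&apos;")
yields the string "Hi" & 'bye' with the quote characters restored, while an
input such as "&bogus;" throws with the offset of the offending '&'.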

>2. parsing the MSDN visual c++ feed
>
>Result: FAILED
>
>The path is rss.channel.item.title and I get an "xml parse error". Is there
>a possibility of getting more meaningful errors?
>
>
Do you have a URL?

>3. parsing the main CNN feed
>
>Result: FAILED
>
>The path is rss.channel.item.title. This query fails with no error but if
>the path is shortened to rss.channel.item it dumps all the values within
>item, but there is no value at that level (only nested tags)
>
>
You misunderstand your own program. A node has only one value. What your
loop does is retrieve all the children of the node you select with the
path and print their values.
So for the path rss.channel.item.title, you get the title element of the
first item element in the channel. This element has no children, so the
loop is never entered.
In your second test you specify rss.channel.item, so you get the item
element. This element has four children: the title, link, description
and pubDate elements. For each of these children, the value (content) is
printed.
The test succeeded.
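
To make the distinction concrete, here is a small stand-alone sketch of the
model described above: each node carries one value and any number of
children, and a dotted path selects the first matching child at each step.
The Node struct and the find_first and child_values helpers are hypothetical
stand-ins written for this example; they are not the PropertyTree API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a property tree node: one value, many children.
struct Node {
    std::string name;
    std::string value;
    std::vector<Node> children;
};

// Follow a dotted path from root, taking the FIRST child whose name matches
// at each step. Returns nullptr if any step has no matching child.
const Node* find_first(const Node& root, const std::string& path)
{
    const Node* cur = &root;
    std::size_t start = 0;
    while (start <= path.size()) {
        const std::size_t dot = path.find('.', start);
        const std::string key = path.substr(
            start, dot == std::string::npos ? std::string::npos : dot - start);

        const Node* next = nullptr;
        for (const Node& child : cur->children) {
            if (child.name == key) { next = &child; break; }
        }
        if (!next) return nullptr;

        cur = next;
        if (dot == std::string::npos) break;   // last path component consumed
        start = dot + 1;
    }
    return cur;
}

// What the loop in question effectively does: collect the values of the
// selected node's children, not the value of the node itself.
std::vector<std::string> child_values(const Node& n)
{
    std::vector<std::string> out;
    for (const Node& child : n.children) out.push_back(child.value);
    return out;
}
```

With a tree shaped like the feed, child_values on the node found at
rss.channel.item.title returns an empty list, because title has a value but
no children, while child_values on rss.channel.item returns one entry per
child element. To print the title itself, read the value member of the node
that find_first returns.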

>4. Parsing the Google News RSS feed
>
>Result: FAILED
>
>The path is rss.channel.item.title. I get "Invalid character entity error".
>A more meaningful error should be possible with the position in the file
>where the entity occurs.
>
>
Again, the problem seems to be the &quot; entity.

>5. Parsing the Google News Atom feed
>
>Result: FAILED
>
>The path is feed.entry.title. I get "Invalid character entity error".
>
>
Same.

Attached is a patch that fixes the bug.

Sebastian Redl