Subject: Re: [Boost-users] [spirit] qi xml parser
From: Michael Powell (mwpowellhtx_at_[hidden])
Date: 2014-06-30 22:49:37


On Mon, Jun 30, 2014 at 5:07 PM, Michael Powell <mwpowellhtx_at_[hidden]> wrote:
> On Mon, Jun 30, 2014 at 4:44 PM, Michael Powell <mwpowellhtx_at_[hidden]> wrote:
>> On Mon, Jun 30, 2014 at 3:14 PM, Michael Powell <mwpowellhtx_at_[hidden]> wrote:
>>> Hello,
>>>
>>> I am building a general-purpose XML parser that handles attributes,
>>> an arbitrary number of elements, and so on.
>>>
>>> So far so good; parsing names and so forth makes sense. However, how
>>> do you handle element content? It can be either a string or zero or
>>> more other elements (basically matching the same rule as the
>>> enclosing element).
>>>
>>> It would seem you need a terminating case, the empty-element tag, so
>>> that parsing populates the parent (initial) element and its children
>>> (of the same element kind).
>>>
>>> I'll be adapting structs to capture the results. I am also using a
>>> couple of helpful references, for instance:
>>>
>>> http://www.w3.org/TR/xml11/
>>> http://stackoverflow.com/questions/9473843/boost-spirit-how-to-extend-xml-parsing
>>
>> After reading the XML specification and some Boost tickets from
>> several years ago, I don't see why the following couldn't represent
>> content:
>>
>> content %= *(char_ - char_("<&")) | *(comment | child_element);
>>
>> Here comment is defined as expected, and child_element is where the
>> recursion happens: it refers back to the element grammar in which
>> content is defined. Basically it is a member variable of the same
>> type as the containing struct (the element grammar).
>
> Indeed, I cooked up a simple(ish) example, and I get this error:
>
> Error 3 error C2460:
> 'xml::xml_element_grammar<std::_String_const_iterator<std::_String_val<std::_Simple_types<char>>>,boost::spirit::ascii::space_type>::child_element'
> : uses 'xml::xml_element_grammar<std::_String_const_iterator<std::_String_val<std::_Simple_types<char>>>,boost::spirit::ascii::space_type>',
> which is being defined i:\source\kingdom
> software\cppxml\xml\xiparser.h 187 1 xml
>
> Nothing fancy, fairly plain-old-Xml there:
>
> using boost::spirit::qi::phrase_parse;
> using boost::spirit::ascii::space;
>
> std::string txt = "<test><one /><two>2</two><three att=\"3\"/></test>";
>
> xml::xml_element_grammar<> g;
> xml::xelement element;
>
> bool result = phrase_parse(txt.cbegin(), txt.cend(), g, space, element);
>
> How do you model a parent that also needs to look like a child,
> depending on where the rule is used in the grammar? In other words,
> the rule being defined is a "parent", but once it is done parsing, its
> result may well be a child of an enclosing parent.

I made it a little way past this part. I focused on the simpler parts
first and got those parsing fine.

typedef boost::make_recursive_variant<
            std::string, std::vector<boost::recursive_variant_>
>::type tag_soup;

I'm not positive, but I think the best way to represent XML content,
which is either a vector of xelement or a std::string, is a recursive
variant built with boost::recursive_variant_. There's still the
parent/child relationship to resolve, though: an xelement acting as a
child has a parent xelement, and that parent in turn has children.

>>> I'm also not quite sure how to capture the adapted struct members at
>>> the appropriate points in the rules.
>>>
>>> My domain model will look something like this, keeping it as simple as possible:
>>>
>>> struct xattribute {
>>>     std::string name;
>>>     std::string value;
>>> };
>>>
>>> typedef std::vector<xattribute> xattribute_vector;
>>>
>>> struct xelement;
>>>
>>> typedef std::vector<xelement> xelement_vector;
>>>
>>> struct xelement {
>>>     std::string name;
>>>     std::string content;
>>>     xattribute_vector attributes;
>>>     xelement_vector children;
>>> };
>>>
>>> Thanks...
>>>
>>> Best regards,
>>>
>>> Michael Powell

