Subject: Re: [boost] Standard c++ XML parser API (Boost.XML)
From: Bjorn Reese (breese_at_[hidden])
Date: 2014-03-20 04:34:18


On 03/18/2014 04:46 PM, Stefan Seefeld wrote:

> I don't see any reason why such an XML API wouldn't be usable by other
> Boost libraries.

It should be part of the GSoC project to verify this for the most
common use cases (XML serialization is the most obvious one.)

>> What is the purpose of the S template argument?
>
> To keep the concern for unicode or any other string type orthogonal from
> the XML library, i.e. to allow Boost.XML to interact with different
> Unicode implementations. In fact, in the existing demos I'm restricting
> content to ASCII, so I can in fact get away with using std::string, so
> this is a good example of the "modularity" design goal I mentioned
> above: Don't force anything on users they don't actually need.

I agree with the goal, but I am not sure that the S type solves the
problem. I must admit that I am having difficulty understanding exactly
how you envision it should work for other encodings, because std::string
is orthogonal to encoding (locale is usually attached to the I/O
stream.)
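To make the point concrete: the character é (U+00E9) is the single byte 0xE9 in Latin-1 but the two bytes 0xC3 0xA9 in UTF-8, and both fit in an ordinary std::string. This is a minimal illustration, not Boost.XML code. The variable names are made up for the example, and it only shows that the static type carries no encoding information a trait could dispatch on:

```cpp
#include <string>
#include <type_traits>

// "é" (U+00E9) encoded two ways; both are ordinary std::string values.
const std::string e_acute_latin1 = "\xE9";      // ISO-8859-1: one byte
const std::string e_acute_utf8   = "\xC3\xA9";  // UTF-8: two bytes, C3 A9

// A trait keyed on the type alone sees the same type for both values,
// so it cannot infer which encoding the bytes are in.
static_assert(std::is_same<decltype(e_acute_latin1), decltype(e_acute_utf8)>::value,
              "same static type, different encodings");
```

Any convert<std::string> specialization therefore has to assume one encoding, and that assumption is what needs documenting.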

What encoding is used for std::string? ASCII, UTF-8, or "whatever the
XML library gives me"? This should be documented as part of the API
regardless of the answer.

Should I define a new string type if I want to use Latin-1 or another
encoding in my application? What if the rest of my application uses
std::string for Latin-1 encodings? (I am wondering how this would work
with the current convert trait specialization for std::string.)

How does the convert trait know the XML document encoding so that it
is able to convert between this and the application encoding?

I suggest that you adopt the libxml2 design decision to always use
UTF-8 for std::string (and UTF-16 for std::wstring if needed.) See
the design rationale here:

   http://xmlsoft.org/encoding.html

Any backend that does not provide UTF-8 will have to be wrapped.
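If std::string is defined to carry UTF-8, a wrapper around such a backend could also enforce that contract at the boundary. Below is a minimal sketch of a UTF-8 well-formedness check that rejects overlong forms, UTF-16 surrogates and code points above U+10FFFF. The function name is hypothetical and is not part of Boost.XML or libxml2:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if s is well-formed UTF-8.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (c < 0x80) { ++i; continue; }                  // ASCII byte
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                  // stray continuation or invalid lead byte
        if (i + len > n) return false;      // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // expected a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong encodings, surrogates and values beyond U+10FFFF.
        static const char32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len]) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;
        i += len;
    }
    return true;
}
```

A wrapper would more typically transcode than merely validate. However, a check like this is cheap enough to run in debug builds on everything that crosses the backend boundary.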

With such a design decision, the S template parameter becomes
superfluous (or should be changed to CharT if you wish to support
both std::string and std::wstring.)
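One way to read the CharT suggestion: the character type selects the string type, and the encoding is a fixed, documented property of that character type rather than something left to an open-ended S. The following is a minimal sketch with hypothetical names (text_encoding, basic_text_node), not Boost.XML's interface. It uses char16_t and std::u16string for the UTF-16 case, because std::wstring holds UTF-16 only where wchar_t is 16 bits (e.g. Windows):

```cpp
#include <string>
#include <utility>

// Hypothetical trait: the encoding is fixed per CharT. There is no primary
// definition, so an unsupported character type fails to compile.
template <class CharT> struct text_encoding;
template <> struct text_encoding<char>     { static constexpr const char* name = "UTF-8"; };
template <> struct text_encoding<char16_t> { static constexpr const char* name = "UTF-16"; };

// Hypothetical node type parameterized on CharT instead of an arbitrary S.
template <class CharT>
class basic_text_node {
public:
    using string_type = std::basic_string<CharT>;
    static constexpr const char* encoding() { return text_encoding<CharT>::name; }

    explicit basic_text_node(string_type content) : content_(std::move(content)) {}
    const string_type& content() const { return content_; }

private:
    string_type content_;
};

using text_node    = basic_text_node<char>;      // UTF-8 in std::string
using u16text_node = basic_text_node<char16_t>;  // UTF-16 in std::u16string
```

The design point is that users never choose an encoding through the template argument; they choose a code unit type, and the documentation states which encoding goes with it.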

Conversion between UTF-8 and application encodings would have to
be done explicitly in the application.
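For Latin-1, the application-side conversion is short, because every Latin-1 byte value equals the Unicode code point it represents. Here is a minimal sketch with hypothetical function names. utf8_to_latin1 assumes its input is already well-formed UTF-8 and throws when a character has no Latin-1 representation:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Latin-1 -> UTF-8: bytes below 0x80 are copied; 0x80..0xFF become two bytes.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// UTF-8 -> Latin-1: only U+0000..U+00FF are representable.
std::string utf8_to_latin1(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
            continue;
        }
        // U+0080..U+00FF are encoded as C2 or C3 followed by one continuation byte.
        if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
            const unsigned char next = static_cast<unsigned char>(in[++i]);
            if ((next & 0xC0) == 0x80) {
                out.push_back(static_cast<char>(((c & 0x1F) << 6) | (next & 0x3F)));
                continue;
            }
        }
        throw std::range_error("input is not Latin-1-representable UTF-8");
    }
    return out;
}
```

An application that stores Latin-1 would call latin1_to_utf8 before handing text to the XML API and utf8_to_latin1 on the way back. With this approach, the XML library never needs to know about the application's encoding.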

At any rate, encoding should be addressed in the GSoC project.

>> What is the purpose of the convert trait?
>
> To allow conversion between the backend's own string representation and
> the string type that is used with Boost.XML.

Ok. You should, however, make sure that the strings are converted
correctly:

   http://xmlsoft.org/html/libxml-xmlstring.html

For instance, convert::in() does not take libxml2 custom allocators into
account:

   http://xmlsoft.org/html/libxml-xmlmemory.html
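The general rule is that a buffer the backend may later free must come from the backend's allocator, and a buffer the backend hands back must be released with the matching deallocator. In libxml2, that means allocating with xmlStrdup/xmlStrndup and releasing with xmlFree, since xmlMemSetup() can replace the defaults. The sketch below is self-contained. It uses hypothetical stand-ins (backend_malloc, backend_free, string_bridge) in place of the libxml2 globals, so it shows the pattern rather than the actual convert trait:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-ins for libxml2's replaceable allocator hooks
// (xmlMalloc / xmlFree, installed via xmlMemSetup in the real library).
using backend_char = unsigned char;  // like libxml2's xmlChar
static void* default_malloc(std::size_t n) { return std::malloc(n); }
static void  default_free(void* p) { std::free(p); }
void* (*backend_malloc)(std::size_t) = default_malloc;
void  (*backend_free)(void*)         = default_free;

// Hypothetical helpers between UTF-8 std::string and backend strings.
struct string_bridge {
    // Application -> backend: allocate with the backend's allocator so the
    // backend can take ownership and later release it with backend_free.
    static backend_char* to_backend(const std::string& s) {
        auto* p = static_cast<backend_char*>(backend_malloc(s.size() + 1));
        if (!p) return nullptr;
        std::memcpy(p, s.data(), s.size());
        p[s.size()] = '\0';
        return p;
    }
    // Backend -> application: copy the NUL-terminated string, then release
    // the backend-owned buffer with the matching deallocator.
    static std::string from_backend_and_free(backend_char* p) {
        if (!p) return std::string();
        std::string s(reinterpret_cast<const char*>(p));
        backend_free(p);
        return s;
    }
};
```

Pairing allocation and release through the same hooks keeps the code correct even when an application installs a custom allocator. Mixing new/delete or malloc/free with such an allocator is exactly the mismatch to avoid.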

