
Subject: Re: [boost] Standard c++ XML parser API (Boost.XML)
From: Stefan Seefeld (stefan_at_[hidden])
Date: 2014-03-20 07:55:36


On 03/20/2014 04:34 AM, Bjorn Reese wrote:
> On 03/18/2014 04:46 PM, Stefan Seefeld wrote:
>
>> I don't see any reason why such an XML API wouldn't be usable by other
>> Boost libraries.
>
> It should be part of the GSoC project to verify this for the most
> common use cases (XML serialization is the most obvious one.)

I don't entirely understand your point. The goal is to define and
implement an XML API that complies with all related standards. As
long as the existing Boost components (e.g. Boost.Serialization) work
with standard XML tools, we should be compatible.
I don't think, however, that we should be constrained to be
API-compatible with existing tools, as that would make the whole
exercise of defining a new API pointless. On the other hand, making
minor adjustments to those libraries so they work with Boost.XML would
be fine. I just don't think we should make this part of the proposal,
as it isn't even clear which existing Boost components would be
affected, whether they are actively maintained / developed, etc.

>
>>> What is the purpose of the S template argument?
>>
>> To keep the concern for Unicode (or any other string type) orthogonal
>> to the XML library, i.e. to allow Boost.XML to interact with different
>> Unicode implementations. In the existing demos I restrict content to
>> ASCII, so I can get away with using std::string. This is a good
>> example of the "modularity" design goal I mentioned above: don't
>> force anything on users that they don't actually need.
>
> I agree with the goal, but I am not sure that the S type solves the
> problem. I must admit that I have difficulty understanding exactly
> how you envision it working for other encodings, because std::string
> is orthogonal to encoding (the locale is usually attached to the I/O
> stream).

You are right, encoding and string type are (mostly) orthogonal. I have
never said anything else. :-)
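To make the role of S concrete, here is a minimal sketch of how a string-type parameter and a conversion trait might fit together. All names (element, convert, backend_char) are hypothetical and not the actual Boost.XML proof-of-concept API; backend_char stands in for libxml2's xmlChar, which is an unsigned char holding UTF-8. What it illustrates is that the encoding decision lives in the convert specialization, not in S itself.

```cpp
#include <string>

// Hypothetical stand-in for libxml2's xmlChar (a typedef for unsigned char);
// the backend is assumed to hand out NUL-terminated UTF-8.
using backend_char = unsigned char;

// Primary template: one specialization per application string type S.
template <typename S>
struct convert;

// std::string: the application is assumed to use UTF-8 (or plain ASCII),
// so no transcoding is needed, only a copy.
template <>
struct convert<std::string> {
  static std::string in(const backend_char* s) {
    return s ? std::string(reinterpret_cast<const char*>(s)) : std::string();
  }
};

// std::u32string: decode UTF-8 into code points.
// Assumes well-formed UTF-8; a real implementation must validate.
template <>
struct convert<std::u32string> {
  static std::u32string in(const backend_char* s) {
    std::u32string result;
    if (!s) return result;
    while (*s) {
      char32_t cp;
      int extra;
      if (*s < 0x80)           { cp = *s;        extra = 0; }  // 1 byte
      else if (*s >> 5 == 0x6) { cp = *s & 0x1F; extra = 1; }  // 2 bytes
      else if (*s >> 4 == 0xE) { cp = *s & 0x0F; extra = 2; }  // 3 bytes
      else                     { cp = *s & 0x07; extra = 3; }  // 4 bytes
      ++s;
      // Fold in the continuation bytes, stopping early at the terminator.
      for (int i = 0; i < extra && *s; ++i, ++s)
        cp = (cp << 6) | (*s & 0x3F);
      result.push_back(cp);
    }
    return result;
  }
};

// Hypothetical element type: storage stays UTF-8 on the backend side,
// and the string type S only appears at the API boundary.
template <typename S>
class element {
 public:
  explicit element(const char* utf8_name) : name_(utf8_name) {}
  S name() const {
    return convert<S>::in(reinterpret_cast<const backend_char*>(name_.c_str()));
  }
 private:
  std::string name_;  // stands in for backend-owned UTF-8
};
```

The std::string specialization simply assumes the application also uses UTF-8, which is exactly the kind of assumption that would have to be documented.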

>
> What encoding is used for std::string? ASCII, UTF-8, or "whatever the
> XML library gives me"? This should be documented as part of the API
> regardless of the answer.

Yes.

>
> Should I define a new string type if I want to use Latin-1 or another
> encoding in my application? What if the rest of my application uses
> std::string for Latin-1 encodings? (I am wondering how that will work
> with the current convert trait specialization for std::string.)
>
> How does the convert trait know the XML document encoding so that it
> is able to convert between this and the application encoding?
>
> I suggest that you adopt the libxml2 design decision to always use
> UTF-8 for std::string (and UTF-16 for std::wstring if needed.) See
> the design rationale here:
>
> http://xmlsoft.org/encoding.html
>
> Any backend that does not provide UTF-8 will have to be wrapped.
>
> With such a design decision, the S template parameter becomes
> superfluous (or should be changed to CharT if you wish to support
> both std::string and std::wstring.)
>
> Conversion between UTF-8 and application encodings would have to
> be done explicitly in the application.
>
> At any rate, encoding should be addressed in the GSoC project.

I agree, and this is in fact part of the proposal. To be specific, one
of the first steps is to add tests that instantiate the XML classes with
existing Unicode string classes (such as Glib::ustring or Qt's QString),
and demonstrate how to use them.
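For comparison, under the design Bjorn suggests (std::string always holds UTF-8, and conversion is done explicitly in the application), an application that keeps Latin-1 in std::string would convert at its own boundary. A minimal sketch, assuming well-formed UTF-8 input; latin1_to_utf8 and utf8_to_latin1 are hypothetical helper names. Latin-1 (ISO-8859-1) maps each byte directly to a code point in U+0000 to U+00FF, so no lookup table is needed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Each Latin-1 byte becomes one UTF-8 byte (< 0x80) or two (>= 0x80).
std::string latin1_to_utf8(const std::string& in) {
  std::string out;
  out.reserve(in.size() * 2);
  for (unsigned char c : in) {
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation
    }
  }
  return out;
}

// Inverse direction. Only lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF;
// anything else outside ASCII cannot be represented in Latin-1.
std::string utf8_to_latin1(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));
    } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
      unsigned char next = static_cast<unsigned char>(in[++i]);
      out.push_back(static_cast<char>(((c & 0x1F) << 6) | (next & 0x3F)));
    } else {
      throw std::range_error("character not representable in Latin-1");
    }
  }
  return out;
}
```

Nothing in the type system distinguishes a Latin-1 std::string from a UTF-8 one, which is why the encoding of std::string in the XML API has to be documented either way.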

>
>>> What is the purpose of the convert trait?
>>
>> To allow conversion between the backend's own string representation and
>> the string type that is used with Boost.XML.
>
> Ok. You should, however, make sure that the strings are converted
> correctly:
>
> http://xmlsoft.org/html/libxml-xmlstring.html
>
> For instance, convert::in() does not take libxml2 custom allocators into
> account:
>
> http://xmlsoft.org/html/libxml-xmlmemory.html

Good point. As I said, the existing Boost.XML was meant to be a
proof-of-concept.
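On the allocator point: libxml2 lets an application install its own allocation functions with xmlMemSetup(), and strings returned by calls such as xmlNodeGetContent() are documented to be released by the caller with xmlFree(). A convert::in() that takes ownership of such a string therefore has to release it through the backend's deallocator rather than free() or delete. The sketch below shows one way to do that with a std::unique_ptr deleter. To stay self-contained it does not link against libxml2: backend_char, backend_free, backend_string and this convert are hypothetical stand-ins, with backend_free playing the role of xmlFree.

```cpp
#include <cstdlib>
#include <memory>
#include <string>

using backend_char = unsigned char;  // stand-in for libxml2's xmlChar

// Hypothetical replaceable deallocation hook, playing the role of xmlFree.
// An application (or a test) may redirect it to a custom allocator.
using free_func = void (*)(void*);
void default_free(void* p) { std::free(p); }
free_func backend_free = default_free;

// Deleter that always releases memory through the backend's hook.
struct backend_deleter {
  void operator()(backend_char* p) const {
    if (p) backend_free(p);
  }
};

// Owning handle for a NUL-terminated UTF-8 buffer allocated by the backend.
using backend_string = std::unique_ptr<backend_char, backend_deleter>;

template <typename S>
struct convert;

template <>
struct convert<std::string> {
  // Copies the backend buffer into a std::string; the buffer itself is
  // released by backend_deleter when the handle is destroyed, never by
  // plain free() or delete.
  static std::string in(backend_string s) {
    if (!s) return std::string();
    return std::string(reinterpret_cast<const char*>(s.get()));
  }
};
```

Taking the handle by value makes the ownership transfer visible at the call site, and the copy into std::string is complete before the buffer is released.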

Thanks for your feedback,

        Stefan

-- 
      ...I still have a suitcase in Berlin...

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk