Boost Users :

Subject: Re: [Boost-users] [Boost Serialization] Cannot handle UTF-8 BOM bytes in XML file
From: Robert Ramey (ramey_at_[hidden])
Date: 2011-04-13 01:44:46


The serialization library uses a codecvt facet to generate UTF-8 from wchar_t.

I don't know about the BOM bytes. Sounds like this would require
an enhancement to the xml_warchive and/or text_warchive implementation.
Feel free to submit a suggested patch to the Trac system.

Robert Ramey

Tijmen van Voorthuijsen wrote:
> Hi,
>
> I am using boost::archive::xml_woarchive to create XML files under
> Windows, Visual Studio 2008, in wide character mode. The
> boost::archive::xml_woarchive does not write the three UTF-8 BOM
> bytes to the file. From
> http://en.wikipedia.org/wiki/Byte_order_mark I understand that this
> is all right, since the BOM is optional and not even recommended.
>
> Problems start when I edit the file in, for example, XML
> Notepad, which adds the three BOM bytes when saving. Under Windows
> this seems to be normal behavior. Parsing the saved XML file with
> boost::archive::xml_wiarchive then throws an exception.
>
> My question/recommendation:
>
> - Why can't the boost::archive::xml_serialization library
> cope with the UTF-8 BOM bytes?
> - I would recommend that the library handle XML UTF-8
> files both with and without the three BOM bytes. Both are in fact
> valid UTF-8 XML files.
>
> I now check for the BOM bytes myself before I parse the ifstream in
> boost::xml_serialization and that works fine.
>
> Many thanks for your answer.
> Tijmen van Voorthuijsen
>
> _______________________________________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/boost-users
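
For anyone who hits the same problem: the workaround described in the quoted message (checking for the BOM before handing the stream to the archive) could look roughly like the sketch below. This is not part of Boost. `skip_utf8_bom` and `read_without_bom` are hypothetical names, and the sketch assumes a seekable, byte-oriented std::istream. A wide stream such as std::wifstream would need a different check, which is not shown here.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper (not a Boost API): consume a leading UTF-8 BOM
// (bytes EF BB BF) if present; otherwise rewind so no input is lost.
// Assumes the stream supports tellg()/seekg().
// Returns true when a BOM was found and skipped.
inline bool skip_utf8_bom(std::istream& in) {
    const std::istream::pos_type start = in.tellg();
    char buf[3] = {0, 0, 0};
    in.read(buf, 3);
    const bool has_bom =
        in.gcount() == 3 &&
        static_cast<unsigned char>(buf[0]) == 0xEF &&
        static_cast<unsigned char>(buf[1]) == 0xBB &&
        static_cast<unsigned char>(buf[2]) == 0xBF;
    if (!has_bom) {
        in.clear();       // a short read sets eofbit/failbit
        in.seekg(start);  // put the read position back where it was
    }
    return has_bom;
}

// Hypothetical usage helper: return the contents of `bytes` with any
// leading UTF-8 BOM removed.
inline std::string read_without_bom(const std::string& bytes) {
    std::istringstream in(bytes);
    skip_utf8_bom(in);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With a file, the idea would be to open it in binary mode, call `skip_utf8_bom` on the stream, and only then construct the input archive on it. Whether that fits a particular setup depends on how the archive's stream and locale are configured, so treat it as a starting point rather than a fix in the library.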


