
Boost Users :

Subject: [Boost-users] [Boost Serialization] Cannot handle UTF-8 BOM bytes in XML file
From: Tijmen van Voorthuijsen (T.van.Voorthuijsen_at_[hidden])
Date: 2011-04-12 03:42:32


Hi,

I am using boost::archive::xml_woarchive to create XML files under Windows with Visual Studio 2008, in wide-character mode. boost::archive::xml_woarchive does not write the three-byte UTF-8 BOM to the file, and from http://en.wikipedia.org/wiki/Byte_order_mark I understand that this is all right, since the BOM is optional and even not recommended.

Problems start when I edit the file in, for example, XML Notepad, which adds the three BOM bytes when saving. Under Windows this seems to be normal behavior. Parsing the XML file with boost::archive::xml_wiarchive then throws an exception.

My question/recommendation:

- Why can't the boost::archive XML serialization library cope with the UTF-8 BOM bytes?

- I would recommend that the library handle UTF-8 XML files both with and without the three BOM bytes. Both are in fact valid UTF-8 XML files.

As a workaround, I now check for the BOM bytes myself before handing the ifstream to the XML input archive, and that works fine.
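
For illustration, a check of this kind could look like the minimal sketch below. It is not the exact code from my project: the helper name skip_utf8_bom is hypothetical and not part of Boost, and it assumes a seekable, byte-oriented std::istream (a file or string stream) that is positioned at its start. It consumes the bytes EF BB BF if they are present and otherwise rewinds, so the parser sees the whole file. How the stream is then handed to boost::archive::xml_wiarchive, which reads from a wide stream, is not shown here.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Boost): consume a leading UTF-8 BOM
// (bytes EF BB BF) if present; otherwise leave the stream where it was.
// Assumes a seekable, byte-oriented stream. Returns true if a BOM was skipped.
bool skip_utf8_bom(std::istream& in)
{
    const std::istream::pos_type start = in.tellg();
    char buf[3] = {0, 0, 0};
    in.read(buf, 3);

    const bool has_bom =
        in.gcount() == 3 &&
        static_cast<unsigned char>(buf[0]) == 0xEF &&
        static_cast<unsigned char>(buf[1]) == 0xBB &&
        static_cast<unsigned char>(buf[2]) == 0xBF;

    if (!has_bom) {
        in.clear();       // a file shorter than 3 bytes sets eofbit/failbit
        in.seekg(start);  // rewind so no content bytes are lost
    }
    return has_bom;
}

// Small self-check using in-memory streams instead of real files.
void skip_utf8_bom_example()
{
    std::istringstream with_bom(std::string("\xEF\xBB\xBF<?xml?>"));
    assert(skip_utf8_bom(with_bom));
    assert(with_bom.get() == '<');   // BOM consumed, content intact

    std::istringstream without_bom(std::string("<?xml?>"));
    assert(!skip_utf8_bom(without_bom));
    assert(without_bom.get() == '<'); // nothing consumed
}
```

With a real file, the same helper would be called once, right after opening the stream in binary mode and before any parsing starts.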

Many thanks for your answer.
Tijmen van Voorthuijsen


