Subject: [boost] [serialization] Add UTF-8 BOM support to xml_warchive
From: Tijmen van Voorthuijsen (T.van.Voorthuijsen_at_[hidden])
Date: 2011-04-15 08:21:04


Recently I ran into the problem that the boost::serialization library cannot read XML files that begin with the three-byte UTF-8 BOM (Byte Order Mark). The library itself writes XML files without a BOM, but when such a file is saved from an external Windows program, for example XML Notepad, the BOM is added automatically. After that, the boost::serialization library can no longer read the file.

According to Wikipedia the UTF-8 BOM is optional, so creating XML files without the BOM bytes is all right.
However, reading should be extended to handle both kinds of file, with and without the BOM.
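To make the "with or without" distinction concrete, here is a minimal sketch of the check on raw bytes. The name has_utf8_bom is hypothetical and not part of Boost; it only tests whether a buffer starts with the bytes 0xEF 0xBB 0xBF.

```cpp
#include <string>

// Hypothetical helper (not a Boost API): true if the buffer starts with
// the three UTF-8 BOM bytes 0xEF 0xBB 0xBF.
inline bool has_utf8_bom(const std::string& bytes)
{
    // Cast to unsigned char so the comparison also works where char is signed.
    return bytes.size() >= 3
        && static_cast<unsigned char>(bytes[0]) == 0xEF
        && static_cast<unsigned char>(bytes[1]) == 0xBB
        && static_cast<unsigned char>(bytes[2]) == 0xBF;
}
```

A reader that accepts both kinds of file would skip three bytes when this returns true and start at offset 0 otherwise.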

I propose extending xml_warchive and text_warchive so that reading tolerates a leading BOM.

    #include <crtdbg.h>   // _ASSERT (MSVC debug CRT)
    #include <fstream>

    const wchar_t g_cchUtf8Bom1 = 0xEF;
    const wchar_t g_cchUtf8Bom2 = 0xBB;
    const wchar_t g_cchUtf8Bom3 = 0xBF;

    // Skips a leading UTF-8 BOM, if present, so that reading starts at the XML.
    void CheckAndCorrectUtf8Bom(std::wifstream* pifs)
    {
        wchar_t chUtf8Bom1 = 0;
        wchar_t chUtf8Bom2 = 0;
        wchar_t chUtf8Bom3 = 0;

        // Look at the first character without consuming it.
        chUtf8Bom1 = pifs->peek();
        if (chUtf8Bom1 == g_cchUtf8Bom1)
        {
            // Consume the three BOM bytes.
            *pifs >> chUtf8Bom1;
            _ASSERT(chUtf8Bom1 == g_cchUtf8Bom1);
            *pifs >> chUtf8Bom2;
            _ASSERT(chUtf8Bom2 == g_cchUtf8Bom2);
            *pifs >> chUtf8Bom3;
            _ASSERT(chUtf8Bom3 == g_cchUtf8Bom3);
        }
        else
        {
            // Reset to start of the stream
            pifs->seekg(0, std::ios_base::beg);
        }
    }
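The same idea can be sketched against a generic std::wistream, without the MSVC-only _ASSERT. This is only a sketch under two assumptions: the stream is seekable, and it delivers the BOM as three separate wide characters (no UTF-8 converting facet is imbued, in which case the BOM would arrive as the single character U+FEFF instead). The name skip_utf8_bom is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not a Boost API): consumes a leading 0xEF 0xBB 0xBF
// sequence and returns true; otherwise restores the read position and
// returns false. Assumes a seekable stream.
inline bool skip_utf8_bom(std::wistream& in)
{
    typedef std::wistream::traits_type traits;
    const wchar_t bom[3] = { 0xEF, 0xBB, 0xBF };
    const std::wistream::pos_type start = in.tellg();

    for (int i = 0; i < 3; ++i)
    {
        // get() returns eof() at end of input, which never equals a BOM character.
        const std::wistream::int_type c = in.get();
        if (!traits::eq_int_type(c, traits::to_int_type(bom[i])))
        {
            in.clear();       // drop eofbit/failbit set by a short read
            in.seekg(start);  // rewind so the caller sees the original data
            return false;
        }
    }
    return true;
}
```

Unlike the version above, this one checks all three characters before deciding, so a file that merely starts with 0xEF is left untouched.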

Kind regards,
Tijmen van Voorthuijsen

T. van Voorthuijsen
Senior System Engineer

Noldus Information Technology bv
Nieuwe Kanaal 5
P.O. Box 268
6700 AG Wageningen
The Netherlands

Phone: +31-(0)317-473300
Fax: +31-(0)317-424496
E-mail: T.van.Voorthuijsen_at_[hidden]
