
Subject: Re: [boost] [serialization] Add UTF-8 BOM support to xml_warchive
From: Frank Mori Hess (frank.hess_at_[hidden])
Date: 2011-04-15 17:14:03


On Friday, April 15, 2011, Tijmen van Voorthuijsen wrote:
> Recently I ran into the problem that the boost::serialization library
> could not handle XML files which contain the three UTF-8 BOM (Byte Order
> Mark) bytes.

> I propose to enhance the xml_warchive and text_warchive for reading with
> support of the BOM bytes. Example:
>
> namespace
> {
> const wchar_t g_cchUtf8Bom1 = 0xEF;
> const wchar_t g_cchUtf8Bom2 = 0xBB;
> const wchar_t g_cchUtf8Bom3 = 0xBF;
> }
>
>
> void CheckAndCorrectUtf8Bom(std::wifstream* pifs)
> {
> _ASSERT_POINTER(pifs);
>
> wchar_t chUtf8Bom1 = 0;
> wchar_t chUtf8Bom2 = 0;
> wchar_t chUtf8Bom3 = 0;
>
> chUtf8Bom1 = pifs->peek();
> if (chUtf8Bom1 == g_cchUtf8Bom1)
> {
> *pifs >> chUtf8Bom1;
> _ASSERT(chUtf8Bom1 == g_cchUtf8Bom1);
> *pifs >> chUtf8Bom2;
> _ASSERT(chUtf8Bom2 == g_cchUtf8Bom2);
> *pifs >> chUtf8Bom3;
> _ASSERT(chUtf8Bom3 == g_cchUtf8Bom3);

This logic seems wrong. Just because the first byte is 0xef doesn't mean
it's necessarily a BOM.
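
For instance, 0xEF is also the lead byte of every three-byte UTF-8 sequence in the range U+F000..U+FFFF, so the check has to look at all three bytes before consuming anything. A minimal sketch of what I mean follows. The name skip_utf8_bom is made up for illustration and is not part of Boost.Serialization, and it assumes a seekable narrow (byte) stream rather than the std::wifstream used in the quoted code:

```cpp
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not a Boost.Serialization API).
// Consumes a UTF-8 BOM (EF BB BF) only if all three bytes are present.
// Assumption: 'in' is a seekable byte stream, e.g. a std::ifstream opened
// in binary mode or a std::istringstream.
bool skip_utf8_bom(std::istream& in)
{
    static const char bom[3] = { '\xEF', '\xBB', '\xBF' };

    // Remember where we started so we can rewind if this is not a BOM.
    const std::istream::pos_type start = in.tellg();

    char buf[3] = { 0, 0, 0 };
    in.read(buf, 3);

    if (in.gcount() == 3 && std::memcmp(buf, bom, 3) == 0)
        return true;    // BOM consumed; stream is positioned just after it

    // Not a BOM (or fewer than three bytes available): clear any
    // eof/fail state left by a short read and restore the position.
    in.clear();
    in.seekg(start);
    return false;
}
```

With this, input that merely starts with 0xEF is left untouched, and an archive could be constructed on the stream afterwards either way. A stream that cannot seek would need a different approach, such as buffering the first three bytes.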



