Subject: Re: [boost] [serialization] Add UTF-8 BOM support to xml_warchive
From: Lars Viklund (zao_at_[hidden])
Date: 2011-04-15 19:07:03


On Fri, Apr 15, 2011 at 05:14:03PM -0400, Frank Mori Hess wrote:
> On Friday, April 15, 2011, Tijmen van Voorthuijsen wrote:
> > Recently I ran into the problem that the boost::serialization library
> > could not handle XML files which contain the three UTF-8 BOM (Byte Order
> > Mark) bytes.
> >
> > I propose to enhance the xml_warchive and text_warchive for reading with
> > support of the BOM bytes. Example:
> This logic seems wrong. Just because the first byte is 0xef doesn't mean
> it's necessarily a BOM.

If it's supposed to be well-formed XML, there's nothing in the mandatory
'prolog' production that can have the value 0xef as the first octet.

An XML document encoded in UTF-16 MUST begin with a BOM, and one encoded
in UTF-8 MAY begin with a BOM. Unless the encoding is indicated externally
(MIME, other framing), an XML processor MUST be able to handle the presence
of BOMs, and MUST be able to process the UTF-8 and UTF-16 families of
encodings.

Of course, I may have misread the specification (XML 1.0 5e), feel free
to show a well-formed counter-example.
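
To make the check concrete, here is a minimal sketch of testing for the
full three-octet sequence EF BB BF rather than only the first octet. The
helper name skip_utf8_bom is hypothetical and is not part of
Boost.Serialization. It assumes a narrow, seekable std::istream carrying
the raw bytes; a wide stream with a converting locale would need a
different check.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Boost.Serialization.
// Consumes a leading UTF-8 BOM (EF BB BF) if all three octets are present.
// Otherwise the read position is restored and nothing is consumed.
// Assumption: the stream is seekable (file or string stream).
bool skip_utf8_bom(std::istream& is)
{
    const std::istream::pos_type start = is.tellg();

    char buf[3] = { 0, 0, 0 };
    is.read(buf, 3);

    const bool has_bom =
        is.gcount() == 3 &&
        static_cast<unsigned char>(buf[0]) == 0xEF &&
        static_cast<unsigned char>(buf[1]) == 0xBB &&
        static_cast<unsigned char>(buf[2]) == 0xBF;

    if (!has_bom) {
        is.clear();      // a short read sets eofbit/failbit
        is.seekg(start); // rewind so the caller sees the original bytes
    }
    return has_bom;
}
```

Calling this on the input stream before handing it to the archive would
leave the stream positioned at the start of the prolog in either case.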

-- 
Lars Viklund | zao_at_[hidden]
