From: Jonathan Turkanis (technews_at_[hidden])
Date: 2005-07-20 12:50:29


Eelis van der Weegen wrote:
> Jonathan Wakely wrote:
>> It's not a valid entity, using it means your XML is
>> not well-formed. It doesn't matter whether you say &#0; or &#x0;
>> (the decimal and hexadecimal forms are exactly equivalent - but 0 is
>> still not a valid numerical entity.)
>
> Yes, in XML 1.1, the null character is a special case by itself;
> ordinary nonprintable characters can be embedded as numerical
> character references, but the null character cannot (see the "Legal
> Character" well-formedness constraint for production 66).

> 4. Encode strings containing null characters using binary encodings
> such as those defined by XML Schema's data types:
>
> http://www.w3.org/TR/xmlschema-2/#base64Binary
> http://www.w3.org/TR/xmlschema-2/#hexBinary
>
> This would require some additional flag that indicates whether a
> string is encoded textually or binary (unless of course all strings
> are encoded this way, but then we'd lose the human-readability of
> strings in XML archives).

I like this.
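
For illustration, here is a minimal sketch of the hexBinary half of option 4,
assuming the archive stores the encoded text plus some flag saying the string
is binary-encoded. The names to_hex_binary, from_hex_binary and hex_value are
hypothetical helpers, not part of Boost.Serialization.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode each byte as two uppercase hex digits,
// which is the lexical form used by XML Schema's hexBinary type.
std::string to_hex_binary(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char c : bytes) {
        out.push_back(digits[c >> 4]);    // high nibble
        out.push_back(digits[c & 0x0F]);  // low nibble
    }
    return out;
}

// Hypothetical helper: value of a single hex digit, either case.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical helper: decode hexBinary text back into raw bytes,
// including any embedded null characters.
std::string from_hex_binary(const std::string& hex) {
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("hexBinary text must have even length");
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int byte = hex_value(hex[i]) * 16 + hex_value(hex[i + 1]);
        out.push_back(static_cast<char>(byte));
    }
    return out;
}
```

The encoded text contains only hex digits, so it is always legal XML character
data, at the cost of doubling the size and losing readability for that string.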

> 5. Disallow serialization of std::(w)strings that contain null
> characters to XML archives.
>
> This is my personal favorite. XML's normal character data is simply
> inherently textual and not suited to storing binary data containing
> null characters. We shouldn't try to hack around this. Doing so would
> only make things complicated in further external processing. If users
> insist on storing binary fragments in their XML archives they can
> always resort to vector<char> (by the way, the binary encodings I
> mentioned above might be very nice for storing things like
> vector<char> efficiently).

The problem with this is that it's hard to remember the restriction. One of the
main advantages of basic_string over C-style strings is that it can store
arbitrary character sequences, including embedded nulls, so it's natural for
users to take this feature for granted. Errors resulting from accidental
embedded nulls can be very hard to track down.
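
To make the point concrete, here is a small sketch of how an embedded null
hides from C-style code while std::string keeps it. Both has_embedded_null and
c_string_length are hypothetical helpers written for this example; the first
is roughly the check an archive would need if it rejected such strings.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: true if the string holds a null character
// anywhere within its size().
bool has_embedded_null(const std::string& s) {
    return s.find('\0') != std::string::npos;
}

// Hypothetical helper: the length that C string functions see.
// strlen stops at the first null, so anything after it is ignored.
std::size_t c_string_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For a string built as std::string("ab\0cd", 5), size() reports 5 while
c_string_length reports 2, so the two views of the same object disagree
without any error being raised.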

> Eelis

Jonathan


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk