From: Lucas Galfaso (lgalfaso_at_[hidden])
Date: 2005-07-20 19:09:38


Just a small correction: the representation would be "&#0;" (without the
quotes but _with_ the semicolon)

\LG

"Robert Ramey" <ramey_at_[hidden]> wrote in message
news:dbm8l9$4ud$1_at_sea.gmane.org...
> This is a great email. It illustrates why I tend to drag my feet on
> things
> like this. This is not going to be addressed right away so feel free to
> investigate and discuss it.
>
> FWIW I personally would like option 1 - use &#0 anyway - basically
> because it would preserve the idea that an xml_archive can do anything
> any other archive can do and doesn't ripple XML-ness back into the
> library or user programs. But even this is not so trivial. It's not
> clear to me whether it should apply to all non-printable characters.
> This then raises the issue of what is non-printable in a UTF context.
> Then it makes me wonder what the "encoding" attribute in XML is for in
> a UTF file. This is a perfect example of how something that seems
> simple at first glance turns into a really time-consuming issue.
>
> I've never warmed up to XML myself. I learned enough of the details to
> implement xml_?archive but I still never learned to like it. The only
> thing
> I've found it useful for is checking that load/save functions match. The
> xml_archive classes check that the end tag is found in the right place and
> in fact matches the start tag so any difference in the save / load
> functions
> throws an exception. So if I have an obscure problem I test using
> xml_archive.
>
> Other than the above, the only utility I can see for the xml_?archive is
> as
> some sort of bridge to the "outside world". That's why I set aside the
> original string representation - as a sequence of numbers - in favor of
> the
> current one - a text string. The mismatch between what std::string does
> and
> xml text data does is the source of the problem.
>
> I would hope that some smart person can find the sentence, in the
> paragraph, on the page, in the chapter of the relevant document which
> can deal with this in some sort of conforming way.
>
> Good Luck
>
> Robert Ramey
>
> Eelis van der Weegen wrote:
>> Jonathan Wakely wrote:
>>> It's not a valid entity, using it means your XML is
>>> not well-formed. It doesn't matter whether you say &#0; or &#x0;
>>> (the decimal and hexadecimal forms are exactly equivalent - but 0 is
>>> still not a valid numerical entity.)
>>
>> Yes, in XML 1.1, the null character is a special case by itself;
>> ordinary nonprintable characters can be embedded as numerical
>> character references, but the null character cannot (see the "Legal
>> Character" well-formedness constraint for production 66).
>>
>>> As long as you can read the same data back and restore the same
>>> sequence of bytes it doesn't really matter.
>>
>> I strongly agree with Robert that further processing of generated XML
>> archives by external tools is one of the main strengths of XML
>> archives and should be the main concern when evaluating our options
>> when it comes to dealing with this problem. That said, I see the
>> following options:
>>
>> 1. Use &#0; anyway.
>>
>> I've googled around a bit and found that &#0;'s being generated by
>> one tool in a toolchain and rejected by the next is a reasonably
>> common problem, so I don't really like this option.
>>
>> 2. Encode it using some escape sequence: <foo>bar\0bas</foo>
>>
>> This would introduce an extra grammar layer that software used for
>> further processing must parse.
>>
>> 3. Encode it using a dedicated element:
>> <foo>bar<serialization:null/>bas</foo>
>>
>> This seems like a reasonable way to encode null characters, but
>> wouldn't work in attribute values.
>>
>> 4. Encode strings containing null characters using binary encodings
>> such as those defined by XML Schema's data types:
>>
>> http://www.w3.org/TR/xmlschema-2/#base64Binary
>> http://www.w3.org/TR/xmlschema-2/#hexBinary
>>
>> This would require some additional flag that indicates whether a
>> string is encoded textually or binary (unless of course all strings
>> are encoded this way, but then we'd lose the human-readability of
>> strings in XML archives).
>>
>> 5. Disallow serialization of std::(w)strings that contain null
>> characters to XML archives.
>>
>> This is my personal favorite. XML's normal character data is simply
>> inherently textual and not suited to storing binary data containing
>> null characters. We shouldn't try to hack around this. Doing so would
>> only make things complicated in further external processing. If users
>> insist on storing binary fragments in their XML archives they can
>> always resort to vector<char> (by the way, the binary encodings I
>> mentioned above might be very nice for storing things like
>> vector<char> efficiently).
>>
>> Eelis
>>
>> _______________________________________________
>> Unsubscribe & other changes:
>> http://lists.boost.org/mailman/listinfo.cgi/boost
>
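
To make options 4 and 5 from the quoted list a bit more concrete, here is
a rough sketch. None of these helpers exist in Boost.Serialization; the
names is_xml10_char, check_no_embedded_null and to_hex_binary are made up
for illustration, and the character ranges are those of the XML 1.0 "Char"
production (XML 1.1 admits more control characters, but still not null).
The hex encoding follows the upper-case convention of XML Schema's
hexBinary, assuming the whole string is encoded as bytes.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: true if the code point matches the XML 1.0
// "Char" production, i.e. it may appear in a well-formed document.
bool is_xml10_char(unsigned long cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Option 5 (hypothetical): refuse to write a string that contains an
// embedded null character instead of emitting "&#0;".
void check_no_embedded_null(const std::string& s) {
    if (s.find('\0') != std::string::npos)
        throw std::invalid_argument("string contains a null character");
}

// Option 4 (hypothetical): encode every byte as two upper-case hex
// digits, in the style of xs:hexBinary. Readability is lost, but any
// byte value, including null, survives the round trip.
std::string to_hex_binary(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(s.size() * 2);
    for (unsigned char c : s) {
        out.push_back(digits[c >> 4]);
        out.push_back(digits[c & 0xF]);
    }
    return out;
}
```

With these, std::string("a\0b", 3) would be rejected by
check_no_embedded_null and would encode as 610062 under to_hex_binary.
Whether an archive should reject or switch to the binary encoding is the
design question the thread leaves open.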

