From: Robert Ramey (ramey_at_[hidden])
Date: 2005-07-18 11:48:11


Jonathan Wakely wrote:
> On Mon, Jul 18, 2005 at 08:11:31AM -0700, Robert Ramey wrote:
>
>> Hmm, I've twiddled with the set of allowable characters from time to
>> time on sort of an ad hoc basis. For some reason it never occurred
>> to me to actually try and find the definitive source for this. So I
>> suppose there are a couple
>
> Assuming you're referring to XML, it's here:
> http://www.w3.org/TR/REC-xml
>
>> of pending fine points here:
>>
>> a) the exact rules for what characters are legal in which part of
>> tag names. This might not be all that obvious given that the XML
>> can be coded in wide characters and then converted to UTF-8. Also the narrow
>> character version is coded with the current locale so that's another
>> story.
>
> A character is a character; how it is encoded is irrelevant.

Thanks for the link.

That's not obvious to me - especially when one is using a locale-specific
character set. Maybe XML requires that all characters be ucs-16 (or
32) or some such thing, but as a practical matter lots of people are still
using locale-specific types for strings. So it's not obvious what the
implications are of including a '\0' as part of a text string in an XML
archive. This is one of those things that seemed simple when I started, but
I ran into a lot of small "gotchas" as time went on.
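
For the '\0' question specifically, a minimal sketch may help. It assumes the
text has already been converted to Unicode code points, and it tests each one
against the "Char" production in section 2.2 of the XML 1.0 recommendation
linked above. The names is_xml_char and is_xml_text are hypothetical helpers
for illustration, not part of the serialization library.

```cpp
#include <cassert>
#include <string>

// XML 1.0 "Char" production (section 2.2):
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// Note that U+0000 is not in any of these ranges.
bool is_xml_char(char32_t c) {
    return c == 0x9 || c == 0xA || c == 0xD
        || (c >= 0x20 && c <= 0xD7FF)
        || (c >= 0xE000 && c <= 0xFFFD)
        || (c >= 0x10000 && c <= 0x10FFFF);
}

// Hypothetical helper: true if every code point in s matches Char.
bool is_xml_text(const std::u32string& s) {
    for (char32_t c : s) {
        if (!is_xml_char(c)) return false;
    }
    return true;
}
```

Under this production a string with an embedded '\0' is not legal XML 1.0
text, so an archive would have to escape or reject it in some
application-defined way.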

>
> Re-encoding an XML file doesn't change whether it is well-formed or
> not (assuming you update any encoding specifiers in the document
> itself.)
>
> So if 'a' is allowed in an element name then the representation of 'a'
> in the document's encoding is allowed in an element name, whatever
> that encoding is.
>
> jon
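
The point about encodings can be sketched as follows: decode the bytes to
code points first, then apply one rule to the code points. The ranges below
are taken from the NameStartChar production of XML 1.0 Fifth Edition (earlier
editions define names through longer Letter tables, so treat the ranges as an
assumption about which edition applies). The decoders are hypothetical
helpers, and the UTF-8 one assumes well-formed input and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// NameStartChar ranges as listed in XML 1.0 Fifth Edition, section 2.3.
bool is_name_start_char(char32_t c) {
    return c == U':' || c == U'_'
        || (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')
        || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
        || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
        || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
        || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
        || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
        || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
}

// ISO-8859-1: every byte value is its own code point.
std::u32string decode_latin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// Minimal UTF-8 decoder; assumes well-formed input (no validation).
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        // Number of continuation bytes implied by the lead byte.
        int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
        // Keep the payload bits of the lead byte.
        char32_t c = (extra == 0) ? b : (b & (0x3F >> extra));
        for (int k = 1; k <= extra; ++k) {
            c = (c << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out.push_back(c);
        i += extra + 1;
    }
    return out;
}
```

For example, U+00E9 is the single byte 0xE9 in ISO-8859-1 and the two bytes
0xC3 0xA9 in UTF-8. Both decode to the same code point, so the same name check
gives the same answer for either encoding.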

