From: Simon Buchan (simon_at_[hidden])
Date: 2005-09-29 19:57:00


Martin Bonner wrote:
> On 9/28/05, David Abrahams <dave_at_[hidden]> wrote:
>
>>>>Hmm... Also, is the apparent dependency on ASCII encoding truly
>>>>portable?
>
>
> Caleb Epstein wrote:
>
>>>Doubtful. Wouldn't testing for std::isalnum || '-' || '_' be a
>>>better idea? Perhaps not quite as performant (once the lookup table
>>>was made static), but certainly more portable and simpler to read.
>
>
> Simon Buchan wrote:
>
>>In most implementations, the is*()'s are implemented using exactly the
>>same method.
>
>
> Yes, but the table will be different on an EBCDIC implementation than they
> are on an ASCII implementation. The point is that the specified table hard
> codes ASCII, so when somebody runs it on an IBM mainframe it will give the
> wrong answer.
>
This may be irrelevant anyway:
http://www.w3.org/TR/REC-xml/#charsets
2.2 Characters

[Definition: A parsed entity contains text, a sequence of characters,
which may represent markup or character data.] [Definition: A character
is an atomic unit of text as specified by ISO/IEC 10646:2000 [ISO/IEC
10646]. Legal characters are tab, carriage return, line feed, and the
legal characters of Unicode and ISO/IEC 10646. The versions of these
standards cited in A.1 Normative References were current at the time
this document was prepared. New characters may be added to these
standards by amendments or new editions. Consequently, XML processors
MUST accept any character in the range specified for Char. ]
Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
           | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    /* any Unicode character, excluding the surrogate blocks,
       FFFE, and FFFF. */
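
For reference, production [2] translates into a range check almost directly. This is only a minimal sketch: the name is_xml_char is made up for illustration, and it assumes the input has already been decoded from UTF-8 or UTF-16 into a Unicode code point.

```cpp
#include <cstdint>

// Hypothetical helper, not part of any library under discussion.
// Assumes cp is an already-decoded Unicode code point.
// Mirrors production [2] Char from the XML 1.0 recommendation.
inline bool is_xml_char(std::uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD   // tab, LF, CR
        || (cp >= 0x20 && cp <= 0xD7FF)          // up to the surrogate blocks
        || (cp >= 0xE000 && cp <= 0xFFFD)        // excludes FFFE and FFFF
        || (cp >= 0x10000 && cp <= 0x10FFFF);    // supplementary planes
}
```

Since the comparison is on numeric code points rather than on native char literals, the result does not depend on the platform's execution character set.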

The mechanism for encoding character code points into bit patterns MAY
vary from entity to entity. All XML processors MUST accept the UTF-8 and
UTF-16 encodings of Unicode 3.1 [Unicode3]; the mechanisms for signaling
which of the two is in use, or for bringing other encodings into play,
are discussed later, in 4.3.3 Character Encoding in Entities.

The interesting bit is the last sentence. (I didn't want to take it out
of context.)
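
To make that concrete: if the document always arrives as UTF-8 or UTF-16, the alnum / '-' / '_' test from earlier in the thread could be written against code point values instead of native characters. The following is a rough sketch under that assumption; is_ascii_name_char is a hypothetical name, and the set of accepted characters is simply my reading of the test Caleb described, not the code under review.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only.
// Assumes cp is a decoded Unicode code point, not a native char.
// Numeric values are used on purpose: on an EBCDIC compiler the
// literal 'A' is 0xC1, which would not match UTF-8 encoded input.
inline bool is_ascii_name_char(std::uint32_t cp)
{
    return (cp >= 0x30 && cp <= 0x39)   // digits 0-9
        || (cp >= 0x41 && cp <= 0x5A)   // upper-case A-Z
        || (cp >= 0x61 && cp <= 0x7A)   // lower-case a-z
        || cp == 0x2D                   // hyphen-minus
        || cp == 0x5F;                  // underscore
}
```

Written this way, the table is tied to the encoding of the data rather than to the machine the code runs on, so an IBM mainframe would give the same answer as an ASCII box.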

