From: Stefan Seefeld (seefeld_at_[hidden])
Date: 2007-02-27 16:27:23


Péter Szilágyi wrote:

>> Actually, it is what *you* put into it. Compiler decides what the size
>> of wchar_t should be. As long as your code points fit into that size, you
>> will be fine. For example you can store UTF-16 characters in 4-byte
>> wchar_t.
>
>
> Well that's true, but wouldn't that be a waste? The other problem is that in
> UTF-32 every single "character" is an actual separate entity. In UTF-16 and
> UTF-8 especially, entities are made up of multiple "characters", so you would
> need to "decode" them to their 32bit representation in order to use them
> correctly. (Actually, doing it this way would lead to quite a flexible
> lib... only the reader and the writer must be aware of the conversions and
> internally a wstring will suffice...)

I think you are missing the point. It's not an argument for any particular
encoding. Rather, the point is that there is no pre-defined mapping between
any Unicode (or other) encoding and any C++ character type.
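
To make that concrete, here is a minimal sketch, not taken from any
existing library: decode_utf16 and wchar_holds_any_code_point are
hypothetical names chosen for illustration. It decodes UTF-16 code units
into 32-bit code points, and separately checks at compile time whether a
single wchar_t on the current implementation is wide enough to hold any
code point. The language guarantees neither the width nor the encoding.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-16 code units into 32-bit code points.
std::vector<std::uint32_t> decode_utf16(const std::u16string& in)
{
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            // High surrogate: a low surrogate must follow.
            if (i + 1 >= in.size())
                throw std::runtime_error("truncated surrogate pair");
            std::uint32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                throw std::runtime_error("invalid low surrogate");
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            ++i;  // the pair consumed two code units
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        out.push_back(u);
    }
    return out;
}

// A code point needs up to 21 bits. Whether wchar_t provides that many
// is implementation-defined, so it is checked rather than assumed.
constexpr bool wchar_holds_any_code_point =
    sizeof(wchar_t) * CHAR_BIT >= 21;
```

Only the reader and the writer need to know about the external encoding.
Which C++ type holds the decoded values internally is a separate decision.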

>>> (I'd dare to say that those who propose to re-implement everything
>>> inside boost either suffer the NotInventedHere syndrome, don't have a
>>> good understanding of what XML is, or grossly underestimate the
>>> required work, not only to implement it, but also to make it
>>> reasonably efficient.)
>> I'd second that. One middle-ground option would be to include a
>> small XML parser
>>
>
> How much functionality do you mean by "small XML parser"?

That's a good question. Also, it would still be a parser only, as opposed
to any in-memory representation (tree?) with assorted APIs. Such a parser
may be sufficient if all you have in mind is an XMLReader-like API, but
it surely isn't if what you want is a DOM, with XPath-based lookup, incremental
validation, etc., etc.

Regards,
                Stefan

-- 
      ...ich hab' noch einen Koffer in Berlin...
