From: Erik Wien (wien_at_[hidden])
Date: 2004-10-19 11:29:01
----- Original Message -----
From: "Edward Diener" <eddielee_at_[hidden]>
> A few points you probably already know:
>
> 1) Wide characters and Unicode characters are not necessarily the same
> thing for any given implementation.
> 2) There are quite a few Unicode encodings.
Yes, I know. Thanks for the heads-up, though! ;)
> 3) The idea is to be able to plug in a Unicode encoding into the same
> standard library templates and boost templates which now support 'char'
> and 'wchar_t'. In other words, ideally you want to treat your Unicode
> encoding as just another character type, with extra smarts depending on
> the encoding. The extra smarts would be used in specializations.
>
Agreed. That is one of the main design goals for a potential library, in my
opinion. I have recently created a little test library for simple Unicode
strings that provides iterators that can be used with the different
algorithms in Boost and std. I would probably base some parts of a new
library on that implementation. I will post a new message with more
information about this later.
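To illustrate the idea of iterators that work with the std algorithms, here
is a minimal sketch. It is not the test library mentioned above. The names
utf8_iterator and count_code_points are made up for illustration, the choice
of UTF-8 is an assumption, and the input is assumed to be well-formed (no
validation is done).

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical iterator adaptor: walks a UTF-8 encoded std::string and
// yields one decoded code point per step. Assumes well-formed UTF-8.
class utf8_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = char32_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char32_t*;
    using reference         = char32_t;  // decoded by value, not a true reference

    utf8_iterator() = default;
    explicit utf8_iterator(std::string::const_iterator pos) : pos_(pos) {}

    // Decode the code point that starts at the current position.
    char32_t operator*() const {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        const std::size_t len = sequence_length(lead);
        if (len == 1) return lead;
        char32_t cp = lead & (0x7F >> len);  // keep the payload bits of the lead byte
        for (std::size_t i = 1; i < len; ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(pos_[i]) & 0x3F);
        }
        return cp;
    }

    // Advance by one whole code point (1 to 4 bytes).
    utf8_iterator& operator++() {
        pos_ += sequence_length(static_cast<unsigned char>(*pos_));
        return *this;
    }
    utf8_iterator operator++(int) {
        utf8_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_iterator& a, const utf8_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_iterator& a, const utf8_iterator& b) {
        return !(a == b);
    }

private:
    // Number of bytes in the sequence introduced by this lead byte.
    static std::size_t sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x06) return 2;
        if ((lead >> 4) == 0x0E) return 3;
        return 4;
    }

    std::string::const_iterator pos_{};
};

// Hypothetical helper: number of code points (not bytes) in a UTF-8 string.
std::size_t count_code_points(const std::string& s) {
    return static_cast<std::size_t>(
        std::distance(utf8_iterator(s.begin()), utf8_iterator(s.end())));
}
```

With something like this, std::find, std::count and std::distance see code
points rather than bytes, while the underlying storage stays a plain
std::string.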
> In the past in comp.std.c++ I attempted to promote the idea that all
> standard library functionality which dealt generally in characters and
> strings should be parameterized on the character type for the sake of
> orthogonality and the future. While most is, there is still some
> functionality which is not, e.g. exceptions, file names, and locale
> message files, and which assumes that only narrow characters exist in its
> usage. I am still amazed that programmers from countries which would
> normally use wide characters as Unicode encodings, such as the Japanese,
> have not made more of an issue of this, but perhaps they are so used to
> their far more difficult DBCS roots that pursuing wide characters
> everywhere, much less a real Unicode encoding, is a minor issue for them.
>
I completely agree. There are a few areas of the standard that make a lot
of assumptions about how characters and strings are represented, and many of
these assumptions are not necessarily true when it comes to Unicode. How to
match a potential library with the standard is therefore an important issue
in the development, and one I hope to devote some time to resolving (or at
least knowingly ignoring! ;) ) if I move forward with the project.
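As a small, concrete instance of such an assumption (using the exceptions
case from the quoted text): std::exception::what() is fixed to narrow
characters regardless of the character type in use. The sketch below only
demonstrates that fact. message_bytes is a made-up helper, and storing UTF-8
in the message is an assumption for illustration.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// The return type of what() is always const char*; the standard offers no
// wchar_t or Unicode-aware variant.
static_assert(
    std::is_same<decltype(std::declval<const std::exception&>().what()),
                 const char*>::value,
    "what() is fixed to const char*");

// Hypothetical helper: length of an exception message as the standard
// library sees it, i.e. in bytes (code units), not in characters.
std::size_t message_bytes(const std::exception& e) {
    return std::string(e.what()).size();
}
```

A message holding a single euro sign encoded as UTF-8 ("\xE2\x82\xAC")
therefore has a length of three as far as the standard library is concerned.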