From: Dale Peakall (dale_at_[hidden])
Date: 2004-04-06 08:32:13
> I have read your proposal. Maybe I'm missing something very serious,
> but I would prefer a scheme similar to the one used by the STL.
> That way, there would be variants accepting char and wchar_t data types,
> and all possible unicode problems will be addressed by char_traits and
I have to agree. Programs should internally work in terms of
fixed-width character sets. When string data needs to be imported or
exported, locales should be used to perform the transformation.
I would make program_options support an imbue() function that
allows a locale to be specified (otherwise use the default locale)
and template any functions that need to process strings on the
This provides much more flexibility than just supporting UTF-8.
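To make the imbue() idea concrete, here is a minimal sketch. The class name basic_option_reader and the member import_arg are hypothetical and are not part of program_options; only std::locale, std::use_facet and std::ctype are real standard-library facilities. It assumes a byte-per-character conversion through the ctype facet; a multibyte encoding such as Shift-JIS would need the locale's codecvt facet instead.

```cpp
#include <locale>
#include <string>

// Hypothetical sketch: basic_option_reader is NOT part of program_options.
template <class CharT>
class basic_option_reader {
public:
    // Replace the locale used for conversion and return the previous one,
    // mirroring the behaviour of std::ios_base::imbue.
    std::locale imbue(const std::locale& loc) {
        std::locale old = loc_;
        loc_ = loc;
        return old;
    }

    // Convert one narrow argument (for example from argv) to the internal
    // character type, one char at a time, via the imbued locale's ctype facet.
    std::basic_string<CharT> import_arg(const std::string& narrow) const {
        const std::ctype<CharT>& ct = std::use_facet<std::ctype<CharT> >(loc_);
        std::basic_string<CharT> out(narrow.size(), CharT());
        if (!narrow.empty())
            ct.widen(narrow.data(), narrow.data() + narrow.size(), &out[0]);
        return out;
    }

private:
    std::locale loc_;  // default-constructed: a copy of the global locale
};
```

A caller would pick the character type once, for example basic_option_reader<wchar_t>, call imbue() only if the default locale is not wanted, and then work with wide strings internally.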
UTF-8 is a really impractical encoding for almost any locale where the
majority of text is not ASCII-like, and the user may well prefer to
encode text in Shift-JIS or another encoding.
> I understand that STL support for Unicode is not the best,
> but there are facilities, that can provide required functionality
> if properly extended/configured.
The support really isn't that bad. Mostly, it's a case of the
standard not mandating support for specific features (leaving it
as a QOI issue) and programmers not understanding what's required
of them in order to make things work.
It's a shame that wchar_t is not guaranteed to be wider than 16 bits, but for
almost all real-world uses UCS-2 provides the required functionality.
Java (a "Unicode compliant" language) only supports 16-bit wide
characters - really the only difference is that Java doesn't support
8-bit characters and handles all the character transformation in its
I/O library without the user having to get involved (most of the time).
There is a definite need for a decent UTF-8 code converter. I know
there is at least one in the vault. I can't answer for its quality
as I haven't tried to use it.
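For illustration only, the decoding half of such a converter could look roughly like the sketch below. This is not the converter in the vault; decode_utf8 is a hypothetical name, and the sketch assumes a compiler with char32_t and std::u32string (C++11). A production version would more likely be packaged as a std::codecvt facet so it can be used through a locale.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into 32-bit code points.
std::u32string decode_utf8(const std::string& in) {
    // Smallest code point that legitimately needs 0, 1, 2 or 3 extra bytes.
    static const char32_t min_cp[4] = {0x0, 0x80, 0x800, 0x10000};

    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
        }

        // Reject overlong forms, surrogates, and values past U+10FFFF.
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Throwing on malformed input is one design choice among several; substituting U+FFFD is another common policy.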
The other need is for a type that is guaranteed to be at least 32 bits
wide and can support UCS-4 for the odd occasions when there is a need to
use characters from outside the BMP.
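A rough sketch of what such a type might look like follows. The names ucs4_t, in_bmp and from_surrogates are hypothetical; std::uint_least32_t from <cstdint> is assumed to be available. The surrogate helper is included only to show why a 16-bit unit cannot hold these characters directly, and it assumes its inputs are already a valid high/low surrogate pair.

```cpp
#include <cstdint>

// Hypothetical alias: a code-point type guaranteed to be at least 32 bits wide.
typedef std::uint_least32_t ucs4_t;

// True if the code point fits in a single 16-bit unit, i.e. lies in the BMP.
inline bool in_bmp(ucs4_t cp) { return cp <= 0xFFFF; }

// Combine a UTF-16 surrogate pair into a code point beyond the BMP.
// Precondition (not checked): high is in [0xD800, 0xDBFF] and
// low is in [0xDC00, 0xDFFF].
inline ucs4_t from_surrogates(std::uint_least16_t high,
                              std::uint_least16_t low) {
    return 0x10000
         + ((static_cast<ucs4_t>(high) - 0xD800) << 10)
         + (static_cast<ucs4_t>(low) - 0xDC00);
}
```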
> I think there is no great reason to try to reinvent the
> wheel and provide an all-encompassing solution in a library like
> It should be enough if it is Unicode-enabled, so that it can
> be used in any specific scenario, provided that all necessary
> facilities are in place.