From: Dale Peakall (dale_at_[hidden])
Date: 2004-04-06 08:32:13


> I have read your proposal. Maybe I'm missing something very serious,
> but I would prefer to have a similar scheme as used by stl.

> So that, there will be variants accepting char and wchar_t data types,
> and all possible unicode problems will be addressed by char_traits and
> locale.

I have to agree. Programs should internally work in terms of
fixed-width character sets. When string data needs to be imported/
exported locales should be used to perform the transformation.

I would make program_options support an imbue() function that
allows a locale to be specified (otherwise use the default locale)
and template any functions that need to process strings on the
character type.
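
To make the suggestion concrete, here is a rough sketch of what such an interface might look like. The class name basic_arg_store and its members are hypothetical and are not part of program_options; imbue() simply mirrors the semantics of std::ios_base::imbue. The ctype facet used here only widens character by character, so a real implementation would presumably go through codecvt for multi-byte input.

```cpp
#include <cassert>
#include <locale>
#include <string>
#include <vector>

// Hypothetical sketch only: this is NOT the boost::program_options interface.
template <class CharT>
class basic_arg_store {
public:
    typedef std::basic_string<CharT> string_type;

    // Install a new locale and return the previous one,
    // in the same spirit as std::ios_base::imbue().
    std::locale imbue(const std::locale& loc) {
        std::locale old = loc_;
        loc_ = loc;
        return old;
    }

    std::locale getloc() const { return loc_; }

    // Import one narrow (argv-style) argument, converting it to CharT
    // with the ctype facet of the imbued locale.
    void add(const char* raw) {
        assert(raw != 0);
        const std::ctype<CharT>& ct = std::use_facet<std::ctype<CharT> >(loc_);
        const std::string narrow(raw);
        string_type converted(narrow.size(), CharT());
        if (!narrow.empty())
            ct.widen(narrow.data(), narrow.data() + narrow.size(), &converted[0]);
        args_.push_back(converted);
    }

    const std::vector<string_type>& args() const { return args_; }

private:
    std::locale loc_;                // default: a copy of the global locale
    std::vector<string_type> args_;  // arguments in the internal character type
};
```

With this shape, basic_arg_store<char> and basic_arg_store<wchar_t> share one implementation, and the caller picks the external encoding by imbuing a locale rather than by choosing a different class.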

This provides much more flexibility than just supporting UTF-8.
UTF-8 is a really impractical encoding for almost any locale where the
majority of text is not ASCII-like, and the user may well prefer to
encode text in Shift-JIS or another encoding.

> I understand that stl support for unicode is not the best,
> but there are facilities, that can provide required functionality
> if properly extended/configured.

The support really isn't that bad. Mostly, it's a case of the
standard not mandating support for specific features (leaving it
as a QOI issue) and programmers not understanding what's required
of them in order to make things work.

It's a shame that wchar_t can be as narrow as 16 bits (the standard
doesn't guarantee anything wider), but for almost all real-world uses
UCS-2 provides the required functionality. Java (a "Unicode compliant"
language) only supports 16-bit wide characters - really the only
difference is that Java doesn't support 8-bit characters and handles
all the character transformation in its I/O library without the user
having to get involved (most of the time).

There is a definite need for a decent UTF-8 code converter. I know
there is at least one in the vault. I can't answer for its quality
as I haven't tried to use it.
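
For illustration, the core of such a converter might look roughly like the following. This is a minimal hand-written sketch, not the converter from the vault; the names code_point and decode_utf8 are made up for the example, and a production version would more likely be packaged as a codecvt facet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

typedef std::uint_least32_t code_point;  // hypothetical name for this sketch

// Decode a UTF-8 byte string into code points.
// Throws std::runtime_error on malformed input.
std::vector<code_point> decode_utf8(const std::string& in) {
    std::vector<code_point> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        code_point cp;
        std::size_t extra;  // number of continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        assert(cp <= 0x1FFFFF);  // at most 3 + 18 payload bits were read

        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        static const code_point min_for_len[4] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

The decoder returns full code points rather than wchar_t so that it behaves the same whether wchar_t is 16 or 32 bits wide on a given platform.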

The other need is for a type that is guaranteed to be at least 32 bits
wide and can support UCS-4 for the odd occasions when there is a need
to use characters from outside the BMP.
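
As a small illustration of why 16 bits is not enough outside the BMP, the sketch below stores a code point in a type of at least 32 bits and shows that a 16-bit representation has to split it into a surrogate pair. The names ucs4_char, utf16_unit, in_bmp and to_utf16 are invented for this example, and std::uint_least32_t is just one possible choice for the underlying type.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::uint_least32_t ucs4_char;   // at least 32 bits: holds any code point
typedef std::uint_least16_t utf16_unit;  // at least 16 bits: one UTF-16 code unit

// True when the code point lies in the Basic Multilingual Plane.
inline bool in_bmp(ucs4_char cp) { return cp <= 0xFFFF; }

// Encode one code point as UTF-16: one unit inside the BMP,
// a surrogate pair (two units) outside it.
std::vector<utf16_unit> to_utf16(ucs4_char cp) {
    // Precondition: a valid Unicode scalar value.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::vector<utf16_unit> units;
    if (in_bmp(cp)) {
        units.push_back(static_cast<utf16_unit>(cp));
    } else {
        const ucs4_char v = cp - 0x10000;  // 20 significant bits remain
        units.push_back(static_cast<utf16_unit>(0xD800 + (v >> 10)));    // high surrogate
        units.push_back(static_cast<utf16_unit>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return units;
}
```

For example, U+20AC (the euro sign) fits in a single unit, while U+1D11E (musical G clef) needs two, so code that assumes one 16-bit unit per character miscounts it.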

> I think, that there is no big reason to try to reinvent a
> wheel and provide all encompassing solution in the library like
> program_options.

> It should be enough if it is unicode-enabled so that it can
> be used in any specific scenario, provided that all necessary
> facilities are in place.

Hear, hear.

        - Dale.

