Boost Users :

From: Aaron W. LaFramboise (aaronrabiddog51_at_[hidden])
Date: 2004-07-02 15:42:36


John Meinel wrote:
> David Abrahams wrote:
>>Sure; by the same token we could also use utf-8 and encode your
>>Unicode in narrow strings.
>
> Actually, I was wondering why this isn't used? The "big" advantage for
> UTF-16 was that it followed the one char->one code point. But then that
> was broken with the new UNICODE spec. So why not stick with utf-8. I
> know that most Linux file systems will support utf-8 (if your terminal
> supports it, then you see the nice characters, otherwise you see really
> bad "ASCII" ones.)
>
> I know there is a gnome library with a Glib::ustring that I believe
> internally uses a utf-8 string.
>
> However, isn't utf-8 fully compatible with std::string? Provided that
> you understand some "characters" take more than one char? But that only
> matters when you are trying to interpret what the string means, which is
> done by the OS, or by something that is rendering it on the screen.

(I am not an expert.)

Unfortunately, UTF-8 and similar multibyte encodings do not work
correctly in C++ for many common cases. For example, the thousands
separator in a C++ locale (std::numpunct<char>::thousands_sep) is
mandated by the standard to be a single char, but in some locales the
UTF-8 sequence for the preferred separator is more than one char long.
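
To make the thousands-separator point concrete, here is a rough sketch.
It assumes the preferred separator is U+00A0 (no-break space), which is
only one example; utf8_nbsp and fits_in_thousands_sep are names I made
up for illustration, not part of any library.

```cpp
#include <cassert>
#include <locale>
#include <string>
#include <type_traits>
#include <utility>

// The facet interface returns exactly one char_type for the separator.
static_assert(
    std::is_same<
        decltype(std::declval<const std::numpunct<char>&>().thousands_sep()),
        char>::value,
    "numpunct<char>::thousands_sep() yields a single char");

// Assumption for illustration: the locale wants U+00A0 NO-BREAK SPACE,
// which is the two bytes 0xC2 0xA0 in UTF-8.
inline std::string utf8_nbsp() { return "\xC2\xA0"; }

// Hypothetical helper: can this encoded separator be returned from
// numpunct<char>::thousands_sep(), i.e. is it exactly one char long?
inline bool fits_in_thousands_sep(const std::string& encoded) {
    return encoded.size() == 1;
}
```

A plain comma fits, but the two-byte UTF-8 sequence does not, so a
numpunct<char> facet has no way to hand it back intact.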

UTF-8 is great for simply storing and copying strings, but it fails
quickly if you try to do any direct character-level manipulation on it
without outside help, because std::string operates on individual chars
rather than on whole characters.
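
A small sketch of what goes wrong with character-level work on a UTF-8
std::string. The helper names (count_code_points, cafe_utf8) are mine,
and the counting loop assumes well-formed input and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumes well-formed UTF-8; malformed input is not detected.
inline std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte starts a new code point
        }
    }
    return n;
}

// "cafe" with an accented e (U+00E9), which is 0xC3 0xA9 in UTF-8.
inline std::string cafe_utf8() { return "caf\xC3\xA9"; }
```

Here size() reports 5 chars for 4 characters, and substr(0, 4) cuts the
last character in half, leaving a lone 0xC3 lead byte at the end.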

> I suppose you still have to convert whenever you call one of the
> OpenFileW commands. And probably that is what all this is about. Someone
> feels that everything should be handled in the "native" format (which on
> Win32 is some sort of wchar_t, and on other platforms is char (though a
> UTF-8 char)).
>
> My personal vote is to have the library convert to whatever internal
> representation is considered "preferred", and then have the convenience
> functions for converting to whatever the user wants. (native_file_wstring).

I agree. I think the interface should have both narrow and wide
versions, provided as normal functions without templates or other
character polymorphism. On operating systems that only use char, we can
do the same conversion that std::wcout presently does on these systems.
On operating systems such as Win32, which have the unusual ability to
take both narrow and wide operands natively, no conversion will be
necessary.
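
Roughly what I have in mind for a char-only system, as a sketch only.
The names native_path and to_narrow are hypothetical, and I am using
std::wcstombs (current C locale) as a stand-in for the conversion;
std::wcout itself goes through the codecvt facet of its imbued locale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: wide -> narrow using the current C locale.
// Stops at the first embedded L'\0', since wcstombs works on C strings.
inline std::string to_narrow(const std::wstring& w) {
    std::vector<char> buf(w.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(&buf[0], w.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("character not representable in this locale");
    }
    return std::string(&buf[0], n);
}

// Hypothetical interface: two ordinary overloads, no templates.
// On a char-only system the narrow form passes straight through...
inline std::string native_path(const std::string& p) { return p; }

// ...and the wide form is converted once at the boundary.
inline std::string native_path(const std::wstring& p) { return to_narrow(p); }
```

On Win32 the wide overload would instead pass the string through
untouched to the wide API, so neither form needs converting there.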

I don't think this will do the wrong thing in any reasonable case.

Aaron W. LaFramboise

