Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2011-04-27 18:11:43


On 27/04/2011 21:42, Jeremy Maitin-Shepard wrote:

> Why not simply provide a compile-time or run-time option to allow the
> user to specify the following:
>
> - the encoding of narrow keys given as char * arguments, or that none
> is to be supported (in which case narrow keys cannot be used at all),
> the default being UTF-8;
>
> - whether wchar_t * arguments are supported (the encoding will be
> assumed to be UTF-16 or UTF-32 depending on sizeof(wchar_t)) [by
> default, not supported]
>
> - whether char16_t arguments are supported [by default, not supported]
>
> - whether char32_t arguments are supported [by default, not supported]
>
> The library would simply convert the UTF-8 encoded keys in the message
> catalogs to each of the supported key argument encodings. In most cases,
> there would only be a single supported encoding. Because the narrow
> version could be disabled, with Japanese text and UTF-16 wchar_t, this
> would actually _save_ space since UTF-16 is more efficient than UTF-8
> for encoding Japanese text.
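
To make the space argument in the quoted proposal concrete, here is a minimal sketch (hypothetical helper names, not Boost.Locale code) that counts how many bytes one string of code points needs in UTF-8 and in UTF-16. Common Japanese kana and kanji lie in U+0800..U+FFFF, where UTF-8 needs 3 bytes per character and UTF-16 needs 2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed to encode one code point in UTF-8.
constexpr std::size_t utf8_bytes(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Bytes needed to encode one code point in UTF-16 (one or two 16-bit units).
constexpr std::size_t utf16_bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// Total encoded size of a string of code points in UTF-8.
std::size_t utf8_size(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t cp : s) n += utf8_bytes(cp);
    return n;
}

// Total encoded size of a string of code points in UTF-16.
std::size_t utf16_size(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t cp : s) n += utf16_bytes(cp);
    return n;
}
```

For the three kanji U"\u65E5\u672C\u8A9E", utf8_size gives 9 bytes and utf16_size gives 6; for ASCII-only keys the relation reverses (1 byte versus 2 per character).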

Why is it so complicated?

User gives string and says what encoding it is in, the library converts
to the catalog encoding and looks it up, then returns the localized
string, converting again if needed.
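
A minimal sketch of that flow, assuming a catalog stored as a std::map from UTF-8 keys to UTF-8 translations and a caller-declared encoding limited to UTF-8 and ISO-8859-1 (Latin-1). The enum, the catalog type and the function names are hypothetical and not the Boost.Locale API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>

// Encodings a caller may declare for its key (illustrative subset).
enum class encoding { utf8, latin1 };

// Catalog assumed to be stored in UTF-8: key -> translation.
using catalog = std::map<std::string, std::string>;

// Convert the caller's key to the catalog encoding (UTF-8).
std::string to_utf8(std::string_view s, encoding enc) {
    if (enc == encoding::utf8) return std::string(s);
    std::string out;
    for (unsigned char c : s) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {  // Latin-1 byte U+0080..U+00FF -> two UTF-8 bytes
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Convert a UTF-8 result back to the caller's encoding; characters that
// Latin-1 cannot represent become '?'. No full UTF-8 validation is done.
std::string from_utf8(std::string_view s, encoding enc) {
    if (enc == encoding::utf8) return std::string(s);
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = s[i];
        if (c < 0x80) {
            out += static_cast<char>(c);
            i += 1;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < s.size()) {
            char32_t cp = ((c & 0x1F) << 6) | (static_cast<unsigned char>(s[i + 1]) & 0x3F);
            out += cp < 0x100 ? static_cast<char>(cp) : '?';
            i += 2;
        } else {  // 3- and 4-byte sequences lie outside Latin-1
            std::size_t len = (c & 0xF0) == 0xE0 ? 3 : (c & 0xF8) == 0xF0 ? 4 : 1;
            out += '?';
            i += len;
        }
    }
    return out;
}

// Convert the key, look it up, and convert the result back;
// an untranslated key is returned unchanged.
std::string translate(const catalog& cat, std::string_view key, encoding enc) {
    const std::string utf8_key = to_utf8(key, enc);
    auto it = cat.find(utf8_key);
    return from_utf8(it != cat.end() ? it->second : utf8_key, enc);
}
```

The catalog only ever sees one encoding; supporting another input encoding means adding one pair of conversion functions rather than another copy of the catalog.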

Contrary to what Artyom said earlier, converting a string does not
necessarily require dynamic memory allocation, and localization is not
particularly performance critical anyway.
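
To illustrate the allocation point, a sketch (hypothetical names, assuming C++17) that encodes a UTF-32 key into a fixed-size stack buffer and then searches the catalog with heterogeneous lookup via std::less<>, so no std::string is built for the key. The 256-byte limit is an arbitrary assumption, and the input is not validated.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> is transparent, so find() accepts a std::string_view directly.
using catalog = std::map<std::string, std::string, std::less<>>;

constexpr std::size_t encode_error = static_cast<std::size_t>(-1);

// Encode UTF-32 into a caller-supplied buffer; returns the byte count,
// or encode_error if the buffer is too small. No heap allocation.
template <std::size_t N>
std::size_t encode_utf8(std::u32string_view in, std::array<char, N>& buf) {
    std::size_t n = 0;
    for (char32_t cp : in) {
        const std::size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len > N) return encode_error;
        switch (len) {
        case 1:
            buf[n] = static_cast<char>(cp);
            break;
        case 2:
            buf[n] = static_cast<char>(0xC0 | (cp >> 6));
            buf[n + 1] = static_cast<char>(0x80 | (cp & 0x3F));
            break;
        case 3:
            buf[n] = static_cast<char>(0xE0 | (cp >> 12));
            buf[n + 1] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf[n + 2] = static_cast<char>(0x80 | (cp & 0x3F));
            break;
        default:
            buf[n] = static_cast<char>(0xF0 | (cp >> 18));
            buf[n + 1] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            buf[n + 2] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf[n + 3] = static_cast<char>(0x80 | (cp & 0x3F));
        }
        n += len;
    }
    return n;
}

// Look up a UTF-32 key; nullptr if it is missing or too long for the buffer.
const std::string* lookup(const catalog& cat, std::u32string_view key) {
    std::array<char, 256> buf;  // stack storage only
    const std::size_t n = encode_utf8(key, buf);
    if (n == encode_error) return nullptr;
    auto it = cat.find(std::string_view(buf.data(), n));
    return it != cat.end() ? &it->second : nullptr;
}
```

Returning a pointer into the catalog also avoids copying the translation; a caller that needs a different output encoding can convert into its own buffer the same way.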

If that runtime conversion is a concern, it's also possible to do it
at compile time, at least with C++0x (the syntax is ugly in C++03).
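
A sketch of the compile-time variant, assuming C++17. C++0x/C++11 constexpr functions are limited to essentially a single return statement, so a C++11 version would have to be written recursively; the relaxed rules used here allow a plain loop. make_utf8 and utf8_literal are hypothetical names, and surrogates or out-of-range code points are not checked.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// At most 4 UTF-8 bytes per code point, so 4 * N bytes always suffice and
// the zero-initialized tail keeps the result NUL-terminated.
template <std::size_t N>
struct utf8_literal {
    std::array<char, 4 * N> data{};
    std::size_t size = 0;  // bytes used, excluding the terminating NUL
};

// Convert a UTF-32 string literal to UTF-8 during compilation.
template <std::size_t N>
constexpr utf8_literal<N> make_utf8(const char32_t (&s)[N]) {
    utf8_literal<N> out{};
    for (std::size_t i = 0; i + 1 < N; ++i) {  // N counts the trailing U'\0'
        const char32_t c = s[i];
        if (c < 0x80) {
            out.data[out.size++] = static_cast<char>(c);
        } else if (c < 0x800) {
            out.data[out.size++] = static_cast<char>(0xC0 | (c >> 6));
            out.data[out.size++] = static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out.data[out.size++] = static_cast<char>(0xE0 | (c >> 12));
            out.data[out.size++] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out.data[out.size++] = static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out.data[out.size++] = static_cast<char>(0xF0 | (c >> 18));
            out.data[out.size++] = static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out.data[out.size++] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out.data[out.size++] = static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Evaluated entirely at compile time: U+00E9 becomes the two bytes C3 A9.
constexpr auto cafe = make_utf8(U"caf\u00E9");
static_assert(cafe.size == 5, "c, a, f plus two bytes for U+00E9");
```

Because cafe is a constexpr object, the converted bytes end up in the binary and no conversion runs at startup.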

Actually, I fail to understand what the problem is.
Is it just the MSVC BOM problem? I think it should be handled by the
build system.

> I agree that it is very unfortunate that wchar_t can mean either UTF-16
> or UTF-32 depending on the platform

How is that unfortunate? You can tell which one depending on the size of
wchar_t.
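
The size-based dispatch can be sketched as follows (from_wide is a hypothetical helper). It relies on the platform convention discussed in this thread, 16-bit UTF-16 wchar_t on Windows and 32-bit UTF-32 wchar_t on Linux; the C++ standard itself does not mandate either encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static_assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
              "expected a UTF-16 or UTF-32 wchar_t");

// Decode a wide string into code points: UTF-16 if wchar_t is 2 bytes
// (combining surrogate pairs), UTF-32 if it is 4 bytes.
std::u32string from_wide(const std::wstring& w) {
    std::u32string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        char32_t c = static_cast<char32_t>(w[i]);
        if (sizeof(wchar_t) == 2 && c >= 0xD800 && c <= 0xDBFF && i + 1 < w.size()) {
            const char32_t lo = static_cast<char32_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {  // high + low surrogate
                c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        out += c;
    }
    return out;
}
```

The same L"a\U0001F600" literal is three wchar_t units on Windows and two on Linux, yet from_wide returns the same two code points on both.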

> but in practice the same source
> code containing L"" string literals can be used on both Windows and
> Linux to reliably specify Unicode string literals (provided that care is
> taken to ensure the compiler knows the source code encoding). The fact
> that UTF-32 (which Linux tends to use for wchar_t) is space-inefficient
> does in some ways render Linux a second-class citizen if a solution
> based on wide string literals is used for portability, but using UTF-8
> on MSVC is basically just impossible, rather than merely less efficient,
> so there doesn't seem to be another option. (Assuming you are unwilling
> to rely on the Windows "ANSI" narrow encodings.)

You can always use a macro USTRING("foo") that expands to u8"foo" or
u"foo" on systems with Unicode string literals, and to L"foo" elsewhere.
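
One possible shape for such a macro, sketched under the assumption that the compiler advertises u"" support through the __cpp_unicode_literals feature-test macro; a compiler that supports the literals without defining the macro simply falls back to L"". USTRING, uchar and ustring are hypothetical names, and the argument must be a plain string literal.

```cpp
#include <cassert>
#include <string>

#if defined(__cpp_unicode_literals)
#  define USTRING(s) u##s   // u"..." : UTF-16 code units
using uchar = char16_t;
#else
#  define USTRING(s) L##s   // L"..." : UTF-16 or UTF-32 depending on wchar_t
using uchar = wchar_t;
#endif

// String type matching whatever USTRING produces on this platform.
using ustring = std::basic_string<uchar>;
```

This sketch picks u"foo" rather than u8"foo" so the macro always yields a 16- or 32-bit code unit type; note also that since C++20, u8"" literals have type const char8_t[] rather than const char[].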

