Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2011-04-28 03:34:25


On 28/04/2011 00:32, Jeremy Maitin-Shepard wrote:

>> User gives string and says what encoding it is in, the library converts
>> to the catalog encoding and looks it up, then returns the localized
>> string, converting again if needed.
>>
>> Unlike what Artyom said earlier, converting a string does not
>> necessarily require dynamic memory allocation, and localization is not
>> particularly performance critical anyway.
>
> It may often not be performance critical. In some cases, it might be
> though. Consider the case of a web server, where the work done by the
> web server machines themselves may essentially just consist of pasting
> together strings from various sources. (There is possibly a separate
> database server, etc.) This is also precisely the use case for which
> Artyom designed the library, I think. In this setting it is fairly clear
> why converting the messages once when loaded is better than doing it
> when needed.

Converting between encodings without memory allocation could be even
cheaper than concatenating strings.
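
As a rough sketch of what that can look like (invented names, not
Boost.Locale's actual API): transcode UTF-8 into a caller-provided UTF-16
buffer so the hot path never allocates. Only BMP code points (1- to 3-byte
sequences) are handled, and continuation bytes are not strictly validated,
to keep the example short.

  #include <cstddef>

  std::size_t utf8_to_utf16(const char* in, std::size_t in_len,
                            char16_t* out, std::size_t out_cap)
  {
      std::size_t o = 0;
      for (std::size_t i = 0; i < in_len; ) {
          unsigned char b = static_cast<unsigned char>(in[i]);
          char32_t cp;
          std::size_t len;
          if (b < 0x80)                { cp = b;        len = 1; }
          else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
          else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
          else return 0;               // 4-byte (non-BMP) sequences omitted
          if (i + len > in_len) return 0;
          for (std::size_t k = 1; k < len; ++k)
              cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
          if (o >= out_cap) return 0;  // caller's buffer too small
          out[o++] = static_cast<char16_t>(cp);  // one unit per BMP code point
          i += len;
      }
      return o;
  }

A fixed stack buffer sized for typical message lengths covers the common
case; only the overflow path would ever need to allocate.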

>> If that runtime conversion is a concern, it's also possible to do that
>> at compile time, at least with C++0x (syntax is ugly in C++03).
>
> Maybe it can be done, but I don't think it is a viable possibility.

It could work if you only need it for short strings and are willing to
spend the extra compile time on that conversion.
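
As a rough illustration (invented names; written with C++17 relaxed
constexpr for readability rather than the recursive tricks C++0x would
force, and decoding only the ASCII subset), the conversion of a short
literal can be evaluated entirely by the compiler:

  #include <array>
  #include <cstddef>

  // Map an ASCII-only literal to UTF-32 at compile time.  A real
  // conversion would also decode multi-byte UTF-8 sequences; this only
  // demonstrates that the work can happen during compilation.
  template <std::size_t N>
  constexpr std::array<char32_t, N> to_utf32(const char (&s)[N])
  {
      std::array<char32_t, N> out{};
      for (std::size_t i = 0; i < N; ++i)
          out[i] = static_cast<char32_t>(static_cast<unsigned char>(s[i]));
      return out;
  }

  constexpr auto msg = to_utf32("hello");   // done by the compiler
  static_assert(msg[0] == U'h', "no conversion left for run time");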

> It is unfortunate simply because it is not uniform, even though it is
> possible to work around that, and furthermore, it is unfortunate because
> UTF-32 is generally not wanted.

It is uniform, since wchar_t is always Unicode (except on some platforms
that very few people care about).

>>> but in practice the same source
>>> code containing L"" string literals can be used on both Windows and
>>> Linux to reliably specify Unicode string literals (provided that care is
>>> taken to ensure the compiler knows the source code encoding). The fact
>>> that UTF-32 (which Linux tends to use for wchar_t) is space-inefficient
>>> does in some ways render Linux a second-class citizen if a solution
>>> based on wide string literals is used for portability, but using UTF-8
>>> on MSVC is basically just impossible, rather than merely less efficient,
>>> so there doesn't seem to be another option. (Assuming you are unwilling
>>> to rely on the Windows "ANSI" narrow encodings.)
>>
>> You can always use a macro USTRING("foo") that expands to u8"foo" or
>> u"foo" on systems with unicode string literals and L"foo" elsewhere.
>
> You can, but it adds complexity, etc...

How so? It solves exactly the problem you described, i.e. avoiding
wasted memory on UTF-32 where you can.
If USTRING is too long, you can just use _U or something like that.
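
For what it's worth, one possible spelling of such a macro (purely
illustrative, not something Boost.Locale ships) uses token pasting, the
same trick as Windows' _T(); the __cplusplus check is simplified and real
code would want compiler-specific feature tests:

  // Pick a UTF-8 literal where C++0x Unicode literals exist, and fall
  // back to a wide literal elsewhere.  (In C++20, u8"..." yields char8_t,
  // so the typedef would need adjusting there.)
  #if __cplusplus >= 201103L
  #  define USTRING(x) u8 ## x      /* pastes into u8"..." */
     typedef char ustring_char;
  #else
  #  define USTRING(x) L ## x       /* pastes into L"..."  */
     typedef wchar_t ustring_char;
  #endif

  const ustring_char* hello = USTRING("hello");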

