Subject: Re: [boost] [locale] Formal review of Boost.Locale library EXTENDED
From: Edward Diener (eldiener_at_[hidden])
Date: 2011-04-19 08:37:54


On 4/19/2011 3:17 AM, Matus Chochlik wrote:
> On Tue, Apr 19, 2011 at 2:10 AM, Edward Diener<eldiener_at_[hidden]> wrote:
>> On 4/18/2011 9:53 AM, Paul A. Bristow wrote:
>>
>> My personal objection to Gnu gettext and its English bias has nothing to do
>> with any desire myself to use a language other than English in order to
>> communicate, since English ( or perhaps Americanese ) is the language of the
>> country in which I was born, but nearly everything to do with my sense of
>> the problems of translating even computer program phraseology from one
>> language to another without complicating things by having to put some other
>> language, even a very popular one, in the middle.
>>
>> Was that a single sentence ? I wonder if it can be translated to Japanese ?
>
> These are all valid points. Speaking in a particular language means
> thinking in a certain way, and many things can be lost in translation.
> But I don't see above any solutions to the actual problem.
>
> From how I see it there are several ways to handle this:
>
> 1) Stick to English phrases
> (-) Requires good knowledge of the English language
> (+) Easy to find someone to translate to language Y
> (+) Portable
> (+) Lots of mature l10n libraries work this way
> (+) Works (for English speakers) even if the translation fails
>
> 2) Use English identifier strings (as Peter Dimov suggested)
> (-) Still requires some English
> (-) Hard to keep unique in large applications
> (-) Doesn't look good if the translation fails for some reason
> (+) Requires "less" English
>
> 3) Use the u"" U"" literals
> (-) Current support by the compilers
> (-) Requires the u/U/... prefix
> (+) Will be portable in the future
> (+) Does not require English
>
> 4) Use wchar_t and the L"" prefix literals
> (-) Non-portable and platform dependent
> (-) Requires the L prefix
> (+) Works if you are limited to a single platform
> (+) Does not require English
>
> 5) use char with some GUID literals/hashes
> (-) Completely unusable if the translation fails
> (-) Takes a lot of getting used to (easier for Git users :))
> (-) Requires a GUID/hash generator
> (+) Portable
> (+) Does not require English
>
> 6) keep in original language but transliterate to Latin characters [a-z0-9]
> (-) Requires picking a good transliteration scheme
> (-) Hard to read in the code
> (-) Pretty unusable if the translation fails
> (+) Does not require the use of English
> (+) Portable
>
>
> Take your pick :-)

My pick is to use what the language currently provides, which is
wchar_t, which can represent UTF-16, a popular Unicode variant which
also happens to be the standard for wide characters on Windows, which
just happens to be the dominant operating system in the world ( by a lot
) in terms of end-users.
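
As a side note on what wchar_t can hold, here is a minimal sketch, assuming a C++11 compiler. The helper names ( wchar_bits, utf16_units, utf32_units, wide_units ) are made up for illustration and are not part of Locale or any other library. It shows that the width of wchar_t is left to the platform, so the same L"" literal may occupy a different number of code units on Windows than elsewhere, whereas the u"" and U"" literals are fixed.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Number of bits in one wchar_t on the platform being compiled for.
// Commonly 16 on Windows and 32 on Linux, but the standard fixes neither.
constexpr std::size_t wchar_bits() { return sizeof(wchar_t) * CHAR_BIT; }

// Code units needed for U+1F600, a character outside the Basic Multilingual Plane.
// As UTF-16 (char16_t) it is always a surrogate pair, i.e. 2 units.
inline std::size_t utf16_units() { return std::u16string(u"\U0001F600").size(); }

// As UTF-32 (char32_t) it is always a single unit.
inline std::size_t utf32_units() { return std::u32string(U"\U0001F600").size(); }

// As wchar_t the count depends on the platform:
// 2 where wchar_t is 16 bits wide, 1 where it is 32 bits wide.
inline std::size_t wide_units() { return std::wstring(L"\U0001F600").size(); }
```

Comparing wide_units() with utf16_units() on a given platform tells you whether L"" literals there behave like UTF-16 or like UTF-32.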

I am not saying this has to be done by Locale immediately because I
realize that there is no Unicode in C++03 and creating a Unicode library
is hardly easy ( Boost will soon have a Unicode library for submission).
I also realize that it is much easier to use what already exists, such
as gnu gettext, than to create one's own system from scratch or modify
another system of computer language translation. So I am much in
sympathy with the current choice of the Locale author.

What I object to is not the way that Locale currently works but that the
author of the library seems to have a closed mind about this issue. He
thinks that UTF-8 must be the standard because it is what Linux uses,
and he thinks that everyone must follow the way that gnu gettext does
things because that also comes from the Linux world about which he is
knowledgeable. Even when the flaw in gnu gettext, which forces other
languages to go through English to be translated, is pointed out to him,
he feels that this is correct on the basis that English is the dominant
language in computer programming, so every programmer must know it to
write computer programs in C++.

I am an English speaker only and quite realize that English is as much
of a common language in computer programming as one can have. But I also
realize that translations that must go through English ( or any third
language ) are not only a PITA but also rest on a linguistic fallacy,
namely that translating from one language to another is relatively
easy. The assumption that you can create a "correct" translation
system in computer programming which dictates not only a common language
to be used by everyone but also claims that going through that common
language is "easier" or "better" or uses less "resources" I find absurd.
Claiming that all programmers must know English to do programming, or
that translating through English is a rote job I also find absurd.

I really do not see how hard it would be, conceptually, to allow a
translation system to use message catalogues which can also be Unicode (
wchar_t in C++ ) or some multi-byte encoding. This would obviously allow
people whose language encoding is not a narrow character ( like Japanese
) to translate to another language without having to go through some
intermediate 3rd language ( English in the present case ). I know many
decisions would have to be made about how to do this, and it would no
doubt mean abandoning a popular model of doing translations ( gnu
gettext ), and it would mean much programming work, but in the face of
dictating to programmers of other countries that English must be used I
think it would be worthwhile. Clearly a locale translation system which
forces all programmers using it to not only go through another language,
even as common as that language appears in the programming world, but to
deal with the linguistic translation issues involved, is a good way to
alienate many, many programmers from using your software.
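
To make the idea concrete, here is a rough sketch of a catalogue keyed directly by non-English source strings, assuming a C++11 compiler. Catalogue, translate and japanese_to_german are hypothetical names invented for this example; this is not how Locale or gnu gettext store messages, only an illustration that the lookup key need not be English or narrow characters. The Japanese strings are written with \u escapes so the source file encoding does not matter.

```cpp
#include <map>
#include <string>

// Hypothetical catalogue type: maps a source-language message (UTF-16 here)
// straight to its translation. Not the Boost.Locale or gettext API.
using Catalogue = std::map<std::u16string, std::u16string>;

// Look up msg in the catalogue; if no translation exists, fall back to
// the source text itself, as gettext-style systems do.
inline std::u16string translate(const Catalogue& cat, const std::u16string& msg) {
    auto it = cat.find(msg);
    return it != cat.end() ? it->second : msg;
}

// Example Japanese -> German catalogue with no English step in between.
inline Catalogue japanese_to_german() {
    return Catalogue{
        {u"\u30D5\u30A1\u30A4\u30EB", u"Datei"},  // Japanese word for "file"
        {u"\u4FDD\u5B58", u"Speichern"},          // Japanese word for "save"
    };
}
```

With such a catalogue, translate(japanese_to_german(), key) returns the German text for a known key and returns the key unchanged for an unknown one, so a failed lookup still shows the original Japanese rather than an English intermediate.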

Whatever one thinks of Unicode, and I myself am critical of it for my
own reasons, it is an attempt to not only bring in end-users of computer
programs who do not know English but also computer programmers
themselves who do not know English, and have them participate in the
computer world. Creating a programming system, even if it is a
translation system for a particular library, which insists that English
must be used as the intermediate "glue" is clearly going against this
idea IMO.

