Subject: Re: [boost] [locale] Formal review of Boost.Locale library EXTENDED
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-19 05:08:16
You know what, I'm not going to tell
you what the best practices are or what
the good ways of doing things are
(see the appendix below);
you'll probably disagree with them anyway.
But let's face the facts.
> Ryou Ezoe Wrote:
> Insisting English knowledge is not practical.
> Most Japanese C++ programmers don't know English at all.
> Worse, it will cause serious problem.
> I bet Non-english people will use this library with non-english language.
> As a Japanese, I think this library better be rejected.
Let's appeal to the facts.
I have an ordinary Debian machine with
various software installed.
I checked how many applications (gettext domains)
have translation files for each language
(by counting the files under /usr/share/locale/LANG/LC_MESSAGES/*.mo).
This is a typical Debian system with a few extra
tools installed for Hebrew. As I don't know
Japanese, I have never installed anything Japanese-specific.
Now let's look at a few
sample languages:
he - 140 ; Hebrew
ar - 176 ; Arabic
zh_TW - 200 ; Chinese Traditional
zh_CN - 228 ; Chinese Simplified
ja - 255 ; Japanese
ru - 263 ; Russian
es - 287 ; Spanish
fr - 296 ; French
de - 297 ; German
So I can't say that gettext is not good for Japanese.
In fact, it is not bad at all.
Then I sorted all locales by the number of translated
domains and looked at the top 50.
As you can see, Japanese is in the top 10, in 9th place.
Given that the Japanese-speaking population
is relatively small, I can't buy the claim that
gettext is not good for Japanese...
Any more comments?
Rationale for English source strings, copied from another message:
I'll explain it the way I explained it before:
there are many reasons to have English as the core/source
language rather than the developers' native language.
Leaving aside technical points (I'll come to them later),
I'll explain why English source strings are the best choice.
a) Most software is only partially
translated; it is common for translations
to be out of sync with the main development line,
and beta versions usually ship with only limited
translations.
Now imagine beta-testing a program
developed in Japan by programmers who do not
know English well; you try to open a file
and you see one of these messages:
"File this is bad format" // Bad English
"これは不正なファイル形式です" // Good Japanese (actually
                              // translated with Google :-)
I hope it is clearer now. Even those of
us who do not speak English well are already
used to seeing English as the international
language and would be able to handle a partially
translated program.
But with this "natural method" it would be
"all Greek to us".
b) In many cases, especially with GNU gettext,
your customer or a volunteer can take
a dictionary template, sit for an hour
or two with Poedit, and produce an acceptable-quality
translation of a medium-sized program,
then load it, test it, and send it back.
This is what actually happens 95% of the time in the
open-source world, and it happens in the closed-source
world as well.
The reason: it is easy, accessible to everyone,
and you do not have to be a programmer to do
it. You do not even have to be a professional
translator with a degree in linguistics to
translate messages from English into your own language.
That is why it is the best practice,
and that is why all translation systems around use
the same technique.
It is not ridiculous, it is not strange;
it is reality, and it is actually not so bad.
Now the technical reasons:
1. There is a total mess with encodings between different
platforms.
It would be quite hard to rely on the charset of the
localized strings in source code.
On Windows it would likely be one of the 12xx or 9xx codepages;
on Unix it would likely be UTF-8.
And it is actually impossible to make both MSVC
and GCC see the same UTF-8 encoded string
in the same source file,
because MSVC wants a stupid BOM from you and
all other "sane" compilers do not accept a BOM
in sources at all.
2. You want to be able to convert the string to the target
encoding very quickly when it is missing from the dictionary.
So if you use wide strings, and you can't use wide
Unicode strings in sources because of the problem
above, you will have to do a charset conversion,
whereas for English and ASCII it would be just a byte-by-byte copy.
You don't want to do a charset conversion for every
string at runtime.
3. All translation systems (not only gettext) assume
ASCII keys as input.
And you do not want to create yet another
new translation file format, because then you would have to:
a) Port it to other languages, as projects
may want to use a unified system;
b) Develop nice GUI tools like Lokalize
or Poedit (which have been under development for years);
c) Try to convince all the users around that
your reinvented wheel is better than gettext
(and it wouldn't be better).
I hope it is now clear enough.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk