Subject: Re: [boost] [locale] Formal review of Boost.Locale library EXTENDED
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-19 05:08:16
Hello,
You know what, I'm not going to tell
you what the best practices are or what
the good ways to do things are
(see the appendix below);
you'll probably disagree with them.
But let's face the facts.
You wrote:
> Ryou Ezoe Wrote:
>
> Insisting English knowledge is not practical.
> Most Japanese C++ programmers don't know English at all.
> Worse, it will cause serious problem.
>
> I bet Non-english people will use this library with non-english language.
>
> translate("日本語")
>
> [snip]
>
> As a Japanese, I think this library better be rejected.
>
>
Let's appeal to the facts.
I have a simple Debian machine with
various software installed.
I checked how many applications (gettext domains)
have translation files for different languages
(I counted the files under /usr/share/locale/LANG/LC_MESSAGES/*.mo).
This is a typical Debian system with some extra
tools installed for Hebrew. As I don't know
Japanese, I have never installed any Japanese-specific
software.
Now let's see. I took a few
sample languages:
he - 140 ; Hebrew
ar - 176 ; Arabic
zh_TW - 200 ; Chinese Traditional
zh_CN - 228 ; Chinese Simplified
ja - 255 ; Japanese
ru - 263 ; Russian
es - 287 ; Spanish
fr - 296 ; French
de - 297 ; German
So I can't say that gettext is not good for Japanese;
in fact, it is not bad at all.
Then I sorted all locales by the number of translated
domains (applications).
This is the top 50:
297 de
296 fr
287 es
285 sv
272 it
263 ru
263 pl
256 cs
255 ja
253 pt_BR
251 nl
237 da
233 tr
233 ca
228 zh_CN
223 hu
213 nb
212 sk
210 uk
208 fi
206 el
200 zh_TW
199 pt
194 gl
188 ko
187 vi
178 ro
176 ar
175 sr
173 et
169 lt
166 eu
166 bg
163 en_GB
158 sl
158 pa
150 th
140 rw
140 he
140 ga
133 sr@Latn
132 ms
131 mk
131 id
129 ta
127 nn
125 hr
120 dz
As you can see, Japanese is in the top 10, in 9th
place.
Given that the Japanese-speaking population
is relatively small, I can't buy the claim that
gettext is not good for Japanese...
Any more comments?
--------------------------------------------------
Appendix:
=========
Rationale for English source strings, copied from another message:
---------------------------------------------------------
I'll explain it the way I explained it before:
there are many reasons to use English as the
core/source language rather than the developer's
native language.
Leaving aside the technical notes (I'll talk about
them later), I'll explain why English source
strings are the best practice.
a) Most software around is only partially
translated; translations are frequently
out of sync with the main development line,
and beta versions usually come with only
limited translation support.
Now imagine yourself beta-testing a program
developed in Japan by programmers who do not
know English well. You try to open a file
and you see the message:
"File this is bad format" // Bad English
Or you see:
"これは不正なファイル形式です" // Good Japanese: "This is an
// invalid file format" (actually translated with Google :-)
Oops?!
I hope now it is clearer. Even those of us
who do not speak English well are already
used to seeing English as the international
language and can handle partially
translated software.
But with this "natural method" it would be
all Greek to us.
b) In many cases, especially with GNU Gettext,
your customer or just a volunteer can take
a dictionary template, sit for an hour
or two with Poedit, and produce an
acceptable-quality translation of a
medium-sized program.
Load it, test it, and send it back.
This is what actually happens 95% of the time
in the open source world, and it happens in the
closed source world as well.
The reason: it is easy, accessible to everyone,
and you do not have to be a programmer to do
it. You do not even have to be a professional
translator with a degree in linguistics to
translate messages from English to your
own language.
That is why it is the best practice, and
that is why all translation systems around use
the same technique.
It is not ridiculous, it is not strange;
it is reality, and actually not such a bad
reality.
Now technical reasons:
1. There is a total mess with encodings between
different compilers.
It would be quite hard to rely on the charset of
localized strings in sources.
On Windows it would likely be one of the 12xx or
9xx code pages; on Unix it would likely be UTF-8.
And it is actually impossible to make both MSVC
and GCC see the same UTF-8 encoded string
in the same source:
L"שלום-سلام-peace"
because MSVC wants a BOM from you, and all the
other "sane" compilers would not accept a BOM
in sources at all.
2. You want to be able to convert a string to the
target string type very quickly when it is missing
from the dictionary. If you use wide strings
(and you can't put wide Unicode strings in sources
because of the problem above), you have to do a
charset conversion, while for English and ASCII
it is just a byte-by-byte cast.
You don't want to do a charset conversion for
every string at runtime.
3. All translation systems (not only Gettext)
assume ASCII keys as input.
And you do not want to create yet another
translation file format, as you would need to:
a) Port it to other languages, since projects
may want to use a unified system
b) Develop nice GUI tools like Lokalize
or Poedit (which have been under development
for years)
c) Try to convince all the users around that
your reinvented wheel is better than gettext
(and it wouldn't be better)
I hope now it is clear enough.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk