Subject: Re: [boost] [locale] Formal review of Boost.Locale library EXTENDED
From: Ryou Ezoe (boostcpp_at_[hidden])
Date: 2011-04-19 05:54:05


There is a lot of software translated into Japanese, because many
Japanese translators have contributed to free software.
gettext works for translating English software into other languages
such as Japanese.
But you can't use gettext in Japanese software.

Using English as the primary language means it's not Japanese software anymore.

Most Japanese cannot even write "File this is bad format".
We don't even know what "file" means.
Just as you don't know what ファイル means without using machine translation.
You can look up the meaning in a dictionary, but it is still a string of
completely unfamiliar symbols.
Just as ファイル is to you.
ファイル means file.
Now you know what ファイル means.
Does that make you able to write 「不正なファイル形式です」?
I don't think so.

How can we Japanese write code, then?
I think we don't read a function name like fopen as English.
It's just a unique identifier for a function that does ファイルを開く (open the file).

Even if all programmers used English (which is impossible), it's not only
programmers who write the text in software.
Much of the text in software is written by non-programmers.
Do you force English on them too?
If by some miracle you somehow achieved that, why would we need
translation in that ideal world?
Everybody uses English. No translation is needed, is it?

This library can be used only by people whose language can be
expressed using the basic source character set.
Isn't that a flaw, considering this is a localization library?

You're saying that, in order to localize software, you need to
abandon your own language from the start.
That will never work.

On Tue, Apr 19, 2011 at 6:08 PM, Artyom <artyomtnk_at_[hidden]> wrote:
> Hello,
>
> You know what, I'm not going to tell
> you what are best practices and what
> are the good ways to do things.
>
> (See appendix below)
>
> You'll probably disagree with them,
>
> But let's face the facts.
>
> You are saying:
>
>
>> Ryou Ezoe wrote:
>>
>> Insisting on English knowledge is not practical.
>> Most Japanese C++ programmers don't know English at all.
>> Worse, it will cause serious problems.
>>
>> I bet non-English people will use this library with non-English languages.
>>
>> translate("日本語")
>>
>> [snip]
>>
>> As a Japanese, I think this library had better be rejected.
>>
>>
>
>
> Let's appeal to facts.
>
> I have a simple Debian machine with
> various software installed.
>
> I've checked how many applications (gettext domains)
> have translation files for different languages
>
> (counted the number of files matching /usr/share/locale/LANG/LC_MESSAGES/*.mo)
>
> This is a typical Debian system with some more
> tools installed for Hebrew. As I don't know
> Japanese, I have never installed any Japanese-specific
> software.
>
> Now let's see, I've taken a few
> sample languages:
>
> he    - 140 ; Hebrew
> ar    - 176 ; Arabic
> zh_TW - 200 ; Chinese Traditional
> zh_CN - 228 ; Chinese Simplified
> ja    - 255 ; Japanese
> ru    - 263 ; Russian
> es    - 287 ; Spanish
> fr    - 296 ; French
> de    - 297 ; German
>
>
> So I can't say that gettext is not good for Japanese.
>
> In fact it is not bad at all.
>
> Then I sorted all locales by number of translated
> domains (applications).
>
> And this is the top 50:
>
> 297 de
> 296 fr
> 287 es
> 285 sv
> 272 it
> 263 ru
> 263 pl
> 256 cs
> 255 ja
> 253 pt_BR
> 251 nl
> 237 da
> 233 tr
> 233 ca
> 228 zh_CN
> 223 hu
> 213 nb
> 212 sk
> 210 uk
> 208 fi
> 206 el
> 200 zh_TW
> 199 pt
> 194 gl
> 188 ko
> 187 vi
> 178 ro
> 176 ar
> 175 sr
> 173 et
> 169 lt
> 166 eu
> 166 bg
> 163 en_GB
> 158 sl
> 158 pa
> 150 th
> 140 rw
> 140 he
> 140 ga
> 133 sr@Latn
> 132 ms
> 131 mk
> 131 id
> 129 ta
> 127 nn
> 125 hr
> 120 dz
>
>
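The counting and ranking described above could be reproduced roughly as follows. This is only a sketch assuming C++17 and std::filesystem; the helper names count_mo_files and rank_locales are hypothetical and are not part of gettext or Boost.Locale, and the original numbers were most likely produced with ordinary shell tools rather than with code like this.

```cpp
#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: count the compiled gettext catalogs (*.mo) for one
// language, i.e. the regular files in <locale_root>/<lang>/LC_MESSAGES/.
// Returns 0 if that directory does not exist or cannot be read.
std::size_t count_mo_files(const fs::path& locale_root, const std::string& lang) {
    const fs::path dir = locale_root / lang / "LC_MESSAGES";
    std::error_code ec;
    if (!fs::is_directory(dir, ec)) return 0;

    std::size_t n = 0;
    for (const auto& entry : fs::directory_iterator(dir, ec)) {
        std::error_code entry_ec;
        if (entry.is_regular_file(entry_ec) && entry.path().extension() == ".mo") ++n;
    }
    return n;
}

// Hypothetical helper: sort (language, count) pairs by count, highest first.
// stable_sort keeps languages with equal counts in their input order.
std::vector<std::pair<std::string, std::size_t>>
rank_locales(std::vector<std::pair<std::string, std::size_t>> counts) {
    std::stable_sort(counts.begin(), counts.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });
    return counts;
}
```

Calling count_mo_files("/usr/share/locale", "ja") for each language directory and passing the resulting pairs to rank_locales would give a table of the same shape as the one above; the actual numbers depend on which packages are installed.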
> As you can see, Japanese is in the top 10, in 9th
> place.
>
> Given that the Japanese-speaking population
> is relatively small, I can't buy the claim that
> gettext is not good for Japanese...
>
> Any more comments?
>
> --------------------------------------------------
>
>
>
> Appendix:
> =========
>
> Rationale for English strings, copied from another message:
> ---------------------------------------------------------
>
> I'll explain it the way I explained it before.
>
> There are many reasons to have English as the core/source
> language rather than the native language of
> the developer.
>
> Leaving aside the technical notes (I'll talk about them later),
> I'll explain why English source strings are the best
> practice.
>
>
> a) Most software around is only partially
>   translated, and the translations are frequently
>   out of sync with the main development line.
>
>   Beta versions usually come with only limited
>   translation support.
>
>   Now imagine you are beta testing a program
>   developed in Japan by programmers who do not
>   know English well, and you try to open a file
>   and you see the message:
>
>      "File this is bad format" // Bad English
>
>   Or you see:
>
>      "これは不正なファイル形式です。" // Good Japanese (actually
>                                // translated with Google :-)
>
>   Oops?!
>
>   I hope it is clearer now. Even those of us
>   who do not speak English well are already
>   used to seeing English as the international
>   language and would be able to handle partially
>   translated software.
>
>   But with this "natural method" it would be
>   "all Greek to us".
>
> b) In many cases, especially when it comes to GNU Gettext,
>   your customer or just a volunteer can take
>   a dictionary template, sit for an hour
>   or two with "Poedit" and produce an acceptable-quality
>   translation of a medium-sized program.
>
>   Load it, test it and send it back.
>
>   This is what actually happens 95% of the time in the open
>   source world, and it happens in the closed source
>   world as well.
>
>   Reason: it is easy, accessible to everyone,
>   and you do not have to be a programmer to do
>   this. You do not even have to be a professional
>   translator with a degree in linguistics to
>   translate messages from English to your
>   own language.
>
> That is why it is the best practice,
> that is why all translation systems around use
> the same technique for this.
>
> It is not ridiculous, it is not strange,
> it is reality, and actually not such a bad
> reality.
>
>
> Now technical reasons:
>
> 1. There is a total mess with encodings between different
>   compilers.
>
>   It would be quite hard to rely on the charset of the
>   localized strings in the source.
>
>   For Windows it would likely be one of the 12xx or 9xx codepages.
>   For Unix it would likely be UTF-8.
>
>   And it is actually impossible to make both MSVC
>   and GCC see the same UTF-8 encoded string
>   in the same source:
>
>       L"שלום-سلام-peace"
>
>   Because MSVC would want a stupid BOM from you and
>   all other "sane" compilers would not accept a BOM
>   in sources at all.
>
> 2. You want to be able to convert the string to the target
>   string very fast if it is missing from the dictionary.
>   If you use wide strings, and you can't use wide
>   Unicode strings in sources because of the problem
>   above, you will have to do charset conversion,
>
>   whereas for English and ASCII it would be just a
>   byte-by-byte cast.
>
>   You don't want to do charset conversion for every
>   string at runtime.
>
> 3. All translation systems (not only Gettext) assume
>   ASCII keys as input.
>
>   And you do not want to create yet another
>   new translation file format, as you would
>   need to:
>
>   a) Port it to other languages, as projects
>      may want to use a unified system
>
>   b) Develop nice GUI tools like Lokalize
>      or Poedit (which have been under development
>      for years)
>
>   c) Try to convince all users around that
>      your reinvented wheel is better than gettext
>
>      (and it wouldn't be better)
>
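To make point 2 above concrete, here is a minimal sketch of the "byte-by-byte cast" fallback for an ASCII key that has no translation. The function names is_ascii and widen_ascii are hypothetical, chosen for illustration only; this is not Boost.Locale's actual implementation, just an instance of the idea that a pure-ASCII key can be widened without any charset conversion table.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: true if every byte of s is 7-bit ASCII.
bool is_ascii(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}

// Hypothetical fallback for a key that is missing from the dictionary:
// widen each byte to wchar_t one by one. This is only valid for ASCII keys,
// so the precondition is checked with an assertion.
std::wstring widen_ascii(const std::string& key) {
    assert(is_ascii(key));
    return std::wstring(key.begin(), key.end());
}
```

A key such as "これは不正なファイル形式です。" stored as UTF-8 would fail the is_ascii check, so turning it into a wide string would need a real charset conversion that knows the source file's encoding, which is the cost the point above wants to avoid.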
>
> I hope now it is clear enough.

-- 
Ryou Ezoe
