Subject: Re: [boost] [locale] Formal review of Boost.Locale library EXTENDED
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-19 05:08:16


Hello,

You know what, I'm not going to tell you what the best practices and the good ways of doing things are (see the appendix below); you'll probably disagree with them anyway. But let's face the facts. You are telling:

Ryou Ezoe wrote:
> Insisting English knowledge is not practical.
> Most Japanese C++ programmers don't know English at all.
> Worse, it will cause serious problem.
>
> I bet Non-english people will use this library with non-english language.
>
> translate("日本語")
>
> [snip]
>
> As a Japanese, I think this library better be rejected.

Let's appeal to facts. I have a simple Debian machine with various software installed. I checked how many applications (gettext domains) have translation files for different languages, by counting the files under /usr/share/locale/LANG/LC_MESSAGES/*.mo. This is a typical Debian system with some extra tools installed for Hebrew; as I don't know Japanese, I have never installed any Japanese-specific software.

Now let's see. I took a few samples of languages:

he    - 140 ; Hebrew
ar    - 176 ; Arabic
zh_TW - 200 ; Chinese Traditional
zh_CN - 228 ; Chinese Simplified
ja    - 255 ; Japanese
ru    - 263 ; Russian
es    - 287 ; Spanish
fr    - 296 ; French
de    - 297 ; German

So I can't say that gettext is not good for Japanese. In fact, it is not bad at all.

Then I sorted all locales by the number of translated domains (applications). This is the top 50:

297 de
296 fr
287 es
285 sv
272 it
263 ru
263 pl
256 cs
255 ja
253 pt_BR
251 nl
237 da
233 tr
233 ca
228 zh_CN
223 hu
213 nb
212 sk
210 uk
208 fi
206 el
200 zh_TW
199 pt
194 gl
188 ko
187 vi
178 ro
176 ar
175 sr
173 et
169 lt
166 eu
166 bg
163 en_GB
158 sl
158 pa
150 th
140 rw
140 he
140 ga
133 sr@Latn
132 ms
131 mk
131 id
129 ta
127 nn
125 hr
120 dz

As you can see, Japanese is in the top 10, in 9th place. Given that the Japanese-speaking population is relatively small, I can't buy the claim that gettext is not good for Japanese...

Any more comments?

--------------------------------------------------

Appendix:
=========

Copied rationale from another message for English source strings:
------------------------------------------------------------------

I'll explain it the way I explained it before: there are many reasons to have English as the core/source language rather than the developer's native language. Leaving aside the technical notes (I'll talk about them later), I'll explain why English source strings are the best practice.

a) Most software around is only partially translated; the translations are frequently out of sync with the main development line, and beta versions usually come with only limited translation support. Now imagine yourself as a beta tester of a program developed in Japan by programmers who do not know English well. You try to open a file and you see the message:

   "File this is bad format"        // Bad English

Or you see:

   "これは不正なファイル形式です。"  // Good Japanese: "This is an invalid file format."
                                    // (actually translated with Google :-)

Oops?! I hope it is clearer now. Even those of us who do not speak English well are already used to seeing English as the international language and can handle partially translated software, but with this "natural method" it would be "all Greek to us".

b) In many cases, especially with GNU Gettext, your customer or just a volunteer can take a dictionary template, sit for an hour or two with Poedit, and produce an acceptable-quality translation of a medium-size program, then load it, test it and send it back. This actually happens 95% of the time in the open source world, and it happens in the closed source world as well. The reason: it is easy, accessible to everyone, and you do not have to be a programmer to do it. You do not even have to be a professional translator with a degree in linguistics to translate messages from English into your own language.

That is why it is the best practice, and that is why all translation systems around use the same technique. It is not ridiculous, it is not strange; it is reality, and actually it is not such a bad reality.
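For illustration, a minimal Boost.Locale setup along these lines might look as follows. This is only a sketch: the "myapp" domain name, the messages path and the message text are placeholders, not code from any real project.

    #include <boost/locale.hpp>
    #include <iostream>

    int main()
    {
        // Load gettext catalogs ("myapp" domain is a placeholder) and
        // pick up the user's locale from the environment.
        boost::locale::generator gen;
        gen.add_messages_path("/usr/share/locale");
        gen.add_messages_domain("myapp");
        std::locale::global(gen(""));
        std::cout.imbue(std::locale());

        // The English source string is both the lookup key in the *.mo
        // catalog and the fallback text shown when no translation is
        // installed yet.
        std::cout << boost::locale::translate("File format is invalid")
                  << std::endl;
    }

If the Japanese catalog is missing or out of date, the user still sees readable English instead of a broken half-translation.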
Now the technical reasons:

1. There is a total mess with encodings between different compilers, so it would be quite hard to rely on the character set of localized strings in the sources. For Windows it would likely be one of the 12xx or 9xx codepages; for Unix it would likely be UTF-8. And it is actually impossible to make both MSVC and GCC see the same UTF-8 encoded string in the same source:

   L"שלום-سلام-peace"

   because MSVC would want a stupid BOM from you, and all other "sane" compilers would not accept a BOM in sources at all.

2. You want to be able to convert a string to the target string very fast if it is missing from the dictionary. Since you can't use wide Unicode strings in sources because of the problem above, using wide source strings would force a charset conversion, whereas for English and ASCII it is just a byte-by-byte cast. You don't want to do a charset conversion for every string at runtime (see the sketch at the end of this message).

3. All translation systems (not only Gettext) assume ASCII keys as input, and you do not want to create yet another translation file format, because you would need to:

   a) port it to other languages, as projects may want to use a unified system;
   b) develop nice GUI tools like Lokalize or Poedit (which have been under development for years);
   c) try to convince all the users around that your reinvented wheel is better than gettext (and it wouldn't be).

I hope now it is clear enough.
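To make point 2 above concrete, here is a rough sketch (my illustration, not code from the library) of why an ASCII key makes the missing-translation fallback essentially free:

    #include <string>

    // If a key is missing from the dictionary, an ASCII/English source
    // string can be turned into a wide string by a plain byte-for-byte
    // cast -- no charset conversion at runtime. (Hypothetical helper,
    // not part of any library's public API.)
    std::wstring ascii_fallback(const char* key)
    {
        std::wstring out;
        for (const char* p = key; *p; ++p)
            out += static_cast<wchar_t>(static_cast<unsigned char>(*p));
        return out;
    }

A non-ASCII key (UTF-8 or some codepage) would instead need a real charset conversion for every untranslated string.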

