Subject: Re: [boost] [Locale] Preview of 3rd version
From: Klaim (mjklaim_at_[hidden])
Date: 2010-09-13 03:53:00


On Sun, Sep 12, 2010 at 06:57, Artyom <artyomtnk_at_[hidden]> wrote:

> >
> > A. Dictionary source:
> >
> > Currently, if my understanding is correct, the boost::locale library will
> > always assume that dictionary files are on the (standard?) filesystem.
>
>
> It would be easy to fix.
>

Great!

>
> > For example, OGRE (a graphics rendering engine) allows loading textures and
> > models from anywhere by providing such a mechanism.
>
> A small note: unlike textures and other game resources, text is usually
> very small by nature. Also, games (client side) are usually played in one
> language only, so I really doubt that translations should be treated the
> same way as other resources. For example, the Russian translations of
> almost all the software on my Debian PC take about 12 MB, and that is
> about 250 applications.
>

I agree in principle, but in practice it really depends on the application
or game. Some types of games rely on a lot of text that has to be loaded only
when really required and/or from external sources. Even 12 MB is huge in
some cases; it's really a matter of technical budget.
Anyway, I agree that it's not the most common case. MMOs and some games/apps
based on external modules are "the exception", you could say, and they often
require a large amount of memory to run anyway.
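Point A asks for dictionaries that do not have to come from the filesystem. The following is a minimal standard C++17 sketch of such a hook, assuming a source object that maps a dictionary file name to raw bytes. `DictionarySource`, `FileSource` and `MemorySource` are hypothetical names for illustration, not part of the Boost.Locale API discussed in this thread.

```cpp
#include <fstream>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: turns a dictionary file name
// (e.g. "ru/LC_MESSAGES/my_cool_app.mo") into raw bytes.
class DictionarySource {
public:
    virtual ~DictionarySource() = default;
    // Returns std::nullopt when the file is not available from this source.
    virtual std::optional<std::vector<char>> load(const std::string& name) const = 0;
};

// Default behaviour: read the file from a directory on the real filesystem.
class FileSource : public DictionarySource {
public:
    explicit FileSource(std::string root) : root_(std::move(root)) {}

    std::optional<std::vector<char>> load(const std::string& name) const override {
        std::ifstream in(root_ + "/" + name, std::ios::binary);
        if (!in) return std::nullopt;
        return std::vector<char>(std::istreambuf_iterator<char>(in),
                                 std::istreambuf_iterator<char>());
    }

private:
    std::string root_;
};

// Game-style source: files already extracted from an archive, a network
// stream, etc., and registered in memory by the application.
class MemorySource : public DictionarySource {
public:
    void add(const std::string& name, std::vector<char> bytes) {
        files_[name] = std::move(bytes);
    }

    std::optional<std::vector<char>> load(const std::string& name) const override {
        auto it = files_.find(name);
        if (it == files_.end()) return std::nullopt;
        return it->second;  // copy of the stored bytes
    }

private:
    std::map<std::string, std::vector<char>> files_;
};
```

A game could hand the loader a `MemorySource` filled from its own resource system, while desktop applications keep a `FileSource` pointing at something like `/usr/share/locale`.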

> >
> > B. Dictionary loading control
> >
> > This is about "when is a dictionary loaded in memory and usable without
> > having to process something first?".
> > If my understanding is correct, boost::locale will automatically load the
> > dictionary when needed? I guess it will load the dictionary when the
> > corresponding language/domain is invoked?
>
> The dictionaries are loaded when locale is created. Usually locale
> generation is not cheap, so you create it only once at startup and use it.
>
> i.e., when you call
>
>
> // needs #include <boost/locale.hpp>
> boost::locale::generator gen;
> gen.add_messages_path("/usr/share/locale");
> gen.add_messages_domain("my_cool_app");
> std::locale mine = gen("ru_RU.UTF-8");
>
> all dictionaries of the application my_cool_app for the ru_RU locale are
> loaded.
>
> >
> > Anyway, some ways to manually load and unload dictionaries (or
> > dictionaries related to a locale?) would help control the application's
> > performance/flow.
>
> When you destroy the locale, the dictionary is destroyed.

Excellent.
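The lifetime described here (dictionaries loaded when the locale is generated, freed when it is destroyed) matches how `std::locale` manages facets: a facet constructed with `refs == 0` is reference-counted by the locales that contain it and deleted with the last one. The sketch below shows that lifetime in plain standard C++17 with a hypothetical `DictionaryFacet`; it illustrates the mechanism only and is not Boost.Locale's actual facet class.

```cpp
#include <locale>
#include <map>
#include <string>
#include <utility>

// Number of dictionaries currently alive, so the lifetime can be observed.
inline int live_dictionaries = 0;

// Hypothetical facet holding one loaded dictionary (msgid -> translation).
class DictionaryFacet : public std::locale::facet {
public:
    static std::locale::id id;  // required by std::use_facet / std::has_facet

    explicit DictionaryFacet(std::map<std::string, std::string> entries)
        : std::locale::facet(0),  // refs == 0: the owning locales delete it
          entries_(std::move(entries)) {
        ++live_dictionaries;
    }

    // Returns the translation, or the original text if it is missing.
    std::string translate(const std::string& msg) const {
        auto it = entries_.find(msg);
        return it == entries_.end() ? msg : it->second;
    }

protected:
    // Facets are destroyed by std::locale, never directly by user code.
    ~DictionaryFacet() override { --live_dictionaries; }

private:
    std::map<std::string, std::string> entries_;
};

std::locale::id DictionaryFacet::id;

// "Generating" the locale is the expensive step that loads the dictionary.
std::locale make_locale_with_dictionary() {
    std::map<std::string, std::string> entries;
    entries["Hello"] = "Привет";
    return std::locale(std::locale::classic(),
                       new DictionaryFacet(std::move(entries)));
}
```

Copying the locale only shares the facet, so a locale can be passed around or cached freely; the dictionary goes away when the last copy does.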

> > For example, most games first load all "whole app life" resources on
> > startup, then load "world-chunk-specific" resources each time they need
> > them, and unload those resources at some point without exiting the whole
> > app,
>
> You can bind world-chunk-specific translation strings to another domain,
> load it with a locale, and then destroy the locale, cache it, or whatever
> you need.
>
> If you want good dictionary-loading performance, use a UTF-8 locale and
> UTF-8 dictionaries; then loading is as fast as pointing into a memory chunk
> at a specific position.
>
>
Agreed, I'm already on this path. (After reading your(?) answer on
StackOverflow some months ago, I banished wide strings and made everything
UTF-8 based.)
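The fast path for UTF-8 dictionaries comes from the layout of GNU gettext `.mo` files: a 28-byte header, two tables of (length, offset) pairs, and the string data, with the original strings sorted so they can be binary-searched. The sketch below is standard C++17 and assumes a little-endian file already read or memory-mapped into a buffer; it ignores the optional hash table, plural forms and contexts. It shows why no per-string conversion or copying is needed when both the file and the application use UTF-8: lookups return `std::string_view`s pointing straight into the buffer. `MoView` is a hypothetical name, not a Boost.Locale class.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>  // a std::string is a convenient owner for the file bytes
#include <string_view>

// Read-only view over an in-memory .mo file. The buffer must outlive the view.
class MoView {
public:
    explicit MoView(std::string_view data) : data_(data) {
        if (data_.size() < 28 || u32(0) != 0x950412deu)
            throw std::runtime_error("not a little-endian .mo file");
        count_ = u32(8);          // number of strings
        originals_ = u32(12);     // offset of the original-strings table
        translations_ = u32(16);  // offset of the translated-strings table
    }

    // Binary search over the sorted original strings; nothing is copied.
    std::optional<std::string_view> find(std::string_view msgid) const {
        std::uint32_t lo = 0, hi = count_;
        while (lo < hi) {
            std::uint32_t mid = lo + (hi - lo) / 2;
            std::string_view key = entry(originals_, mid);
            if (key == msgid) return entry(translations_, mid);
            if (key < msgid) lo = mid + 1;
            else hi = mid;
        }
        return std::nullopt;
    }

private:
    // Reads a little-endian 32-bit word at byte offset off.
    std::uint32_t u32(std::size_t off) const {
        if (off + 4 > data_.size()) throw std::runtime_error("truncated .mo file");
        auto p = reinterpret_cast<const unsigned char*>(data_.data() + off);
        return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
               std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
    }

    // Table entry i is a (length, offset) pair of 32-bit words.
    std::string_view entry(std::uint32_t table, std::uint32_t i) const {
        std::size_t base = std::size_t(table) + 8 * std::size_t(i);
        std::size_t len = u32(base);
        std::size_t off = u32(base + 4);
        if (off + len > data_.size()) throw std::runtime_error("bad string entry");
        return data_.substr(off, len);
    }

    std::string_view data_;
    std::uint32_t count_ = 0, originals_ = 0, translations_ = 0;
};
```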

> > The module structure of my application and memory limitations make it
> > impossible to load all modules at startup; that would be too much, and I
> > don't even know how many modules will be available some time after the
> > release. Manual control over when to load/unload what is required for my
> > current "big" game.
> >
> > So some manual control on this side would be of great help.
> > Maybe some kind of strategy could be provided by the user?
>
> Just bind separate chunks to separate domains. But IMHO, just load all
> dictionaries for the user's locale at once. For example, all the
> translation strings for Evolution (the biggest dictionary I have found)
> take about 500 KB, which is about the size of one hi-res texture.
>
> So... such things should be kept in proportion.
>
>
You're still assuming here that graphics are the most expensive resource in
my example, but they are not. I agree with the general advice; it's just not
practical in my specific case. (In fact, it's the first time I've been in a
situation where it's not a good idea to load everything first...)
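Binding each world chunk to its own domain, as suggested above, can be pictured as a cache keyed by domain with explicit load and unload calls. The following standard C++17 sketch uses hypothetical names (`DomainCache`, `Catalog`, `Loader`) and a user-supplied loader callback standing in for reading and parsing the chunk's `<domain>.mo` file. With Boost.Locale itself, the equivalent effect would come from generating and destroying locales that include the chunk's domain, as described earlier in the thread.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

using Catalog = std::map<std::string, std::string>;         // msgid -> translation
using Loader = std::function<Catalog(const std::string&)>;  // domain -> catalog

// Hypothetical cache of per-domain catalogs with explicit load/unload.
class DomainCache {
public:
    explicit DomainCache(Loader loader) : loader_(std::move(loader)) {}

    // Loads a domain once; further calls are no-ops while it stays loaded.
    void load(const std::string& domain) {
        if (domains_.find(domain) == domains_.end())
            domains_.emplace(domain, loader_(domain));
    }

    // Frees the catalog, e.g. when the player leaves a world chunk.
    void unload(const std::string& domain) { domains_.erase(domain); }

    bool is_loaded(const std::string& domain) const {
        return domains_.find(domain) != domains_.end();
    }

    // Returns the untranslated text when the domain or message is missing.
    std::string translate(const std::string& domain, const std::string& msg) const {
        auto d = domains_.find(domain);
        if (d == domains_.end()) return msg;
        auto m = d->second.find(msg);
        return m == d->second.end() ? msg : m->second;
    }

private:
    Loader loader_;
    std::map<std::string, Catalog> domains_;
};
```

The "whole app life" domain would be loaded once at startup and never unloaded, while chunk domains follow the chunk lifecycle.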

>
> >
> > C. Dictionary format
> >
> > You have already pointed out a way to provide a custom format for
> > dictionaries, so this is good from my point of view.
> > A lot of companies use simple Excel/CSV files to manage localization
> > texts, which makes it simple to hand texts over to localization companies
> > for translation.
>
> Ohhh dear God!
> NEVER, NEVER do such things!
>
> This is exactly why 99% of developers are aware of only 1% of the issues.
>
> Example: plural forms
>
> See:
> <http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#6f4922f45568161a8cdf4ad2299f6d23>
>
>
> Translating text is much more complicated than taking one string and
> matching it to another.
>
> Best: use tools like Lokalize, Poedit, or others; they do a much better job
> and are much more helpful for translators.
>
> That is exactly why, when it comes to localization, you should never give
> developers too much freedom, as they would do a crappy job. Always use a
> library written by an expert.
>
>
I fully agree.
I was just pointing out that companies already using tools that their
non-technical, non-translation-expert staff are used to (whatever the
organisation of an Excel sheet) could use your library while letting those
non-expert people keep doing their work, without losing time learning a new
tool.
Some translation companies even require you to provide data in Excel files.

I'm in a position where I can choose whatever translation tools I want, so
I won't use Excel files.
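To make the plural-forms point concrete: English needs two forms, but Russian needs three, chosen by the rule in the `Plural-Forms` header commonly used for Russian `.po` files, `nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);`. Below is a sketch of that rule in C++, plus a hypothetical helper `russian_file_word` that picks the Russian word for "file". A one-string-to-one-string spreadsheet column has no place for the third form or for the rule itself.

```cpp
#include <string>

// Russian plural rule from the usual Plural-Forms header:
// 0 -> 1, 21, 31, ...   1 -> 2-4, 22-24, ...   2 -> everything else (0, 5-20, 111, ...)
int russian_plural_index(unsigned long n) {
    if (n % 10 == 1 && n % 100 != 11) return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
    return 2;
}

// Hypothetical example: one English pair ("file" / "files") needs three Russian forms.
std::string russian_file_word(unsigned long n) {
    static const char* const forms[3] = {"файл", "файла", "файлов"};
    return forms[russian_plural_index(n)];
}
```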

> > - the ids have to be strings.
>
> Ok... Yes, they have to.
>
> > - having the user provide custom ids would help manage tools/performance
> > on their side
>
> NEVER, NEVER, NEVER use non-string ids for localization.
>
> Things like get_resource_string(HELLO_WORLD_CONSTANT)
> are among the most terrible solutions in the world; they lead
> to very hard development work and very bad localization results.
>
> Translation should be done from "context + string" to "string" and never
> through some kind of other ids.
>
> I'm a performance freak (see CppCMS), but I still think that using strings
> is fast enough and has a much bigger advantage than the few microseconds
> you could gain.
>
> If you want performance, do profiling; I doubt you'll even find
> translation of string ids to be a bottleneck.
>
>
That depends on how strings are used and on their size, but you're right for
most uses. I've worked on some hardware where that was not the case, but I
agree it's not common.

> > - it assumes that the string id will carry some context information
> > allowing the right localization to be selected.
>
> Yes
>
> > It looks like a hack to me because I
> > think each unique text should have a unique id.
>
> No! Human languages, unlike programming languages, are context-dependent
> and ambiguous; the same string may be translated into two different
> strings depending on context.
>
> Small but clear example:
>
>
> http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#1f0e3dad99908345f7439f8ffabdffc4
>
>
I was thinking of more cultural/language-based examples, where it is not
only context that makes translation hard.
For example, some expressions that exist in some languages don't exist in
others and only have equivalents that can be used in the given context.
Now, as you point out:

>
> > The domain string seems to be some hack to fix this case.
>
> You have a misconception: domains are actually application/module names.
>
>
So if domains are module names, how do you differentiate two sentences that
are identical in one language in two different contexts, but differ in
another language in those same contexts? (I've seen cases like that, but
I'm not an expert, and I guess I'll have to search for an example...)

From what I understand, I would have to add context information other than
the module name to each text?
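This question is what gettext's message context (`msgctxt`) addresses, and it is what "context + string" translation means in practice: the context lives next to each message, inside the same domain. In compiled `.mo` files a context-qualified entry is stored under the key `context` + `'\x04'` (the EOT character) + `msgid`. A minimal standard C++ sketch of such a catalog follows; `make_key` and `ContextCatalog` are hypothetical names, and the example strings are illustrative.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// gettext convention: a message with a context is keyed as context + '\x04' + msgid.
// Simplification: an empty context is treated here as "no context".
std::string make_key(const std::string& context, const std::string& msgid) {
    return context.empty() ? msgid : context + '\x04' + msgid;
}

class ContextCatalog {
public:
    void add(const std::string& context, const std::string& msgid, std::string msgstr) {
        entries_[make_key(context, msgid)] = std::move(msgstr);
    }

    // Same source text, different translation depending on the context;
    // falls back to the untranslated msgid when nothing matches.
    std::string translate(const std::string& context, const std::string& msgid) const {
        auto it = entries_.find(make_key(context, msgid));
        return it == entries_.end() ? msgid : it->second;
    }

private:
    std::unordered_map<std::string, std::string> entries_;
};
```

For example, English "Open" on a button ("Открыть", the verb) and "Open" as a status ("Открыт", the adjective) get different Russian strings, although the source text is identical.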

> For example, for Excel you would have an "excel.mo" dictionary, and the
> domain is "excel".
>
> Rationale: usually the dictionaries of many applications are kept in the
> same place, i.e.
>
> /usr/share/locale/ru/LC_MESSAGES/excel.mo
> /usr/share/locale/ru/LC_MESSAGES/word.mo
>
> > would prefer some way to get a unique id from each text, provided by the
> > user. As boost::locale follows the gettext philosophy, I don't see how it
> > would be possible to change this without changing the backend.
>
> You should not change this, not for coding reasons, but rather for
> linguistic reasons and the quality of language support.
>
>
Ok, I think you're the expert here so I'll follow your advice.
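The directory layout in the rationale above follows the usual gettext convention `<path>/<locale>/LC_MESSAGES/<domain>.mo`, where a full locale name such as `ru_RU.UTF-8` is also tried in a shortened form. A simplified sketch of how a loader might compute the candidate files for one domain; `candidate_paths` is a hypothetical helper, and real implementations (GNU gettext, Boost.Locale) handle more cases, such as `@modifier` variants, than this does.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: candidate .mo files for one domain, most specific first,
// following the <root>/<locale>/LC_MESSAGES/<domain>.mo convention.
// Simplification: the encoding (".UTF-8") and any "@modifier" are dropped,
// then "lang_COUNTRY" and "lang" are tried.
std::vector<std::string> candidate_paths(const std::string& root,
                                         const std::string& locale_name,
                                         const std::string& domain) {
    std::string name = locale_name.substr(0, locale_name.find_first_of(".@"));
    std::vector<std::string> names{name};
    std::string::size_type underscore = name.find('_');
    if (underscore != std::string::npos)
        names.push_back(name.substr(0, underscore));

    std::vector<std::string> paths;
    for (const std::string& n : names)
        paths.push_back(root + "/" + n + "/LC_MESSAGES/" + domain + ".mo");
    return paths;
}
```

With root `/usr/share/locale`, locale `ru_RU.UTF-8` and domain `excel`, this yields `/usr/share/locale/ru_RU/LC_MESSAGES/excel.mo` followed by `/usr/share/locale/ru/LC_MESSAGES/excel.mo`.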

>
> >
> >
> > I planned to write my own solution for my game's localisation, which has
> > a somewhat complex user-provided-module-based structure, but if
> > boost::locale provides a solution for the points I've listed, then I can
> > plug it into my game without a problem, and that will simplify a lot of
> > things (assuming performance is adequate for my needs). For the moment,
> > I'll keep following how boost::locale develops until I reach the point
> > where I need to make a final decision.
>
>
> One additional point to remember:
>
> localization is not only about translating strings;
> translating strings is only one important but small part of it.
>

I'm aware of that, and text translation is my last problem on this side
(thanks to the context of the game and some other libs that make displaying
any text easier), but it's always good to be reminded, thanks.

>
> Regards,
> Artyom


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk