Subject: Re: [boost] [Locale] Preview of 3rd version
From: Artyom (artyomtnk_at_[hidden])
Date: 2010-09-12 00:57:53


>
> A. Dictionary source :
>
> Currently, if I my understanding is correct, the boost::locale library will
> always assume that dictionary files are on the (standard?) filesystem.

It would be easy to fix.

> For example OGRE (graphic rendering engine) allow loading textures and
> models from anywhere by providing such a mechanism.

A small note: unlike textures and other game resources, the text itself is
usually very small by nature, as it is only text. Games (client side) are
usually played in one language only, so I really doubt that translations
should be treated the same way as other resources. For example, the Russian
translations of almost all software on my Debian PC take about 12MB, and
that covers about 250 applications.

>
> B. Dictionary loading control
>
> This is about "when is a dictionary loaded in memory and usable without
> having to process something first?".
> If my understanding is correct, boost::locale will automatically load the
> dictionary when needed? I guess it will load the dictionary when the
> corresponding language/domain will be invoked?

The dictionaries are loaded when the locale is created. Locale generation is
usually not cheap, so you create the locale only once at startup and reuse it.

i.e. when you call

   generator gen;
   gen.add_messages_path("/usr/share/locale");  // where the catalogs live
   gen.add_messages_domain("my_cool_app");      // which application's catalog
   std::locale mine = gen("ru_RU.UTF-8");       // dictionaries are loaded here

all dictionaries for the application my_cool_app for the ru_RU locale are loaded.

>
> Anyway, some ways to manually load and unload dictionaries (or dictionaries
> related to a locale?) would help controlling the application
> performance/flow.

When you destroy the locale, the dictionary is destroyed with it.

> For example most games first load all "whole app life"
> resources on startup, then will load "world-chunk-specific" resources each
> time it need it and will unload those resources at some point without
> exiting the whole app,

You can bind world-chunk-specific translation strings to a separate domain and
load it with the locale, then destroy the locale, or cache it, or whatever
suits you.
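
As a rough illustration of the "load it, then destroy it or cache it" idea,
here is a small standard-library-only sketch. It does not use Boost.Locale:
Dictionary and DictionaryCache are hypothetical names invented for this
example, and the "loading" step is only a placeholder. The cache keeps weak
references, so a per-chunk dictionary is freed as soon as the last user lets
go of it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a loaded message catalog (not a Boost.Locale type).
struct Dictionary {
    std::string domain;
    std::map<std::string, std::string> messages;
};

// Hypothetical cache: callers share ownership, the cache only observes.
class DictionaryCache {
public:
    // Returns the already loaded dictionary for the domain, or loads a new one.
    std::shared_ptr<Dictionary> acquire(const std::string& domain) {
        assert(!domain.empty());
        if (auto live = cache_[domain].lock()) {
            return live;  // still in use somewhere, so reuse it
        }
        auto fresh = std::make_shared<Dictionary>();
        fresh->domain = domain;  // a real loader would read the catalog file here
        cache_[domain] = fresh;
        return fresh;
    }

    // True while at least one caller still holds the dictionary.
    bool is_loaded(const std::string& domain) const {
        auto it = cache_.find(domain);
        return it != cache_.end() && !it->second.expired();
    }

private:
    std::map<std::string, std::weak_ptr<Dictionary>> cache_;
};
```

A game could hold the returned pointer for as long as a world chunk is alive;
when the chunk goes away, so does its dictionary.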

If you want dictionaries to load fast, use a UTF-8 locale and UTF-8
dictionaries; loading would then be about as fast as pointing into a chunk of
memory.

> The module structure of my application and memory limitations makes
> impossible to load all modules at startup, that would be too much and I
> don't even know how much modules will be available some time after the
> release. Manual control over when to load/unload what is required for my
> current "big" game.
>
> So some manual control on this side would be of great help.
> Maybe some kind of strategy could be provided by the user?

Just bind separate chunks to separate domains. But IMHO you should just load
all dictionaries for the user's locale at once. For example, all the
translation strings for Evolution (the biggest dictionary I have found) take
about 500K, which is about the size of one hi-res texture.

So... such things should be kept in proportion.

>
> C. Dictionary format
>
> You already pointed the way to provide a custom format for dictionaries, so
> this is good from my point of view.
> A lot of companies uses simple
> excell/csv files to manipulate localization texts, making simple to provide
> texts to translate to localization companies.

Ohhh dear God!
NEVER, NEVER do such things!

This is exactly why 99% of developers are aware of only 1% of the issues.

Example: plural forms

See:
<http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#6f4922f45568161a8cdf4ad2299f6d23>

Translating text is much more complicated than taking one string and matching
it to another.
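
To make the plural forms point concrete, here is a small sketch of plural
selection. The Russian rule is the usual gettext one (three forms); the
function names and the sample word are chosen for this example only and are
not part of any library.

```cpp
#include <array>
#include <cassert>
#include <string>

// English has two forms: index 0 for exactly one item, index 1 otherwise.
inline int plural_index_en(unsigned long n) {
    return n != 1 ? 1 : 0;
}

// Russian has three forms; this mirrors the common gettext plural rule.
inline int plural_index_ru(unsigned long n) {
    if (n % 10 == 1 && n % 100 != 11) return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
    return 2;
}

// Formats "<n> file(s)" in Russian by picking the matching form.
inline std::string files_ru(unsigned long n) {
    static const std::array<std::string, 3> forms = {"файл", "файла", "файлов"};
    const int index = plural_index_ru(n);
    assert(index >= 0 && index < 3);
    return std::to_string(n) + " " + forms[index];
}
```

A spreadsheet with one column per language has no natural place for the second
and third Russian forms, which is the kind of issue a simple string-to-string
table hides.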

Best: use tools like Lokalize, poedit or others; they do a much better job
and are much more helpful for translators.

That is exactly why, when it comes to localization, you should never give the
developer too much freedom, as he would do a crappy job. Always use a library
written by an expert.

> - the ids have to be strings.

Ok... Yes they have to.

> - having the user to provide
> custom id would help to manage tools/performance on his side

NEVER, NEVER, NEVER use non-string ids for localization.

Things like get_resource_string(HELLO_WORLD_CONSTANT)
are among the most terrible solutions in the world; they lead
to very hard development work and very bad localization results.

Translation should be done from "context + string" to "string" and never
through some kind of other ids.

I'm a performance freak (see CppCMS), but I still think that using strings is
fast enough and has a much bigger advantage than the few microseconds that you
can gain.

If you want performance, do profiling; I doubt you'll even find translation by
string id to be a bottleneck.

> - it assume that the string id will have some context informations allowing
> to know the right localization needed.

Yes

> It looks like a hack to me because I
> think each unique text should have a unique id.

No! Human languages, unlike programming languages, are context dependent and
ambiguous; the same string may be translated into two different strings
depending on context.

Small but clear example:

http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#1f0e3dad99908345f7439f8ffabdffc4
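
A minimal sketch of the "context + string" to "string" mapping, using only the
standard library. ContextCatalog is a hypothetical class written for this
example, not the Boost.Locale implementation; the fallback to the source
string when nothing is found follows the usual gettext convention.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Lookup key: (context, source string).
using Key = std::pair<std::string, std::string>;

// Hypothetical catalog that keeps one translation per (context, source) pair.
class ContextCatalog {
public:
    void add(const std::string& context, const std::string& source,
             const std::string& translated) {
        assert(!source.empty());
        entries_[Key(context, source)] = translated;
    }

    // Returns the translation, or the source string itself if none is known.
    std::string translate(const std::string& context,
                          const std::string& source) const {
        auto it = entries_.find(Key(context, source));
        return it != entries_.end() ? it->second : source;
    }

private:
    std::map<Key, std::string> entries_;
};
```

With this, the English "Open" can become "Открыть" as a menu command and
"Открыто" as a state label, which a single id per English string could not
express.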

> The domain string seem to be some hack to fix this case.

You have a misconception: domains are actually application/module names.

For example, for Excel you would have an "excel.mo" dictionary, and the domain
is "excel".

Rationale: usually the dictionaries of many applications are kept in the same
place, i.e.

  /usr/share/locale/ru/LC_MESSAGES/excel.mo
  /usr/share/locale/ru/LC_MESSAGES/word.mo
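
The layout above can be sketched as a tiny helper. catalog_path is a
hypothetical function written for this example; a real implementation would
typically also try other variants of the locale name (for instance ru_RU
before ru), which this sketch does not do.

```cpp
#include <cassert>
#include <string>

// Builds <base>/<language>/LC_MESSAGES/<domain>.mo for one language and domain.
inline std::string catalog_path(const std::string& base,
                                const std::string& language,
                                const std::string& domain) {
    assert(!domain.empty());
    return base + "/" + language + "/LC_MESSAGES/" + domain + ".mo";
}
```

So the domain only selects which application's file is opened; it carries no
information about the meaning of individual strings.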
    
> would prefer some way to get a unique id from each text, provided by the
> user. As boost::locale follow the gettext philosophy I don't see how it
> would be possible to change this without changing the backend.

You should not change this, not for coding reasons but rather for linguistic
reasons and the quality of language support.

>
>
> I planned to write my specific solution for my game's localisation, having a
> somewhat complex user-provided-module-based structure, but if boost::locale
> provide a solution for the points I've listed, then I can plug it in my game
> without a problem and that will simplify a lot of things (assuming
> performance is correct for my need). For the moment I'll keep following how
> boost::locale goes until I reach the point where I need to make a final
> decision.

One additional point to remember:

  localization is not only about translating strings,
  translating strings is only one, important but small part of it.

Regards,
  Artyom

      

