Subject: Re: [boost] Boost.Locale and the standard "message" facet
From: Vicente BOTET (vicente.botet_at_[hidden])
Date: 2011-05-02 18:05:15


> Message of 02/05/11 16:02
> From: "Artyom"
> To: boost_at_[hidden]
> Cc:
> Subject: Re: [boost] Boost.Locale and the standard "message" facet
>
> > > This is the std::messages::get function:
> > >
> > > string_type get (catalog cat, int set, int msgid,
> > > const string_type&dfault) const;
> > >
> > >
> > > cat - the "domain" in Boost.Locale
> > > set - can be used as a context, but it is an integer
> > > and not a user-friendly id - bad for localization
> > > msgid - identifies the specific message,
> > > but is still an integer - bad for localization
> > >
> > > dfault - the default string, returned if the message is not found;
> > > it can be used as an alternative to msgid.
> > >
> > >
> > > Now:
> > >
> > > - if you want textual context you can't
> >
> > Well, you can always use a map of textual contexts that gives you the integer,
> > can't you?
> >
>
> How would you map it? Where would you keep it? How would you
> convert it?

You can define this map in one central place and initialize it statically.
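
A minimal sketch of such a centralized, statically initialized map, under stated assumptions: the context strings, the ids and the names context_sets and set_for_context are all hypothetical and only illustrate the idea.

```cpp
#include <map>
#include <string>

// Hypothetical central table: textual context -> integer "set" id.
// The contexts and the numbers are illustrative only.
inline const std::map<std::string, int>& context_sets() {
    // Function-local static: built once, on first use.
    static const std::map<std::string, int> table = {
        {"File menu", 1},
        {"Edit menu", 2},
        {"Format date with H-M-S", 3},
    };
    return table;
}

// Returns the set id for a textual context, or 0 when the context is unknown.
inline int set_for_context(const std::string& context) {
    const std::map<std::string, int>& table = context_sets();
    const std::map<std::string, int>::const_iterator it = table.find(context);
    return it != table.end() ? it->second : 0;
}
```

The value returned by set_for_context could then be passed as the `set` argument of std::messages::get.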

> > > - if you want to get plural form you can't.
> >
> > Why? The fact that the interface doesn't deal explicitly with plurals doesn't
> > mean you cannot get them.
> >
>
> The interface must receive the number as an integer
> parameter, because you need to choose among several forms.

You can use several integers for the translation of plurals.
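
A rough sketch of what "several integers" could look like, assuming each plural form is stored under its own msgid (base id plus form index). Catalog, PluralRule, english_rule and get_plural are hypothetical names, and the in-memory map only stands in for a real message catalog.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for one (catalog, set) pair: msgid -> text.
typedef std::map<int, std::string> Catalog;

// A plural rule maps a count to a form index (0, 1, 2, ...).
typedef std::function<int(long)> PluralRule;

// English-style rule: form 0 for exactly one item, form 1 otherwise.
inline int english_rule(long n) { return n == 1 ? 0 : 1; }

// Looks up the form stored at base_id + rule(n); returns dfault when the
// catalog has no entry for that id.
inline std::string get_plural(const Catalog& cat, int base_id, long n,
                              const PluralRule& rule,
                              const std::string& dfault) {
    const Catalog::const_iterator it = cat.find(base_id + rule(n));
    return it != cat.end() ? it->second : dfault;
}
```

Note that the count n still has to reach the lookup in some way, and the rule has to be chosen per language; the sketch only shows that the stored messages themselves can be addressed with plain integers.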

> > > It uses the actual number, passed as an input parameter, to select one form.
> > >
> > > When you call
> > >
> > > format(translate("File was opened {1} day ago",
> > > "File was opened {1} days ago",
> > > no_of_files))
> > > % no_of_files
> > >
> > > Which is basically, in Hebrew for example:
> > >
> > >
> > > translate("File was opened {1} day ago",
> > > "File was opened {1} days ago",
> > > no_of_files)
> > > when no_of_files == 1 returns "Kovetz niftah lifney yom {1}"
> > > when no_of_files == 2 returns "Kovetz niftah lifney yomaim"
> > > when no_of_files <1 or >2 returns "Kovetz niftah lifney {1} yamim"
> > >
> > > And then format formats it with no_of_files.
> > >
> > > If the string is not in the dictionary then for no_of_files==1
> > > it returns "File was opened {1} day ago" and for no_of_files==2 it
> > > returns "File was opened {1} days ago"
> >
> >
> > Sorry, but I don't understand how this works.
> > Which string are you referring to in
> > "If the string is not in ..."? Could you
> > show the catalog associated with this
> > translation in English and in Hebrew?
> >
>
> If "File was opened {1} day ago" is not in the dictionary, then
> it is used as is, since no Hebrew alternative is provided; it
> would also have 2 plural forms (as in English) instead of
> 3 (as in Hebrew).
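
The lookup-with-fallback behaviour described in the quoted text can be sketched roughly as follows. This is not Boost.Locale's implementation: Dictionary, hebrew_form and translate_plural are hypothetical names, the dictionary is a plain in-memory map keyed by the English singular string, and the form selection simply follows the three cases listed in the quoted example.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical dictionary: the English singular string is the key and the
// value holds the translated plural forms, in form-index order.
typedef std::map<std::string, std::vector<std::string> > Dictionary;

// Form selection following the quoted example: 1 -> form 0, 2 -> form 1,
// anything else -> form 2.
inline std::size_t hebrew_form(long n) {
    return n == 1 ? 0 : (n == 2 ? 1 : 2);
}

// Returns the translated form when the key is in the dictionary; otherwise
// falls back to the two English strings passed by the caller.
inline std::string translate_plural(const Dictionary& dict,
                                    const std::string& single,
                                    const std::string& plural, long n) {
    const Dictionary::const_iterator it = dict.find(single);
    if (it != dict.end()) {
        const std::size_t form = hebrew_form(n);
        if (form < it->second.size()) return it->second[form];
    }
    return n == 1 ? single : plural;  // English fallback: 2 forms only
}
```

With an empty dictionary the function returns one of the two English strings, which is the "string is not in the dictionary" case; with a Hebrew entry it returns one of three forms.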

I insist: could you show the catalog associated with this
translation in English and in Hebrew? I'm sure I'm missing something, but I can't see what.

> > > > How does your library manage plurals for messages that have several parameters?
> > > > For example:
> > > >
> > > > translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
> > > >
> > >
> > > You do it in a different way:
> > >
> > > format(translate("Format date with H-M-S","{1}, {2}, {3}"))
> > > % format(translate("Format date with H-M-S","{1} hour","{1} hours"))
> > > % format(translate("Format date with H-M-S","{1} minute","{1} minutes"))
> > > % format(translate("Format date with H-M-S","{1} second","{1} seconds"))
> >
> > As a programmer, I would like a library that lets me write just
> >
> > translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
> >
> > As a translator, I would need to translate more than one string of course.
> >
>
> For a Slavic language it would be 4^3 = 64 strings. Not good.

You are right if the translate function uses just one translation. What I was getting at is that the translate function with 3 arguments could behave like yours:

format(translate("Format date with H-M-S","{1}, {2}, {3}"))
% format(translate("Format date with H-M-S","{1} hour","{1} hours"))
% format(translate("Format date with H-M-S","{1} minute","{1} minutes"))
% format(translate("Format date with H-M-S","{1} second","{1} seconds"))

The only problem I see is which character to use to split the string. Maybe % could be used:

translate("%1 hours%,% %2 minutes%,% %3 seconds") % h % m % s
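
The exact splitting syntax is not pinned down above. One possible reading, sketched here purely as an assumption, treats the literal sequence "%,%" as the marker between independently translated fragments; split_fragments is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits text at every occurrence of marker and returns the pieces in order.
// The default marker "%,%" is an assumed reading of the proposal.
inline std::vector<std::string> split_fragments(
        const std::string& text, const std::string& marker = "%,%") {
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = text.find(marker, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));  // last fragment
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + marker.size();
    }
    return parts;
}
```

Each fragment ("%1 hours", " %2 minutes", " %3 seconds") could then be looked up with its own plural forms and the results joined with a translated separator, so a translator handles a few forms per fragment instead of every combination.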

> > > In any case it is impossible to use it in real life.
> >
> > I guess some people are using it now.
> >
>
> Show me one program that uses them. At least
> programs that work with MSVC do not, as it is
> not implemented there...
>
> > >
> > > "You are going to connect to the untrusted web site {1} "
> > > "its origin is unknown and you may be a victim of a scam"
> >
> > I don't think it is good to include such messages in the code :(.
> > This belongs to the translation part.
> >
>
> Is it? Ask developers whether they prefer to write
> the clear text inline in the context of the software
> or have a separate unreadable key to something else.
>
> > > So how would you put it into the code?
> > >
> > > MyMessage::UntrustedWarning?
> > >
> > > And if you have something slightly different like
> > > the encryption is too weak then programmers would write
> > >
> > > MyMessage::UntrustedWarning2?
> > >
> > > Believe me, this is what happens in real life.
> >
> > I guess the programmer is able to find more appropriate symbolic names,
> > don't you think?
> >
>
> How many really meaningful identifier names have you
> seen in production code?
>
> I'm not talking about a theory, I'm talking
> about real programmers.

I didn't know that my team and I were not real programmers ;-)

> > > It is about maintainability and linguistics.
> >
> > As far as I remember we didn't have maintainability issues.
> >
>
> But having separate files for messages without their
> context (source files) and separate code without
> clear messages.

IMO, this relation is part of a specification document.

> It is bad and unmaintainable. It is doable but
> it should never be done.
>
> > >
> > > It is very important to have powerful translation
> > > tools that would allow you to merge translations
> > > work on them with built in spell checker and
> > > so on.
> > >
> > > You do not work on translations today with a simple
> > > text editor.
> >
> > As I said before, I was working with it some years ago,
> > and we didn't need so many tools.
> >
>
> Yes, it is possible to work without tools... With
> gettext as well.
>
> The question is how it is better to work and what
> the right way to do it is.
>
> I wonder if you have ever worked with tools
> like Poedit or Lokalize on real messages and
> have seen how convenient it is.

No, never, and I don't know them. But I can tell you that I've worked with real messages.

> > > > I've not taken a look at your implementation yet.
> > > > Please could you tell me when the translation file is read?
> > > > Is the file parsed only once and the translations stored in a cache?
> > > >
> > >
> > > The dictionary is parsed and loaded during generation of the locale;
> > > then it is stored in memory and not changed until
> > > the std::locale object is destroyed.
> >

> > > > > 2. Support of plural forms
> > > >
> > > > Can plural forms be designed on top of the message facet?
> > > >
> > >
> > > No, a new message facet is required.
> >
> > You have added one, haven't you? If I'm not wrong, gettext doesn't
> > take care of plurals, and you have added something on top of it.
> >
>
> It does.
>
> See: http://linux.die.net/man/3/ngettext
>
> It could be done without breaking binary
> messages format but it does not mean
> that it is not implemented by gettext.

Thanks for the pointer.

> > > > > 4. Using natural language identifiers as keys
> >
> > I have some use cases needing a more compact format.
> >
>
> If you really want make your case "msg1234"...

I agree.

> > I'm not saying the standard can not be improved,
> > but I think it would be better to build on top of it,
> > instead of providing two interfaces that use incompatible catalogs.
> > Making internationalizable applications that use C++ internationalizable
> > libraries using different catalogs would be complex for the translator.
> >
>
> Really?

Well maybe not too much, but you will need to use different tools, ....

> C++0x deprecated std::auto_ptr, which everybody
> uses, and provided std::unique_ptr instead.
>
> Are you suggesting that a bad design be forced onto a
> good facet just because the bad one exists, even though
> nobody uses it?
>
> I disagree. This std::messages facet should be
> deprecated or even removed.

No. I'm just saying that if you have valid arguments, it would be better to deprecate the old facet and add a better one. But having two catalogs is not good.
For example, if I want to make Chrono internationalizable, I can just use the standard messages facet until there is a better one.

> > I also think that if you find the message facet
> > is not usable in real life, you should make a
> > standard proposal to improve it (Why not for TR2?).
>
> And I would suggest deprecating the std::messages
> facet along with many other broken facets.
>
> > I'm sure you will have a lot of constructive feedback from
> > some experts.
>
> The current std::locale badly mimics the POSIX/C locale
> infrastructure. That was reasonable at the time, but
> it inherited too many flaws from it
> and introduced even more of its own.
>
> In order to make a useful TR2 proposal
> you would have to break new ground and
> do things like:
>
> 1. Standardize locale names
> 2. Standardize message catalog formats
> 3. Rewrite some of existing facets
> completely
> 4. Deprecate some of the facets and functions.
>
> Items 3 and 4 are quite easy to do; however, items 1
> and 2 would be very hard, if possible at all.

So, are you saying that we cannot have a localization standard that is anything other than implementation-defined?

> Even C++03/C++11, which fully mimic and copy the
> POSIX message catalogs (catgets, catopen, catclose),
> did not define anything useful about them or
> refer to the POSIX standards.
>
> So... Yes, I'd like to see such things in TR2,
> but believe me, the message catalog facet is the
> easiest thing to rewrite, while the
> real localization problems lie far beyond it.

Well, having better facets could be one step ahead.

> This is what really concerns me in the standardization
> of localization facilities.

I really suggest that you participate in the standardization of a better locale library proposal; in the end, this is also one of the goals of Boost.

Best,
Vicente

