Subject: Re: [boost] Boost.Locale and the standard "message" facet
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-05-02 05:23:25


> From: Vicente BOTET <vicente.botet_at_[hidden]>
>
> Hi,
>
> > Message of 30/04/11 15:48
> > From: "Artyom"
> > To: boost_at_[hidden]
> > Cc:
> > Subject: Re: [boost] Boost.Locale and the standard "message" facet
> >
> > > Subject: [boost] Boost.Locale and the standard "message" facet
> > >
> > > Hi,
> > >
> > > I was wondering how Boost.Locale is related
> > > to the standard message facet which is used
> > > to translate messages.
> > >
> >
> > The standard message catalogs allow extracting
> > messages by integer identifiers but may use string
> > identifiers, and it is implementation defined.
> >
> > It is undefined how to load message facets
>
> Well, implementation defined doesn't mean undefined.
>

But it makes it useless, as each compiler can do anything it wants.

> > or format them and so on.
>
> The formatting can always be done on top of this facet, can't it?
>

By formatting I mean the entire infrastructure of catalog formats,
binary formats, message extracting software, user-friendly
translation tools like po-edit, and so on.

> > It does not support plural forms and context.
>
> I'm not an expert, but can't catalogs and the set
> parameter be used for your domain and context?
>

This is the std::messages::get function:

    string_type get(catalog cat, int set, int msgid,
                    const string_type& dfault) const;

cat    - is the "domain" in Boost.Locale
set    - can be used as context, but it is an integer and not
         some user-friendly id - bad for localization
msgid  - identifies the specific message, but it is still an
         integer - bad for localization
dfault - is the default string returned if the message is not
         found; it can be used as an alternative to msgid.

Now:

- if you want textual context, you can't
- if you want to get a plural form, you can't.

So basically it is too weak and limited.

> With respect to plural forms, how does Boost.Locale manage locales
> that have 3 or 4 forms of plurals (if I understood your private
> mail, Hebrew is an example of this)?
> I've the impression that Boost.Locale manages simple plurals
> but not all kinds of plurals?

It handles all kinds of plurals.

> BTW, what is the criteria in Boost.Locale to
> identify a plural form?
>

It uses the input parameter, the actual number, to identify one.
When you call

    format(translate("File was opened {1} day ago",
                     "File was opened {1} days ago",
                     no_of_files)) % no_of_files

Which is basically, in Hebrew for example:

    translate("File was opened {1} day ago",
              "File was opened {1} days ago",
              no_of_files)

when no_of_files == 1 returns "Kovetz niftah lifney yom {1}"
when no_of_files == 2 returns "Kovetz niftah lifney yomaim"
when no_of_files < 1 or > 2 returns "Kovetz niftah lifney {1} yamim"

And then format formats it with no_of_files.

If the string is not in the dictionary, then
for no_of_files == 1 it returns "File was opened {1} day ago"
and for no_of_files == 2 it returns "File was opened {1} days ago"

> How does your library manage plurals for messages that have several
> parameters? For example
>
>   translate("%1 hours, %2 minutes, %3 seconds") % h % m % s
>

You do it in a different way:

    format(translate("Format date with H-M-S", "{1}, {2}, {3}"))
        % (format(translate("Format date with H-M-S",
                            "{1} hour", "{1} hours", h)) % h)
        % (format(translate("Format date with H-M-S",
                            "{1} minute", "{1} minutes", m)) % m)
        % (format(translate("Format date with H-M-S",
                            "{1} second", "{1} seconds", s)) % s)

Basically you provide a good context, "Format date with H-M-S", and
a basic formatting pattern, "{1}, {2}, {3}", which the translator can
alter; then you translate three subpatterns with the same context,
each with its own plural form.

> > It is the most unless facet around.
>
> What do you mean? That the scope is reduced?
>

It should be "useless", not "unless". In any case it is impossible
to use it in real life.

> > > Note that the facet
> > > interface work with integer identifiers
> > > avoiding all the issues raised by the
> > > get_text/translate functions
> > > provided by Boost.Locale.
> > >
> >
> > Use of integer identifiers is the best
> > way to screw the localization in the software.
> >
> > What does 3456 mean? Do you really think
> > it is good to write translate(MY_MESSAGE_OPENING_FILE)
>
> Why not?
> Note that we can also write translate(MyMessage::OpeningFile) if
> this seems clearer to you
>

Same problems, as in real life you need quite complicated strings
and expressions. Many messages are not just "Open a file" but rather:

    "You are going to connect to the untrusted web site {1} "
    "its origin is unknown and you may be a victim of a scam"

So how would you put it into the code?

    MyMessage::UntrustedWarning?

And if you have something slightly different, like the encryption
being too weak, then programmers would write

    MyMessage::UntrustedWarning2?

Believe me, this is what happens in real life.

See notes below about rules of thumb.

> > No, never - never - never - never - never
> > use such "constant" or "integer" identifiers.
>
> I understand that some, like you, can find that the use of
> integers scales worse than the use of strings, as we need
> to keep constants unique in a given context,
> but this can be checked in debug mode.
>

It is about maintainability and linguistics.

> Some time ago when I had to write a localization application,
> we used integer identifiers to log internationalizable messages.
> This drastically reduced the size of the log. I guess this
> advantage alone was enough to take integers instead of strings,
> for us of course.
>

If you really want short identifiers for specific cases, you can for
example write things like

    log() << "EINVAL"

or

    log() << "MSG::Inval"

> An advantage I see is that you need to concentrate all the message
> ids of a set/context in a single file, so no need to have tools
> that parse your code to get the strings to translate.
>

You need so many tools that the tool that extracts the strings from
source code is the minor one.
It is very important to have powerful translation tools that allow
you to merge translations, work on them with a built-in spell
checker, and so on. You do not work on translations today with a
simple text editor.

> BTW, how do you recover message translation
> from a domain with your underlying implementation,
> copy/paste or is there a possibility to have specific
> files for specific domains, or there is a single
> translation file by locale?
>

It depends on your design. If you for example have a single program,
just use a single domain named after your program, but if you have
some independent component, it may use its own domain.

In any case each dictionary is per locale (or language) and per
domain.

> I've not taken a look at your implementation yet.
> Please could you tell me when the translation file is read?
> Is the file parsed only once and the translations stored in a cache?
>

The dictionary is parsed and loaded during generation of the locale;
then it is stored in memory and not changed until the std::locale
object is destroyed.

> > Always use natural text.
>
> I think your natural language interface gives the user
> the impression that anything is possible,
> when a lot of limitations seem to be there, as others
> have already commented in this ML.

The natural language interface is the most powerful.

> > > Is there any reason Boost.Locale
> > > could not follow the standard design?
> >
> > The standard message catalogs are too weak
>
> I would like to hear the rationale of the standard
> message design from someone that is aware of it
> (some pointers will be welcome also).
>
> > > What are the advantages of the Boost.Locale design?
> > >
> >
> > 1. Defined way to load and format catalogs
>
> You could provide a defined way on top of this facet, couldn't you?
>
> > 2. Support of plural forms
>
> Plural forms can be designed on top of the message facet?
>

No, a new message facet is required.

> > 3. Support of message-context
>
> As I said, I suspect that the set parameter is interpreted as your
> context.

No, see notes above.

> > 4. Using natural language identifiers as keys
>
> See below. This is not always an advantage,
> and as far as I see adds some constraints
> on the interface so the tool can take care
> to automatically extract the strings to translate.
>

You are looking at the problem from a pure software engineer's point
of view; however, when it comes to UI and localization there are two
important rules of thumb:

1. Provide as much information as possible to make the translator's
   life as easy as possible, for example:

   a) Context

      Instead of

          MyMessage::FileOpen

      provide:

          "File Opening Dialog", "Open"

   b) The gettext utility provides an option to extract nearby
      comments from the source, so in the code

          // We open a file in CSV format with
          // prices of the items.
          AddMenuItem(translate("File Opening Dialog", "Open"))

      the translator will see all the text!

2. Assume as little as possible - make the interface as generic as
   possible.

   For example, you have these dialogs:

       How is this row? <Good> <Bad>
       How is this color? <Good> <Bad>

   So you can translate:

       translate("How is this row?");
       translate("How is this color?");
       translate("Good");
       translate("Bad");

   What is the problem with this? Think about it for a minute before
   you read on...

   Gender: in some languages row and color have different genders,
   so Good and Bad should have different forms according to gender.

   So you need to write

       translate("How is this row?");
       translate("About row", "Good");
       translate("About row", "Bad");
       translate("How is this color?");
       translate("About color", "Good");
       translate("About color", "Bad");

   This is what I mean by assumptions.

Making integer identifiers would detach the strings from the context
even more, and that is bad.

So yes, you can use integers, but it is VERY BAD design.

> > 5. Convenience interface
> What do you mean?
> Could you compare both?
>

Yes

    // at generation stage
    std::messages<char>::catalog domain_id =
        std::use_facet<std::messages<char> >(l).open("domain", l);
    // in use
    AddMenuItem(std::use_facet<std::messages<char> >(std::locale())
                    .get(domain_id, 0, 0, "Open File"));
    // at close
    std::use_facet<std::messages<char> >(l).close(domain_id);

Or

    boost::locale::generator gen;
    gen.add_messages_domain("domain");
    std::locale::global(gen(""));
    // in use
    AddMenuItem(translate("Open File"));
    // destroyed with the locale

The standard message catalog requires you to store the catalog
variable somewhere, while the Boost.Locale messages facet has a
default domain and allows using a string-based key for the domain.

> > More?
>
> Well, I think that it would be great if you can add a complete
> comparison of the interfaces and a rationale why you think your
> design is superior on the documentation.
>

Too many flaws, too many problems... If so, I should write about
10-20 pages on the flaws of all the facets around. I have a small
summary of the problems, but a full side by side? Do you really
need them?

> Best,
> Vicente

Artyom
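
The lookup rule described in this message (key on context plus the
natural-language text, pick a plural form from the actual number, fall
back to the source strings when nothing is found) can be sketched with
the standard library alone. This is only an illustration and not the
Boost.Locale implementation: Key, Dictionary, hebrew_plural_index,
translate_plural, fill and sample_hebrew_dictionary are hypothetical
names, and the plural rule is simply the one from the Hebrew example
above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy dictionary, NOT the Boost.Locale data structures.
// Key: (context, singular source text). Value: translated plural forms.
using Key = std::pair<std::string, std::string>;
using Dictionary = std::map<Key, std::vector<std::string>>;

// Plural rule as stated in the Hebrew example of the message:
// n == 1 -> form 0, n == 2 -> form 1, anything else -> form 2.
std::size_t hebrew_plural_index(long n) {
    if (n == 1) return 0;
    if (n == 2) return 1;
    return 2;
}

// Look up (context, singular) and choose a form from the number n.
// If there is no entry, fall back to the source text: the singular
// for n == 1, the plural otherwise (assumed English-style fallback).
std::string translate_plural(const Dictionary& dict,
                             const std::string& context,
                             const std::string& singular,
                             const std::string& plural,
                             long n) {
    const auto it = dict.find(Key(context, singular));
    if (it != dict.end()) {
        const std::size_t index = hebrew_plural_index(n);
        if (index < it->second.size()) return it->second[index];
    }
    return n == 1 ? singular : plural;
}

// Replace every "{1}" in the pattern with the decimal value.
std::string fill(std::string pattern, long value) {
    const std::string placeholder = "{1}";
    const std::string text = std::to_string(value);
    std::size_t pos = 0;
    while ((pos = pattern.find(placeholder, pos)) != std::string::npos) {
        pattern.replace(pos, placeholder.size(), text);
        pos += text.size();  // continue after the inserted number
    }
    return pattern;
}

// Sample entry using the transliterated strings quoted in the message,
// stored under the empty context.
Dictionary sample_hebrew_dictionary() {
    Dictionary dict;
    dict[Key("", "File was opened {1} day ago")] = {
        "Kovetz niftah lifney yom {1}",
        "Kovetz niftah lifney yomaim",
        "Kovetz niftah lifney {1} yamim"};
    return dict;
}
```

With the sample dictionary, a count of 2 selects the second form, and a
lookup under any other context finds no entry and falls back to the
source strings. An integer msgid cannot offer that fallback, because
it carries no text of its own.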

