Boost logo

Boost :

Subject: Re: [boost] [Boost.Locale] For Preliminary Review of Boost Community (with Full documentation)
From: Artyom (artyomtnk_at_[hidden])
Date: 2009-11-17 04:53:04

Hello Marcus, > > The overall design fits very well into C++ and boost. > But for boundary analysis I have other expections: > I would rather expect an iterator similar to > boost::tokenizer > > which lets me iterate over characters, words and > sentences. > The design of your boundary analysis seem to be very > complicated and because > of the manual use of offsets very error prone. Thanks, that is very good point. I wasn't aware of `boost::tockenizer` functionality. I think it would be quite easy to provide several TockenizerFunction concepts based on the created mapping. So the interface would be more convenient. I think something like `boost::locale::boundary_separator` that would split the text according to the index, also it would be useful to add flags of types of breaks/types of words should be taken. Something like that: boost::locale::boundary_separator<char>(...) boost::tokenizer<boost::locale::boundary ... > > > There is a part where I see intersection to existing boost > libraries: > - date formatting, especially the ftime functionality are > already supported > by boost::date_time, see: > First of all boost::date_time supports only Gregorian calendar, Boost.Locale (ICU) supports many others. For example: std::cout << as::date << time(0) <<std::endl; Can create, in en_US locale would be something like November 11, 2009 When in he_IL_at_calendar=hebrew locale it would be something like (translated to English) 21 Cheshvan 5770 The number of months in Hebrew year even not constant. So date_time and date_time formatting are not appropriate for these purposes. > - time zones, also already in boost::date_time > however I find your way better that you use > std::set<std::string> and not > std::vector<std::string> like in the Time Zone > Datebase of boost::date: > 1. ICU has quite complete and powerful support of timezones. Built in into the library itself and not relays on external file that should be downloaded and loaded directly to the source. 2. It is quite hard to merge between ICU timezones and boost::date_time time zones. On the API level but is is very simple to assign timezone by its id. 3. Timezone support is used mostly for formatting. So, meanwhile I think that it is quite hard to merge between two timezones implementations because of: a) Different date-time representation b) Different sources of time zone information. > > On the other hand there are parts inside boost which should > use > boost::locale: > - boost::regex (seems like it is currently using libicu > directly?) Yes, `boost::regex` uses ICU directly for Unicode support but, regular expression matching is not locale dependent. (At least according to ICU API) So there is no reason to use `boost::locale`. > - boost::format > Take a look on boost::format notes there: In fact there some issues with boost::format. For example std::locale::global(gen("ar_EG")); cout.imbue(gen("en_US")); cout << boost::format("%1$d") % 103 << endl May print "١٠٣" instead of "103" because it uses global locale instead of iostream locale. This can't be fixed easily, if fact it would cause quite big changes in boost::format semantics. So boost::locale::format created that is more appropriate for localization then boost::format that is much more appropriate for actual data formatting. > Some remarks about the doxygen documentation: > > > - It should read #include > <boost/locale/generator.hpp> > instead of #include <generator.hpp> (and all the > others). > > - impl() should be private and should not be documented. > "Do not use this" > is not very useful. I'll try to remove it form doxygen, but making it public allows much easier implementation of functionality of many functions that do need access to impl. In fact impl keeps icu::Locale class. In any case it can't be used by user because the class is not defined for him, just forward declaration exists. > > - Documentation missing for operator== in timezone: Does it > compare object > identity or id? > Thanks. It compares ids. I'll add this to documentation. > Now I will discuss the problems in the order of the > tutorial: > > > - I did not find a list of all available facets. How can I > get back to the > defaults (all facets activated) after some are turned off? > Thanks, You can pass flag "all_categories" or "all_characters". I'll add this remark and the list of all facets. > == Collation == > > - There should be some explanation of std::use_facet or > facet in general > because it is a very seldom used feature of the C++ > standard. > Thats because without something like boost.locale it is > barely useable. > Ok, good point, I'll add. > - loc was used in > use_facet<collator<wchar_t> > >(loc).compare(collator_base::secondary,a,b); > but not defined before (inside Collation chapter). Better > you use locale(). > The same problem with some_locale and some_level. > Give working examples whenever it is easily possible. Thanks, I'll try to add remarks or fix. > > - You wrote the comment: > // Now strings uses default system locale for string > comparison > Does it means that it uses the default system locale ("C"), > the locale when > you call gen(""); or the default which was set before with > std::locale::global()? > Thanks, you are right this is not clear. In the case I mean std::locale::global(). I'll fix it. > I would suggest some glossary, like in boost::filesystem: > > Very good point. Thanks! I'll add. > == Number, Time and Currency == > > - What does "as" mean? Is this just meant as the english > word as? > English word "as" > - Negative Numbers as currency do not work as expected. The > output of the > program with a de_AT locale is: > -€ 14,00 > but I would expect € -14,00 Unfortunately, locale database thinks differently. ICU formats money as -€ 14,00, same does POSIX strfmon in de_AT locale. So I assume this is the standard for monetary formatting in this locale. > > - Parsing has the same problem, but even worse the varialbe > contains a > completely wrong number afterwards! > Parsing > € -14,00 > leads that the variable contains 11027 afterwards! The > library really should > throw an exception if there are parsing errors. 1. You should provide -€ 14,00, instead of € -14,00 2. You should check failbit as it is done for regular numbers parsings is not thrown by iostreams unless specific flag is set. cin >> as::currency >> how_much; if( { // error occured } Or: cin.exceptions(istream::failbit); try { cin >> as::currency >> how_much; // do something } catch .... So the behavior is fully standard complaint. > > - You did not mention that cout.imbue() or cin.imbue() must > be called first > in the beginning of the section. Just setting the global > locale (or default > locale?) is not enough. > Yes, your right, this worth to mention. Many do not know this. > - as::ordinal did not work. > In the locale de_AT As I had notices in documentation, not all locales support as::spellout or as::ordinal. Because some locales missing required rules. > The rule is simple: > just add a "." > for every number. In any case, the best practice is to write cout << format(translate("The {1,ordinal} document")) % no << endl; So smart translator may simply substitute a translation string with more correct formatting, in your case. The {1,number}. document In any case I would suggest using latest ICU version. Many things in locale database improved since 3.6 > - boost::locale::as::currency_iso does not work: > std::cout << boost::locale::as::currency << > boost::locale::as::currency_iso > << 1523.45 << std::endl; > € 1.523,45 > std::cout << boost::locale::as::currency_iso << > 1523.45 << std::endl; > 1523.45 > But it should read: > 1.523,45 EUR as::currency_iso requires ICU 4.2 and above. See: > > - Why is the type double used for time? You should use > time_t or TimeType > instead. double is used in the example, but also in > offset_from_gmt. > Any type of number can be used for time representation time_t, double, int64_t etc. > - Also for time it should be more clear that e.g. time_long > is a additional > modificator to time. I think I mentioned this, but I'll clarify it more. > - gmt seems to modificate both. time_zone also modificated > both, even though > it has the prefix "time_". All others with that prefix only > modificate time > formatting. Maybe you should avoid prefixing and use a > namespace instead? > gmt, time_zone - should not modify both, of so this is bug and I'll fix it. > - typing mistakes: > signle -> single Thanks, I'll fix it. > - as::domain did not work, when using: > std::cout << boost::locale::as::domain("foo") > << > boost::locale::translate("foo") << std::endl; Thanks, I'll check and fix this (for some wired reason I remember I had already fixed this) > - Why is wchar_t used even though it is assigned to a > string? > std::string msg = translate("Do you want to open the > file?").str<wchar_t>(some_locale) > Ooops, thanks , I'll fix this. > - The datatype "message" in send_to_all is not explained? Ok, I'll clarify this, message is special container of translated message that can be rendered according to locale when it is written. > What is "ms"? Typo.. Thanks. > - " missing and translate misspelled in: > cout << format(tranlsate("You have 1 file in the > directory",You have {1} > files in the directory",files)) % files << endl; > > - gettext, ngettext are exposed into global namespace when > <boost/locale.hpp> is included! > Thanks, I'll fix this > == Code-page conversions == > > - Please give an example for code converter. > Good point, it definitively needs some clarifications. > - Spelling: indx should be index? No just shorter name. > > - first test example does not output any character (only > line breaks) > You are right, I'll fix this. > > - misspelling: tranlsate Thanks, > > - std::locale::name (or better std::locale().name()) just > returns *, even > german locals are set. I would expect that your library > sets a name? > No. Boost.Locale can't change the base locale class. So it can't change how name is defined. Also name is not portable. For example under linux: en_US.UTF-8 is fine name, but Windows would not use it and throw an exception, Also if the locale is not generated the exception is thrown as well. So I can't change the behavior of underlying std::locale class. So boost::locale::info is provided. > - How is info meant to be used? ~info is protected, so it > seems like that > the object should not be instanciated myself? As all other facets -- they have protected destructors and they are managed and destroyed by std::locale class. So you create is as and locale class manages it. std::locale some_new_locale(some_old_locale,new info("en_US")); Also you never use info directly you rather you use it via use_facet functions. cout << std::use_facet<info>(someloc).language() << endl; If some of my facet destructors are not protected this is rather bug. > == Multiple Locales == > > - Do you mean gen.get() in: > std::locale ar=get("ar_EG"); Yes. thanks. > > Why is it not cached in that case? > Because `get` member function is const memeber and it tryes to find the locale in the existing set and then fetches one. If nothing is found it generates. So, first you populate the cache with all frequently used locales and then fetch them, if something not ordinary requested it would be generated in place. I could use some true "caching" mechanism, but it would require usage of locking per each locale request. > - How much effort did you put into your evaluation? > about 6 hours Markus, thanks you a lot for this review, it extremely valuable nad useful. It would help me a lot. I think I'll be able to do all the fixes in few days and I'll send a notice. Best, Artyom

Boost list run by bdawes at, gregod at, cpdaniel at, john at