Boost logo

Boost :

Subject: Re: [boost] [locale] Review of the proposed Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-17 05:03:48


> From: Volker Lukas <vlukas_at_[hidden]> > > I vote to accept the proposed Boost.Locale as a Boost library. > I'm glad to hear. > - What is your evaluation of the implementation? > I was having only a superficial look at some of the source files, so I > can not comment much. What I have seen looks clean to me. While > reading the source code I think I could benefit from even more code > comments. There are a lot already, but on the other hand there exists > for example a source file libs/locale/src/shared/mo_lambda.cpp > apparently containing a whole state machine or something without any > word how it is used. Good point. It is C expression parser. I think I need to comment it better. > I also stumbled over an odd comment in file > libs/locale/src/std/numeric.cpp in line 163: "// workaround common bug > - substitute NBSP with ordinary space". As a reader who did not > already write the code I can not infer much from comments such as this > one - I mean "What is the common bug then?" If this comment refers to > the situation that e.g. the locale implementation of GCCs standard > library implementation generates invalid UTF-8 when using NBSP as > thousands seperator, Yes, I mentioned this bug in tutorial. > then the comment should say that. Well, my point > about comments is a minor issue of course. Noted > I do not suggest that the > author goes over all the code again just to add or change comments. > I'm still glad to hear about possible problems. > I am a bit worried over the size of the libraries binary. On GNU/Linux > x86_64 the built libboost_locale.so is almost 10MB large and codesize > according to the "size" utility is almost 1MB. > This comes in addition > to the underlying ICU library. Is there any potential for reducing > codesize? You likely compiled it with debug information (this is default for CMake build), without debug information it should be relatively small if I'm not mistaken about 700K. Also you can always reduce its size by turning unused backends off. I always kept in mind an embedded use of it on Linux platforms (which is important for CppCMS) so it can be not so big, it is very configurable. > Due to the same concern about codesize I have also one > concrete suggestion relating to the comparator template of the > collation component. I suggest to not make the default comparision > level a template parameter, instead to store it as a member. This way, > when instantiating container templates the number of instantiations > can be kept down. Or am I mistaken in thinking this would lead to more > compact code? Not really, because most of the code is actually the backend that deals with comparison. And finally it goes as a parameter to the underlying ICU or Win32API function. > [snip] > > There are some issues I have: > Phrasing is sometime odd. For example right in the beginning, under > the headline "What is Boost.Locale" there is a sentence which reads > "C++ offers a very good base for localization via the existing C++ > locale facets std::num_put, std::ctype, std::collate etc. But these > are very limited and sometimes buggy by design." I can only guess what > that means. Superficially it seems like a contradiction to me, > something which is excellent can not be buggy by design, not even > sometimes. Okay, minor issue again, it just stands out because it is > only the fourth or fifth sentence or so. Actually I mean the base is good (std::locale, facets idea, integration with iostreams) but implementation and actual existing facets quite bad designed. > > There are some typos (I guess) where it is important to be correct: > Under "Messages Formatting (Translation)", subheading "Indirect > Message Translation" all functions are documented as "The source text > is not copied.". From the emphasis in the original document I think I > can conclude that some of these functions copy and some do not. > Already Noted. > The reference section of the documentation should document the > requirements on the CharType which many functions/classes are > parameterized upon (can it be char/wchar_t only?). > Yes it can only be char and wchar_t, in experimental C++0x mode (not really supported due to compiler/std library bugs) it may be char16_t and char32_t. It is all over the code, I probably may mention it more explicitly. > The reference documentation for token_iterator and break_iterator > should not have individual entries for operator++, operator--, > operator== and other operators which are always required for > iterators, unless the documentation adds information. Instead, it > should just be stated to which iterator concepts these templates > conform. > Actually it is how doxygen works... In any case good point I'll check it. > Documentation for the dereferening operators (operator*) of these > templates states that the underlying sequences iterated over can not > be modified with these operators. As this is the case, could a usage > hint be incorporated stating that users should always use > const_iterators as arguments to these templates? If a user sometimes > employs for example std::string::iterator and sometimes > std::string::const_iterator this would lead to unnecessary code bloat. > Also, operator-> is missing from these templates. > Ok, noted I'll take a look on this. > [snip] > > I played around with some of the components of the library and ran the > supplied test cases. There is one inconsistency I noted, the following > program generates output "Currency: $100.11" when the environment > variable LANG is set to "en_US.UTF-8" and variable LC_MONETARY is set > to "de_DE.UTF-8". Shouldn't the library honor LC_MONETARY? If LANG is > set to "de_DE.UTF-8" "Currency: 100,11 €" is put out. > Program: > ------------------------------------------------- > #include <ios> > #include <iostream> > #include <ostream> > #include <locale> > > #include "boost/locale.hpp" > > int main() { > namespace loc = boost::locale; > using std::cout; > > loc::generator gen; > cout.imbue(gen("")); > > cout << "Currency: " << loc::as::currency << 100.11 << '\n'; > } > ------------------------------------------------- > Actually, it does not follows the POSIX separation to subclasses. It uses LANG, LC_ALL and LC_CTYPE in that order to get the current locale. > > With best regards, > Volker Lukas Thank you very much for the review. Artyom


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk