Boost logo

Boost :

Subject: Re: [boost] [locale] Review part 1: headers
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-09 05:50:25


> > This is getting rather long already, so will be > a multi-part review. I've attached a diff of > various fixes in comments that I made while > going through the headers. > Thanks, I'll add them to trunk. > locale.hpp: > > boundary.hpp: > > line 22: you already #included <iterator> > Noted > line 47: You don't need to use typedef enum { ... } name; > enum name { ... }; is more normal in C++. > Good points, thanks. > Is there a reason that you use four bits for each > flag? Is this to allow the flags to be split > up without breaking binary compatibility? > Exactly. > line 102: You assume that unsigned is at least 20 bits. > It's usually 32, but I don't think the standard > guarantees as much. > I don't think neither ICU nor Boost supports non-32 bit platform. I understand that strictly speaking it is non-standard but de-facto there are no problems with this and this assumption is quite safe. I can change this to uint32_t > line 131: Please be consistent about using unsigned vs. uint32_t. Good point > line 140: shouldn't offset be std::size_t? > Why should the size of the string be limited > to 4GB? std::size_t is guaranteed to be > large enough to handle anything that can > be stored in memory. > You are right, I'll fix this. However, two points: 1. I already know compilers were size_t != sizeof(void *) so actually it must be uintptr_t... but it is not in C++03 but rather in C99. (it is in C++0x) But generally you right. 2. There is no reason on planet earth you need to create boundary mapping for text >= 4G! Even entire War & Peace book takes about 3MB. So if you do this you are probably doing something very-very wrong. But this is different story. > Why do you have explicit specializations of > boundary_indexing? They're all the same. > The compiler is perfectly capable of instantiating > them all from a single definition. > Ok this is something that you'll see all over the Boost.Locale code when it comes to deal with facets generations. It is in order to support (brain damaged) DLL platform. I explain. Each specific type of facet should have its own unique static std::locale::id as "id" member. So you must specialize it, such that its definition would be explicit in DLL, library otherwise if you create a facet in DLL and pass it to user level program or the other way around there would be 2 (or more) instances of std::locale::id. So when you would call std::use_facet<facet_type>(locale) it would throw std::bad_cast because there would be actually two instances of same id. I literally passed 9 circles of hell in order to make everything work for both MSVC and GCC. So finally I had found that specializing each instance of specific facet is most reliable and working solution. > line 435: it's somewhat surprising to see > a templated swap. You should comment that > it requires that the base_iterators must > be the same type. > Good point. I'll add this. > line 447: assignment is strongly exception safe. > you should make this guarantee explititly. > What do you mean? > line 538-539: I don't understand this sentence. > What does "tide" mean? > Doesn't example makes it more clear. Probably I use the word "tide" incorrectly, basically it means whether I want to select everything or I need to select strict subsets. For example if you want to select every word or only thous who actually have letters? So basically it is strict token selection should be. > line 532: entry text? I don't think this is right, > but I don't know what you meant to say. > Spelling mistake entire text. > line 600: token_iterator(mapping_type const &map,bool begin,unsigned mask) > Just make this constructor an implementation detail. > It's too messy to be part of the public interface. > Noted, I'll define it as internal interface. > line 628: if(this!=&other) > I doubt that this actually helps performance, > and the compiler's version of the copy constructor > and assignment operator should work fine. > Yes you are right > line 646: Normally it's undefined behavior to > dereference an iterator like this. I would > prefer an assert to throwing an exception. > The point is that these locations are valid positions just you can't call operator*. I think it is better behavior because it is quite easy to make a mistake with this. (From my experience with this) So such behavior gives a better way to debug the software. > break_iterator: most of the comments from token_iterator > also apply to break_iterator. > Ok, noted. > collator.hpp > > line 74: Please describe the meaning of the return value, > even if you think it's obvious. > Ok, good point. > line 115: You should spell out the post-conditions of > transform more clearly. I presume that the following > holds, but I'd like to see it explicitly: > Given strings s1 and s2 and collation level level, > let t1 = transform(level, s1) and t2 = transform(level, s2). > Then, t1 < t2 iff compare(level, s1, s2) < 0. > Good point. I'll add this to docs. > conversion.hpp: > > Again, why the separate specializations? > Same as above, DLL platform support... > date_time.hpp: > > line 75: this is dangerous. These constants > will be defined separately in every translation unit > using them. This can cause violations of > the one definition rule. (The standard makes > a special exception for integral constants, > but these are of class type.) > So how would you suggest to fix it? Remove static and move their constructors to cpp? In any case they just aliases so you always can write period::month * 0 that is equal to period::january > operator*: > - You missed signed char and long long. Actually for unsigned char as well unsigned cchar!=char!=signed char I don't know I missed this. > - This is one place where a little template metaprogramming would help: > template<class T> > typename enable_if<is_arithmetic<T>, date_time_period>::type > operator*(period::period_type, T); Ok, I'll take a look on this. Good point. > - I think trying to override the built-in enum to int > conversion like this is rather fragile. What happens > when someone tries to use an enum constant: > enum { this_year = 2011 }; > this_year * period::year; I understand this, on-the other hand it seems to be very convenient. Because otherwise I'll have to define for every period class every arithmetic operator and it would be O(n^2) complexity. I'm not sure if it would be good either. How would you suggest to do this? > - the operators need to be defined whenever period_type > is defined to avoid accidentally calling the built-in > operator if the user forgets to #include a header. Good point. > > line 491: The argument to operator[] should be std::size_t > to match std::vector. > Noted. > date_time_facet.hpp: > > encoding.hpp: > > lines 103,112,145,154,218: technically data() would be more appropriate > than c_str(), since you don't need the null terminator. > No, it should be c_str(), because otherwise if the the string is empty the data() is undefined, so it may crash (and it is crashes with MSVC) So c_str() guarantees me that it is valid value. > line 218: Please qualify this call to avoid ADL in > namespace std. (Other places in this file too.) > Can you explain better? > format.hpp: > > line 65: Use static_cast instead of reinterpret_cast. > Actually, you don't need a cast at all. Use static_cast > in write as well. > Noted. > line 104: shouldn't this be get_position()? > Yes... My bad. > Is there precedent for this format string syntax? Actually it is partially based on ICU format, and some others formats like Java and so on. The most important part that this format should be rich and extend-able. > > What are the requirements on the encoding of the format string? ASCII compatible... At least for '{' and '\'' symbols. This covers all supported and frequently used encodings in locale context. > > What happens when the format string's syntax is > invalid? Undefined behavior? An exception? > If the format is invalid it is ignored. Rationale: you don't want the translator that actually provides translated format strings to screw the program. > What exception guarantee does streaming provide? > If an exception is thrown, can the stream's locale > and format flags have changed or are they reset > correctly to their original values. > Currently they are not restored. I thought it is better to to try to restore flags in case of exception as restoring they may actually trigger other one. > lines 259-263: I'm sure this conversion from char works in > practice, but does the standard guarantee it? Character types can be casted as any other integer types so I don't see problems there. The "correct" way is to use narrow() or wide of ctype facet but they are frequently broken so I prefer to use such cast as most reliable (and covered by the unit-tests) > > lines 341,345: You defined named constants earlier. > Why not use them? Good point, I'll take a look on this. > > line 292-293: I'm not sure why you have separate svalue > and value. AFAICT, value is used if the value begins > with a quote and svalue is used otherwise. > Because some flags has localized parameter (may be wide parameter) like {1,strftime='תאריך %c'} and others are actually are ASCII flags like {1,dt=full} They are treated differently. > formatting.hpp: > > line 227: string_set::set is not exception safe. > if new fails, then you'll be left with the new > Char type, the old size, and a null pointer. > size is needed only when ptr!=0 in order to copy its content correctly, so strictly speaking it is exception safe. However it is good to reset size to 0. I'll add this. > generator.hpp: > > line 62: This assumes that unsigned is 32 bits, > which implies that you should be using uint32_t. > Yes... However all supported platforms are at least 32bits. There is one small points. As far as I remember I always used native types over strict types in the interfaces (function calls) because if it is changed between small releases (like stdint.h found) it would not break binary compatibility so native types (as long as they good enough) are better alternative. (We are talking about C++ compilers that does not handle C99 stdint well) So strictly speaking uint32_t is better but does not really important or unrealistic requirement. > line 90: Use locale_category_type? Yes you are right. Should be fixed. > > gnu_gettext.hpp: > > line 79: Why is the argument non-const? Yes, you are right... I don't know why had I missed that. It should be const (and in fact it goes to const constructor under the hood) > > hold_ptr.hpp > > line 29: The constructor should be explicit. Good point. > > line 36: no need to test for NULL. The compiler > already checks for NULL for you. > Yes, right. but does not hurts. > info.hpp: > > #include <map> and <memory> are unnecessary. Right... I'll remove them. > > localization_backend.hpp: > > Missing #include <memory> for auto_ptr. > Inherited for generator, but good point. > line 129: What about different back ends for different > char types? Does that even make sense? > Not really. Generally you may want to have different backends because something is not supported. For example std backed may be suitable for formatting and parsing and for date_time handling (for example parsing numbers in std backend is much faster) But you still may want better ICU based collation and ICU based boundary analysis which is not supported by std::backend. As all backends support char/wchar_t and actually nobody at this point does not really support char16_t/char32_t (huge compiler/std library issues everywhere) So I think it is more then justified. > message.hpp: > > The compiler generated assignment operator of message > only provides basic exception safety. If it fails, > the message object may not be meaningful. Please > document this or change it. > Ok, interesting point, I'll deal with it. > line 776: you could use aggregate initialization. > What do you mean, message.hpp:776 points to #pragma warning(pop) > util.hpp > > line 48: Can you link to the documentation of the > format of name? What happens if name is not > in the correct format? It provides substitutions or ignores it. It is mostly function for backends that generates some shared facets and may be useful for backend developers. > > line 76-77: Please rewrite this sentence. I can't figure > out what it's trying to be say. Yes you right. It should be /// /// This value is returned in case of incomplete input sequence (from_unicode member) or /// too small buffer (to_unicode member functions) /// > > line 108: assert(typeid(*this) == typeid(base_converter))? > Why? To make sure it is overloaded? The point that clone() is used only in case is_thread_safe() == false. I can add this but how really important is that? Thanks for great comments. Artyom


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk