Subject: Re: [boost] [locale] review part 2.2: source
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-15 10:04:31
> line 79: If it's an invariant that tz_ is
> the same as the timezone of impl, then
> this is not exception safe if clone throws.
> You'll have to use copy and swap idiom.
> line 161: This is a bit odd, don't you think? Normally
> if assignment is strongly exception safe without
> doing any extra work, there's no reason to go out
> of your way to clear the old state on failure.
Yes, you are right.
> line 350: should you check that v is in range?
> Regardless, the valid range should be documented.
The valid range is actually depends on specific calendar
and its implementation. Sometimes it depends on actual
calendar or for example in Hebrew calendar it
can't be year less then ~5800 now as it is
when the world was born (according to the calendar)
So if the range is not valid the underlying implementation
> line 53: it seems that this doesn't restore the
> float field, base field etc.
Ok... I'll check this.
> line 42: normally copy and swap is the preferred
> way to implement assignment. The main
> advantage is that it guarantees that
> if the operation fails, the state of
> the lhs is unchanged.
Noted... This is something across all the code.
> line 41: From the way domains is used, would a std::set
> be more appropriate?
There should be at lease one that is default domain.
It is the first one so in any case I would need to mark
something as default.
> missing copyright/license.
> (Issues like this will be caught by
> the inspect tool, so not a big deal now.)
Yes you are right.
> line 30 and others,: casting to and from void* should use
> line 56: can you use invalid instead of hard-coding
> this constant?
Opps, you are right.
> line 96: I don't think this is guaranteed to
> be distinct from any valid pointer. Can
> you make invalid the address of a unique
> global object instead?
This method is used quite widely, for example mmap
returns (void *)(-1) in case of failure, and it can't be anything
On the other hand there is enough problems with unique
global objects there.
> lines 200, 206: This is not guaranteed to be thread-safe
> in C++03. You must use call_once for thread-safe
> static inititalization. Note however that this is
> probably safe if you assume that no additional
> threads are started before main.
This mutexes are explicitly globally initialized
So all such things should be solved on library start.
So I think that is would be good enough and call_once
would just inject additional dependency on Boost.Thread
(currently Boost.Thread used in headers only variant
when ICU is not used)
> line 123: I take it that the format guarantees that
> hash_size_ > 2? Please validate this requirement
> when loading.
> line 132: Use null_pair?
> line 152: You require that off is in range, that
> off <= off + len, and that off + len is in range.
> Please check this.
> line 204: You really should check that you actually read
> 4 bytes. Otherwise if the file is too short, you get
> undefined behavior from reading uninitialized memory.
Actually variable magic is initialized (0) and if it was
not read correctly its value wouldn't be valid.
That is why I don't check the returned value.
> line 311: How do you know that there's a null-terminator?
Promised by the file format.
> line 326: suppose that form is 0 and the null-terminator
> is missing...
The different plural forms are separated by 0.
> line 407: This isn't guaranteed to work. The
> standard doesn't guarantee that the letters
> are contiguous. (Does anything use EBCDIC
> these days?)
I'm not sure that something better and simple
enough can be done.
I assume we are ASCII at least.
> lines 440, 442: You're assuming that the string is
> null terminated.
> What if the file is incorrectly
> formatted? It may be possible to read past
> the end of the allocated block. I haven't
> actually worked through the exact code path
> to trigger this condition, but it seems suspicious.
It is the requirement of the format, it is designed to
be loaded as is to memory and be useful.
So I don't check every possible string as it allows
to load files much faster.
In any case when I load file I put 1 extra 0 at the end
so at some point it will be terminated
> line 475: You're assuming that the separator
> actually appears.
std::string::npos is maximal value of size_t
so substr just get bigger sub string. May be
not 100% clear but this code does what it should.
> line 500: I think this can cause alignment problems.
> Does the file format make any alignment guarantees?
> Even if it does, you'll need to handle invalid
> files gracefully. Of course, it'll only be
> inefficient on platforms that allow unaligned
> loads, but still, the standard doesn't guarantee
> that it'll work.
Actually there two types of catalogs:
1. Directly loaded - when the locale's 8 bit charset and
the catalog's one is the same like UTF-8.
2. In memory when the base character is not char (wchar_t or charXX_t)
or the catalog had to be converted to other character set.
So yes, these pointers are going to be aligned.
> line 239: There's no 'c' in tokenizer.
Ooops. I'll fix this.
> Everything except compile can be in the unnamed namespace.
Yes you are right.
> Is this header intended to be used in multiple
> translation units? If so you probably should
> make the functions inline and not use an
> unnamed namespace.
Yes you are right. I don't know why had I done it
> line 49: Don't most Windows functions use DWORD,
> rather than int?
Yes. You are right.
> line 48: I think I just saw code looking just
> like this in another file.
There is no win32/collate.hpp?
> line 51: Check whether this succeeds and uses
> the entire string?
> line 89: I thought double checked locking was
> considered unsafe?
I had a discussion with a professor an expert in
this area, no problems when mutexes are full
memory barriers and they are.
> Anyway you should use
> call_once for initialization like this.
> line 32: I think I saw another copy of these functions.
> Get rid of the duplication?
AFAIR, they had different semantics.
> line 30: I don't think you need a separate
> object file. It isn't used anywhere else.
It does because it is used in different targets
AFAIR is solved this problem.
> line 126: A simpler way to reference another
> library is /boost//thread
> line 211: I would tend to use
> icu-sources = [ glob icu/*.cpp ] ;
> result += <source>$(icu-sources)
> I see you're using something like
> this later on.
The problem comes when you want to put something
temporary there out of project.
Never liked glob
> line 278: Boost.Thread ought to set this flag
> automatically. I'm assuming from the fact
> that you have it, that it doesn't.
> like a bug in the thread Jamfile. (Similarly,
> you should add <define>BOOST_LOCALE_NO_LIB
> in the usage requirements of boost_locale)
I explain, I use boost.thread's mutexes but I don't
want to link with this library. So in many cases
I use header-only boost.thread so I prefer specify
explicitly that I don't need Boost.Thread
(Boost.Thread is linked only for ICU backend where TLS is needed)
> test/Jamfile.v2: You don't actually need
> test-suite. test-suite is exactly the
> same as alias.
What do you mean?
> In Christ,
> Steven Watanabe
Thank you very much, it was really very
good code review.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk