|
Boost : |
Subject: [boost] [locale] review part 2.1: source
From: Steven Watanabe (watanabesj_at_[hidden])
Date: 2011-04-09 23:18:39
AMDG
Okay, I managed to get through the encoding, icu, and posix
subdirectories today.
encoding/codepage.cpp:
line 28: In C++ you should use <cstring> (and std::strlen)
encoding/conv.hpp:
encoding/iconv_codepage.hpp
I think the convention for headers like
this which are really source files in disguise
is to give them the extension .ipp.
line 125: You don't actually care how the vector's
contents are initialized, do you? You can just
specify the size.
line 150: I think sresult.append(out_start, out_ptr);
would be slightly clearer.
line 168: What are the possible errno values
besides EILSEQ, EINVAL, and E2BIG? If
you get an error you don't expect, is
silently stopping the correct behavior?
iconv_between: Inheritance from iconv_from_utf is
a little odd.
iconv_to_utf: The comments from iconv_from_utf apply.
Also, the implementation is almost the same as
iconv_from_utf. I think you could define a single
function template to implement both.
encoding/uconv_codepage.hpp:
line 52: Do you really intend to translate any
std::exception? Including std::bad_alloc?
That isn't consistant with what iconv_codepage does.
encoding/wconv_codepage.hpp:
Missing #include <cstring>, <string>
lines 86,101: The STL is your friend:
size_t remove_substitutions(std::vector<wchar_t> &v)
{
v.erase(std::remove(v.begin(), v.end(), wchar_t(0xFFFD)), v.end());
return v.size();
}
icu/all_generator.hpp:
Why <vector>? You're using locale and string, but not vector.
icu/boundary.cpp:
icu/cdata.hpp:
Missing #include <string>
icu/uconv.hpp:
Needs #include <string> and #include <memory>.
line 8: This #include guard is excessively generic.
icu/codecvt.cpp:
lines 44, 48: If this throws, cvt_ needs to be closed.
icu/codecvt.hpp:
Missing #include <memory>
The #include guard is wrong.
icu/collator.cpp:
line 115: Does do_basic_transform guarantee that
the string contains no embedded null bytes?
It seems that this is required for hash to
work correctly.
icu/conversion.cpp:
icu/date_time.cpp:
In calendar_impl: I can't figure out why you're locking
in some functions and not others. For the most part,
you seem to lock in non-mutating functions and not
in mutating functions, so I was guessing that you
are trying to guarantee that a calender can be read
concurrently, but writes must be exclusive and that
ICU does not guarantee that concurrent reads are
okay. However, get_timezone and is_same don't lock.
Does ICU handle these methods differently from the
others W.R.T. thread-safety?
line 236: Do you need a lock here? You're only using
self which is a different object.
icu/formatter.cpp:
line 69: This seems like a really bad idea. It means
that integers larger than 0x7FFFFFFFFFFFFFFF will
be formatted incorrectly. If this is an unavoidable
limitation of ICU, I think you shoud probably throw
an exception and document the behavior. And don't
think that people won't try to print unsigned numbers
this big. I know I have.
line 263: This comparison is subtly wrong for int64_t.
Consider what happens when date is exactly 2^63.
limits_type::max() will be 0x7FFFFFFFFFFFFFFF.
According to [expr] this will be converted to
double. For most rounding modes, the result
will be 2^63, which is equal to date. Then,
when we get to line 268 and cast back to ValueType,
the result will be 0x8000000000000000, and the
behavior is undefined according to [conv.fpint].
(Note: It's obviously not easy to trigger this,
but I'm guessing that a specially crafted test
case could)
icu/formatter.hpp:
line 44: What about float and long double. long double
may have more precision than double. Also, even for
float, the double rounding caused by working with double
can produce slightly different results from when using
float directly. Does ICU only allow you to use double
and [u]int[32|64]_t?
line 93: I find this comment a bit misleading. It wasn't
clear until I looked at the implementation that what
was being cached was the ICU formatter. The
boost::locale::impl_icu::formatter object is not
cached.
icu/icu_backend.cpp:
line 69: Ideally you should not mark the object
as valid until you know that this function will
not fail.
line 91: I'd kind of like install to be thread-safe...
icu/icu_backend.hpp:
icu/icu_util.hpp:
icu/numeric.cpp:
In num_format: You don't handle bool?
line 290: Alas, the standard doesn't guarantee that
you can put_back an arbitrary number of characters.
This isn't going to work if you're reading from a
non-seekable stream and the line is sufficiently long.
icu/predefined_formatters.hpp:
icu/time_zone.hpp:
icu/time_zone.cpp:
icu/uconv.hpp:
posix/all_generater.hpp:
posix/codecvt.cpp:
mb2_iconv_converter seems to be doing a lot
of the same work as encoding/iconv_codepage.
Is there any way to avoid the duplication?
posix/codecvt.hpp:
Missing #include <memory> and <string>.
posix/collate.cpp:
posix/converter.cpp:
I noticed that you convert UTF8 to wchar_t.
For the posix backend, are there other multibyte
encodings that would need this? Or would
converting everything to UTF32 cause other
problems?
posix/numeric.cpp:
line 54: sizeof(buf) is 64, but you pass 256?
line 57: is 1024 absolutely guaranteed to be
enough? Otherwise you should use a loop
with exponential reallocation. Line 97 too.
lines 68, 75: std::copy?
posix/posix_backend.cpp:
install: The same comment W.R.T. thread-safety applies
as for icu.
line 85: If this fails, don't you still need to call freelocale?
I think this would be simplified if you created a simple RAII
wrapper that holds a locale_t by value.
posix/posix_backend.hpp:
(To be continued)
In Christ,
Steven Watanabe
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk