Boost logo

Boost :

Subject: Re: [boost] [locale] review part 2.1: source
From: Steven Watanabe (watanabesj_at_[hidden])
Date: 2011-04-10 13:14:38


On 04/10/2011 02:12 AM, Artyom wrote:
>> encoding/iconv_codepage.hpp
>> line 168: What are the possible errno values
>> besides EILSEQ, EINVAL, and E2BIG? If
>> you get an error you don't expect, is
>> silently stopping the correct behavior?
> According to opengroup's specifications, Linux
> man pages and others these are the error states that
> can occur.
> Maybe I should throw conversion_error if the method is stop().

That seems reasonable. This situation
shouldn't ever happen, but it's probably
a good idea to detect it.

>> iconv_to_utf: The comments from iconv_from_utf apply.
>> Also, the implementation is almost the same as
>> iconv_from_utf. I think you could define a single
>> function template to implement both.
> Not exactly. I had found that it is much
> safer and readable to have two implementations
> then to create quite complex meta programming
> to make sure it works.

That's sometimes true. I know around here
we tend to err on the side of too much
metaprogramming, but in this case I don't
think you even need much.

> It is much easier to assume that at least
> one side is "char"

Really? In this case the code looks
almost identical.

>> lines 86,101: The STL is your friend:
>> size_t remove_substitutions(std::vector<wchar_t> &v)
>> {
>> v.erase(std::remove(v.begin(), v.end(), wchar_t(0xFFFD)), v.end());
>> return v.size();
>> }
> After reading it carefully I can see how it works but
> generally I see how it works but it isn't so readable.
> In any case interesting and it may save some allocations.

Well, remove/erase is idiomatic, so it should be
clearer to those familiar the STL.

>> icu/numeric.cpp:
>> In num_format: You don't handle bool?
> ICU does not handle it... So I can't do something
> better for this - I pass it to std::locale::classic
> formatting.

Is there any way that you could use the
message translation mechanism for "true"
and "false" if std::boolalpha is set?
Otherwise it should be formatted the same
as an integer.

>> line 290: Alas, the standard doesn't guarantee that
>> you can put_back an arbitrary number of characters.
>> This isn't going to work if you're reading from a
>> non-seekable stream and the line is sufficiently long.
> The problem is that it is the best I can do from-inside
> std::locale's num_fmt facet.
> It has many major limitations. For exaple if you would
> try to parse 12345YX as
> std::cin>> number>> character
> then character would be X.
> This is specially critical when I parse spellout format
> to be able to putback correct part.
> Maybe I should try to see what is the size of the
> buffer and/or document that the stream
> should be able to handle putback well.
> It is actually fairly simple as streambuf has a buffer
> so it can actually receive putback characters without problem.

For a limited number of characters.
If you're reading from a file or from
a stringstream, you'll be okay because
pbackfail can seek to an earlier position
in the stream. If you're reading from
a terminal, it should be okay because it's
usually line buffered, and the lines aren't
usually very long. I could see running
into a problem if you're reading from
a pipe, however.

> Unfortunately it is the best solution I could
> reach so far with the limitation of std::num_get

Yeah. I can't think of anything better
either. This is really more of a limitation
of std::steambuf, though.

>> posix/converter.cpp:
>> I noticed that you convert UTF8 to wchar_t.
>> For the posix backend, are there other multibyte
>> encodings that would need this? Or would
>> converting everything to UTF32 cause other
>> problems?
> - For most encodings (single byte) toupper/tolower
> does better job (faster simpler smaller)
> <snip>
> I can change it for MB encodings but not sure if
> it would have added value.

Ok. I basically know nothing about the domain.
Feel free to take any comments like this with
a grain of salt.

>> line 85: If this fails, don't you still need to call freelocale?
>> I think this would be simplified if you created a simple RAII
>> wrapper that holds a locale_t by value.
> Ok... I see missed this.
> I prefer to have it shared as every facet has its own
> copy (2 characters * 4-5 facets)
> I'll fix this.

Oh, I wasn't suggesting that you go away from

struct posix_locale : boost::noncopyable
     locale_t impl_;

shared_ptr<posix_locale> p(new posix_locale());

It's harder to get this wrong.

In Christ,
Steven Watanabe

Boost list run by bdawes at, gregod at, cpdaniel at, john at