|
Boost : |
Subject: Re: [boost] [locale] review part 2.1: source
From: Steven Watanabe (watanabesj_at_[hidden])
Date: 2011-04-10 13:14:38
AMDG
On 04/10/2011 02:12 AM, Artyom wrote:
>> encoding/iconv_codepage.hpp
>>
>> line 168: What are the possible errno values
>> besides EILSEQ, EINVAL, and E2BIG? If
>> you get an error you don't expect, is
>> silently stopping the correct behavior?
>>
>
> According to opengroup's specifications, Linux
> man pages and others these are the error states that
> can occur.
>
> Maybe I should throw conversion_error if the method is stop().
>
That seems reasonable. This situation
shouldn't ever happen, but it's probably
a good idea to detect it.
>> iconv_to_utf: The comments from iconv_from_utf apply.
>> Also, the implementation is almost the same as
>> iconv_from_utf. I think you could define a single
>> function template to implement both.
>>
>
> Not exactly. I had found that it is much
> safer and readable to have two implementations
> then to create quite complex meta programming
> to make sure it works.
>
That's sometimes true. I know around here
we tend to err on the side of too much
metaprogramming, but in this case I don't
think you even need much.
> It is much easier to assume that at least
> one side is "char"
>
Really? In this case the code looks
almost identical.
>> lines 86,101: The STL is your friend:
>> size_t remove_substitutions(std::vector<wchar_t> &v)
>>
>> {
>> v.erase(std::remove(v.begin(), v.end(), wchar_t(0xFFFD)), v.end());
>> return v.size();
>> }
>>
>
> After reading it carefully I can see how it works but
> generally I see how it works but it isn't so readable.
>
> In any case interesting and it may save some allocations.
>
Well, remove/erase is idiomatic, so it should be
clearer to those familiar the STL.
>> icu/numeric.cpp:
>>
>> In num_format: You don't handle bool?
>>
>
> ICU does not handle it... So I can't do something
> better for this - I pass it to std::locale::classic
> formatting.
>
Is there any way that you could use the
message translation mechanism for "true"
and "false" if std::boolalpha is set?
Otherwise it should be formatted the same
as an integer.
>> line 290: Alas, the standard doesn't guarantee that
>> you can put_back an arbitrary number of characters.
>> This isn't going to work if you're reading from a
>> non-seekable stream and the line is sufficiently long.
>>
>
> The problem is that it is the best I can do from-inside
> std::locale's num_fmt facet.
>
> It has many major limitations. For exaple if you would
> try to parse 12345YX as
>
> std::cin>> number>> character
>
> then character would be X.
>
> This is specially critical when I parse spellout format
> to be able to putback correct part.
>
> Maybe I should try to see what is the size of the
> buffer and/or document that the stream
> should be able to handle putback well.
>
> It is actually fairly simple as streambuf has a buffer
> so it can actually receive putback characters without problem.
>
For a limited number of characters.
If you're reading from a file or from
a stringstream, you'll be okay because
pbackfail can seek to an earlier position
in the stream. If you're reading from
a terminal, it should be okay because it's
usually line buffered, and the lines aren't
usually very long. I could see running
into a problem if you're reading from
a pipe, however.
> Unfortunately it is the best solution I could
> reach so far with the limitation of std::num_get
>
Yeah. I can't think of anything better
either. This is really more of a limitation
of std::steambuf, though.
>> posix/converter.cpp:
>>
>> I noticed that you convert UTF8 to wchar_t.
>> For the posix backend, are there other multibyte
>> encodings that would need this? Or would
>> converting everything to UTF32 cause other
>> problems?
>>
>
> - For most encodings (single byte) toupper/tolower
> does better job (faster simpler smaller)
>
> <snip>
>
> I can change it for MB encodings but not sure if
> it would have added value.
>
Ok. I basically know nothing about the domain.
Feel free to take any comments like this with
a grain of salt.
>> line 85: If this fails, don't you still need to call freelocale?
>> I think this would be simplified if you created a simple RAII
>> wrapper that holds a locale_t by value.
>>
>
> Ok... I see missed this.
>
> I prefer to have it shared as every facet has its own
> copy (2 characters * 4-5 facets)
>
> I'll fix this.
Oh, I wasn't suggesting that you go away from
shared_ptr:
struct posix_locale : boost::noncopyable
{
posix_locale();
~posix_locale();
locale_t impl_;
};
shared_ptr<posix_locale> p(new posix_locale());
It's harder to get this wrong.
In Christ,
Steven Watanabe
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk