Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Ryou Ezoe (boostcpp_at_[hidden])
Date: 2011-04-25 00:13:23


On Mon, Apr 25, 2011 at 6:04 AM, Artyom <artyomtnk_at_[hidden]> wrote:
>> From: Ryou Ezoe <boostcpp_at_[hidden]>
>
>> Number and Date formatting:
>> There are so many possible ways to express numbers.
>> Some people want comma separation by 3 digits, others want 4 digits.
>> Some want it to be 100万 (万 means 10000), some want 百万 (百 means 100).
>> Formatting based on locale doesn't work because there is no uniform format.
>>
>
> Have you actually read the manuals?
>
> This is the output of :
>
>   std::cout << bl::format("{1}\n{1,num}\n{1,spell}\n") % 1000000 ;
>
> in ja_JP.UTF-8 locale
>
>   1000000
>   1,000,000
>   百万
>
> Not so bad, isn't it?
Not bad.
Still, I doubt anybody wants to use Boost.Locale just for that.
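
For reference, the 3-digit versus 4-digit grouping mentioned above can be sketched with standard C++ facets alone. This is only an illustration under stated assumptions: `fixed_grouping` and `format_grouped` are hypothetical names, and none of this is Boost.Locale's API or implementation.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet (not part of Boost.Locale): groups digits in blocks of
// `group_size` digits, separated by a comma.
class fixed_grouping : public std::numpunct<char> {
public:
    explicit fixed_grouping(int group_size)
        : group_size_(static_cast<char>(group_size)) {}

protected:
    char do_thousands_sep() const override { return ','; }

    // Each char in the returned string is a group width (a number, not a
    // digit character); the last width repeats for all remaining digits.
    std::string do_grouping() const override {
        return std::string(1, group_size_);
    }

private:
    char group_size_;
};

// Formats `value` with the given group width, independent of the global locale.
std::string format_grouped(long value, int group_size) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new fixed_grouping(group_size)));
    out << value;
    return out.str();
}
```

With this sketch, `format_grouped(1000000, 3)` gives `1,000,000` and `format_grouped(1000000, 4)` gives `100,0000`, so the group width is a per-call choice rather than something fixed by the locale name.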

>
>
>> Collation and Conversions:
>> Japanese doesn't have the concepts of case and accent.
>> Since we don't have these concepts, we never need them.
>>
>
> Irrelevant. Even if this feature is not required
> for CJK, it is required for other languages, like many
> other things (spaces, plural forms).
>
>> Boundary analysis:
>> What is the definition of a boundary, and how does it analyse?
>> It sounds too smart for such a small thing it actually does.
>> I'd rather call it strtok with hard-coded delimiters.
>> Japanese doesn't separate words by spaces.
>> So unless we perform really complicated natural language
>> processing (which can never be perfect, since we never have
>> a complete Japanese dictionary),
>> we can't split Japanese text by words.
>
> OK, this is the word splitting
>
>   |私|は|日本|の|東京都|に|住|んでいます|。|私|は|大|きな|家|に|住|んでいます|。
>
> of the text:
>
>   私は日本の東京都に住んでいます。私は大きな家に住んでいます。

To me, it looks like splitting into contiguous runs of kana and kanji.
I don't think I will ever need that kind of splitting.
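
To make that observation concrete, here is a minimal sketch of such a script-run splitter. It is not how Boost.Locale or ICU segment text. It is only the naive heuristic described above, and the names `classify` and `split_by_script` and the code point ranges are assumptions of the sketch.

```cpp
#include <string>
#include <vector>

// Coarse script classes; the ranges below are the basic Unicode blocks only
// and are an assumption of this sketch, not a complete classification.
enum class script { hiragana, katakana, kanji, other };

script classify(char32_t c) {
    if (c >= 0x3040 && c <= 0x309F) return script::hiragana;
    if (c >= 0x30A0 && c <= 0x30FF) return script::katakana;
    if (c >= 0x4E00 && c <= 0x9FFF) return script::kanji;
    return script::other;
}

// Splits text wherever the script class changes between adjacent characters.
std::vector<std::u32string> split_by_script(const std::u32string& text) {
    std::vector<std::u32string> runs;
    for (char32_t c : text) {
        // Start a new run at the beginning or when the script class changes.
        if (runs.empty() || classify(runs.back().back()) != classify(c))
            runs.emplace_back();
        runs.back().push_back(c);
    }
    return runs;
}
```

Applied to the sample sentences quoted above, this heuristic yields the same pieces (私, は, 日本, の, 東京都, ...). That shows only that the sample cannot tell the two approaches apart, not that the library works this way.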

>
> I assume it is not perfect, and I don't know Japanese well enough to
> say, but I can at least see words like:
>
>  私 - I
>  日本 - Japan
>  東京都 - City of Tokyo
>
> But this is not only defined by "space-based" separation.
> Also, for some languages like Thai, ICU uses dictionaries.
>
> So it is not a naive algorithm that separates text by
> spaces.
>
>> Also, Japanese doesn't have a concept of word wrap.
>> So "find appropriate places for line breaks" is unnecessary.
>> Actually, there are some rules for line breaks in Japanese.
>> These rules are too complicated, and they require more than text processing.
>> The same goes for Chinese and Korean.
>
> This is a possible line-break separation of the same sentences above.
>
>
> |私|は|日|本|の|東|京|都|に|住|ん|で|い|ま|す。|私|は|大|き|な|家|に|住|ん|で|い|ま|す。|
>
> At least I can see that it does not allow a line to start with "。".
We have a lot of characters that should not be the initial character of a line.
But there is no uniform rule.
And it must work together with font rendering.
Simple text processing doesn't suffice.
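
As a sketch of the text-only part of this, the function below allows a break between any two characters unless the next one is on a caller-supplied list of characters that must not start a line. The name `break_opportunities` is hypothetical, and the list is a parameter precisely because there is no uniform rule. Nothing here accounts for font metrics or rendering.

```cpp
#include <string>
#include <vector>

// Returns the pieces between allowed break positions: a break is allowed
// between any two characters unless the next one is in `no_line_start`.
std::vector<std::u32string> break_opportunities(
        const std::u32string& text, const std::u32string& no_line_start) {
    std::vector<std::u32string> pieces;
    for (char32_t c : text) {
        bool may_start_line = no_line_start.find(c) == std::u32string::npos;
        // A prohibited character is glued to the piece before it.
        if (pieces.empty() || may_start_line)
            pieces.emplace_back();
        pieces.back().push_back(c);
    }
    return pieces;
}
```

For example, with `no_line_start` set to `U"。、"` (an illustrative, incomplete set), the text `ます。私` is split into `ま`, `す。` and `私`, which matches the shape of the output quoted above.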

>
>
>>
>> Of course, strtok is still a handy tool and I appreciate yet another design.
>> But I think it would be better handled by a more generic library, like Boost
>> String Algorithms.
>>
>
> It is far more complicated than strtok.
>
> Bottom line: I see that you haven't really tried
> to use this library or understand how it
> works.
>
> I'm sorry, but it makes me doubt the review
> you sent.
>
> Artyom

-- 
Ryou Ezoe
