Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-24 17:04:41


> From: Ryou Ezoe <boostcpp_at_[hidden]>
>
> Number and Date formatting:
> There are so many possible ways to express numbers.
> Some people want comma separation by 3 digits, other want 4 digits.
> Some want to be 100万(万 means 10000). some want 百万(百 means 100)。
> Formatting based on locale doesn't work because there is no uniform format.
>

Have you actually read the manuals?

This is the output of:

    std::cout << bl::format("{1}\n{1,num}\n{1,spell}\n") % 1000000 ;

in the ja_JP.UTF-8 locale:

    1000000
    1,000,000
    百万

Not so bad, is it?

> Collation and Conversions:
> Japanese doesn't have concepts of case and accent.
> Since we don't have these concepts, we never need it.
>

Irrelevant. Even if this feature is not required for CJK, it is
required for other languages, like many other things (spaces,
plural forms).

> Boundary analysis:
> What is the definition of boundary and how does it analyse?
> It sounds too smart for such a small things it actually does.
> I'd rather call it strtok with hard-coded delimiters.
> Japanese doesn't separate each words by space.
> So unless we perform really complicated natural language
> processing(which is impossible to be perfect since we never have
> complete Japanese dictionary),
> we can't split Japanese text by words.

OK, this is the word splitting

    |私|は|日本|の|東京都|に|住|んでいます|。|私|は|大|きな|家|に|住|んでいます|。

of the text:

    私は日本の東京都に住んでいます。私は大きな家に住んでいます。

I assume it is not perfect, and I don't know Japanese well enough to
say, but I can at least see words like:

    私 - I
    日本 - Japan
    東京都 - City of Tokyo

So this is not defined only by "space-based" separation.

Also, for some languages, like Thai, ICU uses dictionaries.

So it is not a naive algorithm that separates text by spaces.

> Also, Japanese doesn't have a concept of word wrap.
> So "find appropriate places for line breaks" is unnecessary.
> Actually, there are some rules for line break in Japanese.
> These rules are too complicated and it requires more than text processing.
> Same for Chinese and Korean.

These are the possible line-break positions for the same sentences
above:

    |私|は|日|本|の|東|京|都|に|住|ん|で|い|ま|す。|私|は|大|き|な|家|に|住|ん|で|い|ま|す。|

At least I can see that it does not allow a line to start with "。".

>
> Of course, strtok is still a handy tool and I appreciate yet another design.
> But I think it's better be handled by more generic library, like Boost
> String Algorithms.
>

It is far more complicated than strtok.

Bottom line: I see that you haven't really tried to use this library
or to understand how it works.

I'm sorry, but it makes me doubt the review you sent.

Artyom
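
As an aside on the digit-grouping point quoted above (3 digits versus 4 digits): the group size is a property of the locale's numeric facet, not something hard-coded in the stream. The sketch below uses only the standard library, not Boost.Locale, and is not how Boost.Locale implements `{1,num}`. The names `fixed_grouping` and `format_grouped` are hypothetical, made up for this illustration.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet (not part of Boost.Locale): groups digits in blocks
// of a caller-chosen size, separated by commas.
class fixed_grouping : public std::numpunct<char> {
public:
    explicit fixed_grouping(int digits_per_group)
        : grouping_(1, static_cast<char>(digits_per_group)) {}

protected:
    char do_thousands_sep() const override { return ','; }
    // A one-element grouping string means "repeat this group size".
    std::string do_grouping() const override { return grouping_; }

private:
    std::string grouping_;
};

// Hypothetical helper: formats `value` using the given group size.
std::string format_grouped(long value, int digits_per_group) {
    assert(digits_per_group > 0 && digits_per_group < 127);
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(),
                          new fixed_grouping(digits_per_group)));
    out << value;
    return out.str();
}
```

With these definitions, `format_grouped(1000000, 3)` yields `1,000,000` and `format_grouped(1000000, 4)` yields `100,0000`.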

