
Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2011-04-26 07:31:12


On 24/04/2011 22:01, Ryou Ezoe wrote:

> Collation and Conversions:
> Japanese doesn't have the concepts of case and accent.
> Since we don't have these concepts, we never need them.

I believe all CJK characters can be decomposed into radicals, and the
decomposed forms are equivalent to the composed ones, so you may still
want to perform normalization.

Also, converting between halfwidth and fullwidth katakana could have
some uses.
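
For instance, NFKC normalization folds halfwidth katakana into the
fullwidth forms. A minimal sketch, assuming Boost.Locale built with the
ICU backend and a UTF-8 environment (the locale name is just an
example):

    #include <boost/locale.hpp>
    #include <iostream>
    #include <string>

    int main() {
        boost::locale::generator gen;
        std::locale loc = gen("ja_JP.UTF-8");

        // Halfwidth katakana "katakana": U+FF76 U+FF80 U+FF76 U+FF85
        std::string halfwidth = "\xEF\xBD\xB6\xEF\xBE\x80"
                                "\xEF\xBD\xB6\xEF\xBE\x85";

        // NFKC compatibility normalization maps these code points to
        // their fullwidth equivalents
        std::string fullwidth =
            boost::locale::normalize(halfwidth,
                                     boost::locale::norm_nfkc, loc);

        std::cout << fullwidth << "\n";
    }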

> Boundary analysis:
> What is the definition of boundary and how does it analyse?
> It sounds too smart for the small things it actually does.

It uses the boundary analysis algorithms defined by the Unicode
standard, which don't use heuristics or anything like that.

Remember that Boost.Locale is just a wrapper around ICU, which is where
the real smarts live.
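
For what it's worth, a minimal sketch of how the word boundary rules
are exposed (assuming Boost.Locale with the ICU backend; the text and
locale are just examples):

    #include <boost/locale.hpp>
    #include <iostream>
    #include <string>

    int main() {
        namespace ba = boost::locale::boundary;
        boost::locale::generator gen;
        std::locale loc = gen("en_US.UTF-8");

        std::string text = "Boost.Locale just wraps ICU.";

        // Index of word segments computed with the Unicode (UAX #29)
        // boundary rules, via ICU
        ba::ssegment_index words(ba::word, text.begin(), text.end(), loc);

        for (ba::ssegment_index::iterator it = words.begin();
             it != words.end(); ++it)
            std::cout << "[" << it->str() << "] ";
        std::cout << "\n";
    }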

> I'd rather call it strtok with hard-coded delimiters.
> Japanese doesn't separate words with spaces.
> So unless we perform really complicated natural language
> processing (which can never be perfect, since we will never have a
> complete Japanese dictionary),
> we can't split Japanese text into words.
> Also, Japanese doesn't have a concept of word wrap.
> So "find appropriate places for line breaks" is unnecessary.
> Actually, there are some rules for line breaks in Japanese.

You can still break at punctuation marks, and there are places where you
should definitely not break.

Thai, Lao, Chinese and Japanese do require dictionaries or heuristics
to correctly distinguish words. However, the default algorithm defined
by Unicode still provides a best-effort implementation without them.
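
A rough sketch of the line-break boundaries (same assumptions as above;
the Japanese sample sentence is illustrative):

    #include <boost/locale.hpp>
    #include <iostream>
    #include <string>

    int main() {
        namespace ba = boost::locale::boundary;
        boost::locale::generator gen;
        std::locale loc = gen("ja_JP.UTF-8");

        // "This is a Japanese sentence." -- no spaces between words
        std::string text = "これは日本語の文章です。";

        // Each segment ends where a line break is permitted
        ba::ssegment_index lines(ba::line, text.begin(), text.end(), loc);

        for (ba::ssegment_index::iterator it = lines.begin();
             it != lines.end(); ++it)
            std::cout << "[" << it->str() << "]\n";
    }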

