Subject: Re: [boost] Boost.Unicode (was Re: Boost.Locale)
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-12-16 07:15:08


On 16/12/2010 12:32, Artyom wrote:

> Then don't do case conversion!

I already parse the data that provides that information, so I might as
well forward it to the user.

Unicode provides two levels of casing, one in its main character
mapping, and one in the SpecialCasing supplement.
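
To illustrate the two levels, here is a minimal sketch, not the library's
actual interface. The names simple_upper, full_upper and to_upper are
hypothetical, and the tables are tiny hand-written excerpts; a real
implementation would generate them from the Unicode data files.

```cpp
#include <string>

// Level 1: simple mapping, one code point to one code point,
// as listed in the main character data.
char32_t simple_upper(char32_t c) {
    if (c >= U'a' && c <= U'z') return static_cast<char32_t>(c - 0x20);
    if (c == U'\u00E9') return U'\u00C9';   // é -> É
    return c;                               // ß has no simple uppercase mapping
}

// Level 2: full mapping from SpecialCasing, which may expand to several
// code points; falls back to the simple mapping otherwise.
std::u32string full_upper(char32_t c) {
    if (c == U'\u00DF') return U"SS";       // ß -> SS
    if (c == U'\uFB01') return U"FI";       // fi ligature -> FI
    return std::u32string(1, simple_upper(c));
}

// Uppercase a whole string using the full mappings.
std::u32string to_upper(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) out += full_upper(c);
    return out;
}
```

With these excerpts, simple_upper leaves ß unchanged, while to_upper
expands it, so the result can be longer than the input.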

> Do just case folding. For such "simple" and incorrect
> case conversion I don't need sophisticated Unicode library, I can use
> standard operating system API and even std::locale::ctype very successfully
> (which I do in Boost.Locale if user prefers to use non-icu based backend)
>
> Case conversion is:
>
> - context dependent: Greek letter "Σ" is converted to "σ" or to "ς", according
> to position in the word.
> - locale dependent: Turkish i goes to İ
> - not 1-to-1: German ß goes to SS in upper case.

Right, and the reason I'm not doing it right now is that I don't want
to tackle context handling before I look at more complex things that I
think are more immediately useful.
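
For reference, the context and locale cases from the list above could be
sketched roughly as follows. This is only an illustration under simplifying
assumptions: is_letter, to_lower_greek and to_upper_latin are hypothetical
names, the final-sigma test is a simplified version of the real condition,
and only Basic Latin and basic Greek are covered.

```cpp
#include <cstddef>
#include <string>

// Simplified stand-in for "is a cased letter": Basic Latin and basic Greek.
bool is_letter(char32_t c) {
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z') ||
           (c >= U'\u0391' && c <= U'\u03C9');
}

// Context-dependent: capital sigma becomes final sigma (U+03C2) when it
// follows a letter and is not followed by one, and U+03C3 otherwise.
std::u32string to_lower_greek(const std::u32string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if (c == U'\u03A3') {
            bool after_letter = i > 0 && is_letter(s[i - 1]);
            bool before_letter = i + 1 < s.size() && is_letter(s[i + 1]);
            out += (after_letter && !before_letter) ? U'\u03C2' : U'\u03C3';
        } else if (c >= U'\u0391' && c <= U'\u03A9' && c != U'\u03A2') {
            out += static_cast<char32_t>(c + 0x20);  // U+03A2 is unassigned
        } else {
            out += c;
        }
    }
    return out;
}

// Locale-dependent: with Turkic rules, 'i' uppercases to U+0130
// (capital I with dot above) instead of 'I'.
std::u32string to_upper_latin(const std::u32string& s, bool turkic) {
    std::u32string out;
    for (char32_t c : s) {
        if (c == U'i' && turkic) out += U'\u0130';
        else if (c >= U'a' && c <= U'z') out += static_cast<char32_t>(c - 0x20);
        else out += c;
    }
    return out;
}
```

The sigma branch is the reason a per-code-point function is not enough: it
has to look at the neighbouring code points before it can pick a mapping.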

> I'm not sure about case-folding but AFAIK it is not 1-to-1 as well - but I may
> be wrong.

No, it isn't 1-to-1 either.
It also needs special treatment for Turkish, but nothing context-dependent.
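
A rough sketch of what that means in practice, assuming a hypothetical
fold_case function with a turkic flag and a hand-written excerpt of the
folding data (only Basic Latin, ß and the dotted/dotless I are handled):

```cpp
#include <string>

// Case folding looks at one code point at a time (no context), but the
// result may be longer than the input, and Turkic languages need their
// own mappings for I and U+0130.
std::u32string fold_case(const std::u32string& s, bool turkic = false) {
    std::u32string out;
    for (char32_t c : s) {
        if (c == U'\u00DF') {
            out += U"ss";                    // full folding: ß -> ss
        } else if (c == U'I' && turkic) {
            out += U'\u0131';                // Turkic: I -> dotless i
        } else if (c == U'\u0130') {
            if (turkic) out += U'i';         // Turkic: U+0130 -> i
            else out += U"i\u0307";          // default: i + combining dot above
        } else if (c >= U'A' && c <= U'Z') {
            out += static_cast<char32_t>(c + 0x20);
        } else {
            out += c;
        }
    }
    return out;
}

// Caseless comparison: fold both sides, then compare.
bool equal_ignoring_case(const std::u32string& a, const std::u32string& b,
                         bool turkic = false) {
    return fold_case(a, turkic) == fold_case(b, turkic);
}
```

With this excerpt, "STRASSE" and "straße" compare equal, and "I" folds to
"i" by default but to dotless "ı" when the turkic flag is set.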

> For general collation that works "well" in most languages I can use strcmp...
> I don't need Unicode library for this.

That doesn't let you search for a substring regardless of case, accents
or punctuation.
What really interests me in collation is collation folding.
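
As a toy approximation of the idea, one can fold both strings to a search
key and then do a plain substring search on the keys. The names
strip_accent, collation_fold and contains_folded are hypothetical, and the
table below is a tiny hand-written stand-in; a real implementation would
presumably derive the folding from collation data rather than a switch.

```cpp
#include <string>

// Hypothetical accent table covering a handful of lowercase letters only.
char32_t strip_accent(char32_t c) {
    switch (c) {
        case U'\u00E0': case U'\u00E2': return U'a';                   // à â
        case U'\u00E8': case U'\u00E9': case U'\u00EA': return U'e';   // è é ê
        case U'\u00FC': return U'u';                                   // ü
        default: return c;
    }
}

bool is_ascii_punct(char32_t c) {
    return (c >= 0x21 && c <= 0x2F) || (c >= 0x3A && c <= 0x40) ||
           (c >= 0x5B && c <= 0x60) || (c >= 0x7B && c <= 0x7E);
}

// Build a key that ignores ASCII punctuation, Basic Latin case and the
// accents listed above. Spaces are kept.
std::u32string collation_fold(const std::u32string& s) {
    std::u32string key;
    for (char32_t c : s) {
        if (is_ascii_punct(c)) continue;
        if (c >= U'A' && c <= U'Z') c = static_cast<char32_t>(c + 0x20);
        key += strip_accent(c);
    }
    return key;
}

// Substring search on the folded keys.
bool contains_folded(const std::u32string& haystack,
                     const std::u32string& needle) {
    return collation_fold(haystack).find(collation_fold(needle)) !=
           std::u32string::npos;
}
```

For instance, searching for "COOPERATE" in "Let's Co-opérate!" succeeds
here, which a strcmp-style comparison cannot do.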


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk