Subject: Re: [boost] Boost.Unicode (was Re: Boost.Locale)
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-12-16 07:15:08

Next message: Mateusz Loskot: "Re: [boost] [OT] Open Source Forking and Boost (was Re: [SQL-Connectivity] Is Boost interested in CppDB?)"
Previous message: Dean Michael Berris: "Re: [boost] [OT] Open Source Forking and Boost (was Re: [SQL-Connectivity] Is Boost interested in CppDB?)"
In reply to: Artyom: "Re: [boost] Boost.Unicode (was Re: Boost.Locale)"
Next in thread: Scott McMurray: "Re: [boost] Boost.Unicode (was Re: Boost.Locale)"
Reply: Scott McMurray: "Re: [boost] Boost.Unicode (was Re: Boost.Locale)"

On 16/12/2010 12:32, Artyom wrote:

> Then don't do case conversion!

I already parse the data that provides that information, so I might as
well forward it to the user.

Unicode provides two levels of casing, one in its main character
mapping, and one in the SpecialCasing supplement.
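
As a rough illustration of those two levels, here is a minimal Python
sketch. The two tables are tiny hand-written excerpts standing in for the
simple mappings of the main character data and for the unconditional
entries of SpecialCasing; the names SIMPLE_UPPER, SPECIAL_UPPER,
simple_upper and full_upper are hypothetical and are not Boost.Unicode
API.

```python
# Excerpt standing in for the simple (1-to-1) uppercase mappings.
SIMPLE_UPPER = {
    "a": "A",
    "\u00e9": "\u00c9",  # e with acute -> E with acute
    "\u03c3": "\u03a3",  # Greek small sigma -> capital sigma
}

# Excerpt standing in for SpecialCasing: one code point may map to several.
SPECIAL_UPPER = {
    "\u00df": "SS",  # German sharp s
    "\ufb01": "FI",  # Latin small ligature fi
}


def simple_upper(text):
    """First level only: every code point maps to exactly one code point."""
    return "".join(SIMPLE_UPPER.get(ch, ch) for ch in text)


def full_upper(text):
    """Second level: consult the special table first, then the simple one."""
    return "".join(SPECIAL_UPPER.get(ch, SIMPLE_UPPER.get(ch, ch)) for ch in text)
```

With only the first level, the sharp s is left untouched and the length of
the string never changes; with the second level it expands to "SS". The
conditional (context- and locale-dependent) SpecialCasing entries are left
out of this sketch.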

> Do just case folding. For such "simple" and incorrect case conversion
> I don't need a sophisticated Unicode library; I can use the standard
> operating system API and even std::locale::ctype very successfully
> (which I do in Boost.Locale if the user prefers a non-ICU-based backend)
>
> Case conversion is:
>
> - context dependent: Greek letter "Σ" is converted to "σ" or to "ς", according
> to position in the word.
> - locale dependent: Turkish i goes to İ
> - not 1-to-1: German ß goes to SS in upper case.

Right, and the reason I'm not doing it right now is that I don't want
to look into the context handling before I look at more complex
things that I think are more immediately useful.
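
The three properties in the quoted list can be observed with Python's
built-in string methods, which apply the default (locale-independent)
full case mappings. This is only an illustration, not how Boost.Unicode
or Boost.Locale do it; upper_tr is a hypothetical helper that assumes the
only Turkish-specific uppercase rule needed here is i -> U+0130.

```python
# Context dependent: capital sigma lowercases to the final form at the end
# of a word and to the ordinary form elsewhere.
word_lower = "\u039f\u0394\u039f\u03a3".lower()  # Greek word in capitals
lone_sigma_lower = "\u03a3".lower()              # no preceding letter

# Not 1-to-1: sharp s expands to two letters in upper case.
strasse_upper = "Stra\u00dfe".upper()

# Locale dependent: the default mapping knows nothing about Turkish.
default_i_upper = "i".upper()


def upper_tr(text):
    """Hypothetical Turkish uppercase: map i to dotted capital I (U+0130)
    first, then let the default mapping handle everything else."""
    return text.replace("i", "\u0130").upper()
```

Here word_lower ends in U+03C2 (final sigma) while lone_sigma_lower is
U+03C3, strasse_upper is one character longer than its input, and
upper_tr("istanbul") starts with U+0130 where the default mapping
produces a plain "I".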

> I'm not sure about case-folding but AFAIK it is not 1-to-1 as well - but I may
> be wrong.

No, it isn't 1-to-1.
It also needs special treatment of Turkish, but nothing context-dependent.
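
A small Python sketch of those two points, using str.casefold (which
applies the default full case folding). casefold_tr and
equal_ignoring_case are hypothetical helpers; casefold_tr assumes the
Turkic rule only has to handle the two capital I letters before the
default folding runs.

```python
def equal_ignoring_case(a, b):
    """Caseless comparison through default full case folding."""
    return a.casefold() == b.casefold()


def casefold_tr(text):
    """Hypothetical Turkic folding: dotted capital I (U+0130) folds to i,
    plain capital I folds to dotless i (U+0131), then default folding."""
    return text.replace("\u0130", "i").replace("I", "\u0131").casefold()


# Not 1-to-1: sharp s folds to two letters.
folded_sharp_s = "\u00df".casefold()

# Not context dependent: both small sigma forms fold to the same letter,
# wherever they appear in the word.
folded_final_sigma = "\u03c2".casefold()
folded_capital_sigma = "\u03a3".casefold()
```

Folding needs no look at neighbouring characters, which is what makes it
simpler than case conversion: folded_final_sigma and folded_capital_sigma
are the same string, and equal_ignoring_case("Stra\u00dfe", "STRASSE")
holds even though the two strings differ in length.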

> For general collation that works "well" in most languages I can use strcmp...
> I don't need a Unicode library for this.

That doesn't let you search for a substring regardless of case, accents,
or punctuation.
The thing that really interests me with collation is collation folding.
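
To make the idea concrete, here is a rough Python approximation of that
kind of search. It is not real collation folding, which would be derived
from collation weights; it simply case-folds, decomposes, and drops
combining marks and punctuation. search_key and contains_folded are
hypothetical names.

```python
import unicodedata


def search_key(text):
    """Approximate folding key: ignore case, accents and punctuation."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # General categories starting with M are combining marks and those
    # starting with P are punctuation; both are dropped from the key.
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith(("M", "P"))
    )


def contains_folded(haystack, needle):
    """Substring search on the folded keys instead of the raw strings."""
    return search_key(needle) in search_key(haystack)
```

For example, contains_folded("Le Caf\u00e9, s'il vous pla\u00eet",
"cafe sil") is true, whereas a byte-wise comparison in the style of
strcmp sees no match at all.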

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk