Subject: Re: [boost] Boost.Unicode (was Re: Boost.Locale)
From: Artyom (artyomtnk_at_[hidden])
Date: 2010-12-16 06:32:21
> > 2. case conversion - is locale dependent - for example if the locale is Turkish
> > then upper("i")=="İ" while upper("i")=="I" for other languages.
>
> Simple case conversions are the easy 1:1 language- and context-agnostic mappings.
>
> I can't do the more complex conversions because they depend on specific
> languages and contexts.
>
> Thankfully case folding is not language- nor context-dependent, and is
> probably what most people want rather than case conversion.
Then don't do case conversion!
Do just case folding. For such "simple" and incorrect case conversion I don't
need a sophisticated Unicode library; I can use the standard operating system
API and even the std::ctype facet of std::locale very successfully
(which I do in Boost.Locale if the user prefers a non-ICU backend).
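To make concrete what "simple" conversion with the standard library looks like, here is a minimal sketch. The helper name simple_upper is made up for this mail and is not the Boost.Locale code; it only assumes the standard std::ctype<wchar_t> facet. Because it maps one code unit to one code unit, the output can never be longer than the input.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: upper-case a wide string one code unit at a time
// using the ctype facet of the given locale. This is the "simple" 1:1
// mapping, so the result always has the same length as the input.
std::wstring simple_upper(const std::wstring& in,
                          const std::locale& loc = std::locale::classic())
{
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t>>(loc);
    std::wstring out(in);
    for (wchar_t& c : out)
        c = ct.toupper(c);
    return out;
}

// Usage sketch with the classic "C" locale, which is always available.
inline void simple_upper_demo()
{
    assert(simple_upper(L"i") == L"I");
    // U+00DF (sharp s) cannot grow into "SS" here: length stays at 6.
    assert(simple_upper(L"stra\u00DFe").size() == 6);
}
```

Passing a named locale such as a Turkish one would change the facet's tables, but which locale names are installed is system-dependent, so the sketch sticks to the classic locale.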
Case conversion is:
- context dependent: Greek letter "Σ" is converted to "σ" or to "ς", according
to position in the word.
- locale dependent: Turkish i goes to İ
- not 1-to-1: German ß goes to SS in upper case.
So if you don't do this right, just don't do it.
I'm not sure about case folding, but AFAIK it is not 1-to-1 either - but I may
be wrong.
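The three properties listed above can be shown with a toy sketch. Everything here is an assumption made for illustration: toy_upper, toy_lower, toy_is_letter and the turkish flag are invented names, the tables cover only ASCII and basic Greek, and the final-sigma test is a rough simplification of the real rule. A real implementation would be driven by Unicode's SpecialCasing.txt data. On the folding question: as far as I know, full case folding in CaseFolding.txt maps U+00DF to "ss", so it is not 1-to-1 either, while simple case folding is.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy letter test: ASCII letters plus the basic Greek range only.
bool toy_is_letter(char32_t c)
{
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
           (c >= U'\u0391' && c <= U'\u03C9');
}

// Upper-casing: not 1-to-1 (sharp s) and locale dependent (Turkish i).
std::u32string toy_upper(const std::u32string& in, bool turkish = false)
{
    std::u32string out;
    for (char32_t c : in) {
        if (c == U'\u00DF')                          // sharp s -> "SS"
            out += U"SS";
        else if (c == U'i' && turkish)               // i -> I with dot above
            out += U'\u0130';
        else if (c >= U'a' && c <= U'z')
            out += static_cast<char32_t>(c - 0x20);
        else if (c == U'\u03C2' || c == U'\u03C3')   // both sigmas -> capital
            out += U'\u03A3';
        else if (c >= U'\u03B1' && c <= U'\u03C9')   // other Greek lowercase
            out += static_cast<char32_t>(c - 0x20);
        else
            out += c;
    }
    return out;
}

// Lower-casing: context dependent (capital sigma at the end of a word).
std::u32string toy_lower(const std::u32string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t c = in[i];
        if (c == U'\u03A3') {
            const bool after_letter  = i > 0 && toy_is_letter(in[i - 1]);
            const bool before_letter = i + 1 < in.size() && toy_is_letter(in[i + 1]);
            // Final form only when a letter precedes and none follows.
            out += (after_letter && !before_letter) ? U'\u03C2' : U'\u03C3';
        } else if ((c >= U'A' && c <= U'Z') ||
                   (c >= U'\u0391' && c <= U'\u03A9')) {
            out += static_cast<char32_t>(c + 0x20);
        } else {
            out += c;
        }
    }
    return out;
}

// Usage sketch for the three cases.
inline void toy_case_demo()
{
    assert(toy_upper(U"stra\u00DFe") == U"STRASSE");          // length 6 -> 7
    assert(toy_upper(U"i") == U"I");
    assert(toy_upper(U"i", true) == U"\u0130");
    // Capital "ODOS" in Greek: the last sigma becomes the final form.
    assert(toy_lower(U"\u039F\u0394\u039F\u03A3") == U"\u03BF\u03B4\u03BF\u03C2");
}
```

The point of the sketch is only that none of the three cases can be expressed as a per-character table lookup with a single table.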
> Yes, it definitely is; but you could still have a "general" collation that
> would work well enough for most languages.
For general collation that works "well" in most languages I can use strcmp... I
don't need a Unicode library for this.
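For reference, this is roughly what strcmp ordering gives you; sort_bytewise is a name made up for this sketch. strcmp compares bytes as unsigned char, so on UTF-8 input the result is plain code point order: upper case sorts before lower case, and any non-ASCII letter sorts after "z".

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: sort by raw byte value, which is what strcmp compares.
std::vector<std::string> sort_bytewise(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end(),
              [](const std::string& a, const std::string& b) {
                  return std::strcmp(a.c_str(), b.c_str()) < 0;
              });
    return words;
}

// Usage sketch; "\xC3\xA9t\xC3\xA9" is the UTF-8 encoding of the French
// word for "summer" (e-acute, t, e-acute).
inline void sort_bytewise_demo()
{
    const std::vector<std::string> r =
        sort_bytewise({"apple", "zoo", "\xC3\xA9t\xC3\xA9", "Zebra"});
    assert(r[0] == "Zebra");              // 'Z' (0x5A) < 'a' (0x61)
    assert(r[1] == "apple");
    assert(r[2] == "zoo");
    assert(r[3] == "\xC3\xA9t\xC3\xA9");  // lead byte 0xC3 > 'z' (0x7A)
}
```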
Artyom
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk