From: Erik Wien (wien_at_[hidden])
Date: 2004-10-20 12:41:52
"Miro Jurisic" <macdev_at_[hidden]> wrote in message news:macdev-
>> My plan was to decompose all characters in unicode::string. This makes
>> manipulation of diacritics easier. Correct me if I'm wrong, but your
>> example of finding "ü" in a string would come down to finding the
>> codepoint sequence "U+0075 U+0308" and checking whether it is not
>> followed by another combining character, pretty trivial still.
>
> You have to not only decompose them but put them in a canonical decomposed
> order in order for that to work.
You could also do a Canonical Composition after the decomposition
(Normalization Form C). Either way, this is not something you would want to
do on every assignment of a string, but rather when it is needed (i.e. on
comparison).
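
To make the idea concrete, here is a rough sketch of normalizing only at
comparison or search time. It is written in Python rather than C++ only
because Python's standard library already ships unicodedata.normalize. The
function names are hypothetical and are not a proposal for the
unicode::string interface. Treating "canonical combining class 0" as "not a
combining character" is a simplifying assumption.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings by their NFC forms, normalizing only when asked."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def contains_u_diaeresis(text: str) -> bool:
    """Search the NFD form for U+0075 U+0308 not followed by another mark.

    NFD both decomposes and applies canonical ordering, which is what makes
    a plain code point search meaningful here.
    """
    decomposed = unicodedata.normalize("NFD", text)
    target = "\u0075\u0308"
    start = decomposed.find(target)
    while start != -1:
        end = start + len(target)
        # Simplification: a nonzero combining class means "combining mark".
        if end == len(decomposed) or unicodedata.combining(decomposed[end]) == 0:
            return True
        start = decomposed.find(target, start + 1)
    return False
```

With this, the precomposed "\u00fc" and the sequence "u\u0308" differ as raw
code points but compare equal through canonically_equal, and two strings
that carry the same marks in a different order ("u\u0308\u0323" and
"u\u0323\u0308") compare equal as well, because normalization reorders them.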
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk