From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-10-20 11:20:22
In article <e094f9eb04102006096b92c870_at_[hidden]>,
Rogier van Dalen <rogiervd_at_[hidden]> wrote:
> > > So unicode::string<unicode::codepoint_string<std::string> > would be a
> > > UTF8-encoded string that is manipulated using its characters.
> > Encoded characters or abstract characters? (See section 2.4 of the Unicode
> > standard for definitions.)
> I mean a base character with its combining characters. I don't think
> this is the same as "abstract character", is it?
That is an abstract character, yes.
> My plan was to decompose all characters in unicode::string. This makes
> manipulation of diacritics easier. Correct me if I'm wrong, but your
> example of finding "ü" in a string would come down to finding the
> codepoint sequence "U+0075 U+0308" and checking whether it is not
> followed by another combining character, pretty trivial still.
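To make the search described above concrete, here is a minimal sketch. It assumes the string has already been decomposed into a `std::u32string` of code points. The helpers `is_combining` and `find_abstract_char` are hypothetical names, not part of any proposed `unicode::` API, and `is_combining` is deliberately simplified to the Combining Diacritical Marks block (U+0300..U+036F) instead of consulting the Unicode Character Database.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified test: treats only the Combining Diacritical
// Marks block (U+0300..U+036F) as combining. A real implementation would
// consult the Unicode Character Database (e.g. Canonical_Combining_Class).
bool is_combining(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Finds the first occurrence of `needle` (a base character followed by its
// combining marks, already decomposed) in the decomposed sequence `hay`,
// and accepts a match only if it is not immediately followed by another
// combining mark. Returns the match position, or npos if there is none.
std::size_t find_abstract_char(const std::u32string& hay,
                               const std::u32string& needle) {
    std::size_t pos = hay.find(needle);
    while (pos != std::u32string::npos) {
        std::size_t end = pos + needle.size();
        if (end == hay.size() || !is_combining(hay[end]))
            return pos;  // the whole character matched
        pos = hay.find(needle, pos + 1);
    }
    return std::u32string::npos;
}
```

For example, searching `U"Mu\u0308ller"` for `U"u\u0308"` finds position 1, while `U"u\u0308\u0301"` yields no match because another mark follows. Note that `U"u\u0323\u0308"` (dot below before diaeresis) also yields no match even though it denotes a "ü" with a dot below, which is the ordering problem raised in the reply.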
You have to not only decompose them but also put the combining marks in
canonical order for that to work.
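A minimal sketch of the canonical ordering step follows. It assumes the input is already decomposed, and it uses a hypothetical, tiny excerpt of the Canonical_Combining_Class property (five marks) in place of the full Unicode Character Database. Within each run of non-starters, marks are stable-sorted by combining class, so marks with different classes (which do not interact typographically) get a single order, while marks with the same class keep their relative order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical excerpt of Canonical_Combining_Class. The values match the
// UCD for these code points, but every other code point is treated as
// class 0 (a starter), which is NOT true in general.
int combining_class(char32_t cp) {
    switch (cp) {
        case 0x0301: return 230;  // COMBINING ACUTE ACCENT
        case 0x0308: return 230;  // COMBINING DIAERESIS
        case 0x0323: return 220;  // COMBINING DOT BELOW
        case 0x0327: return 202;  // COMBINING CEDILLA
        case 0x0328: return 202;  // COMBINING OGONEK
        default:     return 0;
    }
}

// Canonical ordering: within each maximal run of non-starters, stable-sort
// the marks by combining class. Equal-class marks keep their relative
// order, because swapping them would change the rendered result.
void canonical_order(std::u32string& s) {
    auto it = s.begin();
    while (it != s.end()) {
        if (combining_class(*it) == 0) { ++it; continue; }
        auto run_end = std::find_if(it, s.end(),
            [](char32_t c) { return combining_class(c) == 0; });
        std::stable_sort(it, run_end, [](char32_t a, char32_t b) {
            return combining_class(a) < combining_class(b);
        });
        it = run_end;
    }
}
```

After this step, `U"u\u0308\u0323"` and `U"u\u0323\u0308"` both become `U"u\u0323\u0308"`, so a code-point search gives the same answer for either spelling. `U"u\u0308\u0301"` is left unchanged because both marks have class 230.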
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk