From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2004-10-20 08:09:46
> > So unicode::string<unicode::codepoint_string<std::string> > would be a
> > UTF8-encoded string that is manipulated using its characters.
>
> Encoded characters or abstract characters? (See section 2.4 of Unicode standard
> for definitions)
I mean a base character with its combining characters. I don't think
this is the same as "abstract character", is it?
My plan was to decompose all characters in unicode::string. This makes
manipulation of diacritics easier. Correct me if I'm wrong, but your
example of finding "ü" in a string would come down to finding the
codepoint sequence "U+0075 U+0308" and checking that it is not
followed by another combining character, which is still pretty trivial.
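
To make that concrete, here is a rough sketch of the search on an already
decomposed code point sequence. It is not the proposed unicode::string
interface: the names is_combining and find_u_diaeresis are made up for
illustration, std::u32string merely stands in for a code point string, and
the combining test only covers the Combining Diacritical Marks block
(U+0300..U+036F) rather than the full Unicode character data.

```cpp
#include <cstddef>
#include <string>

// Assumption for this sketch: only U+0300..U+036F counts as combining.
// A real implementation would look the property up in the Unicode
// Character Database instead of using a fixed range.
bool is_combining(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Hypothetical helper: returns the index of the first "u" + diaeresis
// (U+0075 U+0308) that is not followed by another combining character,
// or std::u32string::npos if there is none.
// The input is assumed to be fully decomposed already.
std::size_t find_u_diaeresis(const std::u32string& s) {
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (s[i] == 0x0075 && s[i + 1] == 0x0308) {
            // A further combining mark would make this a different character.
            const bool more_marks =
                i + 2 < s.size() && is_combining(s[i + 2]);
            if (!more_marks) {
                return i;
            }
        }
    }
    return std::u32string::npos;
}
```

With this, find_u_diaeresis(U"fu\u0308r") gives the position of the "u",
while a "u" that carries both a diaeresis and an acute accent is skipped.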
Regards,
Rogier