
Boost Users:

From: Allan Odgaard (gusixpl02_at_[hidden])
Date: 2008-08-25 15:58:10


It looks like Xpressive's traits are geared toward single characters,
so I assume that Xpressive is not directly usable with UTF-8 encoded
text. Am I correct?

It might work to make the character type a 32-bit integer and then
use iterator adapters that expose the sequence as UCS-4 code points
(after all, the sequence is “encoded”), but that leads me to the next
question: diacritics.
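A minimal sketch of the iterator-adapter idea, assuming the input is well-formed UTF-8 (no validation of continuation bytes, overlong forms, or surrogates). The names utf8_code_point_iterator and decode_utf8 are hypothetical and not part of Xpressive or Boost; whether Xpressive's traits would accept char32_t as the character type is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical adapter: walks a UTF-8 byte sequence and yields one
// char32_t code point per step. Assumes well-formed UTF-8.
template <typename ByteIt>
class utf8_code_point_iterator {
public:
    // operator* returns by value, so this is honestly an input iterator.
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = char32_t;

    utf8_code_point_iterator() = default;
    explicit utf8_code_point_iterator(ByteIt pos) : pos_(pos) {}

    // Decode the code point that starts at the current byte.
    char32_t operator*() const {
        unsigned char lead = static_cast<unsigned char>(*pos_);
        int len = sequence_length(lead);
        // Keep only the payload bits of the lead byte.
        char32_t cp = (len == 1) ? static_cast<char32_t>(lead)
                                 : static_cast<char32_t>(lead & (0xFF >> (len + 1)));
        ByteIt it = pos_;
        for (int i = 1; i < len; ++i) {
            ++it;
            // Each continuation byte contributes its low 6 bits.
            cp = (cp << 6) | (static_cast<unsigned char>(*it) & 0x3F);
        }
        return cp;
    }

    // Advance past every byte of the current sequence.
    utf8_code_point_iterator& operator++() {
        int len = sequence_length(static_cast<unsigned char>(*pos_));
        for (int i = 0; i < len; ++i) ++pos_;
        return *this;
    }

    utf8_code_point_iterator operator++(int) {
        utf8_code_point_iterator tmp = *this;
        ++*this;
        return tmp;
    }

    friend bool operator==(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return !(a == b);
    }

private:
    // Sequence length in bytes, derived from the lead byte's high bits.
    static int sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x6) return 2;
        if ((lead >> 4) == 0xE) return 3;
        assert((lead >> 3) == 0x1E && "invalid UTF-8 lead byte");
        return 4;
    }

    ByteIt pos_{};
};

// Convenience helper: decode a whole UTF-8 string into code points.
inline std::u32string decode_utf8(const std::string& bytes) {
    using It = utf8_code_point_iterator<std::string::const_iterator>;
    return std::u32string(It(bytes.begin()), It(bytes.end()));
}
```

For example, the precomposed é (bytes C3 A9) decodes to the single code point U+00E9, while the decomposed form (bytes 65 CC 81) decodes to two code points, U+0065 and U+0301.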

For example, é in decomposed Unicode is two code points (e followed
by a combining ´ mark, U+0301), so even when the sequence is iterated
as UCS-4 code points, a regexp of “.” will match just the e, not the
actual (rendered) character.
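To make the mismatch concrete, here is a small sketch assuming the text has already been decoded into code points. It contrasts a “.” that consumes exactly one code point with a hypothetical “user-perceived character” dot that also swallows trailing combining marks. The functions are illustrative only, and the combining-mark test is a deliberate simplification: it covers just the Combining Diacritical Marks block (U+0300 to U+036F), whereas real grapheme segmentation (Unicode UAX #29) has many more rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplification: only U+0300..U+036F count as combining marks here.
inline bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// "." over code points: consumes exactly one code point.
// Returns the number of code points matched (0 means no match).
inline std::size_t match_dot_code_point(const std::u32string& s, std::size_t pos) {
    return pos < s.size() ? 1 : 0;
}

// Hypothetical cluster-aware ".": one base code point plus any
// combining marks that immediately follow it.
inline std::size_t match_dot_cluster(const std::u32string& s, std::size_t pos) {
    if (pos >= s.size()) return 0;
    std::size_t end = pos + 1;
    while (end < s.size() && is_combining_mark(s[end])) ++end;
    return end - pos;
}
```

On U"e\u0301x" at position 0, match_dot_code_point consumes 1 code point (just the e), while match_dot_cluster consumes 2 (the e and its accent), which is the rendered character the question is about.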

Since I was unable to find any discussion of this when searching for
information on Xpressive, I am curious whether any thought has gone
into these issues.

