Boost Users
From: Allan Odgaard (gusixpl02_at_[hidden])
Date: 2008-08-25 15:58:10
It looks like Xpressive's traits mechanism is geared toward single
characters, so I assume that Xpressive cannot be used directly with
UTF-8 encoded text. Am I correct?
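To make the concern concrete: é (U+00E9) is stored in UTF-8 as the two bytes 0xC3 0xA9, so an engine whose traits treat each char as one character sees two characters. The sketch below uses std::regex from the C++ standard library as a stand-in for any char-based engine. It is not Xpressive, and the helper name first_match_length is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: length, in char code units, of the first match of
// `pattern` in `text`, or 0 if there is no match. std::regex<char> is used
// here only as an example of an engine that works on single bytes.
std::size_t first_match_length(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) {
        return 0;
    }
    return static_cast<std::size_t>(m.length(0));
}
```

Calling first_match_length("\xC3\xA9", ".") returns 1: the dot consumes only the lead byte 0xC3 and leaves the continuation byte 0xA9 unmatched.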
It might work if the character type were a 32-bit integer and iterator
adapters were used to decode the UTF-8 bytes and expose the sequence as
UCS-4 code points (an adapter is needed because the text is stored
encoded), but that leads me to the next question: diacritics.
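A minimal sketch of such an adapter, assuming well-formed UTF-8 input. The class Utf8CodePointIterator and the function decode_utf8 are hypothetical names, not part of Xpressive or Boost. It is only an input iterator; a regex engine would typically also need operator-- (bidirectional traversal), which a real adapter would add by stepping back over continuation bytes.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical adapter: walks a UTF-8 byte sequence and yields one char32_t
// code point per step. Assumes well-formed UTF-8; performs no validation.
class Utf8CodePointIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = char32_t;

    explicit Utf8CodePointIterator(std::string::const_iterator pos) : pos_(pos) {}

    // Decode the code point that starts at the current lead byte.
    char32_t operator*() const {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        const std::size_t len = sequence_length(lead);
        // Keep only the payload bits of the lead byte (all 7 bits for ASCII).
        char32_t cp = (len == 1) ? lead : (lead & (0xFFu >> (len + 1)));
        auto it = pos_;
        for (std::size_t i = 1; i < len; ++i) {
            ++it;
            // Each continuation byte (10xxxxxx) contributes 6 payload bits.
            cp = (cp << 6) | (static_cast<unsigned char>(*it) & 0x3Fu);
        }
        return cp;
    }

    // Skip over every byte of the current code point.
    Utf8CodePointIterator& operator++() {
        pos_ += sequence_length(static_cast<unsigned char>(*pos_));
        return *this;
    }

    Utf8CodePointIterator operator++(int) {
        Utf8CodePointIterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const Utf8CodePointIterator& a, const Utf8CodePointIterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const Utf8CodePointIterator& a, const Utf8CodePointIterator& b) {
        return !(a == b);
    }

private:
    // Number of bytes in a UTF-8 sequence, derived from its lead byte.
    static std::size_t sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;         // 0xxxxxxx
        if ((lead >> 5) == 0x6) return 2;  // 110xxxxx
        if ((lead >> 4) == 0xE) return 3;  // 1110xxxx
        return 4;                          // 11110xxx
    }

    std::string::const_iterator pos_;
};

// Decode a whole UTF-8 string into UCS-4 / UTF-32 code points.
std::u32string decode_utf8(const std::string& bytes) {
    return std::u32string(Utf8CodePointIterator(bytes.begin()),
                          Utf8CodePointIterator(bytes.end()));
}
```

With this adapter, "\xC3\xA9" (precomposed é) decodes to the single code point U+00E9, while "e\xCC\x81" (decomposed é) still decodes to two code points, U+0065 and U+0301, which is exactly the diacritics problem.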
For example, é in decomposed Unicode is two code points (e followed by
the combining acute accent, U+0301), so even when the sequence is
iterated as UCS-4 code points, the regex . will match only the e, not
the whole rendered character.
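One way to picture what . would have to consume is a "user-perceived character": a base code point plus any combining marks that follow it. The sketch below is a deliberate simplification, assuming that only the Combining Diacritical Marks block (U+0300 to U+036F) counts as combining; full segmentation is defined by the extended grapheme cluster rules of Unicode Standard Annex #29. The names is_combining_mark and split_simple_clusters are hypothetical.

```cpp
#include <string>
#include <vector>

// Simplification: treat only the Combining Diacritical Marks block
// (U+0300..U+036F) as combining. UAX #29 covers many more cases.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Group decoded code points into clusters of "base + following marks".
// A leading combining mark with no base starts its own cluster.
std::vector<std::u32string> split_simple_clusters(const std::u32string& cps) {
    std::vector<std::u32string> clusters;
    for (char32_t cp : cps) {
        if (!clusters.empty() && is_combining_mark(cp)) {
            clusters.back() += cp;  // attach the mark to the preceding base
        } else {
            clusters.push_back(std::u32string(1, cp));  // start a new cluster
        }
    }
    return clusters;
}
```

Under this rule, U"e\u0301" forms a single cluster of two code points, whereas a code-point-based . matches only its first element.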
Since I was unable to find any discussion of this while searching for
information on Xpressive, I am curious whether any thought has gone
into these issues.