Boost Users :
From: John Maddock (john_at_[hidden])
Date: 2005-07-20 05:05:26
> I also saw a post http://lists.boost.org/boost-users/2003/09/5095.php
> where John answered that it
> is better to convert these character sequences on-the-fly to char. Somehow
> I don't like this
> approach, since I believe that with wrong encoding set on the system some
> information might get lost.
> Is it possible to use XMLCh as character traits in the regular expression
> if XMLCh* points to a
> null-terminated 2 bytes character sequence?
There are several options:
1) Convert the characters on the fly to *wchar_t* and use boost::wregex.
This is a trivial widening of your 16-bit characters, so nothing will get
lost. You could probably use transform_iterator for such a task.
2) In Boost 1.33 there will be more [optional] support for Unicode, but it
requires that you use the ICU library
(http://www.ibm.com/software/globalization/icu/) to provide some of the
basics. You can then correctly scan 16-bit Unicode code sequences, and have
surrogate pairs correctly handled, as well as have access to the Unicode
property names in regexes etc. However the character type for 16-bit code
points is either unsigned short or wchar_t depending upon the platform (this
is a requirement for interoperability with ICU), so you may have to fiddle
with your XMLCh setup to get everything working smoothly. See
3) You could define your own regex traits class for the character type that
you're using: if you go down this road then make sure that you start with
Boost-1.33 as it has better docs in this area, as well as redesigned traits
class requirements compared to 1.32.
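
As a rough illustration of option 1, here is a minimal sketch of the widening step. It makes several assumptions: XMLCh is taken to be a 16-bit unsigned integer (your Xerces build may define it differently), widen() and matches() are hypothetical helper names, and std::wregex from the standard library stands in for boost::wregex so that the example has no external dependencies. It also copies into a std::wstring rather than adapting iterators; a boost::transform_iterator would avoid that copy.

```cpp
#include <cstdint>
#include <regex>
#include <string>

// Assumption: XMLCh is a 16-bit unsigned integer type.
using XMLCh = std::uint16_t;

// Hypothetical helper: widen a null-terminated XMLCh sequence to
// std::wstring, one 16-bit unit at a time. wchar_t is at least 16 bits
// wide on common platforms, so no value is truncated.
std::wstring widen(const XMLCh* s) {
    std::wstring out;
    for (; *s != 0; ++s) {
        out.push_back(static_cast<wchar_t>(*s));
    }
    return out;
}

// Hypothetical helper: true if the whole widened text matches the pattern.
// std::wregex is a stand-in here for boost::wregex.
bool matches(const XMLCh* text, const std::wstring& pattern) {
    const std::wregex re(pattern);
    const std::wstring wide = widen(text);
    return std::regex_match(wide, re);
}
```

Calling matches(text, L"[a-z]+[0-9]+") on a null-terminated XMLCh buffer then behaves like an ordinary wide-character match. Note that this sketch treats each 16-bit unit as one character, so surrogate pairs are not combined; that is what the ICU-based support in option 2 is for.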
Hope this helps,