Subject: Re: [Boost-users] Boost Regex: Use boost::uint32_t as charactertype.
From: Etienne Philip Pretorius (icewolfhunter_at_[hidden])
Date: 2009-07-14 06:23:19


John Maddock wrote:
>>>> I would just like to know whether you can use a
>>>> std::vector<boost::uint32_t> as a source to match regular expressions
>>>> against.
>>>
>>> Yes, but not right out of the box: you would need to provide a traits
>>> class so that regex_traits<uint32_t> knows how to interpret uint32_t
>>> values as characters.
>>>
>>> What precisely did you want to do?
>>>
>>
>> Convert UTF-8/UTF-16 to uint32_t, then use regular expressions as a
>> means to parse XML.
>
> If you don't mind depending upon ICU then the regex ICU wrappers will do
> that for you, *and* let you operate directly on the UTF-8 byte stream as
> well:
> http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/non_std_strings/icu.html.
>
>
> However, ICU is a big library to depend upon :-(
>

Agreed. ICU is big.
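
For reference, the conversion step mentioned above (UTF-8 to uint32_t) does not need ICU by itself. What follows is only a minimal sketch: decode_utf8 is a hypothetical helper name, it uses std::uint32_t from <cstdint> where the thread uses boost::uint32_t, it covers UTF-8 only (not UTF-16), and it throws on malformed input instead of substituting a replacement character.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or ICU): decode a UTF-8 byte string
// into a vector of 32-bit code points, rejecting malformed input.
std::vector<std::uint32_t> decode_utf8(const std::string& in)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const std::uint32_t min_cp[] = { 0x0, 0x80, 0x800, 0x10000 };

    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that must follow

        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings, surrogates and values above U+10FFFF.
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

The resulting std::vector<std::uint32_t> is the kind of sequence the original question wants to run regular expressions over.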

> A more lightweight alternative if you don't need true Unicode character
> classification and case-conversion, would be to implement a lightweight
> traits class for basic_regex that either "does nothing" or forwards to
> the same methods in regex_traits<char> etc, see:
> http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/ref/concepts/traits_concept.html.
> This is obviously more work, but it reduces the code footprint; your call :-)

Excellent. Just what I need. Thank you John.
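
A rough sketch of what such a lightweight traits class might look like is below. It is an illustration only: the name ascii_forwarding_traits and the mask_* constants are hypothetical, the member list follows my reading of the traits concept page John linked, and it has not been checked against Boost.Regex itself. It uses only the standard library and is written as a template so it can be exercised with char32_t. Using it as basic_regex<boost::uint32_t, ascii_forwarding_traits<boost::uint32_t> > would additionally assume that std::basic_string<boost::uint32_t> works with your standard library. Characters above 127 are matched by raw value only and belong to no character class.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical name; this class is an illustration, not part of Boost.Regex.
template <class CharT>
struct ascii_forwarding_traits
{
    typedef CharT                    char_type;
    typedef std::size_t              size_type;
    typedef std::basic_string<CharT> string_type;
    typedef std::locale              locale_type;
    typedef unsigned                 char_class_type;

    // Bitmask values for the few character classes this sketch understands.
    enum { mask_alpha = 1, mask_digit = 2, mask_space = 4, mask_word = 8 };

    static size_type length(const char_type* p)
    {
        assert(p != 0);
        size_type n = 0;
        while (p[n] != 0) ++n;
        return n;
    }

    // "Does nothing": characters compare by their raw value.
    char_type translate(char_type c) const { return c; }

    // Case folding is forwarded to the narrow-character tolower, ASCII only.
    char_type translate_nocase(char_type c) const
    {
        if (!is_ascii(c)) return c;
        return static_cast<char_type>(std::tolower(static_cast<int>(c)));
    }

    // Sort keys are the characters themselves (no locale-aware collation).
    template <class It> string_type transform(It first, It last) const
    { return string_type(first, last); }
    template <class It> string_type transform_primary(It first, It last) const
    { return string_type(first, last); }

    // Only single characters are treated as collating elements.
    template <class It> string_type lookup_collatename(It first, It last) const
    {
        string_type s(first, last);
        return s.size() == 1 ? s : string_type();
    }

    // Map a class name such as "digit" to a bitmask; 0 means "unknown".
    template <class It> char_class_type lookup_classname(It first, It last) const
    {
        std::string name;
        for (; first != last; ++first) {
            if (!is_ascii(*first)) return 0;
            name += static_cast<char>(*first);
        }
        if (name == "alpha")                return mask_alpha;
        if (name == "digit" || name == "d") return mask_digit;
        if (name == "space" || name == "s") return mask_space;
        if (name == "alnum")                return mask_alpha | mask_digit;
        if (name == "word"  || name == "w") return mask_word;
        return 0;
    }

    // Classification is forwarded to <cctype>; non-ASCII is in no class.
    bool isctype(char_type c, char_class_type m) const
    {
        if (!is_ascii(c)) return false;
        const int ch = static_cast<int>(c);
        return ((m & mask_alpha) && std::isalpha(ch))
            || ((m & mask_digit) && std::isdigit(ch))
            || ((m & mask_space) && std::isspace(ch))
            || ((m & mask_word)  && (std::isalnum(ch) || ch == '_'));
    }

    // Numeric value of a digit in the given radix, or -1 if it is not one.
    int value(char_type c, int radix) const
    {
        int v = -1;
        if (c >= '0' && c <= '9')      v = static_cast<int>(c - '0');
        else if (c >= 'a' && c <= 'f') v = static_cast<int>(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F') v = static_cast<int>(c - 'A') + 10;
        return (v >= 0 && v < radix) ? v : -1;
    }

    // The locale is stored but otherwise ignored by this sketch.
    locale_type imbue(locale_type l) { locale_type old = loc_; loc_ = l; return old; }
    locale_type getloc() const { return loc_; }

private:
    static bool is_ascii(char_type c) { return static_cast<unsigned long>(c) < 128ul; }
    locale_type loc_;
};
```

The trade-off is the one John describes: no true Unicode classification or case conversion, in exchange for a small footprint and no ICU dependency.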

>
> HTH, John.

