6 Jul 2006
6:33 a.m.
> This is very close to what I have in mind. The main difference is that
> the functions/algorithms in my mind take ranges instead of iterators.
> Thus:
>
> to_lower(src, dest)
> to_upper(src, dest)

So long as you don't require ranges (or a pair of iterators makes a
valid range and dest can still be an output iterator), that's fine;
these should work on char* as well as container types. I don't know
what kind of ranges you have for dest which allow dest to change
size; it seems a bit problematic.

I want iterators that can handle the encoding transform. I want to be
able to write items like the following:

std::string s = get_some_utf_8_xml_data();

// Find the BOM character as a UTF-32 character

utf_iterator_t i = std::find(utf_iterator_t(s.begin()),
                             utf_iterator_t(s.end()), 0x0000FEFFUL);

// base iterator points to start of UTF-8 character
assert(static_cast<unsigned char>(*i.base()) == 0xEF);

Sean

> With these, I could make Fusion-like wrappers that transform them into
> something like:
>
> some_string s1 = to_lower(src);
> some_string s2 = to_upper(src);
>
> where to_lower and to_upper return cheap views that are in and by
> themselves valid strings/ranges. They are cheap because the actual
> conversions/transformations are done on demand; think lazy
> evaluation.
> So, like those done by expression template techniques, there are
> no expensive temporaries when you perform seemingly expensive tasks
> like:
>
> some_string s = f1(f2(f3(f4(src))));
>
> And yes, because they are generic, those string algorithms can work
> on any string type that satisfies some basic requirements.
>
> Regards,
>
> --
> Joel de Guzman
> http://www.boost-consulting.com
> http://spirit.sf.net
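A minimal sketch of the iterator-based form Sean asks for, assuming single-byte characters and the "C" locale: the source is a pair of input iterators and the destination is any output iterator, so the same function can fill a raw char* buffer or grow a container. The three-argument to_lower here is a hypothetical name for illustration, not an existing library API.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical iterator-based to_lower (illustrative name, not a library API).
// Reads [first, last) and writes through dest, so it works with raw char*
// buffers and, via std::back_inserter, with containers that grow as needed.
// Assumes single-byte characters and the "C" locale.
template <typename InputIt, typename OutputIt>
OutputIt to_lower(InputIt first, InputIt last, OutputIt dest) {
    return std::transform(first, last, dest, [](char c) {
        // std::tolower is only defined for values representable as unsigned char
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    });
}
```

With a char* destination the caller must supply enough room (the returned iterator marks where to put the terminating '\0'); with std::back_inserter(s) the string grows as characters are written, so no destination range has to be sized in advance.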
7:05 a.m.
New subject: Comment on string / unicode discussion
Sean Parent wrote:
>> This is very close to what I have in mind. The main difference is that
>> the functions/algorithms in my mind take ranges instead of iterators.
>> Thus:
>>
>> to_lower(src, dest)
>> to_upper(src, dest)
> So long as you don't require ranges (or a pair of iterators makes a
> valid range and dest can still be an output iterator), that's fine;
> these should work on char* as well as container types. I don't know
> what kind of ranges you have for dest which allow dest to change
> size; it seems a bit problematic.
Yeah. A bit problematic. This is not a problem with the pure functional
approach where you return a lazily evaluated view:
to_lower(src) // returns a view
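A minimal sketch of the lazily evaluated view idea, assuming ASCII text: to_lower(src) returns a small object holding two iterators, and each character is lowercased only when it is dereferenced. The names lower_view and to_lower are hypothetical; this is not the Fusion-based design Joel has in mind, only an illustration of on-demand conversion.

```cpp
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical lazy lowercase view (illustrative, not Joel's actual design).
// It stores only a pair of base iterators; each character is lowercased when
// it is dereferenced, so building the view copies nothing.
template <typename BaseIt>
class lower_view {
public:
    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = char;
        using difference_type = std::ptrdiff_t;
        using pointer = void;
        using reference = char;  // computed on demand, returned by value

        iterator() = default;
        explicit iterator(BaseIt it) : it_(it) {}

        char operator*() const {
            return static_cast<char>(
                std::tolower(static_cast<unsigned char>(*it_)));
        }
        iterator& operator++() { ++it_; return *this; }
        iterator operator++(int) { iterator old = *this; ++it_; return old; }
        bool operator==(const iterator& o) const { return it_ == o.it_; }
        bool operator!=(const iterator& o) const { return it_ != o.it_; }
        BaseIt base() const { return it_; }

    private:
        BaseIt it_{};
    };

    lower_view(BaseIt first, BaseIt last) : first_(first), last_(last) {}
    iterator begin() const { return iterator(first_); }
    iterator end() const { return iterator(last_); }

private:
    BaseIt first_, last_;
};

// to_lower(src) returns a view over src; src itself must outlive the view.
template <typename Range>
auto to_lower(const Range& src) -> lower_view<decltype(src.begin())> {
    return lower_view<decltype(src.begin())>(src.begin(), src.end());
}
```

Because the view copies iterators rather than keeping a reference to its argument, a nested call such as to_lower(to_lower(s)) remains valid after the inner temporary view is destroyed, which is what lets f1(f2(f3(f4(src)))) run without intermediate strings. The underlying string s must still outlive all the views.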
> I want iterators that can handle the encoding transform. I want to be
> able to write items like the following:
>
> std::string s = get_some_utf_8_xml_data();
>
> // Find the BOM character as a UTF-32 character
>
> utf_iterator_t i = std::find(utf_iterator_t(s.begin()),
>                              utf_iterator_t(s.end()), 0x0000FEFFUL);
>
> // base iterator points to start of UTF-8 character
> assert(static_cast<unsigned char>(*i.base()) == 0xEF);
Contrast that with:
std::string s = get_some_utf_8_xml_data();
utf_range r = boost::find(utf_range(s), 0x0000FEFFUL);
assert(static_cast<unsigned char>(*r.begin().base()) == 0xEF);
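A minimal sketch of the decoding iterator discussed above, assuming well-formed UTF-8 and C++11 (char32_t): dereferencing decodes one code point, ++ skips the whole byte sequence, and base() returns the underlying byte iterator, so the BOM search and base() check can be written against it. The name utf8_iterator is hypothetical; the Boost header Shunsuke mentions later in the thread contains real implementations such as boost::u8_to_u32_iterator.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical UTF-8 -> UTF-32 decoding iterator (a sketch, not Boost's code).
// Assumes well-formed UTF-8: no validation, no handling of truncated input.
template <typename BaseIt>
class utf8_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = char32_t;  // decoded on demand, returned by value

    utf8_iterator() = default;
    explicit utf8_iterator(BaseIt it) : it_(it) {}

    // Decode the code point whose lead byte is at it_.
    char32_t operator*() const {
        BaseIt p = it_;
        const unsigned char lead = static_cast<unsigned char>(*p);
        const int len = seq_length(lead);
        // Keep only the payload bits of the lead byte (all of them for ASCII).
        char32_t cp = (len == 1) ? lead : (lead & (0xFFu >> (len + 1)));
        for (int i = 1; i < len; ++i) {
            ++p;  // each continuation byte carries 6 payload bits
            cp = (cp << 6) | (static_cast<unsigned char>(*p) & 0x3Fu);
        }
        return cp;
    }

    // Skip the whole encoded sequence, not just one byte.
    utf8_iterator& operator++() {
        std::advance(it_, seq_length(static_cast<unsigned char>(*it_)));
        return *this;
    }
    utf8_iterator operator++(int) { utf8_iterator old = *this; ++*this; return old; }

    bool operator==(const utf8_iterator& o) const { return it_ == o.it_; }
    bool operator!=(const utf8_iterator& o) const { return it_ != o.it_; }

    // Underlying byte iterator: first byte of the current character.
    BaseIt base() const { return it_; }

private:
    // Sequence length from the high bits of the lead byte.
    static int seq_length(unsigned char lead) {
        if (lead < 0x80) return 1;         // 0xxxxxxx
        if ((lead >> 5) == 0x6) return 2;  // 110xxxxx
        if ((lead >> 4) == 0xE) return 3;  // 1110xxxx
        return 4;                          // 11110xxx
    }

    BaseIt it_{};
};
```

Usage mirrors Sean's example: with It = utf8_iterator<std::string::const_iterator>, std::find(It(s.cbegin()), It(s.cend()), char32_t(0xFEFF)) yields an iterator whose base() points at the 0xEF lead byte of the encoded BOM.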
Regards,
--
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net
7:43 a.m.
Sean Parent wrote:
>> This is very close to what I have in mind. The main difference is that
>> the functions/algorithms in my mind take ranges instead of iterators.
>> Thus:
>>
>> to_lower(src, dest)
>> to_upper(src, dest)
> So long as you don't require ranges (or a pair of iterators makes a
> valid range and dest can still be an output iterator), that's fine;
> these should work on char* as well as container types. I don't know
> what kind of ranges you have for dest which allow dest to change
> size; it seems a bit problematic.
>
> I want iterators that can handle the encoding transform. I want to be
> able to write items like the following:
>
> std::string s = get_some_utf_8_xml_data();
>
> // Find the BOM character as a UTF-32 character
>
> utf_iterator_t i = std::find(utf_iterator_t(s.begin()),
>                              utf_iterator_t(s.end()), 0x0000FEFFUL);
>
> // base iterator points to start of UTF-8 character
> assert(static_cast<unsigned char>(*i.base()) == 0xEF);

Boost has (unofficially?) such iterators. Look into
<boost/regex/pending/unicode_iterator.hpp>.

--
Shunsuke Sogame