Subject: Re: [boost] [string] proposal
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2011-01-21 13:05:12
On Sat, Jan 22, 2011 at 1:51 AM, Dave Abrahams <dave_at_[hidden]> wrote:
> At Sat, 22 Jan 2011 01:14:38 +0800,
> Dean Michael Berris wrote:
>>
>> >> 4. Looks like a real STL container except the iterator type is smarter
>> >> than your average iterator.
>> >>
>> >> Encoding is a matter of external interpretation and I think should not
>> >> be part of a string's interface. You can have wrappers that interpret
>> >> a string as a UTF-* string.
>> >
>> > What does it iterate over?  chars?  code points?  characters?
>> > Something else?
>>
>> I can see basically a way of saying what you want when you want to get
>> an iterator from it -- by default though a call to '.begin()' will
>> return an iterator over characters (just so you don't break compatibility
>> with std::string).
>
> Then you mean an iterator over chars, not characters.
>
Yeah, over chars. :)
>> The iterator can store a reference to the original string and when
>> advanced, can do the appropriate interpretation of the string in
>> context. If you wanted a code point iterator, you'd get the code point
>> iterator. If you wanted a character based on a certain encoding then
>> you can have a special iterator for that. An iterator would also know
>> whether it was out of bounds.
>>
>> This allows people to write code that dealt with code points,
>> characters (based on the encoding), and raw data if absolutely
>> necessary.
>
> Hmm, I'm just not sure whether these are useful.  The iterators to be
> supplied (if any) should IMO be dictated by the needs of real
> algorithms.
>
I thought about it a little more too, and there should be a way of
just crafting the appropriate iterator from the outside -- much like
how the current Iterators library allows you to create different kinds
of iterators.
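
To make the "crafted from the outside" idea concrete, here is a minimal
sketch of a code point iterator built over a plain std::string without
touching the string's own interface. The names utf8_code_point_iterator
and code_points are hypothetical and are not part of Boost or of any
proposal in this thread. The sketch assumes the bytes are well-formed
UTF-8 and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical adaptor: reads the bytes of a std::string as UTF-8 and
// yields one code point per step. Assumes well-formed UTF-8.
class utf8_code_point_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;  // values are decoded on the fly

    utf8_code_point_iterator(std::string::const_iterator pos,
                             std::string::const_iterator end)
        : pos_(pos), end_(end) {}

    // Decode the code point that starts at the current position.
    char32_t operator*() const {
        assert(pos_ != end_);  // the iterator knows its own bounds
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        const std::size_t len = sequence_length(lead);
        char32_t cp = (len == 1) ? lead : (lead & (0x7Fu >> len));
        std::string::const_iterator it = pos_;
        for (std::size_t i = 1; i < len && ++it != end_; ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(*it) & 0x3Fu);
        }
        return cp;
    }

    // Skip over every byte of the current sequence, never past the end.
    utf8_code_point_iterator& operator++() {
        assert(pos_ != end_);
        std::size_t len = sequence_length(static_cast<unsigned char>(*pos_));
        while (len-- > 0 && pos_ != end_) ++pos_;
        return *this;
    }

    utf8_code_point_iterator operator++(int) {
        utf8_code_point_iterator tmp = *this;
        ++*this;
        return tmp;
    }

    friend bool operator==(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return !(a == b);
    }

private:
    // Number of bytes in the sequence introduced by this lead byte.
    // Unexpected lead bytes are treated as one-byte sequences.
    static std::size_t sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x06) return 2;
        if ((lead >> 4) == 0x0E) return 3;
        if ((lead >> 3) == 0x1E) return 4;
        return 1;
    }

    std::string::const_iterator pos_;
    std::string::const_iterator end_;
};

// Hypothetical convenience helper: collect all code points of a string.
inline std::vector<char32_t> code_points(const std::string& s) {
    utf8_code_point_iterator first(s.begin(), s.end());
    utf8_code_point_iterator last(s.end(), s.end());
    return std::vector<char32_t>(first, last);
}
```

The string itself still only hands out chars through begin() and end();
the interpretation lives entirely in the adaptor, so a different
encoding would simply be a different adaptor over the same storage.
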
Algorithms that deal with text, like rendering characters for example
in a GUI, would basically need to iterate over code points or glyphs.
Typesetting algorithms would pretty much need the same kind of
traversal. Also things like instance counting (building a histogram
based on character counts) for example for compression and all the
cool things like that would need to have access to individual
"elements" of a given text. In the pre-Unicode days this was just a
simple table of 256 characters; unfortunately it's gotten a lot more
complex than that ;).
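
As a rough illustration of the histogram case, the sketch below counts
the same text two ways: per byte, where a fixed 256-slot table is
enough, and per code point, where the key space is too large for a
dense table and a map is used instead. The function names
byte_histogram and code_point_histogram are hypothetical, and the code
assumes the input is well-formed UTF-8.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <string>

// Pre-Unicode style: one counter per possible byte value.
inline std::array<std::size_t, 256> byte_histogram(const std::string& s) {
    std::array<std::size_t, 256> counts{};  // zero-initialized
    for (unsigned char c : s) ++counts[c];
    return counts;
}

// Code point style: decode UTF-8 while counting. Assumes well-formed
// input; unexpected lead bytes are counted as one-byte sequences.
inline std::map<char32_t, std::size_t>
code_point_histogram(const std::string& utf8) {
    std::map<char32_t, std::size_t> counts;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t len = 1;
        if ((lead >> 5) == 0x06) len = 2;
        else if ((lead >> 4) == 0x0E) len = 3;
        else if ((lead >> 3) == 0x1E) len = 4;
        // Keep the payload bits of the lead byte, then append 6 bits
        // from each continuation byte.
        char32_t cp = (len == 1) ? lead : (lead & (0x7Fu >> len));
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3Fu);
        }
        ++counts[cp];
        i += len;
    }
    return counts;
}
```

For a string holding "a" followed by two copies of U+00E9, the byte
histogram sees five bytes spread over three distinct values, while the
code point histogram sees two distinct elements. Counting glyphs or
user-perceived characters would need yet another layer on top of this.
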
-- Dean Michael Berris about.me/deanberris