Subject: Re: [boost] [UTF String] UTF String library 1.5 ready for perusal
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-02-10 12:39:19

On Thu, 10 Feb 2011 15:18:27 +0100
Mathias Gaunard <mathias.gaunard_at_[hidden]> wrote:

> On 10/02/2011 14:40, Chad Nelson wrote:
>>>> This version is substantially better than the original. The design
>>>> has been somewhat simplified, removing extraneous features like
>>>> null-string emulation. Each of the classes now contains as many of
>>>> the std::string functions as I could efficiently add (essentially
>>>> all of them in utf32_t), including I/O stream functions
>>> Bad design, IMHO.
>> Not very constructive. *Why* do you think it's a bad design?
> It's generally agreed on that std::string is a bad design.
> See GotW #84 for example. That must be a good ten years old...

Maybe so, but irrelevant in this case. The goal was to make
transitioning from std::string to the UTF types as painless as
possible, for those who want to do it, and that means duplicating as
many of std::string's functions as can efficiently be done.

>>>> and also features code-point iterators.
>>> That code point iterator uses pointers and indexes instead of
>>> iterators, which means it cannot work as an arbitrary iterator
>>> adaptor even though it could with virtually no change, especially
>>> since it only requires a forward iterator.
>> Sorry, I don't understand the reasoning behind that assertion. Please
>> enlighten me.
> There is no need for any reasoning: look at the code of your code
> point iterator. It uses a pointer and indexes, and is therefore not a
> generic iterator adaptor.

It wasn't meant to be generic. It was meant to be exactly what it is:
an iterator specific to the UTF type where it's defined. For that
purpose, it's designed exactly as it should be, IMHO.

> Iterating through code points is fully generic and should work for
> any forward iterator or bidirectional iterator, not just a pointer.

I could make it fully generic, but it wouldn't be nearly as efficient
that way. I chose to do the extra work to make it efficient.
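
For anyone following along, here is roughly what the generic version being asked for could look like. This is only a sketch: the name utf8_code_point_iterator, the base() accessor and the count_code_points helper are made up for illustration and are not part of the library, and the code assumes the wrapped range already holds well-formed UTF-8. It is tagged as an input iterator because operator* returns by value.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical adaptor for illustration; not the library's iterator.
// Assumes the wrapped range already holds well-formed UTF-8.
template <typename ForwardIt>
class utf8_code_point_iterator {
public:
    // Tagged as an input iterator because operator* returns by value.
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    utf8_code_point_iterator() = default;
    explicit utf8_code_point_iterator(ForwardIt pos) : pos_(pos) {}

    // Decode the code point that starts at the current position.
    char32_t operator*() const {
        ForwardIt it = pos_;
        const unsigned char lead = static_cast<unsigned char>(*it);
        const int extra = trailing_bytes(lead);
        // Keep only the payload bits of the lead byte.
        char32_t cp = static_cast<char32_t>(
            extra == 0 ? lead : (lead & (0x3F >> extra)));
        for (int i = 0; i < extra; ++i) {
            ++it;
            cp = (cp << 6) | (static_cast<unsigned char>(*it) & 0x3F);
        }
        return cp;
    }

    // Step over the whole encoded sequence, one code point at a time.
    utf8_code_point_iterator& operator++() {
        const int extra = trailing_bytes(static_cast<unsigned char>(*pos_));
        std::advance(pos_, 1 + extra);
        return *this;
    }
    utf8_code_point_iterator operator++(int) {
        utf8_code_point_iterator old = *this;
        ++*this;
        return old;
    }

    ForwardIt base() const { return pos_; }

    friend bool operator==(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return !(a == b);
    }

private:
    // Number of continuation bytes that follow a given lead byte.
    static int trailing_bytes(unsigned char lead) {
        assert((lead & 0xC0) != 0x80 && "not positioned on a lead byte");
        if (lead < 0x80) return 0;
        if (lead < 0xE0) return 1;
        if (lead < 0xF0) return 2;
        return 3;
    }

    ForwardIt pos_{};
};

// Example use over a non-contiguous container: count by walking the range.
inline std::size_t count_code_points(const std::list<char>& bytes) {
    using It = utf8_code_point_iterator<std::list<char>::const_iterator>;
    return static_cast<std::size_t>(
        std::distance(It(bytes.begin()), It(bytes.end())));
}
```

Note that counting here is a linear walk, and the adaptor has no way to know where the underlying range begins or ends, which is the trade-off under discussion.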

> Making your iterator random access when it obviously isn't is also a
> terrible idea. The special case thing has nothing to do there either,
> it should be a different iterator.

It could be a bidirectional iterator, as it has all of the abilities of
one. And it could be a random access iterator, as it has all but one of
the requirements for that (and in many cases has all of them). Given
that choice, I chose to make it a random access iterator.

> I'm not a fan of returning a reference in operator* as well.

I had no choice in that. I ran into at least one STL algorithm under GCC
that wouldn't compile unless operator* returned a reference, even when
the value was only being read. I don't remember which one, but it was
important and commonly used enough that breaking it was not an option.

> I also don't understand what mIndex and mEndIndex are for (nor why
> you compute the size in code points of the string before constructing
> iterators),

They give me a way to prevent my iterator code from walking off the
beginning or end of the underlying string. The only other way to do it
would be to store a pointer to the string object in every iterator, or
a pair of iterators or pointers to the underlying type, which I
considered worse. As an important side benefit, they also provide an
efficient way to calculate the difference in code points for operator-,
which I feel is important.
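
To illustrate the idea, here is a stripped-down sketch of an iterator that carries a code-point index next to its pointer. It is not the actual class: the name indexed_cp_iterator, the members index_ and end_index_ (standing in for mIndex and mEndIndex) and the base() accessor are hypothetical, the data is assumed to be well-formed UTF-8, and stopping silently at the bounds is just one possible policy.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: index_ and end_index_ play the roles described for
// mIndex and mEndIndex; the real class layout is not shown in this thread.
// Assumes p points into well-formed UTF-8.
class indexed_cp_iterator {
public:
    indexed_cp_iterator(const char* p, std::ptrdiff_t index,
                        std::ptrdiff_t end_index)
        : p_(p), index_(index), end_index_(end_index) {
        assert(0 <= index && index <= end_index);
    }

    // Advance one code point, but never past the end of the string.
    indexed_cp_iterator& operator++() {
        if (index_ < end_index_) {
            p_ += sequence_length(static_cast<unsigned char>(*p_));
            ++index_;
        }
        return *this;
    }

    // Step back one code point, but never before the beginning.
    indexed_cp_iterator& operator--() {
        if (index_ > 0) {
            // Skip continuation bytes (10xxxxxx) until a lead byte is found.
            do {
                --p_;
            } while ((static_cast<unsigned char>(*p_) & 0xC0) == 0x80);
            --index_;
        }
        return *this;
    }

    const char* base() const { return p_; }

    // Difference in code points is constant time: no decoding needed.
    friend std::ptrdiff_t operator-(const indexed_cp_iterator& a,
                                    const indexed_cp_iterator& b) {
        return a.index_ - b.index_;
    }
    friend bool operator==(const indexed_cp_iterator& a,
                           const indexed_cp_iterator& b) {
        return a.index_ == b.index_;
    }
    friend bool operator!=(const indexed_cp_iterator& a,
                           const indexed_cp_iterator& b) {
        return !(a == b);
    }

private:
    // Total length in bytes of the sequence introduced by a lead byte.
    static int sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if (lead < 0xE0) return 2;
        if (lead < 0xF0) return 3;
        return 4;
    }

    const char* p_;
    std::ptrdiff_t index_;      // position in code points
    std::ptrdiff_t end_index_;  // total code points in the string
};
```

The cost is two extra integers per iterator, plus knowing the code-point count when the iterators are created.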

And the size in code points is (supposed to be) stored at all times. If
it isn't then it's a leftover from an earlier iteration, which I'll be
happy to correct, but which should be harmless (because even if that's
the case, it's only calculated once per string, then stored and

> since you seem to check data is valid beforehand. And if data is
> invalid, you have lots of potential for unsafety in your iterators
> anyway (in _value for example).

All the UTF types were very carefully designed so that there's no
chance of invalid data in them, barring extraordinary measures to
deliberately corrupt it.

Anything else? :-)

Chad Nelson
Oak Circle Software, Inc.
