Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-14 10:12:44


Soares Chen Ruo Fei wrote:

>> with non-Unicode CJK encodings
>> like Shift-JIS or GBK there is no
>> way to go backward

> Ahh I see so that's quite nasty, but actually it still can be done
> with the sacrifice on efficiency. Basically since the iterator already
> has the begin and end boundary iterators it can simply reiterate all
> over from the beginning of the string. Although doing so is roughly
> O(N^2) it shouldn't make significant impact as developers rarely use
> this multi-byte encoding and even seldom use the reverse decoding
> function.

As a general point, I believe it's a bad idea to hide a surprise like
O(N^2) instead of O(N) complexity in a "rare" case. Doing so means
that users will implement something that seems to work, and then get
bitten later when it doesn't work in the field. (For example, the
first time that a customer in Japan tries to process a 1 MB file and it
takes a million times longer than expected.)

It would be better to not provide the inefficient case at all. Compare
with how std::list doesn't provide random access, even though it could
do so in O(N). Looking at your character set iterator, it seems to me
that you could have a forward-only iterator and a bidirectional
iterator for UTF, but only the former for these other encodings. Not
storing the begin iterator when only forward iteration is needed also
saves space.
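
To make that concrete, here is a rough sketch of the idea, not a proposal for the actual Boost.Ustr interface. All names in it (utf8_tag, shift_jis_tag, decode_category, next_char, prev_char, count_chars) are made up for illustration, and the Shift-JIS lead-byte ranges are the ones I'm assuming for the common variants. The point is only that the backward step exists for UTF-8 and is simply absent for Shift-JIS, so misuse fails at compile time rather than running slowly.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical encoding tags, for illustration only.
struct utf8_tag {};
struct shift_jis_tag {};

// Maps an encoding to the strongest iterator category it can support
// with a constant-time step in each direction.
template <class Encoding> struct decode_category;
template <> struct decode_category<utf8_tag> {
    using type = std::bidirectional_iterator_tag;
};
template <> struct decode_category<shift_jis_tag> {
    using type = std::forward_iterator_tag;
};

static_assert(std::is_same<decode_category<shift_jis_tag>::type,
                           std::forward_iterator_tag>::value,
              "Shift-JIS decoding is forward-only in this sketch");

// UTF-8 forward step: skip the current byte, then any continuation
// bytes (bit pattern 10xxxxxx).
inline const char* next_char(utf8_tag, const char* p, const char* end) {
    ++p;
    while (p != end && (static_cast<unsigned char>(*p) & 0xC0) == 0x80) ++p;
    return p;
}

// UTF-8 backward step: continuation bytes are unambiguous, so walking
// back to the nearest non-continuation byte finds the character start.
inline const char* prev_char(utf8_tag, const char* p, const char* begin) {
    --p;
    while (p != begin && (static_cast<unsigned char>(*p) & 0xC0) == 0x80) --p;
    return p;
}

// Shift-JIS forward step. Assumed lead-byte ranges: 0x81-0x9F, 0xE0-0xFC.
inline const char* next_char(shift_jis_tag, const char* p, const char* end) {
    const unsigned char c = static_cast<unsigned char>(*p);
    const bool lead = (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
    ++p;
    if (lead && p != end) ++p;  // consume the trail byte
    return p;
}

// Deliberately no prev_char(shift_jis_tag, ...): a trail byte such as
// 0x41 is indistinguishable from ASCII 'A' without scanning from the
// start, so the operation is left out instead of being made slow.

// Counts encoded characters using only forward steps.
template <class Encoding>
std::size_t count_chars(Encoding enc, const std::string& s) {
    std::size_t n = 0;
    const char* p = s.data();
    const char* const end = p + s.size();
    for (; p != end; p = next_char(enc, p, end)) ++n;
    return n;
}
```

With this, count_chars works for both encodings, prev_char(utf8_tag{}, ...) compiles, and prev_char(shift_jis_tag{}, ...) is rejected by the compiler. A forward-only iterator built this way also has no need to carry the begin pointer.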

Regards, Phil.