Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-16 18:43:02


Soares Chen Ruo Fei wrote:
> On Tue, Aug 16, 2011, Phil Endecott wrote:
>> I'm not familiar with the algorithms requiring bidirectional access that
>> Artyom mentions, but a standard way to make them work with iterators for
>> various different encodings would be to specialise the algorithms. You
>> would have a main implementation that requires the bidirectional (or random
>> access) iterator, and a forwarding implementation that looks like this:
>>
>> template <typename FORWARD_ITER>
>> void algorithm(FORWARD_ITER begin, FORWARD_ITER end)
>> {
>>   // Make a copy of the range into a bidirectional container:
>>   std::vector< typename FORWARD_ITER::value_type > v(begin,end);
>>   // Call the other specialisation:
>>   algorithm(v.begin(),v.end());
>> }
>>
>> That is the standard time-vs-space complexity trade-off.
>
> Well I don't think forcing all generic Unicode algorithms to provide
> a specialized version for forward-only iterators is any better than
> providing a less-efficient bidirectional iterator. Such a burden is
> too high for the algorithm developers. Or perhaps a better decision is
> to simply let the compiler yield a (friendly?) error when the generic
> algorithm uses the decrement/random access operator, and find a way to
> inform the user to convert the string to standard UTF strings before
> passing to the Unicode algorithms.

The "less-efficient" O(N^2) bidirectional iterator is completely
unreasonable. Algorithms are not being "forced" to do anything.

Have a look at how the standard library does things.
std::lower_bound() and std::rotate(), for example, have specialisations
that select different algorithms depending on the type of iterator that
is supplied; on the other hand, std::random_shuffle() only takes random
access iterators and it would be the user's responsibility to choose
what to do if they had some other kind of range.
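
As a minimal sketch of that kind of dispatch (the names reversed, reversed_impl and reversed_demo are hypothetical and chosen only for illustration; this is not how any particular standard library is implemented), an algorithm can pick an implementation from the iterator category and fall back to the copy-into-a-vector approach for forward-only ranges:

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <vector>

// Main implementation: walks backwards, so it needs bidirectional iterators.
template <typename Iter>
std::vector<typename std::iterator_traits<Iter>::value_type>
reversed_impl(Iter begin, Iter end, std::bidirectional_iterator_tag)
{
    std::vector<typename std::iterator_traits<Iter>::value_type> out;
    while (end != begin) {
        --end;
        out.push_back(*end);
    }
    return out;
}

// Forwarding implementation for forward-only iterators: copy the range into
// a vector (extra space), then call the bidirectional implementation.
template <typename Iter>
std::vector<typename std::iterator_traits<Iter>::value_type>
reversed_impl(Iter begin, Iter end, std::forward_iterator_tag)
{
    std::vector<typename std::iterator_traits<Iter>::value_type> v(begin, end);
    return reversed_impl(v.begin(), v.end(), std::bidirectional_iterator_tag());
}

// Public entry point: selects an overload from the iterator's category tag.
// Random access iterators pick the bidirectional overload, because their
// tag derives from std::bidirectional_iterator_tag.
template <typename Iter>
std::vector<typename std::iterator_traits<Iter>::value_type>
reversed(Iter begin, Iter end)
{
    typedef typename std::iterator_traits<Iter>::iterator_category category;
    return reversed_impl(begin, end, category());
}

// Usage: the same call works for a forward-only and a random access range.
inline void reversed_demo()
{
    const std::vector<int> expected = {3, 2, 1};
    std::forward_list<int> fl = {1, 2, 3};
    std::vector<int> v = {1, 2, 3};
    assert(reversed(fl.begin(), fl.end()) == expected);
    assert(reversed(v.begin(), v.end()) == expected);
}
```

The algorithm author writes the real logic once; the forward-only overload is a few lines of forwarding, and the caller never has to name a category.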

> Or perhaps I could find a way to let template instances of
> unicode_string_adapter with MBCS encoding convert the string
> to a UTF string during construction and store the UTF-encoded string
> instead. The only problem for this is that during conversion back to
> the raw string, the string adapter would have to reconvert the
> internally stored UTF-encoded string back to the MBCS-encoded string.
> This can be expensive if the user regularly wants to access the raw
> string, unless we store two smart pointers within the string adapter -
> one for the MBCS string and one for the converted UTF string, but
> doing so would waste storage space as well.

No, don't do that. Just provide the iterators that can be provided efficiently.
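
For illustration only, here is a rough sketch of what "provide only what is efficient" could look like. The class toy_mbcs_iterator and its two-byte toy encoding are assumptions made up for this example, not part of Boost.Ustr or any real code page. It advertises std::forward_iterator_tag and has no operator--, so a dispatching algorithm can see that backward traversal is not available cheaply:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical toy encoding: a byte >= 0x80 is a lead byte followed by one
// trail byte; any other byte is a complete character. A trail byte may be
// < 0x80, so stepping backwards is ambiguous without rescanning from the
// start of the string; only forward traversal is offered.
class toy_mbcs_iterator
{
public:
    typedef std::forward_iterator_tag iterator_category;
    typedef unsigned int              value_type;
    typedef std::ptrdiff_t            difference_type;
    typedef const value_type*         pointer;
    typedef value_type                reference;  // characters are decoded by value

    toy_mbcs_iterator() : pos_(0), end_(0) {}
    toy_mbcs_iterator(const char* pos, const char* end) : pos_(pos), end_(end) {}

    // Decode the character at the current position.
    value_type operator*() const
    {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        if (width() == 1) return lead;
        const unsigned char trail = static_cast<unsigned char>(pos_[1]);
        return (static_cast<value_type>(lead) << 8) | trail;
    }

    toy_mbcs_iterator& operator++() { pos_ += width(); return *this; }
    toy_mbcs_iterator operator++(int)
    {
        toy_mbcs_iterator old(*this);
        ++*this;
        return old;
    }

    friend bool operator==(const toy_mbcs_iterator& a, const toy_mbcs_iterator& b)
    { return a.pos_ == b.pos_; }
    friend bool operator!=(const toy_mbcs_iterator& a, const toy_mbcs_iterator& b)
    { return !(a == b); }

    // Deliberately no operator--.

private:
    // Number of bytes in the current character (a truncated lead byte at the
    // end of the buffer is treated as a single byte).
    std::ptrdiff_t width() const
    {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        return (lead >= 0x80 && end_ - pos_ >= 2) ? 2 : 1;
    }

    const char* pos_;
    const char* end_;
};

// Usage: four bytes decode to three characters.
inline void toy_mbcs_demo()
{
    const char bytes[] = { 'a', static_cast<char>(0x81), 0x40, 'b' };
    toy_mbcs_iterator first(bytes, bytes + 4), last(bytes + 4, bytes + 4);
    assert(std::distance(first, last) == 3);
    assert(*first == 0x61u);
    ++first;
    assert(*first == 0x8140u);
}
```

An algorithm that genuinely needs operator-- then either fails to compile for this iterator or, if it dispatches on the category, takes its copy-first path.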

Phil.

