Boost :


From: Eric Niebler (eric_at_[hidden])
Date: 2004-10-20 15:37:19

Next message: Peter Dimov: "Re: [boost] Re: Re: Re: Any interest in adding unicode support toboost?"
Previous message: Rob Stewart: "Re: [boost] Re: Any interest in adding unicode support to boost?"
In reply to: Peter Dimov: "Re: [boost] Re: Any interest in adding unicode support to boost?"
Next in thread: Rogier van Dalen: "Re: [boost] Re: Any interest in adding unicode support to boost?"

Peter Dimov wrote:
> Eric Niebler wrote:
>
>> Erik Wien wrote:
>>
>>> The iterators used are bidirectional, not random access (impossible
>>> on UTF-8 and UTF-16)
>>
>> No. Andrei Alexandrescu explained a scheme to me whereby a UTF-16
>> encoded string can have a random-access iterator, and I think it
>> should. The basic idea is you keep a plain array of 16-bit integers
>> which are the 16-bit characters and the first 16 bits of surrogate
>> pairs. Then you have a data structure which maps from string offsets
>> to the second 16 bits of surrogate pairs. Random access involves a
>> simple index and a map look-up. Sequential access requires no map
>> look-up. And since surrogate pairs are very rare, the map will almost
>> always be empty and the look-up is skipped.
>
> Nice! But this seems to make c_str an O(N) operation. If I need to speak
> to a library in the common extern "C" language of interoperability, and
> that library happens to need UTF-16 encoded wchar_t const [], which by
> coincidence has the same representation as char16_t const [], I won't be
> very happy if the C++ string seems to ignore this common scenario.

Two points. First, keep in mind that surrogates are exceedingly rare.
The common case is that there are no surrogates, and c_str is O(1).
Second, in the rare case where there are surrogates, there can be a
mutable cache that c_str can return, building it on demand only when the
cache is dirty.
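
A minimal sketch of the scheme, to make the two points concrete. This is
only an illustration under stated assumptions, not an existing Boost
interface: the class name utf16_side_table_string and its members are
hypothetical, the side table is assumed to be a std::map, and input is
assumed to be valid Unicode scalar values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical class for illustration; not part of any Boost library.
class utf16_side_table_string {
public:
    // Append one code point. Assumes a valid Unicode scalar value.
    void push_back(char32_t cp) {
        assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
        if (cp < 0x10000) {
            units_.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            // The low (second) surrogate goes into the side table, keyed by offset.
            low_[units_.size()] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
            // The high (first) surrogate goes into the plain array.
            units_.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        }
        dirty_ = true;
    }

    // Length in code points: one array slot per code point.
    std::size_t size() const { return units_.size(); }

    // Random access: a simple index, plus a map look-up only when the slot
    // holds a high surrogate. An empty map skips the look-up entirely.
    char32_t operator[](std::size_t i) const {
        assert(i < units_.size());
        const char16_t u = units_[i];
        if (low_.empty() || !is_high(u)) return u;
        const char16_t lo = low_.at(i);
        return 0x10000 + ((char32_t(u) - 0xD800) << 10) + (char32_t(lo) - 0xDC00);
    }

    // O(1) when there are no surrogates; otherwise returns a flat UTF-16
    // cache that is rebuilt on demand only when it is dirty.
    const char16_t* c_str() const {
        if (low_.empty()) return units_.c_str();
        if (dirty_) {
            flat_.clear();
            for (std::size_t i = 0; i < units_.size(); ++i) {
                flat_.push_back(units_[i]);
                if (is_high(units_[i])) flat_.push_back(low_.at(i));
            }
            dirty_ = false;
        }
        return flat_.c_str();
    }

private:
    static bool is_high(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

    std::u16string units_;                 // BMP units and high surrogates
    std::map<std::size_t, char16_t> low_;  // offset -> low surrogate
    mutable std::u16string flat_;          // cache of the interleaved form
    mutable bool dirty_ = true;
};
```

The ordered map is just one choice for the side table; a sorted vector or
a hash table would serve equally well, since it is almost always empty.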

IMO the advantages of having a random access iterator are worth the
trouble, especially considering how rare surrogates are.

Oh, and I agree that it should be a const iterator. :-)

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk