From: Peter Dimov (pdimov_at_[hidden])
Date: 2004-10-20 15:14:49


Eric Niebler wrote:
> Erik Wien wrote:
>> The iterators used are bidirectional, not random access (impossible
>> on UTF-8 and UTF-16)
>
>
> No. Andrei Alexandrescu explained a scheme to me whereby a UTF-16
> encoded string can have a random-access iterator, and I think it
> should. The basic idea is you keep a plain array of 16-bit integers
> which are the 16-bit characters and the first 16 bits of surrogate
> pairs. Then you have a data structure which maps from string offsets
> to the second 16 bits of surrogate pairs. Random access involves a
> simple index and a map look-up. Sequential access requires no map
> look-up. And since surrogate pairs are very rare, the map will almost
> always be empty and the look-up is skipped.

Nice! But this seems to make c_str an O(N) operation. If I need to speak to a
library in the common extern "C" language of interoperability, and that
library happens to need a UTF-16 encoded wchar_t const [], which by
coincidence has the same representation as char16_t const [], I won't be
very happy if the C++ string ignores this common scenario.
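
To make the trade-off concrete, here is a rough sketch of the scheme as I read
it from the description quoted above. It is not Andrei's or Eric's actual
design: the class name indexed_utf16, the choice of std::map for the side
table, and the member to_utf16 are all assumptions made for illustration. It
also assumes that only valid Unicode scalar values are appended.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical illustration only; not part of Boost or any proposed string class.
class indexed_utf16 {
public:
    // Append one code point. Assumes cp is a valid Unicode scalar value.
    void push_back(char32_t cp) {
        if (cp < 0x10000) {
            units_.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            // The main array keeps only the first half of the surrogate pair...
            units_.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            // ...and the second half goes into the side table, keyed by offset.
            low_[units_.size() - 1] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }

    // One slot per code point, so this is the length in characters.
    std::size_t size() const { return units_.size(); }

    // Random access: a plain index, plus a map look-up only for surrogates.
    char32_t operator[](std::size_t i) const {
        assert(i < units_.size());
        char16_t u = units_[i];
        if (low_.empty() || u < 0xD800 || u > 0xDBFF) return u;
        std::map<std::size_t, char16_t>::const_iterator it = low_.find(i);
        if (it == low_.end()) return u;  // malformed input; return the unit as-is
        return static_cast<char32_t>(0x10000 + ((char32_t(u) - 0xD800) << 10)
                                     + (char32_t(it->second) - 0xDC00));
    }

    // What c_str would have to do: rebuild the contiguous UTF-16 sequence.
    // This visits every element, so it is O(N) whenever a copy is required.
    std::u16string to_utf16() const {
        std::u16string out;
        out.reserve(units_.size() + low_.size());
        for (std::size_t i = 0; i < units_.size(); ++i) {
            out.push_back(units_[i]);
            std::map<std::size_t, char16_t>::const_iterator it = low_.find(i);
            if (it != low_.end()) out.push_back(it->second);
        }
        return out;
    }

private:
    std::vector<char16_t> units_;          // BMP units and high surrogates
    std::map<std::size_t, char16_t> low_;  // offset -> low surrogate
};
```

In this sketch the stored array is a valid UTF-16 buffer only while the side
table is empty. As soon as one surrogate pair is present, handing a
char16_t const * to a C library means going through something like to_utf16
first, which is the O(N) cost I am worried about.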