From: Erik Wien (wien_at_[hidden])
Date: 2005-04-04 06:31:04
Sundell Software wrote:
> Each UTF-8/16/32 has its own iterator type, but all output UTF-32 when
> accessed. Look at std::istream_iterator/std::ostream_iterator for
> design. There would probably be helper functions for the most common
> tasks, and I think you should be able to do all the necessary tasks
> with just iterators.
Yep. That is basically how the current implementation works. It's all
(bi-directional) iterators. A Unicode string is by nature a
bi-directional sequence, so you're basically forced to work that way.
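To illustrate why bi-directional is the natural fit, here is a minimal sketch of stepping forward and backward over UTF-8 while reading UTF-32 values. It is not the library's code: the names is_continuation, decode_at, next_pos and prev_pos are made up for this example, and it assumes well-formed input with no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true for UTF-8 continuation bytes (10xxxxxx).
inline bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Decode the code point that starts at s[pos]. Assumes well-formed UTF-8.
inline char32_t decode_at(const std::string& s, std::size_t pos) {
    unsigned char lead = static_cast<unsigned char>(s[pos]);
    if (lead < 0x80) return lead;                      // single-byte (ASCII)
    int extra = (lead >= 0xF0) ? 3 : (lead >= 0xE0) ? 2 : 1;
    char32_t cp = lead & (0x3F >> extra);              // payload bits of the lead byte
    for (int i = 1; i <= extra; ++i)
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos + i]) & 0x3F);
    return cp;
}

// Step forward one code point: skip the lead byte and its continuations.
inline std::size_t next_pos(const std::string& s, std::size_t pos) {
    ++pos;
    while (pos < s.size() && is_continuation(static_cast<unsigned char>(s[pos])))
        ++pos;
    return pos;
}

// Step backward one code point: back up until a non-continuation byte.
// Precondition: pos > 0.
inline std::size_t prev_pos(const std::string& s, std::size_t pos) {
    do {
        --pos;
    } while (pos > 0 && is_continuation(static_cast<unsigned char>(s[pos])));
    return pos;
}
```

Both directions cost one step per code point, but there is no way to jump to the n-th code point without walking there, which is why the iterator category stops at bi-directional.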
> typedef basic_string<utf_8> ustring8;
> typedef basic_string<utf_16> ustring16;
> ustring8 u8;
> ustring16 u16;
> // Would probably make .begin() the default.
> unicode_iterator i8(u8, u8.begin());
> // This would be a slow way of doing operator[]. The assignment would
> // insert/remove elements from the basic_string if necessary.
> *std::advance(unicode_iterator(u16, u16.begin()), 5) = *(i8++);
> Note that the client is responsible for giving a valid iterator to
An implementation like this is already in place, but it is not locked to
basic_string. A mutable code_point_iterator (unicode_iterator in your
code) can be created from any random-access sequence. You won't be
getting random access to the Unicode sequence though, like I mentioned
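A small sketch of why random access to the underlying code units does not give random access to code points. This is only an illustration, not the actual implementation: is_high_surrogate and code_point_offset are names invented here, and the input is assumed to be well-formed UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if u is a UTF-16 high (lead) surrogate.
inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// Return the code-unit index at which the n-th code point starts.
// Works with any sequence that offers size() and operator[], yet the
// walk is still linear in n because each step depends on the unit read.
template <class Seq>
std::size_t code_point_offset(const Seq& units, std::size_t n) {
    std::size_t i = 0;
    while (n > 0 && i < units.size()) {
        i += is_high_surrogate(units[i]) ? 2 : 1;  // surrogate pair = 2 units
        --n;
    }
    return i;
}
```

For example, in the sequence 'A', U+1D11E, 'B' the third code point starts at code unit 3, not 2, and the only way to find that out is to look at every unit before it.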
> BTW, is using UTF-8/16 in the container really cheaper overall than
> UTF-32? If the client changes a character and the new one happens to be
> larger/smaller, then all the elements behind it would need to be moved.
> Does that happen rarely enough? Though the client should probably know
> that themselves.
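UTF-8, no. That is for people who require small size above all. But
UTF-16 usually is, unless you are using some obscure script that lies
outside the BMP (Basic Multilingual Plane). Every code point in the BMP
takes exactly one 16-bit code unit, so replacing one with another never
moves the elements behind it.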
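To make the cost difference concrete, here is a rough sketch that computes how many code units a replacement adds or removes in each encoding. The function names (utf8_length, utf16_length, utf8_shift, utf16_shift) are invented for this example, and it assumes valid code points (at most U+10FFFF, no surrogates).

```cpp
#include <cassert>
#include <cstddef>

// Number of UTF-8 bytes needed for cp.
inline std::size_t utf8_length(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Number of UTF-16 code units needed for cp: 1 inside the BMP, 2 outside.
inline std::size_t utf16_length(char32_t cp) { return cp < 0x10000 ? 1 : 2; }

// Change in encoded length when old_cp is overwritten with new_cp.
// A non-zero result means the elements behind it must be moved.
inline std::ptrdiff_t utf8_shift(char32_t old_cp, char32_t new_cp) {
    return static_cast<std::ptrdiff_t>(utf8_length(new_cp)) -
           static_cast<std::ptrdiff_t>(utf8_length(old_cp));
}

inline std::ptrdiff_t utf16_shift(char32_t old_cp, char32_t new_cp) {
    return static_cast<std::ptrdiff_t>(utf16_length(new_cp)) -
           static_cast<std::ptrdiff_t>(utf16_length(old_cp));
}
```

Replacing 'e' (U+0065) with U+00E9 grows a UTF-8 string by one byte but leaves a UTF-16 string the same length. UTF-16 only has to shift elements when the replacement crosses the BMP boundary, for instance when 'A' is overwritten with U+1D11E.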