From: Peter Dimov (pdimov_at_[hidden])
Date: 2004-10-20 07:28:10


Vladimir Prus wrote:
> Second question is if operator==, operator< or 'find' should operate
> on vector<char_XX> or on abstract characters, using Unicode rules, or
> there should be two versions. I don't really understand why
> 'unicode-unaware' semantic is ever needed, so we should have only
> 'unicode-aware' one.

Look at 21.3/2: "The class template basic_string conforms to the
requirements of a Sequence, as specified in (23.1.1).

Additionally, because the iterators supported by basic_string are random
access iterators (24.1.5), basic_string conforms to the requirements of
a Reversible Container, as specified in (23.1)."

Now look at Table 65, Container requirements, operator==:

"== is an equivalence relation.

a.size()==b.size() && equal(a.begin(), a.end(), b.begin())"
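
To make the consequence concrete, here is a minimal sketch of that requirement
applied literally to the storage. It assumes std::u16string (a C++11 type that
postdates this message) as a stand-in for the hypothetical string16;
storage_equal and storage_equal_demo are illustrative names, not part of any
proposal.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Table 65 equality, spelled out over the storage elements (code units).
// Assumption: std::u16string stands in for the hypothetical string16.
bool storage_equal(const std::u16string& a, const std::u16string& b)
{
    return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin());
}

// The same abstract character written two ways:
// one precomposed code unit, and 'e' followed by a combining acute accent.
const std::u16string precomposed = u"\u00E9";
const std::u16string decomposed  = u"e\u0301";

void storage_equal_demo()
{
    assert(storage_equal(precomposed, precomposed));
    // Different sizes (1 vs 2 code units), so the container-level == is false,
    // even though a Unicode-aware comparison could treat them as equivalent.
    assert(!storage_equal(precomposed, decomposed));
}
```

A Unicode-aware comparison would therefore have to be a separate operation if
the container requirements are to hold as written.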

The question now is: what do begin(), end() and size() return for our
hypothetical string16?

I maintain that the library design is much cleaner if begin() and end()
return random access iterators over the underlying _storage_, and size()
counts storage elements, rather than operating on the codepoint
representation or abstract character representation.

Codepoint iterators and abstract character iterators would still be
provided, but they would be constant bidirectional with char32_t as the
value_type.
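
As a rough illustration only, such a codepoint iterator might look like the
sketch below. The class name utf16_codepoint_iterator is hypothetical, the
code assumes well-formed UTF-16 in the underlying range, and it omits the
postfix operators a complete iterator would need. Because operator* returns
char32_t by value, it does not strictly meet the standard's requirements on
an iterator's reference type; it is a sketch, not a conforming design.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical adaptor: a constant bidirectional view of code points
// over a range of UTF-16 code units. Assumes well-formed input.
template <class It>
class utf16_codepoint_iterator
{
public:
    typedef std::bidirectional_iterator_tag iterator_category;
    typedef char32_t                        value_type;
    typedef std::ptrdiff_t                  difference_type;
    typedef const char32_t*                 pointer;
    typedef char32_t                        reference; // by value: read-only

    explicit utf16_codepoint_iterator(It pos) : pos_(pos) {}

    // Decode the code point that starts at the current storage position.
    char32_t operator*() const
    {
        It p = pos_;
        char32_t u = static_cast<char16_t>(*p);
        if (u < 0xD800 || u > 0xDBFF) return u;        // not a high surrogate
        char32_t lo = static_cast<char16_t>(*++p);     // its low surrogate
        return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
    }

    // Step forward by one code point (one or two code units).
    utf16_codepoint_iterator& operator++()
    {
        char16_t u = *pos_;
        ++pos_;
        if (u >= 0xD800 && u <= 0xDBFF) ++pos_;
        return *this;
    }

    // Step back by one code point; a low surrogate means a pair precedes us.
    utf16_codepoint_iterator& operator--()
    {
        --pos_;
        char16_t u = *pos_;
        if (u >= 0xDC00 && u <= 0xDFFF) --pos_;
        return *this;
    }

    It base() const { return pos_; }  // back to the storage iterator

    friend bool operator==(const utf16_codepoint_iterator& a,
                           const utf16_codepoint_iterator& b)
    { return a.pos_ == b.pos_; }

    friend bool operator!=(const utf16_codepoint_iterator& a,
                           const utf16_codepoint_iterator& b)
    { return !(a == b); }

private:
    It pos_;
};

void codepoint_iterator_demo()
{
    const std::u16string s = u"a\U0001F600b";   // 4 code units, 3 code points
    typedef utf16_codepoint_iterator<std::u16string::const_iterator> cp_iter;
    cp_iter first(s.begin()), last(s.end());
    assert(s.size() == 4);
    assert(std::distance(first, last) == 3);
    ++first;
    assert(*first == char32_t(0x1F600));
}
```

The iterator cannot be random access because a code point occupies a variable
number of code units, which is why bidirectional is the natural category here.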

Codepoint and abstract character operations would be provided by algorithms,
taking an iterator range.
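
A minimal sketch of what such an algorithm could look like, assuming the range
holds UTF-16 code units. The names next_codepoint_utf16 and
count_codepoints_utf16 are hypothetical, and mapping unpaired surrogates to
U+FFFD is just one possible error policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode one code point from [first, last) and
// advance 'first' past it. Unpaired surrogates yield U+FFFD (an assumption).
template <class It>
char32_t next_codepoint_utf16(It& first, It last)
{
    assert(first != last);
    char32_t u = static_cast<char16_t>(*first);
    ++first;
    if (u >= 0xD800 && u <= 0xDBFF) {              // high surrogate
        if (first != last) {
            char32_t lo = static_cast<char16_t>(*first);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {    // matching low surrogate
                ++first;
                return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            }
        }
        return 0xFFFD;                             // unpaired high surrogate
    }
    if (u >= 0xDC00 && u <= 0xDFFF) return 0xFFFD; // stray low surrogate
    return u;
}

// Codepoint-level algorithm over a plain iterator range: the container
// itself knows nothing about the encoding.
template <class It>
std::size_t count_codepoints_utf16(It first, It last)
{
    std::size_t n = 0;
    while (first != last) {
        next_codepoint_utf16(first, last);
        ++n;
    }
    return n;
}
```

Called as count_codepoints_utf16(s.begin(), s.end()), this works on any
container of char16_t, while s.size() keeps reporting the number of storage
elements.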

It is the user, not the container itself, who should remember and honor the
encoding (UTF-16, UCS-2, other) of a particular container of char16_t.

This is straightforward STL-style container-iterator-algorithm
orthogonalization.

