From: Peter Dimov (pdimov_at_[hidden])
Date: 2004-10-20 07:47:27


Erik Wien wrote:
> Peter Dimov wrote:
>> It appears that there are two schools of thought when it comes to
>> string design. One approach treats a string purely as a sequential
>> container of values. The other tries to represent "string values" as
>> a coherent whole. It doesn't help that in the simple case where the
>> value_type is char the two approaches result in mostly identical
>> semantics. My opinion is that the std::char_traits<> experiment failed
>> and conclusively demonstrated that the "string as a value" approach is a
>> dead end, and that practical string libraries must treat a string as
>> a sequential container, vector<char>, vector<char16_t> and
>> vector<char32_t> in our case.
>>
>> The interpretation of that sequence of integers as a concrete string
>> value representation needs to be done by algorithms.
>
> That is kinda what my current implementation does, but the container
> is not directly accessible by the user. (Nor do I think it should be.)
> Instead I wrap the vector of code points in a class and provide different
> types of iterators to iterate through the vector at different "character
> levels", instead of external algorithms.

Iterators are exactly what external algorithms take, so I don't understand
the distinction you are drawing between the two.
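
To make the point concrete, here is a minimal sketch of an external algorithm
that works purely on iterators over a sequence of UTF-8 code units. The name
count_code_points is hypothetical, not part of any proposed library, and the
sketch assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical external algorithm (illustrative name, not from any library).
// Counts code points in a range of UTF-8 code units by counting every byte
// that is not a continuation byte (bit pattern 10xxxxxx).
// Assumption: the range holds well-formed UTF-8; no validation is performed.
template <class InputIt>
std::size_t count_code_points(InputIt first, InputIt last)
{
    std::size_t n = 0;
    for (; first != last; ++first) {
        unsigned char c = static_cast<unsigned char>(*first);
        if ((c & 0xC0) != 0x80) {
            ++n; // lead byte or ASCII byte: starts a new code point
        }
    }
    return n;
}
```

The same function accepts iterators from a vector<char>, a plain array, or a
wrapper class that exposes code unit iterators, which is why the choice between
"iterators" and "external algorithms" looks like a false one.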

> You can therefore access the string on a code unit level, but the casual
> user would not necessarily know (or care) about that. Instead he would use
> the "string as a value" approach, using strings to represent a sentence,
> word, or some other language construct. When most people think of a string,
> they think of text, and not the underlying binary representation, and
> therefore that is, in my opinion, the notion a library should be designed
> around.

That may be so. But I don't see how the user can be isolated from the binary
representation if he needs to pick one of utf8_string, utf16_string,
ucs2_string, ucs4_string to store his strings. Perhaps I misunderstand your
idea. Can you post a sketch of your spec? How many string classes do you
have? What encoding do they use? What do begin(), end(), size() return? Are
the iterators random access? Bidirectional? Constant? How can the user
obtain the underlying element sequence to persist it somewhere or to pass it
to an external library?
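
For comparison, with the plain sequential container approach the last question
has a direct answer. The sketch below assumes a C++11 compiler (for char16_t
and vector::data()); external_consume_utf16 is a made-up stand-in for some
external API and does not refer to a real library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an external C-style API that consumes a buffer
// of UTF-16 code units. Here it only reports how many units it was given.
std::size_t external_consume_utf16(const char16_t* units, std::size_t count)
{
    (void)units; // a real API would read from the buffer
    return count;
}

// With vector<char16_t> as the string type, the underlying element sequence
// is available directly as a contiguous buffer plus a length.
std::size_t pass_to_library(const std::vector<char16_t>& s)
{
    return external_consume_utf16(s.data(), s.size());
}
```

A wrapper class that hides its container would need to offer some equivalent
accessor for this case, at which point the representation is visible again.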

> In my opinion a good unicode library should hide as much as possible of
> the complexity of the actual character representation from the user.

Hiding intrinsic complexity isn't necessarily a good idea. Sometimes users
need to accomplish a specific task and the abstraction layer, in its
attempts to "hide the complexity", just gets in the way. This should never
happen.

