Boost :
From: Vladimir Prus (ghost_at_[hidden])
Date: 2004-04-07 04:10:07
Miro Jurisic wrote:
>> So the point is that when a string is used as a container of code points,
>> even searching for and removing a character/substring might produce an
>> invalid string? E.g. when looking for 'foo' you could, in theory, find
>> 'foo' followed by a combining character, and removing just 'foo' would
>> leave an invalid string?
>
> Yes, and this is true of all Unicode encodings. Essentially,
> transformations that select or remove portions of a string require you to
> be aware of character boundaries. Searching, substrings, and character
> removal are such transformations, whereas concatenation isn't, so if you
> have two strings in the same encoding, you can concatenate them without
> dealing with character boundaries, and that's about it.
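A minimal C++ sketch of the problem described above, assuming UTF-32 (`std::u32string`) so that each element is exactly one code point. `is_combining_mark`, `erase_naive` and `erase_at_boundary` are hypothetical helpers written only for illustration, and `is_combining_mark` recognizes nothing but the Combining Diacritical Marks block (U+0300..U+036F); real code would need the Unicode character database (for instance via ICU) and the full boundary rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified assumption: only the Combining Diacritical Marks block
// (U+0300..U+036F) counts as combining. Real code should consult the
// Unicode character database instead.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Naive: treat the string as a plain container of code points and erase
// the first match, wherever it ends.
std::u32string erase_naive(std::u32string s, const std::u32string& needle) {
    std::size_t pos = s.find(needle);
    if (pos != std::u32string::npos)
        s.erase(pos, needle.size());
    return s;
}

// Boundary-aware (under the simplified model above): skip matches that are
// immediately followed by a combining mark, because such a match ends in
// the middle of a character. Erases the first acceptable match, if any.
std::u32string erase_at_boundary(std::u32string s, const std::u32string& needle) {
    assert(!needle.empty());  // precondition: an empty needle is meaningless here
    std::size_t pos = s.find(needle);
    while (pos != std::u32string::npos) {
        std::size_t end = pos + needle.size();
        if (end == s.size() || !is_combining_mark(s[end])) {
            s.erase(pos, needle.size());
            break;
        }
        pos = s.find(needle, pos + 1);  // keep looking for a clean match
    }
    return s;
}
```

With `text = U"xfoo\u0301bar"`, `erase_naive(text, U"foo")` yields `U"x\u0301bar"`, so the accent that belonged to the last 'o' now lands on 'x', whereas `erase_at_boundary(text, U"foo")` leaves `text` unchanged. Only the end of each match is checked, which suffices as long as the needle itself does not begin with a combining mark.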
Okay.
>> > basic_string is not the abstraction you are looking for, but it's also
>> > the only one that is readily available in STL/boost today. It may serve
>> > as a good starting point (questionable, IMNSHO), but it should most
>> > definitely not be treated as the right thing to use for Unicode in the
>> > long term.
>>
>> I wonder what the right abstraction is, then. Is it necessary to have a
>> class to represent an abstract character, together with all its combining
>> characters?
>
> That's one way to go, yes; note that the moment you utter those words, you
> put yourself into the position of designing a Unicode API :-) which you
> said you don't want to do at this time.
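One way to picture the abstraction being discussed, as a hypothetical sketch rather than a proposed API: split the code points into "abstract characters", each a base code point plus the combining marks that follow it, and search over those units instead of over raw code points. The names `abstract_char`, `split_into_characters` and `contains_characters` are made up for this example, the combining-mark test uses the same simplified U+0300..U+036F assumption, and the sketch ignores normalization (so precomposed U+00F3 and 'o' followed by U+0301 compare unequal) as well as most of the real grapheme-cluster rules.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: an "abstract character" is a base code point followed
// by zero or more combining marks, stored as a short UTF-32 string.
using abstract_char = std::u32string;

// Simplified assumption: only U+0300..U+036F count as combining marks.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

std::vector<abstract_char> split_into_characters(const std::u32string& s) {
    std::vector<abstract_char> out;
    for (char32_t cp : s) {
        // A combining mark extends the previous character; a leading mark
        // with nothing before it becomes a (degenerate) character of its own.
        if (is_combining_mark(cp) && !out.empty())
            out.back().push_back(cp);
        else
            out.push_back(abstract_char(1, cp));
    }
    return out;
}

// Character-level search: the needle must match whole abstract characters,
// so 'o' never matches 'o' + U+0301.
bool contains_characters(const std::u32string& haystack, const std::u32string& needle) {
    assert(!needle.empty());  // precondition: an empty needle matches trivially
    std::vector<abstract_char> h = split_into_characters(haystack);
    std::vector<abstract_char> n = split_into_characters(needle);
    return std::search(h.begin(), h.end(), n.begin(), n.end()) != h.end();
}
```

Here `contains_characters(U"xfoo\u0301bar", U"foo")` is false, because the fourth unit is 'o' + U+0301 rather than 'o', while `contains_characters(U"xfoo bar", U"foo")` is true.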
You almost caught me ;-) I changed the message subject on purpose, to
indicate that I'm no longer talking about program_options.
I'm interested in how a 'right' Unicode string could be implemented, but I'm
not sure it's possible to design such a string now, so program_options will
still have to use a much simpler approach.
- Volodya