Subject: Re: [boost] [string] proposal
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-21 09:25:43


On Fri, Jan 21, 2011 at 2:39 PM, Ivan Le Lann <ivan.lelann_at_[hidden]> wrote:
>
> ----- "Matus Chochlik" <chochlik_at_[hidden]> wrote :
>
>> Also I've uploaded into the vault file string_proposal.zip
>
> Not that I have anything against UTF-8, but is there no way
> to build this utf8 string class out of an encoding generic one ?
>
> Keeping your "utils" vocabulary, I see something like :
>
>
> template <typename encoding_utils> class encoded_string;
> typedef encoded_string <utf8_utils> string; // and/or utf8_string
>
>
> In your code, that would mean to at least move various typedefs
> into encoding_utils, such as code point and code unit type.
> And to rename various methods, of course.

You will get no argument from me here. I understand people's
need to handle text in various other encodings (I have to do it myself
in several apps that talk via serial ports to old hardware using,
for example, IBM CP ###). But there are *lots* of applications that
do not need to do such things, and I don't see why general,
everyday text handling should be burdened by things like explicit
transcoding, or why there should be dual interfaces in OS APIs
and standard libraries, like fstream vs. wfstream, etc.

If someone needs O(1) indexed access to individual code points
in his/her application, (s)he can easily convert the
text to string32_t, do whatever is needed, and then convert the text
back into the format the rest of the world is using. I think even in such
cases having a single common encoding is much better than
having to support conversion from ANSI/IBM/KOI8/etc. to UTF-32.
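
To make the round trip above concrete, here is a minimal sketch. It is
not code from string_proposal.zip: the names utf8_to_utf32 and
utf32_to_utf8 are hypothetical, std::u32string stands in for string32_t,
and the decoder assumes well-formed UTF-8 (no checks for overlong forms
or surrogates).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-32 code points.
// Assumes the input is well-formed UTF-8; it is not a validator.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        // Sequence length is determined by the lead byte.
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        assert(i + len <= in.size());
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: encode UTF-32 code points back into UTF-8.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        assert(cp <= 0x10FFFF);  // upper bound of the Unicode code space
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With these, the code that needs indexing calls utf8_to_utf32(s), works on
u[i] in constant time, and hands utf32_to_utf8(u) back to everything else,
which keeps seeing only UTF-8.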

BR,

Matus.

