Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2011-01-19 11:51:30


On 19/01/2011 11:33, Matus Chochlik wrote:
> The string-encoding-related discussion boils down
> for me to the following: What will string handling
> in C++ look like in the (maybe not immediate) future?
>
> *Scenario A:*
>
> We will pick a widely-accepted char-based encoding
> that is able to handle all the writing scripts and alphabets
> that we can think of, has enough reserved space for
> future additions or is easily extensible and use that
> with std::strings which will become the one and only
> text string 'container' class.
>
> All the wstrings, wxString, Qstrings, utf8strings, etc. will
> be abandoned. All the APIs using ANSI or UCS-2 will
> be slowly phased out with the help of convenience
> classes like ansi_str_t and ucs2_t that will be made
> obsolete and finally dropped (after the transition).
>
> *Scenario B:*
>
> We will add yet another string class named utf8_t to the
> already crowded set named above. Then:
>
> library a: will stick to the ANSI encodings with std::strings
> It has worked in the past, so it will work in the future, right?
>
> library b[oost]: will use utf8_t instead and provide the (seamless
> and straightforward) conversions between utf8_t and std::string
> and std::wstring. Some (many but not all) others will follow.
>
> library c: will use std::strings with utf-8
> ...
> library [.]n[et]: will use String class
> ...
> library q[t]: will use Qstrings
> ...
> library w[xWidgets]: will use wxStrings and wxChar*
> library wi[napi]: will use TCHAR*
> ...
> library z: will use const char* in an encoding agnostic way
>
> Now an application using libraries [a..z] will become
> a developer's nightmare. What string should he use for
> the class members and constructor parameters, and what should
> he do when the conversions do not work so seamlessly?
>
> Also half of the cpu time assigned to running that
> application will be wasted on useless string transcoding.
> And half of the memory will be occupied with useless
> transcoding-related code and data.
>
> *Scenario C:*
>
> This is basically the status quo; a mix of the above.
> A sad and unsatisfactory state of things.
>

*Scenario D:*

Use ranges, and stop caring whether it's std::string, whatever_string, etc.
This also allows maximum efficiency, with lazy concatenation,
transformations, conversion, filtering etc.

My Unicode library works with arbitrary ranges, and you can adapt a
range in an encoding into a range in another encoding.
This can be used to lazily perform encoding conversion as the range is
iterated; such conversions may even be pipelined.
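
To make the idea concrete, here is a minimal sketch of a lazy UTF-8 to
code point adaptor over an arbitrary range of bytes. It is not the
interface of my library or of any other; the names utf8_decode_iterator,
utf8_decoded_range, decode_utf8 and to_utf32 are made up for this
example, and it assumes well-formed UTF-8 with no error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>    // only needed for the std::list<char> usage shown below
#include <string>

// Hypothetical adaptor, for illustration only (not a real library's API).
// Decodes UTF-8 into code points lazily, one per dereference.
// Assumes well-formed UTF-8; no error recovery is attempted.
template <typename It>
class utf8_decode_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    explicit utf8_decode_iterator(It pos) : pos_(pos) {}

    // Decode the code point that starts at the current position.
    char32_t operator*() const {
        It p = pos_;
        const unsigned char lead = static_cast<unsigned char>(*p);
        const int extra = trailing_bytes(lead);
        // Payload bits of the lead byte: 7, 5, 4 or 3 depending on length.
        char32_t cp = static_cast<char32_t>(
            extra == 0 ? lead : (lead & (0x3F >> extra)));
        for (int i = 0; i < extra; ++i) {
            ++p;
            const unsigned char c = static_cast<unsigned char>(*p);
            assert((c & 0xC0) == 0x80);  // debug check: continuation byte
            cp = static_cast<char32_t>((cp << 6) | (c & 0x3F));
        }
        return cp;
    }

    // Skip over the whole encoded sequence of the current code point.
    utf8_decode_iterator& operator++() {
        const int extra = trailing_bytes(static_cast<unsigned char>(*pos_));
        std::advance(pos_, 1 + extra);
        return *this;
    }
    utf8_decode_iterator operator++(int) {
        utf8_decode_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) {
        return !(a == b);
    }

private:
    // Number of continuation bytes announced by a lead byte.
    static int trailing_bytes(unsigned char lead) {
        if (lead < 0x80) return 0;
        if ((lead >> 5) == 0x6) return 1;
        if ((lead >> 4) == 0xE) return 2;
        return 3;
    }

    It pos_;
};

// Non-owning view: the underlying range must outlive it.
template <typename Range>
class utf8_decoded_range {
public:
    using iterator = utf8_decode_iterator<typename Range::const_iterator>;
    explicit utf8_decoded_range(const Range& r) : r_(&r) {}
    iterator begin() const { return iterator(r_->begin()); }
    iterator end() const { return iterator(r_->end()); }

private:
    const Range* r_;
};

template <typename Range>
utf8_decoded_range<Range> decode_utf8(const Range& r) {
    return utf8_decoded_range<Range>(r);
}

// Usage: nothing is decoded until the loop actually iterates.
template <typename Range>
std::u32string to_utf32(const Range& r) {
    std::u32string out;
    for (char32_t cp : decode_utf8(r)) out.push_back(cp);
    return out;
}
```

The same to_utf32 call accepts a std::string or a std::list<char>,
since the adaptor only needs forward iteration over bytes. A second
adaptor taking code points and yielding UTF-16 units could be layered on
top in the same way, which is what I mean by pipelining.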

