Subject: Re: [boost] [string] --> [text] ?
From: Sebastian Redl (sebastian.redl_at_[hidden])
Date: 2011-01-28 05:02:53


On 28.01.2011 08:56, Dean Michael Berris wrote:
> On Fri, Jan 28, 2011 at 6:47 AM, Gregory Crosswhite
> <gcross_at_[hidden]> wrote:
>> Since there has been a lot of talk about what the name of a new immutable
>> string class should be, may I toss the name "boost::text" into the ring?
> Hmm... Unfortunately it denotes the wrong thing for my case.
That's why "text" is the proposed name for the other case. +1 from me.
> This was the point for my 'view' template idea. That the view would
> give some semblance of encoding appropriately.
I really don't like the name "view". It has strong connotations of
non-ownership. It's not meaningful for the actual purpose of a text
type: storing text. A text type should store text, not provide a view on
a raw sequence of bytes. A view<some_encoding> would be something I
would look for if I wanted to get the bytes that make up a text in
some_encoding. Not something I would look for if I wanted to store the text.

Calling a text type "view<utf_8>" feels very much to me like calling int
"view<little_endian_32_bit>".

As I said before, encoding is a property of interfacing with things
external to my code: third-party libraries, files, network protocols.
>> That is, given a boost::text object "t",
>> one could convert it into a UTF-8 string by calling "t.utf8_c_str()", a
>> UTF-16 string by calling "t.utf16_c_str()", and so on, depending on what the
>> underlying API is expecting.
> And then you run into the problem of having a ton of member functions
> that encapsulate the logic, instead of having multiple types to do
> the conversion. The member functions idea will not scale
> appropriately and would be hell to manage.

True. How about t.c_str<desired_encoding>()? Put the actual logic for
the conversion into the encoding type.
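
A minimal sketch of that idea, under stated assumptions: the names `text`,
`utf8`, `utf16`, `unit_type`, `encode` and `str` are all hypothetical and not
part of any existing Boost library. The text type stores code points, and
each encoding type carries its own conversion logic, so adding an encoding
means adding a type rather than a member function. The sketch returns a
string by value; a real `c_str<Encoding>()` would also need somewhere to keep
the converted buffer alive.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical encoding type: owns the code-point -> UTF-8 conversion.
struct utf8 {
    typedef char unit_type;
    static std::string encode(const std::u32string& cps) {
        std::string out;
        for (char32_t cp : cps) {
            // Assumption: input holds valid scalar values; surrogates are not rejected here.
            assert(cp <= 0x10FFFF);
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }
};

// Hypothetical encoding type: owns the code-point -> UTF-16 conversion.
struct utf16 {
    typedef char16_t unit_type;
    static std::u16string encode(const std::u32string& cps) {
        std::u16string out;
        for (char32_t cp : cps) {
            assert(cp <= 0x10FFFF);
            if (cp < 0x10000) {
                out += static_cast<char16_t>(cp);
            } else {
                // Split into a high and a low surrogate.
                char32_t v = cp - 0x10000;
                out += static_cast<char16_t>(0xD800 | (v >> 10));
                out += static_cast<char16_t>(0xDC00 | (v & 0x3FF));
            }
        }
        return out;
    }
};

// Hypothetical text type: stores text, knows nothing about specific encodings.
class text {
public:
    explicit text(std::u32string code_points) : cps_(std::move(code_points)) {}

    // The conversion logic lives in Encoding, not in text.
    template <typename Encoding>
    std::basic_string<typename Encoding::unit_type> str() const {
        return Encoding::encode(cps_);
    }

private:
    std::u32string cps_;  // internal representation chosen for simplicity only
};
```

With this shape, a call site would read `t.str<utf8>()` or `t.str<utf16>()`,
depending on what the external API expects.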

>> Some of these calls might require recoding the
>> text to a different encoding, so the internal encoding of boost::text could
>> be optimized to whatever is most likely to be needed on that platform so
>> that it is least likely to need recoding. Alternatively, the encoding could
>> be specified as a parameter to the constructor and be carried around as a
>> runtime parameter since nobody needs to know what it is until the final
>> encoding of the string.
>>
> Hmmm... So why isn't boost::text just a typedef to `view<some_encoding>`?
>

boost::text should store text. The encoding of the underlying bytes in
memory shouldn't matter so much.

> And more to the point, why do you need to make the final encoding a
> runtime choice when it can easily be made a compile-time choice?
Various situations may be most efficient with different encodings. If
the text type hides the actual encoding from the user and can switch at
runtime, it can adapt to the situation.
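
As a rough illustration of what "adapting" could look like, here is a sketch
of a text type that picks its storage at runtime and hides the choice behind
one interface. Everything in it is an assumption made for the example: the
name `adaptive_text`, the two representations (one byte per code point when
every code point is at most U+00FF, otherwise 32-bit units), and the helper
`concat`. It is not a proposal for the actual design.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical text type whose in-memory representation is a runtime choice.
class adaptive_text {
public:
    explicit adaptive_text(const std::u32string& cps) : narrow_(true) {
        // Decide at runtime which representation fits this particular text.
        for (char32_t cp : cps) {
            if (cp > 0xFF) { narrow_ = false; break; }
        }
        if (narrow_) {
            for (char32_t cp : cps) bytes_ += static_cast<char>(cp);
        } else {
            wide_ = cps;
        }
    }

    // Length in code points, independent of the representation in use.
    std::size_t size() const { return narrow_ ? bytes_.size() : wide_.size(); }

    // Code point at index i; callers never see which storage is active.
    char32_t operator[](std::size_t i) const {
        assert(i < size());
        return narrow_
            ? static_cast<char32_t>(static_cast<unsigned char>(bytes_[i]))
            : wide_[i];
    }

    // Exposed only so the example can show that the storage really differs.
    std::size_t storage_bytes() const {
        return narrow_ ? bytes_.size() : wide_.size() * sizeof(char32_t);
    }

private:
    bool narrow_;          // runtime tag: which member below holds the text
    std::string bytes_;    // used when all code points are <= 0xFF
    std::u32string wide_;  // used otherwise
};

// The result re-selects its representation from the combined content.
inline adaptive_text concat(const adaptive_text& a, const adaptive_text& b) {
    std::u32string cps;
    for (std::size_t i = 0; i < a.size(); ++i) cps += a[i];
    for (std::size_t i = 0; i < b.size(); ++i) cps += b[i];
    return adaptive_text(cps);
}
```

User code only sees `size()` and `operator[]`; whether a given object holds
narrow or wide units depends on its content, and a concatenation result may
use a different representation than either operand.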

Sebastian

