Subject: Re: [boost] [string] --> [text] ?
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-28 05:21:34


On Fri, Jan 28, 2011 at 11:02 AM, Sebastian Redl
<sebastian.redl_at_[hidden]> wrote:
> On 28.01.2011 08:56, Dean Michael Berris wrote:
>>
>> On Fri, Jan 28, 2011 at 6:47 AM, Gregory Crosswhite
>> <gcross_at_[hidden]>  wrote:
>>>
>>> Since there has been a lot of talk about what the name of a new immutable
>>> string class should be, may I toss the name "boost::text" into the ring?
>>
>> Hmm... Unfortunately it denotes the wrong thing for my case.
>
> That's why "text" is the proposed name for the other case. +1 from me.
>>
>> This was the point for my 'view' template idea. That the view would
>> give some semblance of encoding appropriately.
>
> I really don't like the name "view". It has strong connotations of
> non-ownership. It's not meaningful for the actual purpose of a text type:
> storing text. A text type should store text, not provide a view on a raw
> sequence of bytes. A view<some_encoding> would be something I would look for
> if I wanted to get the bytes that make up a text in some_encoding. Not
> something I would look for if I wanted to store the text.
>
> Calling a text type "view<utf_8>" feels very much to me like calling int
> "view<little_endian_32_bit>".

*Exactly*

>
> As I said before, encoding is a property of interfacing with things external
> to my code. 3rd party libraries, files, network protocols.
>>>
>>> That is, given a boost::text object "t",
>>> one could convert it into a UTF-8 string by calling "t.utf8_c_str()", a
>>> UTF-16 string by calling "t.utf16_c_str()", and so on, depending on what
>>> the underlying API is expecting.
>>
>> And then you run into the problem of having a ton of member functions
>> that encapsulate the logic, instead of having multiple types do
>> the conversion. The member-function approach will not scale
>> and would be hell to manage.
>
> True. How about t.c_str<desired_encoding>()? Put the actual logic for the
> conversion into the encoding type.

+1, although I would not be against

c_str<encoding_tag>(my_text)

if someone shows that this is better than the member function.
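A minimal sketch of both spellings, with the conversion logic living in the encoding type as suggested above. Everything here is hypothetical and illustrative, not a Boost API: the tag names utf8_encoding and utf16_encoding, the text class, and the choice to store UTF-32 code points internally (any internal form would do, since callers never see it). Also an assumption: c_str returns an owning string rather than a raw pointer, because returning a pointer would force text to cache one buffer per requested encoding.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical encoding tags. Each tag owns its conversion logic.
struct utf8_encoding {
    using char_type = char;
    static std::string encode(const std::u32string& cps) {
        std::string out;
        for (char32_t cp : cps) {
            assert(cp <= 0x10FFFF);  // reject values outside the Unicode range
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }
};

struct utf16_encoding {
    using char_type = char16_t;
    static std::u16string encode(const std::u32string& cps) {
        std::u16string out;
        for (char32_t cp : cps) {
            assert(cp <= 0x10FFFF);
            if (cp < 0x10000) {
                out += static_cast<char16_t>(cp);
            } else {
                // Split a supplementary-plane code point into a surrogate pair.
                char32_t v = cp - 0x10000;
                out += static_cast<char16_t>(0xD800 + (v >> 10));
                out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));
            }
        }
        return out;
    }
};

// The text type stores code points; its internal representation is hidden.
class text {
public:
    explicit text(std::u32string code_points) : cps_(std::move(code_points)) {}

    // Member form: t.c_str<utf16_encoding>()
    template <typename Encoding>
    std::basic_string<typename Encoding::char_type> c_str() const {
        return Encoding::encode(cps_);
    }

private:
    std::u32string cps_;
};

// Free-function form: c_str<utf16_encoding>(t), forwarding to the member.
template <typename Encoding>
std::basic_string<typename Encoding::char_type> c_str(const text& t) {
    return t.c_str<Encoding>();
}
```

Either form keeps the member list of text fixed: supporting a new encoding means writing a new tag with an encode function, not adding another utfNN_c_str() member.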

NOTE.1: I would also like to see a special encoding tag for the native
encoding, i.e. something like native_char_encoding, native_wchar_encoding
or platform_encoding_tag<char>/platform_encoding_tag<wchar_t>.

NOTE.2: UTF-8 is assumed by default.
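One possible shape for NOTE.1 and NOTE.2, sketched as compile-time traits only. The names platform_encoding, platform_encoding_t and encoded_string are hypothetical. The platform mapping is an assumption, not a fact about every system: narrow char is taken to be UTF-8 (common on modern POSIX systems, but not on Windows ANSI code pages, which would need a runtime query), and wchar_t is taken to hold UTF-16 when it is 16 bits wide and UTF-32 otherwise.

```cpp
#include <string>
#include <type_traits>

// Hypothetical encoding tags; only their code-unit type matters here.
struct utf8_encoding  { using char_type = char; };
struct utf16_encoding { using char_type = char16_t; };
struct utf32_encoding { using char_type = char32_t; };

// Hypothetical native-encoding tag (NOTE.1): maps a character type to the
// encoding the platform is ASSUMED to use for it.
template <typename CharT> struct platform_encoding;

template <> struct platform_encoding<char> {
    using type = utf8_encoding;  // assumption: narrow strings are UTF-8
};

template <> struct platform_encoding<wchar_t> {
    // assumption: 16-bit wchar_t means UTF-16, anything wider means UTF-32
    using type = std::conditional_t<sizeof(wchar_t) == 2,
                                    utf16_encoding, utf32_encoding>;
};

template <typename CharT>
using platform_encoding_t = typename platform_encoding<CharT>::type;

// NOTE.2: UTF-8 as the default. Result type of a hypothetical
// c_str<Encoding>() accessor, with Encoding defaulting to UTF-8.
template <typename Encoding = utf8_encoding>
using encoded_string = std::basic_string<typename Encoding::char_type>;

// Naming no encoding yields a UTF-8 std::string.
static_assert(std::is_same<encoded_string<>, std::string>::value,
              "UTF-8 is the default");
```

A c_str<platform_encoding_t<wchar_t>>() call would then produce UTF-16 code units on a platform with 16-bit wchar_t and UTF-32 code units elsewhere, without the caller spelling out either.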

> boost::text should store text. The encoding of the underlying bytes in
> memory shouldn't matter so much.

Yes, I basically don't care what the internal encoding of the string
is, as long as the interface 'plays' well with Unicode/UTF-8.

[snip/]

Matus


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk