Subject: Re: [boost] [string] --> [text] ?
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2011-01-28 02:56:50


On Fri, Jan 28, 2011 at 6:47 AM, Gregory Crosswhite
<gcross_at_[hidden]> wrote:
>
> Since there has been a lot of talk about what the name of a new immutable
> string class should be, may I toss the name "boost::text" into the ring?

Hmm... Unfortunately it denotes the wrong thing for my case.

>  The advantage of this name is that it explicitly conveys what it is meant
> for: working with human-readable text encoded in some
> implementation-specific form.  The name "string" would then continue to have
> its current interpretation as a string of contiguous 8-bit chars.
>

Right, so then I can keep saying 'string' and meaning it in the
computer science context. :)

> It has also been suggested that different classes be created for different
> UTF encodings.  I propose that boost::text have the internal encoding be an
> implementation (and potentially platform-specific) detail.  Since at the end
> of a series of manipulations with the rope-like data structure one will
> have to do a final transformation to convert the text into a string of bytes
> anyway, that provides a natural point at which the desired encoding of the
> string of bytes can be specified.

This was the point of my 'view' template idea: the view would present
the underlying data in the appropriate encoding.

> That is, given a boost::text object "t",
> one could convert it into a UTF-8 string by calling "t.utf8_c_str()", a
> UTF-16 string by calling "t.utf16_c_str()", and so on, depending on what the
> underlying API is expecting.

And then you run into the problem of having a ton of member functions
that each encapsulate conversion logic, instead of having multiple
types do the conversion. The member-function approach will not scale
and would be hell to manage.
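
To make the "multiple types" alternative concrete, here is a minimal
sketch. Everything in it is an assumption for illustration, not an
existing Boost interface: the `text` class storing code points, the
`utf8` and `utf16` tag types, and the `view<Encoding>::str()` member.
The point is only that supporting another encoding means adding another
tag type, not another member function on `text`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical immutable text. Storing UTF-32 code points is an
// assumption made for this sketch; the real storage could be anything.
class text {
public:
    explicit text(std::u32string cps) : cps_(std::move(cps)) {}
    const std::u32string& code_points() const { return cps_; }
private:
    const std::u32string cps_;
};

// Each encoding is its own type. Input is assumed to be valid Unicode
// scalar values; no validation is done here.
struct utf8 {
    using unit_type = char;
    static std::string encode(const std::u32string& cps) {
        std::string out;
        for (char32_t c : cps) {
            if (c < 0x80) {
                out += static_cast<char>(c);
            } else if (c < 0x800) {
                out += static_cast<char>(0xC0 | (c >> 6));
                out += static_cast<char>(0x80 | (c & 0x3F));
            } else if (c < 0x10000) {
                out += static_cast<char>(0xE0 | (c >> 12));
                out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (c & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (c >> 18));
                out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
        return out;
    }
};

struct utf16 {
    using unit_type = char16_t;
    static std::u16string encode(const std::u32string& cps) {
        std::u16string out;
        for (char32_t c : cps) {
            if (c < 0x10000) {
                out += static_cast<char16_t>(c);
            } else {
                // Split into a surrogate pair.
                c -= 0x10000;
                out += static_cast<char16_t>(0xD800 | (c >> 10));
                out += static_cast<char16_t>(0xDC00 | (c & 0x3FF));
            }
        }
        return out;
    }
};

// The view picks the encoding at compile time and does the conversion.
template <class Encoding>
class view {
public:
    explicit view(const text& t) : t_(&t) {}
    std::basic_string<typename Encoding::unit_type> str() const {
        return Encoding::encode(t_->code_points());
    }
private:
    const text* t_;
};
```

With this shape, `view<utf8>(t).str().c_str()` plays the role of the
proposed `t.utf8_c_str()`, and `text` itself never grows.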

> Some of these calls might require recoding the
> text to a different encoding, so the internal encoding of boost::text could
> be optimized to whatever is most likely to be needed on that platform so
> that it is least likely to need recoding.  Alternatively, the encoding could
> be specified as a parameter to the constructor and be carried around as a
> runtime parameter since nobody needs to know what it is until the final
> encoding of the string.
>

Hmmm... So why isn't boost::text just a typedef to `view<some_encoding>`?

And more to the point, why do you need to make the final encoding a
runtime choice when it can easily be made a compile-time choice? Even
if you need to switch encodings, you can always linearize the text
into a character buffer at that point.
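
As a rough sketch of what I mean, with every name here hypothetical
(`utf8_tag`, `utf32be_tag`, `view<Encoding>::linearize()`,
`wire_encoding`, `linearize_as`) and the source assumed to be a plain
sequence of valid code points: the encoding is a template parameter,
`text` is just a typedef, and a runtime switch only shows up at the
single place where the bytes are produced.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

using bytes = std::vector<unsigned char>;

// Encodings as types; each knows how to append one code point.
struct utf8_tag {
    static void append(bytes& out, char32_t c) {
        if (c < 0x80) {
            out.push_back(static_cast<unsigned char>(c));
        } else if (c < 0x800) {
            out.push_back(static_cast<unsigned char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<unsigned char>(0x80 | (c & 0x3F)));
        } else if (c < 0x10000) {
            out.push_back(static_cast<unsigned char>(0xE0 | (c >> 12)));
            out.push_back(static_cast<unsigned char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<unsigned char>(0x80 | (c & 0x3F)));
        } else {
            out.push_back(static_cast<unsigned char>(0xF0 | (c >> 18)));
            out.push_back(static_cast<unsigned char>(0x80 | ((c >> 12) & 0x3F)));
            out.push_back(static_cast<unsigned char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<unsigned char>(0x80 | (c & 0x3F)));
        }
    }
};

struct utf32be_tag {
    static void append(bytes& out, char32_t c) {
        out.push_back(static_cast<unsigned char>((c >> 24) & 0xFF));
        out.push_back(static_cast<unsigned char>((c >> 16) & 0xFF));
        out.push_back(static_cast<unsigned char>((c >> 8) & 0xFF));
        out.push_back(static_cast<unsigned char>(c & 0xFF));
    }
};

// The encoding is fixed at compile time by the template argument.
template <class Encoding>
struct view {
    const std::u32string& source;  // assumed storage, for illustration

    // Linearize into one contiguous character buffer.
    bytes linearize() const {
        bytes out;
        for (char32_t c : source) Encoding::append(out, c);
        return out;
    }
};

// "boost::text" as nothing more than a typedef.
using text = view<utf8_tag>;

// If a runtime switch really is needed, it lives in one place and
// simply dispatches to the compile-time views.
enum class wire_encoding { utf8, utf32be };

inline bytes linearize_as(const std::u32string& s, wire_encoding e) {
    switch (e) {
        case wire_encoding::utf8:    return view<utf8_tag>{s}.linearize();
        case wire_encoding::utf32be: return view<utf32be_tag>{s}.linearize();
    }
    throw std::invalid_argument("unknown encoding");
}
```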

-- 
Dean Michael Berris
about.me/deanberris
