Subject: Re: [boost] [string] --> [text] ?
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2011-01-28 02:56:50
On Fri, Jan 28, 2011 at 6:47 AM, Gregory Crosswhite wrote:
> Since there has been a lot of talk about what the name of a new immutable
> string class should be, may I toss the name "boost::text" into the ring?
Hmm... Unfortunately it denotes the wrong thing for my case.
> The advantage of this name is that it explicitly conveys what it is meant
> for: working with human-readable text encoded in some
> implementation-specific form. The name "string" would then continue to have
> its current interpretation as a string of contiguous 8-bit chars.
Right, so then I can keep saying 'string' and meaning it in the
computer science context. :)
> It has also been suggested that different classes be created for different
> UTF encodings. I propose that boost::text have the internal encoding be an
> implementation (and potentially platform-specific) detail. Since at the end
> of a series of manipulations with the rope-like data structure one will
> have to do a final transformation to convert the text into a string of bytes
> anyway, that provides a natural point at which the desired encoding of the
> string of bytes can be specified.
This was the point of my 'view' template idea: the view would
present the underlying data in the appropriate encoding.
> That is, given a boost::text object "t",
> one could convert it into a UTF-8 string by calling "t.utf8_c_str()", a
> UTF-16 string by calling "t.utf16_c_str()", and so on, depending on what the
> underlying API is expecting.
And then you run into the problem of having a ton of member functions
that each encapsulate conversion logic, instead of having multiple
types that do the conversion. The member-function approach will not
scale, and would be hell to maintain.
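
To make the scaling concern concrete, here is a minimal C++ sketch of
the "multiple types" alternative. Every name in it (`text`, `utf8_tag`,
`utf16_tag`, `encoder`, `encode`) is hypothetical and not taken from any
Boost library, and the encoders skip validation of surrogates and
out-of-range code points for brevity.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical names throughout; none of this is an existing Boost API.
struct utf8_tag {};
struct utf16_tag {};

// Minimal stand-in for the proposed text type: a sequence of code points.
class text {
public:
    explicit text(std::u32string code_points) : cps_(std::move(code_points)) {}
    const std::u32string& code_points() const { return cps_; }
private:
    std::u32string cps_;
};

// Primary template is only declared; each encoding supplies a specialization.
template <class Encoding> struct encoder;

template <> struct encoder<utf8_tag> {
    using string_type = std::string;
    // Appends one code point as 1 to 4 UTF-8 code units (no validation).
    static void append(string_type& out, char32_t c) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (c < 0x800) {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
};

template <> struct encoder<utf16_tag> {
    using string_type = std::u16string;
    // Appends one code point as 1 or 2 UTF-16 code units (no validation).
    static void append(string_type& out, char32_t c) {
        if (c < 0x10000) {
            out += static_cast<char16_t>(c);
        } else {
            c -= 0x10000;
            out += static_cast<char16_t>(0xD800 | (c >> 10));
            out += static_cast<char16_t>(0xDC00 | (c & 0x3FF));
        }
    }
};

// One free function covers every encoding that has an encoder.
template <class Encoding>
typename encoder<Encoding>::string_type encode(const text& t) {
    typename encoder<Encoding>::string_type out;
    for (char32_t c : t.code_points()) encoder<Encoding>::append(out, c);
    return out;
}
```

With this shape, `encode<utf8_tag>(t)` stands in for `t.utf8_c_str()`,
and supporting another encoding means adding one more `encoder`
specialization without touching `text` itself.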
> Some of these calls might require recoding the
> text to a different encoding, so the internal encoding of boost::text could
> be optimized to whatever is most likely to be needed on that platform so
> that it is least likely to need recoding. Alternatively, the encoding could
> be specified as a parameter to the constructor and be carried around as a
> runtime parameter since nobody needs to know what it is until the final
> encoding of the string.
Hmmm... So why isn't boost::text just a typedef to `view<some_encoding>`?
And more to the point, why do you need to make the final encoding a
runtime choice when it can easily be made a compile-time choice? Even
if you need to switch encodings later, you can always linearize the
text into a character buffer at that point.
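
As a rough illustration of the compile-time choice, the sketch below
makes the encoding a template parameter and defines `text` as an alias
for one instantiation. The names `view`, `utf8_encoding`,
`utf16_encoding`, `append`, and `linearize` are assumptions made for
this example, not an existing interface, and the chunk list is copied
on every append where a real rope would share structure.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical encoding policies: each one names its code unit type.
struct utf8_encoding  { using unit = char; };
struct utf16_encoding { using unit = char16_t; };

// Hypothetical view over immutable chunks that are already stored in Encoding.
template <class Encoding>
class view {
public:
    using unit = typename Encoding::unit;
    using buffer = std::basic_string<unit>;

    view() : chunks_(std::make_shared<std::vector<buffer>>()) {}

    // Returns a new view with one more chunk; the original is left unchanged.
    view append(buffer chunk) const {
        std::vector<buffer> next(*chunks_);
        next.push_back(std::move(chunk));
        return view(std::make_shared<std::vector<buffer>>(std::move(next)));
    }

    // Flattens the chunks into one contiguous character buffer.
    buffer linearize() const {
        buffer out;
        for (const buffer& c : *chunks_) out += c;
        return out;
    }

private:
    explicit view(std::shared_ptr<const std::vector<buffer>> chunks)
        : chunks_(std::move(chunks)) {}

    std::shared_ptr<const std::vector<buffer>> chunks_;
};

// `text` is just one fixed instantiation of the view template.
using text = view<utf8_encoding>;

// Different encodings are different types, so mixing them up fails to compile.
static_assert(!std::is_same<view<utf8_encoding>, view<utf16_encoding>>::value,
              "each encoding yields a distinct view type");
```

Here `text().append("Hello, ").append("world").linearize()` yields a
single `std::string`, and the same call on a `view<utf16_encoding>`
yields a `std::u16string`, so the buffer type follows from the encoding
chosen at compile time.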
-- Dean Michael Berris about.me/deanberris