
Subject: [boost] [string] --> [text] ?
From: Gregory Crosswhite (gcross_at_[hidden])
Date: 2011-01-27 17:47:24


Hey everyone,

Since there has been a lot of talk about what the name of a new
immutable string class should be, may I toss the name "boost::text" into
the ring? The advantage of this name is that it explicitly conveys what
it is meant for: working with human-readable text encoded in some
implementation-specific form. The name "string" would then continue to
have its current interpretation as a string of contiguous 8-bit chars.

It has also been suggested that different classes be created for
different UTF encodings. I propose instead that the internal encoding of
boost::text be an implementation (and potentially platform-specific)
detail. At the end of a series of manipulations on the rope-like data
structure, one will have to perform a final transformation of the text
into a string of bytes anyway, and that step provides a natural point at
which to specify the desired encoding. That is, given a boost::text
object "t", one could obtain a UTF-8 string by calling "t.utf8_c_str()",
a UTF-16 string by calling "t.utf16_c_str()", and so on, depending on
what the underlying API expects. Some of these calls might require
recoding the text into a different encoding, so the internal encoding of
boost::text could be chosen to match whatever is most likely to be
needed on that platform, minimizing how often recoding is necessary.
Alternatively, the encoding could be specified as a parameter to the
constructor and carried around as a runtime value, since nobody needs to
know what it is until the final conversion of the string.
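To make the idea concrete, here is a minimal C++ sketch of the interface
described above. Everything in it is an assumption for illustration, not
an existing Boost API: the namespace "sketch", storing the text as a flat
UTF-32 std::u32string (a real version would use a rope), and caching each
encoded form so that the returned pointer stays valid.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical sketch of the proposed boost::text; not a real Boost API.
// The internal encoding (UTF-32 here, an assumed choice) is a private
// detail; callers only pick an encoding at the final conversion step.
namespace sketch {

class text {
public:
    explicit text(std::u32string code_points) : cps_(std::move(code_points)) {}

    // Encode to UTF-8 on first request and cache the result. Because the
    // object is immutable, the cache never needs invalidating and the
    // pointer stays valid for the lifetime of this object.
    const char* utf8_c_str() const {
        if (!utf8_valid_) {
            for (char32_t c : cps_) append_utf8(utf8_cache_, c);
            utf8_valid_ = true;
        }
        return utf8_cache_.c_str();
    }

    // Same idea for UTF-16, emitting surrogate pairs above U+FFFF.
    const char16_t* utf16_c_str() const {
        if (!utf16_valid_) {
            for (char32_t c : cps_) append_utf16(utf16_cache_, c);
            utf16_valid_ = true;
        }
        return utf16_cache_.c_str();
    }

private:
    // Assumes every element is a valid Unicode scalar value.
    static void check(char32_t c) {
        assert(c <= 0x10FFFF && (c < 0xD800 || c > 0xDFFF));
    }

    static void append_utf8(std::string& out, char32_t c) {
        check(c);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (c < 0x800) {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }

    static void append_utf16(std::u16string& out, char32_t c) {
        check(c);
        if (c < 0x10000) {
            out += static_cast<char16_t>(c);
        } else {
            c -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (c >> 10));
            out += static_cast<char16_t>(0xDC00 + (c & 0x3FF));
        }
    }

    std::u32string cps_;
    mutable std::string utf8_cache_;
    mutable std::u16string utf16_cache_;
    mutable bool utf8_valid_ = false;
    mutable bool utf16_valid_ = false;
};

}  // namespace sketch
```

For example, sketch::text t(U"h\u00e9") yields the three bytes
"h\xC3\xA9" from t.utf8_c_str() and two UTF-16 units from
t.utf16_c_str(). The runtime-encoding alternative would replace the fixed
UTF-32 member with an encoding tag chosen at construction; the
conversion functions would look the same from the caller's side. Note
that the lazily filled caches are not thread-safe as written.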

Thoughts?

Cheers,
Greg

