Subject: Re: [boost] [string] proposal
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-26 02:47:44


On Fri, Jan 21, 2011 at 1:07 PM, Dean Michael Berris
<mikhailberis_at_[hidden]> wrote:
>
> Mostly I'm interested in seeing a string class that is:
>
> 1. Immutable. No if's or but's about it. I don't want a string to be
> modifiable. Period. You can create it, and once it's created, that's
> it.
>
> 2. Has real value semantics. This means, once you've copied it, that's
> really copied. No funky copy-on-write reference-counting mumbo-jumbo.

I also prefer nothing too fancy. But most of these things
are implementation details; let us get the interface
right first and focus on the optimizations afterwards.
>
> 3. Has all the algorithms that apply to it defined externally.
>
[snip/]
> Encoding is a matter of external interpretation and I think should not
> be part of a string's interface. You can have wrappers that interpret
> a string as a UTF-* string.
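A minimal sketch of such a wrapper, assuming the underlying string is a plain byte sequence (here `std::string`) and that UTF-8 is applied only by an external function when a caller asks for it. `decode_utf8` is a hypothetical name, not a Boost or standard API, and the validation is deliberately simplified: it rejects bad lead and continuation bytes and truncated sequences, but does not reject overlong forms or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: interprets an encoding-agnostic byte string as UTF-8
// and returns its code points. The string type itself knows nothing about UTF-8.
// Simplified: overlong forms and surrogates are NOT rejected.
std::vector<char32_t> decode_utf8(const std::string& bytes)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)              { len = 1; cp = lead; }        // ASCII
        else if ((lead >> 5) == 0x06) { len = 2; cp = lead & 0x1F; } // 110xxxxx
        else if ((lead >> 4) == 0x0E) { len = 3; cp = lead & 0x0F; } // 1110xxxx
        else if ((lead >> 3) == 0x1E) { len = 4; cp = lead & 0x07; } // 11110xxx
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > bytes.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte must look like 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Because the decoder is a free function, the same bytes could be handed to a different decoder (say, one for Latin-1) without touching the string type, which is the separation the quoted text argues for.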

I am all for a generalized *string* class,
in the pedantic interpretation of the word
(i.e. a sequence of chars, char16_ts, bytes,
octets, words, dwords, etc. without any enforced
encoding), for the use-cases that call for it. But again,
the reason why I participate in this whole discussion
is that I think C++ also deserves a class
focused on the "everyday", *nice* and *convenient*
handling of text, without my having to worry about how
I need to "view" that raw chunk of binary data
in this call to an OS API function and how
I have to "view" it in that other library call,
or explicitly specifying which encoding I want
to convert it to using *ugly* :-) tag types, etc.
(as far as this is possible).

Another important concern for me is portability.
I'd like (being very self-centered :-P) the
following to work, for example:

boost::string s = "Mat" + code_point(0x00FA/*u with acute*/) +
code_point(0x0161/*s with caron*/);
std::cout << s << std::endl;

(wherever the terminal can handle it) to print:
Matúš // hope your email client can handle that :)

instead of:
Mat$#@!%
or upsetting the terminal completely.
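One way the `"Mat" + code_point(...)` line above could work, assuming the string stores UTF-8 internally, is for `code_point` to be a small tag type whose `operator+` encodes the value to UTF-8 before appending. This is only a sketch: it uses `std::string` as a stand-in for the proposed `boost::string`, and `code_point` and `encode_utf8` are hypothetical names, not an existing Boost API.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical tag type carrying a single Unicode scalar value.
struct code_point {
    explicit code_point(char32_t v) : value(v) {}
    char32_t value;
};

// Appends the UTF-8 encoding of cp (1 to 4 bytes) to out.
void encode_utf8(char32_t cp, std::string& out)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// string + code_point: append the UTF-8 bytes of the code point.
std::string operator+(std::string lhs, code_point cp)
{
    encode_utf8(cp.value, lhs);
    return lhs;
}

// const char* + code_point: lets a string literal start the expression.
std::string operator+(const char* lhs, code_point cp)
{
    return std::string(lhs) + cp;
}
```

With these overloads, `"Mat" + code_point(0x00FA) + code_point(0x0161)` yields the bytes `4D 61 74 C3 BA C5 A1`. Whether `std::cout` then shows "Matúš" still depends on the terminal expecting UTF-8, which is exactly the portability gap described above.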

Also, while I see that, for example, this
> auto it = encoded<utf8_encoding>(original_string), end =
> encoded<utf8_encoding>();
is perfectly generic and well-designed
for some use-cases, the first reaction of
the-average-joe-programmer-inside-me
on seeing it was *yuck*. Sorry :-)

Sometimes it is more important for code
to be nice and easy to understand, for the
people writing and maintaining it, than to be
really-really-generic and smart.
That said, it *is* perfectly valid for someone
to use the generic version above. Let's do both.

The reason why I want to call it (std::)string
is that many not-so-pedantic people would answer
the question "What is your first thought when
you hear 'string type'?" with "Some kind of type
for handling text, eh?" and not with "Some kind
of generalized sequence of elements without any
intrinsic encoding having the following
properties...". But if there is so much resistance
to calling it that, then I vote for (boost|std)::text
(although this sounds a little awkward to me; I don't
know why).

Let us keep basic_string<CharT> as that
generalized string (I never suggested dumping it,
just that std::string would be a separate type and
not defined as typedef std::basic_string<char>).

Regarding #1 above and the following ...
> x = "Hello,";
> x = x ^ " World!";

... would you object if the interface also
included a few convenience/backward-compatibility
member functions like ...

string& append(const string& s)
{
        *this = *this ^ s;
        return *this;
}

string& prepend(const string& s)
{
        *this = s ^ *this;
        return *this;
}

... etc.? For the same reasons as above: clarity,
simplicity (it may not be obvious what a fancy
operator expression does; it is more obvious
with names like append, prepend, ...), and because
people are used to that programming style.
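To make the trade-off concrete, here is a minimal sketch, assuming a string whose characters are never modified in place (the only "change" is assigning a whole new value) and whose concatenation is spelled `operator^` as in the quoted example. `immutable_string` is a hypothetical name; it holds its characters by value, with no reference counting or copy-on-write, in line with point 2 quoted earlier, and it includes the `append`/`prepend` convenience members proposed above.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical immutable string: no member function changes the characters
// of an existing value; "modification" means assigning a new value.
class immutable_string {
public:
    immutable_string() = default;
    immutable_string(const char* s) : data_(s) {}
    explicit immutable_string(std::string s) : data_(std::move(s)) {}

    // Read-only access.
    const std::string& str() const { return data_; }
    std::size_t size() const { return data_.size(); }

    // Convenience members: build a new value and rebind *this to it.
    // Copies taken earlier are unaffected.
    immutable_string& append(const immutable_string& s)
    {
        *this = *this ^ s;
        return *this;
    }

    immutable_string& prepend(const immutable_string& s)
    {
        *this = s ^ *this;
        return *this;
    }

    // Concatenation produces a fresh value; neither operand is modified.
    friend immutable_string operator^(const immutable_string& a,
                                      const immutable_string& b)
    {
        return immutable_string(a.data_ + b.data_);
    }

private:
    std::string data_;  // held by value: copies are real, independent copies
};
```

Usage: `x = "Hello,"; x = x ^ " World!";` and `x.append(" World!")` produce the same value. Because `append` only rebinds `x`, any earlier copy of `x` still holds the old text, so the named members add readability without weakening immutability or value semantics.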

BR,

Matus


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk