
Subject: Re: [boost] [string] proposal
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-26 04:01:29


On Wed, Jan 26, 2011 at 9:25 AM, Dean Michael Berris
<mikhailberis_at_[hidden]> wrote:
> On Wed, Jan 26, 2011 at 3:47 PM, Matus Chochlik <chochlik_at_[hidden]> wrote:
>> On Fri, Jan 21, 2011 at 1:07 PM, Dean Michael Berris
>> <mikhailberis_at_[hidden]> wrote:
>>>
[snip/]
>> I also prefer nothing too fancy. But most of these things
>> are implementation details, let us get the interface
>> right first and focus on the optimizations afterwards.
>
> Actually, it's not an implementation detail. Value semantics has
> everything to do with the interface and not the implementation.
>
> It's just that, at the time I was thinking about and writing this
> reply, I really wanted something lightweight that allowed for
> unbridled cross-thread access. That original assumption of mine that
> reference counting was a bad thing has since been clarified by others
> in the ensuing threads.

I didn't say that I regard the immutability or value semantics to
be an implementation detail. But some part of the discussion
focused on whether we should employ COW, how to implement it,
etc. Value semantics - a part of the interface specification -
can be implemented in a number of ways.
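To make that distinction concrete, here is a minimal sketch of one such
implementation. It assumes a hypothetical class named immutable_string
(not an existing or proposed Boost type): copies share a reference-counted
buffer that is never written to after construction, so copying is cheap,
concurrent reads are safe, and copy-on-write never comes up.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>

// Hypothetical immutable string: value semantics on the outside,
// a shared buffer that is never modified on the inside.
class immutable_string {
public:
    immutable_string() : data_(std::make_shared<std::string>()) {}
    immutable_string(const char* s) : data_(std::make_shared<std::string>(s)) {}
    // Arbitrary data: a pointer plus a length is enough.
    immutable_string(const char* p, std::size_t n)
        : data_(std::make_shared<std::string>(p, n)) {}

    // The implicitly generated copy only copies the shared_ptr (one atomic
    // reference-count increment); the characters themselves are not copied.

    std::string_view view() const noexcept { return *data_; }
    std::size_t size() const noexcept { return data_->size(); }

    // True if both objects refer to the same underlying buffer.
    bool shares_buffer_with(const immutable_string& other) const noexcept {
        return data_ == other.data_;
    }

    // Equality compares contents, not buffer identity (value semantics).
    friend bool operator==(const immutable_string& a, const immutable_string& b) {
        return a.view() == b.view();
    }

private:
    // Pointer-to-const: no member function can write to the buffer.
    std::shared_ptr<const std::string> data_;
};
```

A copy-on-write or a small-buffer implementation could sit behind exactly
the same public interface; that is the sense in which value semantics is
interface and the sharing strategy is implementation.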

>
>>>
>>> 3. Has all the algorithms that apply to it defined externally.
>>>
>> [snip/]
>>> Encoding is a matter of external interpretation and I think should not
>>> be part of a string's interface. You can have wrappers that interpret
>>> a string as a UTF-* string.

OK, I give up :) I do not insist any more on calling it 'string'.
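As an illustration of encoding as an external interpretation, the
following sketch reads the bytes of a plain string as UTF-8 and yields
code points, without the string type knowing anything about encodings.
decode_utf8 is a hypothetical name, not part of any proposed interface,
and the check is structural only.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical external "view": the string stores plain bytes and this
// free function interprets them as UTF-8. Returns std::nullopt on a bad
// lead byte, a truncated sequence or a bad continuation byte. Overlong
// forms and surrogates are NOT rejected in this sketch.
std::optional<std::vector<char32_t>> decode_utf8(std::string_view bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return std::nullopt;                          // invalid lead byte
        if (i + len > bytes.size()) return std::nullopt;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return std::nullopt;   // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

The same bytes could just as well be handed to a Latin-1 or UTF-16
interpretation; nothing in the stored string commits to one of them.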

>>
[snip/]
>>
>
> But we already have these everyday nice and convenient text handling
> algorithms in Boost.Algorithm's String_algo library.

But still it is encoding agnostic, which is bad in many cases.

>
> As a matter of fact, *all* the implementations cited above dealing
> with UTF-8 and UTF-16 have everything to do with wrapping raw data
> into a view of it that (unfortunately) allows for mutating
> transformations.
>
> Note also that I wasn't even going into the generic point of strings not
> being a sequence of anything other than characters to be read. That's
> a different topic that I don't want to get into at this time. But even
> the pedantic definition of a string doesn't include mutability as an
> intrinsic requirement.

I really do not have anything against the immutability
and the value semantics, see above. I think you
misunderstood me :)

>
>> Another important concern for me is portability.
>> I'd like (being very self-centered :-P) for example
>> the following:
>>
>> boost::string s = "Mat" + code_point(0x00FA/*u with acute*/) +
>> code_point(0x0161/*s with caron*/);
>> std::cout << s << std::endl;
>>
>> (everywhere where the terminal can handle it) to print:
>> Matúš // hope your email client can handle that :)
>>
>> instead of:
>> Mat$#@!%
>> or completely upsetting the terminal.
>>
>
> A few things here:
>
> 1. This is totally fine with an immutable string implementation. I
> don't see any mutations going on here.

Me neither :-) What I see however is that it fails because
of encoding.
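A minimal sketch of what a code_point(...) helper in the example above
could produce, assuming UTF-8 as the target encoding; encode_utf8 is a
hypothetical name. Even with correct bytes, std::cout prints Matúš only
when the terminal expects UTF-8, which is exactly the portability problem.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Precondition: cp is a Unicode scalar value (<= 0x10FFFF, not a surrogate).
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, "Mat" + encode_utf8(0x00FA) + encode_utf8(0x0161) yields the
seven bytes "Mat\xC3\xBA\xC5\xA1", i.e. five code points.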

>
> 2. A string class that "works correctly while immutable" allows for
> dealing with arbitrary data interpreted as some thunk that is obtained
> from a given source (as long as you have a length of the data that
> is).

Agreed

>
> 3. String I/O can be defined independently of the string especially if
> you're dealing with C++ streams. I don't see why the above would be a
> problem with an immutable string implementation.

Agreed, but again it has to be convenient.
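One way to keep I/O independent of the string and still convenient is a
free operator<< that needs nothing but read access. istring here is a
hypothetical stand-in for the proposed immutable string.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical minimal immutable string; only read access is public.
class istring {
public:
    istring(std::string s) : data_(std::make_shared<std::string>(std::move(s))) {}
    std::string_view view() const noexcept { return *data_; }

private:
    std::shared_ptr<const std::string> data_;
};

// Stream output lives outside the class and uses only view(), so the I/O
// side (including any transcoding policy) can evolve separately from
// the string type itself.
std::ostream& operator<<(std::ostream& os, const istring& s) {
    return os << s.view();
}
```

Usage is then as convenient as with std::string, e.g.
std::cout << istring("Hello") << '\n';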

[snip/]
>> Also, while I see that for example this
>>> auto it = encoded<utf8_encoding>(original_string), end =
>>> encoded<utf8_encoding>();
>> is perfectly generic and well-designed
>> for some use-cases the first reaction of
>> the-average-joe-programmer-inside-me's
>> when seeing it was, *yuck*. Sorry :-)
>>
>
> So you'd say yuck to any STL algorithm that dealt with iterators? Have
> you used the Boost.Iterators library yet because then you'd be calling
> all those chaining/wrapping operations "yucky" too. ;)

Some of them? Yes, in many situations.

[snip/]
>
> But the problem there is "nice" is really subjective. I absolutely
> abhor code like this:
>
>  boost::string s = "Foo";
>  s.append("Bar").append("Baz");
>
> When I can express it more succinctly, with fewer characters,
> with this instead:
>
>  boost::string s = "Foo" ^ "Bar" ^ "Baz";

Agreed, this is a matter of opinion and while
I see the beauty of what you propose, it may
not be clear what you mean by "Foo" ^ "Bar".
If I learned something from this whole discussion,
then it is that it's not nice to shove anything (programming
style included) down anyone's throat :-)
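A side note on the quoted expression: in C++, "Foo" ^ "Bar" on two string
literals does not compile, because the built-in ^ needs integral operands
and an overloaded operator needs at least one operand of class or
enumeration type. The first operand therefore has to be a string object
already. A sketch with a hypothetical istring class:

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical immutable string with '^' as non-mutating concatenation.
class istring {
public:
    istring(const char* s) : data_(s) {}   // lets "Bar" convert implicitly
    explicit istring(std::string s) : data_(std::move(s)) {}
    std::string_view view() const noexcept { return data_; }

    // Builds and returns a new string; neither operand is modified.
    friend istring operator^(const istring& a, const istring& b) {
        return istring(a.data_ + b.data_);
    }

private:
    std::string data_;  // no member function modifies it after construction
};

// "Foo" ^ "Bar" is ill-formed (two arrays of const char), but
// istring("Foo") ^ "Bar" ^ "Baz" works: ^ is left-associative, so every
// step has an istring on the left and a literal that converts on the right.
```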

>
>> The reason why I want to call it (std::)string
>> is that many not-so-pedantic people would react
>> to the question "What is your first thought when
>> you hear 'string type'?" with "Some kind of type
>> for handling text, eh?" and not with "Some kind
>> of generalized sequence of elements without any
>> intrinsic encoding having the following
>> properties...". But if there is so much resistance
>> to calling it that then I vote for (boost|std)::text
>> (however this sounds a little awkward to me, I don't
>> know why).
>>
>
> I think you're missing something here though.
>
> The point of creating a new string implementation is so that you can
> generalize a whole family of string-related algorithms around a
> well-defined abstraction. In this case there's really no question that
> a string of characters is used to represent "text" -- although it can
> very well represent a lot of other things too. However you cut it
> though the abstraction arises out of algorithms that have something to
> do with strings like: concatenation, compression, ordering, encoding,
> decoding, rendering, sub-string, parsing, lexical analysis, search,
> etc.

And I think you misunderstand me, I *do not* want to stop us
from doing such implementation of string. But just as it is important
for you to have the generic string class, it is important for me to have
the "nice" 'text' class :) I don't even have anything against
boost::text being implemented as a special case of boost::string
if it is possible/wise.
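The string-related algorithms listed in the quoted message (sub-string,
search, ...) can live entirely outside the string class, which is also
what would let a text type reuse them. A sketch, assuming the string type
can expose its characters as a std::string_view; find_first and
sub_string are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical free-standing algorithms over read-only views. They work
// for std::string, string literals, or any immutable string type that
// can hand out a std::string_view of its characters.
std::optional<std::size_t> find_first(std::string_view haystack,
                                      std::string_view needle) {
    std::size_t pos = haystack.find(needle);
    if (pos == std::string_view::npos) return std::nullopt;
    return pos;
}

// Returns a view of at most 'count' characters starting at 'pos'
// ('pos' is clamped to the end of 's'); no characters are copied.
std::string_view sub_string(std::string_view s, std::size_t pos,
                            std::size_t count) {
    if (pos > s.size()) pos = s.size();
    return s.substr(pos, count);
}
```

Since both functions take views and return a position or a view, neither
can mutate the string it is applied to.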

>
[snip/]
>
> Like I said though, I think we're talking in different levels.

I have exactly the same feeling :)

>
> I for one think that solving the std::string problem brings more to
> the world than just solving the encoding problem. Bold statement I
> know. ;)

For you (and others) not for me (and others).

>
> Also, last time I checked, there are already a ton of Unicode-encoding
> libraries out there, I don't see why there's a need for
> yet-another-encoding-library for character strings. This is why I
> think I'm liking the way Boost.Locale is handling it because it
> conveys that the library is about making a common interface through
> which different back-ends can be plugged into. If Boost.Locale dealt
> with iterators then I think having a string library that is better
> than std::string in more ways than one gives us a good way of tackling
> the cross-platform string encoding issue. But there I stress, I think
> C++ needs a better string implementation than the standard one.

And what is their level of acceptance by different APIs?

>
>> Regarding #1 above and the following ...
>>> x = "Hello,";
>>> x = x ^ " World!";
>>
>> ... would you be against, if the interface in addition also
>> included a few convenience/backward compatibility
>> member functions like ...
>>
[snip/]
>>
>> ... etc? For the same reasons as above: clarity,
>> simplicity (it may not be obvious what a fancy
>> operator expression does, it is more obvious
>> when using names like append, prepend, ...) and
>> people are used to that programming style.
>>
>
> I think this is a slippery slope though. If we make the boost::string
> look like something that is mutable without it being really mutable,
> then you have a disconnect between the interface and the semantics you
> want to convey.
>
> Having member functions like 'append' and 'prepend' makes you think
> that you're modifying the string when in fact you're really building
> another string. I've already pointed out that string construction can
> very well be handled by the string streams so I don't think we want to
> encourage people to think of strings as state-ful objects with mutable
> semantics because that's not the original intention of the string.
>
> That users of the string are building a new string, instead of
> "modifying an existing string", *should* be conveyed by the
> interface. This is largely an issue of documentation though.

Again, this is a matter of taste.
Is the enforcing of our "superior" interface design really that much
more important than the level of acceptance by other people who
do not share the same opinion? Nobody forces you to use append/
prepend and you should not force others to use the operator ^.
IMO in this case you even have an advantage, because append/
prepend/etc. would be wrappers around "your" :) interface.
And, yes, they should be clearly documented as such.
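A sketch of such wrappers, assuming a hypothetical istring whose
operator^ is the primitive: append() and prepend() are const and return
new strings. Marking them [[nodiscard]] (C++17) lets the compiler warn
about a call like s.append("x"); that discards the result, which
addresses the worry that the names suggest mutation.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical immutable string. operator^ is the primitive; append() and
// prepend() are const convenience wrappers that return a *new* string and
// leave *this untouched, which is what their documentation must say.
class istring {
public:
    istring(const char* s) : data_(s) {}
    explicit istring(std::string s) : data_(std::move(s)) {}
    std::string_view view() const noexcept { return data_; }

    friend istring operator^(const istring& a, const istring& b) {
        return istring(a.data_ + b.data_);
    }

    // Discarding the result is almost certainly a bug, so ask for a warning.
    [[nodiscard]] istring append(const istring& tail) const { return *this ^ tail; }
    [[nodiscard]] istring prepend(const istring& head) const { return head ^ *this; }

private:
    std::string data_;
};
```

With this, s.append("Bar").append("Baz") and s ^ "Bar" ^ "Baz" produce the
same value, and both leave s unchanged.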

Best,

Matus

