Subject: Re: [boost] [string] proposal
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-26 10:19:47
On Wed, Jan 26, 2011 at 3:22 PM, Dean Michael Berris wrote:
> On Wed, Jan 26, 2011 at 5:01 PM, Matus Chochlik <chochlik_at_[hidden]> wrote:
>> I didn't say that I regard the immutability or value semantics to
>> be an implementation detail. But some part of the discussion
>> focused on if we should employ COW, how to implement it,
> Sure, which is also where the reference counting implementation lies.
> Details like that are deal-breakers in performance-critical code and
> if we're talking about replacing std::string or implementing a
> competing string, it would have to beat the std::string performance
> (however bad/good that is).
>> Value semantics - a part of the interface specification -
>> can be implemented in a number of ways.
> I don't see though how else value semantics can be implemented aside
> from having: a default constructor, an assignment operator, a copy
> constructor, and later on maybe an optimal move constructor. That's
> really all there is to value semantics -- some would argue that swap
> is "necessary" too but I'm not convinced that swap is really required
> for value semantics.
I may be wrong, but when I hear you say that a type has a default
constructor, an assignment operator, etc., I understand that you are
talking about the interface of the type. When you explain how the
assignment operator, etc. is implemented, then you are talking about
the implementation.
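
To make the distinction concrete, here is a minimal sketch. The names
deep_string and shared_string are made up for this illustration and do
not come from any proposal. Both types offer the same value-semantic
interface (default constructor, copy constructor, assignment), but one
duplicates the buffer on copy while the other shares an immutable,
reference-counted buffer:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical types for illustration only: same interface,
// two different implementations of copying.

// Implementation A: every copy duplicates the character buffer.
class deep_string {
public:
    deep_string() = default;
    explicit deep_string(const char* s) : data_(s) {}
    const char* c_str() const { return data_.c_str(); }
private:
    std::string data_;
};

// Implementation B: copies share one immutable, reference-counted buffer.
class shared_string {
public:
    shared_string() : data_(std::make_shared<const std::string>()) {}
    explicit shared_string(const char* s)
        : data_(std::make_shared<const std::string>(s)) {}
    const char* c_str() const { return data_->c_str(); }
    long use_count() const { return data_.use_count(); }
private:
    std::shared_ptr<const std::string> data_;
};

void value_semantics_demo() {
    deep_string a("abc");
    deep_string b = a;                   // copy: buffer duplicated
    assert(a.c_str() != b.c_str());

    shared_string c("abc");
    shared_string d = c;                 // copy: buffer shared
    assert(c.c_str() == d.c_str());
    assert(c.use_count() == 2);

    d = shared_string("xyz");            // assignment rebinds d only
    assert(std::string(c.c_str()) == "abc");
    assert(c.use_count() == 1);
}
```

Code using either type looks the same; only the cost of a copy differs.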
> So it's the algorithms that are the problem -- for being encoding
> agnostic -- and not really the string; is that what you're implying?
>>> 1. This is totally fine with an immutable string implementation. I
>>> don't see any mutations going on here.
>> Me neither :-) What I see however is that it fails because
>> of encoding.
> I still don't understand this though. What does encoding have to do
> with the string? Isn't encoding a separate process?
Hm, my ability to express myself obviously totally su*ks :)
You are completely right that the encoding is a completely
separate process, and I'm saying that I want it to be *completely*
hidden from my sight, unless it is absolutely necessary
for me to be concerned about it :-)
The means for this would be: let us build a string that may
(or may not) be based on your general (encoding-agnostic)
string. This string would handle the transcoding in most
cases, without me having to view the underlying byte sequence
through functors that require me to specify *every time*
which encoding I want. By default I want UTF-8; if I talk to the OS I
say I want the string in the encoding that the OS expects, not
that I want it in UTF-16, ISO-8859-2, KOI8-R, etc.
If and only if I want to handle the string in another encoding
than Unicode should I have to specify that explicitly.
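
A rough sketch of the interface I have in mind follows. Everything in
it is hypothetical: text, latin1 and native() are names invented for
this example, the input is assumed to be valid UTF-8, and native()
simply assumes a system whose APIs take UTF-8 (a Windows build would
have to return UTF-16 instead). The point is only that the default
path names no encoding, and the non-Unicode path has to:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

struct latin1 {};   // hypothetical tag for an explicitly requested encoding

class text {
public:
    // Assumes valid UTF-8 input; no validation in this sketch.
    explicit text(std::string utf8) : utf8_(std::move(utf8)) {}

    // Default view: UTF-8, the caller names no encoding.
    const std::string& str() const { return utf8_; }

    // "Whatever the OS expects". Assumed to be UTF-8 in this sketch.
    const char* native() const { return utf8_.c_str(); }

    // Any non-Unicode encoding must be requested explicitly.
    std::string encode(latin1) const {
        std::string out;
        for (std::size_t i = 0; i < utf8_.size();) {
            unsigned char b = static_cast<unsigned char>(utf8_[i]);
            if (b < 0x80) {                          // one-byte sequence
                out += static_cast<char>(b);
                i += 1;
            } else if ((b & 0xE0) == 0xC0 && i + 1 < utf8_.size()) {
                unsigned char b2 = static_cast<unsigned char>(utf8_[i + 1]);
                unsigned cp = ((b & 0x1Fu) << 6) | (b2 & 0x3Fu);
                if (cp > 0xFF)
                    throw std::range_error("not representable in Latin-1");
                out += static_cast<char>(cp);
                i += 2;
            } else {                                 // 3- and 4-byte sequences
                throw std::range_error("not representable in Latin-1");
            }
        }
        return out;
    }
private:
    std::string utf8_;
};

void encoding_demo() {
    text t("caf\xC3\xA9");                       // "cafe" with e-acute, UTF-8
    assert(t.str().size() == 5);                 // default view is UTF-8
    assert(std::string(t.native()) == t.str());  // OS view (UTF-8 assumed)
    assert(t.encode(latin1()) == "caf\xE9");     // explicit request
}
```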
> How about Boost.RangeEx-wrapped STL algorithms?
> I for one like the simplicity and flexibility of it which may explain
> why I think we have different interpretations of "convenient". For me,
> iterators and layering operations on iterators, and then feeding them
> through algorithms is the convenient route. Anything that resembles
> Java code or Smalltalk-like "OOP message-passing" inspired interfaces
> just doesn't seem enticing to my brain anymore.
This is a different matter. Again I may be wrong, but I live
under the impression that RangeEx has been implemented
to hide the ugliness of complex STL iterator-based algorithms.
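
For illustration, this is the kind of difference I mean. The helpers in
namespace sketch are hand-rolled stand-ins written for this example so
that it needs only the standard library; Boost.Range ships its own
range-taking overloads (boost::sort, for instance), which I am not
reproducing here:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

namespace sketch {
// Illustrative wrappers only, not the Boost.Range implementation.
template <class Range>
Range& sort(Range& r) {
    std::sort(std::begin(r), std::end(r));
    return r;
}

template <class Range, class T>
bool contains(const Range& r, const T& value) {
    return std::find(std::begin(r), std::end(r), value) != std::end(r);
}
}  // namespace sketch

void range_demo() {
    std::vector<int> a = {3, 1, 2};
    std::vector<int> b = a;
    std::sort(a.begin(), a.end());   // iterator-pair style
    sketch::sort(b);                 // range style, same result
    assert(a == b);
    assert(sketch::contains(b, 2));
    assert(!sketch::contains(b, 7));
}
```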
> To be more "complete" about it though the semantics of "+" on strings
> is really a misnomer. The "+" operator signifies commutativity, which
> string concatenation lacks -- and you're really not adding string
> values either. What you want is an operator that conveys "I'm joining
> the string on the left with the one on the right in the specified
> order" -- because the "^" operator is left associative and can be used
> as a joining symbol, it fits the use case for strings better.
> So you read it as: "Foo" joined with "Bar" joined with ...
I know that of course, because we are having this discussion,
but will it be clear to someone who is not participating? It may become
clear when the string gets wider adoption.
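
For readers who were not following along, here is a small sketch of
what such a joining operator could look like. istring is a name I made
up, not the proposed class. The asserts also spell out the algebra:
joining is associative, but the order of the operands matters.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical immutable string: no mutating members are exposed.
class istring {
public:
    explicit istring(std::string s) : data_(std::move(s)) {}
    const std::string& str() const { return data_; }
private:
    std::string data_;
};

// Joining builds a new value; neither operand is modified.
inline istring operator^(const istring& lhs, const istring& rhs) {
    return istring(lhs.str() + rhs.str());
}

inline bool operator==(const istring& a, const istring& b) {
    return a.str() == b.str();
}

void join_demo() {
    istring foo("Foo"), bar("Bar"), baz("Baz");
    assert(((foo ^ bar) ^ baz) == (foo ^ (bar ^ baz)));  // associative
    assert(!((foo ^ bar) == (bar ^ foo)));               // order matters
    assert((foo ^ bar ^ baz).str() == "FooBarBaz");      // reads left to right
}
```

Note that in C++ "^" binds less tightly than "==" and "<<", so the
parentheses in the asserts above are required.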
> I still don't understand what "nice" is. I think precisely because
> "nice" is such a subjective thing I fear that without any objective
> criterion to base the definition of an interface/implementation on, we
> will keep chasing after what's "nice" or "convenient".
> OTOH if we agree that algorithms are as important as the abstraction,
> then I think it's better if we agree what the correct abstraction is
> and what the algorithms we intend to implement/support are. In that
> discussion what's "nice" is largely a matter of taste. ;)
OK, I think that it is pointless to discuss "nice" :) exactly because
it is very subjective.
>>> Also, last time I checked, there are already a ton of Unicode-encoding
>>> libraries out there, I don't see why there's a need for
>>> yet-another-encoding-library for character strings. This is why I
>>> think I'm liking the way Boost.Locale is handling it because it
>>> conveys that the library is about making a common interface through
>>> which different back-ends can be plugged into. If Boost.Locale dealt
>>> with iterators then I think having a string library that is better
>>> than std::string in more ways than one gives us a good way of tackling
>>> the cross-platform string encoding issue. But there I stress, I think
>>> C++ needs a string implementation better than the standard one.
>> And what is their level of acceptance by different APIs ?
> I think we need to qualify what you refer to as APIs. If just judging
> from the amount of code that's written against Qt or MFC for example
> then I'd say "they're pretty well accepted". If you look at the
> libraries that use ICU as a backend I'd say we already have one in
> Boost called Boost.Regex. And there's all these other libraries in the
> Linux arena that have their own little niche to play in the Unicode
> game -- there's Glib, the GNOME and KDE libraries, ad nauseam.
Besides what you mentioned, an API for me is for example
WINAPI, POSIX API, OpenGL API, OpenSSL API, etc.
Basically all the functions "exported" by the various C/C++
libraries that I cannot imagine my life without :) and which
expect not a generic iterator range or a view or whatnot,
but a plain and simple pointer (const char*) to a contiguous
block of memory containing a zero-terminated C string,
or, if we are luckier, a std::string.
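
A small sketch of why this matters. chunked_string is a hypothetical
string that keeps its contents in separate pieces, and c_api_length
stands in for any C function that takes a zero-terminated const char*.
Such a string has to be flattened into one contiguous buffer before the
call:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical string stored as separate, non-contiguous chunks.
class chunked_string {
public:
    explicit chunked_string(std::vector<std::string> chunks)
        : chunks_(std::move(chunks)) {}

    // Copy all chunks into one contiguous buffer.
    std::string flatten() const {
        std::string out;
        for (const std::string& c : chunks_) out += c;
        return out;
    }
private:
    std::vector<std::string> chunks_;
};

// Stand-in for a C API expecting a zero-terminated string.
std::size_t c_api_length(const char* s) { return std::strlen(s); }

void c_api_demo() {
    chunked_string s({"Hello, ", "world"});
    // Keep the flattened copy alive while the pointer is in use.
    const std::string flat = s.flatten();
    assert(c_api_length(flat.c_str()) == 12);
}
```

Every such call pays for the copy unless the string already stores its
bytes contiguously and zero-terminated.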
> What opinion is there to be had? If the string is immutable why would
> you want to make it look like it is mutable?
>> Nobody forces you to use append/
>> prepend and you should not force others to use the operator ^.
> Well, the primitive data types force you to use the operators defined
> on them. Spirit forces you to define rules using the DSEL. So does the
> MSM library. The BGL forces you to use the graph abstraction if you
> intend to deal with that library.
> I don't see why it's unreasonable to force operator^ for consistency's sake.
>> IMO in this case you are even in an advantage, because append/
>> prepend/etc. would be wrappers around "your" :) interface.
>> And, yes, they should be clearly documented as such.
> But the point of the thing being immutable is lost in translation.
> More to the point, operator^ has simple semantics as opposed to
> 'append' and 'prepend' which are two words for the same operation with
> just the order of the operands switched around.
> Am I missing something here?
I see your point of view. You imagine this new string class
to be a completely new beast. I, and I expect a few others as well,
view it as the next std::string. I don't see
any big point in creating another uber-string that is *so much*
better in performance, etc. etc. if it does not get wide adoption.
There are already dozens of such strings.
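
As for append/prepend being wrappers around "your" interface, this is
roughly what I mean. It is only a sketch: istring, append and prepend
are hypothetical free functions written for this example. They return
new values and mutate nothing, so immutability is preserved:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical immutable string with operator^ as the primitive.
class istring {
public:
    explicit istring(std::string s) : data_(std::move(s)) {}
    const std::string& str() const { return data_; }
private:
    std::string data_;
};

inline istring operator^(const istring& lhs, const istring& rhs) {
    return istring(lhs.str() + rhs.str());
}

// Convenience wrappers: same operation, operands in a different order.
inline istring append(const istring& s, const istring& tail) {
    return s ^ tail;
}
inline istring prepend(const istring& s, const istring& head) {
    return head ^ s;
}

void wrapper_demo() {
    istring foo("Foo"), bar("Bar");
    assert(append(foo, bar).str() == "FooBar");
    assert(prepend(foo, bar).str() == "BarFoo");
    assert(foo.str() == "Foo");   // the original value is untouched
}
```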