Subject: Re: [boost] [string] proposal
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2011-01-26 11:06:29

On Wed, Jan 26, 2011 at 11:19 PM, Matus Chochlik <chochlik_at_[hidden]> wrote:
> On Wed, Jan 26, 2011 at 3:22 PM, Dean Michael Berris
> <mikhailberis_at_[hidden]> wrote:
>> I don't see though how else value semantics can be implemented aside
>> from having: a default constructor, an assignment operator, a copy
>> constructor, and later on maybe an optimal move constructor. That's
>> really all there is to value semantics -- some would argue that swap
>> is "necessary" too but I'm not convinced that swap is really required
>> for value semantics.
> I may be wrong, but my understanding is that when you say a type has
> a default constructor, an assignment operator, etc., you are talking
> about the interface of the type. When you explain how the assignment
> operator, etc. is implemented, you are talking about implementation
> details :)

Right, but others seem to want to know about the implementation
details to try and work out whether the overall interface being
designed is actually going to be a viable implementation. So while I
say "value semantics" others have asked how that would be implemented
and -- being the gratuitous typer that I am ;) -- I would respond. :D
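
For readers following the value-semantics point, the operations listed above (default constructor, copy constructor, assignment operator, and later a move constructor) can be sketched as follows. This is only an illustration under stated assumptions: the name `istring` and the shared, never-mutated buffer are hypothetical and are not the design proposed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string with value semantics. The class name and
// the shared-buffer representation are assumptions made for illustration.
class istring {
public:
  istring() = default;  // empty string, no allocation
  explicit istring(const char* s)
      : data_(std::make_shared<const std::string>(s)) {}

  // Copying shares the immutable buffer instead of duplicating characters.
  istring(const istring&) = default;
  istring& operator=(const istring&) = default;

  // Moving transfers the buffer; the moved-from object reads as empty.
  istring(istring&&) noexcept = default;
  istring& operator=(istring&&) noexcept = default;

  std::size_t size() const { return str().size(); }

  // Read-only access; a null buffer is treated as the empty string.
  const std::string& str() const {
    static const std::string empty;
    return data_ ? *data_ : empty;
  }

  friend bool operator==(const istring& a, const istring& b) {
    return a.str() == b.str();
  }

private:
  std::shared_ptr<const std::string> data_;
};

// Small usage example exercising each of the listed operations.
inline void value_semantics_demo() {
  istring a("hello");
  istring b = a;             // copy construction
  istring c;                 // default construction
  c = b;                     // assignment
  istring d = std::move(a);  // move construction
  assert(b == c && c == d);
}
```

Because the buffer is never modified after construction, sharing it between copies is indistinguishable from a deep copy as far as the interface is concerned, which is the interface/implementation split being discussed.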

>> I still don't understand this though. What does encoding have to do
>> with the string? Isn't encoding a separate process?
> Hm, my ability to express myself obviously totally su*ks :)
> you are completely right, that the encoding is a completely
> separate process, and I'm saying that I want it *completely*
> to be hidden from my sight, unless it is absolutely necessary
> for me to be concerned about it :-)

So what would be the point of implementing a string "wrapper" that
knew its encoding as part of the type if you didn't want to know the
encoding in most of the cases? I think I'm missing the logic there.

> The means for this would be: Let us build a string, that may
> (or may not) be based on your general (encoding agnostic)
> string. And this string would handle the transcoding in most
> cases without me viewing the underlying byte sequence
> by functors that need me *every time* to specify what encoding
> I want explicitly. By default I want UTF-8, if I talk to the OS I
> say I want the string in an encoding that the OS expects, not
> that I want it in UTF-16, ISO-8859-2, KOI8-R, etc.
> If and only if I want to handle the string in another encoding
> than Unicode should I have to specify that explicitly.

So we're obviously talking about two different strings here -- your
"text" that knows the encoding and the immutable string that you may
or may not build upon. How then do you design the algorithms if you
*didn't* want to explicitly specify the encoding you want the
algorithms to use?

In one of the previous messages I laid out an algorithm template like so:

  template <class String>
  void foo(String s) {
    view<encoding> encoded(s);
    // deal with encoded from here on out
  }

Of course, from the perspective of foo's user, she wouldn't have to do
anything to her string before passing it in. From the algorithm
implementer's perspective, you would know exactly what encoding was
wanted and how to go about implementing the algorithm, potentially
even having something like this as well:

  template <class Encoding>
  void foo(view<Encoding> encoded) {
    // deal with the encoded string appropriately here
  }

And you get the benefits in either case of being able to either
explicitly or implicitly deal with strings depending on whether they
have been explicitly encoded already or whether it's just a raw set of
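
A compilable sketch of the two entry points above might look like the following. Everything here is an assumption made for illustration: the `utf8` tag, the minimal `view` template, and the function name `count_code_points` are hypothetical and are not taken from any existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct utf8 {};  // hypothetical encoding tag

// Hypothetical non-owning view that interprets raw bytes in a given encoding.
template <class Encoding>
class view {
public:
  explicit view(const std::string& bytes) : bytes_(&bytes) {}
  explicit view(std::string&&) = delete;  // refuse to view a temporary
  const std::string& bytes() const { return *bytes_; }

private:
  const std::string* bytes_;
};

// Entry point for input that is already explicitly encoded: counts UTF-8
// code points by skipping continuation bytes (those of the form 10xxxxxx).
inline std::size_t count_code_points(view<utf8> encoded) {
  std::size_t n = 0;
  for (unsigned char c : encoded.bytes())
    if ((c & 0xC0) != 0x80) ++n;
  return n;
}

// Entry point for raw strings: the algorithm chooses the encoding itself,
// so the caller never has to name one.
template <class String>
std::size_t count_code_points(const String& s) {
  view<utf8> encoded(s);
  return count_code_points(encoded);
}

// Usage: both calls give the same answer for the same bytes.
inline void view_demo() {
  const std::string raw = "h\xC3\xA9llo";  // 6 bytes, 5 code points
  assert(count_code_points(raw) == count_code_points(view<utf8>(raw)));
}
```

The caller with a raw string uses the first form implicitly, while code that already holds a `view<utf8>` reaches the encoded overload directly.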

> [snip/]
>> How about Boost.RangeEx-wrapped STL algorithms?
>> I for one like the simplicity and flexibility of it which may explain
>> why I think we have different interpretations of "convenient". For me,
>> iterators and layering operations on iterators, and then feeding them
>> through algorithms is the convenient route. Anything that resembles
>> Java code or Smalltalk-like "OOP message-passing" inspired interfaces
>> just doesn't seem enticing to my brain anymore.
> This is a different matter. Again, I may be wrong, but I am
> under the impression that RangeEx has been implemented
> to hide the ugliness of complex STL iterator-based algorithms.

Right. So what's the difference between the RangeEx way and the STL
way when they both deal with iterators? What makes the STL version
"yuckier" than the RangeEx version? I might have my answer to that but
hearing your answer to this question might give me a better idea of
what you might mean when you say "nice" or "convenient". ;)
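
For reference, the two call styles being compared can be put side by side. The sketch below is an assumption-laden illustration: it does not use Boost.Range itself but a hypothetical wrapper `sketch::sort`, written only against the standard library, to show that a range-style algorithm is a thin layer over the same iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

namespace sketch {
// Hypothetical range-style wrapper: forwards to the iterator-based
// algorithm and returns the range so calls can be composed.
template <class Range>
Range& sort(Range& r) {
  std::sort(std::begin(r), std::end(r));
  return r;
}
}  // namespace sketch

// Usage: both styles produce the same result on the same data.
inline void range_demo() {
  std::vector<int> v = {3, 1, 2};
  std::vector<int> w = v;
  std::sort(v.begin(), v.end());  // STL style: caller spells out the iterators
  sketch::sort(w);                // range style: caller passes the container
  assert(v == w);
}
```

The difference is purely at the call site; the underlying iterator machinery is identical.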

>> So you read it as: "Foo" joined with "Bar" joined with ...
> I know that of course because we are having this discussion,
> but will it be clear to someone who is not participating? It may become
> clear when the string gets wider adoption.

Of course the proof will be in the pudding. ;)

>> I still don't understand what "nice" is. I think precisely because
>> "nice" is such a subjective thing I fear that without any objective
>> criterion to base the definition of an interface/implementation on, we
>> will keep chasing after what's "nice" or "convenient".
>> OTOH if we agree that algorithms are as important as the abstraction,
>> then I think it's better if we agree what the correct abstraction is
>> and what the algorithms we intend to implement/support are. In that
>> discussion what's "nice" is largely a matter of taste. ;)
> OK, I think that it is pointless to discuss "nice" :) exactly because
> it is very subjective.

Agreed. :)

>> I think we need to qualify what you refer to as APIs. If just judging
>> from the amount of code that's written against Qt or MFC for example
>> then I'd say "they're pretty well accepted". If you look at the
>> libraries that use ICU as a backend I'd say we already have one in
>> Boost called Boost.Regex. And there's all these other libraries in the
>> Linux arena that have their own little niche to play in the Unicode
>> game -- there's Glib, the GNOME and KDE libraries, ad nauseam.
> Besides what you mentioned, an API for me is, for example,
> basically all the functions "exported" by the various C/C++
> libraries that I cannot imagine my life without :) and which
> expect not a generic iterator range or a view or whatnot
> but a plain and simple pointer (const char*) pointing to a contiguous
> block in memory containing a zero-terminated C string,
> or, if we are luckier, expect std::string.

So, if there was a way to "encode" (there's that word again) the data
in an immutable string into an acceptably-rendered `char const *`
would that solve the problem? The whole point of my assertion (and
Dave's question) is whether c_str() would have to be intrinsic to the
string; as I pointed out in a different message (not too long ago), it
could very well be an external algorithm.
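
As a rough illustration of c_str() as an external algorithm, the sketch below renders a string that is not stored contiguously into a NUL-terminated buffer through a free function. The `chunked_string` class and the `linearize` name are hypothetical, chosen only for this example, and the chunked storage is an assumption rather than the layout discussed in this thread.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical immutable string stored as separate chunks, so it has no
// contiguous buffer of its own to hand out.
class chunked_string {
public:
  explicit chunked_string(std::vector<std::string> chunks)
      : chunks_(std::move(chunks)) {}
  const std::vector<std::string>& chunks() const { return chunks_; }

private:
  std::vector<std::string> chunks_;  // never modified after construction
};

// External algorithm standing in for c_str(): copies the chunks into an
// owning, contiguous, NUL-terminated buffer.
inline std::vector<char> linearize(const chunked_string& s) {
  std::vector<char> out;
  for (const std::string& chunk : s.chunks())
    out.insert(out.end(), chunk.begin(), chunk.end());
  out.push_back('\0');
  return out;
}

// Usage: buf.data() can be passed to any API that expects a char const*.
inline void linearize_demo() {
  const std::vector<std::string> parts = {"Foo", "Bar"};
  const chunked_string s(parts);
  const std::vector<char> buf = linearize(s);
  assert(std::strcmp(buf.data(), "FooBar") == 0);
}
```

The cost of producing the contiguous buffer is paid only by callers that need one, and the string type itself stays free of the requirement.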

>> Am I missing something here?
> I see your point of view. You imagine this new string class
> to be a completely new beast. I, and I expect a few others,
> view it as the next std::string. I don't see
> any big point in creating another-uber-string, that is *so much*
> better in performance, etc. etc. if it does not get wide adoption.
> There are already dozens of such strings.

Right. This is Boost anyway, and I've always viewed libraries that get
proposed to and accepted into Boost as the kinds of libraries that are
developed to eventually be made part of the C++ standard library.

So while out of the gate the string implementation may very well not
be called std::string, I don't see why the current std::string can't
be deprecated later on (look at std::auto_ptr) and a different
implementation be put in its place. :D Of course that may very well be
C++21xx so I don't think I need to worry about it having to be a
std::string killer at the outset. ;)


Dean Michael Berris
