Subject: Re: [boost] [Potentially OT] String Concatenation Operator
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-08-27 17:30:01


On 27/08/2010 03:50, Dean Michael Berris wrote:

> Actually, not *just* a range of characters.
>
> If your range dealt with ownership of data and abstracted the means by
> which the data is actually manipulated (even through iterators) then
> maybe a string is just a range of characters. If your range made sure
> that the data is moved instead of copied in certain situations, or
> whether there is an optimization that can be made on certain
> operations (like concatenation) then yeah I'll agree that a string is
> just a range of characters.
>
> If you think about a string as a special beast that does a lot of things:
>
> 1. Allocates memory which holds the characters
> 2. Offers an abstraction unique to strings (token_iterator,
> line_iterator, char_iterator, wchar_iterator)
> 3. Has value semantics similar to built-in types (copyable, movable, "regular")
>
> Then it doesn't look like just a range.

What I call a string is just the data, not a container of said data.

I don't see what item 2 is doing in that list. Surely those mechanisms
are independent of the container.

> Sure, but if the string type already does that for you at the time you
> need it, then it's inherently supported by the type, no?

If you can iterate through a sequence of copyable elements, then you can
copy those elements into a new sequence of the structure of your choosing.
This doesn't require any particular support from the first sequence type.
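
As a minimal sketch of that point (copy_into is a hypothetical helper, not
something from Boost or from this thread), any iterator pair over copyable
elements is enough to build a new container of whatever type you like:

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: builds a Target container from any iterator pair.
// The source sequence only has to be iterable and hold copyable elements;
// it needs no string-specific support.
template <typename Target, typename Iterator>
Target copy_into(Iterator first, Iterator last)
{
    return Target(first, last);
}
```

For instance, copy_into<std::string>(l.begin(), l.end()) would turn a
std::list<char> into a std::string, and the list type never has to know
about strings.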

> That's cool, but in cases where you deal with potentially huge strings
> (i.e. more than a memory page's worth of data) you start looking for
> ways of moving some of the work out of runtime and into compile time

This doesn't make sense.
If the data is too big to fit into memory, then it's going to be
completely out of the range (no pun intended) of what the compile-time
world can deal with.

> And especially in
> cases where you need to abstract the string from being something that
> is exclusively in memory to something that refers to data that is not
> in memory (retrieved from a socket, from a file, from user input) you
> run into issues like buffer management, demand-driven/lazy-loading
> data, etc. that a range is not the end-all and/or best solution for.
>
> For instance, think about a forward iterator (or a single-pass
> iterator?) that the string can expose to allow for one-time traversal
> of the data it holds or it refers to. The string doesn't have to have
> the data all in memory if it doesn't yet, and it can then lazily load
> the data and expose it through that forward iterator.

I don't get your argument.
Basically you're saying "wouldn't it be cool if you could not actually
store your data in memory, but generate it as you're traversing or get
it from I/O on demand?". That's what ranges are.
Ranges *are* iterators.
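
A small illustration, assuming a standard stream as the data source
(count_char is a hypothetical name): std::istreambuf_iterator already gives
a single-pass traversal that pulls characters from I/O on demand, without
any string object holding the data.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream> // std::istringstream, handy for trying this out

// Hypothetical helper: counts occurrences of c by traversing the stream
// once. Characters are fetched from the stream buffer as the iterator
// advances; the input is never collected into a string first.
inline std::size_t count_char(std::istream& in, char c)
{
    std::istreambuf_iterator<char> first(in), last;
    std::size_t n = 0;
    for (; first != last; ++first)
        if (*first == c)
            ++n;
    return n;
}
```

The same function works unchanged on a file stream or on an in-memory
std::istringstream, since it only sees the iterator pair.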

> If the string handle knew that some part
> of the string was a conglomeration of statically-sized literals then
> it can hold that data in a boost::array of the correct size

No, it cannot.
string_handle is a single type defined in advance, and it must have a
finite size. It cannot guess how many conglomerations of
statically-sized literals you're going to want to put into it, and
therefore can't guarantee enough storage.

> then you can correctly
> allocate enough space at the point where the operator= is implemented.

At runtime (operator= is a function that executes code, not a type
definition that provides automatic storage), which makes the memory
dynamically allocated, as I said.

> Of course the type erasure might be an issue, but knowing the eventual
> size at compile time allows you a lot of optimizations you otherwise
> can't or won't do.

Here, I can't see it bringing any benefit compared to knowing it at
runtime, since the allocation can only happen at runtime.

> Yes, but then you have a left-leaning tree of iterator pairs. ;)

Just like you have a tree of bounded segments, or an AST if you go for
a Proto solution.

> If the new string type implemented iterators (several types of
> iterators in fact) and manages the memory for you in a configurable
> manner, *and* allows you to convert it to either an std::string or an
> std::wstring, why wouldn't it be inter-operable?

Because your code will expect your string type, not the user's string
type, meaning they'll have to convert to it.
Conversion isn't actually needed, since you could treat your type, the
user's type, or any smart lazily evaluated type the same way.

> But the range adaptors don't solve the issue of multiple allocations

How do they not?

Your problem is that a classic binary operator+ implementation
reallocates multiple times. In pseudo-code,

buffer = a + b + c

is

buffer = allocate(size(a))
copy(buffer, a)

buffer2 = allocate(size(buffer)+size(b))
copy(buffer2, buffer)
cat(buffer2, b)
free(buffer)

buffer = allocate(size(buffer2)+size(c))
copy(buffer, buffer2)
cat(buffer, c)
free(buffer2)

Instead of that you could just evaluate that expression as

r = join(a, b, c)
buffer3 = allocate(size(r))
copy(buffer3, r)

As you can see, it solves the problem just fine.
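
Here is a rough C++ rendering of the join pseudo-code, restricted to three
std::string operands to keep it short. The names joined3, join3 and
materialize are made up for this sketch; a real solution would use a
generic range adaptor rather than a fixed-arity struct.

```cpp
#include <cstddef>
#include <string>

// Hypothetical lazy view: it only refers to the three operands and copies
// nothing, so the operands must outlive it.
struct joined3
{
    const std::string& a;
    const std::string& b;
    const std::string& c;

    joined3(const std::string& a_, const std::string& b_,
            const std::string& c_)
        : a(a_), b(b_), c(c_) {}

    // Total length is known before any character is copied.
    std::size_t size() const { return a.size() + b.size() + c.size(); }
};

inline joined3 join3(const std::string& a, const std::string& b,
                     const std::string& c)
{
    return joined3(a, b, c);
}

// Copies the view into one buffer. reserve() is the only call that may
// allocate; the appends fit within the reserved capacity.
inline std::string materialize(const joined3& r)
{
    std::string buffer;
    buffer.reserve(r.size());
    buffer.append(r.a).append(r.b).append(r.c);
    return buffer;
}
```

Writing materialize(join3(a, b, c)) then does the work of a + b + c
without the intermediate buffers.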

> Ah, because 'auto' is C++0x and I was still thinking that maybe people
> not moving to C++0x might as well want to be able to use this new
> string type even at C++03.

You can write the type out yourself or use a result_of-like protocol,
like Fusion does.
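
A sketch of what such a protocol could look like in C++03. The names
joined_view and make_join are hypothetical; only the idea of a result_of
namespace holding a metafunction per function is borrowed from Fusion.

```cpp
#include <cstddef>
#include <string>

// Hypothetical lazy view over two sequences; it stores references only,
// so both operands must outlive it.
template <typename L, typename R>
struct joined_view
{
    const L& left;
    const R& right;

    joined_view(const L& l, const R& r) : left(l), right(r) {}

    std::size_t size() const { return left.size() + right.size(); }
};

namespace result_of
{
    // Metafunction naming the return type of make_join, so callers can
    // spell it out without 'auto'.
    template <typename L, typename R>
    struct make_join
    {
        typedef joined_view<L, R> type;
    };
}

template <typename L, typename R>
typename result_of::make_join<L, R>::type make_join(const L& l, const R& r)
{
    return joined_view<L, R>(l, r);
}
```

The caller then declares the result as
result_of::make_join<std::string, std::string>::type ab = make_join(a, b);
and nested joins compose by feeding that type back into the metafunction.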

> Yes! Well, technically not stateful operations -- they're functionally
> pure operations: concatenation doesn't munge with the existing
> strings, it just returns a new string made of the contents of the
> other strings (or string generators).

If you end up re-storing the result into the same variable, it basically
is stateful.

