Subject: Re: [boost] [Potentially OT] String Concatenation Operator
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2010-08-26 03:09:41


On Wed, Aug 25, 2010 at 5:38 PM, Mathias Gaunard
<mathias.gaunard_at_[hidden]> wrote:
> On 24/08/2010 17:11, Dean Michael Berris wrote:
>>
>>
>> 1. Efficiently allocate space to contain a string being built from
>> literals and variable length strings.
>> 2. Be able to build/traverse the string lazily (i.e., it doesn't
>> matter that the string is contiguous in memory as in the case of
>> C-strings, or whether they are built/backed by a stream as in Haskell
>> ByteString).
>
> It seems to be what you're looking for is a range (or a slight refinement).
> I.e. an entity that can be iterated.

Almost... That's just one part of it.

> Arrays, tuples, strings, vectors, lists, a pair of istream_iterator, a pair
> of pointers, a concatenation of ranges, a transformed range, etc. are all
> ranges.
>

Indeed.

However, I am looking for a means of representing a string -- a
collection of characters on which you can implement algorithms.
They might as well be ranges, but things like pattern matching and
templates (as in string templates, where you have placeholders and
other things) can be applied to or used to generate them.

>
>> 3. As much as possible be "automagically" network-safe (i.e. can be
>> dealt with by Boost.Asio without having to do much acrobatics with
>> it).
>
> I suppose you'd have to linearize it.
> Sending it in multiple chunks would have different behaviour on different
> types of sockets.
>

Yes, but I'd want that to be something inherently supported by the type.

Why strings are important has a lot to do with being able to perform
domain-specific optimization on string algorithms. Things like
capitalization, whitespace removal, encoding/decoding, transformations
like breaking up strings according to some pattern (tokenization,
parsing, etc.). Because you can specialize the memory management of
strings (as opposed to just ranges of chars), the "win" in treating a
string as a separate type is practical rather than conceptual.

>
>> What I wanted to be able to do (and am reproducing at the moment) is a
>> means of doing the following:
>>
>>   string_handle f = /* some means of building a string */;
>>   string_handle s = str("Literal:") ^ f ^ str("\r\n\r\n");
>
>>   std::string some_string = s; // convert to string, building it lazily
>
> How about:
>
> boost::lazy_range<char> f = /* some means of building a string */;
>
> boost::lazy_range<char> s = boost::adaptors::join(
>   boost::as_literal("Literal"),
>   f,
>   boost::as_literal("\r\n\r\n")
> );
>
> std::string some_string;
> boost::copy(s, std::back_inserter(some_string));
>

That's fine if I can control the memory allocation of the lazy range.
As it is, a lazy range just represents a collection of iterator pairs
-- the data has to live somewhere still. What I'm looking for is a
combined ownership+iteration mechanism. Right now, allocating a chunk
of memory every time you concatenate two strings is the problem I'm
trying to solve with metaprogramming, by knowing at compile time how
much memory I'm going to need to allocate.
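
As a rough illustration of ownership combined with iteration, here is
a minimal sketch. It is not an existing Boost type: chunked_string,
append and flatten are hypothetical names, and the design (a vector of
owned pieces plus a forward iterator that walks across them) is only
one possible assumption about how such a type might look.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: owns its pieces and exposes them as one flat
// sequence of characters, without merging them into a single buffer.
class chunked_string {
    std::vector<std::string> chunks_;  // owned storage, one entry per piece

public:
    // Take ownership of a piece. Empty pieces are dropped so that the
    // iterator never has to skip over them.
    chunked_string& append(std::string piece) {
        if (!piece.empty()) chunks_.push_back(std::move(piece));
        return *this;
    }

    // Total number of characters across all pieces.
    std::size_t size() const {
        std::size_t n = 0;
        for (const std::string& c : chunks_) n += c.size();
        return n;
    }

    class const_iterator {
        const std::vector<std::string>* owner_;
        std::size_t chunk_;  // index of the current piece
        std::size_t pos_;    // offset inside the current piece

    public:
        using iterator_category = std::forward_iterator_tag;
        using value_type = char;
        using difference_type = std::ptrdiff_t;
        using pointer = const char*;
        using reference = const char&;

        const_iterator() : owner_(nullptr), chunk_(0), pos_(0) {}
        const_iterator(const std::vector<std::string>* owner, std::size_t chunk)
            : owner_(owner), chunk_(chunk), pos_(0) {}

        reference operator*() const { return (*owner_)[chunk_][pos_]; }

        // Step to the next character, moving on to the next piece when
        // the current one is exhausted.
        const_iterator& operator++() {
            if (++pos_ == (*owner_)[chunk_].size()) { ++chunk_; pos_ = 0; }
            return *this;
        }
        const_iterator operator++(int) { const_iterator t = *this; ++*this; return t; }

        bool operator==(const const_iterator& o) const {
            return chunk_ == o.chunk_ && pos_ == o.pos_;
        }
        bool operator!=(const const_iterator& o) const { return !(*this == o); }
    };

    const_iterator begin() const { return const_iterator(&chunks_, 0); }
    const_iterator end() const { return const_iterator(&chunks_, chunks_.size()); }
};

// Linearize on demand: reserve the full size once, then copy.
inline std::string flatten(const chunked_string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) out.push_back(c);
    return out;
}
```

With something like this, s.append("Literal:").append(f).append("\r\n\r\n")
keeps the pieces separate, and flatten(s) is the only point where a
contiguous buffer is allocated.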

Of course if you're dealing with strings that have an unknown length
(as in my example, we really can't tell the length of 'f' at the point
's' is defined) at least getting to know the parts that have a known
length at compile time (the literals) allows me to allocate enough
space ahead of time with the compiler's help. Maybe instead of having
multiple concatenations, what happens is I allocate a chunk "just
large enough" to hold the multiple concatenated strings, and just
traverse the lazy string as in your example. The copy happens at
runtime, the allocation of a large enough buffer (maybe a
boost::array) happens at compile-time (or at least the determination
of the size).
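
The idea of knowing the literal part of the length at compile time
might be sketched as below. This is an illustration under assumptions,
not the code being worked on: the names lit, var, concat, static_size
and dynamic_size are hypothetical, and the sketch uses a std::string
with a single reserve() instead of a boost::array. Only str() and the
^ operator are taken from the example earlier in this thread.

```cpp
#include <cstddef>
#include <string>

namespace sketch {

// A string literal whose length is part of its type (N counts the '\0').
template <std::size_t N>
struct lit {
    const char* data;
    static constexpr std::size_t static_size = N - 1;  // known at compile time
    std::size_t dynamic_size() const { return 0; }
    void append_to(std::string& out) const { out.append(data, N - 1); }
};

// A run-time string held by reference; the referenced string must
// outlive the expression. Its length is only known at run time.
struct var {
    const std::string& ref;
    static constexpr std::size_t static_size = 0;
    std::size_t dynamic_size() const { return ref.size(); }
    void append_to(std::string& out) const { out.append(ref); }
};

// A lazy concatenation node: no characters are copied until the
// conversion to std::string.
template <class L, class R>
struct concat {
    L lhs;
    R rhs;
    // Sum of all literal lengths in the expression, a compile-time constant.
    static constexpr std::size_t static_size = L::static_size + R::static_size;
    std::size_t dynamic_size() const { return lhs.dynamic_size() + rhs.dynamic_size(); }
    void append_to(std::string& out) const { lhs.append_to(out); rhs.append_to(out); }

    // Reserve the full size once, then copy each part in order.
    operator std::string() const {
        std::string out;
        out.reserve(static_size + dynamic_size());
        append_to(out);
        return out;
    }
};

template <std::size_t N>
lit<N> str(const char (&s)[N]) { return lit<N>{s}; }

inline var str(const std::string& s) { return var{s}; }

// Found through argument-dependent lookup for the types in this namespace.
template <class L, class R>
concat<L, R> operator^(const L& l, const R& r) { return concat<L, R>{l, r}; }

}  // namespace sketch
```

With these definitions, sketch::str("Literal:") ^ sketch::str(f) ^
sketch::str("\r\n\r\n") has a static_size of 12 that the compiler
computes from the two literals, and the length of f is added only when
the expression is converted to a std::string.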

> boost::lazy_range is not actually in boost, but that would be a type erased
> range, and I've got an implementation somewhere.

Maybe a lazy_range would be nice to have in Boost. Or even just a join iterator.

> Of course, not using type erasure at all (i.e. replacing lazy_range by auto
> or the actual type of the expression) would allow it to be quite faster.
>

Definitely.

The hope was to be able to determine at least the length of the whole
string from the concatenation of multiple strings, so that effective
memory allocation can be done at the time it's needed, which is
usually at the time a copy of the whole string is required.

-- 
Dean Michael Berris
deanberris.com
