Subject: [boost] [Potentially OT] String Concatenation Operator
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2010-08-24 12:11:07


Good day everyone,

I am currently taking some time to implement some functionality in
cpp-netlib [0] (shameless plug) and I've stumbled into a sort of
conundrum.

First, some background: I'm trying to abstract away the string
building routines of the network library to come up with the most
efficient way of doing the following:

1. Efficiently allocate space to contain a string being built from
literals and variable length strings.
2. Be able to build/traverse the string lazily (i.e., it doesn't
matter whether the string is contiguous in memory, as in the case of
C-strings, or built/backed by a stream, as in Haskell ByteString).
3. As much as possible be "automagically" network-safe (i.e. can be
dealt with by Boost.Asio without having to do much acrobatics with
it).

At the heart of the issue is the semantics of the '+' operator to
signify string concatenation. Without wanting to sound pedantic about
it, addition in the traditional mathematical sense is both commutative
and associative, while string concatenation is associative but not
commutative. Trying to remember my C++ operator precedence and
associativity rules, it looks like operator% and/or operator^ might be
good candidates for this, but only in expression templates where you
fold from the right.
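
For reference, the grouping rule is easy to check. In C++ the binary
operators +, % and ^ all group left to right, and overloading changes
neither precedence nor associativity, so any right fold has to happen
when the expression template is evaluated. A small sketch, where
`traced` is a hypothetical helper type invented only to make the
grouping visible:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: carries a textual picture of the expression so far.
struct traced {
    std::string text;
};

// This operator^ does no real concatenation; it only wraps its two
// operands in parentheses so the grouping chosen by the compiler shows up.
inline traced operator^(traced const& lhs, traced const& rhs) {
    return traced{"(" + lhs.text + " ^ " + rhs.text + ")"};
}

// Returns the grouping of an unparenthesized chain a ^ b ^ c.
inline std::string grouping_of_chain() {
    traced a{"a"}, b{"b"}, c{"c"};
    return (a ^ b ^ c).text;
}

// Usage: the chain is parsed as (a ^ b) ^ c.
inline void check_grouping() {
    assert(grouping_of_chain() == "((a ^ b) ^ c)");
}
```

The expression tree therefore leans to the left; the evaluator is
still free to walk it in whatever order suits the allocation strategy.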

Now I don't want to start beating on the STL's standard string
implementation, but I'd like to know if anyone is already working on a
string implementation that meets the above requirements? I'd be happy
to wait on compile times with Proto, if it means I can save big at
runtime.

What I wanted to be able to do (and am reproducing at the moment) is a
means of doing the following:

  string_handle f = /* some means of building a string */;
  string_handle s = str("Literal:") ^ f ^ str("\r\n\r\n");
  std::string some_string = s; // convert to string and build lazily
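
To make the idea concrete, here is a minimal sketch of such a lazy
concatenation, assuming nothing beyond the standard library. The names
`lazy`, `leaf`, `concat` and `piece` are made up for illustration and
are not part of cpp-netlib or Boost. Nothing is copied until the
conversion to std::string, which reserves once and appends in order:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace lazy {

// Leaf node: refers to an existing string without copying it.
// The referenced string must outlive the whole expression.
struct leaf {
    std::string const* value;
    std::size_t size() const { return value->size(); }
    void append_to(std::string& out) const { out += *value; }
};

// Interior node: stores both operands by value (nodes are small).
template <class L, class R>
struct concat {
    L lhs;
    R rhs;
    std::size_t size() const { return lhs.size() + rhs.size(); }
    void append_to(std::string& out) const {
        lhs.append_to(out);
        rhs.append_to(out);
    }
    // The actual work happens here: one reserve, then in-order appends.
    operator std::string() const {
        std::string out;
        out.reserve(size());
        append_to(out);
        return out;
    }
};

inline leaf piece(std::string const& s) { return leaf{&s}; }

// Found only through argument-dependent lookup on lazy:: types.
// Deliberately unconstrained to keep the sketch short.
template <class L, class R>
concat<L, R> operator^(L const& l, R const& r) {
    return concat<L, R>{l, r};
}

}  // namespace lazy

// Usage: mirrors the expression in the snippet above.
inline std::string build_example(std::string const& f) {
    std::string const head = "Literal:";
    std::string const tail = "\r\n\r\n";
    return lazy::piece(head) ^ lazy::piece(f) ^ lazy::piece(tail);
}
```

A real string_handle would also need to own or share its fragments;
this sketch only borrows them.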

If for instance f were also a literal, then s could already hold the
whole string in a fixed-size byte array whose size is determined at
compile time. The function str() would only accept a literal and look
something like this:

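  template <size_t N>
  inline
  // take the array by reference; a by-value array parameter decays to
  // a pointer and N could not be deduced
  bounded_fragment<N> str(char const (&s)[N]) {
    return bounded_fragment<N>(s);
  }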
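
A slightly fuller sketch of this, assuming the literal is taken by
reference to an array (a plain `char const s[N]` parameter decays to a
pointer, so N could not be deduced). The layout of bounded_fragment
and the operator^ overload below are my assumptions, not existing
code; N counts the array extent including the terminating NUL:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical fixed-size fragment; N includes the terminating NUL.
template <std::size_t N>
struct bounded_fragment {
    char data[N];
    static constexpr std::size_t length() { return N - 1; }
    std::string to_string() const { return std::string(data, N - 1); }
};

// Accepts only arrays of known extent, which covers string literals.
template <std::size_t N>
bounded_fragment<N> str(char const (&s)[N]) {
    bounded_fragment<N> out;
    for (std::size_t i = 0; i < N; ++i) out.data[i] = s[i];
    return out;
}

// Joining two fragments: the result size is fixed at compile time
// (one of the two NUL terminators is dropped).
template <std::size_t A, std::size_t B>
bounded_fragment<A + B - 1> operator^(bounded_fragment<A> const& l,
                                      bounded_fragment<B> const& r) {
    bounded_fragment<A + B - 1> out;
    for (std::size_t i = 0; i + 1 < A; ++i) out.data[i] = l.data[i];
    for (std::size_t i = 0; i < B; ++i) out.data[A - 1 + i] = r.data[i];
    return out;
}

// The combined length is available to the compiler.
static_assert(decltype(str("ab") ^ str("cd"))::length() == 4,
              "size is known at compile time");

// Usage: two literals joined into one fixed-size buffer.
inline std::string demo_bounded() {
    auto s = str("Literal:") ^ str("\r\n\r\n");
    assert(s.length() == 12);
    return s.to_string();
}
```

The characters are still copied at run time here; only the size is a
compile-time constant, which is what allows static storage.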

The evaluation of the assignment (or copy constructor) of the
string_handle will then evaluate the expression template and already
know at compile time:

A. Whether the string is just one long literal, in which case enough
space to hold the whole string can be allocated at compile time, or at
least reserved statically (a boost::array perhaps) so that a simple
range copy can be done (and optimized by the compiler as well)

B. Whether the string is a list of variable-length strings, in which
case a list of handles is built

C. Whether it is a mix, in which case all adjacent literals are joined
at compile time and the variable-sized strings are retrieved when
required
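
The three cases can be told apart purely from the type of the
expression. A rough sketch of that classification, where
`literal_leaf`, `dynamic_leaf`, `concat`, `kind` and `classify` are
all hypothetical names invented for this example (the nodes carry no
data here, only type information):

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

template <std::size_t N> struct literal_leaf {};  // size known at compile time
struct dynamic_leaf {};                           // size known only at run time
template <class L, class R> struct concat {};     // interior node

enum class kind { all_literal, all_dynamic, mixed };

// Primary template left undefined: only expression nodes are classified.
template <class T> struct classify;

template <std::size_t N>
struct classify<literal_leaf<N> >
    : std::integral_constant<kind, kind::all_literal> {};

template <>
struct classify<dynamic_leaf>
    : std::integral_constant<kind, kind::all_dynamic> {};

// A concatenation keeps the kind of its operands if they agree,
// otherwise it is a mix (case C).
template <class L, class R>
struct classify<concat<L, R> >
    : std::integral_constant<kind,
          classify<L>::value == classify<R>::value ? classify<L>::value
                                                   : kind::mixed> {};

template <class E>
constexpr kind kind_of(E const&) { return classify<E>::value; }

static_assert(classify<concat<literal_leaf<9>, literal_leaf<5> > >::value ==
                  kind::all_literal, "case A");
static_assert(classify<concat<dynamic_leaf, dynamic_leaf> >::value ==
                  kind::all_dynamic, "case B");

// Usage: a literal followed by a variable-length string is case C.
inline void check_mixed() {
    concat<literal_leaf<9>, dynamic_leaf> e;
    assert(kind_of(e) == kind::mixed);
}
```

The assignment or copy constructor of the handle could then dispatch
on this value to pick a storage strategy; joining adjacent literals
inside a mixed expression would need a further tree transformation
that is not shown.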

Pointers to ongoing work would be most appreciated -- I'm currently
too preoccupied to chase this particular rabbit down the hole (I'm
chasing a different rabbit in a different hole) but maybe this is an
interesting enough problem for the template metaprogramming gurus to
look into?

Thanks in advance and I look forward to any thoughts/pointers.

-- 
Dean Michael Berris
deanberris.com
