From: Pavol Droba (droba_at_[hidden])
Date: 2005-05-15 16:02:22


Hi Boosters,

There was a discussion about char[] support in the Boost.Range
library. The issue seems important and I'd like to express my
ideas about a possible solution.

First, let's summarize the problems and goals.
The problems:
  char[], and possibly any other type that can be used as a c-string
  (this includes wchar_t, but also int, long, etc. when used as a
  Unicode code point), might represent two different things:
  1.) c-string literal
  2.) arbitrary c-array

  The two views differ in how the length is calculated. The results
  are incompatible and, what's worse, mixing them up can easily lead
  to access violations.

  An example:
  char str[] = "Hello";
  // typeof(str) is char[6], str=={'H','e','l','l','o',0}

  In the c-string view, str has 5 letters and ends at the 'o',
  so the range covers 'H' through 'o'.
  In the c-array view, str is 6 elements long and ends with '\0',
  so the range covers 'H' through '\0'.
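
  The two lengths in the example above can be checked with a small
  sketch. The helper names (c_string_length, c_array_length,
  check_views) are made up for illustration and are not part of
  Boost.Range:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// c-string view: count the characters before the first '\0'.
// Only valid when the array really contains a terminator.
template <std::size_t N>
std::size_t c_string_length(const char (&s)[N]) {
    return std::strlen(s);
}

// c-array view: every element counts, including a trailing '\0'.
template <std::size_t N>
std::size_t c_array_length(const char (&)[N]) {
    return N;
}

// Hypothetical check of both views on the example string.
inline void check_views() {
    char str[] = "Hello";               // type is char[6]
    assert(c_string_length(str) == 5);  // 'H' through 'o'
    assert(c_array_length(str) == 6);   // 'H' through '\0'
}
```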

  From the user's perspective, both views are equally important.
  However, depending on the usage scenario, one might be preferable
  to the other. The important thing to keep in mind is that the
  choice is strictly relative to the context. For example, for string
  algorithms the c-string view is the obvious default, while for a
  data processing library the c-array view is the better choice.

  The current implementation is not ideal. First of all, it treats
  char[] and wchar_t[] differently from arrays of other types,
  which causes some confusion. Secondly, it is not possible to use
  char[] as an ordinary array.

The goals:
  From the problem analysis above, the following goals can be derived:

  1. We need to support both views equally.
  2. A user must always be able to explicitly specify which
     view is required.
  3. It should be possible for a library writer to select the default
     view for the library.
     However, point (2) must hold, so the user must be able
     to override this default.
  4. Support must be present in the Boost.Range library.
     It is not feasible to ask library writers to provide
     specific workarounds/hacks. It would simply break the idea
     of the Boost.Range library as a unified interface to range-like
     data structures.

The solution:

  I propose to have two free-standing functions
  as_string() and as_array() (naming is not important now).

  Both should have the same generic signature:

  template<typename RangeT>
  boost::sub_range<RangeT> as_string(RangeT& aRange);
  template<typename RangeT>
  boost::sub_range<RangeT> as_array(RangeT& aRange);
  
  By default, the functions simply copy the input range to the target.
  However, for types like char[], the results will differ:
  as_string() will create a sub_range delimiting the string
  literal (using std::char_traits<char>::length, for instance), while
  as_array() will use the compile-time bounds.
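
  A minimal sketch of how the two manipulators could behave for
  char[] follows. To stay self-contained it does not use Boost:
  simple_range is a hypothetical stand-in for boost::sub_range, and
  only arrays and already-converted ranges are handled. The
  pass-through overloads model the rule that applying a manipulator
  a second time changes nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>  // std::char_traits

// Hypothetical stand-in for boost::sub_range: a pair of pointers.
template <typename T>
struct simple_range {
    T* first;
    T* last;
    T* begin() const { return first; }
    T* end() const { return last; }
    std::size_t size() const { return static_cast<std::size_t>(last - first); }
};

// c-array view: use the compile-time bound N.
template <typename T, std::size_t N>
simple_range<T> as_array(T (&a)[N]) {
    return simple_range<T>{a, a + N};
}

// c-string view: stop at the first terminator.
template <std::size_t N>
simple_range<char> as_string(char (&a)[N]) {
    return simple_range<char>{a, a + std::char_traits<char>::length(a)};
}

// Already-converted ranges pass through unchanged.
template <typename T>
simple_range<T> as_array(simple_range<T> r) { return r; }

template <typename T>
simple_range<T> as_string(simple_range<T> r) { return r; }

// Hypothetical check of the behaviour described in the text.
inline void check_manipulators() {
    char str[] = "Hello";
    assert(as_string(str).size() == 5);            // c-string view
    assert(as_array(str).size() == 6);             // c-array view
    assert(as_string(as_array(str)).size() == 6);  // second call has no effect
}
```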

  In addition, we might consider opening this interface to
  user-defined types, though I'm not sure how it would be used.
  
  Please note that once either of these manipulators has been applied
  to a range, any subsequent application will have no effect.

Let's see how this facility can be used:

  A library writer can set the default by writing an algorithm like
  this:

  template<typename RangeT>
  ... AnAlgorithm(const RangeT& aRange)
  {
     // aRange is const here, so the sub_range is over const RangeT
     boost::sub_range<const RangeT> StrRange=as_string(aRange);

     // Do something with StrRange
  }

  If a user calls AnAlgorithm directly:
  char str[]="hello";
  AnAlgorithm(str);

  str will be converted to a range delimiting a string literal.
  However, the user can also use as_array():

  char str[]={'h', 'e', 'l', 'l', 'o'};
  AnAlgorithm(as_array(str));

  This time no conversion will take place, since as_array() already
  returns a sub_range.

  Note that for AnAlgorithm it does not matter which default is
  used in the Range library.
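
  The usage pattern above can be tried out with the following
  sketch. It is an illustration under assumptions, not the proposed
  implementation: char_view is a hypothetical stand-in for
  boost::sub_range, and AnAlgorithm simply returns the number of
  characters it would process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical stand-in for a sub_range over a char array.
typedef std::pair<const char*, const char*> char_view;

// c-array view: compile-time bounds.
template <std::size_t N>
char_view as_array(const char (&a)[N]) { return char_view(a, a + N); }

// c-string view: stop at the first '\0'.
template <std::size_t N>
char_view as_string(const char (&a)[N]) { return char_view(a, a + std::strlen(a)); }

// Applying as_string() to an already-converted view changes nothing.
inline char_view as_string(const char_view& v) { return v; }

// The library writer picks the c-string view as the default.
template <typename RangeT>
std::size_t AnAlgorithm(const RangeT& aRange) {
    char_view r = as_string(aRange);
    return static_cast<std::size_t>(r.second - r.first);
}

// Hypothetical check of the default and of the user override.
inline void check_algorithm() {
    char lit[] = "hello";
    char arr[] = {'h', 'e', 'l', 'l', 'o'};     // no terminator
    assert(AnAlgorithm(lit) == 5);              // default: c-string view
    assert(AnAlgorithm(as_array(arr)) == 5);    // override: whole array
    assert(AnAlgorithm(as_array(lit)) == 6);    // override keeps the '\0'
}
```

  Passing arr without as_array() would be an error in this sketch,
  because the c-string view would scan past the end of the array
  looking for a terminator.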

Open questions:

- I have intentionally not included a proposal for the default view
  that the Range library should provide.
  The goal of this solution is to provide a mechanism that does not
  depend on it.
  I'd like to leave that for discussion. Right now it seems that
  most of the people who joined the discussion prefer the c-array view.
  I would prefer the c-string view, but I'm probably biased by the fact
  that I'm the author of the StringAlgo library.

- There is room for possible extensions to the basic proposal.
  For instance, as_string() might take a second parameter that
  identifies the terminator.

- String literal length can be calculated in two ways: either by
  using strlen() (or the like), or by using the compile-time size (N)
  decreased by 1 (N-1).

  The latter approach is faster and allows literals like this:
  char str[]="hello\0bye";
  But it differs from char* handling and therefore might
  be confusing.
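
  The difference between the two calculations shows up on exactly
  this kind of literal. A small sketch, with hypothetical helper
  names (length_by_scan, length_by_bound, check_lengths):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Run-time scan: stops at the first '\0', like char* handling.
template <std::size_t N>
std::size_t length_by_scan(const char (&s)[N]) {
    return std::strlen(s);
}

// Compile-time bound: N - 1, assuming exactly one trailing terminator.
template <std::size_t N>
std::size_t length_by_bound(const char (&)[N]) {
    return N - 1;
}

// Hypothetical check on a literal with an embedded '\0'.
inline void check_lengths() {
    char str[] = "hello\0bye";          // type is char[10]
    assert(length_by_scan(str) == 5);   // sees only "hello"
    assert(length_by_bound(str) == 9);  // sees "hello\0bye"
}
```

  Note that the N-1 rule silently drops the last character of an
  array that has no terminator, such as {'h','e','l','l','o'}.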

  
Best Regards,
Pavol

    

