Boost logo

Boost :

Subject: [boost] RangeEx review
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-03-01 08:26:03


My RangeEx review follows. I'm afraid that I have not been following
much of the ongoing discussion of this library on the list so apologies
if I simply repeat what other people are saying.

Algorithms
----------

I have always found the verbosity of things like
std::sort(container.begin(),container.end()) to be an embarrassing
misfeature of the std library so it's good to be able to now write the
much more sensible sort(container). Just a couple of comments:

- It would be good if the docs were explicit about this being a thin
wrapper around your existing std::sort, and identifying those
algorithms that are not so thin.

- Do we really want this in the top-level boost:: namespace? There
have been discussions about sorting libraries for Boost who might also
want boost::sort. Would it not be better to put these algorithms in a
boost::range namespace?

Then we move on to the algorithms where you can select what is returned
using a template parameter. This is an interesting idea, but it's a
little verbose and I can't think of any precedent for it in the std
library or elsewhere in Boost. Being unfamiliar increases the learning
curve for a feature. I was wondering if it would be possible to have
something more like:

   boost::unique(rng).found instead of boost::unique<boost::return_found>(rng)

Or maybe not; just a thought.

You then have this example for sorting and removing duplicates:

   boost::erase( vec, boost::unique<boost::return_found_end>(
boost::sort(vec) ) );

Err... yuk. Why can't I write:

   unique(vec);

Or maybe

   unique(sort(vec));

or

   sort(vec);
   unique(vec);

My point is that std::unique only has its "move duplicates to the end"
behaviour because the conventional interface with a pair of iterators
doesn't let you change the size of the container. Now that we have a
unique() that takes a container we don't need that restriction.

I recently tried to write a starts_with algorithm:

template <typename CONTAINER>
bool starts_with(const CONTAINER& a, const CONTAINER& b) {
   // does a start with b ?
   return std::equal(b.begin(), b.end(), a.begin());
}

The problem with this is that it will crash if a is shorter than b,
because std::equal takes only three not four iterators. The fix is to
add an additional test:

   return a.size() >= b.size() && std::equal(b.begin(), b.end(), a.begin());

Unfortunately, for containers where size() is O(N) this makes the
algorithm O(N+M), which is especially problematic if the common case if
for a difference to be found early in the sequences. So I was curious
to see if range::equal addresses this problem: it has the potential to
do so since it has access to all four ends of the containers.
According to the docs the complexity is "Linear. Exactly distance(rng)
comparisons." which doesn't look hopeful, but what "rng" is it
referring to? Looking at the source, however, it seems that it does
have an optimal implementation. I think this needs a documentation fix.

This also makes me wonder if these would be useful algorithms:

bool longer(rng1,rng2);
bool equal_length(rng1,rng2);
bool size_greater_than(rng,size_t);
bool size_equal_to(rng,size_t);

Adaptors
--------

The author seems to be very enthusiastic about chaining functions using
operator|. I'm not inherently opposed to this idea but I don't think
it should be the role of this library, RangeEx, to provide it. In my
opinion this feature should be a general-purpose one. Example:

   y = sqrt ( sin (x) );

In natural language we may choose to describe that as "the square root
of the sine of x" or "take the sine of x and then take the square root
of that". If the second of those seems more "natural" to you then you
may prefer to write:

   y = x | sin | sqrt;

or maybe even

   y = (sin | sqrt) (x);

I guess that someone who is better at metaprogramming than me could
write a general-purpose operator| that can be used in this way. Here's
a start:

template <typename T1, typename T2, typename T3>
struct chained_func {
   typedef function<T1(T2)> f1_t;
   typedef function<T2(T3)> f2_t;
   f1_t f1;
   f2_t f2;
   chained_func(f1_t f1_, f2_t f2_): f1(f1_), f2(f2_) {}
   T1 operator()(T3 arg) { return f1(f2(arg)); }
};

template <typename T1, typename T2, typename T3>
chained_func<T1,T2,T3> operator|(function<T1(T2)> f1, function<T2(T3)>
f2) {
   return chained_func<T1,T2,T3>(f1,f2);
}

template <typename T1, typename T2, typename T3>
T1 operator|(T3 val, chained_func<T1,T2,T3> cf) {
   return cf(val);
}

This feature could then be available for any purpose that the user
prefers, including range adaptors, and RangeEx users who don't want to
use it can ignore it entirely. (For all I know, maybe this already
exists somewhere.)

Even if you choose to keep operator| I suggest making the documentation
less "evangelical". Quote: "Why do I prefer the operator| syntax? The
answer is readbility:" and you go on to show one version that uses
operator| and one that doesn't. I would argue that the version using
operator| is _less_ readable simply because it uses operator|, and the
reader will wonder why you're doing a bitwise OR between a vector and
a, umm, a something else. Of course anyone who reads the documentation
will understand it, but one of the best features of this library is
that, as in the example of sort(container), the functions do "exactly
what they say on the tin", and a casual reader of code using it doesn't
need to have digested the docs first.

Quote:
   * boost::copy_if( rng, pred, out )
   * boost::count_if( rng, pred )
   These algorithms are not defined and the reason is that both of them
can be
   expressed by using the original algorithm togther with a Adaptor Generator:

The fact that something can be expressed using a combination of other
things is not a sufficient justification for leaving it out. For
example, sort(container) could be expressed as
sort(container.begin(),container.end()) yet you have still provided
it. Why should I have to write:

   int n_zeros = count( container | boost::adaptors::filtered(_1==0) );

instead of

   int n_zeros = count_if(container, _1==0);

?

Summary
-------

- What is your evaluation of the design?

I would prefer that operator| be factored out. Otherwise I am
satisfied with the design.

- What is your evaluation of the implementation?

I have not properly studied the implementation, but I have not seen any
problems in the code that I have looked at.

- What is your evaluation of the documentation?

Good.

- What is your evaluation of the potential usefulness of the library?

The basic feature of being able to pass containers to algorithms makes
it useful enough to justify inclusion.

- Did you try to use the library? With what compiler? Did you have any
problems?

I did some very basic experiments with gcc 4.1 on linux, with no problems.

- How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?

A couple of hours.

- Are you knowledgeable about the problem domain?

Not especially.

- Do you think the library should be accepted as a Boost library?

I would prefer to see the operator| stuff factored out, as described
above. But given the choice between including the library as-is or not
including it at all I would choose to include it as is, since the
algorithms alone are sufficiently useful to justify inclusion.

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk