Subject: Re: [boost] Boost.Algorithm design question
From: Stephan T. Lavavej (stl_at_[hidden])
Date: 2011-10-06 18:36:28
> So, what do people prefer (and why?):
> template<typename InputIterator, typename V>
> bool none_of_equal ( InputIterator first, InputIterator last, V const &val )
> template<typename InputIterator, >
> bool none_of_equal ( InputIterator first, InputIterator last, iterator_traits<InputIterator>::value_type const &val )
> #1 is better. It follows the STL's conventions (e.g. look at std::find()) and permits heterogeneous comparisons.
> Sorry, STL. The other STL's conventions are wrong. :-)
I guess it's a question of design philosophy - like how users of Loki's smart pointers say "yay policies" and users of boost/std::shared_ptr say "boo policies".
The design philosophy that I'm advocating is "preserve the user's types whenever possible". Here, a range of T and a value of V are involved. #1 says "all I need is t == v to return something convertible to bool, and I'll tell you whether this is 'false' for all elements in the range." It doesn't attempt to convert T to V or V to T. It doesn't attempt to say v == t, or !=, or anything else. It passes no judgment on what t == v "means". Perhaps t == v does attempt to convert T to V or V to T, but that's up to the types involved.
Here's another example. Consider vector<unsigned char> and val = 0x1234UL. For all i, v[i] == 0x1234UL is false (the usual arithmetic conversions widen unsigned char to unsigned long). #1 will report that none of the unsigned chars are equal to 0x1234UL. But #2 smashes unsigned long to unsigned char, which must compile (possibly emitting a warning), and then the unsigned chars will be compared to 0x34.
The STL isn't perfect, of course. As a slightly related example, std::multiplies<T> is deficient. It requires T to be specified up front (yet this is typically redundant, because the functor is being given to an algorithm that knows the types involved), and it has a homogeneous T op()(const T&, const T&). It would be better for the struct to be a non-template, with a templated op() (this may not have been possible in the 90s due to compiler limitations), with the signature C op()(A&&, B&&) where A&& and B&& are perfectly forwarded and C is determined through decltype. Doing this for all of the helper functors would be less verbose (no specifying T up front), potentially more efficient (no converting types before performing the operation - consider using plus on string and const char *), and more generic (now you can handle unit libraries where speed * time = distance).
> Heterogeneous comparisons are surprisingly popular, and the STL is
> moving to support them better. For example, std::lower_bound() didn't
> support them in C++03, and does in C++11.
> /me realizes he is responsible for this change
> But that's *totally* different.
> /me fumbles around for the explanation
> For the version that uses "<", it's an unintentional side-effect of
> making the specification uniform. The function signature already took
> an unconstrained object by reference, and my paper was not trying to
> change that. It just happened to make the /semantics/ of using a type
> different from the value type of the sequence meaningful in that case.
> It also make the specification *simpler* overall and easier to
Heh. The new specification is great (as a Standard Library maintainer, I *really* appreciate the clarity of thought in the C++11 wording). I'm just saying I think it's greater than you realize.
> The _purpose_ was to support heterogeneity for the version that uses a
> function. The description in terms of partitioning is a more principled
> and generic way to describe the actual requirements of the algorithm.
> There was never any intention to promote the use of heterogeneous "<",
> which again, doesn't make sense.
It does make sense. For example, consider string and const char * (and it's not like I prefer const char *, it's just extremely common and users use it all the time).
I could have a lexicographically sorted sequence of const char *. I have a std::string (say its contents are "meow") and I'd like to perform a lower_bound() with op<(). The sequence is partitioned with respect to e < string("meow"), so C++11 lower_bound() handles this just fine. (C++03 technically did not, and VC8/9 would complain in debug mode that the sequence was not sorted by op<(), which is why I've thought about lower_bound() a lot...)
The trick here is that op<() means something very different for (const char *, const char *) than it does for (const char *, string), (string, const char *), and (string, string).
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk