Boost logo

Boost :

Subject: Re: [boost] [GSoC] [Boost.Hana] Formal review request
From: Louis Dionne (ldionne.2_at_[hidden])
Date: 2014-08-01 13:38:42


pfultz2 <pfultz2 <at> yahoo.com> writes:

>
>
> > I could rip out all methods except their minimal complete definition from
> > every type class, and have those other methods be "algorithms that require
> > instances of the type class". However, you can now only have a single
> > minimal
> > complete definition for each type class, which is less flexible. Let's
> > pretend
> > that you can somehow have multiple MCDs. Now, the default definition of
> > the
> > algorithms that were ripped out can't depend on that MCD, because they are
> > out
> > of the type class. That's another loss of flexibility. All in all, I see
> > no
> > gains and only losses in ripping out methods from the type classes.
>
> Of course, you don't have to rip out every default method. It makes sense to
> keep some when you have multiple MCDs.
>
> > If you can explain to me what your proposal allows that can't be done
> > right now, I'm open for discussion.
>
> Of course, you can achieve that already in your library, but what I'm
> proposing is to separate the two purposes you have for implementing
> typeclasses. Instead, the user would implement a typeclass to fulfill type
> requirements, and would overload an algorithm to provide optimizations.

If the methods are separated from the type classes, they can't have a default
definition which depends on the MCD that's used. For example, let's pretend I
have two MCDs for Foldable; `fold_mcd` and `unpack_mcd`. `fold_mcd` requires
both `foldl` and `foldr`, and `unpack_mcd` requires `unpack`. Now, there are
multiple ways to implement `sum`. Two of them are:
    
    auto sum = [](auto xs) {
        return foldl(_+_, int_<0>, xs);
    };

    auto sum = [](auto xs) {
        return unpack(some_very_fast_sum_on_variadic_packs, xs);
    };

where `some_very_fast_sum_on_variadic_packs` would put the `xs` in an array
and then return a `int_<sum_of_the_array>`. However, in `fold_mcd`, `unpack`
is implemented inefficiently, and in `unpack_mcd`, `fold` is decent but it
is still likely not as efficient as a user-provided one. Which implementation
for `sum` do I choose? If I pick the first one, it's going to be suboptimal
with objects that used `unpack_mcd`. If I pick the second one, it's going to
be suboptimal with objects that used `fold_mcd`.

But it's not only an implementation issue. Conceptually, I see no problem
in bundling together related operations inside a type class and documenting
the fact that some groups of these operations can be used as a basis to
generate the others (which is done via MCDs). However, I _do_ see a problem
in introducing a third class of methods which would be associated to a type
class, yet outside of it, and for reasons I do not yet understand. If you
could tell me why it's _better_ (i.e. conceptually cleaner or advantageous
for the implementation), then I might be less closed to the idea.

> > I really don't think the same advice can be applied here, and I don't see
> > why it should. Type classes and regular structs are conceptually very
> > different entities, even if they are implemented in the same way.
>
> Perhaps, the advice doesn't directly apply. However, bloated typeclasses
> seems to be bad design. C++ concepts are never bloated like this nor are
> haskell typeclasses.

Honestly, I fail to see why that would be bad design __as long as__:

  1. The type class is not actually two type classes (i.e. the type class is
     really _one_ bundle of related operations).

  2. The MCD(s) are kept minimal.

If you have a type class with a lot of methods but these two points are
respected, IMO you just found a sweetspot where you could express a whole
lot of things with only a couple of base methods (the MCDs).

If you go look at the Foldable type class in Haskell, you'll see that there
are a bunch of related functions provided with the type class, yet they are
not included in it. My opinion is that they might just as well be included
in the type class, as you could then redefine them for improved performance.
I just searched online for a rationale or at least some insight about this
decision, but I did not find anything.

> [...]
>
> The definition of a monad from wikipedia is a structure that represents
> computations defined as sequences of steps, so calling it `Computation`
> makes sense. Also the definition and the name is simple and clear to a lot
> of non-FP programmers. In contrast, calling it a monad and then defining it
> as an applicative with the ability to flatten values that were lifted more
> than once, is pretty meaningless to most C++ programmers. Monads as
> computations is just one metaphor, but it is the metaphor most familiar to
> C++ programmers. Or do you think there is a better metaphor for C++
> programmers?

Right, but then Monads can also be seen as an abstract category theoretical
construction (a Functor with two natural transformations). I'm not saying
this is the right way for programmers to see it (probably isn't), but I do
think that dumbing it down to "it's a computation" is blindly taking refuge
in a metaphor. Lists are Monads, Maybes are Monads; I definitely don't think
of these as some kind of computation.

That being said, Monads are a common hurdle for people learning FP (I myself
am super new to this stuff, BTW) and I'm not sure changing the name would do
any good. To grok Monads, you have to bang your head a bit, not think of them
as one particular metaphor[1]. Also, FWIW, I think that defining Monads with
`join` (as in Hana) instead of `bind` (as in Haskell) makes them easier to
understand, but that's just me.

> Here are some other thoughts after looking at it more:
>
> - A lot of functions can be implemented with just `Iterable`, such as fmap,
> concat, cons, filter, init, take, take_while, take_until, etc. Or am I
> missing something?

Yup, you're missing `cons` and `nil`. But you're right that `List` can be
refactored, and I plan to do it. For example, `filter` can be implemented if
you give me `nil` and a `Monad`, which makes a `MonadZero` (a Monad with a
neutral element):
    
    // pseudo code
    auto filter = [](auto pred, auto xs) {
        auto go = [=](auto x) { return pred(x) ? lift(x) : nil; };
        return join(fmap(go, xs));
    };

You get `fmap` from Functor, `lift` from Applicative and `nil` from MonadZero.
Then you can filter Maybes, with `nil` being `nothing`!

> - The parameters to the functions are backwards to what most C++ programmers
> are used to, which will be the source of unendless confusion. I assume
> they are in this order because of its usefulness for curring inside
> applicatives.
> Perhaps instead, you could have another function adapter that rotates the
> first parameter to the last parameter, and then still keep the C++
> convention of putting the sequence first.

The reason this was done at the beginning is that I respected the order of
the arguments of Haskell functions with the same name. I had no reason to
change it, so I left it that way. I assume the Haskell order is optimized
to make currying easier. That being said, I now see it as a major PITA
because it turns out that I don't curry much and I have to write stuff like:

    // ewww
    all([](auto x) {
        some predicate maybe on many lines
    }, the_sequence);

So I plan on reversing the order of most function arguments in the next days:

    // ahhh! much better!
    all(the_sequence, [](auto x) {
        some predicate maybe on many lines
    });

> - It's important to note that the `decltype_` function will only work for
> constexpr-friendly types.

I don't get it? Can you please expand?

> - There seems to be a lot of copying by value, where it should use perfect
> forwarding. Has this been tested with non-copyable types and expensive to
> copy types as well?

Nope, that's in my TODO list. I plan on writing runtime benchmarks, but for
now I have put all runtime performance considerations on the side. For the
time being, if you have expensive-to-copy types, you might want to use
`std::ref` if that gives you the semantics you're looking for, and otherwise
you're out of luck.

Regards,
Louis

[1]: http://byorgey.wordpress.com/2009/01/12/abstraction-intuition-and-the-monad-tutorial-
fallacy/


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk