Here's an example of what might happen if a composed operation doesn't maintain work guards properly:
https://wandbox.org/permlink/aqsGDNJWTmFd7PdC

Without the work_guard the coroutine never completes. If you add the work_guard, everything works correctly.
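
In case the link rots, here's a rough sketch of the same failure mode. It is not the code behind the link and it is not ASIO: `toy_context`, `toy_async_op` and `completes` are names I made up for illustration, and the "external event" queue is just a stand-in for the OS. The only point is that a run loop that is never told about pending work has no reason to keep waiting.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for an event loop with work counting. NOT the ASIO API.
class toy_context {
public:
    void on_work_started() { ++outstanding_work_; }
    void on_work_finished() {
        assert(outstanding_work_ > 0);
        --outstanding_work_;
    }

    // Queue a handler that is ready to run.
    void post(std::function<void()> handler) {
        ready_.push_back(std::move(handler));
    }

    // Stand-in for "the OS": events that fire only while run() is waiting.
    void add_external_event(std::function<void()> event) {
        external_.push_back(std::move(event));
    }

    // Runs until nothing is ready and no outstanding work has been reported.
    std::size_t run() {
        std::size_t executed = 0;
        for (;;) {
            if (!ready_.empty()) {
                auto handler = std::move(ready_.front());
                ready_.pop_front();
                handler();
                ++executed;
            } else if (outstanding_work_ == 0) {
                return executed;  // nobody asked us to keep waiting
            } else if (!external_.empty()) {
                // Simulates blocking until the next external event fires.
                auto event = std::move(external_.front());
                external_.pop_front();
                event();
            } else {
                return executed;  // a real loop would block forever here
            }
        }
    }

private:
    std::size_t outstanding_work_ = 0;
    std::deque<std::function<void()>> ready_;
    std::deque<std::function<void()>> external_;
};

// Hypothetical primitive operation that completes via an external event.
// With hold_work == true it reports itself as outstanding work.
void toy_async_op(toy_context& ctx, bool hold_work,
                  std::function<void()> handler) {
    if (hold_work) ctx.on_work_started();
    ctx.add_external_event([&ctx, hold_work, handler = std::move(handler)] {
        ctx.post([&ctx, hold_work, handler] {
            handler();
            if (hold_work) ctx.on_work_finished();
        });
    });
}

// Returns true if the completion handler ran before run() returned.
bool completes(bool hold_work) {
    toy_context ctx;
    bool done = false;
    toy_async_op(ctx, hold_work, [&done] { done = true; });
    ctx.run();
    return done;
}
```

In this toy model, `completes(true)` delivers the handler, while `completes(false)` returns from `run()` before the event ever fires, which is the "never completes" symptom in miniature.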

In general, the Executors in the TS and ASIO don't do anything fancy with the information about pending work, because they have wide contracts. In theory, the TS allows Executors to have much narrower contracts.
In principle, the last call to `on_work_finished()` is allowed to delete all shared state related to an operation, thus hijacking the work counting mechanism to replace the need for `shared_ptr`.

Note that I'm describing the behavior of the current reference implementation of the TS (ASIO), and I'm currently trying to figure out why the behavior differs. Analyzing standardese is hard :).

On Mon, Dec 3, 2018 at 7:06 PM Cristian Morales Vega <cristian@samknows.com> wrote:
On Sat, 24 Nov 2018 at 23:46, Damian Jarek <damian.jarek93@gmail.com> wrote:
>
> > Don't really know why if they are equal the executor_work_guard is not
> > needed (I have some suspicions, but they break if multiple threads use
> > the same io_context). But I see that in the case of echo_op they can
> > potentially be different and so the executor_work_guard is there,
> > fine.
> The work guard is not necessary in such a case because the operation at the bottom maintains a work guard for the handler's executor (which also happens to be the same as the IO object's one). Note that nobody takes advantage of this because, in general, it's not possible to determine this at compile time, and the cost of having 2 instantiations of the template outweighs any gains from not maintaining the work count for 1 redundant work item.
>
> > I guess that's my question. Why only "primitive" async operation need
> > the executor_work_guard for the CompletionHandler's executor?
> Composed operations are allowed to maintain additional work guards, but it's not necessary. The work counting mechanism indicates to the executor that "there is an operation pending that you can't see, trust me it will complete sooner or later". The operation at the bottom is responsible for suspending the composed operation and calling into the "OS" (or an abstraction layer on top of it), therefore it's the one that has knowledge about pending work.
>
> > I guess the example is fine no matter if the proposal is accepted or not since it uses `decltype(std::declval<AsyncStream&>().get_executor())`, right?
> Correct.

I actually think I understood everything you said, and I do agree with
all of it. But I still have the bad feeling that I don't completely
understand why work guards work the way they do.

I guess my main issue is that I see _one_ single io_context.run()
which needs to know it should not return, but there are _two_
Executors, both with the need for work guards.

When trying to find an example I end up seeing that any obvious
CompletionHandler Executor's on_work_started() simply ends up
delegating the call to the IO object's one (for example, strand:
https://github.com/boostorg/asio/blob/5ac54042c99d4f1595d4041b00b9b28752eda16e/include/boost/asio/strand.hpp#L159)
or does nothing (use_future,
https://github.com/boostorg/asio/blob/5ac54042c99d4f1595d4041b00b9b28752eda16e/include/boost/asio/impl/use_future.hpp#L218).
So I'm struggling to see why there would ever, in practice, be the
need for two work guards for one single asynchronous operation, even
if the CompletionHandler Executor is different to the IO object's one.
I guess potentially the CompletionHandler Executor could do
"something" with that information, but... what?

You said "the operation at the bottom maintains a work guard for the
handler's executor". But isn't Networking TS 13.2.7.10 saying the
operation at the bottom maintains a work guard for both the handler's
executor and the IO object's one? If so, echo_op doesn't need any
work_guard at all, does it?

You said "Composed operations are allowed to maintain additional work
guards, but it's not necessary.". I would agree with this, and in the
specific case of echo_op it is not necessary, is it? If it's not
necessary, why does the echo_op example use one? If it's going to use
one, shouldn't it use two to be coherent with 13.2.7.10? The comments
in lines 91-94 of the example seem to reference Networking TS to say
that only the IO object's one is necessary (and that it *is*
necessary).