> I don't really know why the executor_work_guard is not needed when
> they are equal (I have some suspicions, but they break down if multiple
> threads use the same io_context). But I see that in the case of echo_op
> they can potentially be different, and so the executor_work_guard is
> there; fine.

The work guard is not necessary in that case because the operation at the bottom already maintains a work guard for the handler's executor (which happens to be the same as the I/O object's executor). Note that nobody takes advantage of this: in general it is not possible to determine at compile time whether the two executors are equal, and the cost of having two instantiations of the template outweighs any gain from not maintaining the work count for one redundant work item.

> I guess that's my question. Why do only "primitive" async operations
> need the executor_work_guard for the CompletionHandler's executor?

Composed operations are allowed to maintain additional work guards, but it's not necessary. The work counting mechanism indicates to the executor that "there is an operation pending that you can't see, trust me it will complete sooner or later". The operation at the bottom is responsible for suspending the composed operation and calling into the "OS" (or an abstraction layer on top of it), therefore it's the one that has knowledge about pending work.

> I guess the example is fine no matter if the proposal is accepted or not since it uses `decltype(std::declval<AsyncStream&>().get_executor())`, right?

Correct.