This might be slightly off-topic, I'm not sure, but I think it's related:
Does it seem useful, to people using signals2 or similar libraries, to consider another kind of signal type
that would take an instance of an Executor [1] concept on construction (the system executor, i.e. a thread pool,
by default) and would then use that executor to dispatch calls, with one task pushed per listener?
That is, when the signal is called, it pushes as many tasks into the executor as there are listeners, then returns.
Dispatching is then "in order" if the executor is a strand, and unordered otherwise.
This signal type would make no guarantee about which thread calls the listening objects.
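
To make the idea a bit more concrete, here is a rough sketch of what I have in mind. It is not based on the actual API of [1] or of signals2: `Executor` here is just a hypothetical stand-in (any callable that accepts a task), and `executor_signal`, `manual_queue` and `demo` are names made up for illustration. Note that this sketch still takes a mutex briefly to copy the listener list, but it does not hold it while listeners run.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an executor: anything that accepts a task.
using Task = std::function<void()>;
using Executor = std::function<void(Task)>;

// Sketch of a signal that posts one task per listener to an executor.
template <class... Args>
class executor_signal {
public:
    explicit executor_signal(Executor exec) : exec_(std::move(exec)) {
        assert(exec_ && "an executor is required");
    }

    void connect(std::function<void(Args...)> slot) {
        std::lock_guard<std::mutex> lock(mutex_);
        slots_.push_back(std::move(slot));
    }

    // Pushes as many tasks as there are listeners, then returns.
    // Whether the call has already run by then is up to the executor.
    void operator()(Args... args) const {
        std::vector<std::function<void(Args...)>> snapshot;
        {
            // Lock only long enough to copy the listener list.
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = slots_;
        }
        for (auto& slot : snapshot) {
            // Each task owns a copy of the listener and of the arguments.
            exec_([slot, args...] { slot(args...); });
        }
    }

private:
    Executor exec_;
    mutable std::mutex mutex_;
    std::vector<std::function<void(Args...)>> slots_;
};

// Toy single-threaded executor: queues tasks and runs them later in FIFO
// order, which is roughly what a strand would give you ordering-wise.
struct manual_queue {
    std::vector<Task> tasks;

    void run_all() {
        std::vector<Task> pending;
        pending.swap(tasks);
        for (auto& task : pending) task();
    }
};

// Usage example: emitting only enqueues; listeners run when the queue runs.
int demo() {
    manual_queue queue;
    executor_signal<int> sig(
        [&queue](Task task) { queue.tasks.push_back(std::move(task)); });

    int sum = 0;
    sig.connect([&sum](int v) { sum += v; });
    sig.connect([&sum](int v) { sum += 10 * v; });

    sig(2);                 // two tasks queued, nothing executed yet
    const int before = sum; // still 0 at this point
    queue.run_all();        // listeners run now, in connection order

    return before == 0 ? sum : -1;
}
```

With a real thread pool in place of `manual_queue`, the two listeners could run concurrently and in any order, so they would need their own synchronization (the shared `sum` above is only safe because the toy executor is single-threaded).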

It seems to me that in some cases I have encountered (highly concurrent task-based systems with a lot of message dispatching
between "actor"-like objects),
this kind of model might scale better, performance-wise, than the current mutex-based dispatching of signals2.
I haven't tried to compare or measure performance, though, so I might be totally wrong.

[1] https://github.com/chriskohlhoff/executors