> What you are asking for is more or less possible, but what do you plan on doing with this information?
Thanks for asking! After thinking more about the metric, it does not seem helpful anymore.
There were two scenarios I wanted to improve:
1. Estimate service capacity. For example, if io_context load goes above 80%, we should add new nodes to avoid latency spikes. But if we measure io_context load once per second, then 80% can mean the io_context was busy for 800 ms and idle for 200 ms. During those 800 ms there can be large latency spikes (up to 800 ms): the io_context is overloaded, yet the metric does not show it.
2. Investigate user-facing latency issues. Knowing that the io_context was overloaded would be very helpful, but the metric may not show that.
Scenario #2 is partially solved by the metric you suggested earlier (except for very short operations that start and end between two measurements).
Scenario #1 - for now I have no idea how to address it.
Regards,
Dmitry