To asio users,
 
  I am confused by the results of some recent experiments I ran with asio.

  Firstly, am I correct that, when the OS I/O buffer is ready to be read, io_service will simply pick an available thread from its thread pool to invoke the socket's completion handler? For example, suppose I have a single io_service with 1 thread, and 100 UDP sockets that have each called async_receive_from. Hypothetically, if they all become ready to read from the OS kernel buffer at the same time, would io_service invoke each socket's completion handler one at a time, because I only allocated 1 thread to it?
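
  For reference, here is a simplified sketch of that first setup (the receiver class, handler body, and port numbers are just placeholders, not my real code):

#include <boost/asio.hpp>
#include <array>
#include <memory>
#include <vector>

using boost::asio::ip::udp;

// One receiver per UDP socket; re-arms async_receive_from after each datagram.
struct receiver
{
    receiver(boost::asio::io_service& io, unsigned short port)
        : socket_(io, udp::endpoint(udp::v4(), port))
    {
        start_receive();
    }

    void start_receive()
    {
        socket_.async_receive_from(
            boost::asio::buffer(buffer_), sender_,
            [this](const boost::system::error_code& ec, std::size_t /*bytes*/)
            {
                if (!ec)
                    start_receive();   // queue the next read
            });
    }

    udp::socket socket_;
    udp::endpoint sender_;
    std::array<char, 1500> buffer_;
};

int main()
{
    boost::asio::io_service io;
    std::vector<std::unique_ptr<receiver>> receivers;
    for (unsigned short p = 0; p < 100; ++p)                 // 100 sockets
        receivers.push_back(std::make_unique<receiver>(io, 30000 + p));
    io.run();   // the single thread that invokes every completion handler
}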

  Secondly, is it possible to assign the workload to different CPU cores through asio? I did an experiment with 2 io_services in one program, each associated with 50 UDP sockets receiving the same data at the same rate. My theory was that 2 cores of my 8-core i7 CPU would be utilized at a similar rate, but the result was that only the first core was fully utilized. Could it be because the I/O operations are controlled by the OS only?
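
  A simplified sketch of that second setup, reusing the placeholder receiver class from the first sketch (one run() thread per io_service is how I intended to spread the work across cores):

#include <boost/asio.hpp>
#include <memory>
#include <thread>
#include <vector>

int main()
{
    boost::asio::io_service io1, io2;
    std::vector<std::unique_ptr<receiver>> receivers;
    for (unsigned short p = 0; p < 50; ++p)      // first 50 sockets on io1
        receivers.push_back(std::make_unique<receiver>(io1, 30000 + p));
    for (unsigned short p = 50; p < 100; ++p)    // remaining 50 sockets on io2
        receivers.push_back(std::make_unique<receiver>(io2, 30000 + p));

    std::thread t1([&io1] { io1.run(); });   // hoped this would keep one core busy
    std::thread t2([&io2] { io2.run(); });   // ... and this another core
    t1.join();
    t2.join();
}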
 
  Thirdly, in what situations should I use multiple threads with an io_service? I increased the thread count to 50 for both io_services in my last experiment setup and performed the same test. My theory was that with more threads allocated to an io_service, it would be easier for io_service to find a free thread to trigger the sockets' completion handlers concurrently; the sockets would therefore not have to wait as long before making the next async receive call, which should minimize the risk of buffer overflow. The result was the complete opposite: CPU usage obviously increased, but performance decreased dramatically because a lot of packet loss was detected. Could it be because of context switching? I thought that no matter how many threads are allocated to an io_service, the same number of context switches would still be required (switching the next completion handler's context onto a core to execute after the previous one completes). Am I right?
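
  And a sketch of the third setup, again reusing the placeholder receiver class; the only change from the second sketch is that each io_service is serviced by 50 threads instead of 1 (run_pool is just a placeholder helper):

#include <boost/asio.hpp>
#include <thread>
#include <vector>

// Placeholder helper: start n threads that all call run() on the same io_service.
void run_pool(boost::asio::io_service& io, std::size_t n,
              std::vector<std::thread>& pool)
{
    for (std::size_t i = 0; i < n; ++i)
        pool.emplace_back([&io] { io.run(); });
}

int main()
{
    boost::asio::io_service io1, io2;
    // ... create the same 50 + 50 receivers as in the second sketch ...

    std::vector<std::thread> pool;
    run_pool(io1, 50, pool);   // 50 threads may now invoke io1's handlers concurrently
    run_pool(io2, 50, pool);   // and another 50 for io2
    for (auto& t : pool)
        t.join();
}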
 
B.R
Bryan