Because the review of Boost.Fiber has been announced, I believe it could help in your scenario.

With Boost.Fiber you could create as many threads as there are cores on your system (let's say 2 threads for 2 cores).
On each thread you create 500 fibers, which run concurrently on that thread; fibers are lightweight userland threads.
One benefit is that Boost.Fiber integrates with Boost.Asio's async-result framework, so you don't need to scatter your code across callbacks: you can merge start_send() and handle_send() into one function, start_receive() and handle_receive() into another, and so on. For example, inside a fiber:

    boost::system::error_code ec;

    // the fiber is suspended until the message has been read
    boost::asio::async_read(
        socket_,
        boost::asio::buffer( channel),
        yield[ec]);
    if ( ec) throw std::runtime_error("some error");

    // the fiber is suspended until the message has been written
    boost::asio::async_write(
        socket_,
        boost::asio::buffer( data_, max_length),
        yield[ec]);
    if ( ec) throw std::runtime_error("some error");


You can find more detailed information in Boost.Fiber's documentation: http://ok73.funpic.de/boost/libs/fiber/doc/html/fiber/asio.html
The library itself contains several examples demonstrating its usage together with Boost.Asio.