Try the following to see whether the (probably small) reduction in memory allocation/deallocation helps. I really doubt it will help much: you're seeing some major performance problems, and I suspect something other than the code sample you posted is causing them.
 
Change:
 
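IProtocolPacket* packet = iter->Next();
// Note: buffs(2) already holds two empty buffers, so after the push_backs
// the vector has four entries; buffs.reserve(2) was probably intended.
std::vector<asio::const_buffer> buffs(2);
buffs.push_back(asio::const_buffer(packet->GetHeader(), packet->GetHeaderSize()));
buffs.push_back(asio::const_buffer(packet->GetBody(), packet->GetBodySize()));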
 
To:
 
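#include <boost/array.hpp>  // for boost::array
IProtocolPacket* packet = iter->Next();
// Fixed-size array of exactly two buffers: no heap allocation per send.
boost::array<asio::const_buffer, 2> buffs = {
  asio::buffer(packet->GetHeader(), packet->GetHeaderSize()),
  asio::buffer(packet->GetBody(), packet->GetBodySize())
};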
 
(Warning: uncompiled and untested.)
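To make the buffer-count point concrete, here is a minimal, self-contained sketch that uses no Asio at all. ConstBuffer is a hypothetical stand-in for asio::const_buffer (just a pointer and a size), and original_pattern / array_pattern are illustrative names, not anything from your code. It shows that the buffs(2) + push_back version ends up with four entries, while the fixed-size array holds exactly the two buffers and needs no heap allocation. If you have C++11, std::array should work the same way as boost::array here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for asio::const_buffer: a non-owning pointer + size.
struct ConstBuffer {
    const void* data;
    std::size_t size;
};

// The original pattern: buffs(2) value-initializes TWO empty buffers,
// and push_back then appends two more, so the vector ends up with four.
std::vector<ConstBuffer> original_pattern(const char* header, std::size_t hsize,
                                          const char* body, std::size_t bsize) {
    std::vector<ConstBuffer> buffs(2);
    buffs.push_back(ConstBuffer{header, hsize});
    buffs.push_back(ConstBuffer{body, bsize});
    return buffs;
}

// Fixed-size alternative: exactly two entries, stored inline (no heap allocation).
std::array<ConstBuffer, 2> array_pattern(const char* header, std::size_t hsize,
                                         const char* body, std::size_t bsize) {
    return {{ConstBuffer{header, hsize}, ConstBuffer{body, bsize}}};
}
```

For example, original_pattern(header, 3, body, 7) returns a vector of size 4 whose first two entries are empty, while array_pattern with the same arguments returns exactly the header and body buffers.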
---
 
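Hmmm, I see something that could be suspicious: "packet" is a local variable with a short lifetime. The pointer variable itself going out of scope is harmless, but are the pointers obtained from packet (header and body) still valid, pointing to good memory, until the handle_send function is called? Asio does not own the data that the buffer objects wrap, so that memory has to stay alive until the send completes.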
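A sketch of the lifetime point, assuming the packet can be owned by a std::shared_ptr. Nothing here is real Asio: FakeIoService, its async_send/run members, Packet and start_send are hypothetical stand-ins that only mimic the one property that matters, namely that the send reads through the raw buffer pointers after the initiating function has already returned. Capturing the shared_ptr by value in the completion handler keeps the data alive until the handler has run. With real Asio the same idea applies to the handler you pass to async_write.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for asio::const_buffer: a non-owning pointer + size.
struct ConstBuffer {
    const char* data;
    std::size_t size;
};

// Hypothetical stand-in for an io_service. async_send() only queues the
// request and returns; run() later reads through the raw buffer pointers
// and then invokes the completion handler, as an asynchronous send would.
class FakeIoService {
public:
    void async_send(std::array<ConstBuffer, 2> buffers, std::function<void()> handler) {
        pending_.push_back(Op{buffers, std::move(handler)});
    }

    void run() {
        for (auto& op : pending_) {
            for (const auto& b : op.buffers) {
                wire_.append(b.data, b.size);  // reads the memory the buffers point to
            }
            op.handler();
        }
        pending_.clear();  // destroys the handlers and anything they captured
    }

    const std::string& wire() const { return wire_; }

private:
    struct Op {
        std::array<ConstBuffer, 2> buffers;
        std::function<void()> handler;
    };
    std::vector<Op> pending_;
    std::string wire_;
};

// Hypothetical packet that owns its header and body bytes.
struct Packet {
    std::string header;
    std::string body;
};

void start_send(FakeIoService& io, std::shared_ptr<Packet> packet,
                std::size_t& bytes_reported) {
    std::array<ConstBuffer, 2> buffs = {{
        ConstBuffer{packet->header.data(), packet->header.size()},
        ConstBuffer{packet->body.data(), packet->body.size()}}};
    // The handler holds its own copy of the shared_ptr, so the Packet (and
    // the memory the buffers point at) outlives this function's return.
    io.async_send(buffs, [packet, &bytes_reported]() {
        bytes_reported = packet->header.size() + packet->body.size();
    });
}
```

If the caller's own shared_ptr goes out of scope before run(), the handler's copy still keeps the header and body valid; the packet is freed only when the queued handler is destroyed after it has run.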
 
Otherwise, you might want to post the code that runs your io_service, so we can check whether there's anything suspicious there.
 
Cliff