Boost Users :

Subject: Re: [Boost-users] asio stability and scalability
From: Eric Twietmeyer (zimbus26_at_[hidden])
Date: 2009-10-19 16:37:41


"Igor R" <boost.lists_at_[hidden]> wrote in message
news:cfe0a3cf0910191307t22644a7r48ead0315e3f82ba_at_mail.gmail.com...
>We use asio in a windows-based project, which involves a lot of networking,
>including tens or even hundreds of simultaneous data-streams - and it works
>well.
>However, as Marat said, with asynchronous design one should be very careful
>about the lifetimes of the involved objects, because any flaw can cause
>various kinds of memory/stack corruption. That's why asio-based designs
>usually make extensive use of shared_ptr and the shared_from_this idiom,
>instead of attempting to manage objects' lifetimes manually.
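
For readers unfamiliar with the idiom mentioned in the quote, here is a minimal sketch of it. It uses only the standard library, not asio itself: FakeIoQueue, Session and run_demo are hypothetical names invented for illustration, and the queue merely stands in for whatever runs completion handlers later. The point it tries to show is that the handler captures a shared_ptr to the object, so the object and its buffer stay alive until the handler has run, even if the caller has already dropped its own reference.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an I/O service (NOT asio): it only queues
// handlers and runs them later, which is enough to show the lifetime issue.
class FakeIoQueue {
public:
    void post(std::function<void()> handler) {
        pending_.push_back(std::move(handler));
    }

    // Runs all queued handlers and returns how many were executed.
    std::size_t run() {
        std::size_t count = 0;
        while (!pending_.empty()) {
            std::function<void()> handler = std::move(pending_.front());
            pending_.pop_front();
            handler();
            ++count;
        }
        return count;
    }

private:
    std::deque<std::function<void()>> pending_;
};

// Hypothetical session that owns the buffer a pending operation writes to.
class Session : public std::enable_shared_from_this<Session> {
public:
    Session(FakeIoQueue& queue, std::vector<std::string>& log)
        : queue_(queue), log_(log), buffer_(1024) {}

    ~Session() { log_.push_back("destroyed"); }

    void start_receive() {
        // The handler holds a shared_ptr copy, so *this and buffer_ cannot
        // be destroyed before the handler has finished running.
        auto self = shared_from_this();
        queue_.post([self] { self->on_receive(5); });
    }

private:
    void on_receive(std::size_t bytes) {
        buffer_[0] = 'x';  // safe: the buffer is still alive here
        log_.push_back("received " + std::to_string(bytes));
    }

    FakeIoQueue& queue_;
    std::vector<std::string>& log_;
    std::vector<char> buffer_;
};

// Drops the caller's reference before the handler runs and records the
// order of events.
std::vector<std::string> run_demo() {
    FakeIoQueue queue;
    std::vector<std::string> log;
    {
        auto session = std::make_shared<Session>(queue, log);
        session->start_receive();
    }  // caller's reference is gone; the queued handler still owns one
    log.push_back("caller released");
    queue.run();
    return log;
}
```

Calling run_demo() from a small main() and printing the returned log shows the order of events. If the handler captured a raw `this` instead of `self`, the session would be destroyed at the closing brace and on_receive would write into freed memory, which is the kind of flaw the quote warns about.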

I understand completely. I have written quite a bit of IOCP-based
asynchronous networking code in the past, but directly, not using asio.
However, all of my previous experience was with TCP. This current project
requires UDP as well as TCP, and it seems to be the UDP communication in
particular that triggers the problem (we can sort of remove the UDP portion,
and then this problem does not occur).

We have of course gone over this with a fine-tooth comb, and I did use
the shared_ptr and shared_from_this idiom everywhere. Very perversely the
stack corruption is occurring "out of line", as it were. Using VMware's
Replay Debugging we have witnessed in each case that the memory corruption
occurs while the user code is doing something completely innocuous (for
instance reading memory), or during an internal call to sysenter in the guts
of one of the WSARecv/Send functions (where this function is known to have
valid inputs and completes successfully with valid output). Speaking with a
Microsoft Support Tech, the only explanation we can come up with is that an
IO interrupt is occurring between user space instructions, and this IO
interrupt processing is somehow trashing user space memory. The MS guy
indicates that this can happen (buggy drivers, etc.), but he has never seen
it manifest in this way. But as we have seen this occur on several
different Windows OS boxes (with various OS versions and various hardware),
it seems unlikely to be a driver issue. Also this same behavior has been
witnessed when asio is compiled with BOOST_ASIO_DISABLE_IOCP.

After several weeks of intense debugging effort we are basically at a
loss. I'm very reluctant to move from asio to ACE or some other framework:
I like the way asio is structured, and it would take quite a bit of time to
reimplement this.

Thanks for the input.

-Eric

