Hello all,


I'm developing a proxy application.
I have to use an external application protocol over TCP.

I'm running the applications on Windows and Linux (Ubuntu).

Actors:
- client app (sends data)
- server app (receives data and sends the same data back)
- MIM (man in the middle) - my proxy

The server receives all the data and tries to send it back, but intermittently the proxy gets stuck waiting for the complete data.

When the issue appears:
- The issue I'm struggling with appears only on Linux.
- The issue shows up in various cases, but always when transferring bulk data (larger than my buffer size, which is essentially 20 kB).

When the issue doesn't appear:
- The issue doesn't occur without my proxy (direct client-server connection). Both applications allocate a buffer for the whole bulk data,
so they send and receive it in a single step.
- It also doesn't occur during step-by-step debugging :)


One problem I see is that the protocol of that system is designed in such a way that it:
- uses non-blocking sockets
- does not check every time whether the socket is ready (only during the "handshake"), which causes EAGAIN errors.
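For reference, here is a minimal sketch in C of the pattern I would expect for sending on a non-blocking socket: on EAGAIN, wait until the socket is writable and retry, until every byte is out. The helper names set_nonblocking and send_all and the timeout parameter are mine (hypothetical); they are not part of the external protocol, and MSG_NOSIGNAL assumes Linux.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: send exactly len bytes on a non-blocking socket.
 * Returns 0 when everything was sent, -1 on error or when the socket
 * did not become writable within timeout_ms. */
static int send_all(int fd, const void *buf, size_t len, int timeout_ms)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        /* MSG_NOSIGNAL (Linux): report EPIPE instead of raising SIGPIPE. */
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n > 0) {                 /* partial or full write: advance */
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                /* interrupted: just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Kernel send buffer is full: wait until it drains. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int r = poll(&pfd, 1, timeout_ms);
            if (r > 0 || (r < 0 && errno == EINTR))
                continue;
            return -1;               /* timeout or poll error */
        }
        return -1;                   /* any other error */
    }
    return 0;
}
```

With this, a call such as send_all(fd, big_data, big_len, 1000) either delivers the whole buffer to the kernel or reports a failure, instead of returning after a short write.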

How I do networking:

CLIENT -> PROXY -> SERVER
---------------------
client_send(big_data) -> async_read(proxy_cli_sock ...) -> async_write(proxy_serv_sock ...) -> server_select() -> server_recv()
= * = (repeated until all of the bulk data has been transferred)
I haven't observed any issues here.
Client system call trace (strace):

fcntl64(3, F_SETFL, O_RDONLY) = 0
send(3, "\1\1\0\0\0315\f0"..., 800025, 0) = 800025

Server system call trace (strace):
(This is repeated many times, with some EAGAIN errors on the server)
select(5, [0 3 4], [], NULL, {0, 665119}) = 1 (in [4], left {0, 664454})
recv(4, "\1\1\0\0\0315\f\0", 8, 0) = 8
recv(4, "\0\0\60"..., 800017, 0) = 59992
recv(4, 0xb36d7a90, 740025, 0) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1257845153, 280644}, NULL) = 0
select(5, [0 3 4], [], NULL, {0, 663455}) = 1 (in [4], left {0, 662128})
recv(4, "@\0\0\5"..., 740025, 0) = 56384
..
..

(but eventually the whole message is received by the server)
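The receive pattern visible in the trace above (recv, EAGAIN, select, recv again) can be sketched in C roughly like this. This is only my reading of what the server does, not its actual code; recv_exact and its timeout parameter are hypothetical names, and MSG_DONTWAIT (Linux) is used so the sketch does not depend on how the socket was configured.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read exactly len bytes from a stream socket.
 * Returns 0 when the buffer is full, -1 on error, on timeout, or when
 * the peer closes the connection before len bytes have arrived. */
static int recv_exact(int fd, void *buf, size_t len, int timeout_ms)
{
    char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        /* MSG_DONTWAIT: this call never blocks, like a non-blocking socket. */
        ssize_t n = recv(fd, p, len, MSG_DONTWAIT);
        if (n > 0) {                 /* got a fragment: keep going */
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n == 0)
            return -1;               /* peer closed mid-message */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Nothing buffered yet: wait for more data to arrive. */
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            int r = poll(&pfd, 1, timeout_ms);
            if (r > 0 || (r < 0 && errno == EINTR))
                continue;
            return -1;               /* timeout or poll error */
        }
        return -1;                   /* any other error */
    }
    return 0;
}
```

In the trace this would correspond to one call for the 8-byte header followed by one call for the remaining 800017 bytes, with the EAGAIN results handled inside the loop.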


SERVER -> PROXY -> CLIENT
---------------------
server_select()-> serv_send(proxy_fd) -> async_read(proxy_serv_sock ...) -> async_write(proxy_cli_sock ...) -> client_select() -> client_recv(proxy_fd)

select(5, [0 3 4], [4], NULL, {0, 651671}) = 1 (out [4], left {0, 651666})
send(4, "\1\00"..., 800015, 0) = 65536
send(4, "\0\\0"..., 734478, 0) = 114688
send(4, "\0\\0"..., 619790, 0) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1257845153, 293482}, NULL) = 0
...
select(5, [0 3 4], [4], NULL, {0, 645928}) = 1 (out [4], left {0, 644732})
send(4, "\0\0\0"..., 46350, 0) = 46350

*** The above suggests that the transfer is complete, but some data is missing (unfortunately I haven't checked which packet).
This is where my proxy is waiting for a packet (as is the client).

gettimeofday({1257845153, 299914}, NULL) = 0
select(5, [0 3 4], [], NULL, {0, 644185}) = 0 (Timeout)
....
....
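To make clear what I expect the forwarding step of a proxy to do in either direction, here is a simplified sketch in plain C with poll. It is not my actual proxy code (that uses async_read/async_write); relay_chunk, wait_ready and RELAY_BUF_SIZE are hypothetical names, the 20 kB size only mirrors my buffer, and MSG_DONTWAIT/MSG_NOSIGNAL assume Linux. The point is that each chunk read from one side must be written completely to the other side before the next read.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

enum { RELAY_BUF_SIZE = 20 * 1024 };  /* assumed: same as my 20 kB buffer */

/* Wait until fd reports one of the events; 0 when ready, -1 otherwise. */
static int wait_ready(int fd, short events, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = events };
    for (;;) {
        int r = poll(&pfd, 1, timeout_ms);
        if (r > 0)
            return 0;
        if (r < 0 && errno == EINTR)
            continue;
        return -1;                    /* timeout or poll error */
    }
}

/* Forward one chunk (at most RELAY_BUF_SIZE bytes) from `from` to `to`.
 * Returns the number of bytes forwarded, 0 if `from` was closed by the
 * peer, or -1 on error or timeout. */
static ssize_t relay_chunk(int from, int to, int timeout_ms)
{
    char buf[RELAY_BUF_SIZE];
    ssize_t got;
    size_t off = 0;

    assert(from >= 0 && to >= 0);
    do {
        if (wait_ready(from, POLLIN, timeout_ms) != 0)
            return -1;
        got = recv(from, buf, sizeof buf, MSG_DONTWAIT);
    } while (got < 0 &&
             (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR));
    if (got <= 0)
        return got;                   /* 0: peer closed, -1: error */

    /* Write the whole chunk; a short write must not drop the rest. */
    while (off < (size_t)got) {
        ssize_t n = send(to, buf + off, (size_t)got - off,
                         MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            off += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            if (wait_ready(to, POLLOUT, timeout_ms) != 0)
                return -1;
            continue;
        }
        return -1;
    }
    return got;
}
```

Calling relay_chunk(server_fd, client_fd, 1000) in a loop until it returns 0 or -1 would forward the server's reply in chunks of at most 20 kB; server_fd and client_fd stand for the two proxy-side sockets.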

I've attached the full traces of the client and the server.

Please point me toward any kind of solution.