|
Boost Users : |
Subject: [Boost-users] boost multithread C++ program design for low-latency large-data exchange
From: Jack Bryan (dtustudy68_at_[hidden])
Date: 2011-09-25 00:17:34
Hi,
I am trying to solve a network flow problem by C++ multithreading.
Given a network (all nodes are connected by arcs, each arc is
connected to 2 and only 2 ending nodes, one is input node and another is
output node, each node can have multiple input arcs and output arcs),
each node needs to do some computing and then exchange
computing result data to its connected input and output nodes.
Multiple nodes can be grouped into one task, which is run by one thread. In this way, the
whole network computing workload can be partitioned into multiple tasks. All these tasks
are pushed into a boost thread pool such that all threads can run the tasks at the same
time.
But, if a node (in a thread task) needs to do data exchange with another node (in another
thread task), there is a synchronization problem. Data receiver needs to wait for data
available in the data buffer of the data sender.
My proram needs to partition the network such that each thread's task workload is assigned
as evenly as possible.
If all threads share the one-large data buffer structure, the program parallelism is not
good because the critical section is too large. Some threads have to wait the the
one-large data buffer structure unlocked even though the part of the data structure (which
is useful to them ) has been available for read or write.
For example, the one-large data buffer structure has the following buffer cells:
cell1 , cell2, cell3 , cell4.
When thread 1 is trying to write cell 1, it must lock the whole data buffer structure so
that thread 2 cannot read or write cell 2 and so on.
So, I want to break the one-large data buffer structure into multiple distinct data cells
according to the thread number so that each cell holds the data only needed by one thread
task.
For example, if we have 2 threads, we create 2 data cells that hold data needed by the 4
thread separately.
If we have 4 threads, we create 4 data cells that hold data needed by the 4 thread separately.
and so on.
My questions are:
(1) How to design the data cell ? You can see that its size is based on the number of threads.
(2) How to reduce synchronization overhead ? The critical section is small but the
overhead of geting and releasing mutex may be very high if the inter-node data exchange frequency is high.
(3) When a node's computing is done and data is written to its cell how to notify the data
receiver node such that the notification messgae is only received by the waiting thread
that run the receiver node computing task. All other unrelated nodes and threads are not
impacted.
The program is very time-sensitive and the latency of message exchange should be
controlled very toughly and reduced as much as possible.
Any help is really appreciated.
Thanks
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net