Hi,

I'm using Boost.Asio, not for networking, but simply to parallelize work items,
with a simple graph of "processing nodes" as detailed below. It's working fine,
but uses too much memory. I'd like insights on limiting memory use via "throttling",
or what I've also seen called "back-pressure".

At a high level, I process two files (A and B), each composed of several "entries",
extracting a subset or a transformation of those entries, which I then write into an output file (C).
(Those 3 files reach into the many GBs, thus the need for parallelism and for limiting memory use.)

Extracting entries from A and from B are independent operations, each implemented single-threaded,
producing independent work items (for subsetting or transforming each entry), A#1..A#n and B#1..B#m.
That's the "fan-out" part, with each work item (task) scheduled on any thread of the Asio pool, since they are independent.

Writing to C is also single-threaded, and needs to "fan-in" the work posted by the A#n and B#m functors.
I serialize that by posting to a C-specific strand (it can still run on any thread, as long as the writes are serialized).
Let's call all those tasks writing to C the C# tasks (n+m of them in total), which are posted to the strand by the A# and B# tasks.

My issue is that Boost.Asio seems to schedule an awful lot of A# and B# tasks before getting to the C# tasks,
which results in too many pending C# tasks accumulating in memory, and thus in using too much memory.

I don't see a way to force more of the "downstream" C# tasks to be scheduled before so many A# and B# tasks are processed,
so that pending C# tasks do not pile up in the work queue.

Could someone please recommend a way to have a more balanced flow of tasks in the graph?
Or alternative designs even, if what I do above is not ideal. Is Boost.Asio even suitable in this case?

Thanks, --DD