
Boost Users :

Subject: Re: [Boost-users] boost::interprocess, shared memory and multi-core
From: OvermindDL1 (overminddl1_at_[hidden])
Date: 2008-12-19 13:12:45


On Fri, Dec 19, 2008 at 10:47 AM, QPlace <quiteplace_at_[hidden]> wrote:
> OvermindDL1 <overminddl1 <at> gmail.com> writes:
>
>>
>> On Thu, Dec 18, 2008 at 4:55 PM, QPlace <quiteplace <at> mail.ru> wrote:
>> > Andreas Masur <amasur <at> gmx.de> writes:
>> >
>> >>
>> >>
>> >> On Dec 17, 2008, at 10:09 AM, QuitePlace wrote:
>> >>
>> >> > Another look at this question is: should one program the
>> >> > inter-process/inter-thread communication first and worry about
>> >> > multi-core later, or should something be planned at the
>> >> > development stage?
>> >>
>> >> As with other areas such as exception handling, I would advise
>> >> you to take multi-core technology into account at design time.
>> >> Programming efficiently for more than one core is certainly not as
>> >> easy as making cookies, and if you don't plan up front you are
>> >> likely to lose efficiency later on. This applies especially to your
>> >> example of interprocess/interthread communication.
>> >>
>> >> Ciao, Andreas
>> >>
>> >
>> > This is exactly what worries me. Should that planning be done "outside" of
>> > the Boost framework? For example, if I am using features provided by the
>> > boost::interprocess library, how am I supposed to take multi-core technology
>> > into account if, say, "shared memory" already locks me into a solution where
>> > I don't have much control over anything multi-core, and where "multi-core"
>> > is not even present as a concept?
>> >
>> > Your comments are very much appreciated
>>
>> Just some side comments on multi-CPU program design. If you want a
>> program to be truly multi-CPU, without slowing down as it uses more
>> CPUs, then you need to use the Actor style of programming, or one of
>> its kin. Basically, treat every Actor as holding its own state and
>> pretend there is no such thing as global state: no statics, no
>> globals, etc. It is not easy to do in C++, but it can be replicated
>> well enough. Your problem domain also needs to be easily separable so
>> it can be operated on in parts; if it cannot, then you have a bigger
>> problem than just the design, although almost all programs can be
>> split up to some extent. Read up on the Actor model; it will give you
>> plenty of ideas. Perhaps work with Erlang a bit to get a feel for the
>> Actor style. The knowledge you come away with is invaluable for
>> designing scalable multi-threaded apps.
>>
>
>
> Thank you for your comments; I will definitely follow your advice. But,
> coming back to the "shared memory" issue and its usage, what is your
> opinion on the following scenario: say there is a producer of data on one
> core and multiple consumers on other cores. "Shared memory" should have
> some sort of exclusive lock on it in order to support write/read ops,
> shouldn't it? If that is true, might there be a bottleneck when using
> "shared memory" for data pumping in a multi-core system like that? Maybe a
> network data exchange between the cores is better from a scalability
> standpoint?
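The Actor style recommended in the quoted reply above can be sketched in C++11 as follows. This is a minimal illustration under stated assumptions, not code from the thread: `CounterActor`, `Message` and `actor_demo` are hypothetical names, and the mailbox is guarded by a `std::mutex` and `std::condition_variable` purely for simplicity. The point is that the actor's state (`total_`) is touched only by the actor's own thread; everyone else communicates with it by posting messages.

```cpp
#include <condition_variable>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical message type; the names are illustrative, not from any library.
struct Message {
    enum class Kind { Add, Report, Stop };
    Kind kind;
    long amount;               // used by Kind::Add
    std::promise<long> reply;  // fulfilled by Kind::Report
};

// A minimal actor: its state (total_) is touched only by its own thread,
// and the outside world can reach it only by posting messages.
class CounterActor {
public:
    CounterActor() : worker_(&CounterActor::run, this) {}
    ~CounterActor() {
        post(Message{Message::Kind::Stop, 0, std::promise<long>{}});
        worker_.join();  // FIFO mailbox: everything posted earlier is handled first
    }
    CounterActor(const CounterActor&) = delete;
    CounterActor& operator=(const CounterActor&) = delete;

    void add(long n) { post(Message{Message::Kind::Add, n, std::promise<long>{}}); }

    // Asks the actor for its current total and waits for the reply.
    long report() {
        std::promise<long> p;
        std::future<long> f = p.get_future();
        post(Message{Message::Kind::Report, 0, std::move(p)});
        return f.get();
    }

private:
    void post(Message m) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            mailbox_.push(std::move(m));
        }
        cv_.notify_one();
    }

    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return !mailbox_.empty(); });
            Message m = std::move(mailbox_.front());
            mailbox_.pop();
            lock.unlock();  // handle the message without holding the mailbox lock
            switch (m.kind) {
            case Message::Kind::Add: total_ += m.amount; break;
            case Message::Kind::Report: m.reply.set_value(total_); break;
            case Message::Kind::Stop: return;
            }
        }
    }

    long total_ = 0;  // private actor state; never shared
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Message> mailbox_;
    std::thread worker_;  // declared last so the members above are constructed first
};

// Two threads send messages concurrently; neither touches total_ directly.
long actor_demo() {
    CounterActor counter;
    std::thread a([&counter] { for (int i = 0; i < 1000; ++i) counter.add(1); });
    std::thread b([&counter] { for (int i = 0; i < 1000; ++i) counter.add(2); });
    a.join();
    b.join();
    return counter.report();  // 1000 * 1 + 1000 * 2 = 3000
}
```

Replacing the mutex-guarded queue with a lock-free one is the kind of change the following reply discusses; the actor's interface would stay the same.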

I use shared memory in my Actor-style libraries, but they do not use
locks; instead they use atomic CAS (compare-and-swap) written in
assembly. I have to use assembly because atomics are not in the current
C++ standard; they are in the next C++ standard, though, and I will be
glad to drop the assembly then. Compared to using locks, a circular
linked list in shared memory that is managed with non-locking primitives
is very fast. You have to change the backend coding style a touch, but I
have noticed definite performance improvements, and since nothing
blocks, there should be no speed hit for any number of threads. The only
issue is contention: multiple threads writing to the same memory at the
*exact* same time (and I mean exact, down to near the nanosecond). In
that case the atomic CAS fails, causing a pipeline stall of about 12
cycles on AMD CPUs and about 40 cycles on Intel CPUs (although that
might have changed with the Core 2 Duos; I have not tried those yet).
Just reissue the operation after reading in and accounting for the new
data, and all is good.
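The library code itself is not shown in this message. As a hedged sketch only, assuming C++11 `std::atomic` in place of hand-written assembly, the CAS retry pattern described above looks like this for a simpler structure than a circular list: a singly linked list that many threads push onto and a single consumer drains whole. `Node`, `MpscList` and `concurrent_push_demo` are hypothetical names. `compare_exchange_weak` fails whenever `head_` was changed by another thread after it was read; on failure it loads the current head into `node->next`, so the loop simply retries with the new data.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical node type for illustration.
struct Node {
    int value;
    Node* next;
};

// Multi-producer, single-consumer list managed only with atomic operations.
class MpscList {
public:
    // Lock-free push: read head, link the new node to it, try to swing head.
    void push(int value) {
        Node* node = new Node{value, head_.load(std::memory_order_relaxed)};
        // On failure, compare_exchange_weak writes the current head into
        // node->next, so the retry already "accounts for the new data".
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Detach the whole list in one atomic step (avoids ABA on single pops).
    Node* take_all() { return head_.exchange(nullptr, std::memory_order_acquire); }

    static void free_list(Node* n) {
        while (n) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }

    ~MpscList() { free_list(take_all()); }

private:
    std::atomic<Node*> head_{nullptr};
};

// Several threads push 1..per_thread concurrently; returns the sum of all values.
long long concurrent_push_demo(int threads, int per_thread) {
    MpscList list;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&list, per_thread] {
            for (int i = 1; i <= per_thread; ++i) list.push(i);
        });
    }
    for (auto& w : workers) w.join();

    long long sum = 0;
    Node* all = list.take_all();
    for (Node* n = all; n; n = n->next) sum += n->value;
    MpscList::free_list(all);
    return sum;
}
```

Draining the whole list with a single `exchange` sidesteps the ABA problem that popping individual nodes with CAS would raise. The sketch uses threads in one process; placing such a structure in interprocess shared memory would additionally require atomics that are lock-free (and therefore usable across processes) and offsets such as `boost::interprocess::offset_ptr` instead of raw pointers, since each process may map the segment at a different address.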

If you want to scale arbitrarily, there are two main things I have
learned. First, make sure that what you are coding *can* be split up in
the first place. Second, do not use locks except in a few very rare
circumstances; use atomic CAS instead (which is what kernels use, for
example). There is plenty of info on atomic compare-and-swap on the
'net, and a library or two if you do not fancy writing the assembly
yourself (or just wait until C++09 comes out with supporting compilers).
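The first rule, making sure the work can be split up, can be illustrated with a minimal C++11 sketch (the function name `parallel_sum` is hypothetical, not from the thread): each thread sums its own slice of the input into its own result slot, so no two threads write the same data, and neither a lock nor a CAS is needed until the partial results are combined after `join()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Splits `data` into `parts` contiguous slices; each thread owns one slice
// and one slot in `partial`, so the threads share nothing they write.
long long parallel_sum(const std::vector<int>& data, unsigned parts) {
    if (parts == 0) parts = 1;
    std::vector<long long> partial(parts, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + parts - 1) / parts;

    for (unsigned p = 0; p < parts; ++p) {
        workers.emplace_back([&data, &partial, chunk, p] {
            const std::size_t begin = std::min(data.size(), p * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            // Local accumulation, then a single write to this thread's slot.
            partial[p] = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(begin),
                data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        });
    }
    for (auto& w : workers) w.join();  // combine only after all parts finish

    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Each thread writes its slot exactly once, so the adjacent `partial` entries cause little false sharing; a loop that updated a shared counter on every iteration would instead contend on one memory location, which is the situation the CAS discussion above is about.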

