Boost logo

Boost :

Subject: Re: [boost] [graph][heap][coroutine] interruptable dijkstra shortest path function
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2012-10-05 11:47:33


On Fri, Oct 5, 2012 at 4:32 PM, Oliver Kowalke <oliver.kowalke_at_[hidden]>wrote:

> Am 05.10.2012 14:07, schrieb Alex Hagen-Zanker:
>
> Have you seen
>>> http://cpp-next.com/archive/**2012/09/portable-stackless-**
>>> coroutines-in-one-header/<http://cpp-next.com/archive/2012/09/portable-stackless-coroutines-in-one-header/>
>>> ?
>>>
>>> This approach might be sufficiently lightweight that it would allow us
>>> to maintain one version of the algorithm with some macros.
>>>
>>>
>> It is very nice indeed. I did not try it with the shortest path
>> algorithm, but instead a simple sum algorithm where yield is used to get
>> the running total.
>>
>> Using the stackless boost asio coroutine from the link you provided is
>> just as simple as using the proposed boost coroutine.
>>
>> Summing 10,000,000 times the number 2 while yielding intermediate sums
>> gave me the following results:
>>
>> sum plain : 20000000 time: 0.01
>> sum fragmented : 20000000 time: 0.232
>> sum boost coro : 20000000 time: 0.448
>> sum asio coro : 20000000 time: 0.034
>>
>> Without yielding intermediate results, I got the following:
>>
>> sum plain : 20000000 time: 0.008
>> sum fragmented : 20000000 time: 0.029
>> sum boost coro : 20000000 time: 0.027
>> sum asio coro : 20000000 time: 0.012
>>
>> I'd say it it definately worth investigating to use the boost asio method
>> as a means of creating algorithm objects. From this example it appears that:
>>
>> 1. It is an efficient way of interrupting and resuming algorithms.
>> 2. It leaves the algorithms coded in a readable state, although it seems
>> that all local variables should become member variables.
>> 3. If you decide not to interrupt the algorithm, the additional cost due
>> to being part of an algorithm object is less than for the alternatives.
>>
>> I am not working on this anymore, but please see the source for above
>> trial in the attachment.
>>
>
> did you compile the code with optimization switched on?
> preserving fpu state has a significant influence on performance too (at
> least for boost.coroutine I measure a significant influence).
>
> stack-less coroutines (preprocessor trick with switch statement) will
> always be faster than stack-full coroutines (boost.coroutine) because
> stack-full coroutines have to preserver
> the stack-frame and some registers and because it exchanges the stack and
> instruction pointer it kicks the CPU predictions (shadow stack pointer ...).
>

That's true; the cost could be reduced by providing a buffered coroutine
variant (i.e. stack switching is actually done only after n yields); the
apropriateness of buffering depends on the use case.

But the biggest reason for the performance difference is that the stackless
coroutine code can be completely inlined inside the caller as the compiler
can "see thru" the yield and resume. That's the reason while, even without
yielding, the stackless code is faster. Having inlineable stack-full
coroutines would require heroic compiler help.

-- gpd


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk