
Subject: Re: [boost] [context review] Several Questions
From: Oliver Kowalke (oliver.kowalke_at_[hidden])
Date: 2011-03-21 05:36:13


-------- Original Message --------
> Date: Mon, 21 Mar 2011 01:26:31 -0700 (PDT)
> From: Artyom <artyomtnk_at_[hidden]>
> To: boost_at_[hidden]
> Subject: [boost] [context review] Several Questions

> Hello,
>
> Before I start working with this library for review I need
> a clarification of several statements to understand
> actual usefulness of this library:
>
> >
> > A context switch between threads costs usually thousands of CPU
> > cycles on x86 compared to a user-level context switch with few
> > hundreds of cycles.
> >
>
> Performance
> -----------
>
> One of the first things I did when I read this statement
> and looked at the measurements was to write my own
> benchmark of context switching.
>
> I've run it on:
>
> - Intel i5 2.5GHz CPU 2 cores 4 threads.
> - Linux x86_64, Ubuntu 10.10
>
> And compared context switch performance using
> sched_yield and jump_to. I used default boost::context<>
> settings and a default build (against Boost-1.46.0),
> and used a dummy switch to measure the cost of the
> operations beyond the switching itself.
>
> There were two threads, each giving the other a time quantum,
> and I measured how much time a context switch takes (of course
> including some warm-up).
>
> sched_yield - 377us
> Boost.Context - 214us
> Dummy - 10us
>
> All tests were done on a single CPU using taskset 0x1 ./test params
>
> So finally I can see that Boost.Context does not behave **much**
> better than an OS context switch?
>
> I understand that I probably used ucontext by default, but it
> is the default and that is how it is going to be shipped by
> most distributions, as it is probably the safest option.
>
> I need to see rationale, limitations and so on,
> in very explicit way.

Hmm - did you see the performance-measuring test that comes with the lib (libs/context/performance)?
It counts the CPU cycles taken by a switch. The tool was used to compare ucontext_t (which makes some system calls to preserve the signal mask) and fcontext_t (which doesn't preserve the signal mask and is implemented in assembler).

I modified it and measured the following (AMD Athlon(tm) 64 X2 Dual Core Processor 4400+):

cmd: performance -s -i 100

sched_yield(): 2108 cycles
ucontext_t: 1795 cycles
fcontext_t: 156 cycles

Maybe you can send me your test code?

> Usefulness of N:M model (or even 1:M model)
> -------------------------------------------
>
> Long time ago OS developers used N:M threading
> model where several kernel threads were mapped
> to several user-space thread:
>
> - Solaris < 9 or 10
> - Linux <= 2.4,
> - FreeBSD < 8
>
> All these OSs today moved to 1:1 model as most efficient
> one, so I can't buy it that N:M or even 1:N model
> in this case would give performance advantage.
>
> As you know, even POSIX 2008 deprecated ucontext at all.
>
> So I would like to see some very good and based rationale
> with description of specific use cases and examples.

Indeed, the OS developers moved from N:M to 1:1 (as Solaris did), and while the N:M paradigm may no longer be useful inside the OS, it can be for user-land apps.

But I use it, for instance, in boost.task to let a task create and wait on sub-tasks using boost.context. This allows work-stealing by other worker-threads.

I'm working on boost.strand, which does things like StratifiedJS ( http://onilabs.com/stratifiedjs - thanks to Fernando Pelliccioni).

In general, it is usable in all cases where code wants to jump to another execution path but later come back with all of its local data preserved, in order to continue its work.

so long,
Oliver


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk