Re: "it is still possible that all tcs in this suite can be run in parallel. May be some sort of work stealing algorithm": that reminds me of something Intel has already built in its Threading Building Blocks library. There is an open source version of it, it scales well with increasing numbers of cores, and it is designed and implemented in such a way that the programmer does not need to worry about the tedious details of creating threads. I have examined it only for number crunching, but I don't see a reason it couldn't be used in designing and implementing test suites. It does, though, need a slight shift in mindset relative to what you'd normally do in multithreaded programs or conventional numeric algorithms (something you can see only by actually playing with it to do trivially simple things fast, like matrix multiplication). For example, instead of putting a lock around output, you'd design the program to use a class that collects the results of the tests and then outputs them in a sensible order to some stream (standard out or a file stream). It might be worth a look (by programmers smarter than me) to see whether it can help in the context of this discussion, and to what extent.
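
To make the "collect the results, then output" idea a bit more concrete, here is a minimal sketch. It is not TBB code: it uses plain `std::thread` (one thread per test, which is naive) so that it compiles with only the standard library. The names `TestResult`, `TestCase`, `ResultCollector` and `run_suite` are made up for illustration. With TBB, I'd expect the thread-spawning loop to be replaced by something like `tbb::parallel_for` over the test indices, with the scheduler doing the work stealing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical types for illustration; none of these names come from TBB.
struct TestResult {
    std::string name;
    bool passed = false;
};

using TestCase = std::function<bool()>;

// Holds one pre-allocated slot per test. Each worker writes only to its own
// slot, so no lock is needed around the results or around the output.
class ResultCollector {
public:
    explicit ResultCollector(std::size_t count) : results_(count) {}

    void record(std::size_t index, std::string name, bool passed) {
        assert(index < results_.size());
        results_[index] = TestResult{std::move(name), passed};
    }

    // Call only after all workers have finished; prints in declaration order.
    void report(std::ostream& out) const {
        for (const auto& r : results_)
            out << r.name << ": " << (r.passed ? "PASS" : "FAIL") << '\n';
    }

    std::size_t failures() const {
        std::size_t n = 0;
        for (const auto& r : results_)
            if (!r.passed) ++n;
        return n;
    }

    const std::vector<TestResult>& results() const { return results_; }

private:
    std::vector<TestResult> results_;
};

// Runs every test on its own thread (a stand-in for a real task scheduler),
// waits for all of them, and returns the collected results.
ResultCollector run_suite(
    const std::vector<std::pair<std::string, TestCase>>& tests) {
    ResultCollector collector(tests.size());
    std::vector<std::thread> workers;
    workers.reserve(tests.size());
    for (std::size_t i = 0; i < tests.size(); ++i) {
        workers.emplace_back([&collector, &tests, i] {
            collector.record(i, tests[i].first, tests[i].second());
        });
    }
    for (auto& w : workers) w.join();
    return collector;
}
```

Calling `run_suite(tests).report(std::cout)` from `main` would then print the results in the order the tests were declared, regardless of the order in which they finished. The point is only that output happens once, after the parallel part, rather than being serialized by a lock while tests are running.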