
Boost Users :

From: Ovanes Markarian (om_boost_at_[hidden])
Date: 2006-08-22 07:26:50


I would like to briefly share my experience with boost::regex and multi-threading. I used the defer
library from the Boost vault to create scanning jobs and execute them later.

Configuration: Boost 1.33, Visual Studio 8.0 Express Edition, single-CPU Intel P4 3.2 GHz with 1
GB RAM (to be honest, I don't know whether this CPU has Hyper-Threading).

I ran a scanner on very large C++ source files, and the initial single-threaded scan showed that a
complete scan took around 30 minutes.

After using the defer library for the jobs and creating up to 200 threads, I was able to
complete the scan within 6 minutes. 200 threads might sound very high, since the thread quantum on
Windows is by default around 100 ms, but my practical measurements showed that this value was
the most effective on this system. That is a performance improvement by a factor of 5.

My suggestion would be to try defer or a similar library and scan from several genuinely
concurrent threads.

With Kind Regards,

Ovanes Markarian

On Tue, August 22, 2006 12:51, John Maddock wrote:
> Roy Emek wrote:
>> I'm using the boost-regex library, and I'm running
>> into multi-threading scalability issues. When tested
>> on a machine with several CPUs, my performance tests
>> show improvement of about 60-70% when moving from a
>> single thread to 2 threads.
>
> 60-70% doesn't sound too bad to me?
>
>> Question 1: Is anyone aware of this problem? Is there
>> a known solution?
>>
>> From some testing we've done, it seems that the
>> cause of the scalability issue is a problematic
>> implementation of std::allocator. This is a known issue
>> on some operating systems (e.g., Solaris 8). Some of
>> the boost::regex classes (e.g., match_results) accept
>> a user-defined allocator. However, as far as I can
>> understand, there's no way to completely override the
>> use of std::allocator.
>>
>> Question 2: Is there a way to completely prevent
>> Boost.Regex from using std::allocator? If not, can such a
>> capability be added?
>
> Actually it used to be there but I was asked to remove it :-(
>
> The question you need to ask is which part of the regex lib is causing
> problems. There are three main areas that use memory allocation:
>
> 1) Regex construction: uses std::basic_string and other STL classes along
> with shared_ptr etc.; there's no way to change the allocator here.
> However, the question you need to ask is "do my threads need to construct
> regexes at all?" Boost.Regex is designed so that multiple threads can share
> the same immutable regex object.
>
> 2) Regex matching: the library needs a stack for the FSM to work on. On
> Unix-like systems this memory is cached and obtained from the routines near
> the end of regex.cpp; you could replace these with thread-specific
> allocators if this is where the overhead is. Or... you could define
> BOOST_REGEX_RECURSIVE and rebuild everything: regex will then use a
> program-stack-recursive algorithm that saves on memory allocations, but runs
> the risk of stack overflow. If you are on a platform that can protect you
> from stack overruns, then this can speed things up a little for
> single-threaded apps, and perhaps rather more for multithreaded ones.
>
> 3) The final match_results allocation: match_results allocates memory only
> once per match/search operation, right at the end when the object is
> written to with the results. You can avoid even that if you reuse
> match_results objects so that they already contain a large enough
> buffer when they're actually used.
>
> HTH, John.
>
> _______________________________________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/boost-users
>


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net