
Boost Users :

Subject: Re: [Boost-users] mapped_region locks on multithread
From: Brian Budge (brian.budge_at_[hidden])
Date: 2012-06-09 11:51:46


On Sat, Jun 9, 2012 at 1:34 AM, Mikhail Eremin <meremin_at_[hidden]> wrote:
> Hello,
> SETTING:
> - There is an application, written using Boost Template library, meant for
> QUICK processing of bulk text files (cca 50-100Gb each).
> - There is a huge, quick and expensive piece of hardware with HUGE amount of
> RAM and multiple CPU.
> - There is [theoretically] any possible UNIX-like OS, even Microsoft
> Windows(R) is considered.
> - Boost Thread Pool extension is used; previously memory mapped files
> through memory_segment have been used, now got rid of the entire
> Boost::interprocess.
> - There are NO explicit data items in the application's algorithm to be
> shared by threads, each has its own piece of input file, thus - there is NO
> explicit concurrency.
> PROBLEM:
> - Ensure fast processing without locks and threads sleeping.
> Currently the threads sleep on some internal mutex. We thought it's been
> boost::interprocess (specifically - mmap, wrapped by a mutex), but it
> apparently isn't so.
>
> SPECIFIC QUESTION:
> - How could we get rid of Boost locks?
>
> Mike

Okay, so you have enough memory to map an entire file into memory at
once? Are the files read-only? Where is the Boost lock that you want
to get rid of? Perhaps the threadpool library uses a lock on a queue
somewhere?

You could certainly write this without a threadpool. I'd imagine that
the cost of launching threads will be insignificant compared to
running the algorithm on these regions:

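#include <cstddef>
#include <stdint.h>
#include <utility>
#include <vector>
#include <boost/atomic.hpp>
#include <boost/thread.hpp>

// Each pair describes one region of the file (e.g. begin and end offsets).
std::vector< std::pair<uint64_t, uint64_t> > regions;
// Index of the next unclaimed region; shared by all worker threads.
boost::atomic<size_t> nextRegion(0);

struct RegionThread {
    /*mmap info variables*/
    RegionThread(/*mmap info*/) /* : mmapinfo member(mmap info) */ {}
    void operator() () {
        while(true) {
            // Atomically claim the next region index; no mutex involved.
            size_t next = nextRegion.fetch_add(1);
            if(next >= regions.size()) { break; }
            std::pair<uint64_t, uint64_t> const &region = regions[next];
            /*perform algorithm on region of mmapped file...*/
        }
    }
};

void operateOnFile(/*some mmap info*/) {
    regions.clear();
    // set up regions for this file
    nextRegion = 0;
    // hardware_concurrency() returns 0 if the value cannot be determined.
    unsigned numThreads = boost::thread::hardware_concurrency();
    if(numThreads == 0) { numThreads = 1; }
    boost::thread_group tg;
    for(unsigned i = 0; i < numThreads; ++i) {
        tg.create_thread(RegionThread(/*mmap info*/));
    }
    tg.join_all();
}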
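
The "set up regions for this file" step is left open above. As a rough
sketch only, and assuming each pair is a half-open (begin, end) byte range,
a hypothetical helper makeRegions (not part of Boost or of the code above)
could cut the file into fixed-size chunks. It does not try to align chunk
boundaries to line ends, which text processing would probably need:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: split [0, fileSize) into half-open [begin, end)
// byte ranges of at most chunkSize bytes each.
std::vector< std::pair<std::uint64_t, std::uint64_t> >
makeRegions(std::uint64_t fileSize, std::uint64_t chunkSize) {
    assert(chunkSize > 0);
    std::vector< std::pair<std::uint64_t, std::uint64_t> > result;
    std::uint64_t begin = 0;
    while (begin < fileSize) {
        // The last chunk may be shorter than chunkSize.
        std::uint64_t len = std::min(chunkSize, fileSize - begin);
        result.push_back(std::make_pair(begin, begin + len));
        begin += len;
    }
    return result;
}
```

With something like this, operateOnFile could assign
regions = makeRegions(fileSize, chunkSize) before resetting nextRegion.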

If you can statically schedule the work into sets of regions that each
thread will work on, this is even easier, and can be done without even
an atomic variable.
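
A minimal sketch of that static variant, under a few assumptions: std::thread
stands in for boost::thread_group so the example has no Boost dependency,
regions are (begin, end) pairs, and processRegion is a hypothetical
placeholder for the real algorithm (here it only returns the region length).
Thread t takes regions t, t + numThreads, t + 2*numThreads, and so on, so
nothing is shared except read-only input:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

typedef std::pair<std::uint64_t, std::uint64_t> Region;

// Placeholder for the real per-region work: returns the region length.
std::uint64_t processRegion(const Region &r) { return r.second - r.first; }

// Statically assign regions to threads by stride. No atomic counter or
// lock is needed: each thread's indices are fixed up front and each
// thread writes only to its own slot of perThread.
std::vector<std::uint64_t>
runStatic(const std::vector<Region> &regions, std::size_t numThreads) {
    assert(numThreads > 0);
    std::vector<std::uint64_t> perThread(numThreads, 0);
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < numThreads; ++t) {
        threads.emplace_back([&regions, &perThread, t, numThreads] {
            std::uint64_t local = 0;
            for (std::size_t i = t; i < regions.size(); i += numThreads) {
                local += processRegion(regions[i]);
            }
            perThread[t] = local;  // single write to this thread's slot
        });
    }
    for (std::size_t t = 0; t < threads.size(); ++t) {
        threads[t].join();
    }
    return perThread;
}
```

The trade-off is load balance: if some regions take much longer than others,
the atomic-counter version above keeps all threads busy, while this one can
leave threads idle near the end.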

  Brian

