
Subject: Re: [Boost-users] [Regex | Xpressive] Efficiently "grepping" large files
From: Chris Cleeland (chris.cleeland_at_[hidden])
Date: 2011-08-17 09:30:40


On Wed, Aug 17, 2011 at 7:53 AM, Thomas Luzat <thomas_at_[hidden]> wrote:
> On 2011-08-17 14:43, Chris Cleeland wrote:
>>
>> Have you considered mmap'ing the file and allowing all your activity
>> to occur on the mmap'd file?  That way the VM subsystem would worry
>> about paging things in or out as necessary, and there wouldn't be any
>> issues with contention across multiple threads. Of course, if you
>> don't have mmap on your system...
>
> I have considered mmapping or reading through the whole file, but
> benchmarking so far has shown that I am mostly I/O-limited. By synchronously
> working on blocks in parallel I avoid disk seeks as much as possible.

Ah, very well. I've also had similar situations in which mmap provided
no performance benefit over reading the file through well-tuned
buffered I/O, since the accesses were mostly sequential.
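
For anyone following along, here is a minimal sketch of the mmap
approach under discussion, so the two strategies can be benchmarked
side by side. It assumes a POSIX system and uses std::regex as a
stand-in for Boost.Regex; the helper name count_matches_mmap is made
up for illustration and is not part of any library.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: count regex matches in a file by mapping the whole
// file read-only and letting the VM subsystem page it in on demand.
std::size_t count_matches_mmap(const std::string& path, const std::regex& re) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);

    struct stat st;
    if (::fstat(fd, &st) != 0) {
        ::close(fd);
        throw std::runtime_error("fstat failed: " + path);
    }
    if (st.st_size == 0) {  // mmap rejects a zero-length mapping
        ::close(fd);
        return 0;
    }

    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);

    // Advisory hint that the scan is front to back; the kernel may ignore it.
    ::posix_madvise(p, len, POSIX_MADV_SEQUENTIAL);

    const char* begin = static_cast<const char*>(p);
    const char* end = begin + len;
    std::size_t count = 0;
    for (std::cregex_iterator it(begin, end, re), last; it != last; ++it) {
        ++count;
    }

    ::munmap(p, len);
    return count;
}
```

Whether this beats a tuned read() loop depends on the access pattern
and the hardware; for a single sequential pass the two often end up
close, which matches the experience described above.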

> I might offer such an implementation for cases where seeks are not that
> expensive (such as for SSDs or slower CPUs).

If main memory is large enough, a seek is likely to land on a page
that is already resident rather than one that must be paged in from
disk.

> Another problem is that mmap
> alone is not a complete solution in itself on 32-bit systems given that
> files may very well be larger than a few GB, but this can be solved now,
> too.

That's a much more difficult issue to deal with. Not impossible, but
definitely more difficult, and it would nudge me towards the solution
you originally inquired about.
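
To illustrate why it is harder, here is a rough sketch of one way to
handle files larger than the address space: map a bounded window at a
time and carry the search position across window boundaries. This is
only one possible approach, not necessarily what Thomas has in mind.
The helper name count_matches_windowed and the window/overlap
parameters are made up for illustration, and std::regex again stands
in for Boost.Regex. The sketch assumes that no match is longer than
`overlap` bytes, and it does not preserve anchor or word-boundary
context (^, \b) at a restart position. On a 32-bit glibc build it
would also need large-file support (_FILE_OFFSET_BITS=64) so that
off_t is 64 bits wide.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: count non-overlapping matches while mapping only one
// window of the file at a time. Each mapping covers `window` bytes (rounded
// up to whole pages) plus `overlap` extra bytes, so that a match starting
// near the end of a window is still seen whole.
// Assumption: no match is longer than `overlap` bytes.
std::size_t count_matches_windowed(const std::string& path, const std::regex& re,
                                   std::size_t window, std::size_t overlap) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    struct stat st;
    if (::fstat(fd, &st) != 0) {
        ::close(fd);
        throw std::runtime_error("fstat failed: " + path);
    }

    const off_t size = st.st_size;
    const off_t page = static_cast<off_t>(::sysconf(_SC_PAGESIZE));
    // mmap offsets must be page-aligned, so advance in whole pages (at least one).
    const off_t pages = (static_cast<off_t>(window) + page - 1) / page;
    const off_t step = std::max<off_t>(pages, 1) * page;
    const off_t extra = static_cast<off_t>(overlap);

    std::size_t count = 0;
    off_t resume = 0;  // absolute file offset where the next search may begin
    for (off_t base = 0; base < size; base += step) {
        const off_t primary_end = std::min<off_t>(base + step, size);
        if (resume >= primary_end) continue;  // an earlier match already covers this window

        const std::size_t len =
            static_cast<std::size_t>(std::min<off_t>(step + extra, size - base));
        void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, base);
        if (p == MAP_FAILED) {
            ::close(fd);
            throw std::runtime_error("mmap failed: " + path);
        }

        const char* data = static_cast<const char*>(p);
        const char* map_end = data + len;
        const char* limit = data + (primary_end - base);  // matches must start before this
        const char* cur = data + (resume - base);
        std::cmatch m;
        while (cur < limit && std::regex_search(cur, map_end, m, re)) {
            const char* start = m[0].first;
            if (start >= limit) break;  // this match belongs to a later window
            ++count;
            // Continue after the match; move at least one byte on an empty match.
            cur = (m[0].second > start) ? m[0].second : start + 1;
        }
        resume = std::max<off_t>(primary_end, base + (cur - data));
        ::munmap(p, len);
    }
    ::close(fd);
    return count;
}
```

The bookkeeping around `resume` and the overlap is exactly the kind of
extra complexity that makes a plain block-reading design look
attractive by comparison.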

