Subject: Re: [boost] Designing a multi-threaded file parser
From: Aaron Boxer (boxerab_at_[hidden])
Date: 2016-04-23 09:02:04
On Fri, Apr 22, 2016 at 2:18 PM, Niall Douglas <s_sourceforge_at_[hidden]> wrote:
> On 22 Apr 2016 at 10:31, Aaron Boxer wrote:
> > My impression is that memory mapping is best when reading a file more
> > than once, because the first read gets cached in the virtual memory
> > system, so subsequent reads don't have to go to disk.
> > Also, it eliminates system calls, using simple buffer access instead.
> > Since memory mapping acts as a cache, it can create memory pressure on
> > the virtual memory system, as pages need to be recycled for the next
> > use. And this can slow things down, particularly when reading files
> > whose total size meets or exceeds current physical memory.
> > In my case, I am reading the file only once, so I think the normal
> > file methods will be better. I won't know until I benchmark.
> You appear to have a flawed understanding of unified page cache
> kernels (pretty much all OSs nowadays apart from QNX and OpenBSD).
> Unless O_DIRECT is on, *all* reads and writes are memcpy()ied from/to
> the page cache. *Always*.
> mmap() simply wires parts of the page cache into your process
> unmodified. Memory mapped i/o therefore saves on a memcpy(), and is
> therefore the most efficient cached i/o you can do.
> If you are not on Linux, a read() or write() of >= 4Kb on a 4Kb
> aligned boundary may be optimised into a page steal by the kernel of
> that memory page into the page cache such that DMA can be directed
> immediately into userspace. But, technically speaking, this is still
> DMA into the kernel page cache as normal, it's just the page is wired
> into userspace already.
> So basically you only slow down your code using read() or write().
> Use mapped files unless the cost of the memcpy() done by the read()
> is lower than a mmap(). This is typically 16Kb or so, but it depends
> on memory bandwidth pressure and processor architecture. That part
> you should benchmark.
> Obviously all the above is with O_DIRECT off. Turning it on is a
> whole other kettle of fish, and I wouldn't recommend you do that
> unless you have many months of time to hand to write and optimise
> your own caching algorithm, and even then 99% of the time you won't
> beat the kernel's implementation, which has had decades of tuning and
> refinement.
Thanks a lot for the detailed explanation.
I tested this on Windows: fread/fwrite and memory mapping both gave the
same performance in my use case. So, it doesn't look like memory mapping
will make much of a difference on Windows for my case. I still need to
test this on Linux.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk