

Subject: Re: [boost] Designing a multi-threaded file parser
From: Aaron Boxer (boxerab_at_[hidden])
Date: 2016-04-23 09:02:04


On Fri, Apr 22, 2016 at 2:18 PM, Niall Douglas <s_sourceforge_at_[hidden]>
wrote:

> On 22 Apr 2016 at 10:31, Aaron Boxer wrote:
>
> > My impression is that memory mapping is best when reading a file more
> > than once, because the first read gets cached in the virtual memory
> > system, so subsequent reads don't have to go to disk. Also, it
> > eliminates system calls, using simple buffer access instead.
> >
> > Since memory mapping acts as a cache, it can create memory pressure on
> > the virtual memory system, as pages need to be recycled for the next
> > usage. And this can slow things down, particularly when reading files
> > whose total size meets or exceeds current physical memory.
> >
> > In my case, I am reading the file only once, so I think the normal file
> > I/O methods will be better. Don't know until I benchmark.
> > Don't know until I benchmark.
>
> You appear to have a flawed understanding of unified page cache
> kernels (pretty much all OSs nowadays apart from QNX and OpenBSD).
>
> Unless O_DIRECT is on, *all* reads and writes are memcpy()ied from/to
> the page cache. *Always*.
>
> mmap() simply wires parts of the page cache into your process
> unmodified. Memory mapped i/o therefore saves on a memcpy(), and is
> therefore the most efficient cached i/o you can do.
>
> If you are not on Linux, a read() or write() of >= 4Kb on a 4Kb
> aligned boundary may be optimised into a page steal by the kernel of
> that memory page into the page cache such that DMA can be directed
> immediately into userspace. But, technically speaking, this is still
> DMA into the kernel page cache as normal, it's just the page is wired
> into userspace already.
>
> So basically you only slow down your code using read() or write().
> Use mapped files unless the cost of the memcpy() done by read() is
> lower than the cost of setting up the mmap(). The crossover is
> typically around 16Kb or so, but it depends on memory bandwidth
> pressure and processor architecture. That part you should benchmark.
>
> Obviously all the above is with O_DIRECT off. Turning it on is a
> whole other kettle of fish, and I wouldn't recommend you do that
> unless you have many months of time to hand to write and optimise
> your own caching algorithm, and even then 99% of the time you won't
> beat the kernel's implementation which has had decades of tuning and
> optimisation.
>

Thanks a lot for the detailed explanation.

I tested this on Windows: fread/fwrite and memory mapping both gave the
same performance in my use case. So it doesn't look like memory mapping
will make much of a difference on Windows for my workload. I still need
to test this on Linux.
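For anyone who wants to reproduce this kind of comparison, here is a
minimal sketch of a single-pass benchmark in Python (not my actual test
code; the file size and buffer size are illustrative placeholders). It
times one sequential read of a file via buffered read() calls, which
memcpy() out of the page cache, versus a memory map, which exposes the
page-cache pages directly:

```python
# Hypothetical micro-benchmark: one sequential pass over a file,
# buffered read() vs. memory map. Both go through the kernel page
# cache on a unified-page-cache OS; read() adds a memcpy() per chunk.
import hashlib
import mmap
import os
import tempfile
import time


def checksum_read(path, bufsize=1 << 20):
    # read() path: the kernel copies each chunk out of the page cache
    # into our userspace buffer.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()


def checksum_mmap(path):
    # mmap path: the page-cache pages are wired into our address
    # space, so hashing reads them without an extra copy.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            h.update(m)
    return h.hexdigest()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, os.urandom(8 << 20))  # 8 MiB throwaway test file
        os.close(fd)
        for fn in (checksum_read, checksum_mmap):
            t0 = time.perf_counter()
            digest = fn(path)
            print(fn.__name__, time.perf_counter() - t0, digest[:16])
    finally:
        os.unlink(path)
```

Both functions must produce the same digest; the interesting number is
the timing difference, which, as Niall says, depends on chunk size,
memory bandwidth, and architecture, so it is worth running on each
target platform rather than assuming.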

Aaron

>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk