
Subject: Re: [boost] Designing a multi-threaded file parser
From: Aaron Boxer (boxerab_at_[hidden])
Date: 2016-04-29 20:58:53


Niall,

A correction: it turns out the memory-mapping logic in my program differed
from the fread/fwrite logic.
Once the two were equivalent, the timing was identical for both methods.

Thanks again for your help,
Aaron

On Sat, Apr 23, 2016 at 9:02 AM, Aaron Boxer <boxerab_at_[hidden]> wrote:

>
>
> On Fri, Apr 22, 2016 at 2:18 PM, Niall Douglas <s_sourceforge_at_[hidden]>
> wrote:
>
>> On 22 Apr 2016 at 10:31, Aaron Boxer wrote:
>>
>> > My impression is that memory mapping is best when reading a file more
>> > than once, because the first read gets cached in the virtual memory
>> > system, so subsequent reads don't have to go to disk. Also, it
>> > eliminates system calls, using simple buffer access instead.
>> >
>> > Since memory mapping acts as a cache, it can create memory pressure on
>> > the virtual memory system, as pages need to be recycled for the next
>> > usage. And this can slow things down, particularly when reading files
>> > whose total size meets or exceeds current physical memory.
>> >
>> > In my case, I am reading the file only once, so I think the normal file
>> > IO methods will be better. Don't know until I benchmark.
>>
>> You appear to have a flawed understanding of unified page cache
>> kernels (pretty much all OSs nowadays apart from QNX and OpenBSD).
>>
>> Unless O_DIRECT is on, *all* reads and writes are memcpy()ied from/to
>> the page cache. *Always*.
>>
>> mmap() simply wires parts of the page cache into your process
>> unmodified. Memory mapped i/o therefore saves on a memcpy(), and is
>> therefore the most efficient cached i/o you can do.
>>
>> If you are not on Linux, a read() or write() of >= 4Kb on a 4Kb
>> aligned boundary may be optimised into a page steal by the kernel of
>> that memory page into the page cache such that DMA can be directed
>> immediately into userspace. But, technically speaking, this is still
>> DMA into the kernel page cache as normal, it's just the page is wired
>> into userspace already.
>>
>> So basically you only slow down your code using read() or write().
>> Use mapped files unless the cost of the memcpy() done by the read()
>> is lower than a mmap(). This is typically 16Kb or so, but it depends
>> on memory bandwidth pressure and processor architecture. That part
>> you should benchmark.
>>
>> Obviously all the above is with O_DIRECT off. Turning it on is a
>> whole other kettle of fish, and I wouldn't recommend you do that
>> unless you have many months of time to hand to write and optimise
>> your own caching algorithm, and even then 99% of the time you won't
>> beat the kernel's implementation which has had decades of tuning and
>> optimisation.
>>
>
> Thanks a lot for the detailed explanation.
>
> I tested this on Windows: fread/fwrite and memory mapping both gave the
> same performance in my use case. So it doesn't look like memory mapping
> will make much of a difference on Windows for my case. I still need to
> test this on Linux.
>
> Aaron
>
>
>
>>
>
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk