Subject: Re: [boost] [string] proposal
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-01-29 12:23:50


Dean Michael Berris wrote:
> Yes, and this is the problem with sbrk: if you rely on your allocator
> to use sbrk, sbrk is not guaranteed to just expand the available
> segment in place -- which means the VMM no matter what you do will
> actually have to find the memory for you and give you that chunk of
> memory. This means potentially swapping pages in/out.

Let me try to understand that; are you saying the same as this:

    sbrk is not guaranteed to expand into memory that is physically
    contiguous with the previously-allocated memory. Which means that
    the OS will have to spend time studying its memory layout data
    structures to find a suitable area of physical memory to map for
    you.

If that's what you're saying, I believe it, though I'm not convinced
that the time taken to do this is often a bottleneck. References would
be appreciated. But you might mean something else.
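For reference, when I talk about the allocator getting memory from the
OS, I mean something like this sketch (Linux-flavoured; note that
neither call says anything about where the physical pages come from):

    #include <cstddef>      // std::size_t
    #include <stdint.h>     // intptr_t
    #include <unistd.h>     // sbrk
    #include <sys/mman.h>   // mmap

    // Extend the data segment.  The new range is virtually contiguous
    // with the old break, but the kernel may back it with any physical
    // pages it likes.
    void* grow_heap(std::size_t bytes)
    {
        return sbrk(static_cast<intptr_t>(bytes));
    }

    // Ask for fresh anonymous pages.  Again, physical placement is
    // entirely up to the VMM.
    void* map_anonymous(std::size_t bytes)
    {
        return mmap(0, bytes, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    }
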

> This means potentially swapping pages in/out.

That's the crucial bit. Why do you believe that this allocation could
possibly lead to swapping? Do you mean that this would happen only on
systems that are already low on RAM, or are you suggesting that it
could happen even when the overall amount of free RAM is reasonable?

> You limit this
> likelihood by asking for page-sized and page-aligned chunks

When you say "page-sized", do you really mean "page-sized", or do you
mean "multiple of the page size"? If you really mean "page-sized",
you're suggesting that there should be one call to sbrk or mmap for
each 4 kbytes of RAM, right? That seems very wrong to me.
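What I would expect an allocator to do instead is to round a large
request up to a whole number of pages and make a single call, roughly
like this (just a sketch):

    #include <cstddef>
    #include <unistd.h>   // sysconf

    // Round a request up to a whole number of pages, so that one
    // sbrk/mmap call covers many objects rather than one per page.
    std::size_t round_up_to_pages(std::size_t bytes)
    {
        std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        return (bytes + page - 1) / page * page;
    }
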

[snip]

> Note that there's a hardware page size and a virtual
> page size.

Really? Can you give an example?

> Try a simple benchmark: [mmap] a 1GB randomly generated file into a
> 64-bit multi-core Intel Nehalem-based processor, spawn 1,000 threads
> on Linux and randomly on each thread change a value in a randomly
> picked offset. This will show you how cache misses, VMM paging, and
> the cost of mutability of data will actually kill your efficiency.

Right, it will be slow.
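
For what it's worth, here is roughly what I understand the experiment
to be (the file name, thread count and iteration count are just
placeholders); correct me if this is not what you have in mind:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <random>
    #include <thread>
    #include <vector>

    int main()
    {
        int fd = open("random.dat", O_RDWR);   // ~1 GB of random data
        struct stat st;
        fstat(fd, &st);
        char* p = static_cast<char*>(mmap(0, st.st_size,
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

        std::vector<std::thread> threads;
        for (int t = 0; t < 1000; ++t)
            threads.push_back(std::thread([=]() {
                std::mt19937_64 rng(t);
                std::uniform_int_distribution<off_t> pick(0, st.st_size - 1);
                for (int i = 0; i < 100000; ++i)
                    ++p[pick(rng)];            // random writes dirty pages
            }));
        for (auto& th : threads)
            th.join();
    }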

> Now
> if you change the program to just randomly read from a part of memory,
> you'll see how immutability helps.

Right, it will be much faster.

But if your program actually needs to change the data then it needs to
change the data; you would need to compare e.g.

     std::string s; .... s[n]=c;

vs.

     immutable_string s; ... s = s.substr(0,n) + c + s.substr(n+1);

Now if you can demonstrate that being faster, I'll be impressed.
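
Concretely, the kind of loop I would want to see measured is something
like the following; the immutable side is only indicated in a comment,
since immutable_string and its exact interface are hypothetical here:

    #include <cstddef>
    #include <cstdlib>
    #include <string>

    int main()
    {
        std::string s(1024 * 1024, 'x');

        // Mutable version: in-place writes, no allocation or copying.
        for (int i = 0; i < 1000000; ++i)
        {
            std::size_t n = std::rand() % s.size();
            s[n] = 'y';
        }

        // Immutable version: the loop body above would become
        //     is = is.substr(0, n) + 'y' + is.substr(n + 1);
        // which rebuilds (most of) the string on every iteration.
    }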

> Then change it yet again to [mmap]
> 1GB worth of data, but this time on multiple files, breaking the files
> up into smaller pieces. The dramatic improvement in performance should
> be evident there. :)

That's an interesting suggestion. Have you actually done this? Can
you post some numbers? I would try it myself, but I don't have a
suitable NUMA test system right now. Presumably the idea is that the
files are distributed randomly between the different processors' memory
controllers. Would you get a similar benefit in the case of the
monolithic 1 GB file by mmapping it in e.g. 4 or 8 chunks? In any
case, I'm not convinced that this is relevant to the memory allocation issue.
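
By mmapping in chunks I mean something like the following sketch, for
concreteness; the chunk count is arbitrary, and offsets are rounded to
page boundaries as mmap requires:

    #include <algorithm>
    #include <vector>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Map one big file as n separate read-only mappings instead of one.
    std::vector<char*> map_in_chunks(const char* path, int n)
    {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);

        long page = sysconf(_SC_PAGESIZE);
        off_t chunk = (st.st_size / n + page - 1) / page * page;

        std::vector<char*> maps;
        for (int i = 0; i < n; ++i)
        {
            off_t off = static_cast<off_t>(i) * chunk;
            if (off >= st.st_size)
                break;
            off_t len = std::min<off_t>(chunk, st.st_size - off);
            maps.push_back(static_cast<char*>(
                mmap(0, len, PROT_READ, MAP_PRIVATE, fd, off)));
        }
        return maps;
    }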

Regards, Phil.

