Subject: Re: [boost] [string] proposal
From: Steven Watanabe (watanabesj_at_[hidden])
Date: 2011-01-28 12:43:54
On 1/28/2011 12:49 AM, Dean Michael Berris wrote:
> On Fri, Jan 28, 2011 at 1:30 PM, Steven Watanabe<watanabesj_at_[hidden]> wrote:
>> On 1/27/2011 2:52 AM, Dean Michael Berris wrote:
>>> 5. Because of the contiguous requirement, using it for any "text"
>>> that's larger than a memory page's worth of data will kill your cache
>>> coherency -- and then when you modify parts of it then you can thank
>>> your virtual memory manager when the modifications are done. Then you
>>> see that you would have to implement your own segmented data structure
>>> to act as a string and then you realize you're better off not using
>>> std::string for situations where the amount of data you're going to
>>> deal with is potentially larger than a cache line.
>> I beg your pardon, but this makes no sense to me.
>> Would you mind explaining exactly what kind of
>> usage makes the hardware unhappy and why?
> On multi-core set-ups with a NUMA architecture, having one thread
> allocate memory attached to a given memory controller (on a given
> core/CPU) in a block that spans multiple pages forces your OS to find
> at least two contiguous memory pages (especially when memory is a
> scarce resource). That's the virtual memory manager at work, and it's
> a performance killer on most modern (and even not so modern)
> platforms.
I'm not sure what this has to do with the OS.
The memory manager exists in user space and
only goes to the OS when it can't fulfill the
request with the memory that it has on hand.
> This means, let's say you have a memory page that's 4kb long and you
> need a contiguous string that is, say, 9kb; your OS would then have
> to find 3 contiguous memory pages to fit a single string. Say that
> again: a single string needing 3 *contiguous* memory pages. If this
> allocation happens on one core and the process is then migrated, for
> whatever reason, to a different core that controls a different memory
> module, then just accessing that string -- even if it's in L3 cache
> -- would be a problem.
Humph. 9 KB is not all that much memory.
Besides, how often do you have strings
this big anyway?
> Then when you change anything in the pages, your write-through cache
> will see it, and the OS virtual memory manager will mark that page
> dirty. The next time you need to access memory in the same page from
> a different core, especially if the data isn't in L3 cache yet, the
> OS sees a dirty page and does all the page management necessary to
> mark that page clean just to read the data.
> Now that's fine if the string doesn't grow. But now let's say you
> concatenate a 4kb string to a 9kb string -- you then need 4
> *contiguous* pages to fit all 13kb of data in a single string. That's
> on top of the storage that's already required by either string.
So, what you're saying is that requiring contiguous
strings interferes with sharing memory between strings
and thus increases memory usage. That I can understand,
but you already covered this in a separate bullet point.
> All hell breaks loose as your VMM will have to find these 4 pages
> that sit right next to each other, potentially paging out stuff just
> so that it can satisfy this simple concatenation request.
In my experience most programs don't come close
enough to exhausting their address spaces to
make this a big deal. I don't believe that this
is as apocalyptic as you're making it out to be.
> The time it
> takes to get something like that done is just unacceptable whenever
> you need to do something remotely interesting on a system that should
> be handling thousands of transactions per second -- or even just on a
> user's machine where you have absolutely no idea what other programs
> are running and what kind of architecture they have. :)
You need to be more clear about whether you're
talking about physical addresses or virtual
addresses. Contiguous pages do not have to
be stored in contiguous physical memory.
If what you're saying is true then std::deque
should be preferable to std::vector, but this
is exactly the opposite of what I've always heard.
>> I'm not even sure what this has to do with cache coherency
> In the case where you have two threads trying to access the same
> string, and the string changes in a thread running on one processor,
> the hardware has to keep the caches of both processors up-to-date,
> especially if the data is in L1 cache. That means if you're building
> a string in one thread and simultaneously reading it from another,
> you're stressing the cache coherence mechanisms of the multi-core
> machine.
This sounds like a bad idea to begin with, never
mind the performance implications.
> On single-core machines that's largely not a problem unless you run
> into non-local memory access and have to swap things in and out of
> the cache -- the case when you have large contiguous chunks of
> memory that are accessed randomly (like strings and vectors).
Sure, but this happens regardless of whether the string
is stored contiguously or not. In fact, I would think
that your proposal would make this worse by allowing
hidden sharing between apparently unrelated strings.