
Subject: Re: [boost] Proposal: Monotonic Containers
From: Christian Schladetsch (christian.schladetsch_at_[hidden])
Date: 2009-06-13 21:35:14


Hi David,

Christian> [...] the same range of memory can be
>> used by multiple containers, the range of memory can be placed on the
>> stack
>> *or* the heap, and by making de-allocation a no-op, memory management is
>> optimally efficient in time but always increases in space.
>>
>
> David> This is the key unique aspect of your allocator, so what I think you
> need to do is show that this specific aspect can bring higher performance. So
> create a sample with at least two containers, preferably one vector-like and
> the other a tree, using the same block.

I have done this. The results show a 50% improvement. I posted the code
three posts ago on this thread. It exists in the sandbox at
https://svn.boost.org/svn/boost/sandbox/monotonic/libs/monotonic/test/main.cpp,
starting on line 419 "void test_map_list_realtime()".

> That way you can sell this idea, which I think is rather unique: of having
> a small contiguous segment with cheap internal allocation for *two or more
> containers, and not restricted to vector-like containers*.

After I posted my initial map<int, list<int> > implementation using the
proposed system, and got some good results (if tainted somewhat by my
blunder on the 100x claim), Scott McMurray wrote a custom intrusive
container just for this one case, and compared the results:

>Scott: Upping it to 0x1010101 gave this:

mono: 35.82/35.71/35.82
std: 37.94/37.83/37.81
intrusive: 36.17/35.78/35.62

So, using a monotonic allocator with the stock standard containers is at
least comparable to, if not better than, even a hand-built intrusive
container. Why?

Another thing to consider is that the monotonic allocator is also more
efficient in space, because it has no overhead for keeping track of
freelists or other memory-management bookkeeping.

Specifically, say you have a map<int, int> and want to insert two nodes.
Using MSVC's <xtree> for reference:

struct Node {
        _Nodeptr _Left; // left subtree, or smallest element if head
        _Nodeptr _Parent; // parent, or root of tree if head
        _Nodeptr _Right; // right subtree, or largest element if head
        value_type _Myval; // the stored value, unused if head
        char _Color; // _Red or _Black, _Black if head
        char _Isnil; // true only if head (also nil) node
};

This is 3*4 + 4 + 2 = 18 bytes per node (counting the stored value as a
single 4-byte int for simplicity) on a typical 32-bit system. I assume
a typical 32-bit system in the rest of this post. There will be vagaries
between platforms and architectures, but even so that is incidental to the
underlying point I will try to make.

Let's have a quick look at what happens when you put one of these on the
heap with new. There will be whatever alignment padding is required, plus
whatever overhead your memory allocator uses. We can't say too much about
what a generalised allocator requires, but at a guess I'd say:

two extra pointers to insert into a linked list = +8 bytes
a block size, and a used size = +8 bytes

In a real system there would likely be more, but let's just say that it's
around 16 bytes. Allocating for two new nodes here is (18 + 16)*2 = 68
bytes.

Compare this to a monotonic allocator: two new nodes use 36 bytes, and
three use 54 bytes - still less than the space required for just two nodes
allocated with new. I have intentionally ignored alignment in both cases;
I will come back to it below.

Now, compare to a list node. In this case, the ratio of overhead to value is
higher still. A list node for ints uses 12 bytes - which is smaller than the
allocation overhead! So you lose every time you allocate. On a typical
32-bit system:

new allocating 4 nodes for list<int>: 4*(16 + 12) = 112 bytes
mono allocating 4 nodes for list<int>: 4*(12) = 48 bytes.

Plus, and this is very important for speed, the 4 nodes in the monotonic
storage are exactly adjacent in memory. On GCC, which uses 4-byte alignment
on Intel platforms, there is not even any padding between the nodes: they
reside precisely adjacent in memory, and possibly even right there on the
stack.

I am guessing at the per-allocation overhead of new. Whether it is 8, 16,
or 32 bytes, it is still higher than the overhead of a monotonic allocator,
which is exactly zero. Of course, the ratio of waste goes down for larger
values stored in the nodes, but whatever that ratio is, it will always be
smallest for a monotonic allocator, which has exactly zero per-allocation
overhead.

Yes, extra space will be required for alignment, but that is required by
any allocator. Thanks again to Artyom and Thorsten for their perseverance
against my ignorance on this matter.

So, this is why a tree of lists of vectors is very fast and very small
using monotonic allocators, and why using a monotonic allocator with the
standard containers produces code that is comparable to, or even faster
than, hand-crafted intrusive containers. Add to this the fact that the
nodes themselves can be placed on the stack, and that the pointers in
those nodes point to other nodes that are also on the stack, quite likely
very close by.

So the offsets are small, the entire data-structure itself is very small,
there is no overhead other than the required alignment, and it is far easier
to make efficient datastructures.

As David and others have pointed out, any benefit given by this approach
becomes impractical to leverage, or is swamped by other issues, for very
large structures. Even then, however, if an upper limit is known, the
storage can be put on the heap rather than the stack, and you still get
the benefit of zero allocation overhead and zero heap fragmentation.

It is not for everyone, and it's certainly not a general solution.

The approach here was born from the need for small, fast, local,
short-lived, but general and nested containers, and it works best in that
context.

> /David

Regards,
Christian.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk