2009/7/1 Boris Dušek <boris.dusek@gmail.com>
Hello Christian,
Hi,
I have a library proposal called Boost.Monotonic that does exactly this. Documentation is a work in progress, and what there is, is out of date. However, you are welcome to dig around https://svn.boost.org/svn/boost/sandbox/monotonic/. A good starting point is the test suite at http://tinyurl.com/mhwn5b.
Very quickly, it is a storage system that starts on the stack (with a size you can specify), then grows to the heap as needed. This is combined with an allocator that can use this storage, allowing containers and strings etc. to use the stack at first, then the heap as needed.
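Roughly, usage looks like this (just a sketch; the header paths here are assumptions, and the real interfaces are in the sandbox):

#include <boost/monotonic/storage.hpp>   // assumed header paths
#include <boost/monotonic/vector.hpp>

void example()
{
    // 1k of storage on the stack; grows to the heap if that runs out
    boost::monotonic::storage<1024> storage;

    // a container drawing its memory from that storage
    boost::monotonic::vector<int> vec(storage);
    for (int i = 0; i < 100; ++i)
        vec.push_back(i);

    // everything is reclaimed when storage goes out of scope
}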
This is great - exactly what I have been looking for. I have checked out the library from trunk and I am now in the process of trying it on my production code (first I have to finally add some sane, repeatable performance testing to my project; until today I just manually ran a profiler and that was it).
I want to ask a few questions though:
If I write
void some_function() {
    typedef std::basic_string<wchar_t, std::char_traits<wchar_t>,
        boost::monotonic::allocator<wchar_t> > bufferwstring;
    bufferwstring key;
    bufferwstring key2;
}
then do key and key2 each have their own buffer (and is the buffer on the stack, i.e. thread-safe)? Is there a way to specify the size of the buffer, like boost::monotonic::allocator<wchar_t, 32>? (I looked, and the other template parameters are classes, so not this way.) Or is the only way the one below? Or is the probably large default size not an issue? (I don't really have experience with these stack-allocated buffers, other than knowing they are zero-cost.)
In the case you have above, key and key2 will both be using the same default global storage. This storage grows monotonically (hence the name) - storage is not released when an object is destroyed (though the object's dtor is still called, of course). This is why it is the fastest allocation system - deallocation is a no-op. The idea is to use it, then lose it all in one go.
You can supply "region tags" to the allocator to use different regions, and another tag to specify the access. For example:
struct my_region_0 { };
struct my_region_1 { };
monotonic::allocator<T, my_region_0> alloc_0;  // global, not thread-safe
monotonic::allocator<T, my_region_1, monotonic::shared_access_tag> alloc_1;        // thread-safe, shared across threads
monotonic::allocator<T, my_region_1, monotonic::thread_local_access_tag> alloc_2;  // thread-local storage
These all use different storage. To free the storage, use monotonic::static_storage<my_region[, my_access]>::reset() to reset the usage count to zero, or ::release() to actually release any allocated memory.
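Putting the region and access tags together with reset/release, a sketch (the header paths are assumptions):

#include <boost/monotonic/allocator.hpp>        // assumed header paths
#include <boost/monotonic/static_storage.hpp>
#include <vector>

struct my_region_0 { };
struct my_region_1 { };

void demo()
{
    {
        // two containers drawing from two different regions
        std::vector<int, boost::monotonic::allocator<int, my_region_0> > a(100);
        std::vector<int, boost::monotonic::allocator<int, my_region_1,
            boost::monotonic::shared_access_tag> > b(100);
        // ... use a and b ...
    }   // dtors run here, but neither region gives back its memory

    // reclaim each region explicitly, matching the access tag used above
    boost::monotonic::static_storage<my_region_0>::reset();
    boost::monotonic::static_storage<my_region_1,
        boost::monotonic::shared_access_tag>::release();
}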
Dox are on the way (what is in the sandbox is largely out of date), but in the meantime your best bet is to read the library and/or the unit-tests and/or the benchmark application at
http://tinyurl.com/l89llq.
Is it possible to specify the size roughly like:
void some_func() {
    boost::monotonic::storage<32*sizeof(wchar_t)> storage;
    std::vector<wchar_t, boost::monotonic::allocator<wchar_t> > key(storage);
}
Yes. However, by initialising a std::vector with storage, you create a stateful allocator. This is not a safe thing to do with STL. If you want to do this, then you should use monotonic::vector<> instead, which respects stateful allocators.
If you want to use std::containers with stack-based monotonic storage, the best you can do ATM is to use a region:
std::vector<T, monotonic::allocator<T, my_region> > vec;
This will use 64k of pre-allocated space (by default, see monotonic/config.hpp) in the BSS, which is just as fast as the stack from my testing. After the first 64k is used, it will transparently start using the heap. To clear resources, use monotonic::static_storage<my_region>::reset() to reset or ::release() to actually release the memory.
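A sketch of that idiom, reusing one region across many calls (the region name and header paths are illustrative assumptions):

#include <boost/monotonic/allocator.hpp>        // assumed header paths
#include <boost/monotonic/static_storage.hpp>
#include <vector>

struct scratch_region { };   // hypothetical tag for per-call scratch space

// builds a temporary vector in the region; the first 64k comes from the BSS, then the heap
int sum_squares(int n)
{
    std::vector<int, boost::monotonic::allocator<int, scratch_region> > squares;
    int sum = 0;
    for (int i = 0; i < n; ++i)
    {
        squares.push_back(i * i);
        sum += squares.back();
    }
    return sum;
}

void run()
{
    for (int call = 0; call < 100; ++call)
    {
        sum_squares(1000);
        // throw away what the region accumulated, so it does not keep growing
        boost::monotonic::static_storage<scratch_region>::reset();
    }
}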
Also in this case (the example you mailed me):
const size_t stack_size = 10*1024;
monotonic::storage<stack_size> storage;
{
    monotonic::string<> str(storage);
    monotonic::vector<Foo> vec(storage);
    // use str and vec; storage will use 10k of stack space, then the heap.
    // resources will be freed when storage goes out of scope.
}
So str and vec are sharing the buffer (storage)?
Yes
Does that mean that we run into the same performance issues as malloc, due to free-space management inside that buffer (at least I suppose that's where the main cost of malloc/free comes from)?
Monotonic does not free storage when an object is deallocated. It uses the stack (or BSS) first, then the heap, and it always grows until you manually reset it. It was designed for small, fast allocation for small, fast containers and similar requirements.
In the case above, you can also use the storage directly:
Foo &foo = storage.create<Foo>();
char *bytes = storage.allocate_bytes<3000>();
Don't forget to `destroy` anything you create this way, either by calling its dtor yourself or by calling storage.destroy(foo).
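For example, a sketch of the direct-use pattern with the create/destroy pairing (the header path is an assumption):

#include <boost/monotonic/storage.hpp>   // assumed header path

struct Foo
{
    int value;
    Foo() : value(0) { }
};

void direct_use()
{
    boost::monotonic::storage<4096> storage;

    // construct an object directly in the storage
    Foo &foo = storage.create<Foo>();
    foo.value = 42;

    // raw bytes; nothing to destroy for these
    char *bytes = storage.allocate_bytes<3000>();
    bytes[0] = 'x';

    // pair every create<> with a destroy (or call the dtor yourself)
    storage.destroy(foo);
}   // the storage itself is reclaimed here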
Also thanks for the pointer to auto_buffer, good to know.
auto_buffer is safer to use ATM because it has been peer-reviewed and is in Boost trunk. Monotonic is a work in progress, but it is at the stage now where most of the remaining work is testing and documentation. However, it is very much a "use at your own risk" library: it hasn't been fully reviewed, it hasn't been extensively tested across many platforms, and it doesn't have extensive documentation. You should stick with auto_buffer<> if you can get away with it, or use only stateless, regionalised monotonic allocators:
struct my_region_tag {};
{
    std::container<..., monotonic::allocator<T, my_region_tag> > cont;
    std::basic_string<..., monotonic::allocator<char, my_region_tag> > str;
    std::other_container<..., monotonic::allocator<U, my_region_tag> > cont2;
    // use cont, str, cont2 ...
}
// after this, cont etc. have lost their storage, so they had better be out of scope before you call:
monotonic::static_storage<my_region_tag>::reset();
This usage is safe, robust, works everywhere and is extremely fast.
I will try to make some measurements with my code and report back the results.
I benchmarked monotonic against boost::pool, boost::fast_pool, std::allocator and Intel's TBB. The results are here:
http://tinyurl.com/lj6nab. In summary, it is faster than the alternatives.
I can't recommend using monotonic in production code, and it isn't really an appropriate topic for this list since it is not in Boost trunk and may never be. It hasn't been extensively reviewed or tested across many platforms.
But you are welcome to see what you can get. Performance-wise, it wins hands down.
Regards,
Christian