Subject: Re: [boost] Regex allocator support
From: Fred Sundvik (fred.sundvik_at_[hidden])
Date: 2009-12-03 06:52:13
"John Maddock" <john_at_[hidden]> wrote in message
>>> We recently had to use the regular expression library in conditions
>>> where there was no available physical memory, for reporting resource
>>> consumption when our console game runs out of memory. This wasn't
>>> possible, however, because basic_regex does some dynamic
>>> allocations and doesn't provide any allocator customization, although
>>> match_results, for example, does. So we modified the Boost code a little
>>> bit, added a new template parameter for the allocator, and added
>>> constructors that take allocators as parameters.
>>> So my question is, what are the chances that these kinds of
>>> modifications get implemented in the actual library, and possibly in
>>> the next C++ standard library as well? I consider it quite a big design
>>> flaw not to have any control over the internal allocations, so for me this
>>> is quite a big thing.
>> +1; I agree with the OP's sentiments, though I wouldn't say it's
>> necessarily a "big" design flaw. However, I've never used the
>> Boost.Regex library, and I don't know if the next standard specifies
>> allocator parameters. I gather it doesn't?
> Sigh... no.
> The original regex++ library from which the Boost version is derived *did*
> have allocators for everything. But during review folks felt quite
> strongly that:
> a) It was overkill.
> b) Regex should be free to manage its own memory - it's not a container -
> its needs are much more complex than that - so it should be free to
> optimize memory allocation and caching as it sees fit.
> As a result this facility was removed. Rightly or wrongly, no one on the
> stds committee questioned that design decision.
> Some specific comments on the other issues:
>>The mem_block_cache system was a bit more complicated. Initially we removed
>>it completely, but that obviously wasn't the best solution. So we figured
>>out that we could use a custom allocator as the default, instead of
>>std::allocator. That custom allocator then had to be specialized for the
>>mem_block_nodes, and the function calls to get_mem_block replaced by
>>allocator calls. Our implementation isn't ideal, though: we used
>>mem_block_node as the specialization, and to pass the correct size to the
>>allocator we allocate BOOST_REGEX_BLOCKSIZE / sizeof(mem_block_node) of
>>them. This way it works for allocators that don't specialize for
>>mem_block_node (but only if the size is divisible by
>>sizeof(mem_block_node)). It would probably be better to make a new type
>>that is BOOST_REGEX_BLOCKSIZE big.
> Can you not just define BOOST_REGEX_RECURSIVE and use the stack based
> implementation that does away with that altogether?
Yes, I guess we could have; however, it wouldn't have solved our other
problems. Also, we use the regex library in other parts of the game, where
the requirements aren't as strict, so we ended up with this solution.
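The "new type that is BOOST_REGEX_BLOCKSIZE big" mentioned above could look roughly like the following sketch. It is not Boost's actual code. It assumes a block size of 4096 bytes as a stand-in for BOOST_REGEX_BLOCKSIZE. The names mem_block, get_mem_block/put_mem_block (as written here) and RecordingAllocator are hypothetical. The idea is to rebind the user's allocator to a block-sized type, so that one allocate(1) call requests exactly one block. That removes the divisibility requirement on sizeof(mem_block_node).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Stand-in for BOOST_REGEX_BLOCKSIZE (value assumed for this sketch).
constexpr std::size_t kBlockSize = 4096;

// Hypothetical type that is exactly one cache block big.
struct mem_block {
    unsigned char bytes[kBlockSize];
};
static_assert(sizeof(mem_block) == kBlockSize, "mem_block must be one block");

// Hypothetical block requests: rebind the user's allocator to mem_block
// and ask for exactly one element, i.e. exactly kBlockSize bytes.
template <class Alloc>
void* get_mem_block(const Alloc& a) {
    using BlockAlloc =
        typename std::allocator_traits<Alloc>::template rebind_alloc<mem_block>;
    BlockAlloc block_alloc(a);
    return std::allocator_traits<BlockAlloc>::allocate(block_alloc, 1);
}

template <class Alloc>
void put_mem_block(const Alloc& a, void* p) {
    assert(p != nullptr && "returning a null block");
    using BlockAlloc =
        typename std::allocator_traits<Alloc>::template rebind_alloc<mem_block>;
    BlockAlloc block_alloc(a);
    std::allocator_traits<BlockAlloc>::deallocate(
        block_alloc, static_cast<mem_block*>(p), 1);
}

// Example user allocator (hypothetical) that records the size of the last request.
inline std::size_t& last_request_bytes() {
    static std::size_t bytes = 0;
    return bytes;
}

template <class T>
struct RecordingAllocator {
    using value_type = T;
    RecordingAllocator() = default;
    template <class U>
    RecordingAllocator(const RecordingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        last_request_bytes() = n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const RecordingAllocator<T>&, const RecordingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const RecordingAllocator<T>&, const RecordingAllocator<U>&) { return false; }
```

With a RecordingAllocator<char>, get_mem_block issues a single request of kBlockSize bytes, whatever the allocator was originally instantiated for.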
>>We also found another problem with allocation: the match_results::format
>>functions only take string_type, which doesn't depend on the allocator of
>>the strings passed in. This made it impossible to call them with strings
>>using different allocators. So we added specializations of those functions
>>for different allocators and traits. There are probably more functions like
>>that, but since our code doesn't use them there was no need to change them.
> The current Trunk allows any string, container or character type, plus
> function objects to be used as arguments to regex_format/regex_replace
> etc, which should avoid that problem.
That's great to hear.
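For comparison, the C++11 std::regex specification already templates std::match_results::format on the format string's traits and allocator. std::basic_regex itself still takes no allocator. Below is a minimal sketch using only standard facilities. TrackedAllocator, tracked_string and rename_level are hypothetical names for illustration, not part of Boost or the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <regex>
#include <string>

// Hypothetical stateless allocator standing in for a game's tracking
// allocator; in this sketch it simply forwards to std::allocator.
template <class T>
struct TrackedAllocator {
    using value_type = T;
    TrackedAllocator() = default;
    template <class U>
    TrackedAllocator(const TrackedAllocator<U>&) {}
    T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};
template <class T, class U>
bool operator==(const TrackedAllocator<T>&, const TrackedAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const TrackedAllocator<T>&, const TrackedAllocator<U>&) { return false; }

using tracked_string =
    std::basic_string<char, std::char_traits<char>, TrackedAllocator<char>>;

// Rewrites "level_03" as "L03". match_results::format is templated on the
// format string's traits and allocator, so a tracked_string format string
// yields a tracked_string result without going through std::string.
tracked_string rename_level(const tracked_string& name) {
    static const std::regex re(R"(level_(\d+))");  // std::regex: no allocator parameter
    std::match_results<tracked_string::const_iterator> m;
    if (!std::regex_match(name.begin(), name.end(), m, re))
        return name;
    assert(m.size() == 2);  // whole match plus one capture group
    const tracked_string fmt = "L$1";
    return m.format(fmt);
}
```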
>>We recently had to use the regular expression library in conditions where
>>there was no available physical memory, for reporting resource consumption
>>when our console game runs out of memory.
> I'm not sure I understand the issue here, that operator new doesn't report
> out of memory conditions? What does your allocator do that's different to
> operator new, and why not replace global new and delete with calls to your
> custom allocator? Just trying to get a handle on the issue...
I'll try to explain the situation a bit better. Our game is a console game,
which has a physical limit on the amount of memory it can use. We have
overloaded the global new operators, for memory tracking and other things.
During development it's quite common that the artists make too many or too
large textures, and then we run out of memory. At this point we want to dump
some statistics about all loaded textures to a file for them to analyze, and
Boost.Regex is used during that generation.
Because the game is already out of memory at this point, we obviously can't
allocate even a single byte during the generation. Our custom allocator
reserves some memory on the stack, which hopefully is available. Although
not guaranteed in all cases, it's good enough since it's strictly a debug
feature. The allocator then hands out memory from this stack pool.
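Below is a minimal sketch of such an emergency stack pool, assuming a simple bump (monotonic) strategy. Individual frees are ignored, nothing ever reaches the global operator new, and exhaustion throws std::bad_alloc. StackPool, PoolAllocator and pool_string are hypothetical names. The buffer is assumed to be aligned to alignof(std::max_align_t). Because the allocator carries a pointer to its pool, two instances compare equal only if they share a pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>

// Hypothetical fixed-size pool over a caller-provided buffer (e.g. a local
// array in the out-of-memory handler). Allocation just bumps an offset.
class StackPool {
public:
    StackPool(void* buffer, std::size_t size)
        : begin_(static_cast<unsigned char*>(buffer)), size_(size) {
        assert(reinterpret_cast<std::uintptr_t>(buffer) % alignof(std::max_align_t) == 0);
    }
    void* allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t start = (used_ + align - 1) / align * align;  // round up
        if (start > size_ || bytes > size_ - start)
            throw std::bad_alloc();  // exhausted: never fall back to operator new
        used_ = start + bytes;
        return begin_ + start;
    }
    void deallocate(void*, std::size_t) {}  // reclaimed together with the buffer
    std::size_t used() const { return used_; }

private:
    unsigned char* begin_;
    std::size_t size_;
    std::size_t used_ = 0;
};

// Stateful allocator: equal only when both refer to the same pool.
template <class T>
struct PoolAllocator {
    using value_type = T;
    StackPool* pool;
    explicit PoolAllocator(StackPool& p) : pool(&p) {}
    template <class U>
    PoolAllocator(const PoolAllocator<U>& other) : pool(other.pool) {}
    T* allocate(std::size_t n) { return static_cast<T*>(pool->allocate(n * sizeof(T))); }
    void deallocate(T* p, std::size_t n) { pool->deallocate(p, n * sizeof(T)); }
};
template <class T, class U>
bool operator==(const PoolAllocator<T>& a, const PoolAllocator<U>& b) { return a.pool == b.pool; }
template <class T, class U>
bool operator!=(const PoolAllocator<T>& a, const PoolAllocator<U>& b) { return !(a == b); }

using pool_string = std::basic_string<char, std::char_traits<char>, PoolAllocator<char>>;
```

In the out-of-memory handler, the buffer would be a local `alignas(std::max_align_t) unsigned char` array, and the report strings would be pool_string objects built from a PoolAllocator<char> over it.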
This is of course an extreme case, but we game developers like to track and
handle all memory allocations, in normal cases too. Here is one concrete
example. During level loading, regular expressions may be used, for example,
to generate data to be loaded. These regular expressions are used strictly
during loading, and are therefore temporary. Due to fragmentation
issues it's not good to mix allocations with different lifetimes, so all
temporary allocations they make should go to their own memory pool.
When the parsing stage of the loading is done, these pools should be freed
to make room for the more important data, which becomes an issue if you let
the library handle all memory allocations internally. It wouldn't be a big
problem if the library did what most third-party libraries used in game
development do: expose some allocation callback or configuration. Boost,
however, doesn't provide even this, so we have to trust that it does proper
pooling internally and doesn't trash our memory. Additionally, we have to
hope that it doesn't leave too much memory behind, as the block_cache
system potentially could, at least on embedded systems where memory
is really tight. I know that the amount can be configured, but that would
make it slower at runtime.
Additionally, if there's just a global memory pool for the whole library,
it's impossible to use the library in different ways for different tasks,
which makes allocators superior to an internal global allocation scheme. Note
that allocators don't stop the library from doing clever memory pooling
internally; it would then just ask the allocator for bigger chunks
of memory, instead of making one allocation per element.
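Below is a minimal sketch of that point, assuming a fixed-size node type. The library keeps its own free list, but every byte is obtained through the user's allocator, one chunk of nodes at a time. This includes the bookkeeping vectors, which use a rebound copy of the allocator. ChunkPool, CountingAllocator and DemoNode are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical internal node cache: requests whole chunks from the user's
// allocator and hands out single nodes from them.
template <class Node, class Alloc = std::allocator<Node>>
class ChunkPool {
    using traits = std::allocator_traits<Alloc>;
    using PtrAlloc = typename traits::template rebind_alloc<Node*>;

public:
    explicit ChunkPool(std::size_t nodes_per_chunk, const Alloc& a = Alloc())
        : alloc_(a), per_chunk_(nodes_per_chunk),
          chunks_(PtrAlloc(a)), free_(PtrAlloc(a)) {
        assert(per_chunk_ > 0);
    }
    ChunkPool(const ChunkPool&) = delete;
    ChunkPool& operator=(const ChunkPool&) = delete;
    ~ChunkPool() {
        for (Node* chunk : chunks_) traits::deallocate(alloc_, chunk, per_chunk_);
    }

    // Returns uninitialized storage for one Node; the caller constructs it.
    Node* get() {
        if (!free_.empty()) {
            Node* n = free_.back();
            free_.pop_back();
            return n;
        }
        if (chunks_.empty() || next_ == per_chunk_) {
            Node* chunk = traits::allocate(alloc_, per_chunk_);  // one big request
            try { chunks_.push_back(chunk); }
            catch (...) { traits::deallocate(alloc_, chunk, per_chunk_); throw; }
            next_ = 0;
        }
        return chunks_.back() + next_++;
    }
    void put(Node* n) { free_.push_back(n); }  // recycled, not returned to Alloc
    std::size_t chunk_count() const { return chunks_.size(); }

private:
    Alloc alloc_;
    std::size_t per_chunk_;
    std::size_t next_ = 0;
    std::vector<Node*, PtrAlloc> chunks_;
    std::vector<Node*, PtrAlloc> free_;
};

// Example user allocator (hypothetical) that counts allocate() calls.
template <class T>
struct CountingAllocator {
    using value_type = T;
    std::size_t* calls;
    explicit CountingAllocator(std::size_t* c) : calls(c) {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>& other) : calls(other.calls) {}
    T* allocate(std::size_t n) { ++*calls; return std::allocator<T>().allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) { return a.calls == b.calls; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) { return !(a == b); }

struct DemoNode { int value; DemoNode* next; };
```

With 32 nodes per chunk, handing out 100 nodes costs only four node-chunk requests to the user's allocator.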
I don't like the standard C++ allocators that much, especially not "All
instances of a given allocator type are required to be interchangeable and
always compare equal to each other. (20.1.5)" (which is why our
implementation doesn't assume that). So for me the customization wouldn't
necessarily need to be a new allocator template parameter. It could just as
well be a constructor taking a pointer to some regex_memory object that the
application is free to override. The critical thing is that all memory
allocations should go through it; not a single allocation should go around it.
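Below is a minimal sketch of this pointer-based alternative, assuming a small abstract interface. regex_memory is the name used in the paragraph above. The classes default_regex_memory, counting_regex_memory and pattern_holder (a stand-in for a compiled expression) are invented here for illustration. C++17 later standardized a similar design as std::pmr::memory_resource, used through std::pmr::polymorphic_allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical interface: every internal allocation goes through this object.
class regex_memory {
public:
    virtual ~regex_memory() = default;
    virtual void* allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* p, std::size_t bytes) = 0;
};

// Default behaviour: forward to the global operator new/delete.
class default_regex_memory : public regex_memory {
public:
    void* allocate(std::size_t bytes) override { return ::operator new(bytes); }
    void deallocate(void* p, std::size_t) override { ::operator delete(p); }
};

// Application override (hypothetical) that tracks how many bytes are live.
class counting_regex_memory : public default_regex_memory {
public:
    std::size_t live_bytes = 0;
    void* allocate(std::size_t bytes) override {
        void* p = default_regex_memory::allocate(bytes);
        live_bytes += bytes;
        return p;
    }
    void deallocate(void* p, std::size_t bytes) override {
        live_bytes -= bytes;
        default_regex_memory::deallocate(p, bytes);
    }
};

// Hypothetical stand-in for a compiled expression: it copies its pattern into
// storage obtained from the supplied regex_memory, never from new directly.
class pattern_holder {
public:
    pattern_holder(const char* pattern, regex_memory* mem)
        : mem_(mem), size_(std::strlen(pattern)) {
        assert(mem_ != nullptr);
        data_ = static_cast<char*>(mem_->allocate(size_ + 1));
        std::memcpy(data_, pattern, size_ + 1);  // includes the terminator
    }
    ~pattern_holder() { mem_->deallocate(data_, size_ + 1); }
    pattern_holder(const pattern_holder&) = delete;
    pattern_holder& operator=(const pattern_holder&) = delete;
    const char* pattern() const { return data_; }

private:
    regex_memory* mem_;
    std::size_t size_;
    char* data_ = nullptr;
};
```

In this sketch the memory object is a constructor argument rather than a template parameter, so the holder's type stays the same whichever regex_memory implementation is passed in.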
>>So my question is, what are the chances that these kinds of modifications
>>get implemented in the actual library, and possibly in the next C++
>>standard library as well?
> As noted above, I was asked to remove this feature during review, so I'm
> not *that* keen to put it back in! As far as the standard is concerned
> it's basically a done deal at this stage in the process - too late for
> such a big change. WRT Boost I'd really like to get a handle on the issue
> better before making judgment - you seem to be one of very few people
> actually using custom allocators ;-)
> Regards, John.
I hope my explanations clear things up a bit. I know that I speak for just a
few people, but things like these are exactly what keeps game developers in
particular away from Boost, so I hope you will at least look into this
further.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk