From: Thomas Maeder (maeder_at_[hidden])
Date: 2001-11-25 09:06:39
While looking for a performance bottleneck in a program, I found out
some interesting things about the allocation behavior of regex_match.
The regular expression I looked at is suggested by [1]:
boost::regex const URIexp("(?:([^:/?#]+):)?"
"(?://([^/?#:]*)(?::([^/?#]*))?)?"
"([^?#]*)"
"(\\?([^#]*))?"
"(#(.*))?");
(scheme, host:port, path, query,
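For readers who want to see which capture group yields which URI component, here is a minimal sketch. It uses std::regex as a stand-in for boost::regex so that it needs no external library; that substitution is an assumption, and the names UriParts and split_uri are hypothetical helpers, not part of Boost or of the program discussed here. The group numbers are read off the pattern itself.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical holder for the components; not part of Boost.
struct UriParts {
    std::string scheme, host, port, path, query, fragment;
};

// Same pattern as URIexp above, compiled with std::regex (assumption:
// the default ECMAScript grammar treats this pattern the same way).
inline bool split_uri(const std::string& uri, UriParts& out) {
    static const std::regex uri_exp(
        "(?:([^:/?#]+):)?"                  // group 1: scheme
        "(?://([^/?#:]*)(?::([^/?#]*))?)?"  // groups 2, 3: host, port
        "([^?#]*)"                          // group 4: path
        "(\\?([^#]*))?"                     // group 6: query (5 includes '?')
        "(#(.*))?");                        // group 8: fragment (7 includes '#')
    std::smatch m;
    if (!std::regex_match(uri, m, uri_exp)) return false;
    out.scheme   = m[1].str();
    out.host     = m[2].str();
    out.port     = m[3].str();
    out.path     = m[4].str();
    out.query    = m[6].str();
    out.fragment = m[8].str();
    return true;
}
```

Groups that do not take part in a match come back as empty strings, so a URI without a port or a query simply leaves those fields empty.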
By providing my own allocator to the query_matches object used, I
noticed the following allocations:
- few allocations of relatively small chunks (<100 chars)
- hundreds of allocations of chunks of size 164 chars
- at most 4 or 5 chunks of 164 chars are used at the same time
The number 164 seems to be related to the size and complexity of URIexp.
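Numbers like these can be collected with an instrumented allocator. The following is only a sketch of the idea, not the allocator actually used for the measurements above: it assumes the C++11 minimal allocator interface, and AllocLog and CountingAllocator are hypothetical names. It records how often each request size occurs and how many chunks are live at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>
#include <vector>

// Hypothetical shared log, filled in by every copy of the allocator.
struct AllocLog {
    std::map<std::size_t, std::size_t> calls_by_size;  // bytes -> allocate() calls
    std::size_t live = 0;       // chunks currently allocated
    std::size_t peak_live = 0;  // largest value of `live` seen so far
};

template <class T>
struct CountingAllocator {
    using value_type = T;
    AllocLog* log;

    explicit CountingAllocator(AllocLog* l) : log(l) {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>& other) : log(other.log) {}

    T* allocate(std::size_t n) {
        ++log->calls_by_size[n * sizeof(T)];
        if (++log->live > log->peak_live) log->peak_live = log->live;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        --log->live;
        ::operator delete(p);
    }
};

// Two allocators are interchangeable when they share the same log.
template <class T, class U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return a.log == b.log;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return !(a == b);
}
```

Passing such an allocator to any allocator-aware container and dumping calls_by_size afterwards gives a histogram of request sizes; peak_live corresponds to the "used at the same time" figure.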
I wrote a small, simple allocator for use with query_matches; I don't
think it is very portable at the moment. Instead of freeing the chunks
passed to deallocate(), it adds them to a singly linked list and reuses
them for subsequent allocations of chunks of the same size. It has
significantly improved the performance of the entire program.
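The caching scheme described above might look roughly like the sketch below. This is not the allocator from the program in question, only an illustration under stated assumptions: C++11, one free list per chunk size kept in a std::map, no thread safety, and no support for over-aligned types. ChunkCache and CachingAllocator are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>

// Hypothetical cache: freed chunks go onto a singly linked list per chunk
// size and are handed out again for later requests of that same size.
class ChunkCache {
    struct Node { Node* next; };
    std::map<std::size_t, Node*> free_lists_;  // chunk size -> list head

    // A cached chunk must be able to hold the list link.
    static std::size_t rounded(std::size_t bytes) {
        return bytes < sizeof(Node) ? sizeof(Node) : bytes;
    }

public:
    ChunkCache() = default;
    ChunkCache(const ChunkCache&) = delete;
    ChunkCache& operator=(const ChunkCache&) = delete;

    // Cached chunks are returned to the heap only when the cache dies.
    ~ChunkCache() {
        for (auto& entry : free_lists_) {
            while (Node* n = entry.second) {
                entry.second = n->next;
                ::operator delete(n);
            }
        }
    }

    void* allocate(std::size_t bytes) {
        bytes = rounded(bytes);
        auto it = free_lists_.find(bytes);
        if (it != free_lists_.end() && it->second) {
            Node* n = it->second;  // reuse the most recently freed chunk
            it->second = n->next;
            return n;
        }
        return ::operator new(bytes);
    }

    void deallocate(void* p, std::size_t bytes) {
        Node*& head = free_lists_[rounded(bytes)];
        head = new (p) Node{head};  // push the chunk instead of freeing it
    }
};

// Thin adapter so that containers can draw their memory from a ChunkCache.
template <class T>
struct CachingAllocator {
    using value_type = T;
    ChunkCache* cache;

    explicit CachingAllocator(ChunkCache* c) : cache(c) {}
    template <class U>
    CachingAllocator(const CachingAllocator<U>& other) : cache(other.cache) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(cache->allocate(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) { cache->deallocate(p, n * sizeof(T)); }
};

template <class T, class U>
bool operator==(const CachingAllocator<T>& a, const CachingAllocator<U>& b) {
    return a.cache == b.cache;
}
template <class T, class U>
bool operator!=(const CachingAllocator<T>& a, const CachingAllocator<U>& b) {
    return !(a == b);
}
```

With hundreds of requests for 164-char chunks and only 4 or 5 live at once, a cache like this would go to the heap a handful of times and serve the rest from its list. The price is that memory is held until the cache is destroyed, and that the first deallocate() of a new size may itself allocate a map node.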
- is my analysis accurate only for my case, or is what I observed the
general behavior of query_match?
- am I reinventing anything?
- is there interest in boostifying this allocator?
Thomas
[1] Uniform Resource Identifiers (URI): Generic Syntax
(http://www.ietf.org/rfc/rfc2396.txt)