From: Thomas Maeder (maeder_at_[hidden])
Date: 2001-11-25 09:06:39


While looking for a performance bottleneck in a program, I found out
some interesting things about the allocation behavior of regex_match.

The regular expression I looked at is suggested by [1]:

  boost::regex const URIexp("(?:([^:/?#]+):)?"
                            "(?://([^/?#:]*)(?::([^/?#]*))?)?"
                            "([^?#]*)"
                            "(\\?([^#]*))?"
                            "(#(.*))?");
  (scheme, host:port, path, query,
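
To make the capture groups concrete, here is a minimal sketch that runs the same pattern through std::regex instead of boost::regex (a substitution made only so the example is self-contained). The names UriParts and split_uri are hypothetical and do not come from the program discussed in this post.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical holder for the captured pieces (not part of the original program).
struct UriParts {
    std::string scheme, host, port, path, query, fragment;
};

// Returns true and fills `out` when `uri` matches the expression as a whole.
inline bool split_uri(const std::string& uri, UriParts& out) {
    // Same pattern as URIexp above, compiled with std::regex (ECMAScript grammar).
    static const std::regex uri_exp("(?:([^:/?#]+):)?"
                                    "(?://([^/?#:]*)(?::([^/?#]*))?)?"
                                    "([^?#]*)"
                                    "(\\?([^#]*))?"
                                    "(#(.*))?");
    std::smatch m;
    if (!std::regex_match(uri, m, uri_exp)) return false;
    assert(m.size() == 9);  // whole match + 8 capturing groups

    // Groups 5 and 7 include the leading '?' and '#'; groups 6 and 8 do not.
    out.scheme   = m[1].str();
    out.host     = m[2].str();
    out.port     = m[3].str();
    out.path     = m[4].str();
    out.query    = m[6].str();
    out.fragment = m[8].str();
    return true;
}
```

Groups that do not participate in the match come back as empty strings, so a URI without a port or a query simply leaves those fields empty.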

By providing my own allocator to the query_matches object used, I
noticed the following allocations:
- few allocations of relatively small chunks (<100 chars)
- hundreds of allocations of chunks of size 164 chars
- at most 4 or 5 chunks of 164 chars are used at the same time
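
Observations like these can be collected with an instrumented allocator. The following is only a sketch of the idea, written against the standard C++11 allocator requirements rather than the 2001 Boost interface; CountingAllocator, AllocStats, alloc_stats and touch_buffer are hypothetical names, and std::vector stands in for the container that would receive the allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>
#include <vector>

// Hypothetical bookkeeping: calls per request size, plus live and peak chunk counts.
struct AllocStats {
    std::map<std::size_t, std::size_t> calls_by_size;  // bytes -> allocate() calls
    std::size_t live = 0;                              // chunks currently in use
    std::size_t peak = 0;                              // most chunks in use at once
};

inline AllocStats& alloc_stats() {
    static AllocStats s;  // one global tally; not thread-safe
    return s;
}

template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        AllocStats& s = alloc_stats();
        ++s.calls_by_size[n * sizeof(T)];
        if (++s.live > s.peak) s.peak = s.live;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept {
        AllocStats& s = alloc_stats();
        assert(s.live > 0);  // every deallocate must pair with an allocate
        --s.live;
        ::operator delete(p);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Stand-in workload: one buffer of `n` chars, allocated and released again.
inline void touch_buffer(std::size_t n) {
    std::vector<char, CountingAllocator<char> > v;
    v.reserve(n);
}
```

After a run, calls_by_size shows how often each chunk size was requested, and peak shows how many chunks were in use at the same time.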

The number 164 seems to be related to the size and complexity of URIexp.

I wrote myself a small, simple allocator to be used by query_matches; I
don't think that it is very portable at the moment. Instead of freeing
the chunks passed to deallocate(), it adds them to a singly linked
list and reuses them for later allocations of chunks of the same
size. It has significantly improved the performance of the entire program.
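
The recycling scheme described above could look roughly like the sketch below. It is not the allocator from this post: it targets the standard C++11 allocator requirements, and RecyclingAllocator, FreeChunk and free_list_head are hypothetical names. It assumes a single thread, types without extended alignment, and that it is acceptable for parked chunks never to go back to the system.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Header written into a chunk while it is parked on the free list.
struct FreeChunk {
    FreeChunk*  next;
    std::size_t bytes;
};

// Head of the singly linked list of parked chunks (global, not thread-safe).
inline FreeChunk*& free_list_head() {
    static FreeChunk* head = nullptr;
    return head;
}

template <class T>
struct RecyclingAllocator {
    using value_type = T;
    RecyclingAllocator() = default;
    template <class U>
    RecyclingAllocator(const RecyclingAllocator<U>&) noexcept {}

    // Every chunk must be large enough to hold the header once it is parked.
    static std::size_t chunk_bytes(std::size_t n) {
        const std::size_t bytes = n * sizeof(T);
        return bytes < sizeof(FreeChunk) ? sizeof(FreeChunk) : bytes;
    }

    T* allocate(std::size_t n) {
        const std::size_t bytes = chunk_bytes(n);
        // Walk the list looking for a parked chunk of exactly this size.
        for (FreeChunk** link = &free_list_head(); *link; link = &(*link)->next) {
            if ((*link)->bytes == bytes) {
                FreeChunk* hit = *link;
                *link = hit->next;  // unlink and hand the chunk out again
                return static_cast<T*>(static_cast<void*>(hit));
            }
        }
        return static_cast<T*>(::operator new(bytes));  // nothing to reuse
    }

    void deallocate(T* p, std::size_t n) noexcept {
        assert(p != nullptr);
        // Park the chunk at the head of the list instead of freeing it.
        free_list_head() = ::new (static_cast<void*>(p))
            FreeChunk{free_list_head(), chunk_bytes(n)};
    }
};

template <class T, class U>
bool operator==(const RecyclingAllocator<T>&, const RecyclingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const RecyclingAllocator<T>&, const RecyclingAllocator<U>&) { return false; }
```

With only a handful of distinct chunk sizes in play, the linear search over the list stays short. A container would pick the allocator up as its allocator template argument, for example std::vector<char, RecyclingAllocator<char> >.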

- Is my analysis accurate only for my case, or is what I observed the
general behavior of query_match?
- Am I reinventing anything?
- Is there interest in boostifying this allocator?

Thomas

[1] Uniform Resource Identifiers (URI): Generic Syntax
(http://www.ietf.org/rfc/rfc2396.txt)

