From: George A. Heintzelman (georgeh_at_[hidden])
Date: 2001-03-19 18:02:35


There's a well-documented piece of behavior in the Boost Regex library
(a wonderful piece of work, BTW): regex_split 'eats' the passed-in
string, i.e., if you do:

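  std::deque<std::string> tokens;
  std::string MyString = "A B C D";
  // regex_split takes MyString by non-const reference and erases what it consumes
  boost::regex_split(std::back_inserter(tokens), MyString,
                     boost::regex(" "));
  for (std::deque<std::string>::iterator it = tokens.begin(),
         end = tokens.end(); it != end; ++it) {
    std::cout << *it << std::endl;
  }
  std::cout << std::endl;
  std::cout << MyString << std::endl;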

you'll get:
A
B
C
D
<blank line>
<blank line>

What I don't get is the reason why it does it this way, eating the
input string. I recognize that this pattern might happen fairly
frequently, but I have run into a situation where I want to use it with
a const string. Of course, I can just copy the string to avoid the
problem, but I'm paying some efficiency costs to do this: I copy the
string, then erase the copy inside regex_split, then destruct the
remnants. And, unless I am quite mistaken in my reading of the code,
I'm not going to gain any efficiency in exchange, since I will need to
copy the pieces into the output container in either case.

Since the split returns the size_t used, it seems that erasing the
consumed input by hand would be a one-liner, at no efficiency price, for
users who want the current behavior (ignoring backwards compatibility
issues, of course).

Finally, looking inside regex_split's implementation, it appears that
there are no real implementation details that require this behavior;
there's just a call to erase() before the function returns.

Any comments? Do others think this behavior ought to be changed?
Perhaps for backwards compatibility we could implement a
differently-named function on const strings, and re-implement
regex_split in terms of that. (Note: I do NOT recommend just
overloading on const with different behavior for the two cv-types. That
would be REALLY confusing.)
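
To make the suggestion concrete, here is a minimal sketch of the layering described above. It is not the Boost implementation: the names split_copy and split_eating are hypothetical, and std::regex stands in for boost::regex only so that the example is self-contained. It also assumes the whole input is consumed, so the wrapper simply clears the string.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical non-eating split: works on a const string and leaves it
// untouched. Returns the number of tokens written to `out`.
template <class OutputIterator>
std::size_t split_copy(OutputIterator out, const std::string& s,
                       const std::regex& sep) {
    std::size_t count = 0;
    // Submatch index -1 selects the text between separator matches.
    std::sregex_token_iterator it(s.begin(), s.end(), sep, -1), end;
    for (; it != end; ++it) {
        *out++ = it->str();  // the one copy needed in either design
        ++count;
    }
    return count;
}

// Hypothetical wrapper that reproduces the "eating" behavior in terms of
// split_copy. Assumption: the entire input is consumed by the split.
template <class OutputIterator>
std::size_t split_eating(OutputIterator out, std::string& s,
                         const std::regex& sep) {
    const std::size_t count = split_copy(out, s, sep);
    s.erase();  // the only extra step needed for the old semantics
    return count;
}
```

With this layering, a caller holding a const string would call split_copy(std::back_inserter(tokens), MyString, std::regex(" ")) and pay for no extra copy of the input, while existing callers of the eating form would see the same observable behavior as before.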

I apologize for these comments coming 6 months after Regex's formal
review; my only excuse is that I wasn't a member of boost at the time,
and I have only now started to use it heavily. I looked through the
review for comments on regex_split and didn't find any, though,
so either this wasn't looked at or others liked it this way.

George Heintzelman
