
From: George A. Heintzelman (georgeh_at_[hidden])
Date: 2001-03-19 18:02:35


Hi,

There's a well-documented piece of behavior in the boost Regex library
(wonderful piece of work, BTW): regex_split 'eats' the passed-in
string, i.e., if you do:

  std::deque<std::string> tokens;
  std::string MyString = "A B C D";
  boost::regex_split(std::back_inserter(tokens), MyString,
                     boost::regex(" "));
  for (std::deque<std::string>::iterator it = tokens.begin(),
         end = tokens.end(); it != end; ++it) {
    std::cout << *it << std::endl;
  }
  std::cout << std::endl;
  std::cout << MyString << std::endl;

you'll get:
A
B
C
D
<blank line>
<blank line>

What I don't get is the reason why it does it this way, eating the
input string. I recognize that this pattern might happen fairly
frequently, but I have run into a situation where I want to use it with
a const string. Of course, I can just copy the string to avoid the
problem, but I'm paying some efficiency costs to do this: I copy the
string, then erase the copy inside regex_split, then destruct the
remnants. And, unless I am quite mistaken in my reading of the code,
I'm not going to gain any efficiency in exchange, since I will need to
copy the pieces into the output container in either case.

Since the split returns the size_t used, it seems that users on the
other side of the issue could get the current behavior back with a
one-liner, at no efficiency price (ignoring backwards compatibility
issues, of course).

Finally, looking inside regex_split's implementation, it appears that
there are no real implementation details that require this behavior;
there's just a call to erase() before the function returns.

Any comments? Do others think this behavior ought to be changed?
Perhaps for backwards compatibility we could implement a
differently-named function on const strings, and re-implement
regex_split in terms of that. (Note: I do NOT recommend just
overloading on const with different behavior for the two cv-types. That
would be REALLY confusing.)
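
A rough sketch of that layering, under loud assumptions: the names
split_copy and split_consume are hypothetical, std::regex stands in for
boost::regex so the sketch builds without Boost, and the splitting logic
is a simplification that does not reproduce everything regex_split does
(no sub-expression output, no max_split argument). It only illustrates
that the consuming version can be a thin wrapper around one that takes
a const string.

```cpp
#include <cstddef>
#include <deque>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical name: a non-consuming split that leaves `s` untouched.
// std::regex is a stand-in for boost::regex (an assumption of this sketch).
template <class OutputIterator>
std::size_t split_copy(OutputIterator out, const std::string& s,
                       const std::regex& sep)
{
    std::size_t count = 0;
    // Submatch index -1 selects the text between separator matches.
    std::sregex_token_iterator it(s.begin(), s.end(), sep, -1), end;
    for (; it != end; ++it) {
        *out++ = it->str();  // each piece is copied into the output
        ++count;
    }
    return count;            // number of tokens written
}

// Hypothetical name: the consuming variant, written in terms of the
// non-consuming one.
template <class OutputIterator>
std::size_t split_consume(OutputIterator out, std::string& s,
                          const std::regex& sep)
{
    std::size_t count = split_copy(out, s, sep);
    s.erase();  // the only step that needs a mutable string
    return count;
}
```

With something along these lines, a caller holding a const std::string
would call split_copy directly and skip the copy-then-erase round trip,
while existing callers would keep the 'eating' behavior through
split_consume.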

I apologize for these comments coming 6 months after Regex's formal
review; my only excuse is that I wasn't a member of boost at the time,
and I only now have started to use it heavily. I looked through the
review for comments on regex_split and didn't find any, though,
so either this wasn't looked at or others liked it this way.

George Heintzelman
georgeh_at_[hidden]

