From: George A. Heintzelman (georgeh_at_[hidden])
Date: 2001-03-20 13:50:37


> >What I don't get is the reason why it [regex_split] does it this way, eating the
> >input string.

[snip]

> It's done that way, because that's the way that perl does it. You can also
> specify an upper limit to the number of items to be split, in which case
> some text may be left in the string. BTW the return value is the number of
> *items* split out, not the number of characters removed, so there is no way
> of knowing how much input has been processed unless you erase processed
> text from the input.

That perl does it that way is IMHO a good reason to supply a function
which does it perl's way, but not a good reason not to also have one
which does it the other way, especially when you can be more efficient
in a fairly large set of circumstances. There are two points here. The
first is a const correctness issue, something which perl doesn't even
have the concept of. The second is a potential efficiency cost; in perl
I can well believe that perl-style is faster than the non-destructive
version, but that is certainly not true with our current
implementation. I think a C++ library should be easily useable in
const-correct C++-style, and hew as close as reasonably possible to the
standard library principle of not paying for a feature you're not
using. The current interface of regex_split violates both of these
ideas.

Am I alone in this opinion? If so I'll shut up and go away. :)

> BTW it's not that hard to roll your own function that does what you want -
> really all you need is a custom functor to pass to regex_grep.

Of course it's not hard. But to do it right, one winds up duplicating
98% of the code in regex_split. For the library writer, on the other
hand, it
is easy to do it instead with a helper function which returns a pair of
(items, characters) in the library, and have that used by both eating
and non-eating functions.
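
A rough sketch of the arrangement I have in mind, written against
std::regex rather than the library's own types so that it stands alone.
The names split_impl, split_eating and split_const are hypothetical and
not part of any existing interface, and the token rules (empty tokens
skipped, trailing text counted as a final item) are simplifying
assumptions rather than regex_split's exact behaviour:

```cpp
#include <cstddef>
#include <limits>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shared helper: splits `in` on `sep`, appending at most
// `max_items` tokens to `out`. Returns (items split out, characters processed).
std::pair<std::size_t, std::size_t>
split_impl(std::vector<std::string>& out, const std::string& in,
           const std::regex& sep, std::size_t max_items)
{
    std::size_t items = 0;
    std::size_t pos = 0;  // everything before pos has been processed
    std::sregex_iterator it(in.begin(), in.end(), sep), end;
    for (; it != end && items < max_items; ++it) {
        const std::size_t start = static_cast<std::size_t>(it->position());
        if (start > pos) {  // assumption: empty tokens are skipped
            out.push_back(in.substr(pos, start - pos));
            ++items;
        }
        pos = start + static_cast<std::size_t>(it->length());
    }
    // No separators left: the remaining text is one final token.
    if (it == end && items < max_items && pos < in.size()) {
        out.push_back(in.substr(pos));
        ++items;
        pos = in.size();
    }
    return {items, pos};
}

// Perl-style "eating" version: erases the processed text from the input
// and returns the number of items split out.
std::size_t
split_eating(std::vector<std::string>& out, std::string& in,
             const std::regex& sep,
             std::size_t max_items = std::numeric_limits<std::size_t>::max())
{
    const auto r = split_impl(out, in, sep, max_items);
    in.erase(0, r.second);
    return r.first;
}

// Non-eating version: the input stays const, and the caller gets both the
// item count and the number of characters processed.
std::pair<std::size_t, std::size_t>
split_const(std::vector<std::string>& out, const std::string& in,
            const std::regex& sep,
            std::size_t max_items = std::numeric_limits<std::size_t>::max())
{
    return split_impl(out, in, sep, max_items);
}
```

With this split, the const version never copies or modifies the input,
and a caller who passes an item limit can still tell from the second
member of the pair where the unprocessed text begins.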

I'm a little stuck on a good distinguishable name for a non-eating
version, though. regex_tokenize might be okay, I guess, but it doesn't
make the difference clear in the name. OTOH, neither does regex_split,
so maybe we could ignore the issue.

George Heintzelman
georgeh_at_[hidden]

