Subject: Re: [boost] [xpressive] regex_token_iterator - bug, feature, misunderstanding query?
From: Eric Niebler (eric_at_[hidden])
Date: 2008-12-03 12:47:25

Michael Goldshteyn wrote:
> OK, so I want to use the sregex_token_iterator functionality to split
> a data string. The data string contains:
> /a/b//c/
> The delimiter is the forward slash and I do want empty strings. I
> expect to get:
> {}{a}{b}{}{c}{}
> What I actually get is:
> {}{a}{b}{}{c}
> The empty string after {c}, which I expect because the data string
> ended in a forward slash, is missing. What do I have to do to get the
> empty string after {c} if the data string ends in a forward slash?


This is by design. It behaves the same as Boost.Regex and Perl's split()
function. Try running this Perl code:

$str = '/a/b//c/';
@rg = split(/\//, $str);
foreach (@rg) {
   printf("{%s}", $_);
}

It prints:

{}{a}{b}{}{c}

I'm not 100% sure I understand this behavior myself, but the C++0x
standard is very clear about this case. The specification of
regex_token_iterator::operator++ says:

> Otherwise, if any of the values stored in subs is equal to -1 and
> prev->suffix().length() is not 0 the operator sets *this to a suffix
> iterator that points to the range [prev->suffix().first,
> prev->suffix().second). Otherwise, sets *this to an end-of-sequence
> iterator.

In your case, subs[0] is -1 and prev->suffix().length() is 0 after
matching the trailing '/', so *this becomes the end-of-sequence iterator
and we're done. I don't remember the rationale for requiring the
suffix to be non-empty. Perhaps it is for parity with Perl.
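
To make the rule concrete, here is a minimal sketch. It uses
std::sregex_token_iterator from the standard library rather than
xpressive; that is an assumption made so the example needs no Boost
headers, on the basis that both follow the wording quoted above. The
helper names split_tokens and split_keep_trailing are hypothetical and
not part of either library. The second helper shows one possible way to
get the trailing empty field back: it checks whether the last delimiter
match has an empty suffix and, if so, appends an empty token by hand.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split `text` on `delim` by asking the token iterator for submatch -1,
// i.e. the pieces of input that lie between delimiter matches.
std::vector<std::string> split_tokens(const std::string& text,
                                      const std::regex& delim) {
    std::vector<std::string> out;
    std::sregex_token_iterator it(text.begin(), text.end(), delim, -1);
    std::sregex_token_iterator end;
    for (; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}

// Hypothetical workaround: same as split_tokens, but re-adds the empty
// field that is dropped when the input ends with a delimiter.
std::vector<std::string> split_keep_trailing(const std::string& text,
                                             const std::regex& delim) {
    std::vector<std::string> out = split_tokens(text, delim);

    // Walk the delimiter matches; after the loop the flag reflects
    // whether the last match was followed by an empty suffix.
    bool last_suffix_empty = false;
    std::sregex_iterator m(text.begin(), text.end(), delim);
    std::sregex_iterator end;
    for (; m != end; ++m) {
        last_suffix_empty = (m->suffix().length() == 0);
    }
    if (last_suffix_empty) {
        out.push_back(std::string());
    }
    return out;
}
```

With the input /a/b//c/ and the delimiter regex /, split_tokens returns
the five tokens {}{a}{b}{}{c}, while split_keep_trailing returns six,
ending in an empty string. The workaround assumes a delimiter that never
matches the empty string; a pattern that can match zero characters would
need more careful handling.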

Eric Niebler
BoostPro Computing
