Subject: Re: [boost] [xpressive] regex_token_iterator - bug, feature, misunderstanding query?
From: Eric Niebler (eric_at_[hidden])
Date: 2008-12-03 12:47:25


Michael Goldshteyn wrote:
> OK, so I want to use the sregex_token_iterator functionality to split
> a data string. The data string contains:
>
> /a/b//c/
>
> The delimiter is the forward slash and I do want empty strings. I
> expect to get:
>
> {}{a}{b}{}{c}{}
>
> What I actually get is:
>
> {}{a}{b}{}{c}
>
> The empty string after {c}, which I expect because the data string
> ended in a forward slash, is missing. What do I have to do to get the
> empty string after {c} if the data string ends in a forward slash?

<snip>

This is by design. It behaves the same as Boost.Regex and Perl's split()
function. Try running this Perl code:

$str = '/a/b//c/';
@rg = split(/\//, $str);
foreach(@rg)
{
   printf("{%s}", $_);
}

It prints:

{}{a}{b}{}{c}

I'm not 100% sure I understand the reason for this behavior myself, but
the C++0x draft standard is very clear about this case. 28.12.2.4/5-6,
on regex_token_iterator::operator++, says:

> Otherwise, if any of the values stored in subs is equal to -1 and
> prev->suffix().length() is not 0 the operator sets *this to a suffix
> iterator that points to the range [prev->suffix().first,
> prev->suffix().second). Otherwise, sets *this to an end-of-sequence
> iterator.

In your case, subs[0] is -1 and prev->suffix().length() is 0 after
matching the trailing '/', so *this becomes the end-of-sequence iterator
and we're done. I don't remember the rationale for requiring the suffix
to be non-empty. Perhaps it is for parity with Perl.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com
