From: Pavol Droba (droba_at_[hidden])
Date: 2005-01-29 07:22:36


On Fri, Jan 28, 2005 at 05:40:52PM -0600, Thore Karlsen wrote:
> On Sat, 16 Oct 2004 22:23:15 +0200, Pavol Droba <droba_at_[hidden]>
> wrote:
>
> >> I used the split function on the following string:
>
> >> vector<string> tokens;
> >> string str = "( 448 448 64 ) ( 448 0 64 ) ( 0 448 64 ) name 0 0 0 0.5 0.5 0 0 0";
> >> // I pass 'space', '(' and ')' into is_any_of()
> >> split( tokens, str, is_any_of( " ()" ), token_compress_on );
> >>
> >> I was supposed to get 18 tokens. The first nine values, the string "name"
> >> and the remaining eight digits. "split" always returns a string vector
> >> containing 19 tokens, where the first element of the vector is an empty
> >> string. Why does the function insert empty strings into the collection? Can
> >> I use split to obtain the 18 wanted tokens?
>
> >Split is designed not to ignore any token. Imagine that you need to
> >parse a comma-delimited string. Even an empty string can be a valid
> >token. This is why your result starts with an empty string: your
> >input starts with a separator.
>
> Here is a case that doesn't seem to behave properly: Input ending with a
> separator. E.g.:
>
> string s = ",a,";
> vector<string> tokens;
> split(tokens, s, is_punct(), token_compress_off);
>
> This results in a vector containing "" and "a", but not the final "".
>
> This asymmetrical behavior feels like a bug to me. Any thoughts?
>

Hmm, your reasoning seems logical. The behaviour should not be asymmetric.
Now the question is which way to go: whether it is better to include the
trailing empty token or to remove the leading one.

I think that including the trailing part is better. I will see how to fix it.
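
To make the intended symmetric behaviour concrete, here is a minimal sketch
using only the standard library. It is not the Boost implementation and not
the eventual fix; split_keep_all and drop_empty are hypothetical helper names
introduced just for this illustration. The first keeps every token, including
empty ones at both ends; the second filters the empty tokens out for callers
who do not want them.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): split `input` at every character
// contained in `separators`, keeping all tokens, including empty ones.
std::vector<std::string> split_keep_all(const std::string& input,
                                        const std::string& separators)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos =
            input.find_first_of(separators, start);
        if (pos == std::string::npos) {
            // Final token: empty when the input ends with a separator.
            tokens.push_back(input.substr(start));
            break;
        }
        // Token between the previous separator and this one; empty when
        // the input starts with a separator or two separators are adjacent.
        tokens.push_back(input.substr(start, pos - start));
        start = pos + 1;
    }
    return tokens;
}

// Hypothetical helper: return a copy of `tokens` without the empty strings.
std::vector<std::string> drop_empty(const std::vector<std::string>& tokens)
{
    std::vector<std::string> result;
    for (const std::string& token : tokens) {
        if (!token.empty()) {
            result.push_back(token);
        }
    }
    return result;
}
```

With this sketch, split_keep_all(",a,", ",") yields three tokens: "", "a"
and "". Passing the result through drop_empty leaves only "a", which is one
way to get just the non-empty tokens asked for in the original question.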

Thanks,

Pavol

