Boost Testing :

Subject: Re: [Boost-testing] String split behaviour
From: Venkateswara Rao Sanaka (moderncpp.venki_at_[hidden])
Date: 2015-05-21 14:32:20


Thanks Marshall for the reply.

In our code I hit a surprising result when splitting a string. We use the
hyphen to represent null data, and when I split a string containing only a
hyphen I expected zero tokens (I was wrong here). Dynamic languages behave
the same way; see the Python sample below:

Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "-Foo"
>>> tokens = s.split("-")
>>> print tokens
['', 'Foo']
>>>

An example in the Boost documentation would help users.
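
As an illustration of the counting rule Marshall describes in the quoted
reply below (n separators give n+1 strings, with runs of adjacent separators
treated as one), here is a minimal sketch in plain standard C++. It does not
use Boost: split_compressed and check_examples are hypothetical names made up
for this example, and the helper only mimics the results reported in this
thread for split with token_compress_on. It is not Boost's implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): splits `text` on `sep`, treating a
// run of adjacent separators as a single separator.
std::vector<std::string> split_compressed(const std::string &text, char sep) {
    std::vector<std::string> tokens(1);  // start with one (possibly empty) token
    bool in_separator_run = false;
    for (char c : text) {
        if (c == sep) {
            // Only the first separator of a run opens a new token.
            if (!in_separator_run) tokens.emplace_back();
            in_separator_run = true;
        } else {
            tokens.back().push_back(c);
            in_separator_run = false;
        }
    }
    return tokens;
}

// Checks the inputs discussed in this thread.
void check_examples() {
    using V = std::vector<std::string>;
    assert((split_compressed("-", '-') == V{"", ""}));        // lone separator
    assert((split_compressed("-Foo", '-') == V{"", "Foo"}));  // leading separator
    assert((split_compressed("Foo-", '-') == V{"Foo", ""}));  // trailing separator
    assert((split_compressed("Foo--", '-') == V{"Foo", ""})); // run collapsed
    assert((split_compressed("Foo", '-') == V{"Foo"}));       // no separator
}
```

The lone "-" case yields two empty tokens because the single separator has
an empty string on each side of it, which matches the "size of tokens 2"
output in my original report.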

The following command-line examples show the same behaviour:

$ echo "a-b" | awk -F "-" '{for (i=1; i <= NF; i++) printf "%s:", $i}' --->
This will print a:b:
$ echo "-" | awk -F "-" '{for (i=1; i <= NF; i++) printf "%s:", $i}' --->
This will print :: (i.e. two empty strings, each followed by a colon)

In fact, the second command-line example was the reason behind my confusion :)

Thanks to all the Boost developers. Great work.

On Thu, May 21, 2015 at 1:59 AM, Marshall Clow <mclow.lists_at_[hidden]>
wrote:

> On Wed, May 20, 2015 at 11:32 AM, Venkateswara Rao Sanaka <
> moderncpp.venki_at_[hidden]> wrote:
>
>> Hi,
>>
>> I am getting two empty strings from the following program,
>>
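>> #include <boost/algorithm/string.hpp>
>> #include <iostream>
>> #include <string>
>> #include <vector>
>>
>> using namespace std;
>> using namespace boost;
>>
>> void boost_split_test() {
>>     const string text("-");
>>     vector<string> tokens;
>>
>>     // Split on '-' and merge runs of adjacent separators.
>>     split(tokens, text, boost::is_any_of("-"), token_compress_on);
>>
>>     cout << "size of tokens " << tokens.size() << '\n';
>>     for (auto const &e : tokens)
>>         cout << e.size() << '\n';
>> }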
>>
>>
>> Output:
>>
>>
>> size of tokens 2
>> 0
>> 0
>>
>>
>> Is this the expected output? I was expecting zero split parts. Could
>> someone clarify?
>>
>>
> This seems reasonable to me.
>
> You asked it to split the string containing a single dash into parts
> separated by dashes.
> The string gets split into an empty string, a dash (which is not returned
> to you, being the separator), and an empty string.
>
> Consider splitting the input string "Foo-" (or "-Foo") compared to "Foo".
> One gives two strings (one before the dash, one after the dash), the other
> gives one string (because there are no dashes).
>
> Given a string with "n" separators, you should get "n+1" strings back
> (with the proviso that consecutive separators are collapsed together, so
> "Foo--" is treated the same as "Foo-").
>
> -- Marshall
>
> P.S. Checking the tests, I notice that there's no coverage for this case
> (separators at the beginning or the end of the input). I'll put it on my
> list. Thanks!
>
>
> _______________________________________________
> Boost-Testing mailing list
> Boost-Testing_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/boost-testing
>

-- 
Thanks,
:) Venki.


Boost-testing list run by mbergal at meta-comm.com