From: Thore Karlsen (sid_at_[hidden])
Date: 2005-07-14 09:07:00


On Thu, 14 Jul 2005 13:32:15 +0100, "John Maddock"
<john_at_[hidden]> wrote:

[boost.regex dropping last empty token]

>The original rationale was "do the same thing as perl", for example:
>
>perl -e "print join(':', split(/;/, '')) .\"\\n\". join(':', split(/;/,
>';')) .\"\\n\". join(':', split(/;/, '1;2')) .\"\\n\". join(':', split(/;/,
>'1;2;')) .\"\\n\". join(':', split(/;/, ';1;2;'))"
>
>Outputs:
>
>1:2
>1:2
>:1:2
>
>Note no trailing blank fields, the Perl manual says:
>
>" split /PATTERN/,EXPR,LIMIT
>  split /PATTERN/,EXPR
>  split /PATTERN/
>  split   Splits a string into a list of strings and returns that list.
>          By default, empty leading fields are preserved, and empty
>          trailing ones are deleted."

But if I'm not mistaken, you're not really doing the same thing. Perl
drops _all_ empty trailing fields, and from the Boost.Regex description
it looks like you are only dropping the very last one. Perl also has the
option of keeping all empty trailing fields by using a negative number
for LIMIT, as you mentioned.
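
To make the difference concrete, here is a minimal sketch of the three
behaviours under discussion: keep every field, drop only the very last
empty field, or drop all trailing empty fields as Perl does by default.
It is not Boost.Regex code. The names `Trailing` and `split_fields` are
invented for illustration, and it splits on a single character rather
than a regular expression. Note also that `KeepAll` returns one empty
field for an empty input, whereas Perl returns no fields at all when the
input string is empty.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical policy names, invented for this sketch.
enum class Trailing {
    KeepAll,   // keep every field, like a negative LIMIT in Perl
    DropLast,  // drop at most one trailing empty field
    DropAll    // drop every trailing empty field (Perl's default)
};

// Split `text` on a single-character delimiter, then trim trailing empty
// fields according to `policy`. Leading and interior empty fields are kept.
std::vector<std::string> split_fields(const std::string& text, char delim,
                                      Trailing policy) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = text.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(text.substr(start));  // final field, may be empty
            break;
        }
        fields.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }

    if (policy == Trailing::DropLast) {
        if (!fields.empty() && fields.back().empty()) fields.pop_back();
    } else if (policy == Trailing::DropAll) {
        while (!fields.empty() && fields.back().empty()) fields.pop_back();
    }
    return fields;
}
```

For the input `1;2;;`, `KeepAll` yields four fields, `DropLast` yields
three, and `DropAll` yields two. The three modes only differ when the
input ends in one or more delimiters.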

>It also kind of makes sense to me: if you want to split on a delimiter, then
>a trailing delimiter does not normally mean you want a trailing blank field:
>indeed trailing delimiters are quite commonly used (think C++ array syntax
>as one example).

I can't speak for everyone else, but I can say that in many of my splits
I would want the last empty field to be retained. I'm parsing
comma/tab/semicolon-separated log lines, CSV files, custom protocols,
and other things where the last field is important, empty or not. An
empty field is still valid data, and the field count in my cases can
determine how I need to parse the data. (For keeping compatibility with
old log file formats, for instance.)
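
As a sketch of why the field count matters, the following picks a log
format version from the number of comma-separated fields. The layouts
(three fields for version 1, four for version 2) and the function names
are made up for illustration. The point is only that a line ending in a
delimiter has one more field than a splitter that drops the trailing
empty field would report.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Number of delimiter-separated fields, counting empty ones, including a
// trailing empty field after a final delimiter.
std::size_t count_fields(const std::string& line, char delim) {
    return static_cast<std::size_t>(
               std::count(line.begin(), line.end(), delim)) + 1;
}

// Hypothetical layouts: version 1 lines have 3 fields, version 2 lines
// have 4. Returns 0 for an unrecognized field count.
int log_format_version(const std::string& line) {
    switch (count_fields(line, ',')) {
        case 3: return 1;
        case 4: return 2;
        default: return 0;
    }
}
```

Here `2005-07-14,INFO,started,` is a version 2 line whose fourth field
happens to be empty. If the trailing empty field were dropped, it would
be indistinguishable from the version 1 line `2005-07-14,INFO,started`.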

-- 
Be seeing you.
