Boost Users :

From: Florin Trofin (florint_at_[hidden])
Date: 2008-04-02 20:50:39

In reply to: Pavol Droba: "Re: [Boost-users] Boost tokenizer and range support"

It turns out that char_separator shamelessly constructs std::strings under
the covers, so I gained something, but not as much as I had hoped. The split
algorithm you mention requires a container to store the results, so you still
have to do one allocation, correct?
Frustrating! In theory, one should be able to parse a sequence of tokens
without constructing or copying any strings.
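
A minimal sketch of that idea, using only the standard library rather than Boost.Tokenizer or StringAlgo. The names `token_view` and `for_each_token` are hypothetical and exist only for this illustration. Each token is a pair of pointers into the original buffer, so scanning allocates nothing and copies nothing:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical token type: a half-open range [first, last) into the input buffer.
struct token_view {
    const unsigned char* first;
    const unsigned char* last;
    std::size_t size() const { return static_cast<std::size_t>(last - first); }
};

// Calls fn(token_view) for every non-empty run of bytes not listed in `delims`.
// Each token only points into [begin, end); no string is built along the way.
template <class Fn>
void for_each_token(const unsigned char* begin, const unsigned char* end,
                    const char* delims, Fn fn)
{
    const std::size_t ndelims = std::strlen(delims);
    auto is_delim = [&](unsigned char c) {
        return std::memchr(delims, c, ndelims) != nullptr;
    };
    const unsigned char* p = begin;
    while (p != end) {
        while (p != end && is_delim(*p)) ++p;    // skip leading delimiters
        const unsigned char* start = p;
        while (p != end && !is_delim(*p)) ++p;   // scan to the end of the token
        if (start != p) fn(token_view{start, p});
    }
}
```

The callback receives each token as it is found, so no result container is needed either. Converting a token to the UTF16 output type would then be the only place where memory is allocated.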

Florin.

On Wed, Mar 26, 2008 at 12:54 AM, Pavol Droba <droba_at_[hidden]> wrote:

> Hi,
>
> Why don't you just use the split algorithm in the StringAlgo library?
>
> http://www.boost.org/doc/html/string_algo/usage.html#id1638440
>
>
> Regards,
> Pavol.
>
> Florin Trofin wrote:
> > Hi,
> >
> >
> > I've been using the boost tokenizer successfully in the past and I've
> > been quite happy with it. I was using it with std::string as my token
> > type, but now I need to use it differently because of performance
> > reasons (the input string is a raw UTF8 buffer (const unsigned char*)
> > and output is a specific UTF16 string class). So I thought: maybe I can
> > just tokenize the unsigned char buffer in place using
> > boost::iterator_range<const unsigned char*> as my token type.
> >
> > And it almost worked! With a hack:
> >
> > The tokenizer attempts to call assign() on my TokenType, but
> > boost::iterator_range doesn't have such a member function. I created a
> > wrapper class that simply delegates to the iterator_range's assignment
> > operator, and it now works!
> >
> > This is great because I have no more useless string constructions: I can
> > go directly from a raw UTF8 buffer to my output string type (UTF16
> > based) with only one conversion and no extra allocations! I still have
> > the nice syntax of boost tokenizer and the maximum efficiency!
> >
> > I think this solution should be mentioned in the tutorial docs because
> > it might not be obvious for everybody. Also, maybe we can eliminate the
> > hack I did by adding an assign() to the boost range interface (this
> > seems simpler to me than modifying the tokenizer to not call assign).
> >
> > Thanks for the great work you guys put into this library!
> >
> >
> > Best regards,
> >
> >
> > Florin.
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Boost-users mailing list
> > Boost-users_at_[hidden]
> > http://lists.boost.org/mailman/listinfo.cgi/boost-users
>
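
The wrapper described in the quoted message could look roughly like the sketch below. It is not the original code. To keep it compilable without Boost, `byte_range` is a hypothetical stand-in for boost::iterator_range<const unsigned char*>, `assignable_range` is the hypothetical wrapper, and `next_token` is a hypothetical stand-in for a tokenizer function that fills its token by calling assign(first, last):

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for boost::iterator_range<const unsigned char*> (assumption: used
// here only so the sketch builds without Boost). It has no assign() member.
struct byte_range {
    const unsigned char* first;
    const unsigned char* last;
    byte_range() : first(nullptr), last(nullptr) {}
    byte_range(const unsigned char* f, const unsigned char* l) : first(f), last(l) {}
    std::size_t size() const { return static_cast<std::size_t>(last - first); }
};

// Hypothetical wrapper: adds assign(first, last) by delegating to the
// underlying range's assignment operator.
struct assignable_range : byte_range {
    assignable_range() = default;
    void assign(const unsigned char* f, const unsigned char* l) {
        static_cast<byte_range&>(*this) = byte_range(f, l);
    }
};

// Stand-in for a tokenizer function: finds the next token in [next, end) and
// stores it through tok.assign(), which is the call the wrapper has to provide.
template <class Token>
bool next_token(const unsigned char*& next, const unsigned char* end,
                unsigned char delim, Token& tok)
{
    while (next != end && *next == delim) ++next;   // skip delimiters
    if (next == end) return false;                  // no more tokens
    const unsigned char* start = next;
    while (next != end && *next != delim) ++next;   // scan the token
    tok.assign(start, next);                        // points into the buffer, no copy
    return true;
}
```

With the real library, the same adapter shape would wrap boost::iterator_range and be passed as the tokenizer's token type; whether that compiles depends on the Boost version and the separator in use.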

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net