From: David Abrahams (dave_at_[hidden])
Date: 2004-04-06 19:17:23


1. http://www.boost.org/libs/tokenizer/tokenizer.htm says:

    template <
        class TokenizerFunc = char_delimiters_separator<char>,
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        class Iterator = std::string::const_iterator,
        class Type = std::string
    >
    class tokenizer

    Yet char_delimiters_separator is officially deprecated. Is that
    really intentional? Wow, it appears to be using the deprecated
    class template for the default!

    Now, I wanted to tokenize an input stream without putting it in a
    string first. It seems to be much harder than necessary:

      #include <map>
      #include <string>
      #include <iostream>
      #include <algorithm>   // std::for_each
      #include <boost/tokenizer.hpp>
      #include <boost/lambda/lambda.hpp>
      #include <iterator>

      int main()
      {
          typedef std::map<std::string, unsigned> fmap;

          // Seems awfully complicated
          boost::tokenizer<
              boost::char_delimiters_separator<char>,
              std::istreambuf_iterator<char>
          > t(
              // Extra parentheses avoid the "most vexing parse"
              (std::istreambuf_iterator<char>(std::cin))
             , std::istreambuf_iterator<char>()
          );

          fmap f;

          using namespace boost::lambda;

          // Count each token: ++f[token]
          std::for_each(t.begin(), t.end(), ++var(f)[_1]);

          for (fmap::iterator p = f.begin(), e = f.end(); p != e; ++p)
              std::cout << p->second << ": " << p->first << "\n";
      }

    I can think of lots of ways to simplify the interface, most of
    which center on eliminating redundant mentions of
    istreambuf_iterator<char>.

    When I throw the following text at it:
------
how much wood could a woodchuck chuck,
if a woodchuck could chuck wood?
------
    I get:

        2: a
        2: chuck
        2: could
        1: how
        1: if
        1: much
        2: wood
        2: woodchuck
    
    as desired. But if I replace char_delimiters_separator with
    char_separator, I get:

        15:

    What's up with that??

    Even if char_separator did what it is advertised to do (and it's
    not clear that it does), it wouldn't give me the simple "find the
    words" functionality of char_delimiters_separator... so I'm
    baffled by the deprecation.
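
    For comparison, here is a sketch that reimplements, in plain C++ and
    not with Boost's code, the behavior the char_separator documentation
    describes for its default constructor: std::isspace characters are
    dropped, std::ispunct characters are kept as tokens of their own, and
    empty tokens are dropped. The function name documented_default_split
    is hypothetical:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical re-implementation (NOT Boost's code) of the documented
// default char_separator behavior: whitespace is dropped, punctuation is
// returned as a separate token, and empty tokens are dropped.
std::vector<std::string> documented_default_split(const std::string& text)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : text)
    {
        const unsigned char c = static_cast<unsigned char>(ch);
        const bool dropped = std::isspace(c) != 0;
        const bool kept = std::ispunct(c) != 0;
        if (dropped || kept)
        {
            if (!current.empty())          // empty tokens are dropped
            {
                tokens.push_back(current);
                current.clear();
            }
            if (kept)                      // kept delimiters become tokens
                tokens.push_back(std::string(1, ch));
        }
        else
        {
            current += ch;
        }
    }
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

    On the two-line sample above this yields 15 tokens: the 13 words plus
    "," and "?". So even a correct implementation would count the
    punctuation alongside the words. The total also matches the "15:"
    line, which is consistent with every token having been found but
    coming back as an empty string.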

2. http://www.boost.org/libs/tokenizer/char_separator.htm says:

      explicit char_separator()

      The function std::isspace() is used to identify dropped
      delimiters and std::ispunct() is used to identify kept
      delimiters. In addition, empty tokens are dropped.

    which seems strange, since there's no ctor taking _functions_ to
    determine the kept/dropped delimiters, and nowhere does the text
    say that these functions are called internally.
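
    To illustrate that the documented default could equally be stated as
    character sets, the following sketch collects the characters that
    std::isspace and std::ispunct accept. It assumes the default "C"
    locale and scans only the 7-bit range 0-127; the struct and function
    names are hypothetical:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: the delimiter sets selected by the classification
// functions, written out as explicit strings.
// Assumes the default "C" locale; only codes 0-127 are examined.
struct delimiter_sets
{
    std::string dropped; // characters for which std::isspace is true
    std::string kept;    // characters for which std::ispunct is true
};

delimiter_sets default_delimiter_sets()
{
    delimiter_sets s;
    for (int c = 0; c < 128; ++c)
    {
        if (std::isspace(c))
            s.dropped += static_cast<char>(c);
        else if (std::ispunct(c))
            s.kept += static_cast<char>(c);
    }
    return s;
}
```

    In the "C" locale this gives 6 dropped characters (space, \t, \n, \v,
    \f, \r) and 32 kept ones. Assuming char_separator's constructor that
    takes a dropped-delimiters string and an optional kept-delimiters
    string, something like char_separator<char> sep(s.dropped.c_str(),
    s.kept.c_str()) would name those sets explicitly.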

Help?

Thanks,

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com
