From: David Abrahams (dave_at_[hidden])
Date: 2004-04-06 19:17:23
1. http://www.boost.org/libs/tokenizer/tokenizer.htm says:
    template <
        class TokenizerFunc = char_delimiters_separator<char>,
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        class Iterator = std::string::const_iterator,
        class Type = std::string
    >
    class tokenizer
Yet char_delimiters_separator is officially deprecated. Is that
really intentional? Wow, it appears to be using the deprecated
class template for the default!
Now, I wanted to tokenize an input stream, without putting it in a
string first. It seems to be much harder than necessary:
    #include <algorithm>   // std::for_each
    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/tokenizer.hpp>
    #include <boost/lambda/lambda.hpp>
    #include <iterator>

    int main()
    {
        typedef std::map<std::string, unsigned> fmap;

        // Seems awfully complicated
        boost::tokenizer<
            boost::char_delimiters_separator<char>,
            std::istreambuf_iterator<char>
        > t(
            (std::istreambuf_iterator<char>(std::cin))
          , std::istreambuf_iterator<char>()
        );

        fmap f;
        using namespace boost::lambda;

        // Count each token: f[token] is incremented once per occurrence
        std::for_each(t.begin(), t.end(), ++var(f)[_1]);

        // Print "count: word" for every distinct token
        for (fmap::iterator p = f.begin(), e = f.end(); p != e; ++p)
            std::cout << p->second << ": " << p->first << "\n";
    }
I can think of lots of ways to simplify the interface, most of
which center on eliminating redundant mentions of
istreambuf_iterator<char>.
When I throw the following text at it:
------
how much wood could a woodchuck chuck,
if a woodchuck could chuck wood?
------
I get:
2: a
2: chuck
2: could
1: how
1: if
1: much
2: wood
2: woodchuck
as desired. But if I replace char_delimiters_separator with
char_separator, I get:
15:
What's up with that??
Even if char_separator did what it is advertised to do (and it's
not clear that it does), it wouldn't give me the simple "find the
words" functionality of char_delimiters_separator... so I'm
baffled by the deprecation.
2. http://www.boost.org/libs/tokenizer/char_separator.htm says:
explicit char_separator()
The function std::isspace() is used to identify dropped
delimiters and std::ispunct() is used to identify kept
delimiters. In addition, empty tokens are dropped.
which seems strange in light of the fact that there's no ctor
taking _functions_ to be used to determine kept/dropped
delimiters, and nowhere in the text do you indicate that functions
are called internally.
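To make the question concrete, here is a literal reading of that documentation sentence, written in plain standard C++. The function name `split_as_documented` is made up for illustration, and the code assumes "kept" means each punctuation character becomes its own one-character token. It shows what the text appears to promise, not what char_separator actually does internally.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Illustration only: follows the quoted documentation, not Boost's code.
// Whitespace is a dropped delimiter, punctuation is a kept delimiter
// (emitted as a one-character token), and empty tokens are dropped.
std::vector<std::string> split_as_documented(const std::string& text)
{
    std::vector<std::string> tokens;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isspace(c) || std::ispunct(c)) {
            if (!current.empty()) {          // drop empty tokens
                tokens.push_back(current);
                current.clear();
            }
            if (std::ispunct(c))             // keep punctuation as a token
                tokens.push_back(std::string(1, text[i]));
        } else {
            current += text[i];
        }
    }
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

Under this reading, "chuck, if" would split into three tokens: the word, the comma, and the second word.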
Help?
Thanks,
-- Dave Abrahams Boost Consulting www.boost-consulting.com
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk