
From: John Torjo (john.lists_at_[hidden])
Date: 2003-12-03 21:44:57


Dear boosters,

While trying to implement a slice range (in rtl, the range template library),
I came across the token_iterator class.
While examining it, I found the TokenizerFunction concept too complicated:
it basically unites two concepts.

The way I see it, implementing a tokenizer involves two concepts:
1. finding where each token begins and ends (this can be implemented
incredibly simply, see below)

2. parsing the token, and returning the result.

By keeping the above separated, we get simpler code and more reusability.

A simple example: you want to parse each word in a file.
As results, you might want the words themselves, or (who knows?) only the
first 10 letters of each word, the first letter of each word, or the word
length. With the two concepts kept separate, the implementation is a breeze
(and efficient as well).

Here's a possible implementation of parsing words:
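#include <cctype>
#include <string>

// Do the adjacent characters 'first' and 'second' belong to the same word?
// A new word begins at the first whitespace that follows a non-space
// character, so each word carries its leading whitespace with it.
bool are_from_same_word( char first, char second) {
     // cast to unsigned char: isspace is undefined for negative values
     if ( !std::isspace( static_cast<unsigned char>(second))) return true;
     return std::isspace( static_cast<unsigned char>(first)) != 0;
}

// trims leading and trailing whitespace from [begin, end)
void ignore_space(const char *& begin, const char *&end) {
    while ( begin != end && std::isspace( static_cast<unsigned char>(*begin)))
      ++begin;
    while ( begin != end && std::isspace( static_cast<unsigned char>(end[-1])))
      --end;
}

// returns the word itself, without the surrounding whitespace
std::string parse_word( const char * begin, const char *end) {
    ignore_space(begin,end);
    return std::string( begin, end);
}

// returns only the length of the word
int parse_word_len( const char * begin, const char *end) {
    ignore_space(begin,end);
    return static_cast<int>(end - begin);
}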

... etc.

The above is a very generic solution that does not apply to strings only.
(Also, I was thinking of a better name: slice, which slices a range into
multiple ranges and computes something for each such range. The result
is another range.)
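
To make the separation concrete, here is a minimal sketch of how the two
pieces could be combined. The slice() function template below is a
hypothetical helper written only for illustration (it is not the rtl or
Boost interface), it assumes a C++11 compiler, and it returns a std::vector
instead of a lazy range to keep the example short. The word helpers are
repeated so that the block compiles on its own.

```cpp
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Concept 1: do two adjacent characters belong to the same token?
bool are_from_same_word(char first, char second) {
    if (!std::isspace(static_cast<unsigned char>(second))) return true;
    return std::isspace(static_cast<unsigned char>(first)) != 0;
}

// Trims leading and trailing whitespace from [begin, end).
void ignore_space(const char*& begin, const char*& end) {
    while (begin != end && std::isspace(static_cast<unsigned char>(*begin))) ++begin;
    while (begin != end && std::isspace(static_cast<unsigned char>(end[-1]))) --end;
}

// Concept 2: turn one token into a result.
std::string parse_word(const char* begin, const char* end) {
    ignore_space(begin, end);
    return std::string(begin, end);
}

int parse_word_len(const char* begin, const char* end) {
    ignore_space(begin, end);
    return static_cast<int>(end - begin);
}

// Hypothetical driver: cuts [first, last) wherever same_token(prev, cur)
// is false and stores parse(token_begin, token_end) for every piece.
// Works with forward iterators; the input range is never copied.
template <class Iter, class SameToken, class Parse>
auto slice(Iter first, Iter last, SameToken same_token, Parse parse)
    -> std::vector<decltype(parse(first, last))>
{
    std::vector<decltype(parse(first, last))> results;
    while (first != last) {
        Iter prev = first;
        Iter cur = first;
        ++cur;
        // extend the token while neighbouring elements belong together
        while (cur != last && same_token(*prev, *cur)) {
            prev = cur;
            ++cur;
        }
        results.push_back(parse(first, cur));
        first = cur;
    }
    return results;
}

// The same driver on a non-string range: lengths of runs of equal values.
std::vector<std::ptrdiff_t> run_lengths(const std::vector<int>& values) {
    typedef std::vector<int>::const_iterator It;
    return slice(values.begin(), values.end(),
                 [](int a, int b) { return a == b; },
                 [](It b, It e) { return std::distance(b, e); });
}
```

With this sketch, slicing the text "hello  big world" with
are_from_same_word and parse_word gives the three words "hello", "big" and
"world"; swapping in parse_word_len gives 5, 3 and 5 without touching the
boundary logic. Note that with this predicate, trailing whitespace forms a
token of its own, which parse_word turns into an empty string.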

I will do some coding these days and post the results.

Best,
John

