From: Gennadiy E. Rozental (rogeeff_at_[hidden])
Date: 2001-08-09 21:00:47


I have a proposal for making the tokenizer library a little more
flexible. Here is the implementation of
char_delimiters_separator::operator() that I propose:

template<class InputIterator,class Token>
bool operator()(InputIterator& next, InputIterator end, Token& tok){
  tok = Token();
  // skip past all nonreturnable delims
  // skip past the returnable only if we are not returning delims
  for(;next!=end && ( is_nonret(*next) || (is_ret(*next)
      && !return_delims_ ) );++next){}
  if(next == end){
     return false;
  }
  // if we are to return delims and we are on a returnable one
  // move past it and stop
  if(is_ret(*next) && return_delims_){
     tok.assign( next, 1 ); //!!!!!!!!!!!!!!!!!!!!!!!!
     ++next;
  }
  else {
    InputIterator curr = next;
    // advance past all the non delim characters
    while( next!=end && !is_nonret(*next) && !is_ret(*next) ) {
      ++next;
    }
    tok.assign(curr,next); //!!!!!!!!!!!!!!!!!!!!!!
  }
  return true;
}

Instead of operator+=(Char), the token type must now provide two
methods:
       assign( Iterator begin, length ) and
       assign( Iterator begin, Iterator end )
(I realize that the same logic can be implemented using only the
second function.)
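
As a rough sketch of that last remark, the single-delimiter case can be
written with the range form alone. The helper name assign_one is
hypothetical and not part of the tokenizer library. It assumes forward
iterators, which the else branch above already needs in order to keep
curr. The range form also avoids relying on the length form accepting a
non-pointer iterator: the length overload of std::string::assign takes
a const char*.

```cpp
#include <string>

// Hypothetical helper (not part of Boost): emits the single element at
// `it` as a token using only the range form assign(begin, end).
// Assumes Iterator is at least a forward iterator.
template <class Iterator, class Token>
void assign_one(Token& tok, Iterator it) {
    Iterator last = it;
    ++last;                // one past the element at `it`
    tok.assign(it, last);  // same effect as tok.assign(it, 1)
}
```

With this, the returnable-delimiter branch would call
assign_one(tok, next) and then advance next as before.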

  std::string will work with the new implementation.

  Now I am not obliged to use a "String" class that allocates memory
(which is the case with append-style token construction). I can use a
class that only points into the existing memory (I have such a
const_string class and have found it very useful and efficient for
parsing work). In most cases you read a line, tokenize it, and never
change the tokens.
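
To make the idea concrete, here is a minimal sketch of such a
pointer-based token. The const_string class mentioned above is not
shown in this message, so range_token and split_views below are
hypothetical stand-ins, and the splitting loop is a simplified version
of the separator that drops all delimiters. The tokens only point into
the caller's buffer, so that buffer has to outlive them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning token: stores two pointers into the caller's
// buffer and never allocates. A stand-in, not the actual const_string.
class range_token {
public:
    range_token() : begin_(0), end_(0) {}

    // The two methods the proposed operator() requires.
    void assign(const char* begin, std::size_t length) {
        begin_ = begin;
        end_ = begin + length;
    }
    void assign(const char* begin, const char* end) {
        begin_ = begin;
        end_ = end;
    }

    std::size_t size() const {
        return static_cast<std::size_t>(end_ - begin_);
    }
    // Copies the characters out; only needed if an owning string is wanted.
    std::string str() const {
        return begin_ ? std::string(begin_, end_) : std::string();
    }

private:
    const char* begin_;
    const char* end_;
};

// Simplified stand-in for the separator: splits [first, last) on any
// character in delims and drops the delimiters.
inline std::vector<range_token> split_views(const char* first,
                                            const char* last,
                                            const std::string& delims) {
    std::vector<range_token> out;
    const char* next = first;
    while (next != last) {
        // skip past delimiters
        while (next != last && delims.find(*next) != std::string::npos)
            ++next;
        if (next == last) break;
        const char* curr = next;
        // advance past the non-delimiter characters
        while (next != last && delims.find(*next) == std::string::npos)
            ++next;
        range_token tok;
        tok.assign(curr, next);  // no copy, no allocation for the token
        out.push_back(tok);
    }
    return out;
}
```

The only allocation here is the vector holding the tokens; each token
is just a pair of pointers into the line that was read.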

What do you think?

