From: Daryle Walker (darylew_at_[hidden])
Date: 2000-09-02 00:54:03
I looked at the recent TokenIterator stuff, and I wonder if there is a way
to simplify tokenization. Not every part of a concept has to be a class; we
could replace the token iterator class with an algorithm. How about:
//==========================================================================
template <typename Tokenizer, typename In, typename Out>
Tokenizer tokenize( In src_begin, In src_end, Out dst_begin,
    Tokenizer tok = Tokenizer() );

template <typename Cap, typename Tokenizer, typename In, typename Out>
Tokenizer tokenize_final( In src_begin, In src_end, Out dst_begin,
    Tokenizer tok, Cap capper = Cap() );
//...
template <typename Tokenizer, typename In, typename Out>
Tokenizer
tokenize (
    In src_begin,
    In src_end,
    Out dst_begin,
    Tokenizer tok )
{
    // Send any prefix tokens
    while ( tok )
        *dst_begin++ = *tok;

    while ( src_begin != src_end )
    {
        // Give input symbols to tokenizer
        tok( *src_begin++ );

        // If a token can now be formed, send it
        while ( tok )
            *dst_begin++ = *tok;
    }

    // Return the tokenizer in case more symbols are needed
    return tok;
}

template <typename Cap, typename Tokenizer, typename In, typename Out>
Tokenizer
tokenize_final (
    In src_begin,
    In src_end,
    Out dst_begin,
    Tokenizer tok,
    Cap capper )
{
    // Send any prefix tokens.
    while ( tok )
        *dst_begin++ = *tok;

    while ( src_begin != src_end )
    {
        // Give input symbols to tokenizer.
        tok( *src_begin++ );

        // If a token can now be formed, send it.
        while ( tok )
            *dst_begin++ = *tok;
    }

    // Notify the tokenizer that no more input symbols exist.
    // This lets the tokenizer send any postfix tokens.
    capper( tok );
    while ( tok )
        *dst_begin++ = *tok;

    // Return the tokenizer for any final analyses.
    return tok;
}
//==========================================================================
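I allow for the possibility that a tokenizer object may want to send tokens to
the output before it reads any input symbols (prefix tokens) or after the input
runs out (postfix tokens). If there is no post-processing, you can skip
tokenize_final and call tokenize instead. Since these are function templates,
the types are deduced from the arguments; you only have to specify a type
explicitly when you leave out a defaulted argument (Tokenizer for tokenize,
Cap for tokenize_final), or when you desire it.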
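The prefix/postfix behaviour and the deduction rules can be illustrated with a minimal sketch in C++11 (not the C++ of the original post). BracketTok, EndCap and the two wrapper functions are hypothetical names made up for this illustration, and tokenize_final is a condensed copy of the algorithm above with its default argument moved onto its only declaration. One point it shows: a template parameter that appears only in a defaulted function parameter cannot be deduced, so Cap must be named when the capper argument is left out.

```cpp
#include <cassert>
#include <deque>
#include <iterator>
#include <string>
#include <vector>

// Condensed copy of the proposed tokenize_final(); the default argument
// sits here because this is the only declaration in the sketch.
template <typename Cap, typename Tokenizer, typename In, typename Out>
Tokenizer tokenize_final(In src_begin, In src_end, Out dst_begin,
                         Tokenizer tok, Cap capper = Cap())
{
    while (tok) *dst_begin++ = *tok;          // prefix tokens
    while (src_begin != src_end) {
        tok(*src_begin++);                    // feed one input symbol
        while (tok) *dst_begin++ = *tok;      // send any ready tokens
    }
    capper(tok);                              // announce end of input
    while (tok) *dst_begin++ = *tok;          // postfix tokens
    return tok;
}

// Hypothetical tokenizer: one token per character, preceded by a
// "<begin>" prefix token and (once capped) followed by "<end>".
struct BracketTok {
    std::deque<std::string> ready{"<begin>"}; // prefix token queued up front
    explicit operator bool() const { return !ready.empty(); }
    std::string operator*() {                 // hand out and drop the front token
        assert(!ready.empty());               // only valid when bool is true
        std::string t = ready.front();
        ready.pop_front();
        return t;
    }
    void operator()(char c) { ready.push_back(std::string(1, c)); }
};

// Hypothetical capper: takes the tokenizer by reference so the postfix
// token it queues lands in tokenize_final's own copy of tok.
struct EndCap {
    void operator()(BracketTok& tok) const { tok.ready.push_back("<end>"); }
};

std::vector<std::string> bracket_explicit_cap(const std::string& src) {
    std::vector<std::string> out;
    // capper omitted: Cap cannot be deduced from a default argument, so it
    // is named explicitly; Tokenizer, In and Out are still deduced.
    tokenize_final<EndCap>(src.begin(), src.end(), std::back_inserter(out),
                           BracketTok());
    return out;
}

std::vector<std::string> bracket_all_deduced(const std::string& src) {
    std::vector<std::string> out;
    // capper supplied: every template argument is deduced.
    tokenize_final(src.begin(), src.end(), std::back_inserter(out),
                   BracketTok(), EndCap());
    return out;
}
```

With input "ab", both wrappers produce the tokens "<begin>", "a", "b", "<end>"; with empty input, only the prefix and postfix tokens appear.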
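The operations the tokenizer class has to support are:
- A Boolean conversion (bool, const void *, etc.) to indicate when at least
one output token is ready
- A dereference operation that copies the next token to the output and removes
it from the tokenizer (valid only when the Boolean conversion returns true);
without the removal, the while ( tok ) loops would never end
- A function-call operator that accepts and processes the next symbol from the
input
For tokenize_final, the capper must also be callable as capper( tok ), and it
should take the tokenizer by reference so that any postfix tokens it triggers
are queued in the algorithm's own copy of the tokenizer.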
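The three tokenizer requirements can be sketched with a word splitter. This is a minimal C++11 sketch, not code from the post: WordTok and words_in_chunks are hypothetical names, splitting on single spaces is an assumed tokenization rule, and tokenize is a condensed copy of the algorithm above (default argument moved onto its only declaration). It also feeds the input in two chunks, relying on tokenize returning the tokenizer so that a word split across chunks is still completed.

```cpp
#include <cassert>
#include <deque>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Condensed copy of the proposed tokenize(); the default argument sits
// here because this is the only declaration in the sketch.
template <typename Tokenizer, typename In, typename Out>
Tokenizer tokenize(In src_begin, In src_end, Out dst_begin,
                   Tokenizer tok = Tokenizer())
{
    while (tok) *dst_begin++ = *tok;          // prefix tokens
    while (src_begin != src_end) {
        tok(*src_begin++);                    // feed one input symbol
        while (tok) *dst_begin++ = *tok;      // send any ready tokens
    }
    return tok;                               // may still hold a partial word
}

// Hypothetical tokenizer: splits on spaces and emits each word once the
// space after it has been seen.
class WordTok {
public:
    // Boolean conversion: true while at least one finished word is queued.
    explicit operator bool() const { return !ready_.empty(); }

    // Dereference: returns the next word AND removes it from the queue.
    std::string operator*() {
        assert(!ready_.empty());              // only valid when bool is true
        std::string word = ready_.front();
        ready_.pop_front();
        return word;
    }

    // Symbol entry: accept and process one input character.
    void operator()(char c) {
        if (c != ' ') {
            partial_ += c;
        } else if (!partial_.empty()) {
            ready_.push_back(partial_);
            partial_.clear();
        }
    }

    const std::string& partial() const { return partial_; }

private:
    std::deque<std::string> ready_;   // finished words awaiting output
    std::string partial_;             // word still being accumulated
};

// Feeds two chunks, threading the returned tokenizer through; returns the
// words emitted plus whatever unfinished word is left in the tokenizer.
std::pair<std::vector<std::string>, std::string>
words_in_chunks(const std::string& first, const std::string& second) {
    std::vector<std::string> out;
    // tok omitted: Tokenizer cannot be deduced, so WordTok is named explicitly.
    WordTok tok = tokenize<WordTok>(first.begin(), first.end(),
                                    std::back_inserter(out));
    // Hand the returned tokenizer back in; every type is deduced here.
    tok = tokenize(second.begin(), second.end(), std::back_inserter(out), tok);
    return {out, tok.partial()};
}
```

Fed "hello wo" and then "rld again", this emits "hello" and "world", while "again" stays buffered in the tokenizer. Flushing that trailing word is exactly the kind of postfix work tokenize_final and its capper are meant for.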
Should I try to formulate this into a concrete example?
--
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk