
From: Daryle Walker (darylew_at_[hidden])
Date: 2000-09-14 13:07:35


on 9/13/00 5:04 PM, John R. Bandela at jbandela_at_[hidden] wrote:

>> It is referenced in the main tokenizer.hpp file since (a version of)
>> punct_space_tokenizer is used as the default type in the second template
>> argument of the token_iterator class. I think token_iterator should be just
>> another sample, and not the default tokenizer. It's highly likely that some
>> other tokenizer will be used, so not including punct_space_tokenizer and its
>> owning header file means clients who don't use it won't pay for having it.
>
> I agree with you there. What you do not use, you do not pay for is an
> underlying principle of C++. However, I believe we can leave the
> default TokenizerFunc = punct_space_tokenizer<Iter> argument in
> token_iterator, and remove the #include statement. If they want to
> use the default, they can just include the punct_space_tokenizer
> file. Otherwise, they can just leave it out and specify another
> tokenizer.

Isn't that _extremely_ bad style? When a header uses external types, it
should include every header those types need in order to compile. The
client shouldn't have to add headers in his/her code just to get your code
to work. Either include something completely or not at all. A forward
declaration may be an acceptable substitute under some circumstances,
though. (Since punct_space_tokenizer isn't critical to the other stuff
working, it doesn't need to be there. You could take it out entirely or
leave a forward declaration, in the manner of <iosfwd>.)
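
To make the forward-declaration option concrete, here is a minimal sketch. It assumes a much-simplified token_iterator (the real class has more to it), and comma_tokenizer and count_tokens are hypothetical names invented for the illustration. The point is only that a declared-but-undefined punct_space_tokenizer can still serve as the default template argument, and a client who names another tokenizer never needs its definition.

```cpp
#include <cassert>
#include <string>

// Forward declaration only. The definition would live in its own header,
// which this sketch deliberately does not include.
template <typename Iter>
class punct_space_tokenizer;

// Simplified stand-in for token_iterator (assumption: not the real class).
template <typename Iter, typename TokenizerFunc = punct_space_tokenizer<Iter> >
class token_iterator
{
public:
    token_iterator( Iter first, Iter last, const TokenizerFunc& f )
        : next_( first ), end_( last ), func_( f ), valid_( false )
        { advance(); }

    bool valid() const { return valid_; }
    const std::string& token() const { return tok_; }
    void advance() { valid_ = func_( next_, end_, tok_ ); }

private:
    Iter next_, end_;
    TokenizerFunc func_;
    std::string tok_;
    bool valid_;
};

// Hypothetical client tokenizer: splits on commas.
struct comma_tokenizer
{
    bool operator ()( const char*& next, const char* end, std::string& tok ) const
    {
        if ( next == end ) return false;
        tok.clear();
        while ( next != end && *next != ',' ) tok += *next++;
        if ( next != end ) ++next;   // skip the comma
        return true;
    }
};

// Counts tokens without punct_space_tokenizer ever being defined.
inline int count_tokens( const std::string& s )
{
    const char* first = s.c_str();
    const char* last = first + s.size();
    int n = 0;
    for ( token_iterator<const char*, comma_tokenizer>
              it( first, last, comma_tokenizer() ); it.valid(); it.advance() )
        ++n;
    assert( n >= 0 );
    return n;
}
```

A client who writes plain token_iterator<const char*> would still have to include the header that defines punct_space_tokenizer, which is exactly the burden the paragraph above objects to.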

>> Maybe some sort of traits class should be added. The ptr_tokenizer_fun would
>> use that traits class, and so can any other potential tokenizer or adapter.
>
> I am not sure, I exactly understand what you saying. Would you please
> provide some sample code that illustrates this.

//==========================================================================
template <typename Tokenizer>
struct tokenizer_traits
{
    typedef Tokenizer tokenizer_type;
    typedef typename Tokenizer::iterator_type iterator_type;
    typedef typename Tokenizer::token_type token_type;
};

template <typename Iter, typename Tok>
struct tokenizer_traits<bool (*)(Iter&, Iter, Tok&)>
{
    typedef bool (*tokenizer_type)(Iter&, Iter, Tok&);
    typedef Iter iterator_type;
    typedef Tok token_type;
};
//==========================================================================

You can even extend the "ptr_tokenizer_fun" to a general adapter class:

//==========================================================================
template <typename Tokenizer>
struct tokenizer_adapter
    : tokenizer_traits<Tokenizer>
{
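    // Names inherited from a dependent base class must be redeclared here.
    typedef tokenizer_traits<Tokenizer> traits_type;
    typedef typename traits_type::tokenizer_type tokenizer_type;
    typedef typename traits_type::iterator_type iterator_type;
    typedef typename traits_type::token_type token_type;

    tokenizer_type tokenizer_;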

    tokenizer_adapter( const tokenizer_type& tokenizer = tokenizer_type() )
        : tokenizer_( tokenizer )
        {}

    bool operator ()(
        iterator_type& next, iterator_type end, token_type& tok )
    {
        return tokenizer_( next, end, tok );
    }
};
//==========================================================================
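
As a usage sketch for the general adapter, the block below wraps a plain function pointer. The traits and adapter are repeated so the block is self-contained; next_word, word_fn and join_words are hypothetical names made up for this example, not part of the proposed library.

```cpp
#include <cassert>
#include <string>

template <typename Tokenizer>
struct tokenizer_traits
{
    typedef Tokenizer tokenizer_type;
    typedef typename Tokenizer::iterator_type iterator_type;
    typedef typename Tokenizer::token_type token_type;
};

// Traits for a plain function with the bool (Iter&, Iter, Tok&) shape.
template <typename Iter, typename Tok>
struct tokenizer_traits<bool (*)(Iter&, Iter, Tok&)>
{
    typedef bool (*tokenizer_type)(Iter&, Iter, Tok&);
    typedef Iter iterator_type;
    typedef Tok token_type;
};

template <typename Tokenizer>
struct tokenizer_adapter
    : tokenizer_traits<Tokenizer>
{
    typedef tokenizer_traits<Tokenizer> traits_type;
    typedef typename traits_type::tokenizer_type tokenizer_type;
    typedef typename traits_type::iterator_type iterator_type;
    typedef typename traits_type::token_type token_type;

    tokenizer_type tokenizer_;

    tokenizer_adapter( const tokenizer_type& tokenizer = tokenizer_type() )
        : tokenizer_( tokenizer )
        {}

    bool operator ()(
        iterator_type& next, iterator_type end, token_type& tok )
    {
        return tokenizer_( next, end, tok );
    }
};

// Hypothetical tokenizer function: extracts space-separated words.
inline bool next_word( const char*& next, const char* end, std::string& tok )
{
    while ( next != end && *next == ' ' ) ++next;   // skip leading spaces
    if ( next == end ) return false;
    tok.clear();
    while ( next != end && *next != ' ' ) tok += *next++;
    return true;
}

typedef bool (*word_fn)( const char*&, const char*, std::string& );

// Runs the adapter over a string and brackets each token it yields.
inline std::string join_words( const std::string& s )
{
    const char* first = s.c_str();
    const char* last = first + s.size();
    tokenizer_adapter<word_fn> adapt( &next_word );
    std::string tok, out;
    while ( adapt( first, last, tok ) )
        out += '[' + tok + ']';
    return out;
}
```

The function-pointer specialization of tokenizer_traits is what lets the adapter deduce iterator_type and token_type from the signature alone.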

If someone prefers a tokenizer function that leaves its iterator arguments
unchanged, returns the next iterator as its return value, and makes it
explicit that the token is modified by taking a pointer, they can do this:

//==========================================================================
template <typename Iter, typename Tok>
struct tokenizer_traits<Iter (*)(Iter, Iter, Tok*)>
{
    typedef Iter (*tokenizer_type)(Iter, Iter, Tok*);
    typedef Iter iterator_type;
    typedef Tok token_type;
};

template <typename Iter, typename Tok>
struct tokenizer_adapter<Iter (*)(Iter, Iter, Tok*)>
    : tokenizer_traits<Iter (*)(Iter, Iter, Tok*)>
{
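    // Names inherited from a dependent base class must be redeclared here.
    typedef tokenizer_traits<Iter (*)(Iter, Iter, Tok*)> traits_type;
    typedef typename traits_type::tokenizer_type tokenizer_type;
    typedef typename traits_type::iterator_type iterator_type;
    typedef typename traits_type::token_type token_type;

    tokenizer_type tokenizer_;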

    tokenizer_adapter( const tokenizer_type& tokenizer = tokenizer_type() )
        : tokenizer_( tokenizer )
        {}

    bool operator ()(
        iterator_type& next, iterator_type end, token_type& tok )
    {
        next = tokenizer_( next, end, &tok );
        return next != end;
    }
};
//==========================================================================
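
A usage sketch for the returning-iterator style follows. One caveat may be worth checking first: with "return next != end", a function that consumes the final token and returns end is reported as a failure, so that last token could be dropped. The sketch therefore assumes a different convention that is not in the code above: the function returns its input position unchanged when no token is available, and the adapter tests for progress. next_field, field_fn and join_fields are hypothetical names.

```cpp
#include <cassert>
#include <string>

template <typename Tokenizer> struct tokenizer_traits;   // primary omitted in this sketch
template <typename Tokenizer> struct tokenizer_adapter;  // primary omitted in this sketch

template <typename Iter, typename Tok>
struct tokenizer_traits<Iter (*)(Iter, Iter, Tok*)>
{
    typedef Iter (*tokenizer_type)(Iter, Iter, Tok*);
    typedef Iter iterator_type;
    typedef Tok token_type;
};

template <typename Iter, typename Tok>
struct tokenizer_adapter<Iter (*)(Iter, Iter, Tok*)>
    : tokenizer_traits<Iter (*)(Iter, Iter, Tok*)>
{
    typedef Iter (*tokenizer_type)(Iter, Iter, Tok*);

    tokenizer_type tokenizer_;

    explicit tokenizer_adapter( tokenizer_type tokenizer )
        : tokenizer_( tokenizer )
        {}

    bool operator ()( Iter& next, Iter end, Tok& tok )
    {
        Iter const before = next;
        next = tokenizer_( next, end, &tok );
        return next != before;   // assumption: no progress means no token
    }
};

// Hypothetical tokenizer: comma-separated fields, returns the new position.
inline const char* next_field( const char* next, const char* end, std::string* tok )
{
    if ( next == end ) return next;   // nothing left: report no progress
    tok->clear();
    while ( next != end && *next != ',' ) *tok += *next++;
    if ( next != end ) ++next;        // skip the comma
    return next;
}

typedef const char* (*field_fn)( const char*, const char*, std::string* );

// Runs the adapter over a string and brackets each field it yields.
inline std::string join_fields( const std::string& s )
{
    const char* first = s.c_str();
    const char* last = first + s.size();
    tokenizer_adapter<field_fn> adapt( &next_field );
    std::string tok, out;
    while ( adapt( first, last, tok ) )
        out += '[' + tok + ']';
    return out;
}
```

Either convention can work as long as the function and the adapter agree on how "no more tokens" is signalled.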

>> It choked on me, see below.
>
> Go ahead and add a default, do-nothing constructor to the tokenizer.
> It probably won't hurt anything. One thing about this tokenizer, is
> that it is just a sample, and does not correctly handle the boundary
> conditions where either what,with or both are empty strings. As this
> is just a demo and not production code, I have not felt it was too
> bad. However, I have created an updated version that fixes this
> problem. I may post it later, if enough people ask me to.

Another respondent said that the problem is in my compiler, not your code.
Anyway, it is reasonable that a tokenizer may not have a default constructor,
so your code should be ready for that.
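
To illustrate what "ready for it" might look like, here is a small sketch. delim_tokenizer, tokenizer_holder and count_with_delim are hypothetical names, not taken from the library under review. As far as I understand the language rules, a default argument in a member function of a class template is only instantiated when a call actually relies on it, so on a conforming compiler a "= Tokenizer()" default should not prevent the use of a tokenizer that lacks a default constructor, provided the caller always passes one in.

```cpp
#include <cassert>
#include <string>

// Hypothetical tokenizer with no default constructor: it needs a delimiter.
class delim_tokenizer
{
public:
    explicit delim_tokenizer( char delim ) : delim_( delim ) {}

    bool operator ()( const char*& next, const char* end, std::string& tok ) const
    {
        if ( next == end ) return false;
        tok.clear();
        while ( next != end && *next != delim_ ) tok += *next++;
        if ( next != end ) ++next;   // skip the delimiter
        return true;
    }

private:
    char delim_;
};

// Hypothetical holder, standing in for any class that stores a tokenizer.
template <typename Tokenizer>
class tokenizer_holder
{
public:
    // The default argument is only needed by calls that omit the argument.
    explicit tokenizer_holder( const Tokenizer& t = Tokenizer() )
        : tokenizer_( t )
        {}

    Tokenizer& get() { return tokenizer_; }

private:
    Tokenizer tokenizer_;
};

// Always passes a tokenizer explicitly, so Tokenizer() is never required.
inline int count_with_delim( const std::string& s, char delim )
{
    const char* first = s.c_str();
    const char* last = first + s.size();
    delim_tokenizer tokenizer( delim );
    tokenizer_holder<delim_tokenizer> holder( tokenizer );
    std::string tok;
    int n = 0;
    while ( holder.get()( first, last, tok ) ) ++n;
    assert( n >= 0 );
    return n;
}
```

Writing tokenizer_holder<delim_tokenizer> h; with no argument would fail to compile, which seems like the right outcome for a tokenizer that cannot be default-constructed.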
