Boost Users :


From: Jeremy Tudisco (jadthegerbil_at_[hidden])
Date: 2006-01-24 03:46:07

Next message: Jef Driesen: "[Boost-users] [BGL] Help with initialization of graph from image"
Previous message: David A. Greene: "[Boost-users] [lambda] "Invalid use of void expression""
Next in thread: Nitin Motgi: "Re: [Boost-users] Tokenizer usage, combining escaped_list_ and char_?"
Reply: Nitin Motgi: "Re: [Boost-users] Tokenizer usage, combining escaped_list_ and char_?"

I'm attempting to process a scripting language read from a file with tokenizer (I only recently found out about Spirit, more on that later), and am having difficulty processing input like this:
"[ ExampleRoutine input;
    switch( input.messageNumber )
    {
        1: "Number 1";
        2: "Stand. On one foot
                    (and jump around)";
        3: ! a comment to be ignored
            input.messageNumber = 1;
            break;
    }
];"
The outermost quotes only mark the input as a string literal here; the inner quotes are part of the file.
The problem comes from the fact that I need certain separators returned as tokens, such as braces, parentheses, periods, line endings, etc., but I also need each quoted string, including any escape characters inside it, returned as a single token.
My current method of processing this is:
typedef char_separator<char> CharSep;
typedef tokenizer<CharSep> CharTokenizer;
typedef CharTokenizer::iterator CTokenIter;
const CharSep g_RoutineSep(" \t\n,", "\"';[]{}()<>.!");
#define S_QUOTE "\""
...
// meanwhile, inside a function body
CharTokenizer tok( getRoutineFromFile(), g_RoutineSep );
CTokenIter curTok( tok.begin() );
for(; ( curTok != tok.end() ) && ( *curTok != S_CBRACKET ); ++curTok )
{
    string curWord;
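    if( *curTok == S_QUOTE )
    {
        // test for the end before dereferencing the iterator
        for(++curTok; ( curTok != tok.end() ) && ( *curTok != S_QUOTE ); ++curTok)
        {
            curWord += *curTok + ' ';
            // sure, we could check to see if *curTok is a punctuation mark, and if so
            // not include that last space, but a better way must exist!
        }
        if( curTok == tok.end() )
            break; // unterminated literal: don't let the outer loop step past the end
    }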
    else
    {
        // assume it's a command word, and process it here
    }
}

Of course, this is prone to gross misinterpretation. "switch( input.messageNumber )" is handled, one token per iteration, as "switch", "(", "input", ".", "messageNumber", ")", which is fine for code.
Yet this same splitting breaks string literals into unnatural spacing: the above code would turn the second message into "Stand . On one foot ( and jump around ) ", which isn't desired.
However, as it stands, escaped_list_separator doesn't return the separators, it just acts upon them, so all that fancy operator parsing isn't possible out of the box; I would have to break things like "switch(" and "input.messageNumber" into separate tokens myself. Possible, yes, but extra work.
So, the question. Could my needs be satisfied by defining my own TokenizerFunction, and if so, is there a simpler or more exhaustive reference than the TokenizerFunction documentation page?
Or, conversely, is it time to look into Spirit?
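To make the custom TokenizerFunction option concrete, here is a minimal sketch of what such a functor might look like. It follows the shape the TokenizerFunction concept asks for (an operator() taking the current position, the end, and the token to fill, plus a reset()), but the names ScriptSeparator and tokenize are hypothetical, the delimiter sets are copied from g_RoutineSep minus the quote, and the decision to return a quoted literal verbatim (quotes and backslash escapes included) is an assumption about what is wanted. It is driven directly here so it does not depend on Boost; under the stated assumption about the concept it should also be usable as the template argument of tokenizer, but that is not exercised below.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical functor modeled on the TokenizerFunction concept.
class ScriptSeparator {
public:
    explicit ScriptSeparator(const std::string& dropped = " \t\n,",
                             const std::string& kept = "';[]{}()<>.!")
        : dropped_(dropped), kept_(kept) {
        // The double quote is handled separately, so it must not be a delimiter.
        assert(dropped_.find('"') == std::string::npos);
        assert(kept_.find('"') == std::string::npos);
    }

    void reset() {}  // no state is carried from one token to the next

    // Extracts one token starting at next; returns false when input is exhausted.
    template <class InputIterator, class Token>
    bool operator()(InputIterator& next, InputIterator end, Token& tok) {
        tok = Token();
        while (next != end && isDropped(*next)) ++next;  // skip dropped delimiters
        if (next == end) return false;

        if (*next == '"') {
            // Quoted literal: copy it verbatim, up to and including the closing quote.
            tok += *next; ++next;
            while (next != end) {
                char c = *next; ++next;
                tok += c;
                if (c == '\\' && next != end) { tok += *next; ++next; }  // keep the escaped char
                else if (c == '"') break;
            }
            return true;
        }
        if (isKept(*next)) {  // a kept delimiter is a one-character token
            tok += *next; ++next;
            return true;
        }
        // Ordinary word: runs until any delimiter or the start of a quoted literal.
        while (next != end && !isDropped(*next) && !isKept(*next) && *next != '"') {
            tok += *next; ++next;
        }
        return true;
    }

private:
    bool isDropped(char c) const { return dropped_.find(c) != std::string::npos; }
    bool isKept(char c) const { return kept_.find(c) != std::string::npos; }
    std::string dropped_, kept_;
};

// Hypothetical helper that drives the functor by hand and collects the tokens.
std::vector<std::string> tokenize(const std::string& text) {
    ScriptSeparator sep;
    std::vector<std::string> out;
    std::string::const_iterator it = text.begin();
    std::string tok;
    while (sep(it, text.end(), tok)) out.push_back(tok);
    return out;
}
```

With this, tokenize("switch( input.messageNumber )") still yields the six separate tokens, while a quoted message stays in one piece with its original spacing. Unescaping the literal and skipping "!" comments are left to the caller in this sketch.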
- Veni, Vidi, Vemaili. I came, I saw, I replied. -- Jeremy Tudisco, circa now.

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net