
Boost :

From: Pavol Droba (droba_at_[hidden])
Date: 2004-10-15 11:15:45


Hi There,

There might be a solution to your problem, but it will
require some more elaboration.

Boost.Tokenizer is currently not the only option for this
kind of job. There is also a splitting facility incorporated
in the StringAlgo library. You can find it in CVS.

So what is the difference, and what can you do? StringAlgo's
facility is built around a concept called a finder.
A finder is an object that searches a string for some substring
and returns its location, represented by a pair of iterators.

There is also a find_iterator facility. It lets you iterate through
the sequence over the substrings retrieved by a finder.
There are two find iterators: the first one (find_iterator) iterates
over the matching substrings, the second one (split_iterator) over
the gaps between them.

So what you can do is write a finder that skips comments
and searches for your delimiter, then use split_iterator
to do the tokenizing.

Note that you will need to load the whole file into a string before
processing, since all these facilities need at least a forward
iterator; istreambuf_iterator is only an input iterator, so it is not
sufficient.

For documentation you can check here:
http://www.meta-comm.com/engineering/resources/cs-win32_metacomm/doc/html/string_algo.html

HTH,

Regards,

Pavol

Hello,

Thursday, October 14, 2004, 7:46:49 PM, you wrote:
> Hi,

> I would like to break a file into tokens for processing. The file contains
> comments which are introduced by "//", "#" and ";". Can I set up the
> tokenizer directly so that the comments are skipped? If not, what would you
> suggest to erase the comments from my string before processing?

> Here is what I do right now:

> // CODE
> ifstream is( "file.txt" );

> string file, line;
> file.reserve( 2 * 1024 * 1024 );
> while ( getline( is, line ) )
> {
> TrimHead( line );
> // skip lines starting with "//"; compare() is also safe on empty lines
> if ( line.compare( 0, 2, "//" ) != 0 )
> file.append( line + "\n" ); // Need to append "\n" again to get the
> // right tokens - not very nice
> }

> typedef tokenizer<char_separator<char> > Tokenizer;
> char_separator<char> sep(" \t\n");
> Tokenizer tokens( file, sep );
> // END CODE

> Another idea was to do the following:

> // CODE
> ifstream is( "file.txt" );

> string line( ( istreambuf_iterator<char>( is ) ),
> istreambuf_iterator<char>() );
> EraseComments( line );
> // END CODE

> Any help is appreciated.

> -Dirk


