Subject: [Boost-users] [iostreams] regex_filter how-to
From: Michał Nowotka (mmmnow_at_[hidden])
Date: 2009-09-09 08:06:34


Hello,
I have the following problem:

I need to filter some records from one file and save them to another in
my C++ application.

For example, from this file:

/////////////////// in.txt ////////////////////////////////
http://google.com
http://yahoo.com
...
http://google.com/analytics
////////////////////////////////////////////////////////////

I want to extract only the lines that match the regex:
^(?:http://google.com).*

to get:

//////////////// out.txt ///////////////////////////////
http://google.com
...
http://google.com/analytics
/////////////////////////////////////////////////////////

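So I wrote something like this:

#include <fstream>
#include <iostream>
#include <string>
#include <boost/regex.hpp>
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/filter/regex.hpp>
#include <boost/iostreams/filtering_stream.hpp>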

class Writer
{
   public:
    Writer()
        :matchesCount_(0){}
    virtual std::string operator() (const boost::match_results<const char*>& result)
    {
      matchesCount_ = result.size();
      return aux_;
    }

    int getMatchesCount() const
    {
        return matchesCount_;
    }

   virtual ~Writer(){}

   private:
    std::string aux_; // this is completely useless, but I must return something in operator()
    int matchesCount_;
};

///////////////////////////////////////////////////////////////////////////////

class FileWriter : public Writer
{
    public:

    FileWriter(std::ostream* of)
        :of_(of)
    {}

    std::string operator() (const boost::match_results<const char*>& result)
    {
      *of_ << *result.begin() << std::endl;
      return Writer::operator()(result);
    }

    private:
     std::ostream* of_;
};

///////////////////////////////////////////////////////////////////////////////

int main(int argc, char *argv[])
{

    boost::regex match_lower("^(?:http://google.com).*");
    std::ofstream out("out.txt");
    std::string str;

    boost::iostreams::filtering_istream first(
        boost::iostreams::regex_filter(match_lower, FileWriter(&out)));
    first.push(boost::iostreams::file_source("in.txt", std::ios_base::in));
    // My output is a side effect of filtering, so I don't have to process this stream.
    first.ignore();

   return 0;
}

/////////////////////////////////////////////////////////////////////////////

It works fine for short files (IMO, for files whose size is smaller
than the size of the stream buffer). But I work with very large files
(~4.7 GB), and for those this is not a good solution. Do you have any
idea how to solve it?
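
One way to avoid buffering the whole input is to process it a line at a
time. The following is only a minimal sketch of that idea, not a tested
replacement for the filter chain above. It assumes every record is a
single line, it uses std::regex in place of boost::regex so that it needs
nothing outside the standard library, and filter_lines is a hypothetical
name chosen for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <regex>
#include <string>

// Hypothetical helper (not part of Boost): copies every line of `in` that
// matches `re` in full to `out`, and returns the number of lines copied.
// Only one line is held in memory at a time, so memory use depends on the
// longest line rather than on the size of the file.
std::size_t filter_lines(std::istream& in, std::ostream& out, const std::regex& re)
{
    std::size_t kept = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_match(line, re)) {
            out << line << '\n';
            ++kept;
        }
    }
    return kept;
}
```

With files, this would be called with a std::ifstream opened on in.txt and
a std::ofstream opened on out.txt, passing the same pattern as above. Note
that the unescaped dots in the pattern match any character, so a stricter
pattern would escape them.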

-- 
Regards
Michał Nowotka
