
Boost Users :

Subject: Re: [Boost-users] [iostreams] regex_filter how-to
From: Eric MALENFANT (Eric.Malenfant_at_[hidden])
Date: 2009-09-09 08:43:25


Michał wrote:

> So I wrote something like this:
[snip]
>
> filtering_istream
> first(boost::iostreams::regex_filter(match_lower, FileWriter(&out)));
[snip]
>
> It works fine for short files (IMO for files whose size is smaller
> than the size of the stream buffer). But I work with very large files
> (~4.7 GB) and then this is not a good solution. Do you have any idea
> how to solve it?

Boost.Iostreams' regex_filter loads the whole file into memory before applying the regex to it, because the regex algorithms require a bidirectional iterator, IIRC.

If your pattern always matches on a single line, you could use getline() and then apply the regex on each line separately.
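
A minimal sketch of the line-by-line approach, assuming no match ever spans a newline. The function name find_matches_by_line is made up for illustration, and the sketch uses std::regex only so that it compiles without Boost; boost::regex and boost::sregex_iterator can be used the same way.

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match of `re`, reading the stream one
// line at a time so that only a single line is held in memory.
// Assumption: a match never spans a line break.
std::vector<std::string> find_matches_by_line(std::istream& in, const std::regex& re)
{
    std::vector<std::string> found;
    std::string line;
    while (std::getline(in, line)) {
        // Iterate over all non-overlapping matches within this line.
        std::sregex_iterator it(line.cbegin(), line.cend(), re);
        const std::sregex_iterator end;
        for (; it != end; ++it) {
            found.push_back(it->str());
        }
    }
    return found;
}
```

Memory use is then bounded by the longest line rather than by the file size. For a multi-gigabyte file you would probably write each match out as it is found instead of collecting them in a vector.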

Alternatively, take a look at the Boost.Regex "partial match" feature (http://www.boost.org/doc/libs/1_40_0/libs/regex/doc/html/boost_regex/partial_matches.html), which will allow you to apply the regex on "chunks".

HTH,

Éric Malenfant

