Subject: Re: [Boost-users] [iostreams] regex_filter how-to
From: Eric MALENFANT (Eric.Malenfant_at_[hidden])
Date: 2009-09-09 08:43:25
Michał wrote:
> So I wrote something like this:
[snip]
>
> filtering_istream
> first(boost::iostreams::regex_filter(match_lower, FileWriter(&out)));
[snip]
>
> It works fine for short files (IMO for files whose size is smaller
> than the size of the stream buffer). But I work with very large files
> (~4.7 GB) and then this is not a good solution. Do you have any idea
> how to solve it?
Boost.Iostreams' regex_filter loads the whole file into memory before applying the regex to it, because the regex algorithms require a bidirectional iterator, IIRC.
If your pattern always matches on a single line, you could use getline() and then apply the regex on each line separately.
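A minimal sketch of the line-by-line approach, assuming the pattern never spans a line break. The function name write_matches is made up for illustration, and std::regex is used only so the sketch builds without Boost; boost::regex with boost::sregex_iterator should slot in the same way.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <regex>
#include <string>

// Hypothetical helper: reads `in` one line at a time, writes every match
// of `pattern` to `out` (one per line) and returns the number of matches.
// Only the current line is held in memory, never the whole file.
std::size_t write_matches(std::istream& in, std::ostream& out,
                          const std::regex& pattern)
{
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        // Iterate over all non-overlapping matches within this line.
        for (std::sregex_iterator it(line.begin(), line.end(), pattern), end;
             it != end; ++it) {
            out << it->str() << '\n';
            ++count;
        }
    }
    return count;
}
```

Called with an std::ifstream for the input and an std::ofstream for the output, memory use is bounded by the longest line rather than by the file size.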
Alternatively, take a look at the Boost.Regex "partial match" feature (http://www.boost.org/doc/libs/1_40_0/libs/regex/doc/html/boost_regex/partial_matches.html), which will allow you to apply the regex on "chunks".
HTH,
Éric Malenfant