[Boost-bugs] [Boost C++ Libraries] #11776: Need way to find all regex matches in large file

From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2015-10-30 09:20:12


#11776: Need way to find all regex matches in large file
------------------------------+--------------------------
 Reporter:  der-storch-85@…   |      Owner:  johnmaddock
     Type:  Feature Requests  |     Status:  new
Milestone:  To Be Determined  |  Component:  regex
  Version:  Boost 1.59.0      |   Severity:  Optimization
 Keywords:                    |
------------------------------+--------------------------
 Finding all regex matches in a file via boost::regex_iterator is a very
 complicated task, because you normally cannot load the whole file into a
 buffer (it could be too large).

 A possible solution is presented in the documentation of regular
 expressions in section
 [http://www.boost.org/doc/libs/1_59_0/libs/regex/doc/html/boost_regex/partial_matches.html
 Partial Matches], see the second example.

 Unfortunately, that example is not correct. Consider a file with content
 "12abc", the regex "[a-z]+", and a buffer size of 4. The example reports
 the matches "ab" and "c", but the correct result is the single match
 "abc". The first match is a full match, not a partial one, yet it touches
 the end of the buffer and so may continue in the next chunk. Increasing
 the buffer size does not solve the problem in general, and with more
 complex regexes it gets even worse. Another example: the same input and
 buffer size with the regex "[a-z]{2,}" (i.e. words of at least two
 letters) yields the single match "ab", but the correct result is "abc".
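
 The two cases above can be reproduced with a minimal sketch. It uses
 std::regex instead of boost::regex so that it is self-contained, and it
 searches each chunk independently, which is an assumption standing in for
 the documented loop: for these inputs the first buffer yields a full
 match, so nothing would be carried over. find_all_naive is a hypothetical
 helper, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: search each fixed-size chunk on its own and report
// every full match immediately, even if it touches the end of the chunk.
std::vector<std::string> find_all_naive(const std::string& text,
                                        const std::regex& re,
                                        std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<std::string> found;
    for (std::size_t off = 0; off < text.size(); off += chunk_size) {
        const std::string chunk = text.substr(off, chunk_size);
        for (std::sregex_iterator it(chunk.begin(), chunk.end(), re), end;
             it != end; ++it) {
            found.push_back(it->str());
        }
    }
    return found;
}
```

 Calling find_all_naive("12abc", std::regex("[a-z]+"), 4) returns "ab" and
 "c"; with the regex "[a-z]{2,}" it returns only "ab".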

 The easiest solution seems to be a new match flag (“range_incomplete” or
 “input_incomplete”?) that checks whether the text from the beginning of
 the current match to the end of the buffer forms a partial or full match.
 In that case the match should be flagged to the user as possibly
 incomplete (e.g. via the already existing member sub_match::matched).
 Better solutions probably exist.
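
 Until such a flag exists, part of the idea can be approximated in user
 code. The following is only a sketch under stated assumptions: it uses
 std::regex (which has no partial-match support) rather than boost::regex,
 and find_all_deferred is a hypothetical helper. Any full match that
 touches the end of the buffer while more input remains is treated as
 possibly incomplete and is retried once the next chunk has been appended.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read `in` chunk by chunk and defer any match that
// ends exactly at the end of the buffer while more input is available.
std::vector<std::string> find_all_deferred(std::istream& in,
                                           const std::regex& re,
                                           std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<std::string> found;
    std::string buf;                      // carried-over text + new chunk
    std::string chunk(chunk_size, '\0');
    bool more = true;
    while (more) {
        in.read(&chunk[0], static_cast<std::streamsize>(chunk_size));
        buf.append(chunk, 0, static_cast<std::size_t>(in.gcount()));
        more = in.good() && in.peek() != std::char_traits<char>::eof();

        std::size_t keep_from = buf.size();   // default: drop whole buffer
        for (std::sregex_iterator it(buf.cbegin(), buf.cend(), re), end;
             it != end; ++it) {
            const std::size_t start = static_cast<std::size_t>(it->position());
            const std::size_t stop = start + static_cast<std::size_t>(it->length());
            if (more && stop == buf.size()) {
                keep_from = start;            // possibly incomplete: retry later
                break;
            }
            found.push_back(it->str());
        }
        buf.erase(0, keep_from);
    }
    return found;
}
```

 With a std::istringstream holding "12abc" and a chunk size of 4, this
 returns the single match "abc" for both "[a-z]+" and "[a-z]{2,}". It is
 not a general fix: text at the end of the buffer that is not yet a full
 match (for example a single letter with "[a-z]{2,}") is still dropped,
 which is the case the existing match_partial flag covers, and the buffer
 can grow without bound for very long matches.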

 If you do not want to or cannot implement this feature request, please at
 least update the second example in the Partial Matches section discussed
 above. Thanks!

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/11776>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.