Subject: [Boost-bugs] [Boost C++ Libraries] #11776: Need way to find all regex matches in large file
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2015-10-30 09:20:12
#11776: Need way to find all regex matches in large file
------------------------------+--------------------------
 Reporter:  der-storch-85@…   |      Owner:  johnmaddock
     Type:  Feature Requests  |     Status:  new
Milestone:  To Be Determined  |  Component:  regex
  Version:  Boost 1.59.0      |   Severity:  Optimization
 Keywords:                    |
------------------------------+--------------------------
Finding all regex matches in a file via boost::regex_iterator is a very
complicated task, because you normally cannot load the whole file into a
buffer (it could be too large).
A possible solution is presented in the documentation of regular
expressions in section
[http://www.boost.org/doc/libs/1_59_0/libs/regex/doc/html/boost_regex/partial_matches.html
Partial Matches], see the second example.
Unfortunately, it is not correct: Consider a file with content "12abc", a
regex "[a-z]+", and a buffer size of 4. This results in the two matches
ab and c, but the result should be the single match abc. The first match,
ab, is a full match rather than a partial one, although it touches the end
of the buffer, so it is never extended with the input that follows.
Increasing the buffer size does not solve the problem in general, and with
more complex regexes it gets even worse.
Another example: the same as before, except with the regex "[a-z]{2,}"
(i.e. words with at least two letters), which results in one match ab, but
should be abc.
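To make the two examples reproducible, here is a minimal sketch of the
outcome. It is not the code from the documentation: it uses std::regex as a
stand-in for boost::regex (so there is no match_partial flag), and the
helper name find_per_buffer is made up for this illustration. It only shows
what happens once a full match that touches the end of a buffer is accepted
as final.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the standard library):
// splits `input` into buffers of `buffer_size` characters and searches
// each buffer on its own, accepting every full match as final.
std::vector<std::string> find_per_buffer(const std::string& input,
                                         const std::string& pattern,
                                         std::size_t buffer_size)
{
    assert(buffer_size > 0);
    const std::regex re(pattern);
    std::vector<std::string> matches;

    for (std::size_t pos = 0; pos < input.size(); pos += buffer_size) {
        // Simulates reading the next chunk of a file that is too large
        // to be loaded as a whole.
        const std::string buffer = input.substr(pos, buffer_size);

        for (std::sregex_iterator it(buffer.begin(), buffer.end(), re), end;
             it != end; ++it) {
            // A match ending exactly at the end of the buffer is stored
            // here without checking whether later input would extend it.
            matches.push_back(it->str());
        }
    }
    return matches;
}
```

With these assumptions, find_per_buffer("12abc", "[a-z]+", 4) yields ab and
c, and find_per_buffer("12abc", "[a-z]{2,}", 4) yields only ab, whereas
searching the unsplit input yields abc in both cases.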
The easiest solution seems to be to add a new match flag
("range_incomplete" or "input_incomplete" (?)) that checks whether the text
from the beginning of the current match to the end of the buffer forms a
partial or full match. In that case this "match" should be marked to the
user as possibly incomplete (e.g. by the already existing member
sub_match::matched). There are probably better solutions.
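Until something like that exists, the idea can be approximated in user
code. The following is only a sketch under stated assumptions: it uses
std::regex as a stand-in for boost::regex, the helper name
find_with_carry_over is made up, and it does not use or emulate the
proposed flag. It simply treats every match that touches the end of the
buffer as possibly incomplete while more input remains, and carries the
text from the start of that match over into the next buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the standard library).
std::vector<std::string> find_with_carry_over(const std::string& input,
                                              const std::string& pattern,
                                              std::size_t buffer_size)
{
    assert(buffer_size > 0);
    const std::regex re(pattern);
    std::vector<std::string> matches;
    std::string buffer;   // carried-over tail plus the newly read chunk
    std::size_t pos = 0;

    while (pos < input.size()) {
        // Simulates reading the next chunk of a large file.
        buffer += input.substr(pos, buffer_size);
        pos += buffer_size;
        const bool more_input = pos < input.size();

        // By default everything in the buffer is consumed.
        std::size_t keep_from = buffer.size();

        for (std::sregex_iterator it(buffer.begin(), buffer.end(), re), end;
             it != end; ++it) {
            const std::size_t start = static_cast<std::size_t>(it->position());
            const std::size_t stop = start + static_cast<std::size_t>(it->length());

            if (more_input && stop == buffer.size()) {
                // Possibly incomplete: keep the text from the start of
                // this match and retry once more input has been appended.
                keep_from = start;
                break;
            }
            matches.push_back(it->str());
        }
        buffer.erase(0, keep_from);
    }
    return matches;
}
```

For the file content "12abc" and a buffer size of 4, this sketch yields the
single match abc for both "[a-z]+" and "[a-z]{2,}". It is not a complete
solution: input at the end of a buffer that is only a partial match (for
example a single letter with "[a-z]{2,}") is still dropped, which is what
match_partial handles in Boost, and the carried-over buffer grows as long
as a match keeps touching the end of the buffer.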
If you do not want to or cannot follow this feature request, I ask you at
least to update the discussed 2nd example in the partial matches section.
Thanks!
--
Ticket URL: <https://svn.boost.org/trac/boost/ticket/11776>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.