Boost Users:
From: Eko palypse (ekopalypse_at_[hidden])
Date: 2020-01-24 16:04:53
Hopefully this is not a duplicate post.
Hello, I'm wrapping boost::regex with Cython so that I can call it from
Python.
Take the text "Hello World" and the regex (\w+)*.
Iterating over the matches gives me two results:
groups ('Hello', 'Hello')
lastindex 1
group:0 Hello
    start 0
    end 5
group:1 Hello
    start 0
    end 5

groups ('', '')
lastindex 1
group:0
    start 5
    end 5
group:1
    start -1
    end -1
I understand that group 0 always holds the overall match and that the
capturing groups follow as group 1 and so on.
As the output shows, the second match, the zero-length one, also reports
two groups, but group 1 returns -1 for both start and end.
Is it possible to prevent such matches from being reported in the first place?
Or do I need to iterate over the groups and filter them out myself?
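To illustrate the kind of filtering I have in mind, here is a minimal sketch using Python's built-in re module instead of boost::regex. It is only a stand-in: it assumes the text is just "Hello", and it shows re's behavior, not Boost's. The helper names nonempty_matches and participating_groups are made up for this example. In re, a group that did not take part in a match also reports start -1, so the same two checks apply.

```python
import re


def nonempty_matches(pattern, text):
    """Yield only the matches whose overall span (group 0) is non-empty."""
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            # Zero-length match: skip it instead of reporting it.
            continue
        yield m


def participating_groups(m):
    """Map group number to text for capturing groups that took part in the match."""
    # re reports start(i) == -1 for a group that did not participate.
    return {i: m.group(i) for i in range(1, m.re.groups + 1) if m.start(i) != -1}


if __name__ == "__main__":
    for m in nonempty_matches(r"(\w+)*", "Hello"):
        print(m.span(), participating_groups(m))
```

With this approach the zero-length match at position 5 is dropped before it reaches the caller, and the per-group check is only needed if a non-empty match can contain groups that did not participate.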
In case it helps to see my code logic, this is what I'm currently doing:
def unicode_research_iter(const wchar_t* text, wchar_t* pattern, int flags):
    cdef:
        wcmatch what
        size_t _length
        size_t _position
        wcregex_iterator start, end
    try:
        start = make_regex_iterator(text, <wregex>pattern,
                                    match_flags.match_perl)
        end = wcregex_iterator()
        while (start != end):
            what = <wcmatch>deref(start)
            if not what.empty():
                match_object = UnicodeMatch.from_instance(what)
                yield match_object
            else:
                print('Empty match result: ??')
            inc(start)  # increment
    except Exception as e:
        raise RuntimeError(f'{e}')
Thank you
Eren