From: Luc LA. ALQUIER (luc_at_[hidden])
Date: 2006-11-03 06:05:34


The Boost documentation (http://www.boost.org/libs/regex/doc/match_flag_type.html) says this:

 

“match_extra: Instructs the matching engine to retain all available capture information; if a capturing group is repeated then information about every repeat is available via match_results::captures() or sub_match_captures().”

 

For me this was THE great feature: a way to link related pieces of captured information together.

But the behavior I get when using this flag with the search algorithm is not the one I expected.

 

Instead of giving information about every repeat, the captures() member of sub_match contains all the captures obtained for the corresponding sub-expression (as the documentation of that member at http://www.boost.org/libs/regex/doc/sub_match.html says).

 

A repeat of a capturing group is not the same thing as the list of captures of a sub-expression, and the fact that regex behaves this way prevents me from linking pieces of information that were captured in the same repeat.

 

For example (using named capture syntax, which is not supported in Boost today, to make the regular expression clearer):

 

^(?<time>[^ ]+)(?: (?<attr>[A-Za-z]+)=(?:"(?<qvalue>[^"]+)"|(?<svalue>[^ ]+)))+

 

which is intended to parse lines like this one:

 

12/05/2006_12:04:25 id=5 msg="this is a problem" user=paul

 

The captures I get for this example are:

time={'12/05/2006_12:04:25'}

attr={'id','msg','user'}

qvalue={'this is a problem'}

svalue={'5','paul'}

 

and I was expecting:

 

time={'12/05/2006_12:04:25'}

attr={'id','msg','user'}

qvalue={null,'this is a problem',null}

svalue={'5',null,'paul'}

 

What I get is “useless” data because the data structure is lost: there is no way to link “paul” to “user”, nor “msg” to “this is a problem”.
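As a possible stopgap, the pairing can be kept by not relying on a repeated group at all: match the timestamp first, then walk the attributes one repeat at a time with a regex iterator. The sketch below is only an illustration under stated assumptions. It uses std::regex rather than Boost.Regex so that it is self-contained, the names LogLine and parse_log_line are hypothetical, and it is slightly looser than the single pattern above because the iterator may skip text between attributes. The same two-step idea should carry over to boost::sregex_iterator, though I have not checked that here.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// One parsed log line: the timestamp plus (name, value) pairs in input order.
// Hypothetical type, for illustration only.
struct LogLine {
    std::string time;
    std::vector<std::pair<std::string, std::string>> attrs;
};

// Hypothetical helper that parses lines such as
//   12/05/2006_12:04:25 id=5 msg="this is a problem" user=paul
// Returns false if there is no timestamp or no attribute at all.
bool parse_log_line(const std::string& line, LogLine& out) {
    static const std::regex time_re(R"(^([^ ]+))");
    // Group 1: attribute name, group 2: quoted value, group 3: simple value.
    static const std::regex attr_re(R"re( ([A-Za-z]+)=(?:"([^"]+)"|([^ ]+)))re");

    std::smatch head;
    if (!std::regex_search(line, head, time_re)) return false;
    out.time = head[1].str();
    out.attrs.clear();

    // Each iterator step is one "repeat", so the name and the value that
    // were captured together stay together.
    std::sregex_iterator it(head[0].second, line.end(), attr_re);
    for (; it != std::sregex_iterator(); ++it) {
        const std::smatch& m = *it;
        // Exactly one of the two value alternatives took part in the match.
        assert(m[2].matched != m[3].matched);
        out.attrs.emplace_back(m[1].str(),
                               m[2].matched ? m[2].str() : m[3].str());
    }
    // The original pattern uses "+", so at least one attribute is required.
    return !out.attrs.empty();
}
```

Called on the sample line, this should fill attrs with the pairs (id, 5), (msg, this is a problem) and (user, paul), which is the linkage the flat capture lists cannot give.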

 

Sorry for my English, which may be a source of misunderstanding, but it would be nice if the documentation matched the actual behavior and/or the library behaved as I was expecting.

 

I understand that the behavior I was expecting has a limitation: it does not account for the underlying structure when a repeated group is nested inside another repeated group.

 

There are several ways to avoid losing these relationships between the data (with varying degrees of relevance):

- Build a hierarchical tree of captures (a syntax tree).

- Provide an iterator over all captures that keeps track of their order of appearance.

- Allow named captures with duplicate group names.
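To make the second idea concrete, here is a minimal sketch of what an "order of appearance" view could look like. It is an assumption on my side that every stored capture keeps its position in the input (as far as I can tell the entries returned by captures() are themselves sub_match objects holding iterators into the input, but I have not verified this against the implementation). The types Capture and TaggedCapture and the two functions are hypothetical and only model the idea with plain offsets; they are not part of Boost.Regex.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A capture modelled as a start offset in the input plus its text.
struct Capture {
    std::size_t offset;
    std::string text;
};

// The same capture, tagged with the name of the group that produced it.
struct TaggedCapture {
    std::size_t offset;
    std::string group;
    std::string text;
};

static bool by_offset(const TaggedCapture& a, const TaggedCapture& b) {
    return a.offset < b.offset;
}

// Flattens the per-group capture lists into one sequence ordered by
// position in the input, i.e. by order of appearance.
std::vector<TaggedCapture> in_order_of_appearance(
        const std::map<std::string, std::vector<Capture>>& by_group) {
    std::vector<TaggedCapture> all;
    for (const auto& entry : by_group)
        for (const auto& c : entry.second)
            all.push_back({c.offset, entry.first, c.text});
    std::stable_sort(all.begin(), all.end(), by_offset);
    return all;
}

// Pairs every "attr" capture with the value capture that directly follows it.
// Expects the input to be ordered by offset already.
std::vector<std::pair<std::string, std::string>> link_attrs(
        const std::vector<TaggedCapture>& ordered) {
    assert(std::is_sorted(ordered.begin(), ordered.end(), by_offset));
    std::vector<std::pair<std::string, std::string>> out;
    for (std::size_t i = 0; i + 1 < ordered.size(); ++i) {
        const TaggedCapture& name = ordered[i];
        const TaggedCapture& value = ordered[i + 1];
        if (name.group == "attr" &&
            (value.group == "qvalue" || value.group == "svalue"))
            out.emplace_back(name.text, value.text);
    }
    return out;
}
```

With the four capture lists of the example and their offsets in the sample line, the ordered sequence would interleave attr and value captures, and link_attrs would recover the three name/value pairs. This only handles the flat case; a repeated group nested inside another repeated group would still need something like the tree from the first idea.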

 

So, is it the documentation that needs fixing, or is this a bug?

 

 

Alquier Luc

 

 
