Subject: Re: [Boost-users] Get All Words Offset in String using Boost regex
From: S Nagre (snagre.mumbai_at_[hidden])
Date: 2012-01-06 12:12:35


Many thanks, Anthony. Your version of the code works much better than mine.

Thanks
Subhash

On Fri, Jan 6, 2012 at 2:25 AM, Anthony Foiani <tkil_at_[hidden]> wrote:

>
> S Nagre <snagre.mumbai_at_[hidden]> writes:
>
> > std::string escapeChar = "\\" ;
> > std::string bChar = "b";
> > std::string dotChar = ".";
> >
> > std::string findWordInStr = escapeChar + bChar + dotChar +
> > escapeChar + bChar;
>
> This ends up with the expression "\\b.\\b", which will only ever match
> a single character with a word break on either side (so, in your
> example, it should match all and only the spaces):
>
> > "Hello World and Google"
> ^ ^ ^
>
> Closer would be "\\b.+?\\b", but that would still match on your spaces:
>
> > "Hello World and Google"
> ^ ^^ ^^ ^^
>
> If you really want words, you are best off deciding what constitutes a
> word, and then writing the regex for exactly that purpose. There is
> the built-in "\\w" character class, but only you can decide whether
> things like apostrophes and hyphens break words. (And that's just in
> English; I have no idea what constitutes a word break in most other
> languages!) For English, I'd consider something like "[\\w'-]+"
> (which should be: all word chars, plus apostrophes, plus hyphens).
>
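To check the three patterns discussed above against the sample string, here is a minimal sketch. It uses std::regex (C++11) rather than Boost.Regex so that it builds without Boost. That swap and the helper name match_starts are assumptions for illustration, not part of the original code. The helper collects the start offset of every match, measured from the beginning of the string.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns the start
// offset of every match of `pattern` in `text`, counted from text.begin().
std::vector<std::size_t> match_starts( const std::string & text,
                                       const std::string & pattern )
{
    const std::regex re( pattern );
    std::vector<std::size_t> starts;

    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for ( std::sregex_iterator it( text.begin(), text.end(), re ), last;
          it != last; ++it )
    {
        starts.push_back( static_cast<std::size_t>( it->position() ) );
    }
    return starts;
}
```

Called on "Hello World and Google", "\\b.\\b" should report only the three spaces (offsets 5, 11, 15), "\\b.+?\\b" should report the words and the spaces between them, and "[\\w'-]+" should report just the four words (offsets 0, 6, 12, 16).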
> And from a personal taste point of view, I'd likely write it exactly
> that way. (I do sometimes decompose my regexes, but only if they have
> repeated subsections that could better be described as a variable
> name.)
>
> You also had a small logic error, when you wrote this:
>
> OffSetMap[foundPos] = foundLen;
>
> "foundPos" is relative to the start of the last search, not to the
> start of the whole string.
>
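The relative-versus-absolute point can be seen in a small sketch. It assumes std::regex instead of Boost.Regex (so it compiles without Boost), and the function name word_offsets and its `absolute` switch are hypothetical, added only to put the two behaviours side by side.

```cpp
#include <map>
#include <regex>
#include <string>

typedef std::map<int, int> offset_map_t;

// Hypothetical helper (not from the original post). With absolute == false
// it reproduces the bug by keying on the position relative to the current
// search start; with absolute == true it keys on the offset from str.begin().
offset_map_t word_offsets( const std::string & str, bool absolute )
{
    const std::regex re( "[\\w'-]+" );
    std::smatch what;
    offset_map_t offsets;

    std::string::const_iterator start = str.begin();
    const std::string::const_iterator end = str.end();

    while ( std::regex_search( start, end, what, re ) )
    {
        // position() is measured from `start`, not from str.begin().
        const int rel = static_cast<int>( what.position() );
        const int len = static_cast<int>( what.length() );
        const int abs_pos = static_cast<int>( start - str.begin() ) + rel;

        offsets[ absolute ? abs_pos : rel ] = len;

        // Resume the next search just past this match.
        start += rel + len;
    }
    return offsets;
}
```

For "Hello World and Google" the relative keys are 0, 1, 1, 1, so the buggy map collapses to two entries, while the absolute version keeps all four words at offsets 0, 6, 12 and 16.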
> Here's my version:
>
> | #include <iostream>
> | #include <map>
> | #include <string>
> |
> | #include <boost/foreach.hpp>
> | #include <boost/regex.hpp>
> |
> | typedef int int32;
> |
> | typedef std::map< int32, int32 > offset_map_t;
> |
> | void create_offset_map( const std::string & str,
> |                         offset_map_t & offset_map )
> | {
> |     std::cout << "searching '" << str << "'" << std::endl;
> |
> |     boost::regex re( "[\\w'-]+" );
> |
> |     boost::smatch what;
> |
> |     std::string::const_iterator start = str.begin();
> |     std::string::const_iterator end = str.end();
> |
> |     while ( boost::regex_search( start, end, what, re ) )
> |     {
> |         // pos is relative to "start", not to str.begin()
> |         int32 pos = what.position();
> |         int32 len = what.length();
> |
> |         std::cout << " found '" << what.str( 0 ) << "'"
> |                   << " at pos=" << pos << ", len=" << len << std::endl;
> |
> |         // move to the match, record its absolute offset, then skip it
> |         start += pos;
> |         offset_map[ start - str.begin() ] = len;
> |         start += len;
> |     }
> |
> |     BOOST_FOREACH( const offset_map_t::value_type & p, offset_map )
> |         std::cout << " ( " << p.first << ", "
> |                   << p.second << " )" << std::endl;
> | }
> |
> | int main( int argc, char * argv [] )
> | {
> |     for ( int i = 1; i < argc; ++i )
> |     {
> |         offset_map_t my_map;
> |         create_offset_map( argv[i], my_map );
> |     }
> |     return 0;
> | }
>
> Hope this helps.
>
> Best Regards,
> Anthony Foiani
>
