
Boost Users :

From: Eric Niebler (eric_at_[hidden])
Date: 2008-01-24 16:04:03


Eric Niebler wrote:
> Aries Tao wrote:
>> Hi everybody, I am using Boost.Xpressive to search for email addresses in a
>> binary file of 10*1024*1024 bytes in which every byte is 0x6f.
>> Boost.Xpressive is inefficient on this input. Can anyone help me? Thanks!
>> The code is below:
> <snip>
>
> I've done some investigation, and I've discovered a couple of things...

Correct file attached now...

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com

///////////////////////////////////////////////////////////////////////////////
// main.hpp
//
// Copyright 2007 Eric Niebler. Distributed under the Boost
// Software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

#include <cstddef>   // std::size_t
#include <cstring>   // std::memset
#include <exception> // std::exception
#include <iostream>
#include <boost/regex.hpp>

int main()
{
    std::size_t const Mb = 1048576; // 1 MiB
    char *begin = new char[Mb];
    char *end = begin + Mb;
    std::memset(begin, 0x6f, Mb); // every byte is 0x6f ('o'), as in the report
    char const *pattern = "([a-z#~_\\.!\\#$%\\^&\\*\\(\\)\\-]+@[a-z#_\\-]+\\.[a-z#_\\-\\.]+)";

    try
    {
        using namespace boost;
        regex token(pattern);
        // fast, doesn't throw:
        cregex_iterator cur(begin, end, token);
        // slow, throws on memory exhaustion:
        regex_search(begin, end, token);
    }
    catch(std::exception const &e)
    {
        std::cout << "boost.regex error: " << e.what() << std::endl;
    }

    delete[] begin; // release the buffer allocated above
    return 0;
}

