Subject: [boost] [spirit][qi] Fastest way to parse file
From: Antony Polukhin (antoshkka_at_[hidden])
Date: 2012-02-22 15:15:48


Hi,

The Boost.Spirit documentation advises using multi_pass iterators for
parsing files (or reading the data into an STL container and then
passing the container's begin and end to Spirit.Qi).

A much better solution would be to use a memory-mapped file:

boost::interprocess::file_mapping fm(filename.c_str(),
    boost::interprocess::read_only);

// offset 0 and size 0 map the whole file
boost::interprocess::mapped_region region(fm,
    boost::interprocess::read_only, 0, 0);

const char* begin = static_cast<const char*>(region.get_address());
const char* const end = begin + region.get_size();

Compared to the multi_pass iterator, the mmap approach increased
parsing speed more than 5 times and reduced memory usage and CPU load.

The mmap approach is also a little faster than reading the data into
an STL container, and mmap sometimes requires less memory (depending
on the OS).

I think this solution should at least be mentioned in the Spirit
documentation.

Maybe mmap should be wrapped in a class for tighter integration with
Spirit and simpler usage:

int main()
{
    namespace spirit = boost::spirit;

    using spirit::ascii::space;
    using spirit::ascii::char_;
    using spirit::qi::double_;
    using spirit::qi::eol;

    spirit::file_parser first("multi_pass.txt"); // class that does the mmap'ing

    std::vector<double> v;
    bool result = spirit::qi::phrase_parse(first
      , spirit::make_default_multi_pass(base_iterator_type())
      , double_ >> *(',' >> double_) // recognize list of doubles
      , space | '#' >> *(char_ - eol) >> eol // comment skipper
      , v); // data read from file

    if (!result) {
        std::cout << "Failed parsing input file!" << std::endl;
        return -2;
    }

    std::cout << "Successfully parsed input file!" << std::endl;
    return 0;
}

Best regards,

Antony Polukhin

